perf(cubestore): early compaction split #11079
…multiplier, size-aware split
- Import: cap a CSV data frame by estimated Arrow buffer size in addition to row count (CUBESTORE_WAL_SPLIT_SIZE_THRESHOLD, default 100MB) so wide rows bound peak ingest memory instead of accumulating 524K-row frames.
- Compaction: CUBESTORE_COMPACTION_CHUNKS_THRESHOLD_MULTIPLIER (f64, default 3.0) scales the persisted-chunk count and total-size thresholds, letting more chunks accumulate and merge in one streaming k-way pass.
- Compaction split now sizes children by the bytes actually written (existing main table + pending chunk bytes) so a partition splits by file size in a single pass instead of under-splitting and re-splitting.

Ignores 4 tests whose partition-count assertions assume the old lagged split-by-file-size semantics (FIXME(early-split)).
|
Claude finished @waralexrom's task in 1m 7s
Review: perf(cubestore): early compaction split
1 high, 2 medium, 2 low
Summary
The core change has three parts:
The shape is reasonable and the eager-split-by-bytes logic in High — three tests left
|
281cc13 to 11eb290
|
Claude finished @waralexrom's task in 1m 32s
Claude Code Review
Verdict: Approach is sound (size-aware import split + eager compaction split by total bytes), but landing it disables four behavioral tests and silently rescales two existing env vars. Both should be addressed before merge. 1 high, 1 medium, 2 low.
Details
High
Medium
Low
Nits / observations
|
    }

    fn compaction_chunks_total_size_threshold(&self) -> u64 {
-       self.compaction_chunks_total_size_threshold
+       (self.compaction_chunks_total_size_threshold as f64
+           * self.compaction_chunks_threshold_multiplier)
+           .floor() as u64
    }

    fn compaction_chunks_count_threshold(&self) -> u64 {
-       self.compaction_chunks_count_threshold
+       (self.compaction_chunks_count_threshold as f64
+           * self.compaction_chunks_threshold_multiplier)
Multiplier silently rescales two pre-existing env vars.
The multiplier (default 3.0) is applied inside the accessors for both compaction_chunks_total_size_threshold() and compaction_chunks_count_threshold(). That means every existing deployment that has tuned CUBESTORE_CHUNKS_TOTAL_SIZE_THRESHOLD or CUBESTORE_CHUNKS_COUNT_THRESHOLD will now see those values multiplied by 3 with no opt-in — a 4-chunk count threshold becomes 12, an 8MB size threshold becomes 24MB. That's a meaningful behavioral change for an upgrade.
A few options worth considering:
- Apply the multiplier only at the call sites that need the early-split widening, leaving the trait getters returning the configured value verbatim.
- Default the multiplier to 1.0 and let operators opt in.
- At minimum, document this in the changelog / config docs so the rescale isn't a surprise.
Also worth a comment near the field explaining what the multiplier is widening and why — the name reads like a simple knob, but it changes the contract of two other knobs.
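To make the first option concrete, here is a minimal Rust sketch, not the PR's code. The struct `CompactionConfig` and the helper `widened` are hypothetical names introduced only for illustration; the field names loosely echo the diff above. The getters return the configured value unchanged, and a caller that wants the early-split widening asks for it explicitly.

```rust
/// Trimmed-down, hypothetical config. The struct and `widened` are
/// assumptions for illustration, not the actual cubestore types.
pub struct CompactionConfig {
    pub chunks_count_threshold: u64,
    pub chunks_total_size_threshold: u64,
    pub chunks_threshold_multiplier: f64,
}

impl CompactionConfig {
    /// Returns the operator-configured count threshold verbatim.
    pub fn chunks_count_threshold(&self) -> u64 {
        self.chunks_count_threshold
    }

    /// Returns the operator-configured size threshold verbatim.
    pub fn chunks_total_size_threshold(&self) -> u64 {
        self.chunks_total_size_threshold
    }

    /// Scales a base threshold by the multiplier, rounding down.
    /// Only call sites that want the widened value call this.
    pub fn widened(&self, base: u64) -> u64 {
        (base as f64 * self.chunks_threshold_multiplier).floor() as u64
    }
}
```

With this shape, a call site would write something like `cfg.widened(cfg.chunks_count_threshold())`, so the rescale is visible where it happens and code that reads the plain getters keeps the value the operator configured.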
    let line = line?;
    let is_data_row = parser.visit_line(line.as_str(), |insert_pos, column, value| {
        let builder = builders[insert_pos].as_mut();
+       estimated_bytes +=
+           ImportFormat::estimate_arrow_value_size(column.get_column_type(), value);
        match value {
            None => {
                append_value(builder, column.get_column_type(), &TableValue::Null);
Minor: estimated_bytes accumulates on non-data rows.
The visit_line closure increments estimated_bytes unconditionally, but num_rows += 1 is gated on is_data_row. If headers / non-data lines ever invoke the per-column callback, their bytes silently leak into the running total and never get reset on continue. In practice the leak is tiny relative to a 100MB threshold so this is cosmetic, but it would be cleaner to either accumulate into a per-line scratch and commit it only when is_data_row, or reset both counters in the same place.
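The per-line scratch suggestion can be sketched as follows. This is a simplified stand-in, not the importer's code: `FrameStats`, `commit_line`, `should_flush`, and `reset` are hypothetical names, and the `>=` comparison against the limits is an assumption.

```rust
/// Hypothetical running counters for one CSV data frame.
#[derive(Default, Debug, PartialEq)]
pub struct FrameStats {
    pub num_rows: usize,
    pub estimated_bytes: usize,
}

impl FrameStats {
    /// Commits a line's scratch byte estimate only when the parser
    /// reported a data row, so header lines never leak into the total.
    pub fn commit_line(&mut self, line_bytes: usize, is_data_row: bool) {
        if is_data_row {
            self.num_rows += 1;
            self.estimated_bytes += line_bytes;
        }
    }

    /// True once either the row cap or the byte cap is reached
    /// (the exact comparison used by the PR is not shown; `>=` is assumed).
    pub fn should_flush(&self, max_rows: usize, max_bytes: usize) -> bool {
        self.num_rows >= max_rows || self.estimated_bytes >= max_bytes
    }

    /// Resets both counters in one place after a frame is emitted.
    pub fn reset(&mut self) {
        *self = Self::default();
    }
}
```

The closure would add to a local `line_bytes` variable per column, and the caller would pass it to `commit_line` together with `is_data_row` once the line has been visited.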
    let new_partitions_count_by_file_size = {
        let total_file_size =
            partition.get_row().file_size().unwrap_or(0) + chunks_total_file_size;
        if total_file_size > 0 {
            let threshold = self.config.partition_size_split_threshold_bytes();
-           (div_ceil(partition_file_size, threshold) as usize).min(16)
+           (div_ceil(total_file_size, threshold) as usize).min(16)
        } else {
            1
-   };
+       }
+   };
Note that chunks_total_file_size here is capped by the in-memory size threshold break above (line 537) — if pending chunks exceed that threshold, only the chunks taken into this compaction pass are counted, not the full pending set. That's likely the intended behavior (we split by what we're actually writing), and matches the comment, but it does mean a partition with a huge backlog of pending chunks will still under-split on any single pass relative to the eventual total. Worth a one-line note acknowledging this, since the comment currently implies the split reflects "the bytes actually being written" without flagging the cap.
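For reference, the split-count arithmetic in the snippet can be reproduced in isolation. This is a sketch under stated assumptions: `ceil_div` is a local helper assumed to behave like the PR's `div_ceil`, `partitions_by_file_size` is a hypothetical name, and `threshold` is assumed to be non-zero.

```rust
/// Ceiling division for unsigned integers; `b` must be non-zero.
/// Local helper, assumed equivalent to the `div_ceil` used in the diff.
fn ceil_div(a: u64, b: u64) -> u64 {
    a / b + u64::from(a % b != 0)
}

/// Number of child partitions by file size: the existing main-table bytes
/// plus the bytes of the chunks taken into this pass, divided by the split
/// threshold, rounded up, and capped at 16 as in the diff.
pub fn partitions_by_file_size(
    main_table_bytes: Option<u64>,
    chunks_in_pass_bytes: u64,
    threshold: u64,
) -> usize {
    let total = main_table_bytes.unwrap_or(0) + chunks_in_pass_bytes;
    if total > 0 {
        (ceil_div(total, threshold) as usize).min(16)
    } else {
        1
    }
}
```

Because `chunks_in_pass_bytes` covers only the chunks selected for the current pass, a large backlog left out of the pass does not contribute to the count, which is the under-split case described above.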
…t budget wait Greppable by "csv-import-timing": logs the moment a CREATE TABLE request is received, and the CSV import job start both before and after the rate-limiter budget wait (with wait_ms), so the budget-wait gap is readable without trace.
|
Claude finished @waralexrom's task in 50s
Review: latest push (info-level timing logs)
Verdict: New commit is a small diagnostic-logging-only change. No new substantive findings. The four prior unresolved review threads still apply to the current diff.
Details
What's new since the last review
Commit
The substantive logic of the PR (size-aware split, threshold multiplier, WAL byte cap) is unchanged.
Observations on the logging
Prior threads still applicable (not re-posting)
All four open Claude threads remain unresolved and the underlying code is unchanged:
Re-affirming the four prior threads rather than posting duplicates.
… logs Inline DefaultHasher loc_hash on every csv-import-timing line (create_table loc_hashes, import job before/after budget wait) plus job completed/failed lines, so a location ties create_table to its import job lifecycle.
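A location hash of this kind can be computed with the standard library alone. The sketch below is an assumption about the general shape, not the commit's code; the function name `loc_hash` follows the commit message, while its signature is hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hashes an import location string into a short numeric tag that can be
/// printed on every related log line. Hypothetical signature.
pub fn loc_hash(location: &str) -> u64 {
    // DefaultHasher::new() starts from fixed keys, so the same input
    // yields the same value within one build of the program.
    let mut hasher = DefaultHasher::new();
    location.hash(&mut hasher);
    hasher.finish()
}
```

The standard library does not specify the algorithm behind `DefaultHasher` and does not promise stable output across Rust releases, so such a tag suits correlating log lines from one deployment rather than serving as a persisted identifier.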
|
Claude finished @waralexrom's task in 43s —— View job Review: latest push (
|
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## cubestore-compaction-improvement #11079 +/- ##
=================================================================
Coverage 58.50% 58.50%
=================================================================
Files 216 216
Lines 17269 17269
Branches 3524 3524
=================================================================
Hits 10103 10103
Misses 6651 6651
Partials 515 515
|
…it tests
Removes the temporary [csv-import-timing] info logs and re-enables the four tests ignored after the size-aware split change:
- partition_compaction_decimal96, inactive_partitions_cleanup: pin partition_size_split_threshold_bytes high so the split stays row-based (these cover decimal handling / inactive-file GC, not size split).
- over_10k_join: raise max_joined_partitions above the now-larger partition count (the test asserts join correctness, not the cap).
- partition_split_by_file_size: assert the first compaction already splits (eager split-by-file-size), keeping the per-partition size invariant.
|
Claude finished @waralexrom's task in 51s —— View job Review: latest push (
|
Check List