feat(table/write): merge partial-update rows at flush #380
Open
TheR1sing3un wants to merge 3 commits into
The partial-update writer kept every row of a key in the flushed file, deferring all merging to the read side. Java's MergeTreeWriter runs the merge function over the write buffer before flushing, so a data file never holds two rows of one key — an invariant split planning and statistics can rely on (a file's physical row count equals its logical row count). Mirror that: at flush, group sorted rows by primary key and emit one row per group with the same semantics as the read-side PartialUpdateMergeFunction — every column keeps its latest non-null value ordered by (sequence fields, system sequence), the merged row carries the group's highest sequence number, and retract rows are rejected. Changelog files (input producer) still record the pre-merge rows, matching Java's rawConsumer. Cross-commit merging is unchanged: files from different commits still overlap on key range and go through the sort-merge reader.
…tics Java reuses one MergeFunction across write flush, compaction, and reads, giving engine semantics a single source of truth. The Rust write side is a vectorized re-implementation of the read-side streaming merge, so feed the same key groups through merge_partial_update_rows and the read-side PartialUpdateMergeFunction and assert identical output, preventing the two implementations from drifting.
…on-flush # Conflicts: # crates/paimon/src/table/kv_file_writer.rs
Purpose
The partial-update writer kept every row of a key in the flushed data file, deferring all merging to the read side. Java's MergeTreeWriter#flushWriteBuffer runs the merge function over the write buffer before flushing, so a data file never holds two rows of one key. Split planning and statistics rely on that invariant (a file's physical row count equals its logical row count).
The missing invariant surfaced in #374 (comment): a single-file partial-update split marked raw convertible reported its physical row count as exact, inflating COUNT(*) through exact scan statistics and starving LIMIT pushdown. PR #374 works around it by keeping partial-update splits non-raw-convertible; this PR fixes the root cause so that gating can be relaxed in a follow-up.
Brief change log
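KeyValueFileWriter::flush merges each key group down to one row for merge-engine=partial-update, mirroring Java MergeTreeWriter#flushWriteBuffer with the same semantics as the read-side PartialUpdateMergeFunction:
- every column keeps its latest non-null value, ordered by (sequence fields, system sequence)
- the merged row carries the group's highest sequence number
- retract rows are rejected
Changelog files (changelog-producer=input) still record the pre-merge rows, matching Java's rawConsumer.
Tests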
- test_merge_partial_update_rows_latest_non_null_per_column: per-column latest-non-null across a key group, all-null column stays null, merged _SEQUENCE_NUMBER is the group max
- test_merge_partial_update_rows_rejects_retract: DELETE rows error at flush like the read side
- test_pk_partial_update_merges_within_single_commit: three partial updates of one key in one INSERT produce a single physical row; SELECT and COUNT(*) agree
- test_flush_merge_matches_read_side_partial_update_merge: feeds the same key groups through the flush-time merge and the read-side PartialUpdateMergeFunction, asserting identical output. This locks the two implementations together (Java reuses one MergeFunction for write flush, compaction, and reads; this test provides the equivalent guarantee for the vectorized write-side implementation)
API and Format
No API change. Data files written for partial-update tables now contain one row per key per flush (same physical schema); files written by older versions keep working — the reader still sort-merges every split.
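For reviewers who want a mental model of the one-row-per-key result, here is a minimal sketch of the flush-time merge semantics described above. It is not the code in this PR: the `Row` struct and `merge_partial_update` function are hypothetical, value columns are simplified to `Option<i64>`, and ordering uses only the system sequence number (the real merge orders by sequence fields first, then system sequence).

```rust
/// Hypothetical row shape for illustration only; not the crate's actual types.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    key: i64,               // primary key (assumed single-column here)
    seq: u64,               // system sequence number
    retract: bool,          // true for DELETE-style rows
    cols: Vec<Option<i64>>, // value columns; None stands for null
}

/// Sort the buffered rows by (key, seq), then fold each key group into one row:
/// every column keeps its latest non-null value and the merged row carries the
/// group's highest sequence number. Retract rows are rejected with an error.
fn merge_partial_update(mut rows: Vec<Row>) -> Result<Vec<Row>, String> {
    rows.sort_by_key(|r| (r.key, r.seq));
    let mut out: Vec<Row> = Vec::new();
    for row in rows {
        if row.retract {
            return Err(format!("retract row not accepted for key {}", row.key));
        }
        let same_key = out.last().map_or(false, |acc| acc.key == row.key);
        if same_key {
            let acc = out.last_mut().expect("out is non-empty when same_key is true");
            // A later row overrides a column only where it carries a non-null value.
            for (dst, src) in acc.cols.iter_mut().zip(&row.cols) {
                if src.is_some() {
                    *dst = *src;
                }
            }
            acc.seq = acc.seq.max(row.seq);
        } else {
            out.push(row);
        }
    }
    Ok(out)
}
```

Under these assumptions, three partial updates of one key collapse to a single row whose columns hold the latest non-null value each, a column that was null in every input stays null, and any retract row in the buffer fails the whole merge.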
Documentation
No documentation change needed.