feat(compaction): introduce RowAddrRemap structure to avoid remap OOM caused by HashMap #7237
Open
zhangyue19921010 wants to merge 2 commits into
Closes: #7150
Compact Row-Address Remapping for Compaction
Compaction rewrites rows into new fragments, so indices that store physical row addresses need an old-address to new-address mapping without building an O(total rows) HashMap<u64, Option<u64>>.
Layout
Old Rows
old_fragment_id -> (old_offsets, old_rows_before)
old_offsets: rewritten old row offsets in this old fragment.
old_rows_before: rewritten row count before this old fragment.
New Rows
Ordered new-fragment ranges: (fragment_id, new_rows_before, physical_rows)
new_rows_before: rewritten row count before this new fragment.
Lookup
An address whose fragment was not rewritten returns None.
For an address whose fragment was rewritten:
1. Read (old_offsets, old_rows_before) from the old-row layout.
2. If offset is not in old_offsets, return Some(None) because the row was deleted.
3. Otherwise, old_offsets.rank(offset) - 1 is this row's 0-based position among rewritten old rows in this old fragment.
4. Add old_rows_before to get k, the row's 0-based position among all rewritten old rows.
5. In the new-row layout, find the range (fragment_id, new_rows_before, physical_rows) where new_rows_before <= k < new_rows_before + physical_rows.
6. The new address is the row at offset k - new_rows_before in fragment fragment_id.
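The lookup steps can be sketched in Rust as follows. This is a minimal illustration, not the PR's implementation: the type names (OldFragment, NewRange, CompactRemap) and the helpers row_addr and example_remap are hypothetical, a sorted Vec<u32> with binary search stands in for a bitmap with a rank operation, and the sketch assumes a row address packs the fragment id into the upper 32 bits and the row offset into the lower 32 bits.

```rust
use std::collections::HashMap;

/// Old-row layout entry for one rewritten old fragment (hypothetical type).
struct OldFragment {
    /// Sorted offsets of the rows that were rewritten (not deleted).
    old_offsets: Vec<u32>,
    /// Rewritten row count before this old fragment.
    old_rows_before: u64,
}

/// One range of the ordered new-row layout (hypothetical type).
struct NewRange {
    fragment_id: u32,
    /// Rewritten row count before this new fragment.
    new_rows_before: u64,
    physical_rows: u64,
}

struct CompactRemap {
    old: HashMap<u32, OldFragment>,
    /// Ordered by new_rows_before.
    new: Vec<NewRange>,
}

/// Assumed address encoding: fragment id in the high 32 bits, offset in the low 32 bits.
fn row_addr(fragment_id: u32, offset: u32) -> u64 {
    ((fragment_id as u64) << 32) | offset as u64
}

impl CompactRemap {
    /// None: fragment not rewritten. Some(None): row deleted. Some(Some(a)): new address.
    fn lookup(&self, addr: u64) -> Option<Option<u64>> {
        let fragment_id = (addr >> 32) as u32;
        let offset = addr as u32; // low 32 bits
        let old = self.old.get(&fragment_id)?;
        // The index found by binary search equals rank(offset) - 1.
        let pos = match old.old_offsets.binary_search(&offset) {
            Ok(p) => p as u64,
            Err(_) => return Some(None), // offset not rewritten: the row was deleted
        };
        let k = old.old_rows_before + pos;
        // First range whose end is past k, i.e. new_rows_before <= k < end.
        let idx = self
            .new
            .partition_point(|r| r.new_rows_before + r.physical_rows <= k);
        // Panics if the two layouts disagree on the total row count.
        let range = &self.new[idx];
        let new_offset = (k - range.new_rows_before) as u32;
        Some(Some(row_addr(range.fragment_id, new_offset)))
    }
}

/// Example data: fragments 0 and 1 are compacted into fragments 10 and 11.
fn example_remap() -> CompactRemap {
    let mut old = HashMap::new();
    // Fragment 0 had 5 rows and offset 2 was deleted.
    old.insert(0, OldFragment { old_offsets: vec![0, 1, 3, 4], old_rows_before: 0 });
    // Fragment 1 had 3 rows and none were deleted.
    old.insert(1, OldFragment { old_offsets: vec![0, 1, 2], old_rows_before: 4 });
    let new = vec![
        NewRange { fragment_id: 10, new_rows_before: 0, physical_rows: 5 },
        NewRange { fragment_id: 11, new_rows_before: 5, physical_rows: 2 },
    ];
    CompactRemap { old, new }
}
```

With this example data, the row at offset 3 of old fragment 0 has k = 2 and lands at offset 2 of new fragment 10, while the row at offset 1 of old fragment 1 has k = 5 and lands at offset 0 of new fragment 11. The memory cost is one offset per rewritten row plus one small entry per fragment, rather than one hash-map entry per row.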
Ordering
Compact remap does not store each old-to-new row mapping. It computes k from the old-row layout, then maps it to the k-th row written to the new fragments.
This requires the reader-to-writer pipeline to preserve row order.
old_frag_ids must match the order old fragments are read.
new_frags must match the order new rows are written.
Current compaction satisfies this because it scans selected fragments in order and writes the resulting stream without reordering rows.
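To make the ordering requirement concrete, the sketch below derives both running counts as prefix sums over the read order and the write order. The function names and tuple shapes are hypothetical and chosen only for illustration; the point is that the same fragments listed in a different order produce different old_rows_before values, and therefore a different k for every row.

```rust
/// Hypothetical helper: for each old fragment, in read order, return
/// (old_fragment_id, old_rows_before). Each input entry pairs a fragment id
/// with the offsets of its rewritten rows.
fn old_rows_before(read_order: &[(u32, Vec<u32>)]) -> Vec<(u32, u64)> {
    let mut before = 0u64;
    read_order
        .iter()
        .map(|(id, offsets)| {
            let entry = (*id, before);
            before += offsets.len() as u64;
            entry
        })
        .collect()
}

/// Hypothetical helper: for each new fragment, in write order, return
/// (fragment_id, new_rows_before, physical_rows). Each input entry pairs a
/// fragment id with its physical row count.
fn new_rows_before(write_order: &[(u32, u64)]) -> Vec<(u32, u64, u64)> {
    let mut before = 0u64;
    write_order
        .iter()
        .map(|(id, rows)| {
            let entry = (*id, before, *rows);
            before += *rows;
            entry
        })
        .collect()
}
```

For example, reading fragment 0 (4 rewritten rows) before fragment 1 (3 rewritten rows) gives old_rows_before values of 0 and 4. Listing them in the opposite order gives 0 for fragment 1 and 3 for fragment 0, which would map every row of fragment 0 to the wrong new address if the data was actually read in the first order.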