
feat(data): coalesce position deletes into range inserts #645

Open
Baunsgaard wants to merge 1 commit into apache:main from Baunsgaard:coalesce-position-deletes-loader


@Baunsgaard
Contributor

Add ForEachPositionDelete (the C++ equivalent of Java's PositionDeleteRangeConsumer) and route DeleteLoader through it, replacing the per-position PositionDeleteIndex::Delete(pos) call. The function inspects the first 1024 positions and dispatches to one of two strategies: coalescing consecutive positions into runs inserted with CRoaring's addRange, or bulk addMany calls grouped by the high 32 bits of each position.

Also rework DeleteLoader::LoadPositionDelete to read Arrow batches directly through nanoarrow's ArrowArrayView. When the delete file's referenced_data_file matches the target data file (a V2 writer hint), the positions column is passed through as a zero-copy span; otherwise, each batch is filtered by file path into a per-batch staging vector.

Local microbenchmarks show speedups of 2.2x-10.6x for ForEachPositionDelete alone and 2.1x-2.5x end to end through LoadPositionDeletes. This is the C++ equivalent of apache/iceberg#16052.

