fix: [DO NOT MERGE] Iceberg scan duplicates rows when splitting a single-row-group Parquet file into multiple byte-range tasks by mbutrovich · Pull Request #4621 · apache/datafusion-comet

mbutrovich · 2026-06-10T18:16:53Z

Which issue does this PR close?

Closes #4590.

Rationale for this change

The native Iceberg scan returns duplicate rows when an Iceberg FileScanTask splits a data file into byte ranges smaller than a row group. Spark/Iceberg planning tiles a file by split-size regardless of row-group layout, so a single row group can be covered by several FileScanTasks. The root cause is in iceberg-rust's ArrowReader: it selected a row group for every task byte range that overlapped it, so each overlapping split read the row group and its rows were duplicated.

Comet only forwards the start/length Spark planned; serialization and the native round-trip are correct. The fix belongs in iceberg-rust, which now assigns each row group to the single split whose range contains the row group's midpoint (matching parquet-java semantics). See apache/iceberg-rust#2614 and the fix in apache/iceberg-rust#2615.

The Parquet path (native_datafusion) is unaffected: arrow-rs/DataFusion already use midpoint ownership, confirmed by the regression test added here.

What changes are included in this PR?

Bump the iceberg / iceberg-storage-opendal dependency to a rev that includes the row-group midpoint fix.
Adapt IcebergScanExec to the updated iceberg-rust API: ArrowReaderBuilder::new now takes an iceberg::Runtime. execute() builds the reader on the Spark executor thread (outside any tokio context), so the runtime is constructed from Comet's shared global runtime via IcebergRuntime::new(get_runtime()) rather than Runtime::current().
Add regression tests for both scan paths.

Note

Draft: native/Cargo.toml currently points at mbutrovich/iceberg-rust branch filescantask_midpoint so CI can exercise apache/iceberg-rust#2615 before it merges. This will be repointed to apache/iceberg-rust at the merged SHA before this PR leaves draft.

How are these changes tested?

Two new tests, both verified to fail before the fix and pass after:

CometIcebergNativeSuite "native Iceberg scan does not duplicate a row group split by byte range" — writes a single-row-group file, registers it as an Iceberg table, reads it with split-size/file-open-cost of 64 to force sub-row-group splits, and asserts the matching row is returned exactly once. Returned 2 rows before the fix.
CometNativeReaderSuite "native scan does not duplicate a row group split by byte range" — the same scenario on the native_datafusion Parquet path, guarding against regression. Asserts the file is actually split into multiple partitions and still returns each row once.

test apache/iceberg-rust#2615

0a5c58a

mbutrovich mentioned this pull request Jun 10, 2026

fix(reader): filter_row_groups_by_byte_range duplicates rows for sub-row-group file splits apache/iceberg-rust#2615

Open

Merge branch 'main' into iceberg_midpoint_fix

ae160b2

mbutrovich added this to the 0.17.0 milestone Jun 10, 2026

clean up opendal feature selection.

32ca9a9

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: [DO NOT MERGE] Iceberg scan duplicates rows when splitting a single-row-group Parquet file into multiple byte-range tasks#4621

fix: [DO NOT MERGE] Iceberg scan duplicates rows when splitting a single-row-group Parquet file into multiple byte-range tasks#4621
mbutrovich wants to merge 3 commits into
apache:mainfrom
mbutrovich:iceberg_midpoint_fix

mbutrovich commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mbutrovich commented Jun 10, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

How are these changes tested?

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant