perf: replace CometBatchIterator FFI input path with the Arrow C Stream Interface by mbutrovich · Pull Request #4572 · apache/datafusion-comet

mbutrovich · 2026-06-02T19:47:08Z

Which issue does this PR close?

Closes #3770.

Note: this PR was peeled off from the draft experimental PR #4393 (which is not intended to merge). For now it also carries the commits from #4507 (native shuffle optimizations), so the diff is temporarily inflated and will shrink once #4507 merges. The description below covers only this PR's own scope (the Arrow C Stream Interface input path), not the #4507 native shuffle work.

Rationale for this change

The JVM-to-native input path used a bespoke CometBatchIterator plus a per-batch FFI deep copy, guarded by an arrow_ffi_safe flag, because the JVM could reuse or mutate a batch's buffers after handing it off. Every batch crossing the boundary was copied.
The Arrow C Stream Interface is the canonical, zero-copy way to hand an Arrow stream across FFI with proper ownership transfer, so both the deep copy and the flag become unnecessary.
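
To illustrate the ownership-transfer idea behind this rationale, here is a minimal, std-only sketch. It is not Comet's or arrow-rs's code: `ToyStream`, `export_stream`, and `consume_stream` are hypothetical names, and the `Drop` impl stands in for the `release` callback that the Arrow C Stream Interface attaches to an exported stream. The point is only that once the consumer owns the stream, it can read the data in place and decide when the memory is released, so no defensive copy is needed.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Toy stand-in for an exported stream (hypothetical type, not the real
/// `ArrowArrayStream` C struct): some batches plus a release flag.
pub struct ToyStream {
    batches: Vec<Vec<i64>>,
    released: Rc<Cell<bool>>,
}

impl Drop for ToyStream {
    /// Plays the role of the C `release` callback: it runs exactly once,
    /// when whoever owns the stream is finished with it.
    fn drop(&mut self) {
        self.released.set(true);
    }
}

/// Producer side: give up ownership and hand out a raw pointer.
pub fn export_stream(batches: Vec<Vec<i64>>, released: Rc<Cell<bool>>) -> *mut ToyStream {
    Box::into_raw(Box::new(ToyStream { batches, released }))
}

/// Consumer side: reclaim ownership, read the batches in place (no copy),
/// and release the stream when done.
///
/// # Safety
/// `raw` must come from `export_stream` and be passed here at most once.
pub unsafe fn consume_stream(raw: *mut ToyStream) -> i64 {
    let stream = unsafe { Box::from_raw(raw) };
    let total: i64 = stream.batches.iter().flatten().sum();
    total // `stream` is dropped here, which triggers the release
}
```

In this toy model, calling `consume_stream(export_stream(...))` inside an `unsafe` block sums the values and flips the `released` flag only after the consumer has finished, which is the guarantee the old deep copy was compensating for.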

What changes are included in this PR?

JVM exports each per-partition Iterator[ColumnarBatch] as an org.apache.arrow.c.ArrowArrayStream (Data.exportArrayStream); native takes ownership via from_raw. CometBatchIterator.java and the arrow_ffi_safe proto field/plumbing are removed.
CometExecIterator / CometExecRDD now pass an Array[Object] of already-exported ArrowArrayStream (or CometShuffleBlockIterator) slots instead of CometBatchIterator.
New ArrowReader implementations bridging Spark data to Arrow: RowArrowReader (InternalRow), SparkColumnarArrowReader (non-Arrow Spark ColumnarBatch), ColumnarBatchArrowReader (Arrow-backed ColumnarBatch, with VSR ownership transfer).
New CometNativeArrowSource trait: an operator supplies one per-partition reader and gets both the JVM columnar path (doExecuteColumnar) and the native C Stream path (doExecuteAsArrowStream). Implemented by CometLocalTableScanExec and CometSparkToColumnarExec.
Native AlignedArrowStreamReader wraps arrow-rs's stream reader to align buffers per imported batch (the JVM exports 8-byte-aligned buffers, which trip arrow-rs's alignment assertion). This is a temporary workaround: the upstream fix, arrow-rs#10030 ("Call align_buffers() in from_ffi, remove redundant call from arrow-pyarrow"), ships in arrow 59.0.0, after which this reader can be dropped. scan.rs drops the per-batch deep copy.
reconcileStreamSchema advertises the truthful first-batch Arrow schema (not the consumer's declared types) so native ScanExec's boundary cast fires; logs one deduped warning per type drift (e.g. width_bucket return-type drift).
Unrelated to the Arrow C Stream work but too entangled to peel off cleanly from #4393 ("feat: enable CometLocalTableScanExec by default"): CometLocalTableScanExec now mixes in DataTypeSupport and runs a schema-level fallback in convert, so a LocalTableScanExec whose schema carries a type with no ArrowWriter coverage (Spark 4.1 TimeType, intervals, etc.) falls back to Spark instead of failing at the boundary. NullType is allow-listed since ArrowWriter handles it.
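
The buffer-alignment workaround mentioned for AlignedArrowStreamReader can be pictured with a small std-only sketch. This is not the reader's actual code and does not use arrow-rs: `AlignedBuf`, `is_aligned`, and `align_if_needed` are hypothetical names, and the 16-byte requirement is an assumption chosen to match a 128-bit value type such as Decimal128 (the type named in one of the commit titles). The idea shown is "copy only when the incoming pointer is misaligned".

```rust
/// A 16-byte chunk whose alignment is guaranteed by the type system.
#[repr(C, align(16))]
#[derive(Clone, Copy)]
struct Chunk16([u8; 16]);

/// Owned byte buffer whose start address is always 16-byte aligned.
pub struct AlignedBuf {
    chunks: Vec<Chunk16>,
    len: usize,
}

impl AlignedBuf {
    /// Copy `bytes` into freshly allocated, 16-byte-aligned storage.
    pub fn copy_from(bytes: &[u8]) -> Self {
        let mut chunks = vec![Chunk16([0u8; 16]); (bytes.len() + 15) / 16];
        for (chunk, src) in chunks.iter_mut().zip(bytes.chunks(16)) {
            chunk.0[..src.len()].copy_from_slice(src);
        }
        AlignedBuf { chunks, len: bytes.len() }
    }

    /// View the stored bytes (without the zero padding of the last chunk).
    pub fn as_slice(&self) -> &[u8] {
        // Sound: `Chunk16` is `repr(C)` over `[u8; 16]`, so the chunks form
        // one contiguous byte range of at least `len` initialized bytes.
        unsafe { std::slice::from_raw_parts(self.chunks.as_ptr() as *const u8, self.len) }
    }
}

/// True if `ptr` is a multiple of `align` bytes.
pub fn is_aligned(ptr: *const u8, align: usize) -> bool {
    (ptr as usize) % align == 0
}

/// Return an aligned copy only when the input is misaligned; `None` means
/// the caller can keep using the original buffer without copying.
pub fn align_if_needed(bytes: &[u8]) -> Option<AlignedBuf> {
    if is_aligned(bytes.as_ptr(), 16) {
        None
    } else {
        Some(AlignedBuf::copy_from(bytes))
    }
}
```

A buffer that starts 8 bytes into an aligned allocation is 8-byte aligned but not 16-byte aligned, so `align_if_needed` copies it; a buffer that is already aligned passes through untouched.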

How are these changes tested?

Existing suites exercise the input path end to end (CometExecSuite, CometShuffleSuite, ParquetReadSuite, the fuzz suites).
New CometArrowStreamSuite covering stream export and schema reconciliation, added to the Linux and macOS PR build workflows.
New CometExecSuite cases for CometLocalTableScanExec: a TimeType schema-level fallback check (Spark 4.1+), plus two Arrow-buffer leak checks (project consumer and collect_list) that fail via the allocator leak detector if the per-batch buffers leak.


mbutrovich · 2026-06-05T00:05:59Z

Ran TPC-H SF1000 with Comet on Spark 3.5.8:

| Query | main (s) | #4507 (s) | #4572 (s) |
|---|---|---|---|
| TPCH-01 | 10.495 | 10.151 | 10.722 |
| TPCH-02 | 23.455 | 22.177 | 21.629 |
| TPCH-03 | 12.366 | 14.064 | 10.347 |
| TPCH-04 | 14.067 | 13.360 | 11.927 |
| TPCH-05 | 30.023 | 29.119 | 27.642 |
| TPCH-06 | 1.128 | 0.710 | 0.942 |
| TPCH-07 | 16.368 | 15.073 | 13.730 |
| TPCH-08 | 36.566 | 35.680 | 33.435 |
| TPCH-09 | 44.722 | 41.845 | 40.223 |
| TPCH-10 | 29.848 | 29.106 | 26.074 |
| TPCH-11 | 15.373 | 14.837 | 15.752 |
| TPCH-12 | 7.919 | 7.486 | 6.638 |
| TPCH-13 | 10.194 | 10.103 | 9.761 |
| TPCH-14 | 2.631 | 2.672 | 2.389 |
| TPCH-15 | 11.613 | 12.524 | 12.202 |
| TPCH-16 | 11.207 | 9.778 | 8.745 |
| TPCH-17 | 31.002 | 28.755 | 29.156 |
| TPCH-18 | 62.856 | 56.601 | 56.502 |
| TPCH-19 | 9.253 | 9.319 | 8.556 |
| TPCH-20 | 9.015 | 8.610 | 7.722 |
| TPCH-21 | 74.530 | 69.337 | 71.159 |
| TPCH-22 | 10.640 | 9.574 | 9.813 |
| Total | 475.272 | 450.879 | 435.066 |

Looks like it further improves on #4507: total runtime drops from 450.879 s to 435.066 s (475.272 s on main).
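
For a sense of scale, the totals above work out to roughly an 8.5% reduction versus main and roughly 3.5% versus #4507. The sketch below is only the arithmetic on the reported totals; the constant and function names are made up for illustration, and it says nothing about run-to-run variance.

```rust
/// Total TPC-H SF1000 runtimes in seconds, copied from the results above.
pub const TOTAL_MAIN_S: f64 = 475.272;
pub const TOTAL_PR_4507_S: f64 = 450.879;
pub const TOTAL_PR_4572_S: f64 = 435.066;

/// Percentage by which `candidate_s` is lower than `baseline_s`.
/// A negative result would mean the candidate is slower.
pub fn percent_reduction(baseline_s: f64, candidate_s: f64) -> f64 {
    (baseline_s - candidate_s) / baseline_s * 100.0
}
```

For example, `percent_reduction(TOTAL_MAIN_S, TOTAL_PR_4572_S)` gives the improvement of this PR over main, and `percent_reduction(TOTAL_PR_4507_S, TOTAL_PR_4572_S)` gives the additional gain on top of #4507.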

mbutrovich and others added 30 commits May 21, 2026 16:44

enable CometLocalTableScanExec by default

3763370

Merge branch 'main' into enable_localtablescan

6830b09

add NullType to toArrowType

810e5d5

add NullType to shuffles

174c939

fix windowexec test and nulltype. fix timetype issues

3790c10

Fix TimeType test.

18cd14b

fix null value type in map in native shuffle

fc40d59

Merge branch 'main' into enable_localtablescan

92cc260

Merge branch 'main' into enable_localtablescan

cf0c1df

avoid reuse in LocalTableScanExec

8c088a7

Merge branch 'main' into enable_localtablescan

0cdfafe

Replace Comet's bespoke CometBatchIterator JNI input path with the canonical Arrow C Stream Interface (JVM Data.exportArrayStream <-> native ArrowArrayStreamReader), eliminating the per-batch FFI deep copy and the arrow_ffi_safe flag.

bd04fb4

Merge branch 'main' into enable_localtablescan

2742b49

Merge branch 'main' into enable_localtablescan

04597c0

Unpack dictionaries.

b6db996

Merge branch 'main' into enable_localtablescan

5ca923f

Fix shading issue.

cf7bb6e

Merge remote-tracking branch 'origin/enable_localtablescan' into enable_localtablescan

2560e6f

Try again to fix shading issue.

82c9a1b

Fix alignment issue for FFI Decimal128 with ArrowArrayStreamReader

6adf124

Merge branch 'refs/heads/main' into enable_localtablescan

3da08dc

Fix schema mismatch in CometArrowStream.

0e08018

Fix nullability mismatch in CometArrowStreamSuite.

a5046e3

Fix format.

8f1c35a

Passes CometFuzzTestSuite, CometNativeShuffleSuite, CometExecSuite.

5c41215

Passes CometFuzzTestSuite, CometNativeShuffleSuite, CometExecSuite.

8c99fc5

Cleanup, update docs.

443a1c7

remove non-ascii

07e7944

handle arrow type mismatches on child stream in native shuffle.

cc7c5be

stash

c76a263

mbutrovich and others added 17 commits May 29, 2026 18:54

Undo stricter tests since they're not happy on Spark 3.x.

0b71de4

Merge branch 'main' into opt_native_shuffle

a6744eb

Remove unintended change.

1ecfd8a

Merge remote-tracking branch 'apache/main' into opt_native_shuffle

51d4d42

# Conflicts: # spark/src/main/scala/org/apache/spark/sql/comet/operators.scala

Merge branch 'main' into opt_native_shuffle

4dba7ea

Merge branch 'main' into enable_localtablescan

14633f7

# Conflicts: # spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala # spark/src/test/scala/org/apache/comet/exec/CometNativeShuffleSuite.scala

add CometArrowStreamSuite to CI workflows

6afbdce

fix withInfo use

deb697a

Merge branch 'main' into enable_localtablescan

51a3a0d

Merge branch 'main' into opt_native_shuffle

b43b5da

Merge branch 'main' into enable_localtablescan

f7fd3bd

Merge branch 'main' into opt_native_shuffle

bfbae18

1f5b757

Don't enable LocalTableScan by default, cruft from apache#4393.

6f913f9

cleanup

d162399

cleanup

9f32157

cleanup

bf63e3d

mbutrovich added this to the 0.17.0 milestone Jun 2, 2026

mbutrovich self-assigned this Jun 2, 2026

mbutrovich and others added 6 commits June 2, 2026 16:16

Address PR feedback from apache#4507.

04c0825

Remove inadvertent test change brought over from apache#4393.

6aa6621

Merge branch 'main' into arrow-stream-reader

c5d7f3b

Merge branch 'main' into arrow-stream-reader

95ae067

Merge branch 'main' into arrow-stream-reader

249a5e9

Fix batch size calculation in native shuffle writer, and task metrics on the Scala side.

b4cb826

mbutrovich and others added 4 commits June 5, 2026 09:32

Move SchemaAlignExec under shuffle.

e1a9491

Merge branch 'main' into arrow-stream-reader

9a64c3f

Merge branch 'main' into arrow-stream-reader

793bbd5

mbutrovich wants to merge 65 commits into apache:main from mbutrovich:arrow-stream-reader
