fix: Native_datafusion reports correct files and bytes scanned by 0lai0 · Pull Request #3798 · apache/datafusion-comet

0lai0 · 2026-03-26T09:18:32Z

Which issue does this PR close?

Rationale for this change

In CometScanExec, every call to getFilePartitions() unconditionally executes sendDriverMetrics(). Because getFilePartitions() can be evaluated multiple times during planning (e.g., when converting to CometNativeScanExec) and during execution (e.g., when fetching partitions), SQLMetric accumulators such as numFiles and filesSize were incremented more than once. As a result, the Spark UI showed double-counted values.

What changes are included in this PR?

Replaced metrics(...).add() with metrics(...).set() in CometScanExec to ensure idempotency when reporting metrics.
Wrapped the driver metric updates and Spark listener event dispatching inside a lazy val. This prevents both double-counting during Catalyst transformations (makeCopy) and sending redundant UI events.
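
The two changes above can be illustrated with a minimal, self-contained sketch. It does not use the real Spark or Comet classes: Metric, ScanSketch, and the method names ending in Before/After are hypothetical stand-ins, assumed only for illustration. It shows why add() accumulates on repeated calls while set() inside a lazy val reports once.

```scala
// Hypothetical stand-in for a metric accumulator; NOT Spark's SQLMetric.
final class Metric {
  private var v: Long = 0L
  def add(n: Long): Unit = v += n // accumulates on every call
  def set(n: Long): Unit = v = n  // overwrites, so repeated calls are idempotent
  def value: Long = v
}

// Hypothetical scan node used only to contrast the two reporting styles.
final class ScanSketch(fileCount: Long) {
  val numFilesAdd = new Metric
  val numFilesSet = new Metric
  var eventsSent: Int = 0

  // "Before" style: each call reports again, so the metric is double-counted.
  def getFilePartitionsBefore(): Unit = numFilesAdd.add(fileCount)

  // "After" style: the lazy val body runs at most once per instance,
  // and set() keeps the value stable even if it were reported again.
  private lazy val driverMetricsSent: Unit = {
    numFilesSet.set(fileCount)
    eventsSent += 1 // stands in for posting a UI listener event
  }
  def getFilePartitionsAfter(): Unit = driverMetricsSent
}

object Demo {
  def main(args: Array[String]): Unit = {
    val scan = new ScanSketch(2)
    scan.getFilePartitionsBefore(); scan.getFilePartitionsBefore()
    scan.getFilePartitionsAfter(); scan.getFilePartitionsAfter()
    println(s"add-based: ${scan.numFilesAdd.value}") // 4 (double-counted)
    println(s"set-based: ${scan.numFilesSet.value}") // 2
    println(s"events sent: ${scan.eventsSent}")      // 1
  }
}
```

In this sketch the lazy val only guards a single instance. A copied node would evaluate its own lazy val again, which is where set() helps: re-reporting the same value does not inflate the metric.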

How are these changes tested?

Added a dedicated end-to-end unit test in CometExecSuite.
The test writes a dummy Parquet dataset, sequentially triggers multiple actions (count and collect) to force repeated plan evaluations, and asserts that numFiles is exactly 2, with no duplication.

andygrove

LGTM. Thanks @0lai0.

mbutrovich

So basically it's an artifact of wrapping CometNativeScan in CometScan, which we hopefully won't do in the future anyway.

Thanks for the fix in the meantime, @0lai0!

comphead

Thanks @0lai0, I'll quickly check it out today.

comphead

Somehow the UI now shows 0:

number of files read: 0
size of files read: 0.0 B

comphead · 2026-03-27T00:11:00Z

spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala

+      spark.range(100).repartition(2).write.mode("overwrite").parquet(path)
+
+      withSQLConf(
+        CometConf.COMET_ENABLED.key -> "true",


Please include --conf spark.comet.scan.impl=native_datafusion

0lai0 · 2026-03-27T07:17:07Z

Thank you all for the feedback. I’ll investigate this matter and fix it.

Native_datafusion reports correct files and bytes scanned

573872a

andygrove approved these changes Mar 26, 2026

andygrove requested a review from comphead March 26, 2026 13:07

mbutrovich reviewed Mar 26, 2026

comphead reviewed Mar 26, 2026

comphead reviewed Mar 27, 2026

0lai0 wants to merge 1 commit into apache:main from 0lai0:reports_scanned_twice
