[AURON #2175] Add native support for the _file metadata column by weimingdiit · Pull Request #2184 · apache/auron

weimingdiit · 2026-04-08T08:46:32Z

Which issue does this PR close?

Rationale for this change

This PR adds native support for Iceberg metadata columns in Auron, starting with _file.

Previously, Auron treated every Iceberg metadata column as unsupported, so any scan that projected one fell back to Spark. With this change, queries that read _file can remain on the native Iceberg scan path.
Iceberg metadata columns are useful in real workloads for debugging, lineage, and inspection queries.

This PR improves native Iceberg scan coverage by supporting metadata columns that can be represented as file-level constant values, while still falling back for unsupported row-level metadata columns.

What changes are included in this PR?

This PR:

- adds native support for the Iceberg _file metadata column
- keeps unsupported metadata columns such as _pos on the fallback path
- extends IcebergScanPlan to distinguish between:
  - file-backed data columns
  - metadata columns materialized outside the file payload
- updates IcebergScanSupport to stop rejecting all metadata columns unconditionally
- passes supported metadata values through the native Iceberg scan path as per-file constant values
- updates NativeIcebergTableScanExec to project both normal data columns and supported metadata columns
- adds integration tests in AuronIcebergIntegrationSuite
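
The split between file-backed columns and supported metadata columns can be illustrated with a small standalone sketch. This is not the code from the PR: `ProjectionSplit`, `ScanPlan`, and `plan` are hypothetical names, and the set of known metadata columns is limited to the two named in this description.

```scala
// Hypothetical model of the projection split; names are illustrative, not Auron's API.
object ProjectionSplit {
  // Assumption: only the two metadata columns named in this PR are modeled.
  val KnownMetadataColumns: Set[String] = Set("_file", "_pos")
  // Metadata columns that can be expressed as one constant value per file.
  val SupportedMetadataColumns: Set[String] = Set("_file")

  final case class ScanPlan(fileColumns: Seq[String], metadataColumns: Seq[String])

  // Returns Some(plan) when the projection can stay native, None to signal fallback.
  def plan(projected: Seq[String]): Option[ScanPlan] = {
    val (metadata, fileBacked) = projected.partition(KnownMetadataColumns.contains)
    if (metadata.forall(SupportedMetadataColumns.contains)) Some(ScanPlan(fileBacked, metadata))
    else None
  }
}
```

In this sketch, projecting `id` and `_file` yields a plan with one file-backed column and one metadata column, while projecting `_pos` yields `None`, which stands in for the Spark fallback.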

Scope of support in this PR

This PR intentionally takes a conservative approach.

Supported in native scan:

- _file

Still falls back:

- _pos
- other unsupported metadata columns that require row-level metadata handling

Why this design?

_file is a file-level metadata column: every row coming from the same file shares the same value. That makes it a good fit for the existing native file-scan path by treating it as a per-file constant column.

In contrast, _pos is row-level metadata and cannot be represented correctly with the same mechanism, so it remains unsupported in native execution for now.
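
The difference between the two columns can be shown with a minimal model in plain Scala. It is a sketch under stated assumptions, not the native implementation: `DataFile`, `scanWithFilePath`, and `positions` are hypothetical names, and rows are modeled as simple sequences.

```scala
// Hypothetical model of per-file constants; not Auron's or Iceberg's API.
object PerFileConstant {
  final case class DataFile(path: String, rows: Seq[Seq[Any]])

  // _file: every row of a file gets the same value, so one constant per file suffices.
  def scanWithFilePath(files: Seq[DataFile]): Seq[Seq[Any]] =
    files.flatMap(f => f.rows.map(row => row :+ f.path))

  // _pos: the value changes with every row inside a file,
  // so no single per-file constant can represent it.
  def positions(file: DataFile): Seq[Long] =
    file.rows.indices.map(_.toLong)
}
```

Because `scanWithFilePath` needs only the file path and not the row index, it fits a mechanism that attaches constants per file. `positions` needs the row index, which is why the same mechanism does not cover `_pos`.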

How was this patch tested?

CI.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds native execution support for the Iceberg _file metadata column by treating it as a per-file constant value, allowing queries projecting _file to remain on the native Iceberg scan path (while still falling back for row-level metadata like _pos).

Changes:

- Extends IcebergScanPlan to split projected columns into file-backed columns vs. supported metadata columns.
- Updates native scan execution to materialize _file via constant “partition values” per file.
- Adds integration tests verifying native scan correctness for _file projections and fallback for unsupported metadata columns.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| thirdparty/auron-iceberg/src/test/scala/org/apache/auron/iceberg/AuronIcebergIntegrationSuite.scala | Adds integration tests and a helper to compare Spark vs. native results and assert native operator usage. |
| thirdparty/auron-iceberg/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeIcebergTableScanExec.scala | Plumbs supported metadata columns through native scan by emitting per-file constant values. |
| thirdparty/auron-iceberg/src/main/scala/org/apache/spark/sql/auron/iceberg/IcebergScanSupport.scala | Allows `_file` metadata column in native planning while still rejecting unsupported metadata columns like `_pos`. |
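
The test helper described in the first row (compare Spark vs. native results and assert native operator usage) might look roughly like the following. This is a dependency-free sketch, not the helper from AuronIcebergIntegrationSuite: `NativeCheck`, `Run`, and `check` are hypothetical names, and a query run is modeled as a plan description string plus its rows rather than a real Spark DataFrame.

```scala
// Hypothetical stand-in for a Spark-vs-native comparison helper; no Spark dependency.
object NativeCheck {
  // A query run is modeled as its plan text plus the rows it returned.
  final case class Run(planDescription: String, rows: Seq[Seq[Any]])

  // Order-insensitive comparison: render each row as text, then sort.
  private def normalize(rows: Seq[Seq[Any]]): Seq[String] =
    rows.map(_.mkString("|")).sorted

  // Left(message) on failure, Right(()) when the native run used the expected
  // operator and returned the same rows as the Spark run.
  def check(sparkRun: Run, nativeRun: Run, nativeOperator: String): Either[String, Unit] =
    if (!nativeRun.planDescription.contains(nativeOperator))
      Left(s"expected $nativeOperator in the native plan")
    else if (normalize(sparkRun.rows) != normalize(nativeRun.rows))
      Left("native rows differ from Spark rows")
    else Right(())
}
```

Checking the plan as well as the rows matters here: a query that silently falls back to Spark would still return correct rows, so a result comparison alone would not show whether the native path was exercised.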

...rg/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeIcebergTableScanExec.scala

...rty/auron-iceberg/src/test/scala/org/apache/auron/iceberg/AuronIcebergIntegrationSuite.scala

...rty/auron-iceberg/src/main/scala/org/apache/spark/sql/auron/iceberg/IcebergScanSupport.scala

Signed-off-by: weimingdiit <weimingdiit@gmail.com>

github-actions bot added the thirdparty-iceberg label Apr 8, 2026

weimingdiit changed the title ~~[AURON #2175][iceberg] Add native support for the _file metadata column~~ [AURON #2175] Add native support for the _file metadata column Apr 8, 2026

weimingdiit force-pushed the feat/metadata_columns_support_native_iceberg branch from 281a4e2 to 9feb4d6 Compare April 9, 2026 04:44

weimingdiit marked this pull request as ready for review April 9, 2026 05:57

slfan1989 requested a review from Copilot April 9, 2026 07:00

Copilot AI reviewed Apr 9, 2026

Copilot started reviewing on behalf of slfan1989 April 9, 2026 08:06

[AURON apache#2175] Add native support for the _file metadata column

f512739

Signed-off-by: weimingdiit <weimingdiit@gmail.com>

weimingdiit force-pushed the feat/metadata_columns_support_native_iceberg branch from 9feb4d6 to f512739 Compare April 9, 2026 12:31

weimingdiit wants to merge 1 commit into apache:master from weimingdiit:feat/metadata_columns_support_native_iceberg

2 participants