[AURON #2175] Add native support for the _file metadata column #2184

Open
weimingdiit wants to merge 1 commit into apache:master from weimingdiit:feat/metadata_columns_support_native_iceberg

@weimingdiit
Contributor

Which issue does this PR close?

Closes #2175

Rationale for this change

This PR adds native support for Iceberg metadata columns in Auron, starting with _file.

Previously, Auron treated all Iceberg metadata columns as unsupported, so any scan that projected one fell back to Spark. With this change, queries that read _file can remain on the native Iceberg scan path.
Iceberg metadata columns are useful in real workloads for debugging, lineage, and inspection queries.

This PR improves native Iceberg scan coverage by supporting metadata columns that can be represented as file-level constant values, while still falling back for unsupported row-level metadata columns.

What changes are included in this PR?

This PR:

  • adds native support for the Iceberg _file metadata column
  • keeps unsupported metadata columns such as _pos on the fallback path
  • extends IcebergScanPlan to distinguish between:
    • file-backed data columns
    • metadata columns materialized outside the file payload
  • updates IcebergScanSupport to stop rejecting all metadata columns unconditionally
  • passes supported metadata values through the native Iceberg scan path as per-file constant values
  • updates NativeIcebergTableScanExec to project both normal data columns and supported metadata columns
  • adds integration tests in AuronIcebergIntegrationSuite
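
To illustrate the column split described above, here is a minimal, self-contained sketch. It is not the code in this PR: `ProjectionSplitSketch`, `ScanProjection`, `splitProjection`, and the two name sets are hypothetical, and the real `IcebergScanPlan` and `IcebergScanSupport` work on Spark and Iceberg schema types rather than plain strings.

```scala
// Hypothetical sketch; none of these names are taken from the Auron code base.
object ProjectionSplitSketch {
  // Metadata columns this sketch treats as natively supported (file-level constants).
  val supportedMetadataColumns: Set[String] = Set("_file")
  // Metadata columns that need row-level handling and therefore force a fallback.
  val unsupportedMetadataColumns: Set[String] = Set("_pos")

  // Projected columns split into those read from the file and those materialized outside it.
  final case class ScanProjection(fileColumns: Seq[String], metadataColumns: Seq[String])

  // Returns None when the projection contains an unsupported metadata column
  // (the caller would fall back to Spark), otherwise the two column groups.
  def splitProjection(projected: Seq[String]): Option[ScanProjection] =
    if (projected.exists(unsupportedMetadataColumns.contains)) None
    else {
      val (metadata, fileBacked) = projected.partition(supportedMetadataColumns.contains)
      Some(ScanProjection(fileBacked, metadata))
    }
}
```

In this sketch, projecting `id` and `_file` yields one file-backed column and one metadata column, while any projection that includes `_pos` returns `None`, which stands in for the fallback decision.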

Scope of support in this PR

This PR intentionally takes a conservative approach.

Supported in native scan:

  • _file

Still falls back:

  • _pos
  • other unsupported metadata columns that require row-level metadata handling

Why this design?

_file is a file-level metadata column: every row coming from the same file shares the same value. That makes it a good fit for the existing native file-scan path by treating it as a per-file constant column.

In contrast, _pos is row-level metadata and cannot be represented correctly with the same mechanism, so it remains unsupported in native execution for now.
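
The difference between the two columns can be sketched with a toy model. Everything below is an assumption made for illustration (`FileConstantSketch`, `DataFile`, and both functions are hypothetical, and rows are plain `Seq[Any]` rather than columnar batches); it also assumes `_pos` is the zero-based position of a row within its file.

```scala
// Hypothetical toy model; not the native scan implementation.
object FileConstantSketch {
  // A data file reduced to its path and the rows it contains.
  final case class DataFile(path: String, rows: Seq[Seq[Any]])

  // _file: one value per file, so the same path is appended to every row of that file.
  def scanWithFileColumn(file: DataFile): Seq[Seq[Any]] =
    file.rows.map(row => row :+ file.path)

  // _pos (assumed zero-based): the value differs for each row,
  // so no single per-file constant can represent it.
  def rowPositions(file: DataFile): Seq[Long] =
    file.rows.indices.map(_.toLong)
}
```

Because `scanWithFileColumn` needs nothing beyond the file path, the value can be fixed before the file is read, whereas `rowPositions` depends on the reader tracking each row as it is produced.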

How was this patch tested?

CI, including the new integration tests in AuronIcebergIntegrationSuite.

@weimingdiit changed the title from "[AURON #2175][iceberg] Add native support for the _file metadata column" to "[AURON #2175] Add native support for the _file metadata column" on April 8, 2026
@weimingdiit force-pushed the feat/metadata_columns_support_native_iceberg branch from 281a4e2 to 9feb4d6 on April 9, 2026 04:44
@weimingdiit marked this pull request as ready for review on April 9, 2026 05:57
@slfan1989 requested a review from Copilot on April 9, 2026 07:00
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds native execution support for the Iceberg _file metadata column by treating it as a per-file constant value, allowing queries projecting _file to remain on the native Iceberg scan path (while still falling back for row-level metadata like _pos).

Changes:

  • Extends IcebergScanPlan to split projected columns into file-backed columns vs. supported metadata columns.
  • Updates native scan execution to materialize _file via constant “partition values” per file.
  • Adds integration tests verifying native scan correctness for _file projections and fallback for unsupported metadata columns.
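
As a rough illustration of the "partition values" approach summarized above, the sketch below attaches `_file` to a per-file task as a constant and then assembles output rows in projection order. All names (`PartitionValueSketch`, `FileTask`, `withFileConstant`, `assembleRow`) are hypothetical and the row representation is simplified; this is not how `NativeIcebergTableScanExec` is actually written.

```scala
// Hypothetical sketch of per-file constants; not taken from the PR.
object PartitionValueSketch {
  // One scan task per file, carrying constant values shared by all of its rows.
  final case class FileTask(path: String, partitionValues: Map[String, Any])

  // Adds "_file" -> path to the task's constants when the query projects _file.
  def withFileConstant(task: FileTask, projectsFile: Boolean): FileTask =
    if (projectsFile) task.copy(partitionValues = task.partitionValues + ("_file" -> task.path))
    else task

  // Builds one output row in projection order: a column comes from the task's
  // constants if present there, otherwise from the row read out of the file.
  def assembleRow(projection: Seq[String], fileRow: Map[String, Any], task: FileTask): Seq[Any] =
    projection.map(name => task.partitionValues.getOrElse(name, fileRow(name)))
}
```

Reusing a constant-value channel in this way means the file reader itself never needs to know about `_file`; it only reads the file-backed columns.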

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| thirdparty/auron-iceberg/src/test/scala/org/apache/auron/iceberg/AuronIcebergIntegrationSuite.scala | Adds integration tests and a helper to compare Spark vs. native results and assert native operator usage. |
| thirdparty/auron-iceberg/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeIcebergTableScanExec.scala | Plumbs supported metadata columns through native scan by emitting per-file constant values. |
| thirdparty/auron-iceberg/src/main/scala/org/apache/spark/sql/auron/iceberg/IcebergScanSupport.scala | Allows _file metadata column in native planning while still rejecting unsupported metadata columns like _pos. |

Signed-off-by: weimingdiit <weimingdiit@gmail.com>
@weimingdiit force-pushed the feat/metadata_columns_support_native_iceberg branch from 9feb4d6 to f512739 on April 9, 2026 12:31

Development

Successfully merging this pull request may close these issues.

Implement native support for Iceberg _file metadata column

2 participants