Implement native support for Iceberg _file metadata column #2175

@weimingdiit

Describe
Iceberg scans in Auron currently fall back to the non-native path whenever the projected schema contains any Iceberg metadata column, even in cases where file-level metadata such as _file could be provided natively.

This prevents queries like select _file from iceberg_table or select data_col, _file from iceberg_table from using the native Iceberg scan path.

Describe the solution you'd like
Add native support for Iceberg metadata columns that can be represented as file-level constant values, starting with _file.

A possible approach is:

  • stop treating all Iceberg metadata columns as unsupported by default
  • distinguish between:
    • file-backed data columns
    • metadata columns that can be materialized outside the file payload
  • extend IcebergScanPlan to carry both the file schema and metadata/extra column schema
  • pass supported metadata values through the native file-scan path, for example via per-file partition/constant values
  • project both normal data columns and supported metadata columns from the native Iceberg scan
  • continue to fall back for unsupported row-level metadata columns such as _pos
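
The split described in the list above can be sketched roughly as follows. This is a language-neutral illustration in Python, not Auron code: ScanPlanSketch, plan_native_scan, and the two column sets are hypothetical names, and the only metadata columns it knows about are the two mentioned in this issue (_file and _pos).

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical sets for illustration; a real implementation would take these
# from Iceberg's own metadata column definitions.
SUPPORTED_FILE_LEVEL: Set[str] = {"_file"}   # constant for every row of a file
KNOWN_METADATA: Set[str] = {"_file", "_pos"}  # only the columns named in this issue


@dataclass(frozen=True)
class ScanPlanSketch:
    """Stand-in for a scan plan that carries two schemas side by side."""
    file_columns: List[str]       # read from the data file payload
    metadata_columns: List[str]   # materialized outside the file payload


def plan_native_scan(projected: List[str]) -> Optional[ScanPlanSketch]:
    """Return a plan for the native path, or None to signal fallback."""
    file_columns: List[str] = []
    metadata_columns: List[str] = []
    for name in projected:
        if name not in KNOWN_METADATA:
            file_columns.append(name)          # ordinary file-backed column
        elif name in SUPPORTED_FILE_LEVEL:
            metadata_columns.append(name)      # e.g. _file
        else:
            return None                        # e.g. _pos: keep the fallback
    return ScanPlanSketch(file_columns, metadata_columns)
```

With this shape, a projection of id and _file yields a plan with one file column and one metadata column, while any projection that includes _pos returns None and stays on the fallback path.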

Additional context
This should be implemented conservatively.

A good initial scope is:

  • support _file
  • keep _pos and other row-level metadata columns on the fallback path

Suggested regression coverage:

  • select _file from iceberg_table uses NativeIcebergTableScan
  • select id, _file from iceberg_table uses NativeIcebergTableScan
  • select _pos from iceberg_table still falls back
  • native results match vanilla Spark results
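
The last check, parity between native and reference results, can be modelled in miniature. The sketch below is a toy in Python and does not use Spark or Auron: scan_with_file_constant is a hypothetical helper, and the file paths and rows are made-up sample data. It only illustrates the idea that _file is one constant per data file, appended to every row read from that file and then projected in the requested column order.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def scan_with_file_constant(
    files: Dict[str, List[Row]], output_columns: List[str]
) -> List[Tuple[object, ...]]:
    """Toy scan: attach the file path as a per-file constant named _file."""
    result: List[Tuple[object, ...]] = []
    for path, rows in files.items():
        constants = {"_file": path}            # same value for every row of this file
        for row in rows:
            merged = {**row, **constants}      # data columns plus metadata constant
            result.append(tuple(merged[c] for c in output_columns))
    return result


# Made-up sample data: two data files with one data column each.
sample_files = {
    "data/a.parquet": [{"id": 1}, {"id": 2}],
    "data/b.parquet": [{"id": 3}],
}

# Hand-written reference result, standing in for what vanilla Spark would return.
expected = [
    (1, "data/a.parquet"),
    (2, "data/a.parquet"),
    (3, "data/b.parquet"),
]

actual = scan_with_file_constant(sample_files, ["id", "_file"])
assert sorted(actual) == sorted(expected)
```

A real regression test would run the same query with the native scan enabled and disabled and compare the two result sets, in addition to asserting which scan node appears in the plan.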
