
[AURON #2163] Support native Iceberg scans with residual filters via scan pruning and post-scan native filter #2164

Merged
slfan1989 merged 1 commit into apache:master from weimingdiit:feat/iceberg-native-support
Apr 10, 2026



@weimingdiit weimingdiit commented Apr 4, 2026

Which issue does this PR close?

Closes #2163

Rationale for this change

The previous behavior was too conservative for Iceberg scans with residual filters. Even when the scan could still be executed natively and the remaining filter logic could be handled above the scan, the planner would fall back entirely.

This PR improves native coverage for Iceberg reads by:

  • preserving correctness for unsupported predicates
  • increasing native scan applicability for common filter patterns
  • reusing the existing native filter path instead of requiring full scan-level predicate support up front
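
The reason a coarse scan-level predicate can coexist with a filter above the scan can be sketched as follows. This is a minimal illustration, assuming scan pruning behaves like min/max statistics pruning in typical Parquet/ORC readers; `RowGroup`, `prune`, and `scanAndFilter` are hypothetical names and are not taken from the Auron code base.

```scala
// Hypothetical model for illustration only; not Auron's reader or file format classes.
// A row group carries min/max statistics for a single Int column plus its rows.
final case class RowGroup(min: Int, max: Int, rows: Seq[Int])

object PruneThenFilter {
  // Scan pruning: skip every row group whose [min, max] range cannot contain `target`.
  // Surviving groups may still hold rows that do not match.
  def prune(groups: Seq[RowGroup], target: Int): Seq[RowGroup] =
    groups.filter(g => g.min <= target && target <= g.max)

  // Post-scan filter: evaluate the full predicate row by row on what the scan returned.
  def scanAndFilter(groups: Seq[RowGroup], target: Int, fullPredicate: Int => Boolean): Seq[Int] =
    prune(groups, target).flatMap(_.rows).filter(fullPredicate)
}
```

In this sketch pruning only reduces how much data is read. The result is decided by the filter above the scan, so a predicate that cannot be expressed for pruning does not have to force a fallback.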

This is an incremental improvement to Iceberg native execution, not full Iceberg feature coverage.

What changes are included in this PR?

This PR:

  • removes the unconditional fallback for Iceberg scans with non-alwaysTrue residual filters
  • extends IcebergScanPlan to carry pruningPredicates
  • extracts Iceberg scan filter expressions and converts a supported subset into Spark expressions
  • converts those Spark expressions into native scan pruning predicates
  • passes pruning predicates down through NativeIcebergTableScanExec
  • keeps unsupported predicates on the upper NativeFilter path
  • adds integration coverage for:
    • equality-based pruning
    • IN-based pruning
    • partial pushdown where only part of the predicate is pushed to scan pruning
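
Partial pushdown can be pictured as splitting a conjunction into the parts the scan can use and the parts it cannot. The sketch below uses a hypothetical, simplified predicate model (`Pred`, `Eq`, `In`, `Like`, `And`) and assumes, for illustration only, that equality and IN are prunable while a LIKE predicate is not. It does not reproduce the conversion code in `IcebergScanSupport`.

```scala
// Hypothetical, simplified predicate model; not Auron's or Iceberg's actual classes.
sealed trait Pred
final case class Eq(column: String, value: Any) extends Pred
final case class In(column: String, values: Seq[Any]) extends Pred
final case class Like(column: String, pattern: String) extends Pred
final case class And(left: Pred, right: Pred) extends Pred

object PartialPushdown {
  // Flatten nested ANDs into a list of top-level conjuncts, preserving order.
  def conjuncts(p: Pred): Seq[Pred] = p match {
    case And(l, r) => conjuncts(l) ++ conjuncts(r)
    case other     => Seq(other)
  }

  // Assumption for this sketch: only Eq and In can be used for scan pruning.
  def isPrunable(p: Pred): Boolean = p match {
    case _: Eq | _: In => true
    case _             => false
  }

  // Returns (conjuncts usable as pruning predicates, conjuncts left to the post-scan filter).
  def split(p: Pred): (Seq[Pred], Seq[Pred]) = conjuncts(p).partition(isPrunable)
}
```

For a predicate such as `id = 1 AND name LIKE 'a%' AND region IN (1, 2)`, this split would hand the `id` and `region` conjuncts to the scan and leave the `name` conjunct for the filter above it.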

Supported predicate scope in this PR

The scan-pruning conversion added here supports a limited subset of Iceberg expressions, including:

  • AND
  • OR
  • NOT
  • IS NULL
  • IS NOT NULL
  • IS NAN
  • NOT NAN
  • comparison predicates such as =, !=, <, <=, >, >=
  • IN
  • NOT IN

The current implementation intentionally avoids pushing some types through scan pruning, including:

  • StringType
  • BinaryType
  • DecimalType

Unsupported predicates are not pushed into scan pruning and are instead left for post-scan native filtering.
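
One way to read the scope above is as an all-or-nothing check on each expression tree: a leaf qualifies only if its column type is not excluded, and a compound expression qualifies only if all of its children do. The sketch below assumes that reading. The type objects and expression classes are local stand-ins defined here, not Spark's `DataType` hierarchy or Iceberg's expression classes, and the PR description does not say why the three types are excluded.

```scala
// Hypothetical stand-ins for column types; not Spark's DataType classes.
sealed trait ColType
case object IntType extends ColType
case object DoubleType extends ColType
case object StringType extends ColType
case object BinaryType extends ColType
case object DecimalType extends ColType

// Hypothetical, simplified expression model covering part of the listed subset.
sealed trait Expr
final case class Cmp(op: String, column: String, value: Any) extends Expr
final case class IsNull(column: String) extends Expr
final case class Not(child: Expr) extends Expr
final case class AndExpr(left: Expr, right: Expr) extends Expr
final case class OrExpr(left: Expr, right: Expr) extends Expr

object PruningSupport {
  // Types the PR description lists as intentionally not pushed through scan pruning.
  private val skippedTypes: Set[ColType] = Set(StringType, BinaryType, DecimalType)

  // A column qualifies only if it is in the schema and its type is not skipped.
  private def columnOk(column: String, schema: Map[String, ColType]): Boolean =
    schema.get(column).exists(t => !skippedTypes(t))

  // True only when every leaf under `e` refers to a qualifying column.
  def supported(e: Expr, schema: Map[String, ColType]): Boolean = e match {
    case Cmp(_, column, _) => columnOk(column, schema)
    case IsNull(column)    => columnOk(column, schema)
    case Not(child)        => supported(child, schema)
    case AndExpr(l, r)     => supported(l, schema) && supported(r, schema)
    case OrExpr(l, r)      => supported(l, schema) && supported(r, schema)
  }
}
```

Requiring both sides of an OR, and the whole child of a NOT, keeps the check conservative: dropping one branch of an OR and pruning on the other could skip data that the dropped branch would have matched.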

How was this patch tested?

Integration coverage was added in AuronIcebergIntegrationSuite for equality-based pruning, IN-based pruning, and partial pushdown.

@weimingdiit weimingdiit force-pushed the feat/iceberg-native-support branch from 22c1061 to 8116033 Compare April 4, 2026 10:13
@github-actions github-actions bot removed the spark-ui label Apr 4, 2026
@weimingdiit weimingdiit force-pushed the feat/iceberg-native-support branch 2 times, most recently from e19e1a9 to 4fe640b Compare April 4, 2026 10:33
@weimingdiit weimingdiit marked this pull request as ready for review April 4, 2026 15:37
@slfan1989 slfan1989 requested a review from Copilot April 5, 2026 03:20

Copilot AI left a comment


Pull request overview

This PR improves native execution coverage for Iceberg table reads in two ways. Native Iceberg scans can now proceed even when Iceberg reports residual filters. In addition, a supported subset of Iceberg filter expressions is pushed into native scan-pruning predicates, while the unsupported parts are still evaluated through the existing native post-scan filter path.

Changes:

  • Extend IcebergScanPlan to carry pruningPredicates derived from Iceberg scan filter expressions.
  • Pass pruning predicates into NativeIcebergTableScanExec so native Parquet/ORC scans can apply scan pruning.
  • Add/adjust Iceberg integration tests to validate pruning predicate population and partial pushdown behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| thirdparty/auron-iceberg/src/test/scala/org/apache/auron/iceberg/AuronIcebergIntegrationSuite.scala | Adds integration coverage for pruning predicates and partial pushdown scenarios. |
| thirdparty/auron-iceberg/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeIcebergTableScanExec.scala | Threads pruningPredicates into the native scan protobuf nodes. |
| thirdparty/auron-iceberg/src/main/scala/org/apache/spark/sql/auron/iceberg/IcebergScanSupport.scala | Removes residual-filter fallback and adds filter-expression extraction + conversion into native scan pruning predicates. |
| spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronConverters.scala | Ensures rename-column handling treats NativeIcebergTableScan as a scan requiring renames. |


…s via scan pruning and post-scan native filter
@weimingdiit weimingdiit force-pushed the feat/iceberg-native-support branch from 9ec5a13 to eae6bfe Compare April 5, 2026 06:44
@slfan1989 slfan1989 merged commit 0cbfeed into apache:master Apr 10, 2026
123 checks passed
@slfan1989

@weimingdiit Thanks for the contribution! @yew1eb Thanks for the review!


Development

Successfully merging this pull request may close these issues.

Improve native Iceberg scan coverage by pushing supported residual filters into scan pruning

4 participants