
feat: support file-level parquet row selections #22940

Open
haohuaijin wants to merge 4 commits into apache:main from haohuaijin:row-selection-access-plan


@haohuaijin

Contributor

Which issue does this PR close?

Closes the linked issue "Support file-level Parquet RowSelection".

Rationale for this change

What changes are included in this PR?

  • Add a public ParquetRowSelection type.
  • Add ParquetAccessPlan::try_new_from_overall_row_selection, which converts a file-level selection into per-row-group access.
  • Allow the Parquet opener to build its initial plan from either a ParquetAccessPlan or a ParquetRowSelection attached to the file.
  • Reject files that have both extension types attached.
  • Validate that the number of rows covered by the selection matches the file's row count.
  • Document the new extension path in ParquetSource.
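The conversion behind try_new_from_overall_row_selection can be sketched roughly as follows. This is a minimal standalone sketch, not the PR's code: RowSelector and RowGroupAccess are simplified local stand-ins loosely modeled on the parquet crate's RowSelector and DataFusion's per-row-group access enum, and split_overall_selection is a hypothetical helper. It first checks that the selection covers exactly as many rows as the file, then splits selector runs at row-group boundaries.

```rust
/// Simplified stand-in (assumption) for parquet's `RowSelector`:
/// a run of `row_count` rows that are either skipped or selected.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RowSelector {
    pub row_count: usize,
    pub skip: bool,
}

/// Simplified stand-in (assumption) for a per-row-group access decision.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RowGroupAccess {
    Skip,
    Scan,
    Selection(Vec<RowSelector>),
}

/// Hypothetical helper: split a file-level selection into one access
/// decision per row group, given each row group's row count.
pub fn split_overall_selection(
    selection: &[RowSelector],
    row_group_rows: &[usize],
) -> Result<Vec<RowGroupAccess>, String> {
    // Validate that the selection covers exactly the file's rows.
    let selected_total: usize = selection.iter().map(|s| s.row_count).sum();
    let file_total: usize = row_group_rows.iter().sum();
    if selected_total != file_total {
        return Err(format!(
            "selection covers {selected_total} rows but file has {file_total} rows"
        ));
    }

    let mut out = Vec::with_capacity(row_group_rows.len());
    let mut iter = selection.iter().copied().filter(|s| s.row_count > 0);
    // Remainder of a selector that straddled the previous row-group boundary.
    let mut pending: Option<RowSelector> = None;

    for &rg_rows in row_group_rows {
        let mut remaining = rg_rows;
        let mut selectors: Vec<RowSelector> = Vec::new();
        while remaining > 0 {
            let cur = match pending.take() {
                Some(s) => s,
                None => iter.next().expect("row counts were validated above"),
            };
            let take = cur.row_count.min(remaining);
            push_merged(&mut selectors, RowSelector { row_count: take, skip: cur.skip });
            if take < cur.row_count {
                pending = Some(RowSelector { row_count: cur.row_count - take, skip: cur.skip });
            }
            remaining -= take;
        }
        // Collapse uniform row groups so they can be skipped or fully scanned.
        let access = if selectors.iter().all(|s| s.skip) {
            RowGroupAccess::Skip
        } else if selectors.iter().all(|s| !s.skip) {
            RowGroupAccess::Scan
        } else {
            RowGroupAccess::Selection(selectors)
        };
        out.push(access);
    }
    Ok(out)
}

/// Append a selector, merging it into the previous run if both skip or both select.
fn push_merged(v: &mut Vec<RowSelector>, s: RowSelector) {
    if let Some(last) = v.last_mut() {
        if last.skip == s.skip {
            last.row_count += s.row_count;
            return;
        }
    }
    v.push(s);
}
```

In this sketch, a row group whose rows are all skipped becomes Skip, so a reader could prune it without decoding, and one whose rows are all selected becomes Scan; only mixed row groups keep an explicit selection.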

Are these changes tested?

Yes. This PR adds tests for:

  • converting a file-level selection into row-group access
  • rejecting invalid selection row counts
  • creating an initial plan from ParquetRowSelection
  • rejecting both ParquetAccessPlan and ParquetRowSelection on the same file

Are there any user-facing changes?

Yes. This adds a new public ParquetRowSelection type for callers that want to attach a file-level Parquet RowSelection to a PartitionedFile.
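The opener-side choice between the two extension types, including the rejection of files that carry both, might look something like the sketch below. How the PR actually stores these types on a PartitionedFile is not shown in this description; purely for illustration, this assumes a list of type-erased `Arc<dyn Any + Send + Sync>` values inspected with `downcast_ref`. AccessPlanStub, RowSelectionStub, InitialPlan, and choose_initial_plan are all hypothetical names.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical stand-ins for ParquetAccessPlan and ParquetRowSelection.
#[derive(Debug)]
pub struct AccessPlanStub;
#[derive(Debug)]
pub struct RowSelectionStub;

/// Hypothetical description of where the opener's initial plan comes from.
#[derive(Debug, PartialEq, Eq)]
pub enum InitialPlan {
    FromAccessPlan,
    FromRowSelection,
    ScanAll,
}

/// Assumption: extensions are modeled as a list of type-erased values.
pub fn choose_initial_plan(
    extensions: &[Arc<dyn Any + Send + Sync>],
) -> Result<InitialPlan, String> {
    let has_plan = extensions
        .iter()
        .any(|e| e.downcast_ref::<AccessPlanStub>().is_some());
    let has_selection = extensions
        .iter()
        .any(|e| e.downcast_ref::<RowSelectionStub>().is_some());
    match (has_plan, has_selection) {
        // Both present: ambiguous, so reject rather than pick one.
        (true, true) => Err(
            "cannot use both ParquetAccessPlan and ParquetRowSelection on the same file"
                .to_string(),
        ),
        (true, false) => Ok(InitialPlan::FromAccessPlan),
        (false, true) => Ok(InitialPlan::FromRowSelection),
        // Neither present: fall back to scanning every row group.
        (false, false) => Ok(InitialPlan::ScanAll),
    }
}
```

Rejecting the combination outright, rather than letting one type silently win, keeps the caller's intent unambiguous.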

github-actions bot added the datasource label (Changes to the datasource crate) on Jun 13, 2026

Development

Successfully merging this pull request may close these issues.

Support file-level Parquet RowSelection
