bench(parquet): add row filter strategy baseline cases by hhhizzz · Pull Request #10135 · apache/arrow-rs

hhhizzz · 2026-06-12T09:59:40Z

Which issue does this PR close?

- Part of [Parquet] Better heuristics to pick between RowSelection and Mask filter representation #8846.
- Part of [EPIC] Faster performance for parquet predicate evaluation for non selective filters #7456.
- Split out from Optimize parquet row filter auto strategy with adaptive fallback #9956.

Rationale for this change

This PR is the first of several smaller PRs split out from #9956 ("Optimize parquet row filter auto strategy with adaptive fallback").

The goal is to land benchmark coverage on main before changing any row-filter planning or execution behavior. Follow-up PRs then start from a stable baseline, so each later behavior change can be measured against the same benchmark cases.

Planned split of #9956:

- Add benchmark baseline cases (this PR).
- Split row-selection strategy / sparse mask correctness changes.
- Add post-filter execution primitives.
- Add the Auto policy / adaptive materialization core.
- Add policy refinements for projected predicates, fixed-prefix guards, and cacheable predicate cases.

What changes are included in this PR?

This PR adds benchmark coverage only. The diff is limited to benchmark targets under parquet/benches, with no changes to production reader code or public APIs.

It extends arrow_reader_row_filter with:

- strategy comparison cases for:
  - manual full-scan post-filtering;
  - the current RowSelectionPolicy::Auto;
  - explicit Selectors;
  - explicit Mask;
- focused row-filter shapes inspired by ClickBench and TPC-DS workloads;
- projected-predicate cases;
- count-only, filter-only, fixed-width, and variable-width projection cases;
- nested whole-root output benchmark coverage;
- projected scan focus cases that do not construct a RowFilter.
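The strategies compared above differ mainly in how a predicate's result is represented when rows are materialized. As a rough illustration only, here is a self-contained Rust sketch (not the parquet crate's API; `Run`, `mask_to_runs`, `apply_runs`, and `apply_mask` are hypothetical names) contrasting selector-style skip/select runs with a per-row boolean mask, the latter being essentially what a manual full-scan post-filter applies:

```rust
/// Hypothetical stand-in for a selector run: `row_count` consecutive rows
/// that are either all skipped or all selected. Not the parquet crate's type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Run {
    pub row_count: usize,
    pub skip: bool,
}

/// Collapse a per-row boolean mask into alternating skip/select runs.
pub fn mask_to_runs(mask: &[bool]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for &keep in mask {
        let skip = !keep;
        if let Some(last) = runs.last_mut() {
            if last.skip == skip {
                // Same decision as the previous row: extend the current run.
                last.row_count += 1;
                continue;
            }
        }
        // First row, or the decision changed: start a new run.
        runs.push(Run { row_count: 1, skip });
    }
    runs
}

/// Selector-style evaluation: walk the runs and copy only selected ranges.
/// Assumes the runs cover exactly `rows.len()` rows (panics if they cover more).
pub fn apply_runs<T: Clone>(rows: &[T], runs: &[Run]) -> Vec<T> {
    let mut out = Vec::new();
    let mut offset = 0;
    for run in runs {
        if !run.skip {
            out.extend_from_slice(&rows[offset..offset + run.row_count]);
        }
        offset += run.row_count;
    }
    out
}

/// Mask-style evaluation (what a full-scan post-filter does): every row is
/// already decoded, and rows whose mask bit is set are kept.
pub fn apply_mask<T: Clone>(rows: &[T], mask: &[bool]) -> Vec<T> {
    rows.iter()
        .zip(mask.iter())
        .filter_map(|(row, &keep)| if keep { Some(row.clone()) } else { None })
        .collect()
}
```

Both paths keep the same rows; what differs is cost. The run list grows with the number of select/skip transitions, so highly fragmented selections can favor a mask, which is the kind of trade-off an automatic policy has to weigh.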

It also extends row_selection_cursor with shape-focused selector/mask cases that vary:

- selected-run length;
- selectivity;
- primitive vs. variable-width payloads.
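To make the selected-run-length and selectivity axes concrete, here is a minimal sketch of how such shapes could be generated and how they affect the number of selector runs. `shaped_mask` and `run_count` are hypothetical helpers, not the benchmark's actual code, and the payload-type axis is not modeled:

```rust
/// Hypothetical shape generator: selected runs of `run_len` rows separated by
/// skipped gaps sized so roughly `selectivity` of all rows are selected.
pub fn shaped_mask(total_rows: usize, selectivity: f64, run_len: usize) -> Vec<bool> {
    assert!(run_len > 0, "run_len must be positive");
    assert!(
        selectivity > 0.0 && selectivity <= 1.0,
        "selectivity must be in (0, 1]"
    );
    // Each period is `run_len` selected rows followed by `gap` skipped rows,
    // so selectivity ~= run_len / (run_len + gap).
    let gap = (run_len as f64 / selectivity - run_len as f64).round() as usize;
    let period = run_len + gap;
    (0..total_rows).map(|i| i % period < run_len).collect()
}

/// Number of skip/select runs a selector-based representation would need:
/// one run plus one more for every change of decision between adjacent rows.
pub fn run_count(mask: &[bool]) -> usize {
    if mask.is_empty() {
        return 0;
    }
    1 + mask.windows(2).filter(|w| w[0] != w[1]).count()
}
```

At the same 50% selectivity over 1000 rows, shortening selected runs from 10 rows to 1 row grows the selector list from 100 to 1000 runs, while a mask stays at one bit per row.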

This PR intentionally does not change production reader behavior.

Are these changes tested?

Yes. This PR was validated with:

cargo fmt -- parquet/benches/arrow_reader_row_filter.rs parquet/benches/row_selection_cursor.rs
cargo check -p parquet --bench row_selection_cursor --features arrow
cargo check -p parquet --bench arrow_reader_row_filter --features arrow,async
git diff --check

This PR does not report any benchmark results. Its purpose is to add baseline benchmark coverage so that later PRs can report comparable performance evidence.

Are there any user-facing changes?

No. This only changes benchmark code.


bench(parquet): add row filter baseline cases (4e29a81)

github-actions bot added the parquet label (Changes to the parquet crate) on Jun 12, 2026.

Qiwei Huang and others added 7 commits (June 12, 2026 18:54):

- ci: install cargo-msrv with locked dependencies (f11b48e)
- Revert "ci: install cargo-msrv with locked dependencies" (c8a627c). This reverts commit f11b48e.
- ci: install cargo-msrv with locked dependencies (faa058f)
- Revert "ci: install cargo-msrv with locked dependencies" (d95eddf). This reverts commit faa058f.
- Expose parquet benchmark helpers and streamline post-filtering (40727a5)
- Refactor parquet row selection shape-focus benchmarks (a5c636b)
- Merge remote-tracking branch 'origin/main' into parquet-reader-bench-baseline (5b4a6e8)

sdf-jkl mentioned this pull request on Jun 13, 2026 in "Optimize parquet row filter auto strategy with adaptive fallback" #9956 (closed).

hhhizzz wants to merge 8 commits into apache:main from hhhizzz:codex/parquet-reader-bench-baseline.