bench(parquet): add row filter strategy baseline cases #10135

Open
hhhizzz wants to merge 8 commits into apache:main from hhhizzz:codex/parquet-reader-bench-baseline
hhhizzz commented Jun 12, 2026

Contributor

Which issue does this PR close?

Rationale for this change

This PR is the first smaller PR split out from #9956 ("Optimize parquet row filter auto strategy with adaptive fallback").

The goal is to land the benchmark coverage first, before changing row-filter planning or execution behavior. This gives follow-up PRs a stable benchmark baseline already on main, making it easier to compare each later behavior change against the same benchmark cases.

Planned split from #9956:

  1. Add benchmark baseline cases. This PR.
  2. Split row-selection strategy / sparse mask correctness changes.
  3. Add post-filter execution primitives.
  4. Add Auto policy / adaptive materialization core.
  5. Add policy refinements for projected predicates, fixed-prefix guards, and cacheable predicate cases.

What changes are included in this PR?

This PR adds benchmark coverage only. The diff is limited to benchmark targets under parquet/benches, with no changes to production reader code or public APIs.

It extends arrow_reader_row_filter with:

  • strategy comparison cases for:
    • manual full-scan post-filtering;
    • current RowSelectionPolicy::Auto;
    • explicit Selectors;
    • explicit Mask;
  • focused row-filter shapes inspired by ClickBench and TPC-DS workloads;
  • projected-predicate cases;
  • count-only / filter-only / fixed-width / variable-width projection cases;
  • nested whole-root output benchmark coverage;
  • projected scan focus cases that do not construct a RowFilter.
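
To make the compared strategies concrete, here is a minimal, std-only sketch of the three ways of producing the same filtered output: decoding everything and filtering afterwards, reading through run-length selectors, and reading through a per-row mask. It is not the benchmark code and does not use the parquet crate. `Run`, `post_filter`, `mask_to_runs`, `read_with_runs`, and `read_with_mask` are hypothetical names introduced only for illustration.

```rust
// Standalone illustration: none of these types or functions come from the
// parquet crate. They only model the shapes the benchmark cases compare.

/// One run of a run-length-encoded selection: skip or select `len` rows.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Run {
    pub len: usize,
    pub select: bool,
}

/// Baseline: look at every row, then filter with the predicate afterwards.
pub fn post_filter(rows: &[i64], pred: impl Fn(i64) -> bool) -> Vec<i64> {
    rows.iter().copied().filter(|&v| pred(v)).collect()
}

/// Collapse a per-row boolean mask into alternating skip/select runs.
pub fn mask_to_runs(mask: &[bool]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for &bit in mask {
        if let Some(last) = runs.last_mut() {
            if last.select == bit {
                // Same kind as the previous row: extend the current run.
                last.len += 1;
                continue;
            }
        }
        runs.push(Run { len: 1, select: bit });
    }
    runs
}

/// Selector-style read: copy whole selected runs, jump over skipped ones.
pub fn read_with_runs(rows: &[i64], runs: &[Run]) -> Vec<i64> {
    let mut out = Vec::new();
    let mut offset = 0;
    for run in runs {
        if run.select {
            out.extend_from_slice(&rows[offset..offset + run.len]);
        }
        offset += run.len;
    }
    out
}

/// Mask-style read: test one bit per row.
pub fn read_with_mask(rows: &[i64], mask: &[bool]) -> Vec<i64> {
    rows.iter()
        .zip(mask)
        .filter_map(|(&v, &keep)| keep.then_some(v))
        .collect()
}
```

All three paths should return identical rows for the same predicate. What differs is the per-row versus per-run work, which is the kind of trade-off the strategy comparison cases are meant to expose.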

It also extends row_selection_cursor with shape-focused selector/mask cases that vary:

  • selected-run length;
  • selectivity;
  • primitive vs variable-width payloads.
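
The two shape parameters above, selected-run length and selectivity, can be varied independently. The following std-only sketch shows one way to do that with a periodic pattern. It is an assumption about how such shapes could be generated, not the generator used in row_selection_cursor; `shaped_mask`, `observed_selectivity`, and `longest_selected_run` are hypothetical helpers.

```rust
// Standalone illustration: a hypothetical shape generator, not the code in
// parquet/benches/row_selection_cursor.rs.

/// Build a per-row mask of `total_rows` rows made of selected runs of
/// `run_len` rows, spaced so that roughly `selectivity` of all rows are kept.
pub fn shaped_mask(total_rows: usize, run_len: usize, selectivity: f64) -> Vec<bool> {
    assert!(run_len > 0, "run_len must be positive");
    assert!(
        selectivity > 0.0 && selectivity <= 1.0,
        "selectivity must be in (0, 1]"
    );
    // Each period holds one selected run followed by a gap of skipped rows.
    let period = ((run_len as f64 / selectivity).round() as usize).max(run_len);
    (0..total_rows).map(|row| row % period < run_len).collect()
}

/// Fraction of rows kept by the mask.
pub fn observed_selectivity(mask: &[bool]) -> f64 {
    if mask.is_empty() {
        return 0.0;
    }
    mask.iter().filter(|&&keep| keep).count() as f64 / mask.len() as f64
}

/// Length of the longest consecutive selected run in the mask.
pub fn longest_selected_run(mask: &[bool]) -> usize {
    let mut best = 0usize;
    let mut current = 0usize;
    for &keep in mask {
        if keep {
            current += 1;
            best = best.max(current);
        } else {
            current = 0;
        }
    }
    best
}
```

Holding selectivity fixed while changing `run_len` separates the cost of many short runs from the cost of a few long ones, even though the same number of rows is read.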

This PR intentionally does not change production reader behavior.

Are these changes tested?

Yes. This PR was validated with:

cargo fmt -- parquet/benches/arrow_reader_row_filter.rs parquet/benches/row_selection_cursor.rs
cargo check -p parquet --bench row_selection_cursor --features arrow
cargo check -p parquet --bench arrow_reader_row_filter --features arrow,async
git diff --check

This PR claims no benchmark results. Its purpose is to add baseline benchmark coverage so that later PRs can report comparable performance evidence.

Are there any user-facing changes?

No. This only changes benchmark code.

@github-actions github-actions Bot added the parquet Changes to the parquet crate label Jun 12, 2026