refactor(bench): drop value::jsonb projection across ORE/exact/match #17

Merged
coderdan merged 2 commits into main from dan/drop-jsonb-projection-cast on May 24, 2026

@coderdan
Contributor

What

Adds a custom EqlV2Encrypted sqlx Type + Decode in src/lib.rs
that knows how to read the eql_v2_encrypted Postgres composite
directly. With the harness now decoding the column natively, every bench
scenario that previously projected value::jsonb can SELECT value raw.
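
To make "read the composite directly" concrete, here is a minimal std-only sketch of decoding the Postgres binary wire format for a composite value. It is not the sqlx `Type` + `Decode` code from this PR. It assumes `eql_v2_encrypted` is a composite with a single `jsonb` field, and the names `decode_single_jsonb_composite` and `DecodeError` are hypothetical. The layout it relies on is the standard one: a 4-byte field count, then per field a 4-byte type OID, a 4-byte length (-1 for NULL) and the payload; a binary `jsonb` payload is a version byte (1) followed by JSON text.

```rust
/// OID of the built-in `jsonb` type in Postgres.
const JSONB_OID: u32 = 3802;

#[derive(Debug, PartialEq)]
enum DecodeError {
    Truncated,
    UnexpectedFieldCount(i32),
    UnexpectedOid(u32),
    NullField,
    UnsupportedJsonbVersion(u8),
    InvalidUtf8,
}

/// Splits `n` bytes off the front of the cursor and advances it.
fn take<'a>(buf: &mut &'a [u8], n: usize) -> Result<&'a [u8], DecodeError> {
    let data: &'a [u8] = *buf;
    if data.len() < n {
        return Err(DecodeError::Truncated);
    }
    let (head, rest) = data.split_at(n);
    *buf = rest;
    Ok(head)
}

/// Reads one big-endian 32-bit word from the cursor.
fn take_u32(buf: &mut &[u8]) -> Result<u32, DecodeError> {
    let s = take(buf, 4)?;
    Ok(u32::from_be_bytes([s[0], s[1], s[2], s[3]]))
}

/// Decodes a binary composite value that is ASSUMED to hold exactly one
/// non-NULL `jsonb` field, returning the JSON text of that field.
fn decode_single_jsonb_composite(mut raw: &[u8]) -> Result<String, DecodeError> {
    let field_count = take_u32(&mut raw)? as i32;
    if field_count != 1 {
        return Err(DecodeError::UnexpectedFieldCount(field_count));
    }
    let oid = take_u32(&mut raw)?;
    if oid != JSONB_OID {
        return Err(DecodeError::UnexpectedOid(oid));
    }
    let len = take_u32(&mut raw)? as i32;
    if len < 0 {
        return Err(DecodeError::NullField);
    }
    let body = take(&mut raw, len as usize)?;
    // Binary jsonb starts with a one-byte format version, currently 1.
    let (version, json) = body.split_first().ok_or(DecodeError::Truncated)?;
    if *version != 1 {
        return Err(DecodeError::UnsupportedJsonbVersion(*version));
    }
    String::from_utf8(json.to_vec()).map_err(|_| DecodeError::InvalidUtf8)
}
```

In a real harness the returned JSON text would then be handed to a JSON deserializer; this sketch stops at the text so that it needs no external crates.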

Also deletes the pathological range_lt_natural_ordered_10 ORE scenario
and renames its sibling range_lt_hybrid_ordered_10 → range_lt_ordered_10.

Why

Two distinct sort-key traps interact in the ORE bench, and we want a
clean baseline that triggers neither.

1. Projection-pushdown folds the cast into the sort key. When the
SELECT casts the same column the query orders on, Postgres pushes the
cast into the inner scan output and re-uses it as the sort key —
Sort Key: ((value)::jsonb) rather than Sort Key: value. No index on
eql_v2_encrypted (default opclass) or on any extractor expression
(ore_block_u64_8_256(value)) matches that. Index-for-sort optimisation
is silently lost.

This was harmless under our previous extractor-form ORDER BY: the sort key
is ore_block_u64_8_256(value) and the projection is (value)::jsonb, which
are structurally distinct, so no folding occurs. But removing the cast
across the board means a future scenario that adds ORDER BY value (or
ORDER BY (value -> 'sel') at the field level) can't accidentally walk
into the trap.
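
The structural-matching point in item 1 can be illustrated with a toy model. This is not Postgres planner code, only a sketch of the idea that an index can serve a sort only when its expression is structurally equal to the sort key. The `Expr` type, the `index_for_sort` helper and the index name `ore_index` are hypothetical.

```rust
/// A tiny expression tree, just rich enough to model the three shapes
/// discussed above: a bare column, a cast, and an extractor call.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(&'static str),
    Cast { inner: Box<Expr>, to: &'static str },
    Call { func: &'static str, arg: Box<Expr> },
}

/// Returns the name of the first index whose expression is structurally
/// equal to the sort key, or `None` when no index can provide the order.
fn index_for_sort<'a>(sort_key: &Expr, indexes: &'a [(&'a str, Expr)]) -> Option<&'a str> {
    indexes
        .iter()
        .find(|(_, expr)| expr == sort_key)
        .map(|(name, _)| *name)
}
```

With a single index on `ore_block_u64_8_256(value)`, a sort key of the same extractor call matches, while `Cast { inner: Column("value"), to: "jsonb" }` (the folded `((value)::jsonb)` shape) and a bare `Column("value")` both return `None`. In this model that corresponds to falling back to an explicit Sort.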

2. The natural-form ORDER BY value scenario was fundamentally
unfixable. A btree directly on value would let the planner use it
for sort, but the encrypted JSON body becomes the btree entry and trips
the 2712-byte btree page-key limit on realistic payloads. So
ORDER BY value had to fall back to Sort + bitmap/seq scan, which took
62 s at the 10M tier. That's the documented anti-pattern from
docs/reference/query-performance.md §4; a public bench shouldn't
headline pathological cases.

With natural gone, the hybrid_ordered_10 qualifier no longer has a
contrast partner — range_lt_ordered_10 reads more naturally.

Full writeup of the projection trap is now in §4.2 of the EQL perf guide
(encrypt-query-language PR #212).

Files changed

  • src/lib.rs: EqlV2Encrypted Type + Decode (composite → Json<EqlCiphertext>).
    execute / execute_and_decrypt return Vec<(i32, EqlV2Encrypted)>.
    sample_plaintext_string probe drops its cast too.
  • benches/ore.rs: 5 scenarios re-projected; deleted natural;
    renamed hybrid. Preamble updated with the projection-pushdown note.
  • benches/exact.rs: 2 scenarios re-projected; comment updated.
  • benches/match.rs: 3 scenarios re-projected; comment updated.
  • results/query/ore_metadata_100000.json: regenerated to reflect
    the new scenario list (5 scenarios, no range_lt_natural_ordered_10).

Verification

Smoke-run at the 100k tier:

ORE/ore/range_lt_ordered_10/100000            time: [507.80 µs 530.16 µs 546.18 µs]
ORE/ore_decrypt/range_lt_ordered_10/100000    time: [25.308 ms 25.523 ms 25.789 ms]
indexes_used: ["integer_encrypted_100000_ore_index"]

Index Scan + Limit confirmed in the metadata sidecar — consistent with
the previous hybrid-ordered timings. The rename and cast removal are the
only user-visible differences for this scenario.

Other-tier metadata sidecars (10k / 1M / 10M) and the criterion result
files still reference the old scenario names; they'll refresh on the
next full mise run bench:query:ore <rows> invocation and the stale
entries will age out of mise run report naturally.

Cross-links

coderdan added 2 commits May 24, 2026 21:08
Adds a custom `EqlV2Encrypted` sqlx Type+Decode in src/lib.rs that knows
how to read the `eql_v2_encrypted` Postgres composite directly. With the
harness now decoding the column natively, every bench scenario that
previously projected `value::jsonb` can SELECT `value` raw.

Motivation: projection-pushdown sort-key folding. When the SELECT casts
the same column the query orders on, Postgres folds the cast into the
Sort node — `Sort Key: ((value)::jsonb)` — which is structurally
unequal to any index expression on `eql_v2_encrypted` or its extractor.
Index-for-sort optimisation is silently lost. Removing the cast across
all scenarios means a future ORDER BY addition can't accidentally walk
into the trap. The cast is harmless under extractor ORDER BY (sort key
is `ore_block_u64_8_256(value)`, projection is `value::jsonb` — no
folding) but the consistent shape removes a footgun. Full writeup in
`docs/reference/query-performance.md` §4 in the EQL repo.

Sites updated:
- src/lib.rs: `EqlV2Encrypted` type, `execute` / `execute_and_decrypt`
  return `Vec<(i32, EqlV2Encrypted)>`, `sample_plaintext_string` probe
  SELECTs `value` directly.
- benches/ore.rs: 5 scenarios re-projected; preamble updated.
- benches/exact.rs: 2 scenarios re-projected; comment updated.
- benches/match.rs: 3 scenarios re-projected; comment updated.

Also deletes `range_lt_natural_ordered_10` (sort key fundamentally can't
match any allowed index — `btree (value)` is precluded by the entry-size
limit, so the scenario timed in seconds at the 10M tier; documented as
the perf-guide §4 anti-pattern, no need to ship as a bench number) and
renames `range_lt_hybrid_ordered_10` → `range_lt_ordered_10` (with
natural gone, the 'hybrid' qualifier loses its contrast partner).

Smoke-run at 100k tier:
  ORE/ore/range_lt_ordered_10/100000            time: [507.80 µs 530.16 µs 546.18 µs]
  ORE/ore_decrypt/range_lt_ordered_10/100000    time: [25.308 ms 25.523 ms 25.789 ms]
  indexes_used: ["integer_encrypted_100000_ore_index"]

Older tier metadata sidecars (10k / 1M / 10M) and the criterion result
files still reference the old scenario names; they'll refresh on the
next full `mise run bench:query:ore <rows>` invocation and the stale
entries will age out of the `mise run report` output naturally.
- Re-ran every ORE tier (10k / 100k / 1M / 10M) against the new
  cast-free scenarios from this branch. Each metadata sidecar +
  criterion result file regenerated with fresh ciphertexts and the
  current 5-scenario lineup (range_lt_natural_ordered_10 is gone,
  range_lt_hybrid_ordered_10 → range_lt_ordered_10).

- Regenerated report/ore.md and the four ORE query charts. New chart
  query_ore_range_lt_ordered_10_chart.png added (replacing the two
  prior hybrid/natural charts). BENCHMARK_REPORT.md bumped to match.

- README headline numbers refreshed:
  - ORE/range_gt_100 medians from the latest run (4.0/4.2/4.1/4.0 ms
    vs old 4.1/4.2/4.2/4.2 — drift within noise).
  - ORE row renamed: range_lt_hybrid_ordered_10 → range_lt_ordered_10
    (Index Scan + Limit, 0.5 ms flat across all tiers, unchanged).
  - Footnote rewritten: 'pathological ORDER BY value shape has been
    removed from the suite' (was 'excluded from the headline'),
    explains why (sort key can't match any allowed index — btree on
    value precluded by entry-size limit), points at the EQL perf guide
    as the canonical anti-pattern reference.

Other families (JSON, EXACT, MATCH, GROUP_BY, COMBO) weren't re-run,
so their numbers in the README are unchanged (last run 1-2 days ago).
coderdan merged commit 23c687d into main on May 24, 2026