refactor(bench): drop value::jsonb projection across ORE/exact/match by coderdan · Pull Request #17 · cipherstash/benches

coderdan · 2026-05-24T11:09:29Z

What

Adds a custom EqlV2Encrypted sqlx Type + Decode in src/lib.rs
that knows how to read the eql_v2_encrypted Postgres composite
directly. With the harness now decoding the column natively, every bench
scenario that previously projected value::jsonb can SELECT value raw.
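The Decode impl itself isn't shown in this description. As a rough, stdlib-only sketch of what reading the composite involves at the wire level, assume (not confirmed here) that eql_v2_encrypted is a one-field composite whose field is jsonb, and that the column arrives in Postgres's binary format. A binary record is a 4-byte field count, then for each field a 4-byte type OID, a 4-byte length (-1 for NULL) and the field bytes. Binary jsonb is a 1-byte format version (1) followed by the JSON text. `decode_eql_record` is a hypothetical name. A real sqlx impl would more likely lean on sqlx's `PgRecordDecoder` for the record framing than hand-roll it.

```rust
/// OID of Postgres's built-in `jsonb` type.
const JSONB_OID: u32 = 3802;

/// Read a big-endian i32 at `*pos` and advance the cursor by 4 bytes.
fn read_i32(buf: &[u8], pos: &mut usize) -> Result<i32, String> {
    let bytes = buf
        .get(*pos..*pos + 4)
        .ok_or_else(|| "record truncated".to_string())?;
    *pos += 4;
    Ok(i32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

/// Hypothetical helper: decode the binary wire form of a one-field composite
/// whose single field is jsonb (the assumed shape of `eql_v2_encrypted`).
/// Returns the field's JSON text, or `None` if the field is SQL NULL.
fn decode_eql_record(buf: &[u8]) -> Result<Option<String>, String> {
    let mut pos = 0;
    // Record header: number of fields.
    let n_fields = read_i32(buf, &mut pos)?;
    if n_fields != 1 {
        return Err(format!("expected 1 field, got {}", n_fields));
    }
    // Per-field header: type OID, then byte length (-1 means NULL).
    let oid = read_i32(buf, &mut pos)? as u32;
    if oid != JSONB_OID {
        return Err(format!("expected jsonb (OID {}), got OID {}", JSONB_OID, oid));
    }
    let len = read_i32(buf, &mut pos)?;
    if len == -1 {
        return Ok(None);
    }
    if len < 0 {
        return Err(format!("invalid field length {}", len));
    }
    let body = buf
        .get(pos..pos + len as usize)
        .ok_or_else(|| "field length exceeds record".to_string())?;
    // Binary jsonb: 1-byte format version (1), then the JSON text.
    match body.split_first() {
        Some((&1, json)) => String::from_utf8(json.to_vec())
            .map(Some)
            .map_err(|e| e.to_string()),
        Some((version, _)) => Err(format!("unsupported jsonb version {}", version)),
        None => Err("empty jsonb payload".to_string()),
    }
}
```

The returned JSON text would then be deserialised into the ciphertext struct, which is the role the `Json<EqlCiphertext>` wrapper plays in the real type.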

Also deletes the pathological range_lt_natural_ordered_10 ORE scenario
and renames its sibling range_lt_hybrid_ordered_10 → range_lt_ordered_10.

Why

Two distinct sort-key traps interact in the ORE bench, and we want a
clean baseline that triggers neither.

1. Projection-pushdown folds the cast into the sort key. When the
SELECT casts the same column the query orders on, Postgres pushes the
cast into the inner scan output and re-uses it as the sort key —
Sort Key: ((value)::jsonb) rather than Sort Key: value. No index on
eql_v2_encrypted (default opclass) or on any extractor expression
(ore_block_u64_8_256(value)) matches that. Index-for-sort optimisation
is silently lost.

This was harmless under our previous extractor-form ORDER BY (sort key
is ore_block_u64_8_256(value), projection is (value)::jsonb —
structurally distinct, no folding). But removing the cast across the
board means a future scenario that adds ORDER BY value (or
ORDER BY (value -> 'sel') at the field level) can't accidentally walk
into the trap.

2. The natural-form ORDER BY value scenario was fundamentally
unfixable. A btree directly on value would let the planner use it
for sort, but the encrypted JSON body becomes the btree entry and trips
the 2712-byte btree page-key limit on realistic payloads. So ORDER BY
value had to fall back to Sort + bitmap/seq scan, taking 62 s at the 10M
tier. That's the documented anti-pattern from
docs/reference/query-performance.md §4; a public bench shouldn't
headline pathological cases.
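Trap 1 is only visible in the plan, so a future scenario could guard against it by inspecting EXPLAIN output. The sketch below is hypothetical (nothing like it is part of this PR) and assumes text-format EXPLAIN, where a Sort node prints a `Sort Key:` line. It flags a sort key that carries a `::jsonb` cast, i.e. the folded `((value)::jsonb)` shape described above. An Index Scan + Limit plan has no Sort node at all, so it yields no sort keys.

```rust
/// Collect every `Sort Key:` expression from text-format EXPLAIN output.
fn sort_keys(plan: &str) -> Vec<&str> {
    plan.lines()
        .filter_map(|line| line.trim().strip_prefix("Sort Key:"))
        .map(str::trim)
        .collect()
}

/// True if any Sort node orders on a jsonb cast, i.e. the projection
/// cast has been folded into the sort key.
fn has_folded_jsonb_cast(plan: &str) -> bool {
    sort_keys(plan).iter().any(|key| key.contains("::jsonb"))
}
```

For example, a plan containing `Sort Key: ((value)::jsonb)` is flagged, while `Sort Key: value` is not (though it still means a Sort ran instead of an index-ordered scan).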

With natural gone, the hybrid_ordered_10 qualifier no longer has a
contrast partner — range_lt_ordered_10 reads more naturally.

Full writeup of the projection trap is now in §4.2 of the EQL perf guide
(encrypt-query-language PR #212).

Files changed

- src/lib.rs: EqlV2Encrypted Type + Decode (composite → Json<EqlCiphertext>).
  execute / execute_and_decrypt return Vec<(i32, EqlV2Encrypted)>.
  The sample_plaintext_string probe drops its cast too.
- benches/ore.rs: 5 scenarios re-projected; deleted natural; renamed
  hybrid. Preamble updated with the projection-pushdown note.
- benches/exact.rs: 2 scenarios re-projected; comment updated.
- benches/match.rs: 3 scenarios re-projected; comment updated.
- results/query/ore_metadata_100000.json: regenerated to reflect the new
  scenario list (5 scenarios, no range_lt_natural_ordered_10).

Verification

Smoke-run at the 100k tier:

ORE/ore/range_lt_ordered_10/100000            time: [507.80 µs 530.16 µs 546.18 µs]
ORE/ore_decrypt/range_lt_ordered_10/100000    time: [25.308 ms 25.523 ms 25.789 ms]
indexes_used: ["integer_encrypted_100000_ore_index"]

Index Scan + Limit confirmed in the metadata sidecar — consistent with
the previous hybrid-ordered timings. The rename and cast removal are the
only user-visible differences for this scenario.
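To compare these smoke-run numbers against other tiers programmatically, a small parser for criterion's `time: [lower estimate upper]` summary line is enough; the three values are the lower bound, point estimate and upper bound of criterion's confidence interval. This is a hypothetical helper, not part of the bench harness or the `mise run report` tooling, and it normalises everything to milliseconds.

```rust
/// Convert a criterion value/unit pair to milliseconds.
fn to_ms(value: f64, unit: &str) -> Option<f64> {
    let scale = match unit {
        "ps" => 1e-9,
        "ns" => 1e-6,
        // Accept both the micro sign and the Greek mu, plus plain ASCII.
        "µs" | "μs" | "us" => 1e-3,
        "ms" => 1.0,
        "s" => 1e3,
        _ => return None,
    };
    Some(value * scale)
}

/// Parse a `time: [lo est hi]` line into (lo, est, hi) in milliseconds.
fn parse_criterion_time(line: &str) -> Option<(f64, f64, f64)> {
    let start = line.find('[')? + 1;
    let end = line.rfind(']')?;
    let tokens: Vec<&str> = line.get(start..end)?.split_whitespace().collect();
    // Expect three value/unit pairs.
    if tokens.len() != 6 {
        return None;
    }
    let mut ms = [0.0f64; 3];
    for i in 0..3 {
        let value: f64 = tokens[2 * i].parse().ok()?;
        ms[i] = to_ms(value, tokens[2 * i + 1])?;
    }
    Some((ms[0], ms[1], ms[2]))
}
```

Applied to the first line above, the point estimate comes out at about 0.53 ms, matching the "0.5 ms flat" figure quoted for this scenario.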

Other-tier metadata sidecars (10k / 1M / 10M) and the criterion result
files still reference the old scenario names; they'll refresh on the
next full mise run bench:query:ore <rows> invocation and the stale
entries will age out of the mise run report output naturally.

Cross-links

EQL perf guide §4.2 (cast trap): docs: query-performance guide encrypt-query-language#212

Adds a custom `EqlV2Encrypted` sqlx Type+Decode in src/lib.rs that knows how to read the `eql_v2_encrypted` Postgres composite directly. With the harness now decoding the column natively, every bench scenario that previously projected `value::jsonb` can SELECT `value` raw.

Motivation: projection-pushdown sort-key folding. When the SELECT casts the same column the query orders on, Postgres folds the cast into the Sort node (`Sort Key: ((value)::jsonb)`), which is structurally unequal to any index expression on `eql_v2_encrypted` or its extractor. Index-for-sort optimisation is silently lost. Removing the cast across all scenarios means a future ORDER BY addition can't accidentally walk into the trap. The cast is harmless under extractor ORDER BY (sort key is `ore_block_u64_8_256(value)`, projection is `value::jsonb`, so no folding), but the consistent shape removes a footgun. Full writeup in `docs/reference/query-performance.md` §4 in the EQL repo.

Sites updated:

- src/lib.rs: `EqlV2Encrypted` type, `execute` / `execute_and_decrypt` return `Vec<(i32, EqlV2Encrypted)>`, `sample_plaintext_string` probe SELECTs `value` directly.
- benches/ore.rs: 5 scenarios re-projected; preamble updated.
- benches/exact.rs: 2 scenarios re-projected; comment updated.
- benches/match.rs: 3 scenarios re-projected; comment updated.

Also deletes `range_lt_natural_ordered_10` (its sort key fundamentally can't match any allowed index: `btree (value)` is precluded by the entry-size limit, so the scenario timed in seconds at the 10M tier; it is documented as the perf-guide §4 anti-pattern, so there's no need to ship it as a bench number) and renames `range_lt_hybrid_ordered_10` → `range_lt_ordered_10` (with natural gone, the 'hybrid' qualifier loses its contrast partner).

Smoke-run at 100k tier:

ORE/ore/range_lt_ordered_10/100000            time: [507.80 µs 530.16 µs 546.18 µs]
ORE/ore_decrypt/range_lt_ordered_10/100000    time: [25.308 ms 25.523 ms 25.789 ms]
indexes_used: ["integer_encrypted_100000_ore_index"]

Older tier metadata sidecars (10k / 1M / 10M) and the criterion result files still reference the old scenario names; they'll refresh on the next full `mise run bench:query:ore <rows>` invocation and the stale entries will age out of the `mise run report` output naturally.

- Re-ran every ORE tier (10k / 100k / 1M / 10M) against the new cast-free scenarios from this branch. Each metadata sidecar + criterion result file regenerated with fresh ciphertexts and the current 5-scenario lineup (range_lt_natural_ordered_10 is gone, range_lt_hybrid_ordered_10 → range_lt_ordered_10).
- Regenerated report/ore.md and the four ORE query charts. New chart query_ore_range_lt_ordered_10_chart.png added (replacing the two prior hybrid/natural charts). BENCHMARK_REPORT.md bumped to match.
- README headline numbers refreshed:
  - ORE/range_gt_100 medians from the latest run (4.0/4.2/4.1/4.0 ms vs old 4.1/4.2/4.2/4.2; drift within noise).
  - ORE row renamed: range_lt_hybrid_ordered_10 → range_lt_ordered_10 (Index Scan + Limit, 0.5 ms flat across all tiers, unchanged).
  - Footnote rewritten: 'pathological ORDER BY value shape has been removed from the suite' (was 'excluded from the headline'). It explains why (the sort key can't match any allowed index, since a btree on value is precluded by the entry-size limit) and points at the EQL perf guide as the canonical anti-pattern reference.

Other families (JSON, EXACT, MATCH, GROUP_BY, COMBO) weren't re-run, so their numbers in the README are unchanged (last run 1-2 days ago).
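The size of the ORE/range_gt_100 drift can be made explicit with a few lines of arithmetic. This sketch assumes the four medians are listed in tier order 10k / 100k / 1M / 10M (the commit message doesn't say so explicitly), and `drift_table` is a hypothetical helper. It only computes the relative change per tier; whether that change is "within noise" still depends on the run-to-run variance, which these numbers alone don't show.

```rust
/// Relative change from `old` to `new`, in percent.
fn drift_pct(old: f64, new: f64) -> f64 {
    (new - old) / old * 100.0
}

/// Pair old/new medians per tier and return (tier, drift %) rows.
fn drift_table<'a>(tiers: &[&'a str], old: &[f64], new: &[f64]) -> Vec<(&'a str, f64)> {
    tiers
        .iter()
        .zip(old.iter().zip(new.iter()))
        .map(|(tier, (o, n))| (*tier, drift_pct(*o, *n)))
        .collect()
}
```

With old = [4.1, 4.2, 4.2, 4.2] and new = [4.0, 4.2, 4.1, 4.0], the per-tier changes are roughly -2.4%, 0%, -2.4% and -4.8%, all in the faster direction.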

coderdan added 2 commits May 24, 2026 21:08

coderdan merged commit 23c687d into main May 24, 2026

coderdan merged 2 commits into main from dan/drop-jsonb-projection-cast

coderdan commented May 24, 2026
