refactor(bench): drop value::jsonb projection across ORE/exact/match #17
Merged
Adds a custom `EqlV2Encrypted` sqlx Type+Decode in src/lib.rs that knows how to read the `eql_v2_encrypted` Postgres composite directly. With the harness now decoding the column natively, every bench scenario that previously projected `value::jsonb` can SELECT `value` raw.

Motivation: projection-pushdown sort-key folding. When the SELECT casts the same column the query orders on, Postgres folds the cast into the Sort node (`Sort Key: ((value)::jsonb)`), which is structurally unequal to any index expression on `eql_v2_encrypted` or its extractor. Index-for-sort optimisation is silently lost. Removing the cast across all scenarios means a future ORDER BY addition can't accidentally walk into the trap. The cast is harmless under extractor ORDER BY (sort key is `ore_block_u64_8_256(value)`, projection is `value::jsonb`, so no folding), but the consistent shape removes a footgun. Full writeup in `docs/reference/query-performance.md` §4 in the EQL repo.

Sites updated:
- src/lib.rs: `EqlV2Encrypted` type, `execute` / `execute_and_decrypt` return `Vec<(i32, EqlV2Encrypted)>`, `sample_plaintext_string` probe SELECTs `value` directly.
- benches/ore.rs: 5 scenarios re-projected; preamble updated.
- benches/exact.rs: 2 scenarios re-projected; comment updated.
- benches/match.rs: 3 scenarios re-projected; comment updated.

Also deletes `range_lt_natural_ordered_10` (sort key fundamentally can't match any allowed index: `btree (value)` is precluded by the entry-size limit, so the scenario timed in seconds at the 10M tier; documented as the perf-guide §4 anti-pattern, no need to ship as a bench number) and renames `range_lt_hybrid_ordered_10` → `range_lt_ordered_10` (with natural gone, the 'hybrid' qualifier loses its contrast partner).
Smoke-run at 100k tier:

    ORE/ore/range_lt_ordered_10/100000          time: [507.80 µs 530.16 µs 546.18 µs]
    ORE/ore_decrypt/range_lt_ordered_10/100000  time: [25.308 ms 25.523 ms 25.789 ms]
    indexes_used: ["integer_encrypted_100000_ore_index"]

Older tier metadata sidecars (10k / 1M / 10M) and the criterion result files still reference the old scenario names; they'll refresh on the next full `mise run bench:query:ore <rows>` invocation and the stale entries will age out of the `mise run report` output naturally.
- Re-ran every ORE tier (10k / 100k / 1M / 10M) against the new
cast-free scenarios from this branch. Each metadata sidecar +
criterion result file regenerated with fresh ciphertexts and the
current 5-scenario lineup (range_lt_natural_ordered_10 is gone,
range_lt_hybrid_ordered_10 → range_lt_ordered_10).
- Regenerated report/ore.md and the four ORE query charts. New chart
query_ore_range_lt_ordered_10_chart.png added (replacing the two
prior hybrid/natural charts). BENCHMARK_REPORT.md bumped to match.
- README headline numbers refreshed:
  - ORE/range_gt_100 medians from the latest run (4.0/4.2/4.1/4.0 ms
    vs old 4.1/4.2/4.2/4.2; drift within noise).
  - ORE row renamed: range_lt_hybrid_ordered_10 → range_lt_ordered_10
    (Index Scan + Limit, 0.5 ms flat across all tiers, unchanged).
  - Footnote rewritten: 'pathological ORDER BY value shape has been
    removed from the suite' (was 'excluded from the headline'),
    explains why (sort key can't match any allowed index, since btree
    on value is precluded by the entry-size limit), points at the EQL
    perf guide as the canonical anti-pattern reference.
Other families (JSON, EXACT, MATCH, GROUP_BY, COMBO) weren't re-run,
so their numbers in the README are unchanged (last run 1-2 days ago).
What
Adds a custom `EqlV2Encrypted` sqlx Type+Decode in `src/lib.rs` that knows how to read the `eql_v2_encrypted` Postgres composite directly. With the harness now decoding the column natively, every bench scenario that previously projected `value::jsonb` can SELECT `value` raw.

Also deletes the pathological `range_lt_natural_ordered_10` ORE scenario and renames its sibling `range_lt_hybrid_ordered_10` → `range_lt_ordered_10`.

Why
Two distinct sort-key traps interact in the ORE bench, and we want a clean baseline that triggers neither.

1. Projection-pushdown folds the cast into the sort key. When the SELECT casts the same column the query orders on, Postgres pushes the cast into the inner scan output and re-uses it as the sort key: `Sort Key: ((value)::jsonb)` rather than `Sort Key: value`. No index on `eql_v2_encrypted` (default opclass) or on any extractor expression (`ore_block_u64_8_256(value)`) matches that. Index-for-sort optimisation is silently lost.

   This was harmless under our previous extractor-form `ORDER BY` (sort key is `ore_block_u64_8_256(value)`, projection is `(value)::jsonb`: structurally distinct, no folding). But removing the cast across the board means a future scenario that adds `ORDER BY value` (or `ORDER BY (value -> 'sel')` at the field level) can't accidentally walk into the trap.

2. The natural-form `ORDER BY value` scenario was fundamentally unfixable. A btree directly on `value` would let the planner use it for sort, but the encrypted JSON body becomes the btree entry and trips the 2712-byte btree page-key limit on realistic payloads. So `ORDER BY value` had to fall back to Sort + bitmap/seq scan, which took 62 s at the 10M tier. That's the documented anti-pattern from `docs/reference/query-performance.md` §4; a public bench shouldn't headline pathological cases.
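To make the structural-match argument in the two points above concrete, here is a minimal toy model in Rust. It is only a sketch under stated assumptions: `Expr`, `effective_sort_key`, and `index_can_sort` are hypothetical names invented for illustration, and the folding rule simply encodes the behaviour described in this PR rather than Postgres's actual planner logic.

```rust
// Toy model only: hypothetical types and functions, not Postgres planner code.

#[derive(Debug, Clone, PartialEq, Eq)]
enum Expr {
    Column(&'static str),
    Cast(Box<Expr>, &'static str),
    Call(&'static str, Box<Expr>),
}

fn col(name: &'static str) -> Expr {
    Expr::Column(name)
}

fn cast(inner: Expr, ty: &'static str) -> Expr {
    Expr::Cast(Box::new(inner), ty)
}

fn call(func: &'static str, arg: Expr) -> Expr {
    Expr::Call(func, Box::new(arg))
}

/// Sort key the plan ends up with, assuming the folding described above:
/// if the SELECT list casts exactly the expression being ordered on,
/// the cast expression replaces it as the sort key.
fn effective_sort_key(select_list: &[Expr], order_by: &Expr) -> Expr {
    for item in select_list {
        if let Expr::Cast(inner, _) = item {
            if **inner == *order_by {
                return item.clone();
            }
        }
    }
    order_by.clone()
}

/// An index can supply the ordering only if one of its expressions is
/// structurally equal to the sort key.
fn index_can_sort(sort_key: &Expr, index_exprs: &[Expr]) -> bool {
    index_exprs.iter().any(|e| e == sort_key)
}

fn main() {
    let value = col("value");
    let extractor = call("ore_block_u64_8_256", value.clone());
    // The only index available in this model is the extractor-expression index.
    let indexes = vec![extractor.clone()];

    // SELECT value::jsonb ... ORDER BY value: the cast is folded into the sort key.
    let folded = effective_sort_key(&[cast(value.clone(), "jsonb")], &value);
    println!("cast + ORDER BY value -> index usable: {}", index_can_sort(&folded, &indexes));

    // SELECT value::jsonb ... ORDER BY ore_block_u64_8_256(value): no folding.
    let by_extractor = effective_sort_key(&[cast(value.clone(), "jsonb")], &extractor);
    println!("cast + ORDER BY extractor -> index usable: {}", index_can_sort(&by_extractor, &indexes));

    // SELECT value ... ORDER BY value: no cast, but only a btree on value would match.
    let natural = effective_sort_key(&[value.clone()], &value);
    println!("raw + ORDER BY value -> index usable: {}", index_can_sort(&natural, &indexes));
}
```

In this model only the extractor-form `ORDER BY` lines up with the available index. The cast-folded key matches nothing, and the natural-form key would need a btree on `value`, which the entry-size limit rules out.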
With natural gone, the `hybrid_ordered_10` qualifier no longer has a contrast partner; `range_lt_ordered_10` reads more naturally.

Full writeup of the projection trap is now in §4.2 of the EQL perf guide (`encrypt-query-language` PR #212).

Files changed
- `src/lib.rs`: `EqlV2Encrypted` Type + Decode (composite → `Json<EqlCiphertext>`). `execute` / `execute_and_decrypt` return `Vec<(i32, EqlV2Encrypted)>`. `sample_plaintext_string` probe drops its cast too.
- `benches/ore.rs`: 5 scenarios re-projected; deleted natural; renamed hybrid. Preamble updated with the projection-pushdown note.
- `benches/exact.rs`: 2 scenarios re-projected; comment updated.
- `benches/match.rs`: 3 scenarios re-projected; comment updated.
- `results/query/ore_metadata_100000.json`: regenerated to reflect the new scenario list (5 scenarios, no `range_lt_natural_ordered_10`).

Verification
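Smoke-run at the 100k tier:

    ORE/ore/range_lt_ordered_10/100000          time: [507.80 µs 530.16 µs 546.18 µs]
    ORE/ore_decrypt/range_lt_ordered_10/100000  time: [25.308 ms 25.523 ms 25.789 ms]
    indexes_used: ["integer_encrypted_100000_ore_index"]
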
Index Scan + Limit confirmed in the metadata sidecar — consistent with
the previous hybrid-ordered timings. The rename and cast removal are the
only user-visible differences for this scenario.
Other-tier metadata sidecars (10k / 1M / 10M) and the criterion result
files still reference the old scenario names; they'll refresh on the
next full `mise run bench:query:ore <rows>` invocation and the stale
entries will age out of `mise run report` naturally.

Cross-links