Summary
Encrypting the empty string "" as ordered encrypted text produces an empty ORE term (ob: []) and an empty bloom filter (bf: []). The comparison behaviour of an empty ORE term is undefined and inconsistent across EQL versions. This needs a deliberate decision: is "" a supported plaintext for ordered encrypted text, or is it out of scope?
Surfaced while adding eql_v3.text (#260), where "" was used as the SQLx matrix "zero" pivot and broke ordering, aggregates, and comparison counts.
Findings
EQL v2 (main) — no coverage, but defensive handling exists
- v2 has zero test coverage of "" for encrypted text. The ORE-text fixture (tests/sqlx/migrations/006_install_ore_text_data.sql) is 100 real words; the smallest is 'aardvark'. No fixture ever has an empty ob.
- v2 does defensively handle empty term arrays: eql_v2.compare_ore_block_u64_8_256_terms documents "empty arrays sort before non-empty arrays" and returns -1 for empty-vs-non-empty. This path is never exercised by any test.
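The documented v2 rule can be sketched in Rust as follows. This is a minimal model, not the SQL function itself: `Term` and `compare_terms` are hypothetical names, plain integers stand in for opaque ORE blocks, and the empty-vs-empty and non-empty-vs-empty results are assumptions, since only the empty-vs-non-empty case (-1) is documented.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for one ORE block. Real terms are opaque
/// ciphertext blocks compared by the ORE scheme, not plain integers.
type Term = u64;

/// Compare two term arrays, applying the empty-array guard first.
fn compare_terms(a: &[Term], b: &[Term]) -> i32 {
    match (a.is_empty(), b.is_empty()) {
        // Assumption: two empty arrays compare equal.
        (true, true) => 0,
        // Documented in v2: empty arrays sort before non-empty arrays.
        (true, false) => -1,
        // Assumption: the mirror image of the documented case.
        (false, true) => 1,
        // Placeholder for the real ORE block comparison.
        (false, false) => match a.cmp(b) {
            Ordering::Less => -1,
            Ordering::Equal => 0,
            Ordering::Greater => 1,
        },
    }
}
```

A test that exercises this path would need at least one fixture row with an empty ob, which the current v2 fixtures do not contain.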
EQL v3 (eql_v3.text, #260) — diverges from v2
- The v3 SEM ORE fork does not reproduce v2's empty-array guard. With "" in the fixtures, an empty ob orders as the maximum, not the minimum:
  - eql_v3.max(eql_v3.text_ord) returns the "" payload instead of the real max ("zzzz").
  - payload::eql_v3.text_ord > '' returns 0 rows (expected: all non-empty values).
  - 'zzzz' > payload counts are off by one (the "" row is silently dropped).
  - count_distinct over ord_term hits function … returned NULL on the empty term.
So v2 says "empty sorts first" (untested), v3 effectively sorts it last/inconsistently — neither is validated end-to-end.
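To illustrate why the placement of the empty term matters, the sketch below models the first two symptoms with plaintext strings standing in for encrypted terms. It is a toy model under stated assumptions: `EmptyPlacement`, `compare`, `max_under` and `count_greater_than` are hypothetical names, and "empty sorts last" is only one reading of the observed v3 behaviour, not a description of its implementation.

```rust
use std::cmp::Ordering;

/// Where a comparator places the empty term relative to non-empty terms.
#[derive(Clone, Copy)]
enum EmptyPlacement {
    First, // the rule documented in v2
    Last,  // one reading of the behaviour observed in v3
}

/// Plaintext strings stand in for encrypted terms; "" models an empty ob.
fn compare(a: &str, b: &str, placement: EmptyPlacement) -> Ordering {
    match (a.is_empty(), b.is_empty(), placement) {
        (true, true, _) => Ordering::Equal,
        (true, false, EmptyPlacement::First) => Ordering::Less,
        (false, true, EmptyPlacement::First) => Ordering::Greater,
        (true, false, EmptyPlacement::Last) => Ordering::Greater,
        (false, true, EmptyPlacement::Last) => Ordering::Less,
        (false, false, _) => a.cmp(b),
    }
}

/// Maximum of `values` under the chosen placement, like an ordered max aggregate.
fn max_under<'a>(values: &[&'a str], placement: EmptyPlacement) -> Option<&'a str> {
    values.iter().copied().max_by(|a, b| compare(a, b, placement))
}

/// Number of values strictly greater than `pivot`, like `payload > pivot`.
fn count_greater_than(values: &[&str], pivot: &str, placement: EmptyPlacement) -> usize {
    values
        .iter()
        .filter(|v| compare(v, pivot, placement) == Ordering::Greater)
        .count()
}
```

With the values "aardvark", "" and "zzzz", `EmptyPlacement::Last` makes `max_under` return the empty value and `count_greater_than` with an empty pivot return 0, which matches the first two symptoms listed above. `EmptyPlacement::First` gives "zzzz" and 2.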
Decision needed
1. Is "" (and other degenerate/too-short-to-tokenize plaintext) a supported value for ordered encrypted text?
   - If yes: the v3 SEM ORE comparison must define and implement empty-term ordering (mirror v2's "empty sorts first"), with explicit fixtures/tests covering it across _eq / _ord / _ord_ore and min/max. The match (bf: []) empty-set semantics should also be pinned (everything contains the empty filter; the empty filter contains nothing).
   - If no: document the constraint (minimum/at-least-one-ngram plaintext), and decide where it's enforced (proxy / client / EQL).
2. Reconcile the v2↔v3 ORE empty-array divergence regardless of (1), so the two schemas don't disagree on a payload either might receive.
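If the answer to (1) is yes, the empty-set semantics for match could be pinned along these lines. The sketch assumes a bloom filter is represented as the list of its set bit positions (as bf: [] suggests) and that containment means "every bit of the needle is set in the haystack"; the function name `contains` and the `u16` bit type are hypothetical.

```rust
use std::collections::BTreeSet;

/// Hypothetical model: `haystack` contains `needle` when every bit set
/// in `needle` is also set in `haystack` (subset test on bit positions).
fn contains(haystack: &[u16], needle: &[u16]) -> bool {
    let bits: BTreeSet<u16> = haystack.iter().copied().collect();
    // `all` over an empty needle is vacuously true, so every filter
    // contains the empty filter.
    needle.iter().all(|bit| bits.contains(bit))
}
```

Under this model the empty filter contains no non-empty filter, but it does contain itself. Whether empty-contains-empty should be true is one of the edge cases the decision would need to state explicitly.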
Immediate mitigation (in #260)
PR #260 will drop "" from the eql_v3.text fixtures and use real non-empty values (mirroring v2's "real word" convention, with a short real token as the smallest value). It will also replace the matrix's Default::default() zero-pivot with an overridable ScalarType::zero_pivot() so that text supplies a real mid value. That unblocks the PR; this issue tracks the underlying behavioural decision and the v2/v3 divergence, which outlive #260.
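One way an overridable pivot could look is a trait method with a default body. The trait below is a hypothetical shape, not the definition from #260: the real ScalarType trait may have other items and bounds, and the "mango" pivot is an illustrative value that does not come from the fixtures.

```rust
/// Hypothetical shape of the test-matrix trait.
trait ScalarType: Default {
    /// Pivot value the matrix compares against. The default body keeps
    /// the previous behaviour of using Default::default().
    fn zero_pivot() -> Self {
        Self::default()
    }
}

/// Numeric types keep the default pivot (0 for i64).
impl ScalarType for i64 {}

/// Text overrides the pivot so "" is never used as a plaintext.
impl ScalarType for String {
    fn zero_pivot() -> Self {
        // Illustrative non-empty mid value, not taken from the fixtures.
        "mango".to_string()
    }
}
```

A default method keeps existing scalar types unchanged and limits the override to the one type whose Default value is a degenerate plaintext.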
References
- src/ore_block_u64_8_256/functions.sql: eql_v2.compare_ore_block_u64_8_256_terms (empty-array handling)
- src/v3/sem/ore_block_u64_8_256/: the v3 SEM fork
- tests/sqlx/migrations/006_install_ore_text_data.sql: v2 ORE-text fixtures (smallest = aardvark)