From 5f7b38d3cfb0d107e5bdf685dde0999d325b076e Mon Sep 17 00:00:00 2001
From: Dan Draper
Date: Wed, 20 May 2026 19:40:16 +1000
Subject: [PATCH 1/2] fix(opclass): non-raising btree comparator for eql_v2_encrypted
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

eql_v2_encrypted is a composite type and eql_v2.encrypted_operator_class
is its DEFAULT btree opclass, so PostgreSQL invokes the opclass FUNCTION 1
comparator during ANALYZE (and for sort-based GROUP BY / DISTINCT).
FUNCTION 1 was eql_v2.compare, which #211 made strict — it raises without
a Block-ORE `ob` term (U-005). So ANALYZE raised on every hm-only
encrypted column, and autovacuum runs ANALYZE routinely; this broke
silently across managed installs.

Give the opclass its own FUNCTION 1, eql_v2.encrypted_btree_compare: a
total, non-raising 3-way comparator — Block ORE when present, else a
total order on the hm term, else a deterministic payload tie-break.
eql_v2.compare stays strict for the < / > range-operator path.

This also fixes GROUP BY / DISTINCT on a bare encrypted column: with a
raising opclass comparator the planner fell through to PostgreSQL's
built-in record comparison (the composite type's implicit record_ops),
which compares raw ciphertext — so two encryptions of the same plaintext
(same hm, different c) failed to deduplicate.

- src/operators/operator_class.sql: add encrypted_btree_compare, repoint
  FUNCTION 1; the strict eql_v2.compare is untouched.
- operator_class_tests.rs: un-ignore
  index_behavior_with_different_data_types (broken by the strict
  comparator); add analyze_on_hmac_only_column_does_not_raise.
- CHANGELOG: Fixed entry.
---
 CHANGELOG.md                             |   1 +
 src/operators/operator_class.sql         | 129 ++++++++++++++++-------
 tests/sqlx/tests/operator_class_tests.rs |  29 ++++-
 3 files changed, 119 insertions(+), 40 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index fbf43f44..2a0bbf84 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -64,6 +64,7 @@ Targeting `2.3.0` as a breaking release. Customers re-encrypt their data as part
 
 ### Fixed
 
+- **`ANALYZE` (and autovacuum) no longer raises on equality-only `eql_v2_encrypted` columns; `GROUP BY` / `DISTINCT` on a bare encrypted column now deduplicate correctly.** `eql_v2_encrypted` is a composite type, and `eql_v2.encrypted_operator_class` is its `DEFAULT` btree opclass — so PostgreSQL invokes the opclass's FUNCTION 1 comparator to gather column statistics during `ANALYZE`, and may use it for sort-based `GROUP BY` / `DISTINCT`. FUNCTION 1 was `eql_v2.compare`, which #211 made strict (it raises without a Block-ORE `ob` term — see U-005): `ANALYZE` raised `feature_not_supported` on every `hm`-only column, and since autovacuum runs `ANALYZE` routinely this surfaced constantly. The btree opclass now has its own FUNCTION 1 — `eql_v2.encrypted_btree_compare`, a total, non-raising 3-way comparator (Block ORE when present, else a total order on the `hm` term, else a deterministic payload tie-break). `eql_v2.compare` stays strict for the `<` / `>` range-operator path. Without a non-raising opclass comparator the planner fell through to PostgreSQL's built-in record comparison, which compares raw ciphertext — so two encryptions of the same plaintext (same `hm`, different `c`) failed to group; the dedicated comparator fixes that too.
 - **Range operators on `eql_v2_encrypted` now declare the correct planner selectivity functions.** `<=`, `>`, and `>=` (all three type overloads each) previously declared `RESTRICT = scalarltsel, JOIN = scalarltjoinsel` — the "less-than" estimators — which fed the planner inaccurate row-count estimates for the affected predicates. The inner `eql_v2.ore_block_u64_8_256` `>=` operator had a related miss (`scalarlesel` where `scalargesel` belongs). Now `<=` uses `scalarlesel`, `>` uses `scalargtsel`, and `>=` uses `scalargesel` (matching `*joinsel` variants for the JOIN selector). No query result changes — only plan choice for range queries against Block ORE columns, which becomes load-bearing now that bare-form range predicates structurally match a functional ORE index ([#211](https://github.com/cipherstash/encrypt-query-language/pull/211)). ([#216](https://github.com/cipherstash/encrypt-query-language/issues/216))
 
 ### Upgrade notes
diff --git a/src/operators/operator_class.sql b/src/operators/operator_class.sql
index 1c2df458..c5e86066 100644
--- a/src/operators/operator_class.sql
+++ b/src/operators/operator_class.sql
@@ -2,27 +2,106 @@
 -- REQUIRE: src/encrypted/types.sql
 -- REQUIRE: src/encrypted/functions.sql
 -- REQUIRE: src/encrypted/compare.sql
+-- REQUIRE: src/hmac_256/functions.sql
+-- REQUIRE: src/ore_block_u64_8_256/functions.sql
+-- REQUIRE: src/ore_block_u64_8_256/compare.sql
 -- REQUIRE: src/operators/<.sql
 -- REQUIRE: src/operators/<=.sql
 -- REQUIRE: src/operators/=.sql
 -- REQUIRE: src/operators/>=.sql
 -- REQUIRE: src/operators/>.sql
 
---! @brief PostgreSQL operator class definitions for encrypted value indexing
+--! @file src/operators/operator_class.sql
+--! @brief Btree operator class for the `eql_v2_encrypted` composite type
 --!
---! Defines the operator family and operator class required for btree indexing
---! of encrypted values. This enables PostgreSQL to use encrypted columns in:
---! - CREATE INDEX statements
---! - ORDER BY clauses
---! - Range queries
---! - Primary key constraints
+--! `eql_v2_encrypted` is a composite type. PostgreSQL gives every composite
+--! type an implicit row-wise btree comparison (`record_ops`) — but that
+--! compares the raw ciphertext byte-for-byte, so two encryptions of the same
+--! plaintext (same `hm`, different `c`) would sort and group as *distinct*.
+--! `eql_v2.encrypted_operator_class` is registered `DEFAULT ... USING btree`
+--! specifically to override `record_ops` with a comparison that is correct
+--! for encrypted data: `GROUP BY`, `DISTINCT`, `ORDER BY`, sort-merge joins
+--! and `ANALYZE` on a bare `eql_v2_encrypted` column all route through
+--! FUNCTION 1 below.
 --!
---! The operator class maps the five comparison operators (<, <=, =, >=, >)
---! to the eql_v2.compare() support function for btree index operations.
+--! @note FUNCTION 1 is `eql_v2.encrypted_btree_compare`, NOT the strict
+--!       `eql_v2.compare`. A btree support function must be total and must
+--!       never raise — `ANALYZE` calls it to build column statistics on
+--!       every encrypted column. `eql_v2.compare` is deliberately strict
+--!       (it raises without a Block-ORE `ob` term — see U-005); it backs
+--!       the `<` / `>` range operators, not this opclass.
 --!
---! @note This is the default operator class for eql_v2_encrypted type
+--! @note Functional indexes are the canonical recipe for *building* indexes
+--!       on encrypted columns (see U-001 and docs/reference/database-indexes.md).
+--!       This opclass exists to keep the composite type's built-in
+--!       comparison correct — not as an index-building recommendation.
+--!
+--! @see eql_v2.encrypted_hash_operator_class (hash — GROUP BY / hash joins)
 --! @see eql_v2.compare
---! @see PostgreSQL documentation on operator classes
+
+--------------------
+
+--! @brief Total, non-raising btree comparator for `eql_v2_encrypted`
+--!
+--! Three-way comparison (`-1` / `0` / `1`) used as FUNCTION 1 of
+--! `eql_v2.encrypted_operator_class`. Unlike `eql_v2.compare`, it never
+--! raises: a btree support function is invoked by `ANALYZE`, sort, and
+--! `GROUP BY` on every value, so raising is not an option.
+--!
+--! Comparison priority:
+--!   1. Both operands carry `ob` (Block ORE) — order-preserving comparison
+--!      via `eql_v2.compare_ore_block_u64_8_256`.
+--!   2. Both operands carry `hm` (HMAC-256) — a total order on the hmac
+--!      bytes. Not order-preserving on plaintext (hmac is not), but
+--!      deterministic, total, and `= 0` exactly when the hmac terms match
+--!      — consistent with the `=` operator, so `GROUP BY` / `DISTINCT`
+--!      deduplicate correctly.
+--!   3. Otherwise — a deterministic order on the raw payload. Reached only
+--!      for term-less / mixed payloads; present so the function stays total.
+--!
+--! @param a eql_v2_encrypted First value
+--! @param b eql_v2_encrypted Second value
+--! @return integer -1, 0, or 1
+--!
+--! @internal
+--! @see eql_v2.encrypted_operator_class
+--! @see eql_v2.compare
+CREATE FUNCTION eql_v2.encrypted_btree_compare(a eql_v2_encrypted, b eql_v2_encrypted)
+  RETURNS integer
+  IMMUTABLE STRICT PARALLEL SAFE
+  SET search_path = pg_catalog, extensions, public
+AS $$
+  DECLARE
+    hm_a text;
+    hm_b text;
+  BEGIN
+    -- Block ORE on both sides: order-preserving comparison.
+    IF eql_v2.has_ore_block_u64_8_256(a) AND eql_v2.has_ore_block_u64_8_256(b) THEN
+      RETURN eql_v2.compare_ore_block_u64_8_256(a, b);
+    END IF;
+
+    -- HMAC on both sides: total order on the hmac bytes. `= 0` iff the hmac
+    -- terms match, consistent with the `=` operator and the hash opclass.
+    hm_a := eql_v2.hmac_256(a)::text;
+    hm_b := eql_v2.hmac_256(b)::text;
+    IF hm_a IS NOT NULL AND hm_b IS NOT NULL THEN
+      RETURN CASE
+        WHEN hm_a < hm_b THEN -1
+        WHEN hm_a > hm_b THEN 1
+        ELSE 0
+      END;
+    END IF;
+
+    -- Fallback for term-less / mixed payloads: a deterministic, non-raising
+    -- total order on the raw payload. Not a normal column shape — this
+    -- branch only keeps the btree FUNCTION 1 contract (total, never raises).
+    RETURN CASE
+      WHEN (a).data::text < (b).data::text THEN -1
+      WHEN (a).data::text > (b).data::text THEN 1
+      ELSE 0
+    END;
+  END;
+$$ LANGUAGE plpgsql;
 
 --------------------
 
@@ -34,30 +113,4 @@ CREATE OPERATOR CLASS eql_v2.encrypted_operator_class DEFAULT FOR TYPE eql_v2_en
   OPERATOR 3 =,
   OPERATOR 4 >=,
   OPERATOR 5 >,
-  FUNCTION 1 eql_v2.compare(a eql_v2_encrypted, b eql_v2_encrypted);
-
-
---------------------
-
--- CREATE OPERATOR FAMILY eql_v2.encrypted_operator_ordered USING btree;
-
--- CREATE OPERATOR CLASS eql_v2.encrypted_operator_ordered FOR TYPE eql_v2_encrypted USING btree FAMILY eql_v2.encrypted_operator_ordered AS
---   OPERATOR 1 <,
---   OPERATOR 2 <=,
---   OPERATOR 3 =,
---   OPERATOR 4 >=,
---   OPERATOR 5 >,
---   FUNCTION 1 eql_v2.compare_ore_block_u64_8_256(a eql_v2_encrypted, b eql_v2_encrypted);
-
---------------------
-
--- CREATE OPERATOR FAMILY eql_v2.encrypted_hmac_256_operator USING btree;
-
--- CREATE OPERATOR CLASS eql_v2.encrypted_hmac_256_operator FOR TYPE eql_v2_encrypted USING btree FAMILY eql_v2.encrypted_hmac_256_operator AS
---   OPERATOR 1 <,
---   OPERATOR 2 <=,
---   OPERATOR 3 =,
---   OPERATOR 4 >=,
---   OPERATOR 5 >,
---   FUNCTION 1 eql_v2.compare_hmac(a eql_v2_encrypted, b eql_v2_encrypted);
-
+  FUNCTION 1 eql_v2.encrypted_btree_compare(a eql_v2_encrypted, b eql_v2_encrypted);
diff --git a/tests/sqlx/tests/operator_class_tests.rs b/tests/sqlx/tests/operator_class_tests.rs
index 971e8b2f..46ad9aae 100644
--- a/tests/sqlx/tests/operator_class_tests.rs
+++ b/tests/sqlx/tests/operator_class_tests.rs
@@ -96,9 +96,10 @@ async fn index_usage_with_explain_analyze(pool: PgPool) -> Result<()> {
 }
 
 #[sqlx::test]
-#[ignore = "Strict eql_v2.compare contract: raises on missing ORE term. This test builds a btree using eql_v2.encrypted_operator_class over hm-only payloads; the opclass calls compare() per row, which now raises. Equality on hm-only columns should use the inlined `=` operator (post-#193), not opclass-driven btree. Re-enable once the test is rewritten to use ORE-bearing payloads (or to assert raise-on-build-with-hm-only)."]
 async fn index_behavior_with_different_data_types(pool: PgPool) -> Result<()> {
-    // Test: Index behavior with various encrypted data types (37 assertions)
+    // Test: Index behavior with various encrypted data types. The opclass
+    // btree FUNCTION 1 is eql_v2.encrypted_btree_compare (total, non-raising),
+    // so building the index and ANALYZE over hm-only payloads both succeed.
 
     create_table_with_encrypted(&pool).await?;
 
@@ -221,3 +222,27 @@ async fn index_behavior_with_different_data_types(pool: PgPool) -> Result<()> {
 
     Ok(())
 }
+
+#[sqlx::test(fixtures(path = "../fixtures", scripts("encrypted_json")))]
+async fn analyze_on_hmac_only_column_does_not_raise(pool: PgPool) -> Result<()> {
+    // Regression: eql_v2_encrypted has a DEFAULT btree operator class, so
+    // ANALYZE invokes its FUNCTION 1 comparator to gather column statistics.
+    // That comparator must never raise — ANALYZE (autovacuum included) runs
+    // on every encrypted column. It previously pointed at the strict
+    // eql_v2.compare, which raises without a Block-ORE `ob` term, so ANALYZE
+    // failed on every equality-only (`hm`-only) encrypted column. FUNCTION 1
+    // is now eql_v2.encrypted_btree_compare (total, non-raising).
+
+    create_table_with_encrypted(&pool).await?;
+
+    for _ in 0..5 {
+        sqlx::query("INSERT INTO encrypted(e) VALUES (create_encrypted_json(1, 'hm'))")
+            .execute(&pool)
+            .await?;
+    }
+
+    // Must complete without raising.
+    sqlx::query("ANALYZE encrypted").execute(&pool).await?;
+
+    Ok(())
+}

From e363fa9507823fab2caeaf860fef8b7f58d51403 Mon Sep 17 00:00:00 2001
From: Dan Draper
Date: Wed, 20 May 2026 20:03:53 +1000
Subject: [PATCH 2/2] chore(changelog): link PR #227 on the ANALYZE-fix entry

---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 2a0bbf84..d02ffd60 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -64,7 +64,7 @@ Targeting `2.3.0` as a breaking release. Customers re-encrypt their data as part
 
 ### Fixed
 
-- **`ANALYZE` (and autovacuum) no longer raises on equality-only `eql_v2_encrypted` columns; `GROUP BY` / `DISTINCT` on a bare encrypted column now deduplicate correctly.** `eql_v2_encrypted` is a composite type, and `eql_v2.encrypted_operator_class` is its `DEFAULT` btree opclass — so PostgreSQL invokes the opclass's FUNCTION 1 comparator to gather column statistics during `ANALYZE`, and may use it for sort-based `GROUP BY` / `DISTINCT`. FUNCTION 1 was `eql_v2.compare`, which #211 made strict (it raises without a Block-ORE `ob` term — see U-005): `ANALYZE` raised `feature_not_supported` on every `hm`-only column, and since autovacuum runs `ANALYZE` routinely this surfaced constantly. The btree opclass now has its own FUNCTION 1 — `eql_v2.encrypted_btree_compare`, a total, non-raising 3-way comparator (Block ORE when present, else a total order on the `hm` term, else a deterministic payload tie-break). `eql_v2.compare` stays strict for the `<` / `>` range-operator path. Without a non-raising opclass comparator the planner fell through to PostgreSQL's built-in record comparison, which compares raw ciphertext — so two encryptions of the same plaintext (same `hm`, different `c`) failed to group; the dedicated comparator fixes that too.
+- **`ANALYZE` (and autovacuum) no longer raises on equality-only `eql_v2_encrypted` columns; `GROUP BY` / `DISTINCT` on a bare encrypted column now deduplicate correctly.** `eql_v2_encrypted` is a composite type, and `eql_v2.encrypted_operator_class` is its `DEFAULT` btree opclass — so PostgreSQL invokes the opclass's FUNCTION 1 comparator to gather column statistics during `ANALYZE`, and may use it for sort-based `GROUP BY` / `DISTINCT`. FUNCTION 1 was `eql_v2.compare`, which #211 made strict (it raises without a Block-ORE `ob` term — see U-005): `ANALYZE` raised `feature_not_supported` on every `hm`-only column, and since autovacuum runs `ANALYZE` routinely this surfaced constantly. The btree opclass now has its own FUNCTION 1 — `eql_v2.encrypted_btree_compare`, a total, non-raising 3-way comparator (Block ORE when present, else a total order on the `hm` term, else a deterministic payload tie-break). `eql_v2.compare` stays strict for the `<` / `>` range-operator path. Without a non-raising opclass comparator the planner fell through to PostgreSQL's built-in record comparison, which compares raw ciphertext — so two encryptions of the same plaintext (same `hm`, different `c`) failed to group; the dedicated comparator fixes that too. ([#227](https://github.com/cipherstash/encrypt-query-language/pull/227))
 - **Range operators on `eql_v2_encrypted` now declare the correct planner selectivity functions.** `<=`, `>`, and `>=` (all three type overloads each) previously declared `RESTRICT = scalarltsel, JOIN = scalarltjoinsel` — the "less-than" estimators — which fed the planner inaccurate row-count estimates for the affected predicates. The inner `eql_v2.ore_block_u64_8_256` `>=` operator had a related miss (`scalarlesel` where `scalargesel` belongs). Now `<=` uses `scalarlesel`, `>` uses `scalargtsel`, and `>=` uses `scalargesel` (matching `*joinsel` variants for the JOIN selector). No query result changes — only plan choice for range queries against Block ORE columns, which becomes load-bearing now that bare-form range predicates structurally match a functional ORE index ([#211](https://github.com/cipherstash/encrypt-query-language/pull/211)). ([#216](https://github.com/cipherstash/encrypt-query-language/issues/216))
 
 ### Upgrade notes
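
Reading aid, not part of the patch series: the comparison priority that PATCH 1/2 gives `eql_v2.encrypted_btree_compare` can be modelled outside PostgreSQL. The Rust sketch below is a minimal model under stated assumptions, not the extension's implementation. `Payload`, `hm_only`, `btree_compare`, `strict_compare`, `record_compare`, and `distinct_count` are hypothetical names; the `ob` term is stood in for by a plain `u64` (real Block ORE comparison is more involved); and PostgreSQL's built-in record comparison is approximated by comparing the ciphertext `c` alone.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for an `eql_v2_encrypted` payload.
/// Field names mirror the `ob`, `hm`, and `c` terms named in the patch;
/// the Rust types are assumptions made for illustration only.
#[derive(Debug, Clone)]
struct Payload {
    ob: Option<u64>,    // assumed order-preserving term (simplified Block ORE)
    hm: Option<String>, // deterministic HMAC term
    c: String,          // ciphertext; differs between encryptions of one plaintext
}

/// Convenience constructor for an equality-only (`hm`-only) payload.
fn hm_only(hm: &str, c: &str) -> Payload {
    Payload { ob: None, hm: Some(hm.to_string()), c: c.to_string() }
}

/// Map an `Ordering` to the -1 / 0 / 1 convention of a btree FUNCTION 1.
fn to_int(o: Ordering) -> i32 {
    match o {
        Ordering::Less => -1,
        Ordering::Equal => 0,
        Ordering::Greater => 1,
    }
}

/// Model of the strict comparator: fails unless both sides carry `ob`.
fn strict_compare(a: &Payload, b: &Payload) -> Result<i32, String> {
    match (a.ob, b.ob) {
        (Some(x), Some(y)) => Ok(to_int(x.cmp(&y))),
        _ => Err("missing ob term".to_string()),
    }
}

/// Model of the total, non-raising comparator: `ob`, else `hm`, else payload.
fn btree_compare(a: &Payload, b: &Payload) -> i32 {
    if let (Some(x), Some(y)) = (a.ob, b.ob) {
        return to_int(x.cmp(&y));
    }
    if let (Some(x), Some(y)) = (&a.hm, &b.hm) {
        return to_int(x.cmp(y));
    }
    // Fallback: deterministic order on the raw payload (here: ciphertext only).
    to_int(a.c.cmp(&b.c))
}

/// Simplified model of record comparison over raw ciphertext.
fn record_compare(a: &Payload, b: &Payload) -> i32 {
    to_int(a.c.cmp(&b.c))
}

/// Sort-based DISTINCT: sort with `cmp`, then drop neighbours that compare 0.
fn distinct_count(rows: &[Payload], cmp: fn(&Payload, &Payload) -> i32) -> usize {
    let mut v = rows.to_vec();
    v.sort_by(|x, y| cmp(x, y).cmp(&0));
    v.dedup_by(|x, y| cmp(x, y) == 0);
    v.len()
}
```

In this model, three `hm`-only rows where two share an `hm` but differ in `c` collapse to two groups under `btree_compare` and stay at three under `record_compare`, while `strict_compare` returns `Err` for any such pair, which mirrors the `ANALYZE` failure and the missed deduplication described in the commit message.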