Merged
1 change: 1 addition & 0 deletions .github/workflows/test-eql.yml
@@ -121,6 +121,7 @@ jobs:
run: |
mise run test:matrix:inventory
git diff --exit-code -- tests/sqlx/snapshots/int4_matrix_tests.txt \
tests/sqlx/snapshots/int2_matrix_tests.txt \
|| { echo "Coverage inventory stale — run 'mise run test:matrix:inventory' and commit."; exit 1; }

test:
5 changes: 3 additions & 2 deletions CHANGELOG.md
@@ -22,8 +22,9 @@ Each entry that ships in a published release links to the PR that introduced it.

### Added

- **`eql_v2_int4` encrypted-domain type family.** Four jsonb-backed domains for encrypted `int4` columns: `eql_v2_int4` (storage-only), `eql_v2_int4_eq` (`=` / `<>` via HMAC), and `eql_v2_int4_ord` / `eql_v2_int4_ord_ore` (also `<` `<=` `>` `>=` via ORE block terms). Supported comparisons resolve to inlinable wrappers; the native `jsonb` operator surface reachable through domain fallback is blocked (raises rather than silently mis-resolving). Each domain's `CHECK` requires the EQL envelope (`v`, `i`), the ciphertext (`c`), and the variant's index term(s), and pins the payload version (`VALUE->>'v' = '2'`, matching `eql_v2._encrypted_check_v`) — so a missing key or wrong-version payload is rejected on insert or cast rather than surfacing later at query time. Index via a functional index on the `eql_v2.eq_term` / `eql_v2.ord_term` extractors, not an operator class on the domain. Why: a type-safe, per-capability encrypted integer column instead of the untyped `eql_v2_encrypted`. This is the reference scalar implementation for the generated domain family. ([#239](https://github.com/cipherstash/encrypt-query-language/pull/239), supersedes [#225](https://github.com/cipherstash/encrypt-query-language/pull/225))
- **Per-domain `MIN` / `MAX` aggregates for the encrypted-domain family.** `eql_v2.min(eql_v2_<T>_ord)` / `eql_v2.max(eql_v2_<T>_ord)` (and the `_ord_ore` twin) are generated for every ord-capable scalar variant, giving type-safe extrema on domain-typed columns — comparison routes through the variant's `<` / `>` operator (ORE block term, no decryption). The aggregates are declared `PARALLEL = SAFE` with a combine function (the state function itself — min/max are associative), so PostgreSQL can use partial/parallel aggregation on large `GROUP BY` workloads. Why: the new domain types previously had no equivalent of the composite-type aggregates. The existing `eql_v2.min(eql_v2_encrypted)` / `eql_v2.max(eql_v2_encrypted)` aggregates are **retained** and continue to work on `eql_v2_encrypted` columns; the per-domain aggregates are additive and coexist with them. ([#239](https://github.com/cipherstash/encrypt-query-language/pull/239))
- **`eql_v3` encrypted-domain schema, with the `int4` family as its first member.** Encrypted-domain type families now live in a new, additional `eql_v3` schema (the existing `eql_v2` schema is unchanged — it keeps the core types/operators and stays the documented public API). Four jsonb-backed domains for encrypted `int4` columns: `eql_v3.int4` (storage-only), `eql_v3.int4_eq` (`=` / `<>` via HMAC), and `eql_v3.int4_ord` / `eql_v3.int4_ord_ore` (also `<` `<=` `>` `>=` via ORE block terms). Supported comparisons resolve to inlinable wrappers; the native `jsonb` operator surface reachable through domain fallback is blocked (raises rather than silently mis-resolving). Each domain's `CHECK` requires the EQL envelope (`v`, `i`), the ciphertext (`c`), and the variant's index term(s), and pins the payload version (`VALUE->>'v' = '2'`, matching `eql_v2._encrypted_check_v`) — so a missing key or wrong-version payload is rejected on insert or cast rather than surfacing later at query time. Index via a functional index on the `eql_v3.eq_term` / `eql_v3.ord_term` extractors, not an operator class on the domain. The extractors still return the core `eql_v2.hmac_256` / `eql_v2.ore_block_u64_8_256` index-term types, which remain in `eql_v2` and are referenced cross-schema. Why: a type-safe, per-capability encrypted integer column instead of the untyped `eql_v2_encrypted`, namespaced under its own schema. This is the reference scalar implementation for the generated domain family. ([#239](https://github.com/cipherstash/encrypt-query-language/pull/239), supersedes [#225](https://github.com/cipherstash/encrypt-query-language/pull/225))
- **`eql_v3.int2` encrypted-domain type family.** Four jsonb-backed domains for encrypted `int2` columns — `eql_v3.int2` (storage-only), `eql_v3.int2_eq` (`=` / `<>` via HMAC), and `eql_v3.int2_ord` / `eql_v3.int2_ord_ore` (also `<` `<=` `>` `>=` via ORE block terms, with `MIN` / `MAX` aggregates) — generated from `tasks/codegen/types/int2.toml` by the same materializer as the `eql_v3.int4` reference. Index via a functional index on the `eql_v3.eq_term` / `eql_v3.ord_term` extractors, not an operator class on the domain. Why: a type-safe, per-capability encrypted `smallint` column, proving the scalar generator generalizes beyond the `int4` reference. ([#243](https://github.com/cipherstash/encrypt-query-language/pull/243))
- **Per-domain `MIN` / `MAX` aggregates for the encrypted-domain family.** `eql_v3.min(eql_v3.<T>_ord)` / `eql_v3.max(eql_v3.<T>_ord)` (and the `_ord_ore` twin) are generated for every ord-capable scalar variant, giving type-safe extrema on domain-typed columns — comparison routes through the variant's `<` / `>` operator (ORE block term, no decryption). The aggregates are declared `PARALLEL = SAFE` with a combine function (the state function itself — min/max are associative), so PostgreSQL can use partial/parallel aggregation on large `GROUP BY` workloads. Why: the new domain types previously had no equivalent of the composite-type aggregates. The existing `eql_v2.min(eql_v2_encrypted)` / `eql_v2.max(eql_v2_encrypted)` aggregates are **retained** and continue to work on `eql_v2_encrypted` columns; the per-domain aggregates are additive and coexist with them. ([#239](https://github.com/cipherstash/encrypt-query-language/pull/239))

## [2.3.1] — 2026-05-21

7 changes: 4 additions & 3 deletions CLAUDE.md
@@ -29,6 +29,7 @@ This project uses `mise` for task management. Common commands:
- Run SQLx tests directly: `mise run test:sqlx`
- Run SQLx tests in watch mode: `mise run test:sqlx:watch`
- Tests are located in `tests/sqlx/` using Rust and SQLx framework
- Regenerate the scalar matrix coverage snapshots: `mise run test:matrix:inventory` (no database required). These committed `tests/sqlx/snapshots/<T>_matrix_tests.txt` baselines pin the set of `scalars::<T>::*` test names so a silently dropped/renamed/`#[cfg]`-gated test fails CI's `matrix-coverage` job. When you add or remove matrix tests (or add a scalar type), regenerate and commit the affected snapshot in the same change. See `tests/sqlx/snapshots/README.md`.

### Build System
- Dependencies are resolved using `-- REQUIRE:` comments in SQL files
@@ -50,7 +51,7 @@ This project uses `mise` for task management. Common commands:
This is the **Encrypt Query Language (EQL)** - a PostgreSQL extension for searchable encryption. Key architectural components:

### Core Structure
- **Schema**: All EQL functions/types are in `eql_v2` PostgreSQL schema
- **Schema**: Core EQL functions/types are in the `eql_v2` PostgreSQL schema. The encrypted-domain type families (`int4` and future scalar domains) live in a separate `eql_v3` schema (see below); they reuse the core `eql_v2` index-term types cross-schema. `eql_v2` is unchanged and remains the documented public API.
- **Main Type**: `eql_v2_encrypted` - composite type for encrypted columns (stored as JSONB)
- **Configuration**: `eql_v2_configuration` table tracks encryption configs
- **Index Types**: Various encrypted index types (blake3, hmac_256, bloom_filter, ore variants)
@@ -75,7 +76,7 @@ This is the **Encrypt Query Language (EQL)** - a PostgreSQL extension for search

### Encrypted-Domain Types

`src/encrypted_domain/` holds **encrypted-domain type families** — jsonb-backed PostgreSQL domains, one domain per operator/index capability (`eql_v2_<T>` storage-only, `eql_v2_<T>_eq`, `eql_v2_<T>_ord`). `eql_v2_int4` (PR #225) is the reference scalar implementation; future scalar types such as `int8`, `bool`, `date`, `float`, `numeric`, and `timestamp` follow this materializer pattern. `jsonb` needs a separate design and is out of scope for the scalar materializer.
`src/encrypted_domain/` holds **encrypted-domain type families** — jsonb-backed PostgreSQL domains in the **`eql_v3` schema**, one domain per operator/index capability (`eql_v3.<T>` storage-only, `eql_v3.<T>_eq`, `eql_v3.<T>_ord`). The schema qualifier replaces the old version-prefixed name, so the domains are `eql_v3.int4`, `eql_v3.int4_eq`, `eql_v3.int4_ord`, `eql_v3.int4_ord_ore` — created in `eql_v3`, not `public`. Their extractors/wrappers/aggregates (`eql_v3.eq_term`, `eql_v3.ord_term`, `eql_v3.eq`/`lt`/…, `eql_v3.min`/`max`) also live in `eql_v3`, but the index-term types they return and construct (`eql_v2.hmac_256`, `eql_v2.ore_block_u64_8_256`) stay in `eql_v2` and are referenced cross-schema. `eql_v3.int4` (PR #239, supersedes #225) is the reference scalar implementation; future scalar types such as `int8`, `bool`, `date`, `float`, `numeric`, and `timestamp` follow this materializer pattern. `jsonb` needs a separate design and is out of scope for the scalar materializer.

A scalar encrypted-domain type is generated from a minimal manifest at `tasks/codegen/types/<T>.toml`: the filename supplies `<T>`, and the `[domain]` table maps each generated domain name to the fixed index terms it carries. Example: `int4_eq = ["hm"]`, `int4_ord = ["ore"]`. Term capabilities are fixed in `tasks/codegen/terms.py`: `hm` provides equality, and `ore` provides equality plus ordering. `mise run build` regenerates the scalar SQL surface into `src/encrypted_domain/<T>/` from every manifest at the start of every build; that surface includes supported comparison wrappers plus blockers for native `jsonb` operators that would otherwise be reachable through domain fallback. Use `mise run codegen:domain <T>` to refresh a single type manually while iterating on its manifest, or `mise run codegen:domain:all` to regenerate every type at once (the same enumeration `mise run build` uses). The generated `*_types.sql` / `*_functions.sql` / `*_operators.sql` files are gitignored and never committed; the TOML manifest plus `tasks/codegen/terms.py` are the source of truth. Generated files carry an `AUTO-GENERATED — DO NOT EDIT` header; change the manifest or term catalog and rebuild, never hand-edit. Hand-written SQL beyond the fixed surface goes in `src/encrypted_domain/<T>/<T>_extensions.sql` with no auto-generated header and explicit `-- REQUIRE:` edges, and that file IS committed. `text` and `jsonb` are out of scope for this scalar materializer.

@@ -260,7 +261,7 @@ The entry under `Changed` / `Deprecated` should cross-link to the `U-NNN`. See `

### Versioning

The `eql_v2` PostgreSQL schema name is part of the public API and is **independent of the EQL release version**. Major-version bumps to EQL do not rename the schema. When deciding on a version bump:
The `eql_v2` PostgreSQL schema name is part of the public API and is **independent of the EQL release version**. Major-version bumps to EQL do not rename the schema. The `eql_v3` schema is **not** a rename of `eql_v2`: it is a separate, additional schema introduced to namespace the encrypted-domain type families. Both schemas coexist; `eql_v2` keeps the core types/operators and is unchanged. Adding a new schema for a new surface is additive, not a public-API break. When deciding on a version bump:

- **Patch (`2.3.x`)** — bug fixes, no behaviour changes
- **Minor (`2.x.0`)** — additive changes, behaviour changes that don't break the public API (signatures, schema name, payload format, operator names)
35 changes: 23 additions & 12 deletions docs/reference/encrypted-domain-generator.md
@@ -7,7 +7,7 @@ The contract those outputs must satisfy is in
[`encrypted-domain-implementation-spec.md`](./encrypted-domain-implementation-spec.md);
this file describes the machine that produces them.

The reference type is `eql_v2_int4` (PR #239). `text` and `jsonb` are
The reference type is `eql_v3.int4` (PR #239). `text` and `jsonb` are
outside scope.

## 1. Why a generator
@@ -245,7 +245,7 @@ operators are always blockers.
The table above covers `<domain>_functions.sql` only. Ordered domains
additionally emit `<domain>_aggregates.sql` — two state functions
(`min_sfunc`, `max_sfunc`) and two `CREATE AGGREGATE` declarations
(`eql_v2.min`, `eql_v2.max`). Each aggregate declares
(`eql_v3.min`, `eql_v3.max`). Each aggregate declares
`combinefunc = <sfunc>` and `parallel = safe`: min/max are associative, so
the state function doubles as the combine function, enabling partial and
parallel aggregation on large `GROUP BY` ORE workloads with no decryption.
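
The claim that the state function can double as the combine function can be checked with a small model. The sketch below is an assumption-laden illustration, not the generated SQL: plain integers stand in for encrypted values (the real aggregates compare ORE block terms through the domain's `<` operator), and `min_sfunc`, `aggregate`, and `parallel_aggregate` are illustrative Python names only.

```python
from functools import reduce


def min_sfunc(state, value):
    """State transition for MIN: keep the smaller of the running state and the next value.

    None models the NULL initial state. Because both arguments have the same
    type, the same function can also merge two partial states.
    """
    if state is None:
        return value
    if value is None:
        return state
    return value if value < state else state


def aggregate(values):
    """Serial aggregation: fold the state function over every value."""
    return reduce(min_sfunc, values, None)


def parallel_aggregate(values, workers):
    """Partial aggregation: each worker folds its own chunk, then partials are combined.

    The combine step reuses min_sfunc, which is valid because min is associative.
    """
    chunks = [values[i::workers] for i in range(workers)]
    partials = [aggregate(chunk) for chunk in chunks]
    return reduce(min_sfunc, partials, None)
```

For any split of the input, `parallel_aggregate` returns the same result as `aggregate`, which is the property that lets a planner use partial aggregation safely.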
@@ -280,13 +280,13 @@ incorrect SQL unreachable. Invariants encoded in code:
(`templates.py:46`), which doubles embedded single quotes. Today's catalog
strings are all quote-free so it is a no-op, but it guarantees a future
quote-bearing catalog string cannot break out of its literal.
- **No domain-over-domain.** Every domain is `CREATE DOMAIN ... AS
jsonb`, never `AS <some_other_domain>` (`templates.py:72`). PostgreSQL
- **No domain-over-domain.** Every domain is `CREATE DOMAIN eql_v3.<name>
AS jsonb`, never `AS <some_other_domain>` (`templates.py:72`). PostgreSQL
resolves operators against the underlying base type; a derived domain
would silently bypass the fixed operator surface.
- **No operator class on a domain.** The generator emits operators,
not operator classes. Callers index through the extractor function
(e.g. `USING btree (eql_v2.ord_term(col))`), whose return type
(e.g. `USING btree (eql_v3.ord_term(col))`), whose return type
already carries a default opclass.
- **Ownership boundary.** `writer.is_generated` recognises owned files
by their header line and refuses to overwrite anything else
@@ -323,7 +323,7 @@ output without per-type edits:
- **`tasks/pin_search_path.sql:265-290`** — structural skip identifies
encrypted-domain functions by language (`sql`), volatility
(`IMMUTABLE`), and the presence of at least one argument typed as a
jsonb-backed `DOMAIN` in `public` named `eql_v2_*`. New scalar types
jsonb-backed `DOMAIN` in the `eql_v3` schema. New scalar types
need no edit here.
- **`tasks/test/splinter.sh`** — name-based allowlist. The converged
wrapper names (`eq`, `neq`, `lt`, `lte`, `gt`, `gte`, `eq_term`,
@@ -377,12 +377,23 @@ The end-to-end shape from a generator perspective:
4. **Build picks it up automatically** — `tasks/build.sh` regenerates
before computing the `tsort` graph, so the new files appear in the
dependency walk via the `-- REQUIRE:` edges the generator emits.
5. **Baseline & test.** Create a hand-reviewed byte-parity baseline under
`tests/codegen/reference/<token>/` (each file marked `-- REFERENCE:` /
`// REFERENCE:`) so `test_against_reference.py` guards the new type — it
only covers types that have a baseline directory. Then run
`mise run test:codegen`, the relevant SQLx suites, and the PostgreSQL
matrix.
5. **Test.** Do **not** add a `tests/codegen/reference/<token>/` baseline.
`int4` is the sole golden master for the type-generic generator: the SQL
templates are pure token substitution and the only type-specific rendering
is `<token>_values.rs`, so a per-type baseline can only fail where `int4`'s
already would. Drift protection for the new type comes from the `int4`
reference (shared templates + `terms.py`), the committed `<token>_values.rs`
const guarded by the codegen staleness check, the `<token>` cases in
`test_scalars.py`, and the `ordered_numeric_matrix!` SQLx suite (behaviour,
not bytes). Run `mise run test:codegen`, the relevant SQLx suites, and the
PostgreSQL matrix.
6. **Snapshot the matrix inventory.** Run `mise run test:matrix:inventory`
and commit the new `tests/sqlx/snapshots/<token>_matrix_tests.txt`, the
sorted list of the type's `scalars::<token>::*` test names. CI's
`matrix-coverage` job runs `git diff --exit-code` on it (the same kind of
check that guards `<token>_values.rs`) to catch a silently dropped or
renamed matrix test. The snapshot is a committed test baseline, not
gitignored generated SQL. See `tests/sqlx/snapshots/README.md`.
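
The snapshot check in the last step amounts to regenerating a sorted name list and comparing it with the committed file. The sketch below models that idea only; it is not the `test:matrix:inventory` task itself, the helper names are hypothetical, and the test names used in the usage note are made up for illustration.

```python
def render_inventory(test_names, token):
    """Render the snapshot body: sorted `scalars::<token>::*` names, one per line."""
    prefix = f"scalars::{token}::"
    selected = sorted(name for name in test_names if name.startswith(prefix))
    return "".join(name + "\n" for name in selected)


def is_stale(committed_snapshot, test_names, token):
    """True when the committed snapshot no longer matches the current test names.

    This is the same signal a non-zero `git diff --exit-code` gives after regenerating.
    """
    return committed_snapshot != render_inventory(test_names, token)
```

For example, with hypothetical names `scalars::int2::eq` and `scalars::int2::lt` committed in the snapshot, dropping or renaming `scalars::int2::lt` in the test suite makes `is_stale` return `True`, which is the situation the CI job is meant to flag.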

Adding a new **term** is a bigger move — edit `terms.py`, add tests,
audit `splinter.sh` for a name collision, and update the reference