Merged
22 commits
05e6b81
refactor(codegen): target eql_v3 schema and src/v3/scalars output paths
tobyhede Jun 3, 2026
cf514f2
test(codegen): regenerate int4 golden for the eql_v3 schema + src/v3 …
tobyhede Jun 3, 2026
44d2dba
feat(v3): relocate schema and fork crypto/common into src/v3 (D7, D8)
tobyhede Jun 3, 2026
f2f43c9
feat(v3): add self-contained eql_v3.hmac_256 SEM type (jsonb-only)
tobyhede Jun 3, 2026
cf7ce98
feat(v3): add self-contained eql_v3.ore_block_u64_8_256 SEM type (jso…
tobyhede Jun 3, 2026
bbf09cc
feat(v3): move shared blocker to src/v3/scalars; remove src/encrypted…
tobyhede Jun 3, 2026
0ae5c10
build(v3): emit self-contained release/cipherstash-encrypt-v3.sql var…
tobyhede Jun 3, 2026
6e4cff2
fix(v3): keep eql_v3 SEM ore_block/hmac_256 inlinable in the combined…
tobyhede Jun 3, 2026
94d0177
test(v3): add self-containment gate (symbol + file + artifact) and CI…
tobyhede Jun 3, 2026
ae2bac5
test(v3): assert the v3 artifact is self-contained and v2-decoupled
tobyhede Jun 3, 2026
15370c8
test(v3): clean-DB install + functional-index smoke (proves D11, D4)
tobyhede Jun 3, 2026
fa091cd
docs(v3): document the self-contained eql_v3 schema and v3-only insta…
tobyhede Jun 3, 2026
eacf669
test(v3): update family mutation tests + drop-opclass fixture for the…
tobyhede Jun 3, 2026
973c5fe
refactor(codegen): extract v3 path constants, drop dead Term::returns…
tobyhede Jun 3, 2026
0717d16
test(v3): direct coverage for the eql_v3 ORE/HMAC SEM functions
tobyhede Jun 3, 2026
9b9557d
test(v3): harden self-containment checks to also reject bare eql_v2_ …
tobyhede Jun 3, 2026
19fe049
refactor(codegen): collapse CORE/DOMAIN schema constants into a singl…
tobyhede Jun 3, 2026
ba37472
fix(v3): address code-review findings on stale docs/comments + test h…
tobyhede Jun 3, 2026
fb98a09
docs(v3): add missing @param/@return tags to ore_block_u64_8_256 oper…
tobyhede Jun 3, 2026
eafaa17
style(test): rustfmt encrypted_domain/family/sem.rs
tobyhede Jun 3, 2026
567c5bb
chore: quote echo var in tasks/build.sh
tobyhede Jun 3, 2026
0bcf821
fix(v3): address PR #255 review feedback — SQL inlining, stale docs, …
tobyhede Jun 4, 2026
33 changes: 33 additions & 0 deletions .github/workflows/test-eql.yml
@@ -127,6 +127,34 @@ jobs:
        run: |
          mise run codegen:parity

  self-contained-v3:
    name: "eql_v3 self-containment"
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
        with:
          persist-credentials: false

      - uses: jdx/mise-action@1648a7812b9aeae629881980618f079932869151 # v4
        with:
          version: 2026.4.0
          install: true
          cache: true

      - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
        with:
          workspaces: .
          shared-key: sqlx-tests

      # Build to materialise release/cipherstash-encrypt-v3.sql and
      # src/deps-ordered-v3.txt, then assert no eql_v2 symbol/file leakage.
      - name: Build EQL
        run: mise run --force build

      - name: Assert eql_v3 is self-contained
        run: mise run test:self_contained_v3

  matrix-coverage:
    name: "Matrix coverage inventory"
    runs-on: ubuntu-latest
@@ -216,6 +244,11 @@ jobs:
          rustup component add --toolchain ${active_rust_toolchain} rustfmt clippy
          mise run --output prefix test --postgres ${POSTGRES_VERSION}

      - name: Clean-DB v3 install smoke (Postgres ${{ matrix.postgres-version }})
        run: |
          mise run build
          mise run test:clean_install_v3

  splinter:
    name: "Supabase splinter"
    runs-on: ubuntu-latest-m
17 changes: 10 additions & 7 deletions .gitignore
@@ -10,6 +10,9 @@ deps-ordered.txt
deps-supabase.txt
deps-ordered-supabase.txt

src/deps-v3.txt
src/deps-ordered-v3.txt

# Based on https://raw.githubusercontent.com/github/gitignore/main/Node.gitignore

src/version.sql
@@ -223,13 +226,13 @@ tests/sqlx/migrations/001_install_eql.sql
# never commit — stale fixtures hide bugs)
tests/sqlx/fixtures/eql_v2*

# Generated encrypted-domain SQL — regenerated by `tasks/build.sh` from
# tasks/codegen/types/<T>.toml on every build (or `mise run codegen:domain
# <T>` to refresh manually). Hand-written *_extensions.sql stays committed.
src/encrypted_domain/*/*_types.sql
src/encrypted_domain/*/*_functions.sql
src/encrypted_domain/*/*_operators.sql
src/encrypted_domain/*/*_aggregates.sql
# Generated encrypted-domain SQL — regenerated by `tasks/build.sh` from the
# eql-scalars::CATALOG via `cargo run -p eql-codegen` on every build. The
# hand-written src/v3/scalars/functions.sql (no type subdir) stays committed.
src/v3/scalars/*/*_types.sql
src/v3/scalars/*/*_functions.sql
src/v3/scalars/*/*_operators.sql
src/v3/scalars/*/*_aggregates.sql

# Large generated test data files
tests/ste_vec_vast.sql
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -22,9 +22,10 @@ Each entry that ships in a published release links to the PR that introduced it.

### Added

- **`eql_v3` encrypted-domain schema, with the `int4` family as its first member.** Encrypted-domain type families now live in a new, additional `eql_v3` schema (the existing `eql_v2` schema is unchanged — it keeps the core types/operators and stays the documented public API). Four jsonb-backed domains for encrypted `int4` columns: `eql_v3.int4` (storage-only), `eql_v3.int4_eq` (`=` / `<>` via HMAC), and `eql_v3.int4_ord` / `eql_v3.int4_ord_ore` (also `<` `<=` `>` `>=` via ORE block terms). Supported comparisons resolve to inlinable wrappers; the native `jsonb` operator surface reachable through domain fallback is blocked (raises rather than silently mis-resolving). Each domain's `CHECK` requires the EQL envelope (`v`, `i`), the ciphertext (`c`), and the variant's index term(s), and pins the payload version (`VALUE->>'v' = '2'`, matching `eql_v2._encrypted_check_v`) — so a missing key or wrong-version payload is rejected on insert or cast rather than surfacing later at query time. Index via a functional index on the `eql_v3.eq_term` / `eql_v3.ord_term` extractors, not an operator class on the domain. The extractors still return the core `eql_v2.hmac_256` / `eql_v2.ore_block_u64_8_256` index-term types, which remain in `eql_v2` and are referenced cross-schema. Why: a type-safe, per-capability encrypted integer column instead of the untyped `eql_v2_encrypted`, namespaced under its own schema. This is the reference scalar implementation for the generated domain family. ([#239](https://github.com/cipherstash/encrypt-query-language/pull/239), supersedes [#225](https://github.com/cipherstash/encrypt-query-language/pull/225))
- **`eql_v3` encrypted-domain schema, with the `int4` family as its first member.** Encrypted-domain type families now live in a new, additional `eql_v3` schema (the existing `eql_v2` schema is unchanged — it keeps the core types/operators and stays the documented public API). Four jsonb-backed domains for encrypted `int4` columns: `eql_v3.int4` (storage-only), `eql_v3.int4_eq` (`=` / `<>` via HMAC), and `eql_v3.int4_ord` / `eql_v3.int4_ord_ore` (also `<` `<=` `>` `>=` via ORE block terms). Supported comparisons resolve to inlinable wrappers; the native `jsonb` operator surface reachable through domain fallback is blocked (raises rather than silently mis-resolving). Each domain's `CHECK` requires the EQL envelope (`v`, `i`), the ciphertext (`c`), and the variant's index term(s), and pins the payload version (`VALUE->>'v' = '2'`, matching `eql_v2._encrypted_check_v`) — so a missing key or wrong-version payload is rejected on insert or cast rather than surfacing later at query time. Index via a functional index on the `eql_v3.eq_term` / `eql_v3.ord_term` extractors, not an operator class on the domain. The extractors return the searchable-encrypted-metadata index-term types `eql_v3.hmac_256` / `eql_v3.ore_block_u64_8_256`, which `eql_v3` owns directly (see the self-contained `eql_v3` schema entry below). Why: a type-safe, per-capability encrypted integer column instead of the untyped `eql_v2_encrypted`, namespaced under its own schema. This is the reference scalar implementation for the generated domain family. ([#239](https://github.com/cipherstash/encrypt-query-language/pull/239), supersedes [#225](https://github.com/cipherstash/encrypt-query-language/pull/225))
- **`eql_v3.int2` encrypted-domain type family.** Four jsonb-backed domains for encrypted `int2` columns — `eql_v3.int2` (storage-only), `eql_v3.int2_eq` (`=` / `<>` via HMAC), and `eql_v3.int2_ord` / `eql_v3.int2_ord_ore` (also `<` `<=` `>` `>=` via ORE block terms, with `MIN` / `MAX` aggregates) — generated from the `int2` row in `eql-scalars::CATALOG` by the same materializer as the `eql_v3.int4` reference. Index via a functional index on the `eql_v3.eq_term` / `eql_v3.ord_term` extractors, not an operator class on the domain. Why: a type-safe, per-capability encrypted `smallint` column, proving the scalar generator generalizes beyond the `int4` reference. ([#243](https://github.com/cipherstash/encrypt-query-language/pull/243))
- **Per-domain `MIN` / `MAX` aggregates for the encrypted-domain family.** `eql_v3.min(eql_v3.<T>_ord)` / `eql_v3.max(eql_v3.<T>_ord)` (and the `_ord_ore` twin) are generated for every ord-capable scalar variant, giving type-safe extrema on domain-typed columns — comparison routes through the variant's `<` / `>` operator (ORE block term, no decryption). The aggregates are declared `PARALLEL = SAFE` with a combine function (the state function itself — min/max are associative), so PostgreSQL can use partial/parallel aggregation on large `GROUP BY` workloads. Why: the new domain types previously had no equivalent of the composite-type aggregates. The existing `eql_v2.min(eql_v2_encrypted)` / `eql_v2.max(eql_v2_encrypted)` aggregates are **retained** and continue to work on `eql_v2_encrypted` columns; the per-domain aggregates are additive and coexist with them. ([#239](https://github.com/cipherstash/encrypt-query-language/pull/239))
- **Self-contained `eql_v3` schema + standalone `release/cipherstash-encrypt-v3.sql` installer.** The `eql_v3` encrypted-domain surface no longer depends on `eql_v2` at runtime: it now owns its own copies of the searchable-encrypted-metadata (SEM) index-term types — `eql_v3.hmac_256` and `eql_v3.ore_block_u64_8_256` (with its btree operator class) — so the `eql_v3.eq_term` / `eql_v3.ord_term` extractors return `eql_v3` types and no `eql_v2.<symbol>` appears anywhere in the v3 SQL. The whole v3 surface relocated under a single `src/v3/` tree (`src/v3/sem/` for the hand-written SEM types, `src/v3/scalars/` for the generated domain families). A new build variant ships the `eql_v3` schema on its own as `release/cipherstash-encrypt-v3.sql`, installable into a database with no `eql_v2` present; a CI gate greps that artifact and its dependency closure to keep it `eql_v2`-free. Why: a clean foundation for the per-scalar encrypted-domain model to stand alone, ahead of it replacing the `eql_v2_encrypted` composite column type. This is additive — a new schema and a new artifact — and leaves `eql_v2` byte-for-byte unchanged. ([#255](https://github.com/cipherstash/encrypt-query-language/pull/255))
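
The CI gate described in the last entry amounts to a text scan of the v3 artifact. A minimal Rust sketch of that idea, assuming the check only needs to flag lines that mention an `eql_v2.`-qualified symbol or a bare `eql_v2_` name; `find_v2_leaks` is a hypothetical helper, and the real check sits behind `mise run test:self_contained_v3`, whose implementation is not shown in this diff:

```rust
/// Hypothetical helper (not the project's actual gate): return the 1-based
/// line number and trimmed text of every line that mentions an
/// `eql_v2.`-qualified symbol or a bare `eql_v2_` identifier.
fn find_v2_leaks(sql: &str) -> Vec<(usize, String)> {
    // Assumed needles: schema-qualified references and v2-prefixed names.
    const NEEDLES: [&str; 2] = ["eql_v2.", "eql_v2_"];

    sql.lines()
        .enumerate()
        .filter(|(_, line)| NEEDLES.iter().any(|needle| line.contains(*needle)))
        .map(|(idx, line)| (idx + 1, line.trim().to_string()))
        .collect()
}
```

Run over the contents of `release/cipherstash-encrypt-v3.sql`, an empty result would mean the artifact is `eql_v2`-free under these assumptions. A plain substring scan also flags comments, which errs on the strict side.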

### Changed

10 changes: 6 additions & 4 deletions CLAUDE.md
@@ -37,6 +37,7 @@ This project uses `mise` for task management. Common commands:
- `cipherstash-encrypt.sql` - Main installer
- `cipherstash-encrypt-supabase.sql` - Supabase-compatible (excludes operator classes)
- `cipherstash-encrypt-protect.sql` - ProtectJS variant (excludes config management)
- `cipherstash-encrypt-v3.sql` - Standalone, self-contained `eql_v3` surface only (globbed from `src/v3` alone; no `eql_v2`, installable into a DB with no `eql_v2` present)
- Corresponding uninstallers for each variant

#### Build Variants
@@ -45,13 +46,14 @@ This project uses `mise` for task management. Common commands:
| Main | Nothing | Full EQL with all features |
| Supabase | Operator classes | Supabase compatibility |
| Protect | `src/config/*`, `src/encryptindex/*` | ProtectJS (no database-side config) |
| v3-only | Everything outside `src/v3` (and `pin_search_path.sql`) | Self-contained `eql_v3` surface, `eql_v2`-free (gated by `mise run test:self_contained_v3`) |

## Project Architecture

This is the **Encrypt Query Language (EQL)** - a PostgreSQL extension for searchable encryption. Key architectural components:

### Core Structure
- **Schema**: Core EQL functions/types are in the `eql_v2` PostgreSQL schema. The encrypted-domain type families (`int4` and future scalar domains) live in a separate `eql_v3` schema (see below); they reuse the core `eql_v2` index-term types cross-schema. `eql_v2` is unchanged and remains the documented public API.
- **Schema**: Core EQL functions/types are in the `eql_v2` PostgreSQL schema. The encrypted-domain type families (`int4` and future scalar domains) live in a separate `eql_v3` schema (see below). The `eql_v3` surface is **self-contained**: it owns its own copies of the searchable-encrypted-metadata (SEM) index-term types (`eql_v3.hmac_256`, `eql_v3.ore_block_u64_8_256`, hand-written under `src/v3/sem/`) and has no runtime dependency on `eql_v2`. `eql_v2` is unchanged and remains the documented public API.
- **Main Type**: `eql_v2_encrypted` - composite type for encrypted columns (stored as JSONB)
- **Configuration**: `eql_v2_configuration` table tracks encryption configs
- **Index Types**: Various encrypted index types (blake3, hmac_256, bloom_filter, ore variants)
@@ -62,7 +64,7 @@ This is the **Encrypt Query Language (EQL)** - a PostgreSQL extension for search
- `src/operators/` - SQL operators for encrypted data comparisons
- `src/config/` - Configuration management functions
- `src/blake3/`, `src/hmac_256/`, `src/bloom_filter/`, `src/ore_*` - Index implementations
- `src/encrypted_domain/` - Encrypted-domain type families (jsonb-backed PostgreSQL domains, one per operator/index capability)
- `src/v3/` - Self-contained `eql_v3` surface: `src/v3/schema.sql`, forked `src/v3/crypto.sql` / `src/v3/common.sql`, hand-written SEM index-term types under `src/v3/sem/` (`hmac_256`, `ore_block_u64_8_256`), and the generated scalar encrypted-domain families under `src/v3/scalars/<T>/` (plus the shared blocker `src/v3/scalars/functions.sql`)
- `tasks/` - mise task scripts
- `tests/sqlx/` - Rust/SQLx test framework (PostgreSQL 14-17 support)
- `release/` - Generated SQL installation files
@@ -76,9 +78,9 @@ This is the **Encrypt Query Language (EQL)** - a PostgreSQL extension for search

### Encrypted-Domain Types

`src/encrypted_domain/` holds **encrypted-domain type families** — jsonb-backed PostgreSQL domains in the **`eql_v3` schema**, one domain per operator/index capability (`eql_v3.<T>` storage-only, `eql_v3.<T>_eq`, `eql_v3.<T>_ord`). The schema qualifier replaces the old version-prefixed name, so the domains are `eql_v3.int4`, `eql_v3.int4_eq`, `eql_v3.int4_ord`, `eql_v3.int4_ord_ore` — created in `eql_v3`, not `public`. Their extractors/wrappers/aggregates (`eql_v3.eq_term`, `eql_v3.ord_term`, `eql_v3.eq`/`lt`/…, `eql_v3.min`/`max`) also live in `eql_v3`, but the index-term types they return and construct (`eql_v2.hmac_256`, `eql_v2.ore_block_u64_8_256`) stay in `eql_v2` and are referenced cross-schema. `eql_v3.int4` (PR #239, supersedes #225) is the reference scalar implementation; future scalar types such as `int8`, `bool`, `date`, `float`, `numeric`, `timestamp`, `text`, and `jsonb` follow this materializer pattern. `text`, `numeric`, and `jsonb` are planned but have no generated SQL surface yet — `jsonb` in particular needs a separate SQL design beyond the ordered-scalar materializer. The `eql-scalars` fixture catalog (`crates/eql-scalars`) already models their fixture values ahead of the SQL surface.
`src/v3/scalars/` holds the generated **encrypted-domain type families** — jsonb-backed PostgreSQL domains in the **`eql_v3` schema**, one domain per operator/index capability (`eql_v3.<T>` storage-only, `eql_v3.<T>_eq`, `eql_v3.<T>_ord`). The schema qualifier replaces the old version-prefixed name, so the domains are `eql_v3.int4`, `eql_v3.int4_eq`, `eql_v3.int4_ord`, `eql_v3.int4_ord_ore` — created in `eql_v3`, not `public`. Their extractors/wrappers/aggregates (`eql_v3.eq_term`, `eql_v3.ord_term`, `eql_v3.eq`/`lt`/…, `eql_v3.min`/`max`) also live in `eql_v3`, and the SEM index-term types they return and construct (`eql_v3.hmac_256`, `eql_v3.ore_block_u64_8_256`) are **also `eql_v3`** — hand-written under `src/v3/sem/` so the whole v3 surface is self-contained (no `eql_v2.<symbol>` appears anywhere in v3 SQL; CI gates this via `mise run test:self_contained_v3` and the standalone `release/cipherstash-encrypt-v3.sql` installer). `eql_v3.int4` (PR #239, supersedes #225) is the reference scalar implementation; future scalar types such as `int8`, `bool`, `date`, `float`, `numeric`, `timestamp`, `text`, and `jsonb` follow this materializer pattern. `text`, `numeric`, and `jsonb` are planned but have no generated SQL surface yet — `jsonb` in particular needs a separate SQL design beyond the ordered-scalar materializer. The `eql-scalars` fixture catalog (`crates/eql-scalars`) already models their fixture values ahead of the SQL surface.

Adding a scalar encrypted-domain type is one row in the Rust catalog `eql-scalars::CATALOG` (`crates/eql-scalars/src/lib.rs`): a `ScalarSpec` giving the type `token` (e.g. `int8`), its `ScalarKind` (the `kind` field), the `DomainSpec`s mapping each generated domain suffix to its fixed index `Term`s (`_eq => [Hm]`, `_ord`/`_ord_ore => [Ore]`), and the `Fixture` value list. Term capabilities are fixed in the `Term` enum's `impl` methods (with unit tests): `Hm` provides equality, and `Ore` provides equality plus ordering. There is no TOML manifest and no Python — the catalog is the source of truth, validated by the compiler (an undefined term or unknown scalar is a compile error) plus catalog `#[test]`s. `mise run build` runs `cargo run -p eql-codegen`, which regenerates the scalar SQL surface into `src/encrypted_domain/<T>/` from `CATALOG` at the start of every build; that surface includes supported comparison wrappers plus blockers for native `jsonb` operators that would otherwise be reachable through domain fallback. `cargo run -p eql-codegen` regenerates every type at once (the same call `mise run build` uses; there is no per-type codegen task). The generated `*_types.sql` / `*_functions.sql` / `*_operators.sql` / `*_aggregates.sql` files are gitignored and never committed. The per-type plaintext fixture lists the SQLx matrix consumes are **not** a generated file — they are materialised from each `CATALOG` row at compile time as `eql_scalars::INT4_VALUES` / `INT2_VALUES` (the `int_values!` macro) and read directly by `ScalarType::FIXTURE_VALUES`; a Rust source of truth no longer round-trips through a committed generated `.rs`. Generated SQL carries a `-- AUTOMATICALLY GENERATED FILE` header (the project-wide marker `docs:validate` greps on); change the catalog and rebuild, never hand-edit. 
Hand-written SQL beyond the fixed surface goes in `src/encrypted_domain/<T>/<T>_extensions.sql` with no auto-generated header and explicit `-- REQUIRE:` edges — that file IS committed. `text` and `jsonb` are out of scope for this scalar materializer.
Adding a scalar encrypted-domain type is one row in the Rust catalog `eql-scalars::CATALOG` (`crates/eql-scalars/src/lib.rs`): a `ScalarSpec` giving the type `token` (e.g. `int8`), its `ScalarKind` (the `kind` field), the `DomainSpec`s mapping each generated domain suffix to its fixed index `Term`s (`_eq => [Hm]`, `_ord`/`_ord_ore => [Ore]`), and the `Fixture` value list. Term capabilities are fixed in the `Term` enum's `impl` methods (with unit tests): `Hm` provides equality, and `Ore` provides equality plus ordering. There is no TOML manifest and no Python — the catalog is the source of truth, validated by the compiler (an undefined term or unknown scalar is a compile error) plus catalog `#[test]`s. `mise run build` runs `cargo run -p eql-codegen`, which regenerates the scalar SQL surface into `src/v3/scalars/<T>/` from `CATALOG` at the start of every build; that surface includes supported comparison wrappers plus blockers for native `jsonb` operators that would otherwise be reachable through domain fallback. `cargo run -p eql-codegen` regenerates every type at once (the same call `mise run build` uses; there is no per-type codegen task). The generated `*_types.sql` / `*_functions.sql` / `*_operators.sql` / `*_aggregates.sql` files are gitignored and never committed. The per-type plaintext fixture lists the SQLx matrix consumes are **not** a generated file — they are materialised from each `CATALOG` row at compile time as `eql_scalars::INT4_VALUES` / `INT2_VALUES` (the `int_values!` macro) and read directly by `ScalarType::FIXTURE_VALUES`; a Rust source of truth no longer round-trips through a committed generated `.rs`. Generated SQL carries a `-- AUTOMATICALLY GENERATED FILE` header (the project-wide marker `docs:validate` greps on); change the catalog and rebuild, never hand-edit. Hand-written SQL beyond the fixed surface goes in `src/v3/scalars/<T>/<T>_extensions.sql` with no auto-generated header and explicit `-- REQUIRE:` edges — that file IS committed. 
`text` and `jsonb` are out of scope for this scalar materializer.

**Adding a new encrypted-domain type: follow `docs/reference/adding-a-scalar-encrypted-domain-type.md`.** The mechanics are fixed for ordered scalar domains; the catalog row only declares the token, kind, domain suffixes, and terms. New term behavior belongs in the `Term` enum's `impl` methods in `crates/eql-scalars/src` with tests, not in free-form catalog data.
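
The catalog-row model described above can be sketched in Rust as follows. This is an illustration, not the contents of `crates/eql-scalars/src/lib.rs`: the type names `Term`, `DomainSpec`, and `ScalarSpec` and the `token` field come from the prose, while the remaining field names, the `supports_eq` / `supports_ord` methods, the `domain_names` helper, and the empty-suffix storage-only entry are assumptions. `ScalarKind` and the `Fixture` list are left out for brevity.

```rust
/// Index terms a generated domain can carry (variant names follow the prose).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Term {
    Hm,
    Ore,
}

impl Term {
    /// Both terms provide equality. (Method name is an assumption.)
    fn supports_eq(self) -> bool {
        matches!(self, Term::Hm | Term::Ore)
    }

    /// Only `Ore` provides ordering. (Method name is an assumption.)
    fn supports_ord(self) -> bool {
        matches!(self, Term::Ore)
    }
}

/// One generated domain: a suffix plus its fixed index terms.
struct DomainSpec {
    suffix: &'static str,
    terms: &'static [Term],
}

/// One catalog row, trimmed to the fields this sketch needs.
struct ScalarSpec {
    token: &'static str,
    domains: &'static [DomainSpec],
}

/// Hypothetical `int8` row using the suffix-to-term mapping from the prose.
const INT8: ScalarSpec = ScalarSpec {
    token: "int8",
    domains: &[
        DomainSpec { suffix: "", terms: &[] }, // storage-only (assumed shape)
        DomainSpec { suffix: "_eq", terms: &[Term::Hm] },
        DomainSpec { suffix: "_ord", terms: &[Term::Ore] },
        DomainSpec { suffix: "_ord_ore", terms: &[Term::Ore] },
    ],
};

/// Schema-qualified domain names this row would produce.
fn domain_names(spec: &ScalarSpec) -> Vec<String> {
    spec.domains
        .iter()
        .map(|d| format!("eql_v3.{}{}", spec.token, d.suffix))
        .collect()
}
```

Because terms are enum variants rather than free-form strings, a row that names an undefined term fails to compile, which is the property the paragraph above relies on.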
