Merged
100 changes: 91 additions & 9 deletions .github/workflows/test-eql.yml
@@ -29,21 +29,26 @@ defaults:
run:
shell: bash -l {0}

permissions:
contents: read

jobs:
schema:
name: "JSON Schema validation"
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v6
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
with:
persist-credentials: false

- uses: jdx/mise-action@v4
- uses: jdx/mise-action@1648a7812b9aeae629881980618f079932869151 # v4
with:
version: 2026.4.0
install: true
cache: true

- uses: Swatinem/rust-cache@v2
- uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
with:
workspaces: tests/sqlx
shared-key: sqlx-tests
@@ -52,10 +57,76 @@ jobs:
run: |
mise run test:schema

codegen:
name: "Encrypted-domain codegen"
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
with:
persist-credentials: false

- uses: jdx/mise-action@1648a7812b9aeae629881980618f079932869151 # v4
with:
version: 2026.4.0
install: true
cache: true

- name: Run codegen generator + drift tests
run: |
mise run test:codegen

# Regenerate the committed Rust fixture-value consts for EVERY type from
# their manifests and fail if any differ from / are missing in the tree.
# The value lists are rendered deterministically (unlike the encrypted
# .sql fixtures, whose ciphertext is non-deterministic and gitignored), so
# a plain diff is the right guard — it catches a manifest edit that wasn't
# regenerated. `git add -N` registers any brand-new untracked const so a
# forgotten-to-commit file also trips the diff. No Postgres needed: this
# only runs the Python generator.
- name: Regenerate and verify fixture-value consts (all types)
run: |
mise run codegen:domain:all
git add -N tests/sqlx/src/fixtures
git diff --exit-code -- tests/sqlx/src/fixtures \
|| { echo "Fixture value const(s) stale or uncommitted — run 'mise run codegen:domain:all' and commit tests/sqlx/src/fixtures."; exit 1; }

matrix-coverage:
name: "Matrix coverage inventory"
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
with:
persist-credentials: false

- uses: jdx/mise-action@1648a7812b9aeae629881980618f079932869151 # v4
with:
version: 2026.4.0
install: true
cache: true

- uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
with:
workspaces: tests/sqlx
shared-key: sqlx-tests

# Regenerate the matrix test-name inventory with the SAME pinned feature
# set the local task uses (`--no-default-features`, scale excluded), then
# fail if it differs from the committed snapshot. A coverage change shows
# up as added/removed names in the PR diff — e.g. emptying `ord_domains`
# drops ~140 names, impossible to miss in review. No Postgres needed:
# `--list` only enumerates; the suite uses runtime queries.
- name: Regenerate and verify the matrix test-name inventory
run: |
mise run test:matrix:inventory
git diff --exit-code -- tests/sqlx/snapshots/int4_matrix_tests.txt \
|| { echo "Coverage inventory stale — run 'mise run test:matrix:inventory' and commit."; exit 1; }

test:
name: "Test & Validate EQL (Postgres ${{ matrix.postgres-version }})"
runs-on: ubuntu-latest-m
needs: schema
needs: [schema, codegen]

strategy:
fail-fast: false
@@ -64,19 +135,28 @@ jobs:

env:
POSTGRES_VERSION: ${{ matrix.postgres-version }}
# CS_* are required for `mise run test:sqlx` to regenerate the
# cipherstash-client-encrypted fixtures before the suite runs.
# This repository does not accept fork PRs, so the secrets-on-
# `pull_request` constraint that breaks the fork CI flow does not
# apply here — leave the env block unconditional.
CS_CLIENT_ACCESS_KEY: ${{ secrets.CS_CLIENT_ACCESS_KEY }}
CS_WORKSPACE_CRN: ${{ secrets.CS_WORKSPACE_CRN }}
CS_CLIENT_ID: ${{ secrets.CS_CLIENT_ID }}
CS_CLIENT_KEY: ${{ secrets.CS_CLIENT_KEY }}

steps:
- uses: actions/checkout@v6
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
with:
persist-credentials: false

- uses: jdx/mise-action@v4
- uses: jdx/mise-action@1648a7812b9aeae629881980618f079932869151 # v4
with:
version: 2026.4.0
install: true # [default: true] run `mise install`
cache: true # [default: true] cache mise using GitHub's cache

- uses: Swatinem/rust-cache@v2
- uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
with:
workspaces: tests/sqlx
shared-key: sqlx-tests
@@ -103,9 +183,11 @@ jobs:
POSTGRES_VERSION: "17"

steps:
- uses: actions/checkout@v6
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
with:
persist-credentials: false

- uses: jdx/mise-action@v4
- uses: jdx/mise-action@1648a7812b9aeae629881980618f079932869151 # v4
with:
version: 2026.4.0
install: true
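
The drift guards in the `codegen` and `matrix-coverage` jobs share one pattern: regenerate a deterministic artifact, then fail if the result differs from what is committed. A minimal Python sketch of that idea follows. It compares in memory instead of calling `git diff --exit-code`, and `render_consts`, the manifest shape, and the `.rs` file layout are hypothetical stand-ins, not the repository's generator.

```python
from __future__ import annotations

from pathlib import Path


def render_consts(manifest: dict[str, list[int]]) -> dict[str, str]:
    """Hypothetical deterministic renderer: one Rust const file per type.

    Sorting the type names keeps the output byte-identical across runs,
    which is what makes a plain diff a valid guard.
    """
    rendered = {}
    for type_name in sorted(manifest):
        values = ", ".join(str(v) for v in manifest[type_name])
        rendered[f"{type_name}.rs"] = f"pub const VALUES: &[i64] = &[{values}];\n"
    return rendered


def find_drift(manifest: dict[str, list[int]], fixtures_dir: Path) -> list[str]:
    """Return the files that are stale or missing under fixtures_dir."""
    drift = []
    for name, expected in render_consts(manifest).items():
        path = fixtures_dir / name
        # A missing file counts as drift, mirroring what `git add -N` achieves
        # for a brand-new const that was never committed.
        if not path.exists() or path.read_text() != expected:
            drift.append(name)
    return drift
```

A CI step built on this would exit non-zero whenever `find_drift` returns a non-empty list, which is the role `git diff --exit-code` plays in the workflow.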
10 changes: 9 additions & 1 deletion .gitignore
@@ -221,7 +221,15 @@ tests/sqlx/migrations/001_install_eql.sql

# Generated SQLx fixtures (regenerated via `mise run fixture:generate`,
# never commit — stale fixtures hide bugs)
tests/sqlx/fixtures/eql_v2_int4.sql
tests/sqlx/fixtures/eql_v2*

# Generated encrypted-domain SQL — regenerated by `tasks/build.sh` from
# tasks/codegen/types/<T>.toml on every build (or `mise run codegen:domain
# <T>` to refresh manually). Hand-written *_extensions.sql stays committed.
src/encrypted_domain/*/*_types.sql
src/encrypted_domain/*/*_functions.sql
src/encrypted_domain/*/*_operators.sql
src/encrypted_domain/*/*_aggregates.sql

# Large generated test data files
tests/ste_vec_vast.sql
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -20,6 +20,11 @@ Each entry that ships in a published release links to the PR that introduced it.

## [Unreleased]

### Added

- **`eql_v2_int4` encrypted-domain type family.** Four jsonb-backed domains for encrypted `int4` columns: `eql_v2_int4` (storage-only), `eql_v2_int4_eq` (`=` / `<>` via HMAC), and `eql_v2_int4_ord` / `eql_v2_int4_ord_ore` (also `<` `<=` `>` `>=` via ORE block terms). Supported comparisons resolve to inlinable wrappers; the native `jsonb` operator surface reachable through domain fallback is blocked (raises rather than silently mis-resolving). Each domain's `CHECK` requires the EQL envelope (`v`, `i`), the ciphertext (`c`), and the variant's index term(s), and pins the payload version (`VALUE->>'v' = '2'`, matching `eql_v2._encrypted_check_v`) — so a missing key or wrong-version payload is rejected on insert or cast rather than surfacing later at query time. Index via a functional index on the `eql_v2.eq_term` / `eql_v2.ord_term` extractors, not an operator class on the domain. Why: a type-safe, per-capability encrypted integer column instead of the untyped `eql_v2_encrypted`. This is the reference scalar implementation for the generated domain family. ([#239](https://github.com/cipherstash/encrypt-query-language/pull/239), supersedes [#225](https://github.com/cipherstash/encrypt-query-language/pull/225))
- **Per-domain `MIN` / `MAX` aggregates for the encrypted-domain family.** `eql_v2.min(eql_v2_<T>_ord)` / `eql_v2.max(eql_v2_<T>_ord)` (and the `_ord_ore` twin) are generated for every ord-capable scalar variant, giving type-safe extrema on domain-typed columns — comparison routes through the variant's `<` / `>` operator (ORE block term, no decryption). The aggregates are declared `PARALLEL = SAFE` with a combine function (the state function itself — min/max are associative), so PostgreSQL can use partial/parallel aggregation on large `GROUP BY` workloads. Why: the new domain types previously had no equivalent of the composite-type aggregates. The existing `eql_v2.min(eql_v2_encrypted)` / `eql_v2.max(eql_v2_encrypted)` aggregates are **retained** and continue to work on `eql_v2_encrypted` columns; the per-domain aggregates are additive and coexist with them. ([#239](https://github.com/cipherstash/encrypt-query-language/pull/239))
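
The `CHECK` rule described in the first entry can be approximated outside the database. The sketch below is an illustration under assumptions, not the generated SQL: the function names are hypothetical, the index-term key names are passed in by the caller because their exact JSON spelling is not stated here, and SQL `NULL` semantics are not modelled.

```python
import json

# Envelope and ciphertext keys named in the changelog entry.
REQUIRED_KEYS = ("v", "i", "c")


def jsonb_text(value) -> str:
    """Approximate `->>`: strings come back as-is, other scalars as JSON text."""
    return value if isinstance(value, str) else json.dumps(value)


def passes_domain_check(payload_json: str, index_terms: tuple[str, ...]) -> bool:
    """Return True if the payload has every required key and version '2'."""
    payload = json.loads(payload_json)
    if not isinstance(payload, dict):
        return False
    if any(key not in payload for key in REQUIRED_KEYS + tuple(index_terms)):
        return False
    # Mirrors the pinned version test, VALUE->>'v' = '2'.
    return jsonb_text(payload["v"]) == "2"
```

For an equality variant the caller would pass that variant's term key, for example `passes_domain_check(payload, ("hm",))` if the key is spelled `hm` (an assumption). A payload missing `c` or carrying `"v": 1` is rejected up front, which is the behaviour the entry attributes to insert and cast.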

## [2.3.1] — 2026-05-21

### Fixed
23 changes: 22 additions & 1 deletion CLAUDE.md
@@ -61,6 +61,7 @@ This is the **Encrypt Query Language (EQL)** - a PostgreSQL extension for search
- `src/operators/` - SQL operators for encrypted data comparisons
- `src/config/` - Configuration management functions
- `src/blake3/`, `src/hmac_256/`, `src/bloom_filter/`, `src/ore_*` - Index implementations
- `src/encrypted_domain/` - Encrypted-domain type families (jsonb-backed PostgreSQL domains, one per operator/index capability)
- `tasks/` - mise task scripts
- `tests/sqlx/` - Rust/SQLx test framework (PostgreSQL 14-17 support)
- `release/` - Generated SQL installation files
@@ -72,6 +73,25 @@ This is the **Encrypt Query Language (EQL)** - a PostgreSQL extension for search
- **Operators**: Support comparisons between encrypted and plain JSONB data
- **CipherStash Proxy**: Required for encryption/decryption operations

### Encrypted-Domain Types

`src/encrypted_domain/` holds **encrypted-domain type families** — jsonb-backed PostgreSQL domains, one domain per operator/index capability (`eql_v2_<T>` storage-only, `eql_v2_<T>_eq`, `eql_v2_<T>_ord`). `eql_v2_int4` (PR #225) is the reference scalar implementation; future scalar types such as `int8`, `bool`, `date`, `float`, `numeric`, and `timestamp` follow this materializer pattern. `jsonb` needs a separate design and is out of scope for the scalar materializer.

A scalar encrypted-domain type is generated from a minimal manifest at `tasks/codegen/types/<T>.toml`: the filename supplies `<T>`, and the `[domain]` table maps each generated domain name to the fixed index terms it carries. Example: `int4_eq = ["hm"]`, `int4_ord = ["ore"]`. Term capabilities are fixed in `tasks/codegen/terms.py`: `hm` provides equality, and `ore` provides equality plus ordering. `mise run build` regenerates the scalar SQL surface into `src/encrypted_domain/<T>/` from every manifest at the start of every build; that surface includes supported comparison wrappers plus blockers for native `jsonb` operators that would otherwise be reachable through domain fallback. Use `mise run codegen:domain <T>` to refresh a single type manually while iterating on its manifest, or `mise run codegen:domain:all` to regenerate every type at once (the same enumeration `mise run build` uses). The generated `*_types.sql` / `*_functions.sql` / `*_operators.sql` / `*_aggregates.sql` files are gitignored and never committed; the TOML manifest plus `tasks/codegen/terms.py` are the source of truth. Generated files carry an `AUTO-GENERATED — DO NOT EDIT` header; change the manifest or term catalog and rebuild, never hand-edit. Hand-written SQL beyond the fixed surface goes in `src/encrypted_domain/<T>/<T>_extensions.sql` with no auto-generated header and explicit `-- REQUIRE:` edges; that file IS committed. `text` and `jsonb` are out of scope for this scalar materializer.
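
To make the manifest-to-surface mapping concrete, here is a minimal Python sketch. It is not the code in `tasks/codegen/terms.py`: the `TERM_CAPABILITIES` shape, the function names, and the capability labels `eq` / `ord` are assumptions for illustration. Only the `hm` and `ore` rules and the operator lists come from the text.

```python
from __future__ import annotations

# Assumed shape of the fixed term catalog: hm gives equality,
# ore gives equality plus ordering.
TERM_CAPABILITIES = {"hm": {"eq"}, "ore": {"eq", "ord"}}

# Operators each capability unlocks on a domain.
CAPABILITY_OPERATORS = {"eq": ["=", "<>"], "ord": ["<", "<=", ">", ">="]}


def load_domain_table(manifest_text: str) -> dict[str, list[str]]:
    """Parse the [domain] table of a manifest (tomllib needs Python 3.11+)."""
    import tomllib

    return tomllib.loads(manifest_text)["domain"]


def supported_operators(domain_table: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each domain name to the comparison operators its terms allow."""
    result = {}
    for domain_name, terms in domain_table.items():
        capabilities = set()
        for term in terms:
            # An unknown term raises KeyError: new behaviour belongs in the
            # catalog, not in free-form manifest fields.
            capabilities |= TERM_CAPABILITIES[term]
        result[domain_name] = [
            op
            for capability in ("eq", "ord")
            if capability in capabilities
            for op in CAPABILITY_OPERATORS[capability]
        ]
    return result
```

With the example manifest entries, `int4_eq` maps to `=` and `<>`, while `int4_ord` maps to those two plus the four ordering operators. Any native `jsonb` operator outside a domain's list would be a candidate for a blocker.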

**Adding a new encrypted-domain type: follow `docs/reference/encrypted-domain-implementation-spec.md`.** The mechanics are fixed for ordered scalar domains; the manifest only declares domain names and terms. New term behavior belongs in `tasks/codegen/terms.py` with tests, not in free-form TOML fields.

Regeneration is deterministic: identical manifest + term catalog produce byte-identical SQL. If `mise run build` produces unexpected output, the change is in the manifest, `tasks/codegen/terms.py`, or `tasks/codegen/templates.py` — not in random run-to-run variation.

Footguns the spec exists to prevent:

- **Blockers must never be `STRICT`.** A `STRICT` blocker lets PostgreSQL skip the body and return `NULL` on a `NULL` argument, silently bypassing the "operator not supported" exception.
- **No domain-over-domain** (`CREATE DOMAIN a AS b`). Operators resolve against the ultimate base type (`jsonb`), so a derived domain does not inherit the base domain's operator surface — blockers stop engaging.
- **No operator class on a domain.** Index through a functional index on the extractor (`eq_term` / `ord_term`), whose return type already carries a default opclass.
- **Inlinable functions** (extractors, comparison wrappers) need `LANGUAGE sql`, a single-statement `SELECT`, `IMMUTABLE`, and **no `SET` clause** — a pinned `search_path` disables inlining. No per-type allowlist edit: the `pin_search_path.sql` structural rule recognises encrypted-domain functions intrinsically and `tasks/test/splinter.sh` covers the converged extractor/wrapper names.
- **Blockers must be `LANGUAGE plpgsql`, not `LANGUAGE sql`.** The inverse of the rule above. A blocker exists to always raise, but a `LANGUAGE sql` body is inlinable and the planner can elide the call when the result is provably unused (dead `CASE` branch, folded predicate). `LANGUAGE plpgsql` is opaque to the planner, so the call — and its `RAISE` — survives. The generator in `tasks/codegen/templates.py` enforces this; don't "simplify" the rendered blockers to `LANGUAGE sql` even though the body is a single expression.
- **Build with `mise run clean && mise run build`** — a bare build can leave stale `release/*.sql`.
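
The two blocker rules above (never `STRICT`, always `LANGUAGE plpgsql`) can be illustrated with a small renderer. This is a hedged sketch, not the renderer in `tasks/codegen/templates.py`: `render_blocker`, `sql_literal`, the argument names, the `boolean` return type, and the error message wording are all hypothetical.

```python
def sql_literal(text: str) -> str:
    """Quote text as a SQL string literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def render_blocker(fn_name: str, domain: str, operator: str, rhs_type: str = "jsonb") -> str:
    """Render a blocker function that always raises (all names illustrative)."""
    return (
        f"CREATE FUNCTION {fn_name}(a {domain}, b {rhs_type})\n"
        "RETURNS boolean\n"
        # plpgsql is never inlined, so the planner cannot elide the RAISE.
        # No STRICT: a NULL argument must still reach the body.
        "LANGUAGE plpgsql\n"
        "AS $$\n"
        "BEGIN\n"
        "  RAISE EXCEPTION 'operator % is not supported for %', "
        f"{sql_literal(operator)}, {sql_literal(domain)};\n"
        "END;\n"
        "$$;\n"
    )
```

Calling `render_blocker("eql_v2.blocked_contains", "eql_v2_int4_eq", "@>")` yields a function whose body is a single `RAISE`. Passing the operator and domain as `RAISE` parameters avoids having to escape `%` inside the format string.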

### Testing Infrastructure
- Tests are written in Rust using SQLx, located in `tests/sqlx/`
- Tests run against PostgreSQL 14, 15, 16, 17 using Docker containers
@@ -199,6 +219,7 @@ Prefer `LANGUAGE SQL` over `LANGUAGE plpgsql` unless you need procedural feature
- Exception handling (`BEGIN...EXCEPTION...END`)
- Complex control flow (loops, early returns)
- Dynamic SQL (`EXECUTE`)
- Functions that must remain opaque to the planner — typically blockers whose only job is to `RAISE`. `LANGUAGE sql` would be inlined and may be elided when the result is provably unused; `LANGUAGE plpgsql` is never inlined, so the body always runs. See the encrypted-domain footgun list above and the blocker renderers in `tasks/codegen/templates.py`.

## Release & changelog discipline

@@ -222,7 +243,7 @@ What does *not* need an entry:

Pick the right section (`Added` / `Changed` / `Deprecated` / `Removed` / `Fixed` / `Security`). Lead with the user-visible fact, then a short "Why." explanation, then a PR link in parentheses. Match the tone and density of existing entries — a single dense paragraph per entry, not a bullet list.

Example shape (real entry from `2.3.0`):
Example entry (real entry from `2.3.0`):

> **`=`, `<>`, `~~` (`LIKE`), `~~*` (`ILIKE`) on `eql_v2_encrypted` are now inlinable SQL functions.** The planner can structurally match these operators against the documented functional indexes (`eql_v2.hmac_256(col)` for equality, `eql_v2.bloom_filter(col)` for `LIKE`/`ILIKE`), so bare-form queries (`WHERE col = $1`) engage the index without per-query rewriting. Previously these operators wrapped multi-branch PL/pgSQL bodies that the planner could not inline, forcing seq scans on Supabase / managed Postgres installations that lack operator-class indexes. ([#193](...), [#196](...))

7 changes: 0 additions & 7 deletions docs/development/documentation-inventory.md
@@ -77,13 +77,6 @@ Generated: Mon 27 Oct 2025 11:39:50 AEDT
## src/crypto.sql


## src/encrypted/aggregates.sql

- CREATE FUNCTION eql_v2.min(a eql_v2_encrypted, b eql_v2_encrypted)
- CREATE AGGREGATE eql_v2.min(eql_v2_encrypted)
- CREATE FUNCTION eql_v2.max(a eql_v2_encrypted, b eql_v2_encrypted)
- CREATE AGGREGATE eql_v2.max(eql_v2_encrypted)

## src/encrypted/casts.sql

- CREATE FUNCTION eql_v2.to_encrypted(data jsonb)