feat: fixture + bench data-generation foundation (int4 fixture, Proxy scaffolding) #224
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,195 @@ | ||
| # High-Level Encrypted Domain Types Prototype Design | ||
|
|
||
| ## Context | ||
|
|
||
| EQL currently exposes one public encrypted column type, `public.eql_v2_encrypted`, | ||
| implemented as a composite type with a single `jsonb` payload field. Query behavior | ||
| is selected dynamically from the encrypted payload terms that are present (`hm`, | ||
| `bf`, `ob`, `opf`, `opv`, `sv`, etc.). | ||
|
|
||
| The new goal is to add high-level SQL column types such as `encrypted_text`, | ||
| `encrypted_jsonb`, and `encrypted_int4`. These types should make application DDL | ||
| clearer and give each plaintext shape a static, predictable SQL operator surface. | ||
| They should not rely on the broad dynamic dispatch behavior of | ||
| `eql_v2_encrypted`. | ||
|
|
||
| The prototype is intentionally limited to: | ||
|
|
||
| - `public.encrypted_text` | ||
| - `public.encrypted_jsonb` | ||
| - `public.encrypted_int4` | ||
|
|
||
| Configuration inference, automatic registration, broad type coverage, and | ||
| production migration behavior are out of scope for the prototype. The prototype | ||
| exists to prove whether `jsonb` domain types can provide a clean client-facing | ||
| DDL surface while still producing indexable query plans without operator | ||
| classes. | ||
|
|
||
| ## History And Spike Findings | ||
|
|
||
| A previous branch tried changing `eql_v2_encrypted` itself from a composite type | ||
| to a `jsonb` domain. That PR was closed unmerged with failing CI, and there is no | ||
| clear written rationale for the failure. Separately, EQL has kept | ||
| `public.eql_v2_encrypted` and `public.eql_v2_configuration` outside the | ||
| `eql_v2` schema so EQL upgrades can drop and recreate `eql_v2` without | ||
| cascading into customer columns. | ||
|
|
||
| A transient SQL spike compared three shapes: | ||
|
|
||
| - domain over raw `jsonb` | ||
| - domain over `public.eql_v2_encrypted` | ||
| - independent composite type with `(data jsonb)` | ||
|
|
||
| The spike showed that domains over `public.eql_v2_encrypted` are ergonomic and | ||
| can use existing helpers, but inherit base EQL operators when exact domain | ||
| operators are absent. Independent composites avoid inherited behavior, but need | ||
| more casts and exact helper/operator wrappers. | ||
|
|
||
| The approved design is simpler: define the high-level types as domains over | ||
| raw `jsonb`, then define exact operators for supported and unsupported | ||
| operations. This removes the extra `eql_v2_encrypted` layer from the new public | ||
| types. | ||
|
|
||
| ## Type Model | ||
|
|
||
| Create public domain types over `jsonb`: | ||
|
|
||
| ```sql | ||
| CREATE DOMAIN public.encrypted_text AS jsonb; | ||
| CREATE DOMAIN public.encrypted_jsonb AS jsonb; | ||
| CREATE DOMAIN public.encrypted_int4 AS jsonb; | ||
| ``` | ||
|
|
||
| The payload remains the existing EQL encrypted JSONB payload. The specific | ||
| types do not depend on `public.eql_v2_encrypted` for storage or operator | ||
| dispatch. | ||
|
|
||
| Because PostgreSQL domains can fall back to base-type behavior, every public | ||
| operation in the supported SQL surface must have an exact domain operator: | ||
|
|
||
| - supported operations delegate to fixed index-term helpers; | ||
| - unsupported operations raise a type-specific error. | ||
|
|
||
| This prevents accidental fallback to native `jsonb` semantics for common SQL | ||
| operators. | ||
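
As a rough illustration of the two cases above, one delegating operator and one blocker for `encrypted_text` might be sketched as follows. The function names `encrypted_text_eq` and `encrypted_text_lt_blocker` are hypothetical. The sketch assumes EQL is installed, the domains above already exist, PostgreSQL 11 or later is in use (for the `FUNCTION` keyword in `CREATE OPERATOR`), and the value returned by `eql_v2.hmac_256(jsonb)` can be compared with `=`.

```sql
-- Supported operation: delegate to the fixed hm helper.
-- Hypothetical function name; LANGUAGE sql with no SET clause so the
-- planner has a chance to inline the body.
CREATE FUNCTION public.encrypted_text_eq(a public.encrypted_text,
                                         b public.encrypted_text)
RETURNS boolean
LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE
AS $$ SELECT eql_v2.hmac_256(a::jsonb) = eql_v2.hmac_256(b::jsonb) $$;

CREATE OPERATOR = (
  FUNCTION = public.encrypted_text_eq,
  LEFTARG  = public.encrypted_text,
  RIGHTARG = public.encrypted_text
);

-- Unsupported operation: an exact operator whose only job is to raise,
-- so the call never resolves to the native jsonb `<`.
CREATE FUNCTION public.encrypted_text_lt_blocker(a public.encrypted_text,
                                                 b public.encrypted_text)
RETURNS boolean
LANGUAGE plpgsql
AS $$
BEGIN
  RAISE EXCEPTION 'operator < is not supported for encrypted_text';
END;
$$;

CREATE OPERATOR < (
  FUNCTION = public.encrypted_text_lt_blocker,
  LEFTARG  = public.encrypted_text,
  RIGHTARG = public.encrypted_text
);
```

The blocker is deliberately not declared `STRICT` in this sketch, since a strict function would return NULL for NULL inputs without ever raising.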
|
|
||
| ## Prototype Acceptance Criteria | ||
|
|
||
| The prototype must prove these properties: | ||
|
|
||
| - exact domain operators resolve for supported operations; | ||
| - exact blocker operators prevent common unsupported operations from falling | ||
| through to native `jsonb` behavior; | ||
| - supported hot-path operator functions are inlineable SQL functions with no | ||
| `SET search_path` clause; | ||
| - bare operator predicates use functional indexes and do not require custom | ||
| btree or hash operator classes; | ||
| - where existing helper signatures are awkward, temporary typed helper wrappers | ||
| are small, `LANGUAGE sql`, immutable, strict, parallel-safe, and inlineable | ||
| when used in indexed predicates. | ||
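
One way to spot-check the function attributes listed above is a catalog query along these lines. It is only a sketch: the function names in the `IN` list are hypothetical placeholders, and these attributes are preconditions for inlining rather than proof that the planner actually inlined a call (`EXPLAIN VERBOSE` on a real query is the better evidence for that).

```sql
-- Report the attributes relevant to inlining for selected functions.
-- The names in the IN list are hypothetical; substitute the real ones.
SELECT p.oid::regprocedure  AS function_signature,
       l.lanname            AS language,          -- expect 'sql'
       p.provolatile = 'i'  AS is_immutable,
       p.proisstrict        AS is_strict,
       p.proparallel = 's'  AS is_parallel_safe,
       p.proconfig IS NULL  AS has_no_set_clause, -- no SET search_path
       NOT p.prosecdef      AS is_not_security_definer
FROM pg_proc AS p
JOIN pg_language  AS l ON l.oid = p.prolang
JOIN pg_namespace AS n ON n.oid = p.pronamespace
WHERE n.nspname = 'public'
  AND p.proname IN ('encrypted_text_eq', 'encrypted_text_like');
```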
|
|
||
| ## Operator Surface | ||
|
|
||
| ### `encrypted_text` | ||
|
|
||
| Supported: | ||
|
|
||
| - `=` and `<>`, using the `hm` term through `eql_v2.hmac_256(value::jsonb)` | ||
| - `~~` and `~~*`, using the `bf` term through `eql_v2.bloom_filter(value::jsonb)` | ||
|
|
||
| Unsupported blockers: | ||
|
|
||
| - `<`, `<=`, `>`, `>=` | ||
| - `@>`, `<@` | ||
| - `->`, `->>` | ||
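
A minimal sketch of the `~~` case, under stated assumptions: the function name `encrypted_text_like` is hypothetical, and the sketch assumes `eql_v2.bloom_filter(jsonb)` returns an array-like value for which `@>` (containment) is the intended match test. Neither detail is confirmed by this design document.

```sql
-- Hypothetical wrapper: a row matches when its bloom filter contains
-- every bit set in the pattern's bloom filter (assumed semantics).
CREATE FUNCTION public.encrypted_text_like(a public.encrypted_text,
                                           b public.encrypted_text)
RETURNS boolean
LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE
AS $$ SELECT eql_v2.bloom_filter(a::jsonb) @> eql_v2.bloom_filter(b::jsonb) $$;

CREATE OPERATOR ~~ (
  FUNCTION = public.encrypted_text_like,
  LEFTARG  = public.encrypted_text,
  RIGHTARG = public.encrypted_text
);
```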
|
|
||
| ### `encrypted_int4` | ||
|
|
||
| Supported: | ||
|
|
||
| - `=` and `<>`, using the `hm` term through `eql_v2.hmac_256(value::jsonb)` | ||
| - `<`, `<=`, `>`, `>=`, using OPE terms by default through an inlineable | ||
| expression over `value::jsonb` | ||
|
|
||
| Unsupported blockers: | ||
|
|
||
| - `~~`, `~~*` | ||
| - `@>`, `<@` | ||
| - `->`, `->>` | ||
|
|
||
| ### `encrypted_jsonb` | ||
|
|
||
| Supported: | ||
|
|
||
| - `=` and `<>`, using the `hm` term through `eql_v2.hmac_256(value::jsonb)` | ||
| - `@>` and `<@`, using `sv` through inlineable typed STE vector helpers or | ||
| wrappers | ||
| - `->` and `->>`, using stubbed or adapted encrypted JSON path helpers for the | ||
| domain type | ||
|
|
||
| Unsupported blockers: | ||
|
|
||
| - `<`, `<=`, `>`, `>=` | ||
| - `~~`, `~~*` | ||
|
|
||
| ## Out Of Scope | ||
|
|
||
| Do not add configuration inference in this prototype. The prototype should not | ||
| change `eql_v2.add_column`, `eql_v2.add_search_config`, or the configuration | ||
| validation functions. | ||
|
|
||
| Do not add automatic registration or event triggers in this prototype. | ||
|
|
||
| Do not add full support for additional encrypted scalar types in this prototype. | ||
| The three selected types are enough to test text, scalar range, and JSONB | ||
| operator behavior. | ||
|
|
||
| ## Error Handling | ||
|
|
||
| Unsupported exact operators should raise clear errors: | ||
|
|
||
| ```text | ||
| operator < is not supported for encrypted_text | ||
| operator ~~ is not supported for encrypted_int4 | ||
| operator -> is not supported for encrypted_int4 | ||
| ``` | ||
|
|
||
| Missing required encrypted index terms should fail through the fixed helper path | ||
| with the existing helper errors, such as missing `hm`, `bf`, `opf`, or `sv`. | ||
|
|
||
| Supported hot-path functions should not raise custom errors for missing terms if | ||
| an existing helper already provides a precise missing-term error. | ||
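
For a quick manual check of the blocker messages outside the SQLx suite, something like the following `DO` block could be used. It is a sketch that assumes an exact `<` blocker operator for `encrypted_text` has already been created; the empty `{}` payload is only a stand-in, which is acceptable here because the blocker is expected to raise before inspecting its arguments.

```sql
DO $$
DECLARE
  msg text;  -- stays NULL if no error is raised
BEGIN
  BEGIN
    -- Both operands are cast to the domain so the exact operator resolves.
    PERFORM '{}'::jsonb::public.encrypted_text
          < '{}'::jsonb::public.encrypted_text;
  EXCEPTION WHEN OTHERS THEN
    GET STACKED DIAGNOSTICS msg = MESSAGE_TEXT;
  END;

  IF msg IS DISTINCT FROM 'operator < is not supported for encrypted_text' THEN
    RAISE EXCEPTION 'unexpected result: %', coalesce(msg, 'no error raised');
  END IF;
END;
$$;
```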
|
|
||
| ## Testing | ||
|
|
||
| Add focused SQLx coverage for the first three domain types: | ||
|
|
||
| - Domain creation and assignment from valid encrypted JSONB payloads. | ||
| - Supported operators for each type. | ||
| - Unsupported operators raise the exact type-specific error instead of falling | ||
| through to native `jsonb` behavior. | ||
| - Functional indexes engage for supported terms: | ||
| - `encrypted_text`: `eql_v2.hmac_256(col::jsonb)`, | ||
| `eql_v2.bloom_filter(col::jsonb)` | ||
| - `encrypted_int4`: `eql_v2.hmac_256(col::jsonb)`, and an OPE order | ||
| expression over `col::jsonb` | ||
| - `encrypted_jsonb`: `eql_v2.hmac_256(col::jsonb)`, and a typed STE vector | ||
| array helper or overload that accepts `encrypted_jsonb` | ||
| - `EXPLAIN` plans show index scans for bare operator predicates such as | ||
| `col = rhs`, `col ~~ rhs`, `col < rhs`, and `col @> rhs`. | ||
| - The same predicates do not require btree/hash operator classes. | ||
| - Prepared statements with domain-typed parameters still resolve to exact | ||
| domain operators. | ||
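
The index and plan checks above could be exercised by hand roughly as follows. This is a sketch under assumptions: the table `demo_users` and index name are hypothetical, EQL and an exact `=` operator for `encrypted_text` are assumed to be installed, and the default operator class of whatever `eql_v2.hmac_256(jsonb)` returns is assumed to be usable by a btree index.

```sql
-- Hypothetical table with one domain-typed column.
CREATE TABLE demo_users (
  id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  email public.encrypted_text
);

-- Functional index on the hm term; no operator class is declared for
-- the domain type itself.
CREATE INDEX demo_users_email_hm_idx
  ON demo_users (eql_v2.hmac_256(email::jsonb));

-- Bare operator predicate. The scalar subquery keeps the right-hand
-- side typed as encrypted_text without inventing a payload literal.
EXPLAIN (COSTS OFF)
SELECT id
FROM demo_users
WHERE email = (SELECT email FROM demo_users LIMIT 1);
```

On a small or empty table the planner may still choose a sequential scan; running `SET enable_seqscan = off;` in the session first is one way to confirm that the index is at least usable for the predicate.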
|
|
||
| ## Implementation Boundary | ||
|
|
||
| Write the first three type surfaces manually. Do not introduce a generator in | ||
| the prototype. Manual SQL keeps the spike easy to audit and | ||
| lets tests prove the domain-over-`jsonb` approach before expanding to | ||
| `encrypted_int2`, `encrypted_int8`, numeric, floating-point, boolean, date, and | ||
| timestamp types. | ||
|
|
||
| Supported operator functions and helper wrappers that appear in indexed | ||
| predicates must be SQL-language functions intended for planner inlining. | ||
| Unsupported blocker functions can use PL/pgSQL because they are not performance | ||
| paths. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,64 @@ | ||
| ["bench:up"] | ||
| description = "Start Postgres + Proxy for benchmark data generation" | ||
| dir = "{{config_root}}" | ||
| run = """ | ||
| if [ ! -f tests/benchmarks/.env ]; then | ||
| echo "ERROR: tests/benchmarks/.env missing. Copy .env.example and fill in credentials." >&2 | ||
| exit 1 | ||
| fi | ||
| docker compose --env-file tests/benchmarks/.env -f tests/benchmarks/docker-compose.yml up -d | ||
| export PGPASSWORD="password" | ||
| echo "Waiting for bench-postgres on localhost:7433..." | ||
| for i in $(seq 1 60); do | ||
| if psql -U cipherstash -d cipherstash -h localhost -p 7433 -c 'SELECT 1' >/dev/null 2>&1; then | ||
| echo "bench-postgres ready." | ||
| break | ||
| fi | ||
| sleep 1 | ||
| if [ "$i" -eq 60 ]; then | ||
| echo "bench-postgres did not become ready in 60s." | ||
| echo | ||
| echo '=== bench-postgres logs ===' | ||
| docker logs bench-postgres 2>&1 | tail -40 | ||
| exit 1 | ||
| fi | ||
| done | ||
|
|
||
| echo "Waiting for bench-proxy on localhost:6433..." | ||
| for i in $(seq 1 60); do | ||
| if psql -U cipherstash -d cipherstash -h localhost -p 6433 -c 'SELECT 1' >/dev/null 2>&1; then | ||
| echo "bench-proxy ready." | ||
| exit 0 | ||
| fi | ||
| sleep 1 | ||
| done | ||
| echo "bench-proxy did not become ready in 60s." | ||
| echo | ||
| echo '=== bench-proxy logs ===' | ||
| docker logs bench-proxy 2>&1 | tail -40 | ||
| exit 1 | ||
| """ | ||
|
|
||
| ["bench:down"] | ||
| description = "Stop benchmark Postgres + Proxy" | ||
| dir = "{{config_root}}" | ||
| run = """ | ||
| docker compose -f tests/benchmarks/docker-compose.yml down -v | ||
|
Review comment: Use the same env-file for teardown as startup. Line 46 omits [...]
Suggested fix:
-docker compose -f tests/benchmarks/docker-compose.yml down -v
+docker compose --env-file tests/benchmarks/.env -f tests/benchmarks/docker-compose.yml down -v
||
| """ | ||
|
|
||
| ["bench:generate"] | ||
| description = "Generate 100K encrypted bench dataset (requires bench:up first)" | ||
| # `build` produces release/cipherstash-encrypt.sql, which generate.sh | ||
| # installs into the bench Postgres container before applying schema.sql. | ||
| depends = ["build"] | ||
| dir = "{{config_root}}" | ||
| run = """ | ||
| tests/benchmarks/generate.sh 100k | ||
| """ | ||
|
|
||
| ["bench:full"] | ||
| description = "Run committed SQLx bench/regression suite" | ||
| dir = "{{config_root}}" | ||
| run = """ | ||
| mise run --output prefix test:bench | ||
| """ | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,46 @@ | ||
| ["proxy:up"] | ||
| description = "Start CipherStash Proxy connected to existing Postgres" | ||
| # Reuses the tests/docker-compose.yml Postgres on POSTGRES_PORT. | ||
| # CS_* credentials are read from the shell environment (mise/direnv/profile). | ||
| # Readiness is verified from the host (the proxy image lacks busybox nc, so | ||
| # the container-internal healthcheck cannot be used). Dumps container logs | ||
| # on failure so the user does not need a separate `docker logs` invocation. | ||
| dir = "{{config_root}}/tests" | ||
| run = """ | ||
| docker compose -f docker-compose.proxy.yml up -d | ||
| echo "Waiting for proxy on localhost:6432..." | ||
| export PGPASSWORD="${POSTGRES_PASSWORD:-password}" | ||
| for i in $(seq 1 60); do | ||
| if psql -U "${POSTGRES_USER:-cipherstash}" -d "${POSTGRES_DB:-cipherstash}" \ | ||
| -h localhost -p 6432 -c 'SELECT 1' >/dev/null 2>&1; then | ||
| echo "Proxy ready." | ||
| exit 0 | ||
| fi | ||
| sleep 1 | ||
| done | ||
| echo "Proxy did not become ready in 60s." | ||
| echo | ||
| echo '=== cipherstash-proxy logs ===' | ||
| docker logs cipherstash-proxy 2>&1 | tail -40 | ||
| exit 1 | ||
| """ | ||
|
|
||
| ["proxy:logs"] | ||
| description = "Tail CipherStash Proxy container logs" | ||
| dir = "{{config_root}}/tests" | ||
| run = "docker logs --tail 100 -f cipherstash-proxy" | ||
|
|
||
| ["proxy:down"] | ||
| description = "Stop CipherStash Proxy" | ||
| dir = "{{config_root}}/tests" | ||
| run = "docker compose -f docker-compose.proxy.yml down" | ||
|
|
||
| ["fixture:int:generate"] | ||
| description = "Generate encrypted_int4 fixture (009) via Proxy" | ||
| # Prerequisites: | ||
| # - mise run postgres:up (existing Postgres on POSTGRES_PORT) | ||
| # - mise run reset (ensures EQL is installed in that Postgres) | ||
| # - mise run proxy:up (Proxy on localhost:6432) | ||
| depends = ["build"] | ||
| dir = "{{config_root}}" | ||
| run = "tasks/fixtures/generate_encrypted_int4.sh" |
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,69 @@ | ||||||||||||||||||
| #!/usr/bin/env bash | ||||||||||||||||||
| # Common helpers for fixture generators. Sourced — not executed directly. | ||||||||||||||||||
| # Sets PG_URL / PROXY_URL and exposes restart_proxy_and_wait + dump_fixture_table. | ||||||||||||||||||
|
|
||||||||||||||||||
| # Resolve Postgres / Proxy connection from mise [env] (POSTGRES_*) with the | ||||||||||||||||||
| # usual defaults. PROXY_PORT comes from tests/docker-compose.proxy.yml. | ||||||||||||||||||
| PG_USER="${POSTGRES_USER:-cipherstash}" | ||||||||||||||||||
| PG_PASSWORD="${POSTGRES_PASSWORD:-password}" | ||||||||||||||||||
| PG_DB="${POSTGRES_DB:-cipherstash}" | ||||||||||||||||||
| PG_HOST="${POSTGRES_HOST:-localhost}" | ||||||||||||||||||
| PG_PORT="${POSTGRES_PORT:-7432}" | ||||||||||||||||||
| PROXY_PORT="${PROXY_PORT:-6432}" | ||||||||||||||||||
|
|
||||||||||||||||||
| PG_URL="postgresql://${PG_USER}:${PG_PASSWORD}@${PG_HOST}:${PG_PORT}/${PG_DB}" | ||||||||||||||||||
| PROXY_URL="postgresql://${PG_USER}:${PG_PASSWORD}@${PG_HOST}:${PROXY_PORT}/${PG_DB}" | ||||||||||||||||||
|
|
||||||||||||||||||
| export PGPASSWORD="$PG_PASSWORD" | ||||||||||||||||||
|
Review comment on lines +14 to +17: Use password-less DSNs since [...] Including [...]
Suggested change:
-PG_URL="postgresql://${PG_USER}:${PG_PASSWORD}@${PG_HOST}:${PG_PORT}/${PG_DB}"
-PROXY_URL="postgresql://${PG_USER}:${PG_PASSWORD}@${PG_HOST}:${PROXY_PORT}/${PG_DB}"
+PG_URL="postgresql://${PG_USER}@${PG_HOST}:${PG_PORT}/${PG_DB}"
+PROXY_URL="postgresql://${PG_USER}@${PG_HOST}:${PROXY_PORT}/${PG_DB}"
||||||||||||||||||
|
|
||||||||||||||||||
| # Proxy caches its encrypt config at connection-handler init time, so any | ||||||||||||||||||
| # add_search_config call applied AFTER Proxy started won't take effect | ||||||||||||||||||
| # until Proxy reconnects. Restart and wait for it to come back. | ||||||||||||||||||
| restart_proxy_and_wait() { | ||||||||||||||||||
| echo "==> Restarting Proxy so it reloads the new encrypt config" | ||||||||||||||||||
| docker restart cipherstash-proxy >/dev/null | ||||||||||||||||||
|
|
||||||||||||||||||
| for i in $(seq 1 60); do | ||||||||||||||||||
| if psql "$PROXY_URL" -c 'SELECT 1' >/dev/null 2>&1; then | ||||||||||||||||||
| echo " Proxy ready." | ||||||||||||||||||
| return 0 | ||||||||||||||||||
| fi | ||||||||||||||||||
| sleep 1 | ||||||||||||||||||
| done | ||||||||||||||||||
|
|
||||||||||||||||||
| echo "ERROR: Proxy did not come back up after restart" >&2 | ||||||||||||||||||
| docker logs cipherstash-proxy 2>&1 | tail -20 | ||||||||||||||||||
| return 1 | ||||||||||||||||||
| } | ||||||||||||||||||
|
|
||||||||||||||||||
| # Render fixture rows as INSERT statements using format(%L). Caller supplies: | ||||||||||||||||||
| # $1 = source table name (e.g. bench_text) | ||||||||||||||||||
| # $2 = destination table name in the migration (e.g. encrypted_text_plaintext) | ||||||||||||||||||
| # $3 = comma-separated source-column projection | ||||||||||||||||||
| # (e.g. "id, plaintext, (encrypted_text).data::text") | ||||||||||||||||||
| # $4 = comma-separated destination column types for format() placeholders | ||||||||||||||||||
| # (e.g. "%L, %L, %L::jsonb") | ||||||||||||||||||
| # $5 = destination column-name tuple | ||||||||||||||||||
| # (e.g. "(id, plaintext, payload)") | ||||||||||||||||||
| # $6 = output path | ||||||||||||||||||
| # | ||||||||||||||||||
| # The migration is written with a DROP / CREATE preamble plus the rendered | ||||||||||||||||||
| # INSERT statements. The CREATE statement must be supplied by the caller via | ||||||||||||||||||
| # stdin BEFORE calling this function; see how each generator pipes it in. | ||||||||||||||||||
| dump_fixture_table() { | ||||||||||||||||||
| local src_table="$1" | ||||||||||||||||||
| local dst_table="$2" | ||||||||||||||||||
| local src_projection="$3" | ||||||||||||||||||
| local fmt_placeholders="$4" | ||||||||||||||||||
| local dst_columns="$5" | ||||||||||||||||||
| local output_path="$6" | ||||||||||||||||||
|
|
||||||||||||||||||
| psql "$PG_URL" -v ON_ERROR_STOP=1 -t -A -c " | ||||||||||||||||||
| SELECT format( | ||||||||||||||||||
| 'INSERT INTO ${dst_table} ${dst_columns} VALUES (${fmt_placeholders});', | ||||||||||||||||||
| ${src_projection} | ||||||||||||||||||
| ) | ||||||||||||||||||
| FROM ${src_table} | ||||||||||||||||||
| ORDER BY id; | ||||||||||||||||||
| " >> "$output_path" | ||||||||||||||||||
| } | ||||||||||||||||||
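
To make the `format(%L)` rendering in `dump_fixture_table` concrete, here is a small self-contained sketch of the query it builds, using the example arguments from the comments above. The temporary table `bench_text_demo` and its single row are made up for illustration, and the payload is a plain `jsonb` column rather than the `(encrypted_text).data` projection a real generator would pass.

```sql
-- Stand-in source table with one row that needs quoting.
CREATE TEMP TABLE bench_text_demo (id int, plaintext text, payload jsonb);
INSERT INTO bench_text_demo VALUES (1, 'it''s', '{"k": "v"}');

-- Same shape as the query in dump_fixture_table after shell interpolation.
-- %L quotes each value as an SQL literal, so embedded quotes are escaped.
SELECT format(
  'INSERT INTO encrypted_text_plaintext (id, plaintext, payload) VALUES (%L, %L, %L::jsonb);',
  id, plaintext, payload::text
)
FROM bench_text_demo
ORDER BY id;

-- Renders:
-- INSERT INTO encrypted_text_plaintext (id, plaintext, payload) VALUES ('1', 'it''s', '{"k": "v"}'::jsonb);
```

Because `%L` quotes every value, numeric columns are emitted as quoted literals such as `'1'`; PostgreSQL coerces them to the destination column type when the generated `INSERT` runs.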
Review comment: IMHO the "standard" int ordering should use Block ORE, not OPE.
An OPE version could be supported via something like `encrypted_int_compat` (which would be less secure and should be avoided if possible).
Also, ORE on integers isn't lossy, so we can safely use it for equality on this type. No need for the `hm` at all (we need it on `eql_v2_encrypted` because it's needed for strings).