Skip to content

Refactor code generator into Rust#251

Merged
tobyhede merged 10 commits into
eql_v3from
v3-rust-codegen-generator-parity
Jun 3, 2026
Merged

Refactor code generator into Rust#251
tobyhede merged 10 commits into
eql_v3from
v3-rust-codegen-generator-parity

Conversation

@tobyhede
Copy link
Copy Markdown
Contributor

@tobyhede tobyhede commented Jun 1, 2026

Summary

Plan 2 of 3 of the Rust codegen migration. Adds crates/eql-codegen — a std-only Rust binary that renders the scalar encrypted-domain SQL surface and the committed <T>_values.rs consts from the eql-scalars catalog (Plan 1). Stacked on eql_v3.

What's here

  • Generator (crates/eql-codegen): consts, operator_surface, templates/generate, an ownership-guarded writer, and a main CLI.
  • Operator catalog: operator generation is modelled as a central PostgreSQL operator catalog — symbols, generated function names, PostgreSQL signatures, optional CREATE OPERATOR metadata. No generic Kind enum; Wrapper vs Unsupported is chosen per-domain from Term::operators_for_terms.
  • Per-kind SQL templates: functions.sql.j2 dispatches to functions/{extractor,wrapper,unsupported}.sql.j2 via dynamic {% include %} off the entry kind.
  • Parity gate (the safety story): tests/parity.rs — byte-exact <T>_values.rs + line-normalized int4 golden master — and mise run codegen:parity, enforced in CI.

Verification

  • cargo test -p eql-codegen — 60 lib + 3 parity green; clippy -D warnings and fmt clean.
  • mise run codegen:parity → PARITY OK (generated SQL + committed <T>_values.rs unchanged).

🤖 Generated with Claude Code

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c5d634df-4eae-48e2-a2d8-59e4489e1ae6

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title 'Refactor code generator into Rust' directly and clearly summarizes the main change: porting the code generator from Python to Rust, which is the core objective of this PR.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch v3-rust-codegen-generator-parity

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/test-eql.yml (1)

6-20: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Include crates/** in test-eql.yml trigger paths so Rust generator-only PRs run the parity gate.

.github/workflows/test-eql.yml is the only workflow containing mise run codegen:parity, and repo-wide workflow searches show no crates path filters—so changes under crates/eql-codegen won’t trigger this workflow.

Proposed change
     paths:
       - ".github/workflows/test-eql.yml"
       - "src/**/*.sql"
       - "sql/**/*.sql"
       - "tests/**/*"
       - "tasks/**/*"
+      - "crates/**"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test-eql.yml around lines 6 - 20, The workflow
test-eql.yml doesn't include the crates path so changes in crates/eql-codegen
won't trigger the parity gate; update the paths arrays in the test-eql.yml
manifest (both the root push paths and the pull_request.paths block) to include
"crates/**" so that files under the crates directory (e.g., crates/eql-codegen)
will cause mise run codegen:parity to run.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/eql-codegen/tests/parity.rs`:
- Around line 100-134: The test rust_and_python_generators_are_byte_identical
currently performs no assertion and can mutate the repo by invoking the Python
generator; change it so it doesn't run by default and can't dirty the working
tree: add the #[ignore] attribute to the test fn
rust_and_python_generators_are_byte_identical (or remove the test entirely) so
cargo test won't run it by default, or alternatively make it a real test by
removing the early break/let _ and implementing a non-mutating assertion (e.g.,
run Python against a tempdir and assert parity) — reference the test function
name rust_and_python_generators_are_byte_identical and the Python invocation
block that shells out to "python3" for locating the code to change.

---

Outside diff comments:
In @.github/workflows/test-eql.yml:
- Around line 6-20: The workflow test-eql.yml doesn't include the crates path so
changes in crates/eql-codegen won't trigger the parity gate; update the paths
arrays in the test-eql.yml manifest (both the root push paths and the
pull_request.paths block) to include "crates/**" so that files under the crates
directory (e.g., crates/eql-codegen) will cause mise run codegen:parity to run.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dd504cd2-5ed5-42fd-890c-41908a5d8cfa

📥 Commits

Reviewing files that changed from the base of the PR and between a12c729 and 70a5ebd.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • .github/workflows/test-eql.yml
  • crates/eql-codegen/Cargo.toml
  • crates/eql-codegen/src/consts.rs
  • crates/eql-codegen/src/generate.rs
  • crates/eql-codegen/src/lib.rs
  • crates/eql-codegen/src/main.rs
  • crates/eql-codegen/src/operator_surface.rs
  • crates/eql-codegen/src/templates.rs
  • crates/eql-codegen/src/writer.rs
  • crates/eql-codegen/tests/parity.rs
  • mise.toml
  • tasks/codegen-parity.sh

Comment thread crates/eql-codegen/tests/parity.rs
@tobyhede tobyhede marked this pull request as draft June 2, 2026 04:10
@tobyhede tobyhede force-pushed the v3-rust-codegen-generator-parity branch from 74c1824 to 77312fb Compare June 2, 2026 05:26
@tobyhede tobyhede force-pushed the v3-rust-codegen-workspace-catalog branch 2 times, most recently from f482823 to 7ba43a6 Compare June 2, 2026 05:37
Base automatically changed from v3-rust-codegen-workspace-catalog to eql_v3 June 2, 2026 05:56
Port the scalar encrypted-domain SQL generator from the Python oracle to
the `eql-codegen` Rust crate, achieving generator parity and wiring it
into the build/test/CI flow.

Generator:
- Scaffold the eql-codegen crate: AUTO-GENERATED headers, schema consts,
  sql_str, operator-surface table, backing_function, role/brief/shape
  helpers, and the fixture-values renderer.
- Port the domain-block, extractor, wrapper, three blocker renderers
  (LANGUAGE plpgsql, never STRICT), operator, aggregate (ord-capability),
  and types-file renderers; relocate derivation helpers + logic tests into
  the context layer.
- Render the int4 SQL surface (types/functions/operators/aggregates) from
  minijinja templates; add the minijinja+serde template environment.
- Port the ownership-guarded writer (std-only) and the generate_all
  orchestrator + CLI (generate / list-types).

Tests & CI:
- int4 golden-master reference tests; values.rs byte-exact, integration
  golden line-normalized; coarsened footgun + escaping guards as
  whole-file/context scans.
- Retire the Python-oracle parity gate; codegen:parity now gates on
  golden + values. Drop the deprecated Python test:codegen drift check.

Cleanup:
- Standardize the generated-SQL marker on `-- AUTOMATICALLY GENERATED FILE`
  (the project-wide marker docs:validate greps on); regenerate int2/int4
  values.rs fixtures and update the consts.rs assertion and CLAUDE.md.
- Address review findings (operator-check rename, placeholder_payload
  comment, exit-code u8 clamp).
@tobyhede tobyhede force-pushed the v3-rust-codegen-generator-parity branch from 77312fb to 2737d5f Compare June 2, 2026 06:13
tobyhede added 4 commits June 2, 2026 16:37
…ates

The Rust port (2737d5f) re-headed every committed/golden artifact to the
`AUTOMATICALLY GENERATED FILE.` marker and updated consts.rs + CLAUDE.md,
but left the Python generator on its original `AUTO-GENERATED — DO NOT EDIT.`
marker. CI still runs the Python generator (codegen:domain:all, build), so
its ownership check mismatched the Rust-marked files and refused to overwrite
them as hand-written (OwnershipError), failing the "Encrypted-domain codegen"
job. Align tasks/codegen/templates.py (and its marker tests) to the canonical
single-line Rust marker — the one docs:validate greps on and consts.rs asserts.

Also fix the "Rust workspace crates" job: cargo fmt the eql-codegen sources
(never formatted) and resolve two latent clippy manual_contains lints in
generate.rs that fmt's fail-fast had masked.

Verified locally: codegen:domain:all (+ git diff), codegen:parity, and
test:crates (fmt --check + clippy -D warnings + cargo test) all pass.
- Point generated values.rs header at eql-scalars::CATALOG, not deleted TOML
- Cover int2 in scalar_family_inlinable_operators_are_clean lint test
- codegen-parity.sh: catch untracked generated values.rs via git status
- main.rs: make non-zero Ok exit-code arm an explicit FAILURE
- Fix stale fixture.rs breadcrumb in cipherstash.rs to int4.rs
- FIXTURE_SCHEMA.md: int4 fixture is now 17 rows incl signed extremes + zero
…alog

Replace the generic `Kind` enum (Symmetric/Path/BlockerOnly) and the
parallel `SYMMETRIC_OPERATORS`/`PATH_OPERATORS`/`BLOCKER_ONLY_OPERATORS`
slices + per-shape helpers with a single `OPERATORS` catalog where each
operator carries strongly-typed PostgreSQL signatures and pure-data
planner metadata.

- operator_surface.rs: add `TypeSlot`, `OperatorSignature`,
  `RenderedSignature`, and `OperatorMetadata`; each `Operator` now owns
  its `signatures` and `metadata`. Rename `backing` -> `function_name`
  and `backing_function` -> `operator_function_name`. Delete `Kind` and
  the three category slices.
- generate.rs: `render_functions_file`/`render_operators_file` iterate
  the catalog x signatures and choose wrapper vs unsupported-operator
  entries per domain from `Term::operators_for_terms`. Remove the
  category loops and `*_shapes` helpers; preserve the `selector` param
  name for `->`/`->>` via a small `arg_b_name` helper.
- context.rs: `operator_entry` takes `&Operator` and renders metadata
  only when the domain supports the operator; rename `FnEntry::Blocker`
  -> `Unsupported` and `blocker_entry` -> `unsupported_entry`.

Generated SQL is byte-identical (codegen:parity passes); blocking is now
a per-domain emission decision rather than a static operator category.
Adds direct unit tests for metadata gating, the selector naming, the
jsonpath/text[] slots, and a non-empty-signatures catalog invariant.

Spec: docs/development/operator-surface-generation-spec.md
The committed `<T>_values.rs` files carry the Rust generator's header
("Generated from the `<T>` row in `eql-scalars::CATALOG`"), but the
Python generator still emitted the older "tasks/codegen/types/<T>.toml"
wording. CI's "regenerate fixture-value consts" step runs the Python
generator (`codegen:domain:all`) and diffs the result, so the header
mismatch failed the build.

Re-align `templates.py` to the Rust header (the recurrence of the drift
fixed in e186464) and update the now-stale assertion in
`test_templates.py`. Verified: `codegen:domain:all` leaves
tests/sqlx/src/fixtures clean and `codegen:parity` still passes.
@tobyhede tobyhede force-pushed the v3-rust-codegen-generator-parity branch from daf1709 to 2c08b8e Compare June 2, 2026 11:34
tobyhede added 2 commits June 3, 2026 08:13
Replace the if/elif/else body in functions.sql.j2 with a dynamic
{% include %} driven off each entry's serde "kind" tag
(Extractor/Wrapper/Unsupported -> functions/<kind>.sql.j2). The parent
template keeps the once-per-file header, -- REQUIRE: loop, and entry loop;
each per-kind body lives in its own partial under templates/functions/.

Dispatch stays in the template layer off the existing #[serde(tag="kind")]
value, so FnEntry, generate.rs, and the Rust render path are unchanged.
Generated SQL is line-normalized-identical (golden test + codegen:parity).
Remove VERSION_KEY and ENVELOPE_VERSION — dead leftovers from the Python
port never wired into the Rust generator (the version-pin clause is a
literal 'v'='2' in types.sql.j2). Fold the always-present CIPHERTEXT_KEY
into ENVELOPE_KEYS as ["v","i","c"]; context::domain_block keeps the same
v,i,c ordering ahead of term keys, so generated SQL is byte-identical.

Demote all consts.rs items from pub to pub(crate). Nothing outside the
crate imports eql_codegen::consts, so this is safe and lets the built-in
dead_code lint see them — the existing `clippy -- -D warnings` CI step
(mise run test:crates) now fails on unused internal items, no workflow
change needed.

Verified: clippy -D warnings clean, 63 lib + 3 parity tests pass,
codegen:parity OK (generated SQL unchanged).
@tobyhede tobyhede marked this pull request as ready for review June 2, 2026 22:30
tobyhede added 2 commits June 3, 2026 08:47
Replace the \n\-continuation format string in render_fixture_values_rs
with a flush-left raw string literal — same emitted bytes, far more
legible. Kept as format! (not a minijinja template) deliberately:
<T>_values.rs is validated byte-exact, unlike the line-normalized SQL
surface, and the type-aware literal rendering stays in Rust regardless.

Verified byte-identical via codegen:parity (rust_generator_matches_
committed_values_rs); fmt/clippy -D warnings clean, all tests pass.
The `eql_v2_int4`/`eql_v2_int2` fixtures are plain jsonb-payload test
tables; the encrypted-domain types they exercise are correctly under
eql_v3 (eql_v3.int4_eq/_ord), derived from the scalar type and applied
via per-query cast. Reword matrix.rs's misleading "EQL domain type name"
comment on `eql_type` to say fixture/table name, and point the fixtures
mod.rs value-const docs at eql-scalars::CATALOG instead of the retired
tasks/codegen Python path. Comment-only; no behaviour change.
@tobyhede tobyhede changed the title Rust codegen Plan 2: eql-codegen generator + Python-parity gate Refactor code generator into Rust Jun 2, 2026
…ixtures

The domain_over_domain/domain_opclass lint-catalog header described the
target as an eql_v2_* domain, but the checks match both eql_v3.* and
legacy public.eql_v2_*. The int2/int4 fixture doc-comments called the
fixture struct names a 'domain'; the actual domains are eql_v3.int2 /
eql_v3.int4 (see 88ad559).

Comment-only; no behaviour change.

--! @file encrypted_domain/int4/int4_eq_operators.sql
--! @brief Equality-only domain of the int4 encrypted-domain family — operator declarations.
--! @brief Operators for eql_v3.int4_eq.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Majority of the reference changes are tightening of comments - reduce the surface area of required variables.

--! @return boolean
CREATE FUNCTION eql_v3.lt(a eql_v3.int4_eq, b eql_v3.int4_eq)
RETURNS boolean IMMUTABLE PARALLEL SAFE
AS $$ BEGIN RETURN eql_v3.encrypted_domain_unsupported_bool('eql_v3.int4_eq', '<'); END; $$
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only SQL code change - removing a level of indirection.

--! @file encrypted_domain/{{ token }}/{{ name }}_functions.sql
--! @brief Functions for {{ dom }}.
{% for e in entries %}
{% include "functions/" ~ e.kind|lower ~ ".sql.j2" -%}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assured that this is the canonical jinja approach.
Is fine for now.

@tobyhede tobyhede merged commit a9b3785 into eql_v3 Jun 3, 2026
10 checks passed
@tobyhede tobyhede deleted the v3-rust-codegen-generator-parity branch June 3, 2026 05:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant