diff --git a/CHANGELOG.md b/CHANGELOG.md
index 0c128bdc..f5dff796 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -27,6 +27,7 @@ Each entry that ships in a published release links to the PR that introduced it.
 - **`eql_v3.int8` encrypted-domain type family.** Four jsonb-backed domains for encrypted `int8` columns — `eql_v3.int8` (storage-only), `eql_v3.int8_eq` (`=` / `<>` via HMAC), and `eql_v3.int8_ord` / `eql_v3.int8_ord_ore` (also `<` `<=` `>` `>=` via ORE block terms, with `MIN` / `MAX` aggregates) — generated from the `int8` row in `eql-scalars::CATALOG` by the same materializer as the `eql_v3.int4` reference. Index via a functional index on the `eql_v3.eq_term` / `eql_v3.ord_term` extractors, not an operator class on the domain. Why: a type-safe, per-capability encrypted `bigint` column, extending the scalar generator across the full 64-bit integer width. ([#253](https://github.com/cipherstash/encrypt-query-language/pull/253))
 - **`eql_v3.date` encrypted-domain type family.** Four jsonb-backed domains for encrypted `date` columns — `eql_v3.date` (storage-only), `eql_v3.date_eq` (`=` / `<>` via HMAC), and `eql_v3.date_ord` / `eql_v3.date_ord_ore` (also `<` `<=` `>` `>=` via ORE block terms, with `MIN` / `MAX` aggregates) — generated from the `date` row in `eql-scalars::CATALOG` by the same materializer as the `eql_v3.int4` reference. Plaintexts encrypt under the `date` cast and compare via the same ORE block terms as the integer scalars (ORE is plaintext-agnostic — dates order like integers). Index via a functional index on the `eql_v3.eq_term` / `eql_v3.ord_term` extractors, not an operator class on the domain. Why: the first **non-integer ordered** scalar encrypted-domain type — a type-safe, per-capability encrypted `date` column — proving the generator and SQLx test matrix generalize beyond fixed-width integers. ([#256](https://github.com/cipherstash/encrypt-query-language/pull/256))
 - **Per-domain `MIN` / `MAX` aggregates for the encrypted-domain family.** `eql_v3.min(eql_v3._ord)` / `eql_v3.max(eql_v3._ord)` (and the `_ord_ore` twin) are generated for every ord-capable scalar variant, giving type-safe extrema on domain-typed columns — comparison routes through the variant's `<` / `>` operator (ORE block term, no decryption). The aggregates are declared `PARALLEL = SAFE` with a combine function (the state function itself — min/max are associative), so PostgreSQL can use partial/parallel aggregation on large `GROUP BY` workloads. Why: the new domain types previously had no equivalent of the composite-type aggregates. The existing `eql_v2.min(eql_v2_encrypted)` / `eql_v2.max(eql_v2_encrypted)` aggregates are **retained** and continue to work on `eql_v2_encrypted` columns; the per-domain aggregates are additive and coexist with them. ([#239](https://github.com/cipherstash/encrypt-query-language/pull/239))
+- **`eql_v3.text` encrypted-domain family (`text`, `text_eq`, `text_match`, `text_ord`, `text_ord_ore`).** Adds equality (`=` / `<>` via HMAC), match (`@>` / `<@` via a new self-contained `eql_v3.bloom_filter` SEM index term), and ORE ordering (`<` `<=` `>` `>=`, `min` / `max`) for encrypted text, at parity with EQL v2 text — generated from the `text` row in `eql-scalars::CATALOG` by the same materializer as the `eql_v3.int4` reference. `text` is the first scalar to add a new index `Term` (`Bloom`) and the first non-integer, unbounded ordered kind (lexicographic pivots, hand-written `impl ScalarType`). Index via a functional index on the `eql_v3.eq_term` / `eql_v3.ord_term` / `eql_v3.match_term` extractors, not an operator class on the domain. Why: brings searchable encrypted text to the namespaced, `eql_v2`-free `eql_v3` surface. Match is exposed as bloom-filter containment on the `text_match` domain — deliberately *not* SQL `LIKE` (no wildcard/anchoring; probabilistic ngram containment) — and never backs equality (which always routes through `Hm`). ([#260](https://github.com/cipherstash/encrypt-query-language/pull/260))
 - **Self-contained `eql_v3` schema + standalone `release/cipherstash-encrypt-v3.sql` installer.** The `eql_v3` encrypted-domain surface no longer depends on `eql_v2` at runtime: it now owns its own copies of the searchable-encrypted-metadata (SEM) index-term types — `eql_v3.hmac_256` and `eql_v3.ore_block_u64_8_256` (with its btree operator class) — so the `eql_v3.eq_term` / `eql_v3.ord_term` extractors return `eql_v3` types and no `eql_v2.` appears anywhere in the v3 SQL. The whole v3 surface relocated under a single `src/v3/` tree (`src/v3/sem/` for the hand-written SEM types, `src/v3/scalars/` for the generated domain families). A new build variant ships the `eql_v3` schema on its own as `release/cipherstash-encrypt-v3.sql`, installable into a database with no `eql_v2` present; a CI gate greps that artifact and its dependency closure to keep it `eql_v2`-free. Why: a clean foundation for the per-scalar encrypted-domain model to stand alone, ahead of it replacing the `eql_v2_encrypted` composite column type. This is additive — a new schema and a new artifact — and leaves `eql_v2` byte-for-byte unchanged. ([#255](https://github.com/cipherstash/encrypt-query-language/pull/255))

 ### Changed
diff --git a/crates/eql-codegen/src/context.rs b/crates/eql-codegen/src/context.rs
index 78b50883..4581bfa4 100644
--- a/crates/eql-codegen/src/context.rs
+++ b/crates/eql-codegen/src/context.rs
@@ -310,19 +310,43 @@ mod tests {
     #[test]
     fn operator_entry_emits_metadata_only_when_supported() {
         use crate::operator_surface::operator;
-        // Supported comparison operator carries its planner metadata.
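The `text_match` entry describes match as probabilistic ngram containment, not `LIKE`. A minimal plaintext sketch of that idea follows. It assumes character 3-grams (as the `"aardvark"`/`"aard"` fixture comment implies), a 256-bit filter, three hash functions, and std's unkeyed `DefaultHasher` standing in for whatever keyed hashing the real `eql_v3.bloom_filter` term uses. `BloomFilter`, `M_BITS` and `K_HASHES` are hypothetical names, not EQL APIs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative filter width in bits (an assumption, not EQL's parameter).
const M_BITS: usize = 256;
/// Illustrative number of hash functions per 3-gram (also an assumption).
const K_HASHES: u64 = 3;

/// Hypothetical plaintext Bloom filter over character 3-grams.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct BloomFilter([u64; M_BITS / 64]);

impl BloomFilter {
    /// Sets `K_HASHES` bits for every character 3-gram of `s`.
    /// Strings shorter than three characters produce an empty filter.
    fn from_text(s: &str) -> Self {
        let chars: Vec<char> = s.chars().collect();
        let mut bits = [0u64; M_BITS / 64];
        for gram in chars.windows(3) {
            for seed in 0..K_HASHES {
                // Unkeyed std hashing stands in for the real keyed scheme.
                let mut h = DefaultHasher::new();
                (gram, seed).hash(&mut h);
                let bit = (h.finish() % M_BITS as u64) as usize;
                bits[bit / 64] |= 1u64 << (bit % 64);
            }
        }
        BloomFilter(bits)
    }

    /// `self @> other`: every bit set in `other` is also set in `self`.
    fn contains(&self, other: &Self) -> bool {
        self.0.iter().zip(other.0.iter()).all(|(a, b)| a & b == *b)
    }

    /// `self <@ other`, the commutator of `@>`.
    fn contained_by(&self, other: &Self) -> bool {
        other.contains(self)
    }
}
```

Every 3-gram of a substring is also a 3-gram of the superstring, so `@>` never misses a true substring. The reverse does not hold: unrelated ngrams can collide on the same bits. That is why this is containment and not equality, which stays on `Hm`.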
-        let eq = operator_entry(&operator("="), "eql_v3.int4_eq", "eql_v3.int4_eq", true);
-        assert_eq!(eq.symbol, "=");
-        assert_eq!(eq.function_name, "eq");
-        assert_eq!(
-            eq.metadata.as_deref(),
-            Some("COMMUTATOR = =, NEGATOR = <>, RESTRICT = eqsel, JOIN = eqjoinsel")
-        );
-        // The same operator, unsupported on this domain → no metadata line.
-        let eq_unsupported = operator_entry(&operator("="), "eql_v3.int4", "eql_v3.int4", false);
-        assert_eq!(eq_unsupported.metadata, None);
-        // Supported but metadata-less operator (`@>`) → still no metadata line.
-        let contains = operator_entry(&operator("@>"), "eql_v3.int4_eq", "eql_v3.int4_eq", true);
-        assert_eq!(contains.metadata, None);
+
+        // (symbol, domain, supported) -> expected `CREATE OPERATOR` metadata
+        // clause. Adding a term that carries operator metadata is one new row
+        // here, not another hand-rolled assertion block.
+        let cases: &[(&str, &str, bool, Option<&str>)] = &[
+            // Supported comparison operator carries its planner metadata.
+            (
+                "=",
+                "eql_v3.int4_eq",
+                true,
+                Some("COMMUTATOR = =, NEGATOR = <>, RESTRICT = eqsel, JOIN = eqjoinsel"),
+            ),
+            // The same operator, unsupported on this domain → no metadata line.
+            ("=", "eql_v3.int4", false, None),
+            // Supported but metadata-less operator (`->`) → still no metadata.
+            ("->", "eql_v3.int4_eq", true, None),
+            // `@>` carries containment metadata when supported (the Bloom
+            // `text_match` path).
+            (
+                "@>",
+                "eql_v3.text_match",
+                true,
+                Some("COMMUTATOR = <@, RESTRICT = contsel, JOIN = contjoinsel"),
+            ),
+            // ... but suppressed when `@>` is a blocker (non-Bloom domains),
+            // which is why the int4 golden is unchanged.
+            ("@>", "eql_v3.int4_eq", false, None),
+        ];
+
+        for (symbol, dom, supported, expected) in cases {
+            let entry = operator_entry(&operator(symbol), dom, dom, *supported);
+            assert_eq!(entry.symbol, *symbol);
+            assert_eq!(
+                entry.metadata.as_deref(),
+                *expected,
+                "operator {symbol} on {dom} (supported={supported})",
+            );
+        }
     }
 }
diff --git a/crates/eql-codegen/src/operator_surface.rs b/crates/eql-codegen/src/operator_surface.rs
index 38f1ec19..1e7985c8 100644
--- a/crates/eql-codegen/src/operator_surface.rs
+++ b/crates/eql-codegen/src/operator_surface.rs
@@ -33,7 +33,7 @@ impl OperatorMetadata {
     }

     /// Render the `CREATE OPERATOR` metadata clause, or `None` when no hint is
-    /// present (the `@>`/`<@` symmetric-but-empty case collapses to `None`).
+    /// present (e.g. the path-selector operators, which carry no metadata).
     pub fn render(self) -> Option<String> {
         let mut extras = Vec::new();
         if let Some(c) = self.commutator {
@@ -208,6 +208,18 @@ const fn cmp_metadata(
     }
 }

+/// Containment-operator metadata (`@>` / `<@`): commutator is the mirror
+/// operator, no negator (a non-containment is not another listed operator),
+/// containment selectivity estimators.
+const fn containment_metadata(commutator: &'static str) -> OperatorMetadata {
+    OperatorMetadata {
+        restrict: Some("contsel"),
+        join: Some("contjoinsel"),
+        commutator: Some(commutator),
+        negator: None,
+    }
+}
+
 /// The 20-operator catalog. Order is: comparison operators, then path-selector
 /// operators, then the remaining native jsonb operators.
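The diff shows only the head of `OperatorMetadata::render` and the new `containment_metadata` constructor. The sketch below assumes `render` pushes clauses in COMMUTATOR, NEGATOR, RESTRICT, JOIN order (the order the expected strings in the tests use) and joins them with `, `. Under that assumption, the table-driven cases reduce to the following. `metadata_clause` is a hypothetical stand-in for the `supported` gate inside `operator_entry`, not crate code.

```rust
/// Planner hints for `CREATE OPERATOR`; the field names mirror the diff.
#[derive(Clone, Copy, Debug)]
struct OperatorMetadata {
    commutator: Option<&'static str>,
    negator: Option<&'static str>,
    restrict: Option<&'static str>,
    join: Option<&'static str>,
}

impl OperatorMetadata {
    const fn none() -> Self {
        OperatorMetadata { commutator: None, negator: None, restrict: None, join: None }
    }

    /// Assumed clause order: COMMUTATOR, NEGATOR, RESTRICT, JOIN.
    fn render(self) -> Option<String> {
        let mut extras = Vec::new();
        if let Some(c) = self.commutator {
            extras.push(format!("COMMUTATOR = {c}"));
        }
        if let Some(n) = self.negator {
            extras.push(format!("NEGATOR = {n}"));
        }
        if let Some(r) = self.restrict {
            extras.push(format!("RESTRICT = {r}"));
        }
        if let Some(j) = self.join {
            extras.push(format!("JOIN = {j}"));
        }
        // No hints at all collapses to `None` (no metadata line emitted).
        if extras.is_empty() { None } else { Some(extras.join(", ")) }
    }
}

/// Same shape as the diff's `containment_metadata`.
const fn containment_metadata(commutator: &'static str) -> OperatorMetadata {
    OperatorMetadata {
        restrict: Some("contsel"),
        join: Some("contjoinsel"),
        commutator: Some(commutator),
        negator: None,
    }
}

/// Hypothetical stand-in for the `supported` gate in `operator_entry`: an
/// operator that is a blocker on a domain emits no metadata line at all.
fn metadata_clause(meta: OperatorMetadata, supported: bool) -> Option<String> {
    if supported { meta.render() } else { None }
}
```

Leaving `negator` empty follows PostgreSQL's rule that `NEGATOR` must name an operator returning the logical complement; no listed operator means "does not contain".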
 pub const OPERATORS: &[Operator] = &[
@@ -251,13 +263,13 @@ pub const OPERATORS: &[Operator] = &[
         symbol: "@>",
         function_name: "contains",
         signatures: BOOL_SYMMETRIC_SIGNATURES,
-        metadata: OperatorMetadata::none(),
+        metadata: containment_metadata("<@"),
     },
     Operator {
         symbol: "<@",
         function_name: "contained_by",
         signatures: BOOL_SYMMETRIC_SIGNATURES,
-        metadata: OperatorMetadata::none(),
+        metadata: containment_metadata("@>"),
     },
     Operator {
         symbol: "->",
@@ -519,7 +531,25 @@ mod tests {
             "COMMUTATOR = =, NEGATOR = <>, RESTRICT = eqsel, JOIN = eqjoinsel"
         );
         assert_eq!(operator("->").metadata.render(), None);
-        assert_eq!(operator("@>").metadata.render(), None);
+        // `@>`/`<@` now carry containment metadata (no negator).
+        assert_eq!(
+            operator("@>").metadata.render().unwrap(),
+            "COMMUTATOR = <@, RESTRICT = contsel, JOIN = contjoinsel"
+        );
+    }
+
+    #[test]
+    fn containment_operators_have_containment_metadata() {
+        let c = operator("@>");
+        assert_eq!(c.metadata.commutator, Some("<@"));
+        assert_eq!(c.metadata.restrict, Some("contsel"));
+        assert_eq!(c.metadata.join, Some("contjoinsel"));
+        assert_eq!(c.metadata.negator, None);
+        let cb = operator("<@");
+        assert_eq!(cb.metadata.commutator, Some("@>"));
+        assert_eq!(cb.metadata.restrict, Some("contsel"));
+        assert_eq!(cb.metadata.join, Some("contjoinsel"));
+        assert_eq!(cb.metadata.negator, None);
     }

     #[test]
diff --git a/crates/eql-scalars/src/lib.rs b/crates/eql-scalars/src/lib.rs
index b5a362c7..c07398a0 100644
--- a/crates/eql-scalars/src/lib.rs
+++ b/crates/eql-scalars/src/lib.rs
@@ -152,6 +152,7 @@ impl ScalarKind {
 pub enum Term {
     Hm,
     Ore,
+    Bloom,
 }

 impl Term {
@@ -160,6 +161,7 @@ impl Term {
         match self {
             Term::Hm => "hm",
             Term::Ore => "ob",
+            Term::Bloom => "bf",
         }
     }

@@ -168,6 +170,7 @@ impl Term {
         match self {
             Term::Hm => "eq_term",
             Term::Ore => "ord_term",
+            Term::Bloom => "match_term",
         }
     }

@@ -176,6 +179,7 @@ impl Term {
         match self {
             Term::Hm => "hmac_256",
             Term::Ore => "ore_block_u64_8_256",
+            Term::Bloom => "bloom_filter",
         }
     }

@@ -184,6 +188,7 @@ impl Term {
         match self {
             Term::Hm => "eq",
             Term::Ore => "ord",
+            Term::Bloom => "match",
         }
     }

@@ -192,6 +197,7 @@ impl Term {
         match self {
             Term::Hm => &["=", "<>"],
             Term::Ore => &["=", "<>", "<", "<=", ">", ">="],
+            Term::Bloom => &["@>", "<@"],
         }
     }

@@ -203,6 +209,7 @@ impl Term {
                 "src/v3/sem/ore_block_u64_8_256/functions.sql",
                 "src/v3/sem/ore_block_u64_8_256/operators.sql",
             ],
+            Term::Bloom => &["src/v3/sem/bloom_filter/functions.sql"],
         }
     }
 }
@@ -302,7 +309,10 @@ impl Fixture {
                 Some(k) => Some(k.max_value()),
                 None => None,
             },
-            Fixture::Zero => Some(0),
+            Fixture::Zero => match kind.as_bounded_int() {
+                Some(_) => Some(0),
+                None => None,
+            },
             Fixture::Int(n) => Some(n),
             Fixture::Numeric(_) | Fixture::Text(_) | Fixture::Jsonb(_) | Fixture::Date(_) => None,
         }
@@ -383,6 +393,33 @@ const ORDERED_INT_DOMAINS: &[DomainSpec] = &[
     },
 ];

+/// Domains for `text`: the ordered shape plus a `_match` domain backed by the
+/// `Bloom` term (`@>`/`<@` containment). The ordered subset (`""`, `_eq`,
+/// `_ord_ore`, `_ord`) is identical to `ORDERED_INT_DOMAINS`; `_match` is the
+/// only addition, so text still runs the standard ordered matrix.
+const TEXT_DOMAINS: &[DomainSpec] = &[
+    DomainSpec {
+        suffix: "",
+        terms: &[],
+    },
+    DomainSpec {
+        suffix: "_eq",
+        terms: &[Term::Hm],
+    },
+    DomainSpec {
+        suffix: "_match",
+        terms: &[Term::Bloom],
+    },
+    DomainSpec {
+        suffix: "_ord_ore",
+        terms: &[Term::Ore],
+    },
+    DomainSpec {
+        suffix: "_ord",
+        terms: &[Term::Ore],
+    },
+];
+
 /// Builds a `&[Fixture]`. The `int ;` arm (a tt-muncher over `Min`/`Max`/
 /// `Zero` and `N()`) range-checks each literal against `` at compile
 /// time via `const _RANGE_CHECK`, so out-of-range literals do not compile;
@@ -474,9 +511,34 @@ pub const DATE: ScalarSpec = ScalarSpec {
     fixtures: DATE_FIXTURES,
 };

+/// `text` fixture plaintexts — curated so eq/ord give a lexicographic spread
+/// and the match suite has a known substring pair (`"aardvark"`/`"aard"`,
+/// sharing 3-grams) and a disjoint value (`"zzzz"`, no shared 3-grams).
+/// `"aard"` is the lexicographic `min_pivot`, `"zzzz"` the `max_pivot`, and
+/// `"frank"` the interior `mid_pivot`; all three must be present verbatim so the
+/// matrix can fetch their ciphertext. All distinct.
+///
+/// The empty string is deliberately **not** a fixture: text is an ordered, not
+/// signed, scalar (no numeric origin), and `""` encrypts to an empty ORE term
+/// whose comparison is undefined (see issue #262). The interior pivot is a real
+/// median value, not `String::default()`.
+const TEXT_FIXTURES: &[Fixture] = fixtures!(text;
+    "aard", "aardvark", "alice", "bob", "carol",
+    "dave", "erin", "frank", "mallory", "trent", "zzzz");
+
+/// `text` — an ordered, non-integer, unbounded scalar. Adds a `_match` domain
+/// (the `Bloom` term) on top of the ordered shape. Public because the SQLx
+/// harness reads `TEXT_VALUES` (materialised below).
+pub const TEXT: ScalarSpec = ScalarSpec {
+    token: "text",
+    kind: ScalarKind::Text,
+    domains: TEXT_DOMAINS,
+    fixtures: TEXT_FIXTURES,
+};
+
 /// The scalar catalog — the single source of truth. Order is significant (it
 /// drives generation order). New types are appended as their SQL surface lands.
-pub const CATALOG: &[ScalarSpec] = &[INT4, INT2, INT8, DATE];
+pub const CATALOG: &[ScalarSpec] = &[INT4, INT2, INT8, DATE, TEXT];

 /// Materialise an integer scalar's fixtures into a typed `&'static` slice at
 /// compile time. This is the **single-sourced** plaintext list the SQLx test
@@ -529,6 +591,38 @@ int_values!(INT4_VALUES, i32, INT4);
 int_values!(INT2_VALUES, i16, INT2);
 int_values!(INT8_VALUES, i64, INT8);

+/// Materialise a `text` scalar's fixtures into a `&'static [&'static str]` at
+/// compile time — the single-sourced plaintext list the SQLx matrix reads via
+/// `ScalarType::fixture_values()` and the fixture generator encrypts. Unlike
+/// `date` (chrono is not `const`-friendly), a `Fixture::Text(&'static str)` is
+/// already const, so text materialises a typed slice like the integer kinds.
+/// A non-text fixture is a const-eval panic (compile-time guard).
+macro_rules! text_values {
+    ($name:ident, $spec:expr) => {
+        #[doc = concat!("Distinct plaintext fixture values for `", stringify!($spec), "`, ")]
+        #[doc = "materialised from its `CATALOG` row (see `text_values!`)."]
+        pub const $name: &[&'static str] = {
+            const SPEC: ScalarSpec = $spec;
+            const N: usize = SPEC.fixtures.len();
+            const ARR: [&'static str; N] = {
+                let mut out = [""; N];
+                let mut i = 0;
+                while i < N {
+                    out[i] = match SPEC.fixtures[i] {
+                        Fixture::Text(s) => s,
+                        _ => panic!("text scalar fixture must be Fixture::Text"),
+                    };
+                    i += 1;
+                }
+                out
+            };
+            &ARR
+        };
+    };
+}
+
+text_values!(TEXT_VALUES, TEXT);
+
 #[cfg(test)]
 mod rust_tests {
     use super::*;
@@ -683,6 +777,38 @@ mod term_tests {
             ]
         );
     }
+
+    #[test]
+    fn bloom_term_contract() {
+        let b = Term::Bloom;
+        assert_eq!(b.json_key(), "bf");
+        assert_eq!(b.extractor(), "match_term");
+        assert_eq!(b.ctor(), "bloom_filter");
+        assert_eq!(b.role(), "match");
+        assert_eq!(b.operators(), &["@>", "<@"]);
+        assert_eq!(b.requires(), &["src/v3/sem/bloom_filter/functions.sql"]);
+    }
+
+    #[test]
+    fn bloom_extractor_routes_match_operators() {
+        let terms = &[Term::Bloom];
+        assert_eq!(
+            Term::extractor_for_operator(terms, "@>"),
+            Some("match_term")
+        );
+        assert_eq!(
+            Term::extractor_for_operator(terms, "<@"),
+            Some("match_term")
+        );
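The `text_values!` macro relies on a const-evaluation pattern. A `while` loop inside a `const` initializer copies each `Fixture::Text` payload into a fixed-size array, and any other variant hits a `panic!` during compile-time evaluation. Below is a stripped-down, self-contained version of that pattern. The two-variant `Fixture` enum and the three-item `FIXTURES` list are hypothetical reductions of the crate's types, kept only to make the technique runnable on its own.

```rust
/// Hypothetical, reduced fixture enum; the real one has more variants.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum Fixture {
    Int(i64),
    Text(&'static str),
}

const FIXTURES: &[Fixture] = &[
    Fixture::Text("aard"),
    Fixture::Text("frank"),
    Fixture::Text("zzzz"),
];

/// Materialised at compile time: a non-`Text` entry fails const evaluation.
const VALUES: &[&str] = {
    const N: usize = FIXTURES.len();
    const ARR: [&str; N] = {
        let mut out = [""; N];
        let mut i = 0;
        // `for` loops are not allowed in const contexts, hence `while`.
        while i < N {
            out[i] = match FIXTURES[i] {
                Fixture::Text(s) => s,
                _ => panic!("text scalar fixture must be Fixture::Text"),
            };
            i += 1;
        }
        out
    };
    &ARR
};
```

Swapping one entry in `FIXTURES` for `Fixture::Int(0)` makes const evaluation of `VALUES` fail at build time instead of at test time. This is the compile-time guard the macro's doc comment describes.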
+        assert_eq!(Term::extractor_for_operator(terms, "="), None);
+    }
+
+    #[test]
+    fn bloom_role_is_match_not_ord() {
+        assert_eq!(Term::role_for_terms(&[Term::Bloom]), "match");
+        // match is not ord-capable: no aggregates.
+        assert_ne!(Term::role_for_terms(&[Term::Bloom]), "ord");
+    }
 }

 #[cfg(test)]
@@ -914,9 +1040,24 @@ mod catalog_tests {
     }

     #[test]
-    fn catalog_has_int4_int2_int8_date_in_order() {
+    fn catalog_has_int4_int2_int8_date_text_in_order() {
         let tokens: Vec<&str> = CATALOG.iter().map(|s| s.token).collect();
-        assert_eq!(tokens, vec!["int4", "int2", "int8", "date"]);
+        assert_eq!(tokens, vec!["int4", "int2", "int8", "date", "text"]);
+    }
+
+    #[test]
+    fn text_spec_is_in_catalog() {
+        let text = scalar("text");
+        assert_eq!(text.kind, ScalarKind::Text);
+        let suffixes: Vec<_> = text.domains.iter().map(|d| d.suffix).collect();
+        assert_eq!(suffixes, vec!["", "_eq", "_match", "_ord_ore", "_ord"]);
+    }
+
+    #[test]
+    fn text_match_domain_carries_only_bloom() {
+        let text = scalar("text");
+        let m = text.domains.iter().find(|d| d.suffix == "_match").unwrap();
+        assert_eq!(m.terms, &[Term::Bloom]);
     }

     /// The three temporal matrix pivots must be present verbatim in DATE's
@@ -944,16 +1085,20 @@ mod catalog_tests {
     }

     #[test]
-    fn all_types_share_the_same_domain_shape() {
-        // Every scalar declares the same four domains with the same terms;
-        // only the token differs (the matrix-snapshot collapse depends on this).
-        // Generic over CATALOG, so it covers every type — including new ones —
-        // and subsumes the old per-type `_maps_to_*_with_four_domains` /
-        // `_domain_terms_match_manifest` tests (which only restated the
-        // catalog literal for one token).
+    fn all_types_share_the_same_ordered_domain_shape() {
+        // Every scalar declares the same four ordered domains with the same
+        // terms; only the token differs (the matrix-snapshot collapse depends on
+        // this). `text` additionally carries a `_match` domain (asserted in
+        // `text_match_domain_carries_only_bloom`), which is excluded here so the
+        // shared *ordered* shape stays the invariant across every type.
+        // Generic over CATALOG, so it covers every type — including new ones.
         for s in CATALOG {
-            let shape: Vec<(&str, &[Term])> =
-                s.domains.iter().map(|d| (d.suffix, d.terms)).collect();
+            let shape: Vec<(&str, &[Term])> = s
+                .domains
+                .iter()
+                .filter(|d| d.suffix != "_match")
+                .map(|d| (d.suffix, d.terms))
+                .collect();
             assert_eq!(
                 shape,
                 vec![
@@ -962,7 +1107,7 @@ mod catalog_tests {
                     ("_ord_ore", &[Term::Ore][..]),
                     ("_ord", &[Term::Ore][..]),
                 ],
-                "{} has unexpected domain shape",
+                "{} has unexpected ordered domain shape",
                 s.token
             );
         }
@@ -1027,6 +1172,40 @@ mod values_tests {
         check(&INT2, INT2_VALUES);
         check(&INT8, INT8_VALUES);
     }
+
+    #[test]
+    // `TEXT_VALUES` is a compile-time const slice, so clippy can prove the
+    // non-emptiness guard true; keep it as an explicit invariant regardless.
+    #[allow(clippy::const_is_empty)]
+    fn text_values_are_distinct_and_nonempty() {
+        assert!(!TEXT_VALUES.is_empty());
+        let mut seen = std::collections::HashSet::new();
+        for v in TEXT_VALUES {
+            assert!(seen.insert(*v), "duplicate text fixture: {v}");
+        }
+        // The interior `mid_pivot` ("frank") must be present; the empty string
+        // must NOT (text has no numeric origin — see issue #262).
+        assert!(
+            TEXT_VALUES.contains(&"frank"),
+            "TEXT_VALUES must include the mid pivot \"frank\""
+        );
+        assert!(
+            !TEXT_VALUES.contains(&""),
+            "TEXT_VALUES must not include the empty string"
+        );
+    }
+
+    #[test]
+    fn text_values_match_fixtures_in_order() {
+        let from_fixtures: Vec<&str> = TEXT_FIXTURES
+            .iter()
+            .map(|f| match f {
+                Fixture::Text(s) => *s,
+                other => panic!("text fixture must be Fixture::Text, got {other:?}"),
+            })
+            .collect();
+        assert_eq!(TEXT_VALUES.to_vec(), from_fixtures);
+    }
 }

 #[cfg(test)]
diff --git a/crates/eql-tests-macros/src/lib.rs b/crates/eql-tests-macros/src/lib.rs
index e252bca0..8e3e23f1 100644
--- a/crates/eql-tests-macros/src/lib.rs
+++ b/crates/eql-tests-macros/src/lib.rs
@@ -34,41 +34,55 @@ use syn::parse::{Parse, ParseStream};
 use syn::punctuated::Punctuated;
 use syn::{bracketed, Ident, Token, Type};

-/// One `token => rust_type` entry, with an optional trailing `[temporal]` flag.
+/// A recognised entry marker — a non-integer scalar kind that diverges from the
+/// generated-integer path. Both variants are **hand-written** (their
+/// `impl ScalarType` is authored in `scalar_domains.rs`, not generated) and use
+/// **pivot-presence** fixture asserts instead of the integer signed-extreme
+/// ones; they differ only in the `scalar_fixture!` discriminator they stamp.
+#[derive(Clone, Copy, PartialEq, Eq, Debug)]
+enum Marker {
+    /// `[temporal]` — a chrono-backed ordered scalar (`date`).
+    Temporal,
+    /// `[text]` — an unbounded, lexicographically-ordered scalar (`text`) that
+    /// additionally carries a `match` capability, so its fixture stamps the
+    /// `Match` index.
+    Text,
+}
+
+/// One `token => rust_type` entry, with an optional trailing `[marker]`.
 struct ScalarEntry {
     /// Postgres type token (`int4`); also the fixture/domain suffix and the
     /// matrix `suite` ident.
     token: Ident,
     /// Rust plaintext type (`i32`).
     rust_type: Type,
-    /// Whether this entry is a **temporal** (chrono-backed) scalar rather than a
-    /// fixed-width integer. Declared explicitly via a trailing `[temporal]`
-    /// marker in the dispatch list (`date => chrono::NaiveDate [temporal]`)
-    /// rather than sniffed from the Rust type path — so a temporal type that
-    /// isn't chrono-spelled, or a non-temporal type whose path happens to
-    /// contain `DateTime`, cannot be misclassified.
-    temporal: bool,
+    /// The explicit `[marker]` on this entry, or `None` for a fixed-width
+    /// integer. Declared in the dispatch list (`date => chrono::NaiveDate
+    /// [temporal]`, `text => String [text]`) rather than sniffed from the Rust
+    /// type path, so a misspelled or unusually-spelled type cannot be
+    /// misclassified.
+    marker: Option<Marker>,
 }

 /// The recognised optional entry markers, written in `[brackets]` after the
-/// rust type (`date => chrono::NaiveDate [temporal]`). `temporal` is the only
-/// one today; a new marker is added here so the accepted set stays a single
-/// source of truth — `parse_optional_marker` validates against this slice and
-/// the rejection message lists it verbatim.
-const SUPPORTED_MARKERS: &[&str] = &["temporal"];
-
-/// Parse the optional trailing `[marker]` on a scalar entry, returning whether
-/// the `temporal` marker was present.
+/// rust type (`date => chrono::NaiveDate [temporal]`, `text => String [text]`).
+/// A new marker is added here so the accepted set stays a single source of
+/// truth — `parse_optional_marker` validates against this slice and the
+/// rejection message lists it verbatim.
+const SUPPORTED_MARKERS: &[&str] = &["temporal", "text"];
+
+/// Parse the optional trailing `[marker]` on a scalar entry, returning the
+/// recognised marker (or `None` for an ordinary integer scalar).
 ///
-/// Absent brackets → `false` (an ordinary integer scalar). When brackets are
-/// present they must hold exactly one recognised identifier: an unknown marker
-/// (`[temporial]`), empty brackets (`[]`), or trailing junk (`[temporal foo]`)
-/// are all hard parse errors. The whole point of the explicit marker is that a
-/// typo fails loudly rather than silently defaulting an entry to integer, so
-/// the parse is strict on every malformed shape, not just unknown names.
-fn parse_optional_marker(input: ParseStream) -> syn::Result<bool> {
+/// Absent brackets → `None`. When brackets are present they must hold exactly
+/// one recognised identifier: an unknown marker (`[temporial]`), empty brackets
+/// (`[]`), or trailing junk (`[temporal foo]`) are all hard parse errors. The
+/// whole point of the explicit marker is that a typo fails loudly rather than
+/// silently defaulting an entry to integer, so the parse is strict on every
+/// malformed shape, not just unknown names.
+fn parse_optional_marker(input: ParseStream) -> syn::Result<Option<Marker>> {
     if !input.peek(syn::token::Bracket) {
-        return Ok(false);
+        return Ok(None);
     }
     let content;
     bracketed!(content in input);
@@ -85,21 +99,21 @@ fn parse_optional_marker(input: ParseStream) -> syn::Result<bool> {
     }

     let name = marker.to_string();
-    if !SUPPORTED_MARKERS.contains(&name.as_str()) {
-        let supported = SUPPORTED_MARKERS
-            .iter()
-            .map(|m| format!("`{m}`"))
-            .collect::<Vec<_>>()
-            .join(", ");
-        return Err(syn::Error::new(
-            marker.span(),
-            format!("unknown scalar marker `{name}`; supported markers: {supported}"),
-        ));
+    match name.as_str() {
+        "temporal" => Ok(Some(Marker::Temporal)),
+        "text" => Ok(Some(Marker::Text)),
+        _ => {
+            let supported = SUPPORTED_MARKERS
+                .iter()
+                .map(|m| format!("`{m}`"))
+                .collect::<Vec<_>>()
+                .join(", ");
+            Err(syn::Error::new(
+                marker.span(),
+                format!("unknown scalar marker `{name}`; supported markers: {supported}"),
+            ))
+        }
     }
-
-    // Only `temporal` flips the temporal flag; a future non-temporal marker
-    // would pass validation above but leave this `false`.
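`parse_optional_marker` is strict on every malformed shape. The same rules can be exercised without `syn` in a small string-level sketch. `parse_marker_str` is a hypothetical helper, not part of the macro crate. It takes whatever follows the Rust type in an entry (for example `""` or `"[text]"`). Apart from the unknown-marker message, which mirrors the diff, its error strings are invented for illustration.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Marker {
    Temporal,
    Text,
}

const SUPPORTED_MARKERS: &[&str] = &["temporal", "text"];

/// Hypothetical string-level analogue of `parse_optional_marker`.
fn parse_marker_str(rest: &str) -> Result<Option<Marker>, String> {
    let rest = rest.trim();
    // Absent brackets: an ordinary integer scalar.
    if rest.is_empty() {
        return Ok(None);
    }
    let inner = rest
        .strip_prefix('[')
        .and_then(|r| r.strip_suffix(']'))
        .ok_or_else(|| format!("expected `[marker]`, got `{rest}`"))?;
    let mut words = inner.split_whitespace();
    // Empty brackets (`[]`) are an error, not a silent integer default.
    let name = words
        .next()
        .ok_or_else(|| "empty scalar marker `[]`".to_string())?;
    // Trailing junk (`[temporal foo]`) is an error too.
    if let Some(extra) = words.next() {
        return Err(format!("unexpected `{extra}` after scalar marker `{name}`"));
    }
    match name {
        "temporal" => Ok(Some(Marker::Temporal)),
        "text" => Ok(Some(Marker::Text)),
        _ => {
            let supported = SUPPORTED_MARKERS
                .iter()
                .map(|m| format!("`{m}`"))
                .collect::<Vec<_>>()
                .join(", ");
            Err(format!(
                "unknown scalar marker `{name}`; supported markers: {supported}"
            ))
        }
    }
}
```

Returning `Option<Marker>` instead of `bool` is what lets a second marker join without the old "passes validation but leaves the flag `false`" gap.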
-    Ok(name == "temporal")
 }

 impl Parse for ScalarEntry {
@@ -107,28 +121,33 @@ impl Parse for ScalarEntry {
         let token: Ident = input.parse()?;
         input.parse::<Token![=>]>()?;
         let rust_type: Type = input.parse()?;
-        let temporal = parse_optional_marker(input)?;
+        let marker = parse_optional_marker(input)?;
         Ok(ScalarEntry {
             token,
             rust_type,
-            temporal,
+            marker,
         })
     }
 }

 impl ScalarEntry {
-    /// Whether this entry is a **temporal** (chrono-backed) scalar rather than a
-    /// fixed-width integer, as declared by the `[temporal]` marker. It drives
-    /// two divergences:
+    /// Whether this entry's `impl ScalarType` is **hand-written** (in
+    /// `scalar_domains.rs`) rather than macro-generated. True for every marked
+    /// entry — both `[temporal]` and `[text]`: their values can't be a `const`
+    /// slice (or the integer materialiser doesn't apply) and their pivots are
+    /// explicit sentinels, so `emit_scalar_type_impls` skips them. Marked
+    /// entries also use the **pivot-presence** fixture asserts instead of the
+    /// integer-only signed-extreme ones (`::MIN`, `contains(&0)`,
+    /// `any(|v| v < 0)`), which don't typecheck for a date or string.
     ///
-    /// 1. The `impl ScalarType` for a temporal scalar is **hand-written** in
-    ///    `scalar_domains.rs` (chrono values can't be a `const` slice and the
-    ///    pivots are explicit sentinels), so `emit_scalar_type_impls` skips it.
-    /// 2. The integer-only fixture asserts (`::MIN`, `contains(&0)`,
-    ///    `any(|v| v < 0)`) don't typecheck for a date, so `scalar_fixture!`
-    ///    stamps a temporal (pivot-presence) variant instead.
-    fn is_temporal(&self) -> bool {
-        self.temporal
+    /// Renamed from the earlier `is_temporal()`: the predicate never tested
+    /// "is this a date" — it tested "does this entry diverge from the generated
+    /// integer path". `[temporal]` was simply its only member at the time. When
+    /// `[text]` joined the same category, the name was generalised to the shared
+    /// axis (hand-written impl + pivot-presence fixtures) it actually gates, so
+    /// the call sites read as intent rather than as a single-kind check.
+    fn is_hand_written(&self) -> bool {
+        self.marker.is_some()
     }
 }

@@ -158,35 +177,52 @@ fn values_const_ident(token: &Ident) -> Ident {
 /// Emit one `impl ScalarType for ` per entry. See
 /// [`emit_scalar_type_impls`].
 fn scalar_type_impls_tokens(list: &ScalarList) -> TokenStream2 {
-    // Temporal scalars hand-write their `impl ScalarType` (see `is_temporal`);
-    // only integer scalars get a macro-generated impl.
-    let impls = list.entries.iter().filter(|e| !e.is_temporal()).map(|e| {
-        let token_str = e.token.to_string();
-        let rust_type = &e.rust_type;
-        let values = values_const_ident(&e.token);
-        quote! {
-            impl ScalarType for #rust_type {
-                const PG_TYPE: &'static str = #token_str;
-
-                /// The catalog `eql_scalars::*_VALUES` list — the same values
-                /// the fixture generator encrypts, so the oracle can't drift
-                /// from the fixture.
-                fn fixture_values() -> &'static [#rust_type] {
-                    ::eql_scalars::#values
+    // Marked scalars (`[temporal]` / `[text]`) hand-write their `impl
+    // ScalarType` (see `is_hand_written`); only integer scalars get a
+    // macro-generated impl.
+    let impls = list
+        .entries
+        .iter()
+        .filter(|e| !e.is_hand_written())
+        .map(|e| {
+            let token_str = e.token.to_string();
+            let rust_type = &e.rust_type;
+            let values = values_const_ident(&e.token);
+            quote! {
+                impl ScalarType for #rust_type {
+                    const PG_TYPE: &'static str = #token_str;
+
+                    /// The catalog `eql_scalars::*_VALUES` list — the same values
+                    /// the fixture generator encrypts, so the oracle can't drift
+                    /// from the fixture.
+                    fn fixture_values() -> &'static [#rust_type] {
+                        ::eql_scalars::#values
+                    }
                 }

-                /// Integer scalars pivot on their inherent `MIN`/`MAX` consts;
-                /// the fixture lists include both (`fixtures!(int …; Min, …, Max)`).
-                fn min_pivot() -> #rust_type {
-                    <#rust_type>::MIN
+                impl OrderedScalar for #rust_type {
+                    /// Integer scalars pivot on their inherent `MIN`/`MAX` consts;
+                    /// the fixture lists include both (`fixtures!(int …; Min, …, Max)`).
+                    fn min_pivot() -> #rust_type {
+                        <#rust_type>::MIN
+                    }
+
+                    fn max_pivot() -> #rust_type {
+                        <#rust_type>::MAX
+                    }
+                    // `mid_pivot` inherits the default `Self::default()` = `0`,
+                    // which is the numeric origin and a `Zero` fixture row.
                 }

-                fn max_pivot() -> #rust_type {
-                    <#rust_type>::MAX
+                impl SignedScalar for #rust_type {
+                    /// Integers are signed about `0`; the fixtures straddle it
+                    /// (negatives below, positives above).
+                    fn origin() -> #rust_type {
+                        0
+                    }
                 }
             }
-        }
-    });
+        });
     quote! { #(#impls)* }
 }

@@ -198,19 +234,26 @@ fn scalar_fixture_modules_tokens(list: &ScalarList) -> TokenStream2 {
         let rust_type = &e.rust_type;
         let mod_ident = format_ident!("eql_v2_{}", e.token);
         let fixture_name = format!("eql_v2_{}", token_str);
-        if e.is_temporal() {
-            // Temporal scalars have no `eql_scalars::_VALUES` const (chrono
-            // is not `const`-friendly). The values come from the harness
-            // accessor (`_values()`), and the fixture stamps the
-            // `temporal` kind so the integer-only signed-extreme asserts are
-            // replaced by a pivot-presence assert. The accessor name mirrors
-            // the token (`date` -> `date_values`).
+        if let Some(marker) = e.marker {
+            // Marked (hand-written) scalars have no `eql_scalars::_VALUES`
+            // const usable by the integer materialiser (chrono is not
+            // `const`-friendly; text is owned `String`). The values come from
+            // the harness accessor (`_values()`), and the fixture stamps
+            // the kind-specific discriminator so the integer-only signed-extreme
+            // asserts are replaced by a pivot-presence assert. The `text`
+            // discriminator additionally adds the `Match` index so generated
+            // payloads carry `bf`. The accessor name mirrors the token
+            // (`date` -> `date_values`, `text` -> `text_values`).
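The generated integer impls now split pivots across `OrderedScalar` and `SignedScalar`, with `mid_pivot` defaulting to `Self::default()`. The trait definitions themselves are not in this diff. The shapes below are an assumed sketch that fits those impls. It shows why `text` has to override `mid_pivot`: `String::default()` is `""`, which the catalog deliberately leaves out of its fixtures. `pivots_present` is a hypothetical helper in the spirit of the pivot-presence fixture asserts.

```rust
/// Hypothetical trait shapes; the real definitions live outside this diff.
trait OrderedScalar: Default {
    fn min_pivot() -> Self;
    fn max_pivot() -> Self;
    /// Default interior pivot, which the generated integer impls inherit.
    fn mid_pivot() -> Self {
        Self::default()
    }
}

/// Only kinds with a numeric origin implement this.
trait SignedScalar {
    fn origin() -> Self;
}

impl OrderedScalar for i32 {
    fn min_pivot() -> i32 {
        i32::MIN
    }
    fn max_pivot() -> i32 {
        i32::MAX
    }
}

impl SignedScalar for i32 {
    fn origin() -> i32 {
        0
    }
}

/// `text` is ordered but not signed, and must override `mid_pivot`
/// because `String::default()` is "", which is not a fixture.
impl OrderedScalar for String {
    fn min_pivot() -> String {
        "aard".to_string()
    }
    fn max_pivot() -> String {
        "zzzz".to_string()
    }
    fn mid_pivot() -> String {
        "frank".to_string()
    }
}

/// Hypothetical pivot-presence check: all three pivots are fixture values.
fn pivots_present<T: OrderedScalar + PartialEq>(values: &[T]) -> bool {
    [T::min_pivot(), T::mid_pivot(), T::max_pivot()]
        .iter()
        .all(|p| values.contains(p))
}
```

Keeping `origin()` on a separate trait means a string type simply never gets the signed-extreme asserts. A runtime check for "is this kind signed" is never needed.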
             let values_fn = format_ident!("{}_values", e.token);
+            let discriminator = match marker {
+                Marker::Temporal => format_ident!("temporal"),
+                Marker::Text => format_ident!("text"),
+            };
             quote! {
-                #[doc = concat!("`eql_v2_", #token_str, "` temporal scalar fixture — generated by `scalar_types!`.")]
+                #[doc = concat!("`eql_v2_", #token_str, "` hand-written scalar fixture — generated by `scalar_types!`.")]
                 pub mod #mod_ident {
                     use crate::scalar_domains::#values_fn as values;
-                    crate::scalar_fixture!(temporal, #fixture_name, #rust_type, values());
+                    crate::scalar_fixture!(#discriminator, #fixture_name, #rust_type, values());
                 }
             }
         } else {
@@ -406,12 +449,12 @@ mod tests {
         assert!(dispatch.contains(r#""date" =>"#));
     }

-    /// Parse a single entry, asserting it parses, and return whether it is
-    /// temporal. Keeps the per-shape assertions below to one line each.
-    fn parse_entry_is_temporal(src: &str) -> bool {
+    /// Parse a single entry, asserting it parses, and return its marker.
+    /// Keeps the per-shape assertions below to one line each.
+    fn parse_entry_marker(src: &str) -> Option<Marker> {
         syn::parse_str::<ScalarEntry>(src)
             .unwrap_or_else(|e| panic!("`{src}` should parse: {e}"))
-            .is_temporal()
+            .marker
     }

     /// Parse a single entry expecting a parse error, returning the message.
@@ -425,22 +468,44 @@ mod tests {
     #[test]
     fn no_marker_is_integer() {
         // No brackets → integer, even when the type path mentions chrono:
-        // temporal-ness is declared, never inferred from the rust type.
+ assert_eq!(parse_entry_marker("int4 => i32"), None); + assert_eq!(parse_entry_marker("date => chrono::NaiveDate"), None); } #[test] fn temporal_marker_sets_the_flag() { - assert!(parse_entry_is_temporal( - "date => chrono::NaiveDate [temporal]" - )); + assert_eq!( + parse_entry_marker("date => chrono::NaiveDate [temporal]"), + Some(Marker::Temporal) + ); // Marker binds to its own entry, not the next one, across a list. let list = syn::parse_str::("date => chrono::NaiveDate [temporal], int4 => i32,") .unwrap(); - assert!(list.entries[0].is_temporal()); - assert!(!list.entries[1].is_temporal()); + assert_eq!(list.entries[0].marker, Some(Marker::Temporal)); + assert_eq!(list.entries[1].marker, None); + } + + #[test] + fn text_marker_skips_generated_impl() { + let list = syn::parse_str::("text => String [text],").unwrap(); + let out = norm(&scalar_type_impls_tokens(&list)); + assert!( + !out.contains("impl ScalarType for String"), + "text marker must skip the generated impl (hand-written instead)" + ); + } + + #[test] + fn text_marker_stamps_text_fixture_and_accessor() { + let list = syn::parse_str::("text => String [text],").unwrap(); + assert_eq!(list.entries[0].marker, Some(Marker::Text)); + let mods = norm(&scalar_fixture_modules_tokens(&list)); + assert!(mods.contains("pub mod eql_v2_text")); + // text discriminator (drives the Match index) + the text_values accessor. + assert!(mods.contains("text ,"), "got: {mods}"); + assert!(mods.contains("text_values"), "got: {mods}"); } #[test] diff --git a/docs/reference/adding-a-scalar-encrypted-domain-type.md b/docs/reference/adding-a-scalar-encrypted-domain-type.md index 2f95c20c..074836c0 100644 --- a/docs/reference/adding-a-scalar-encrypted-domain-type.md +++ b/docs/reference/adding-a-scalar-encrypted-domain-type.md @@ -22,8 +22,9 @@ The whole SQL surface is **generated** from a single Rust source of truth: the rendered by the [`eql-codegen`](../../crates/eql-codegen/) crate. 
There is no TOML manifest and no Python — adding a type is adding one `ScalarSpec` row, validated by the compiler plus catalog `#[test]`s. The reference type is -`eql_v3.int4`. **`text` and `jsonb` are out of scope** for this materializer -(see §7). +`eql_v3.int4`; `eql_v3.text` is the worked non-integer example (ordered + +equality + a `match` capability via the `Bloom` term). **`jsonb` remains out of +scope** for this materializer (see §7). --- @@ -115,10 +116,11 @@ than a runtime validator: `json_key` / `extractor` / `returns` / `ctor` values are the cross-schema SQL contract — changing one is a generated-SQL behaviour change, not a refactor: -| Term | JSON key | Extractor | Returns | Operators | -| ----- | -------- | ----------- | -------------------------------- | -------------------------- | -| `Hm` | `hm` | `eq_term` | `eql_v3.hmac_256` | `=` `<>` | -| `Ore` | `ob` | `ord_term` | `eql_v3.ore_block_u64_8_256` | `=` `<>` `<` `<=` `>` `>=` | +| Term | JSON key | Extractor | Returns | Operators | +| ------- | -------- | ------------ | -------------------------------- | -------------------------- | +| `Hm` | `hm` | `eq_term` | `eql_v3.hmac_256` | `=` `<>` | +| `Ore` | `ob` | `ord_term` | `eql_v3.ore_block_u64_8_256` | `=` `<>` `<` `<=` `>` `>=` | +| `Bloom` | `bf` | `match_term` | `eql_v3.bloom_filter` | `@>` `<@` | A type that needs a non-ORE equality term on an ordered domain needs a **new `Term`**, not a catalog flag. Adding a term is a code change to the `Term` @@ -198,16 +200,21 @@ jsonb-backed and token-driven): strings into a `LazyLock>` and exposes them via a `date_values()` accessor; `ScalarType::fixture_values()` returns a borrow of that. The fixtures must include the three pivot plaintexts verbatim — for - `date`: `"1900-01-01"` (min), `"1970-01-01"` (zero = `NaiveDate::default()`), - `"2099-12-31"` (max) — guarded by `temporal_fixtures_include_pivot_plaintexts`. 
-- **The pivot trait, not `Self::MIN`/`MAX`.** `ScalarType::fixture_values()` is a - method (not a `const`), and the comparison pivots come from - `ScalarType::min_pivot()` / `max_pivot()` (zero stays `Default::default()`). - Integer impls return `Self::MIN`/`Self::MAX` (emitted by the proc-macro); - temporal impls return explicit sentinel dates and are **hand-written** in - `scalar_domains.rs` (the macro emits only integer impls). `to_sql_literal` is - overridden to single-quote the value (`'1970-01-01'`), since a bare `Display` - date is not a valid SQL literal. + `date`: `"1900-01-01"` (min), `"1970-01-01"` (mid = the epoch), `"2099-12-31"` + (max) — guarded by `date_pivots_are_in_fixture_values`. +- **The pivot traits, not `Self::MIN`/`MAX`.** Pivots live on a small trait + hierarchy over `ScalarType` (`scalar_domains.rs`): **`OrderedScalar`** carries + the `min_pivot()` / `max_pivot()` boundaries and the interior `mid_pivot()` + (default `Self::default()`); **`SignedScalar: OrderedScalar`** adds `origin()` + (the numeric zero / sign boundary). Integer impls (`min=MIN`, `max=MAX`, + `mid` inherits `0`, `origin=0`) are emitted by the proc-macro; temporal impls + return explicit sentinel dates (`mid` inherits the epoch = `origin()`) and are + **hand-written** in `scalar_domains.rs`. `date` is both `OrderedScalar` and + `SignedScalar`; `text` is `OrderedScalar` only (lexicographic order has no + origin, so it overrides `mid_pivot()` with a real median fixture rather than + the degenerate `String::default()` empty string — see issue #262). + `to_sql_literal` is overridden to single-quote the value (`'1970-01-01'`), + since a bare `Display` date is not a valid SQL literal. - **The sqlx `chrono` feature.** The test crate enables sqlx's `chrono` feature (and depends on `chrono` directly) so `Encode`/`Decode`/`Type` resolve for `NaiveDate`. 
The integer-only fixture asserts (`::MIN`, `contains(&0)`, @@ -244,12 +251,16 @@ the `scalar_types!` list also fails the `generate_for_token` catch-all loudly at fixture-generation time. The coverage these registrations unlock comes from the `ordered_numeric_matrix!` -convention wrapper in `tests/sqlx/src/matrix.rs`: one `impl ScalarType` plus a -single invocation taking `suite`, `scalar`, and `eql_type`. The matrix derives -its comparison pivots — the scalar's `MIN`, `MAX`, and zero -(`Default::default()`) — from the type rather than a hand-written list, so the -invocation carries no pivot argument. Equality-only scalars use the sibling -`eq_only_scalar_matrix!`. The `matrix.rs` module header is the canonical, +convention wrapper in `tests/sqlx/src/matrix.rs`: one `impl ScalarType` (+ +`OrderedScalar`) plus a single invocation taking `suite`, `scalar`, and +`eql_type`. The matrix derives its comparison pivots — the scalar's +`min_pivot()`, `max_pivot()`, and the interior `mid_pivot()` — from +`OrderedScalar` rather than a hand-written list, so the invocation carries no +pivot argument. The pivot *sweep* is uniform across every ordered type (one +canonical snapshot); the signed-only sign-boundary test (`SignedScalar`, +`int`/`date`) lives outside `scalars::` in `encrypted_domain/signed.rs`, so a +`text` instantiation of it is a compile error and it never enters the inventory +snapshot. Equality-only scalars use the sibling `eq_only_scalar_matrix!`. The `matrix.rs` module header is the canonical, current list of the categories the matrix emits (sanity, correctness, cross-shape, supported-NULL, blocker raises, index engagement, ORDER BY, ORDER BY USING) — read it rather than duplicating a count here. For ordered `int4`, @@ -400,14 +411,15 @@ enables them — and the native `jsonb` operators are blocker-only. 
The wrapper/blocker split per domain (the 44-operator total never moves): -| Domain terms | Extractors | Wrappers | Blockers | Functions | Operators | -| ---------------- | ---------: | -------: | -------: | --------: | --------: | -| none | 0 | 0 | 44 | 44 | 44 | -| `&[Term::Hm]` | 1 (`eq_term`) | 6 | 38 | 45 | 44 | -| `&[Term::Ore]` | 1 (`ord_term`) | 18 | 26 | 45 | 44 | +| Domain terms | Extractors | Wrappers | Blockers | Functions | Operators | +| ----------------- | ---------: | -------: | -------: | --------: | --------: | +| none | 0 | 0 | 44 | 44 | 44 | +| `&[Term::Hm]` | 1 (`eq_term`) | 6 | 38 | 45 | 44 | +| `&[Term::Bloom]` | 1 (`match_term`) | 6 | 38 | 45 | 44 | +| `&[Term::Ore]` | 1 (`ord_term`) | 18 | 26 | 45 | 44 | -Six wrappers for `Hm` = `=` and `<>` × three shapes; eighteen for `Ore` = six -operators × three shapes. +Six wrappers for `Hm` = `=` and `<>` × three shapes; six for `Bloom` = `@>` and +`<@` × three shapes; eighteen for `Ore` = six operators × three shapes. **Untyped-literal resolver edge.** PostgreSQL's operator resolver still prefers the built-in `jsonb` operator for untyped string literals in forms such as @@ -652,14 +664,25 @@ golden reference under `tests/codegen/reference/int4/`. --- -## 7. Out of scope — `text` and `jsonb` - -`text` and `jsonb` are **not** materialised through this generator. The -`ScalarKind` enum carries `Text` / `Numeric` / `Jsonb` variants and the -`Fixture` enum carries their string-backed shapes at the capability layer, but -`CATALOG` declares only the ordered scalars today — the fixed-width integers -(`int2` / `int4` / `int8`) and the temporal `date` — so no `text` / `jsonb` SQL -surface is generated. Text and JSONB encrypted behaviour lives on the composite +## 7. `text` (in scope) and `jsonb` (out of scope) + +`text` **is** materialised through this generator. 
It is the worked example of +an ordered, non-integer, unbounded scalar: it hand-writes its `impl ScalarType` +and `impl OrderedScalar` (the `[text]` harness marker, mirroring `date`'s `[temporal]`) +with explicit lexicographic `min`/`max` pivots instead of `::MIN`/`::MAX` and a +real median `mid_pivot()`. It is **`OrderedScalar` but not `SignedScalar`** — +lexicographic text has no numeric origin, so it does not get the signed-only +sign-boundary test, and the empty string is deliberately not a fixture (`""` +encrypts to an empty ORE term; issue #262). `text` is also the first type to add +a new index `Term` (`Bloom`) — giving it a `match` capability (`@>`/`<@` +bloom-filter containment on the `eql_v3.text_match` domain) on top of equality +(`Hm`) and ordering (`Ore`). Match is deliberately **not** SQL `LIKE`: it is +probabilistic ngram-bloom containment, exposed only on `text_match`, and never +backs equality. + +`jsonb` remains **out of scope**. The `ScalarKind`/`Fixture` enums carry its +string-backed shape at the capability layer, but no `jsonb` SQL surface is +generated — it needs a separate SQL design beyond this ordered-scalar +materializer. JSONB encrypted behaviour today lives on the composite `eql_v2_encrypted` type and its hand-written operator surface in `src/encrypted/` -and `src/operators/`, not the scalar materializer. `jsonb` in particular needs a -separate SQL design beyond this ordered-scalar materializer. +and `src/operators/`, not the scalar materializer. diff --git a/src/v3/sem/bloom_filter/functions.sql b/src/v3/sem/bloom_filter/functions.sql new file mode 100644 index 00000000..9e8dbfe9 --- /dev/null +++ b/src/v3/sem/bloom_filter/functions.sql @@ -0,0 +1,49 @@ +-- REQUIRE: src/v3/schema.sql +-- REQUIRE: src/v3/sem/bloom_filter/types.sql + +--! @file v3/sem/bloom_filter/functions.sql +--! @brief Extractor for the eql_v3 Bloom-filter SEM index term. +--! +--! jsonb-only subset of src/bloom_filter/functions.sql. The encrypted-column +--!
overloads are intentionally omitted — the eql_v3 scalar domains extract from +--! the jsonb payload directly via a cast to the domain. (Doc comments +--! deliberately avoid naming eql_v2 symbols so the self-containment grep stays +--! clean.) + +--! @brief Test whether a jsonb payload carries a Bloom-filter (`bf`) term. +--! @param val jsonb The encrypted payload. +--! @return boolean True when the `bf` key is present and non-null. +CREATE FUNCTION eql_v3.has_bloom_filter(val jsonb) + RETURNS boolean + IMMUTABLE STRICT PARALLEL SAFE + SET search_path = pg_catalog, extensions, public +AS $$ + BEGIN + RETURN val ? 'bf' AND val ->> 'bf' IS NOT NULL; + END; +$$ LANGUAGE plpgsql; + +--! @brief Extract the Bloom-filter index term from a jsonb payload. +--! +--! Inlinable single-statement SQL — the planner can fold this into the calling +--! query so the functional GIN index built on `eql_v3.match_term(col)` (which +--! calls this) engages structurally. Mirrors `eql_v3.hmac_256(jsonb)`: no RAISE +--! and no pinned `search_path`. Returns NULL when `bf` is absent rather than +--! raising — the `match` capability is tied to the domain, whose CHECK already +--! guarantees `bf` is present, so a missing key can only occur on raw jsonb +--! outside the domain (where NULL, like the HMAC extractor, is the right +--! answer). An empty `bf` array yields an empty filter (contains nothing, +--! contained by everything), matching set-containment semantics. +--! +--! @param val jsonb The encrypted payload. +--! @return eql_v3.bloom_filter The `bf` array as a smallint[] domain value, or +--! NULL when `bf` is absent. +CREATE FUNCTION eql_v3.bloom_filter(val jsonb) + RETURNS eql_v3.bloom_filter + LANGUAGE sql + IMMUTABLE STRICT PARALLEL SAFE +AS $$ + SELECT CASE WHEN val ? 
'bf' + THEN ARRAY(SELECT jsonb_array_elements(val -> 'bf'))::eql_v3.bloom_filter + END +$$; diff --git a/src/v3/sem/bloom_filter/types.sql b/src/v3/sem/bloom_filter/types.sql new file mode 100644 index 00000000..b319825e --- /dev/null +++ b/src/v3/sem/bloom_filter/types.sql @@ -0,0 +1,14 @@ +-- REQUIRE: src/v3/schema.sql + +--! @file v3/sem/bloom_filter/types.sql +--! @brief Self-contained eql_v3 Bloom-filter SEM index-term type. + +--! @brief Bloom-filter index term: a bit array stored as smallint[]. +--! +--! Backs the `match` capability (`@>` / `<@`) on `eql_v3.text_match`. The +--! filter is read from the `bf` field of an encrypted jsonb payload. Native +--! `smallint[]` array-containment (`@>`/`<@`) is inherited through the domain, +--! so this type needs no custom operators. +--! +--! @note Self-contained: references no eql_v2 symbol. +CREATE DOMAIN eql_v3.bloom_filter AS smallint[]; diff --git a/src/v3/sem/ore_block_u64_8_256/functions.sql b/src/v3/sem/ore_block_u64_8_256/functions.sql index ccc6a817..4558263c 100644 --- a/src/v3/sem/ore_block_u64_8_256/functions.sql +++ b/src/v3/sem/ore_block_u64_8_256/functions.sql @@ -57,10 +57,8 @@ CREATE FUNCTION eql_v3.ore_block_u64_8_256(val jsonb) SET search_path = pg_catalog, extensions, public AS $$ BEGIN - IF val IS NULL THEN - RETURN NULL; - END IF; - + -- Declared STRICT: PostgreSQL returns NULL for a NULL argument without + -- entering the body, so no explicit `val IS NULL` guard is needed. IF eql_v3.has_ore_block_u64_8_256(val) THEN RETURN eql_v3.jsonb_array_to_ore_block_u64_8_256(val->'ob'); END IF; diff --git a/tasks/pin_search_path.sql b/tasks/pin_search_path.sql index 75eed567..66fb3008 100644 --- a/tasks/pin_search_path.sql +++ b/tasks/pin_search_path.sql @@ -254,8 +254,8 @@ BEGIN -- reaches functional-index matching if these inner functions stay inlinable -- (no SET, IMMUTABLE). 
The generated extractors/wrappers themselves are -- spared by the jsonb-DOMAIN structural skip below; these SEM functions take - -- a composite (ore_block) or raw jsonb (hmac_256) arg, so they need an - -- explicit entry here. + -- a composite (ore_block) or raw jsonb (hmac_256, bloom_filter) arg, so they + -- need an explicit entry here. n.nspname = 'eql_v3' AND ( (p.pronargs = 2 @@ -265,6 +265,9 @@ BEGIN OR (p.pronargs = 1 AND p.proname = 'hmac_256' AND p.proargtypes[0] = jsonb_oid) + OR (p.pronargs = 1 + AND p.proname = 'bloom_filter' + AND p.proargtypes[0] = jsonb_oid) ) ); diff --git a/tasks/test/splinter.sh b/tasks/test/splinter.sh index 282b485b..cdb5d2f5 100755 --- a/tasks/test/splinter.sh +++ b/tasks/test/splinter.sh @@ -109,6 +109,9 @@ function_search_path_mutable eql_v2 grouped_value function Aggregate: same as mi # tasks/pin_search_path.sql and do not surface here. function_search_path_mutable eql_v3 eq_term function HMAC equality term extractor for the eql_v3 *_eq domains: returns eql_v3.hmac_256. Must inline so `eql_v3.eq_term(col)` folds into the calling query and matches the functional hash/btree index built on the same expression. SET search_path would disable SQL function inlining (see PostgreSQL inline_function). function_search_path_mutable eql_v3 ord_term function ORE-block order term extractor for the eql_v3 ordered domains: returns eql_v3.ore_block_u64_8_256 (carrying the main DEFAULT btree opclass). Used inside the inlinable comparison wrappers and as the functional-index expression USING btree (eql_v3.ord_term(col)); must inline. Covers both ord_term overloads (eql_v3.int4_ord, eql_v3.int4_ord_ore). +function_search_path_mutable eql_v3 match_term function Bloom-filter match term extractor for the eql_v3 *_match domains: returns eql_v3.bloom_filter. Used inside the inlinable @>/<@ containment wrappers and as the functional-index expression USING gin (eql_v3.match_term(col)); must inline so the GIN index engages. 
SET search_path would disable SQL function inlining. +function_search_path_mutable eql_v3 contains function Containment (@>) comparison wrapper on the eql_v3 *_match domains. Inlines to `match_term(a) @> match_term(b)`; must reach the functional GIN index on eql_v3.match_term(col) for bloom-filter match to engage Bitmap Index Scan. +function_search_path_mutable eql_v3 contained_by function Contained-by (<@) comparison wrapper on the eql_v3 *_match domains. Same rationale as eql_v3.contains. function_search_path_mutable eql_v3 eq function Equality comparison wrapper on the eql_v3 domains. Inlines to `eq_term(a) = eq_term(b)`; must reach the functional index on eql_v3.eq_term(col) for bare-form equality to engage Index Scan. Covers the converged eq wrappers on the eql_v3 int4 variants. function_search_path_mutable eql_v3 neq function Inequality comparison wrapper on the eql_v3 domains. Same rationale as eql_v3.eq. function_search_path_mutable eql_v3 lt function Less-than comparison wrapper on the eql_v3 ordered domains. Inlines to `ord_term(a) < ord_term(b)`; must reach the functional btree index on eql_v3.ord_term(col) for range queries to engage Index Scan. @@ -124,6 +127,7 @@ function_search_path_mutable eql_v3 ore_block_u64_8_256_lte function Inner compa function_search_path_mutable eql_v3 ore_block_u64_8_256_gt function Inner comparator for the eql_v3 ore_block_u64_8_256 `>` operator. Same rationale as eql_v3.ore_block_u64_8_256_eq. function_search_path_mutable eql_v3 ore_block_u64_8_256_gte function Inner comparator for the eql_v3 ore_block_u64_8_256 `>=` operator. Same rationale as eql_v3.ore_block_u64_8_256_eq. function_search_path_mutable eql_v3 hmac_256 function HMAC equality extractor for the eql_v3 SEM fork: inlinable SQL (jsonb) constructor used inside eql_v3.eq_term. Must inline so the functional hash/btree index on eql_v3.eq_term(col) engages. Mirrors eql_v2.hmac_256. 
+function_search_path_mutable eql_v3 bloom_filter function Bloom-filter match extractor for the eql_v3 SEM fork: inlinable SQL (jsonb) constructor used inside eql_v3.match_term. Must inline so the functional GIN index on eql_v3.match_term(col) engages. Mirrors eql_v3.hmac_256. function_search_path_mutable eql_v3 jsonb_array_to_bytea_array function Hand-written jsonb→bytea[] helper for the eql_v3 SEM fork: inlinable SQL (no SET, IMMUTABLE). Reached per-encrypted-value through eql_v3.ore_block_u64_8_256; must inline so the planner can fold it into the calling query. Pinned by neither the structural skip (it takes bare jsonb, not a jsonb-backed domain) nor an inline-critical OID clause — it carries the documented `eql-inline-critical` COMMENT marker that tasks/pin_search_path.sql honours. The eql_v2 copy stays plpgsql (pinned) by design. function_search_path_mutable eql_v3 jsonb_array_to_ore_block_u64_8_256 function Hand-written jsonb→ore_block composite helper for the eql_v3 SEM fork: inlinable SQL (no SET, IMMUTABLE). Same rationale as eql_v3.jsonb_array_to_bytea_array — reached per-encrypted-value through eql_v3.ore_block_u64_8_256, carries the `eql-inline-critical` COMMENT marker. The eql_v2 copy stays plpgsql (pinned) by design. ALLOW diff --git a/tests/sqlx/snapshots/README.md b/tests/sqlx/snapshots/README.md index ed363608..dd4715ad 100644 --- a/tests/sqlx/snapshots/README.md +++ b/tests/sqlx/snapshots/README.md @@ -12,6 +12,14 @@ modulo the type token (the matrix tests are macro-generated from one single canonical set plus a per-type normalize-and-compare carries the same signal at a fraction of the committed surface. +The "no per-type variation" property is preserved by design: every ordered +scalar sweeps the same three `OrderedScalar` pivots (`min`/`mid`/`max`), so the +`_pivot_mid_*` arms are identical modulo token across `int`/`date`/`text`. 
The +**signed-only** sign-boundary test (`SignedScalar`, `int`/`date` only) lives +*outside* the `scalars::::` namespace (in `encrypted_domain/signed.rs`, +mirroring the `text_match` suites), so it is deliberately invisible to this +inventory — keeping one canonical set rather than per-capability snapshots. + ## What it guards The SQLx assertions verify that the tests which run produce the right results. diff --git a/tests/sqlx/snapshots/matrix_tests.txt b/tests/sqlx/snapshots/matrix_tests.txt index 2cdfc22b..46cb3926 100644 --- a/tests/sqlx/snapshots/matrix_tests.txt +++ b/tests/sqlx/snapshots/matrix_tests.txt @@ -7,10 +7,10 @@ scalars::::matrix__eq_count_path_cast scalars::::matrix__eq_count_typed_column scalars::::matrix__eq_eq_pivot_max_correctness scalars::::matrix__eq_eq_pivot_max_cross_shape +scalars::::matrix__eq_eq_pivot_mid_correctness +scalars::::matrix__eq_eq_pivot_mid_cross_shape scalars::::matrix__eq_eq_pivot_min_correctness scalars::::matrix__eq_eq_pivot_min_cross_shape -scalars::::matrix__eq_eq_pivot_zero_correctness -scalars::::matrix__eq_eq_pivot_zero_cross_shape scalars::::matrix__eq_eq_supported_null scalars::::matrix__eq_gt_blocker scalars::::matrix__eq_gte_blocker @@ -21,10 +21,10 @@ scalars::::matrix__eq_lte_blocker scalars::::matrix__eq_native_absent_ops scalars::::matrix__eq_neq_pivot_max_correctness scalars::::matrix__eq_neq_pivot_max_cross_shape +scalars::::matrix__eq_neq_pivot_mid_correctness +scalars::::matrix__eq_neq_pivot_mid_cross_shape scalars::::matrix__eq_neq_pivot_min_correctness scalars::::matrix__eq_neq_pivot_min_cross_shape -scalars::::matrix__eq_neq_pivot_zero_correctness -scalars::::matrix__eq_neq_pivot_zero_cross_shape scalars::::matrix__eq_neq_supported_null scalars::::matrix__eq_path_op_blockers scalars::::matrix__eq_payload_check @@ -50,47 +50,47 @@ scalars::::matrix__ord_count_path_cast scalars::::matrix__ord_count_typed_column scalars::::matrix__ord_eq_pivot_max_correctness 
scalars::::matrix__ord_eq_pivot_max_cross_shape +scalars::::matrix__ord_eq_pivot_mid_correctness +scalars::::matrix__ord_eq_pivot_mid_cross_shape scalars::::matrix__ord_eq_pivot_min_correctness scalars::::matrix__ord_eq_pivot_min_cross_shape -scalars::::matrix__ord_eq_pivot_zero_correctness -scalars::::matrix__ord_eq_pivot_zero_cross_shape scalars::::matrix__ord_eq_supported_null scalars::::matrix__ord_gt_pivot_max_correctness scalars::::matrix__ord_gt_pivot_max_cross_shape +scalars::::matrix__ord_gt_pivot_mid_correctness +scalars::::matrix__ord_gt_pivot_mid_cross_shape scalars::::matrix__ord_gt_pivot_min_correctness scalars::::matrix__ord_gt_pivot_min_cross_shape -scalars::::matrix__ord_gt_pivot_zero_correctness -scalars::::matrix__ord_gt_pivot_zero_cross_shape scalars::::matrix__ord_gt_supported_null scalars::::matrix__ord_gte_pivot_max_correctness scalars::::matrix__ord_gte_pivot_max_cross_shape +scalars::::matrix__ord_gte_pivot_mid_correctness +scalars::::matrix__ord_gte_pivot_mid_cross_shape scalars::::matrix__ord_gte_pivot_min_correctness scalars::::matrix__ord_gte_pivot_min_cross_shape -scalars::::matrix__ord_gte_pivot_zero_correctness -scalars::::matrix__ord_gte_pivot_zero_cross_shape scalars::::matrix__ord_gte_supported_null scalars::::matrix__ord_index_engages_btree scalars::::matrix__ord_lt_pivot_max_correctness scalars::::matrix__ord_lt_pivot_max_cross_shape +scalars::::matrix__ord_lt_pivot_mid_correctness +scalars::::matrix__ord_lt_pivot_mid_cross_shape scalars::::matrix__ord_lt_pivot_min_correctness scalars::::matrix__ord_lt_pivot_min_cross_shape -scalars::::matrix__ord_lt_pivot_zero_correctness -scalars::::matrix__ord_lt_pivot_zero_cross_shape scalars::::matrix__ord_lt_supported_null scalars::::matrix__ord_lte_pivot_max_correctness scalars::::matrix__ord_lte_pivot_max_cross_shape +scalars::::matrix__ord_lte_pivot_mid_correctness +scalars::::matrix__ord_lte_pivot_mid_cross_shape scalars::::matrix__ord_lte_pivot_min_correctness 
scalars::::matrix__ord_lte_pivot_min_cross_shape -scalars::::matrix__ord_lte_pivot_zero_correctness -scalars::::matrix__ord_lte_pivot_zero_cross_shape scalars::::matrix__ord_lte_supported_null scalars::::matrix__ord_native_absent_ops scalars::::matrix__ord_neq_pivot_max_correctness scalars::::matrix__ord_neq_pivot_max_cross_shape +scalars::::matrix__ord_neq_pivot_mid_correctness +scalars::::matrix__ord_neq_pivot_mid_cross_shape scalars::::matrix__ord_neq_pivot_min_correctness scalars::::matrix__ord_neq_pivot_min_cross_shape -scalars::::matrix__ord_neq_pivot_zero_correctness -scalars::::matrix__ord_neq_pivot_zero_cross_shape scalars::::matrix__ord_neq_supported_null scalars::::matrix__ord_ord_routes_through_ob scalars::::matrix__ord_order_by_asc_no_where @@ -123,47 +123,47 @@ scalars::::matrix__ord_ore_count_path_cast scalars::::matrix__ord_ore_count_typed_column scalars::::matrix__ord_ore_eq_pivot_max_correctness scalars::::matrix__ord_ore_eq_pivot_max_cross_shape +scalars::::matrix__ord_ore_eq_pivot_mid_correctness +scalars::::matrix__ord_ore_eq_pivot_mid_cross_shape scalars::::matrix__ord_ore_eq_pivot_min_correctness scalars::::matrix__ord_ore_eq_pivot_min_cross_shape -scalars::::matrix__ord_ore_eq_pivot_zero_correctness -scalars::::matrix__ord_ore_eq_pivot_zero_cross_shape scalars::::matrix__ord_ore_eq_supported_null scalars::::matrix__ord_ore_gt_pivot_max_correctness scalars::::matrix__ord_ore_gt_pivot_max_cross_shape +scalars::::matrix__ord_ore_gt_pivot_mid_correctness +scalars::::matrix__ord_ore_gt_pivot_mid_cross_shape scalars::::matrix__ord_ore_gt_pivot_min_correctness scalars::::matrix__ord_ore_gt_pivot_min_cross_shape -scalars::::matrix__ord_ore_gt_pivot_zero_correctness -scalars::::matrix__ord_ore_gt_pivot_zero_cross_shape scalars::::matrix__ord_ore_gt_supported_null scalars::::matrix__ord_ore_gte_pivot_max_correctness scalars::::matrix__ord_ore_gte_pivot_max_cross_shape +scalars::::matrix__ord_ore_gte_pivot_mid_correctness 
+scalars::::matrix__ord_ore_gte_pivot_mid_cross_shape scalars::::matrix__ord_ore_gte_pivot_min_correctness scalars::::matrix__ord_ore_gte_pivot_min_cross_shape -scalars::::matrix__ord_ore_gte_pivot_zero_correctness -scalars::::matrix__ord_ore_gte_pivot_zero_cross_shape scalars::::matrix__ord_ore_gte_supported_null scalars::::matrix__ord_ore_index_engages_btree scalars::::matrix__ord_ore_lt_pivot_max_correctness scalars::::matrix__ord_ore_lt_pivot_max_cross_shape +scalars::::matrix__ord_ore_lt_pivot_mid_correctness +scalars::::matrix__ord_ore_lt_pivot_mid_cross_shape scalars::::matrix__ord_ore_lt_pivot_min_correctness scalars::::matrix__ord_ore_lt_pivot_min_cross_shape -scalars::::matrix__ord_ore_lt_pivot_zero_correctness -scalars::::matrix__ord_ore_lt_pivot_zero_cross_shape scalars::::matrix__ord_ore_lt_supported_null scalars::::matrix__ord_ore_lte_pivot_max_correctness scalars::::matrix__ord_ore_lte_pivot_max_cross_shape +scalars::::matrix__ord_ore_lte_pivot_mid_correctness +scalars::::matrix__ord_ore_lte_pivot_mid_cross_shape scalars::::matrix__ord_ore_lte_pivot_min_correctness scalars::::matrix__ord_ore_lte_pivot_min_cross_shape -scalars::::matrix__ord_ore_lte_pivot_zero_correctness -scalars::::matrix__ord_ore_lte_pivot_zero_cross_shape scalars::::matrix__ord_ore_lte_supported_null scalars::::matrix__ord_ore_native_absent_ops scalars::::matrix__ord_ore_neq_pivot_max_correctness scalars::::matrix__ord_ore_neq_pivot_max_cross_shape +scalars::::matrix__ord_ore_neq_pivot_mid_correctness +scalars::::matrix__ord_ore_neq_pivot_mid_cross_shape scalars::::matrix__ord_ore_neq_pivot_min_correctness scalars::::matrix__ord_ore_neq_pivot_min_cross_shape -scalars::::matrix__ord_ore_neq_pivot_zero_correctness -scalars::::matrix__ord_ore_neq_pivot_zero_cross_shape scalars::::matrix__ord_ore_neq_supported_null scalars::::matrix__ord_ore_ord_routes_through_ob scalars::::matrix__ord_ore_order_by_asc_no_where diff --git a/tests/sqlx/src/fixtures/driver.rs 
b/tests/sqlx/src/fixtures/driver.rs index 127e20a1..96c0411d 100644 --- a/tests/sqlx/src/fixtures/driver.rs +++ b/tests/sqlx/src/fixtures/driver.rs @@ -32,7 +32,7 @@ use super::spec::FixtureSpec; /// already satisfies the bounds. pub trait FixtureValue: EqlPlaintext - + Copy + + Clone + Send + Sync + for<'q> sqlx::Encode<'q, sqlx::Postgres> @@ -42,7 +42,7 @@ pub trait FixtureValue: impl FixtureValue for T where T: EqlPlaintext - + Copy + + Clone + Send + Sync + for<'q> sqlx::Encode<'q, sqlx::Postgres> @@ -213,7 +213,7 @@ where let id = (i as i64) + 1; sqlx::query(&insert) .bind(id) - .bind(*value) + .bind(value.clone()) .bind(sqlx::types::Json(payload)) .execute(&mut *direct) .await diff --git a/tests/sqlx/src/fixtures/eql_plaintext.rs b/tests/sqlx/src/fixtures/eql_plaintext.rs index 65c9f43e..004a4c13 100644 --- a/tests/sqlx/src/fixtures/eql_plaintext.rs +++ b/tests/sqlx/src/fixtures/eql_plaintext.rs @@ -57,6 +57,7 @@ impl PlaintextSqlType { pub const SMALLINT: PlaintextSqlType = PlaintextSqlType("smallint"); pub const BIGINT: PlaintextSqlType = PlaintextSqlType("bigint"); pub const DATE: PlaintextSqlType = PlaintextSqlType("date"); + pub const TEXT: PlaintextSqlType = PlaintextSqlType("text"); pub fn as_str(&self) -> &'static str { self.0 @@ -71,16 +72,18 @@ impl fmt::Display for PlaintextSqlType { /// The EQL `cast_as` for a scalar kind, drawn from the `Cast` allowlist. /// -/// Only the wired kinds (the integer kinds plus `Date`) have `EqlPlaintext` -/// impls, so only those resolve; the remaining kinds mirror the `eql_scalars` -/// accessor convention and `panic!`, since no impl can ever reach them. +/// Only the wired kinds (the integer kinds, `Text`, plus `Date`) have +/// `EqlPlaintext` impls, so only those resolve; the remaining kinds mirror the +/// `eql_scalars` accessor convention and `panic!`, since no impl can ever reach +/// them. 
const fn cast_for_kind(kind: ScalarKind) -> Cast { match kind { ScalarKind::I32 => Cast::INT, ScalarKind::I16 => Cast::SMALL_INT, ScalarKind::I64 => Cast::BIG_INT, ScalarKind::Date => Cast::DATE, - ScalarKind::Numeric | ScalarKind::Text | ScalarKind::Jsonb => { + ScalarKind::Text => Cast::TEXT, + ScalarKind::Numeric | ScalarKind::Jsonb => { panic!("EqlPlaintext is only implemented for the wired scalar kinds") } } @@ -88,14 +91,15 @@ const fn cast_for_kind(kind: ScalarKind) -> Cast { /// The `plaintext` oracle column SQL type for a scalar kind, drawn from the /// `PlaintextSqlType` allowlist. As with `cast_for_kind`, only the wired kinds -/// (integers plus `Date`) resolve. +/// (integers, `Text`, plus `Date`) resolve. const fn plaintext_sql_type_for_kind(kind: ScalarKind) -> PlaintextSqlType { match kind { ScalarKind::I32 => PlaintextSqlType::INTEGER, ScalarKind::I16 => PlaintextSqlType::SMALLINT, ScalarKind::I64 => PlaintextSqlType::BIGINT, ScalarKind::Date => PlaintextSqlType::DATE, - ScalarKind::Numeric | ScalarKind::Text | ScalarKind::Jsonb => { + ScalarKind::Text => PlaintextSqlType::TEXT, + ScalarKind::Numeric | ScalarKind::Jsonb => { panic!("EqlPlaintext is only implemented for the wired scalar kinds") } } @@ -107,6 +111,7 @@ mod sealed { impl Sealed for i16 {} impl Sealed for i64 {} impl Sealed for chrono::NaiveDate {} + impl Sealed for String {} } /// A Rust type usable as a fixture `plaintext` value, carrying its EQL cast @@ -166,6 +171,14 @@ impl EqlPlaintext for chrono::NaiveDate { } } +impl EqlPlaintext for String { + const KIND: ScalarKind = ScalarKind::Text; + + fn to_plaintext(&self) -> Plaintext { + Plaintext::Text(Some(self.clone())) + } +} + #[cfg(test)] mod tests { use super::*; @@ -259,4 +272,25 @@ mod tests { other => panic!("expected Plaintext::NaiveDate(Some(1970-01-01)), got {other:?}"), } } + + #[test] + fn string_cast_is_text() { + assert_eq!(::CAST, Cast::TEXT); + } + + #[test] + fn string_plaintext_sql_type_is_text() { + assert_eq!( + 
<String as EqlPlaintext>::PLAINTEXT_SQL_TYPE, + PlaintextSqlType::TEXT + ); + } + + #[test] + fn string_to_plaintext_is_text() { + // A String must lift into the Text variant so the fixture driver + // encrypts it under the `text` cast. + let p = "hi".to_string().to_plaintext(); + assert!(matches!(p, Plaintext::Text(Some(ref s)) if s == "hi")); + } } diff --git a/tests/sqlx/src/fixtures/scalar_fixture.rs b/tests/sqlx/src/fixtures/scalar_fixture.rs index d6a955ab..6efcfa3d 100644 --- a/tests/sqlx/src/fixtures/scalar_fixture.rs +++ b/tests/sqlx/src/fixtures/scalar_fixture.rs @@ -13,28 +13,34 @@ /// Stamp out the `spec()` builder, the `fixture-gen` generator test, and the /// property-test module for a scalar fixture. /// -/// The leading **kind** discriminator (`int` / `temporal`) selects which -/// property asserts are stamped — the rest of the expansion is identical: +/// The leading **kind** discriminator (`int` / `temporal` / `text`) selects +/// which property asserts are stamped and which index set the fixture declares +/// — the rest of the expansion is identical: /// /// - `int` — signed-extreme asserts (`<$ty>::MIN`/`MAX`, `contains(&0)`, -/// `any(|v| v < 0)`). These typecheck only for integer plaintexts. +/// `any(|v| v < 0)`). These typecheck only for integer plaintexts. Indexes +/// `Unique` + `Ore`. /// - `temporal` — a pivot-presence assert (`min_pivot`/`max_pivot`/zero from the /// `ScalarType` impl all appear in the values). `<$ty>::MIN` / `< 0` don't /// exist for a `chrono::NaiveDate`, so the integer asserts can't be reused. +/// Indexes `Unique` + `Ore`. +/// - `text` — pivot-presence asserts (same as `temporal`; text has no signed +/// extremes), plus a third `Match` index so generated payloads carry `bf` for +/// the `text_match` containment surface. Indexes `Unique` + `Ore` + `Match`. /// /// - `$name` — the fixture name (`"eql_v2_int2"`), drives every derived path. -/// - `$ty` — the Rust plaintext type (`i16` / `chrono::NaiveDate`).
+/// - `$ty` — the Rust plaintext type (`i16` / `chrono::NaiveDate` / `String`). /// - `$values` — the value source: the catalog const (`eql_scalars::INT2_VALUES`) -/// for integers, or the harness accessor (`date_values()`) for temporal. +/// for integers, or the harness accessor (`date_values()` / `text_values()`). /// -/// Indexes are fixed to `Unique` (HMAC, drives `=` / `<>`) and `Ore` (ORE -/// block terms, drives `<` `<=` `>` `>=`) with a committed `jsonb` payload — -/// the shape shared by every ordered scalar domain. +/// `Unique` drives `=` / `<>` (HMAC); `Ore` drives `<` `<=` `>` `>=` (ORE block +/// terms); `Match` drives `@>` / `<@` (bloom filter). The committed payload is +/// always `jsonb`. #[macro_export] macro_rules! scalar_fixture { // Integer scalars: signed-extreme property asserts. (int, $name:literal, $ty:ty, $values:expr $(,)?) => { - $crate::scalar_fixture!(@common $name, $ty, $values); + $crate::scalar_fixture!(@common $name, $ty, $values, [Unique, Ore]); #[cfg(test)] mod tests { @@ -73,12 +79,12 @@ macro_rules! scalar_fixture { // Temporal scalars: pivot-presence property assert (no signed extremes). (temporal, $name:literal, $ty:ty, $values:expr $(,)?) => { - $crate::scalar_fixture!(@common $name, $ty, $values); + $crate::scalar_fixture!(@common $name, $ty, $values, [Unique, Ore]); #[cfg(test)] mod tests { use super::*; - use $crate::scalar_domains::ScalarType; + use $crate::scalar_domains::OrderedScalar; #[test] fn spec_is_complete() { @@ -87,29 +93,60 @@ macro_rules! scalar_fixture { #[test] fn spec_includes_pivots() { - // The three matrix pivots (min/max/zero) must be present in the + // The three matrix pivots (min/mid/max) must be present in the // fixture — `fetch_fixture_payload` fetches each at test time. 
let spec = spec(); let values = spec.values(); - let min = <$ty as ScalarType>::min_pivot(); - let max = <$ty as ScalarType>::max_pivot(); - let zero: $ty = ::core::default::Default::default(); + let min = <$ty as OrderedScalar>::min_pivot(); + let mid = <$ty as OrderedScalar>::mid_pivot(); + let max = <$ty as OrderedScalar>::max_pivot(); assert!(values.contains(&min), "spec must include min_pivot {min:?}"); + assert!(values.contains(&mid), "spec must include mid_pivot {mid:?}"); assert!(values.contains(&max), "spec must include max_pivot {max:?}"); - assert!(values.contains(&zero), "spec must include zero pivot {zero:?}"); } } }; - // Shared expansion: the `spec()` builder + the gated generator test. - (@common $name:literal, $ty:ty, $values:expr) => { + // Text scalars: pivot-presence asserts (like temporal) + the `Match` index + // so generated payloads carry `bf` for the `text_match` containment surface. + (text, $name:literal, $ty:ty, $values:expr $(,)?) => { + $crate::scalar_fixture!(@common $name, $ty, $values, [Unique, Ore, Match]); + + #[cfg(test)] + mod tests { + use super::*; + use $crate::scalar_domains::OrderedScalar; + + #[test] + fn spec_is_complete() { + assert!(spec().check_complete().is_ok()); + } + + #[test] + fn spec_includes_pivots() { + // text has no signed extremes; assert the OrderedScalar pivots + // (min/mid/max) are present, like the temporal arm. + let spec = spec(); + let values = spec.values(); + let min = <$ty as OrderedScalar>::min_pivot(); + let mid = <$ty as OrderedScalar>::mid_pivot(); + let max = <$ty as OrderedScalar>::max_pivot(); + assert!(values.contains(&min), "spec must include min_pivot {min:?}"); + assert!(values.contains(&mid), "spec must include mid_pivot {mid:?}"); + assert!(values.contains(&max), "spec must include max_pivot {max:?}"); + } + } + }; + + // Shared expansion: the `spec()` builder + the gated generator test. The + // trailing `[Unique, Ore, ...]` token list parametrizes the index set. 
+ (@common $name:literal, $ty:ty, $values:expr, [$($ix:ident),+ $(,)?]) => { /// The complete fixture definition. `IndexKind::Unique` drives `=` / /// `<>` (HMAC); `IndexKind::Ore` drives `<` `<=` `>` `>=` (ORE block - /// terms). + /// terms); `IndexKind::Match` (when present) drives `@>` / `<@` (bloom). pub fn spec() -> $crate::fixtures::FixtureSpec<'static, $ty> { $crate::fixtures::FixtureSpec::new($name) - .with_index($crate::fixtures::IndexKind::Unique) - .with_index($crate::fixtures::IndexKind::Ore) + $(.with_index($crate::fixtures::IndexKind::$ix))+ .with_column_type("jsonb") .with_values($values) } diff --git a/tests/sqlx/src/matrix.rs b/tests/sqlx/src/matrix.rs index 2fdb9e5b..0ce2a7eb 100644 --- a/tests/sqlx/src/matrix.rs +++ b/tests/sqlx/src/matrix.rs @@ -149,11 +149,12 @@ fn collect_index_scan_nodes(value: &serde_json::Value, found: &mut Vec<(String, /// type (see `scalar_domains.rs`, `format!("eql_v3.{}…", T::PG_TYPE)`). /// /// Pivots — the comparison anchors swept by the correctness / cross-shape -/// arms — are derived from the scalar type: `min_pivot()`, `max_pivot()`, and -/// zero (`Default::default()`). Integer scalars resolve `min_pivot`/`max_pivot` -/// to `Self::MIN`/`Self::MAX`; temporal scalars use explicit sentinel dates. The -/// fixture must contain those three plaintext rows, since each pivot's -/// ciphertext is fetched at test time via `fetch_fixture_payload`. +/// arms — are the `OrderedScalar` anchors: `min_pivot()`, `max_pivot()`, and the +/// interior `mid_pivot()`. Integer scalars resolve `min`/`max` to +/// `Self::MIN`/`Self::MAX` and `mid` to the origin `0`; temporal scalars use +/// explicit sentinel dates (`mid` = the epoch); `text` uses a real median +/// fixture for `mid`. The fixture must contain those three plaintext rows, since +/// each pivot's ciphertext is fetched at test time via `fetch_fixture_payload`. #[macro_export] macro_rules! ordered_numeric_matrix { ( @@ -176,9 +177,9 @@ macro_rules! 
ordered_numeric_matrix { ord_domains = [(ord, Ord), (ord_ore, OrdOre)], ord_ore_domains = [(ord_ore, OrdOre)], pivots = [ - (min, <$scalar as $crate::scalar_domains::ScalarType>::min_pivot()), - (max, <$scalar as $crate::scalar_domains::ScalarType>::max_pivot()), - (zero, <$scalar as ::core::default::Default>::default()), + (min, <$scalar as $crate::scalar_domains::OrderedScalar>::min_pivot()), + (max, <$scalar as $crate::scalar_domains::OrderedScalar>::max_pivot()), + (mid, <$scalar as $crate::scalar_domains::OrderedScalar>::mid_pivot()), ], eq_ops = [(eq, "="), (neq, "<>")], ord_ops = [(lt, "<"), (lte, "<="), (gt, ">"), (gte, ">=")], @@ -593,7 +594,7 @@ macro_rules! __scalar_matrix_correctness_case { let spec = $crate::__scalar_matrix_spec!($scalar, $variant); let pivot: $scalar = $pivot_val; let payload = - $crate::scalar_domains::fetch_fixture_payload::<$scalar>(&pool, pivot).await?; + $crate::scalar_domains::fetch_fixture_payload::<$scalar>(&pool, pivot.clone()).await?; let lit = $crate::scalar_domains::sql_string_literal(&payload); let predicate = format!( "payload::{d} {op} {lit}::jsonb::{d}", @@ -634,13 +635,13 @@ macro_rules! 
__scalar_matrix_cross_shape_case { let spec = $crate::__scalar_matrix_spec!($scalar, $variant); let pivot: $scalar = $pivot_val; let payload = - $crate::scalar_domains::fetch_fixture_payload::<$scalar>(&pool, pivot).await?; + $crate::scalar_domains::fetch_fixture_payload::<$scalar>(&pool, pivot.clone()).await?; let lit = $crate::scalar_domains::sql_string_literal(&payload); let forward_count = - <$scalar as $crate::scalar_domains::ScalarType>::expected_forward($op, pivot) + <$scalar as $crate::scalar_domains::ScalarType>::expected_forward($op, pivot.clone()) .len() as i64; let commuted_count = <$scalar as $crate::scalar_domains::ScalarType>::expected_forward( - $crate::scalar_domains::commute_op($op), pivot, + $crate::scalar_domains::commute_op($op), pivot.clone(), ).len() as i64; let d = &spec.sql_domain; let shapes = [ @@ -1183,8 +1184,8 @@ macro_rules! __scalar_matrix_scale_case { let values: &[$scalar] = <$scalar as ScalarType>::fixture_values(); anyhow::ensure!(values.len() >= 2, "scale test requires >= 2 fixture rows for distinct filler/pivot"); - let filler = values[0]; - let pivot = values[values.len() / 2]; + let filler = values[0].clone(); + let pivot = values[values.len() / 2].clone(); let filler_payload = $crate::scalar_domains::fetch_fixture_payload::<$scalar>(&pool, filler).await?; let pivot_payload = @@ -1281,8 +1282,8 @@ macro_rules! __scalar_matrix_scale_default_case { let values: &[$scalar] = <$scalar as ScalarType>::fixture_values(); anyhow::ensure!(values.len() >= 2, "scale test requires >= 2 fixture rows for distinct filler/pivot"); - let filler = values[0]; - let pivot = values[values.len() / 2]; + let filler = values[0].clone(); + let pivot = values[values.len() / 2].clone(); let filler_payload = $crate::scalar_domains::fetch_fixture_payload::<$scalar>(&pool, filler).await?; let pivot_payload = @@ -1395,7 +1396,7 @@ macro_rules! 
__scalar_matrix_fixture_shape { // Value-filtering oracle: take the midpoint of FIXTURE_VALUES, // derive its expected id from position, assert exactly one row. if !expected.is_empty() { - let probe = expected[expected.len() / 2]; + let probe = &expected[expected.len() / 2]; let probe_lit = <$scalar as ScalarType>::to_sql_literal(probe); let expected_id = (expected.len() / 2 + 1) as i64; let ids: Vec = sqlx::query_scalar(&format!( @@ -1458,9 +1459,9 @@ macro_rules! __scalar_matrix_ord_routes_case { let fixture_table = <$scalar as $crate::scalar_domains::ScalarType>::fixture_table_name(); let pivot: $scalar = - <$scalar as $crate::scalar_domains::ScalarType>::fixture_values()[0]; + <$scalar as $crate::scalar_domains::ScalarType>::fixture_values()[0].clone(); let pivot_lit = - <$scalar as $crate::scalar_domains::ScalarType>::to_sql_literal(pivot); + <$scalar as $crate::scalar_domains::ScalarType>::to_sql_literal(&pivot); let mut tx = pool.begin().await?; sqlx::query(&format!( @@ -1673,7 +1674,7 @@ macro_rules! __scalar_matrix_index_case { .execute(&mut *tx).await?; sqlx::query("SET LOCAL enable_seqscan = off").execute(&mut *tx).await?; - let pivot: $scalar = <$scalar as $crate::scalar_domains::ScalarType>::fixture_values()[0]; + let pivot: $scalar = <$scalar as $crate::scalar_domains::ScalarType>::fixture_values()[0].clone(); let payload = $crate::scalar_domains::fetch_fixture_payload::<$scalar>(&pool, pivot).await?; let lit = $crate::scalar_domains::sql_string_literal(&payload); @@ -1758,12 +1759,12 @@ macro_rules! __scalar_matrix_order_by_domain { $crate::__scalar_matrix_order_by_case! { suite = $suite, scalar = $scalar, script = $script, script_path = $script_path, dom_name = $dom_name, variant = $variant, - mode_name = asc_with_where, direction = "ASC", filter = gt_zero, + mode_name = asc_with_where, direction = "ASC", filter = gt_mid, } $crate::__scalar_matrix_order_by_case! 
{ suite = $suite, scalar = $scalar, script = $script, script_path = $script_path, dom_name = $dom_name, variant = $variant, - mode_name = desc_with_where, direction = "DESC", filter = gt_zero, + mode_name = desc_with_where, direction = "DESC", filter = gt_mid, } }; } @@ -1782,18 +1783,19 @@ macro_rules! __scalar_matrix_order_by_case { async fn []( pool: sqlx::PgPool, ) -> anyhow::Result<()> { - use $crate::scalar_domains::ScalarType; + use $crate::scalar_domains::{OrderedScalar, ScalarType}; let spec = $crate::__scalar_matrix_spec!($scalar, $variant); let fixture_table = <$scalar as ScalarType>::fixture_table_name(); - let zero: $scalar = Default::default(); - let gt_zero = stringify!($filter) == "gt_zero"; - // Build the WHERE clause from the zero pivot's SQL literal so it - // is type-agnostic: `plaintext > 0` for integers, `plaintext > - // '1970-01-01'` for dates. A hardcoded `> 0` would not typecheck - // against a non-integer plaintext column. - let where_clause = if gt_zero { - format!(" WHERE plaintext > {}", <$scalar as ScalarType>::to_sql_literal(zero)) + let mid: $scalar = <$scalar as OrderedScalar>::mid_pivot(); + let gt_mid = stringify!($filter) == "gt_mid"; + // Build the WHERE clause from the interior pivot's SQL literal so + // it is type-agnostic: `plaintext > 0` for integers, `plaintext > + // '1970-01-01'` for dates, `plaintext > 'frank'` for text. A + // hardcoded `> 0` would not typecheck against a non-integer + // plaintext column. + let where_clause = if gt_mid { + format!(" WHERE plaintext > {}", <$scalar as ScalarType>::to_sql_literal(&mid)) } else { String::new() }; @@ -1808,8 +1810,8 @@ ORDER BY eql_v3.ord_term(payload::{d}) {dir}", let mut expected: Vec<$scalar> = <$scalar as ScalarType>::fixture_values().to_vec(); expected.sort(); - if gt_zero { - expected.retain(|v| *v > zero); + if gt_mid { + expected.retain(|v| *v > mid); } if $direction == "DESC" { expected.reverse(); } @@ -2105,10 +2107,10 @@ macro_rules! 
__scalar_matrix_aggregate_case { let fixture = <$scalar as ScalarType>::fixture_table_name(); let extremum: $scalar = <$scalar as ScalarType>::fixture_values() .iter() - .copied() + .cloned() .$picker() .expect("FIXTURE_VALUES must be non-empty"); - let extremum_lit = <$scalar as ScalarType>::to_sql_literal(extremum); + let extremum_lit = <$scalar as ScalarType>::to_sql_literal(&extremum); let expected: String = sqlx::query_scalar(&format!( "SELECT payload::text FROM {fixture} WHERE plaintext = {lit}", lit = extremum_lit, @@ -2223,13 +2225,14 @@ macro_rules! __scalar_matrix_aggregate_case { // Span the fixture's extremes — for signed numeric scalars this // exercises the ORE sign-bit edges in addition to pinning STRICT // sfunc behaviour. - let low: $scalar = *sorted.first().expect("non-empty after len check"); - let high: $scalar = *sorted.last().expect("non-empty after len check"); + let low: $scalar = sorted.first().expect("non-empty after len check").clone(); + let high: $scalar = sorted.last().expect("non-empty after len check").clone(); // .min() / .max() on two values resolves to the correct picker. - let expected_plaintext: $scalar = low.$picker(high); - let low_lit = <$scalar as ScalarType>::to_sql_literal(low); - let high_lit = <$scalar as ScalarType>::to_sql_literal(high); - let expected_lit = <$scalar as ScalarType>::to_sql_literal(expected_plaintext); + // Clone so `low`/`high` survive for the literals below. + let expected_plaintext: $scalar = low.clone().$picker(high.clone()); + let low_lit = <$scalar as ScalarType>::to_sql_literal(&low); + let high_lit = <$scalar as ScalarType>::to_sql_literal(&high); + let expected_lit = <$scalar as ScalarType>::to_sql_literal(&expected_plaintext); let mut tx = pool.begin().await?; sqlx::query(&format!( @@ -2403,12 +2406,12 @@ macro_rules! __scalar_matrix_aggregate_group_by_case { // the ground truth. 
let group1: &[$scalar] = &values[..3]; let group2: &[$scalar] = &values[3..5]; - let group1_extremum: $scalar = group1.iter().copied().$picker() + let group1_extremum: $scalar = group1.iter().cloned().$picker() .expect("group 1 is non-empty"); - let group2_extremum: $scalar = group2.iter().copied().$picker() + let group2_extremum: $scalar = group2.iter().cloned().$picker() .expect("group 2 is non-empty"); - let g1_lit = <$scalar as ScalarType>::to_sql_literal(group1_extremum); - let g2_lit = <$scalar as ScalarType>::to_sql_literal(group2_extremum); + let g1_lit = <$scalar as ScalarType>::to_sql_literal(&group1_extremum); + let g2_lit = <$scalar as ScalarType>::to_sql_literal(&group2_extremum); let mut tx = pool.begin().await?; sqlx::query(&format!( @@ -2418,7 +2421,7 @@ ON COMMIT DROP", // Insert group 1 rows. for v in group1 { - let lit = <$scalar as ScalarType>::to_sql_literal(*v); + let lit = <$scalar as ScalarType>::to_sql_literal(v); sqlx::query(&format!( "INSERT INTO group_test(group_key, value) \ SELECT 1, payload::{d} FROM {fixture} WHERE plaintext = {lit}", @@ -2426,7 +2429,7 @@ SELECT 1, payload::{d} FROM {fixture} WHERE plaintext = {lit}", } // Insert group 2 rows. for v in group2 { - let lit = <$scalar as ScalarType>::to_sql_literal(*v); + let lit = <$scalar as ScalarType>::to_sql_literal(v); sqlx::query(&format!( "INSERT INTO group_test(group_key, value) \ SELECT 2, payload::{d} FROM {fixture} WHERE plaintext = {lit}", diff --git a/tests/sqlx/src/scalar_domains.rs b/tests/sqlx/src/scalar_domains.rs index 69453f59..c5c47e05 100644 --- a/tests/sqlx/src/scalar_domains.rs +++ b/tests/sqlx/src/scalar_domains.rs @@ -16,7 +16,7 @@ use std::sync::LazyLock; /// One impl per scalar type. Two `const`s and the rest defaults. pub trait ScalarType: - Copy + Clone + Ord + Default + Debug @@ -46,40 +46,32 @@ pub trait ScalarType: /// `LazyLock>` and returns a borrow of it (see `date_values`). /// Integer scalars return their `eql_scalars::_VALUES` const directly. 
/// - /// For types driven by `ordered_numeric_matrix!`, the values MUST - /// include the three pivots (`min_pivot()`, `max_pivot()`, and zero - /// `Default::default()`): the matrix uses those as comparison pivots and - /// fetches each one's ciphertext via `fetch_fixture_payload`, which fails - /// loudly if the row is absent. + /// For types driven by `ordered_numeric_matrix!`, the values MUST include + /// the three `OrderedScalar` pivots (`min_pivot()`, `max_pivot()`, + /// `mid_pivot()`): the matrix uses those as comparison pivots and fetches + /// each one's ciphertext via `fetch_fixture_payload`, which fails loudly if + /// the row is absent. fn fixture_values() -> &'static [Self]; - /// The low comparison pivot swept by the correctness / cross-shape arms. - /// Integer scalars return `Self::MIN`; temporal scalars return an explicit - /// sentinel (e.g. `1900-01-01`). A trait method rather than `Self::MIN` - /// because `chrono::DateTime` exposes `MAX_UTC`, not an inherent - /// `::MAX` const. The pivot must be present verbatim in `fixture_values()`. - fn min_pivot() -> Self; - - /// The high comparison pivot. Integer scalars return `Self::MAX`; temporal - /// scalars return an explicit sentinel (e.g. `2099-12-31`). Must be present - /// verbatim in `fixture_values()`. - fn max_pivot() -> Self; - /// `fixtures.eql_v2_`. fn fixture_table_name() -> String { format!("fixtures.eql_v2_{}", Self::PG_TYPE) } - /// SQL-literal rendering via `Display`. Override for types whose - /// `Display` form isn't a valid SQL literal (e.g. strings, dates). - fn to_sql_literal(value: Self) -> String { + /// SQL-literal rendering via `Display`. Takes `&Self` so a non-`Copy` + /// scalar (e.g. `String`) can be rendered without being consumed. Override + /// for types whose `Display` form isn't a valid SQL literal (e.g. strings, + /// dates). + fn to_sql_literal(value: &Self) -> String { value.to_string() } /// Ground-truth result set for `WHERE col op pivot`. 
Default works /// for any `Ord` scalar; override only for non-orderable types. fn expected_forward(op: &str, pivot: Self) -> Vec<Self> { - let predicate: fn(Self, Self) -> bool = match op { + // `&Self`-taking predicate so the default impl stays generic over a + // merely-`Clone` (non-`Copy`) scalar like `String`. + let predicate: fn(&Self, &Self) -> bool = match op { "=" => |a, b| a == b, "<>" => |a, b| a != b, "<" => |a, b| a < b, @@ -90,14 +82,54 @@ pub trait ScalarType: }; let mut values: Vec<Self> = Self::fixture_values() .iter() - .copied() - .filter(|v| predicate(*v, pivot)) + .filter(|v| predicate(v, &pivot)) + .cloned() .collect(); values.sort(); values } } +/// An **ordered** scalar — one whose `_ord` domains support `<`/`<=`/`>`/`>=`. +/// Carries the three comparison anchors the `ordered_numeric_matrix!` sweeps: +/// the `min`/`max` boundaries and an interior `mid` pivot. All three must be +/// present verbatim in `fixture_values()` (the matrix fetches each pivot's +/// ciphertext via `fetch_fixture_payload`). +/// +/// `min`/`max` are boundary anchors; `mid` is an interior anchor used by the +/// correctness/cross-shape sweep and the ORDER-BY-with-filter arm. `mid` +/// defaults to `Self::default()` — for signed scalars that is the numeric +/// origin (`0`, epoch), which is a fine interior anchor; lexicographic scalars +/// (e.g. `String`, whose `Default` is the degenerate empty string) override it +/// with a real median fixture. +pub trait OrderedScalar: ScalarType { + /// The low boundary pivot. Integer scalars return `Self::MIN`; others an + /// explicit sentinel. Present verbatim in `fixture_values()`. + fn min_pivot() -> Self; + + /// The high boundary pivot. Integer scalars return `Self::MAX`; others an + /// explicit sentinel. Present verbatim in `fixture_values()`. + fn max_pivot() -> Self; + + /// The interior pivot. Defaults to `Self::default()` (the numeric origin for + /// signed scalars); override where `Default` is not a usable fixture anchor.
+ /// Present verbatim in `fixture_values()`. + fn mid_pivot() -> Self { + Self::default() + } +} + +/// A **signed** scalar — an ordered scalar with a numeric origin / sign +/// boundary (`int`, `date`). `text` is `OrderedScalar` but **not** +/// `SignedScalar`: lexicographic order has no origin. The bound gates the +/// signed-only sign-boundary test, so a `text` instantiation of it is a compile +/// error. +pub trait SignedScalar: OrderedScalar { + /// The numeric origin (the sign boundary): `0` for integers, the epoch for + /// dates. Fixtures straddle it (negatives below, positives above). + fn origin() -> Self; +} + // The per-type `impl ScalarType` blocks for the **integer** scalars (each // carrying its `PG_TYPE` token, `fixture_values() = eql_scalars::_VALUES`, // and `min_pivot()`/`max_pivot()` = `Self::MIN`/`Self::MAX`) are generated from @@ -145,6 +177,14 @@ impl ScalarType for chrono::NaiveDate { date_values() } + /// `Display` renders a `NaiveDate` as `2099-12-31` (unquoted), which is not + /// a valid SQL literal on its own — wrap it in single quotes. + fn to_sql_literal(value: &Self) -> String { + format!("'{value}'") + } +} + +impl OrderedScalar for chrono::NaiveDate { /// Temporal min pivot — `1900-01-01`, present verbatim in the catalog /// fixtures (not `Self::MIN`, which would be far outside the fixture set). fn min_pivot() -> Self { @@ -156,11 +196,111 @@ impl ScalarType for chrono::NaiveDate { fn max_pivot() -> Self { chrono::NaiveDate::from_ymd_opt(2099, 12, 31).expect("2099-12-31 is a valid date") } + // mid_pivot defaults to `NaiveDate::default()` = the epoch (`1970-01-01`), + // which is also `origin()` — a real fixture and the sign boundary. +} - /// `Display` renders a `NaiveDate` as `2099-12-31` (unquoted), which is not - /// a valid SQL literal on its own — wrap it in single quotes. 
fn to_sql_literal(value: Self) -> String { - format!("'{value}'") +impl SignedScalar for chrono::NaiveDate { + /// Dates encrypt as a signed offset from the epoch; `1970-01-01` is the + /// origin, and the fixtures straddle it (1900s below, 2000s above). + fn origin() -> Self { + chrono::NaiveDate::from_ymd_opt(1970, 1, 1).expect("1970-01-01 is a valid date") + } +} + +/// Typed `String` fixture values, built once from `text`'s catalog row. +/// `eql_scalars::TEXT_VALUES` is a `&[&'static str]` const, but the `ScalarType` +/// contract returns `&[Self]` = `&[String]` (owned), so we materialise them into +/// a `LazyLock<Vec<String>>` and return a borrow — the same shape as +/// `date_values`. (Unlike `date`, no parsing is needed; the values are the +/// strings verbatim.) +static TEXT_VALUES_CELL: LazyLock<Vec<String>> = LazyLock::new(|| { + eql_scalars::TEXT_VALUES + .iter() + .map(|s| s.to_string()) + .collect() +}); + +/// The `String` fixture values, in catalog order. Public so the `eql_v2_text` +/// fixture module (emitted by `scalar_types!(fixture_modules)`) can hand the +/// slice to `scalar_fixture!` — `text` is owned `String`, so there is no +/// `eql_scalars::_VALUES`-typed `&[String]` const to point at directly. +pub fn text_values() -> &'static [String] { + &TEXT_VALUES_CELL +} + +impl ScalarType for String { + const PG_TYPE: &'static str = "text"; + + fn fixture_values() -> &'static [Self] { + text_values() + } + + /// `Display` for a `String` is the unquoted text, which is not a valid SQL + /// literal; quote it and double any embedded single quotes. + fn to_sql_literal(value: &Self) -> String { + format!("'{}'", value.replace('\'', "''")) + } +} + +impl OrderedScalar for String { + /// Lexicographic min pivot — the lexicographically-smallest fixture + /// (`"aard"`). Present verbatim in `fixture_values()`; keep in sync with + /// `TEXT_FIXTURES`.
fn min_pivot() -> Self { + "aard".to_string() + } + + /// Lexicographic max pivot — the lexicographically-largest fixture + /// (`"zzzz"`). + fn max_pivot() -> Self { + "zzzz".to_string() + } + + /// Interior pivot — a real median fixture. `String::default()` is `""`, + /// which is degenerate for ORE (issue #262), so `text` overrides the + /// inherited default with a genuine middle value. + fn mid_pivot() -> Self { + "frank".to_string() + } +} + +// `String` is deliberately NOT `SignedScalar`: lexicographic text has no +// numeric origin / sign boundary. The signed-only sign-boundary test bounds on +// `SignedScalar`, so a `String` instantiation of it would not compile. + +#[cfg(test)] +mod text_value_tests { + use super::*; + + /// The `min`/`mid`/`max` pivots resolve to fixture rows present verbatim, so + /// `fetch_fixture_payload` can resolve each one's ciphertext. + #[test] + fn text_pivots_are_in_fixture_values() { + let values = <String as ScalarType>::fixture_values(); + let min = <String as OrderedScalar>::min_pivot(); + let mid = <String as OrderedScalar>::mid_pivot(); + let max = <String as OrderedScalar>::max_pivot(); + assert!(values.contains(&min), "min_pivot {min:?} must be a fixture"); + assert!(values.contains(&mid), "mid_pivot {mid:?} must be a fixture"); + assert!(values.contains(&max), "max_pivot {max:?} must be a fixture"); + assert!(min <= mid && mid <= max, "min <= mid <= max must hold"); + // text has no numeric origin: the empty string is not a fixture. + assert!( + !values.iter().any(|v| v.is_empty()), + "the empty string must not be a text fixture" + ); + } + + /// The harness value list matches the catalog `TEXT_VALUES` in order — the + /// oracle cannot drift from the catalog the fixture generator encrypts. + #[test] + fn text_values_match_catalog() { + let got: Vec<&str> = <String as ScalarType>::fixture_values() + .iter() + .map(|s| s.as_str()) + .collect(); + assert_eq!(got, eql_scalars::TEXT_VALUES.to_vec()); } } @@ -192,25 +332,25 @@ mod date_value_tests { } } - /// The three temporal pivots resolve to fixture rows present verbatim.
+ /// The three temporal pivots resolve to fixture rows present verbatim, and + /// for this signed scalar `mid_pivot` coincides with the origin (the epoch). #[test] fn date_pivots_are_in_fixture_values() { let values = <chrono::NaiveDate as ScalarType>::fixture_values(); - let min = <chrono::NaiveDate as ScalarType>::min_pivot(); - let max = <chrono::NaiveDate as ScalarType>::max_pivot(); - let zero = chrono::NaiveDate::default(); + let min = <chrono::NaiveDate as OrderedScalar>::min_pivot(); + let mid = <chrono::NaiveDate as OrderedScalar>::mid_pivot(); + let max = <chrono::NaiveDate as OrderedScalar>::max_pivot(); + let origin = <chrono::NaiveDate as SignedScalar>::origin(); assert!(values.contains(&min), "min_pivot {min} must be a fixture"); + assert!(values.contains(&mid), "mid_pivot {mid} must be a fixture"); assert!(values.contains(&max), "max_pivot {max} must be a fixture"); - assert!( - values.contains(&zero), - "zero pivot {zero} must be a fixture" - ); - // Default is 1970-01-01, the documented zero pivot. + // mid inherits `NaiveDate::default()` = the epoch = origin(). assert_eq!( - zero, + mid, chrono::NaiveDate::from_ymd_opt(1970, 1, 1).unwrap(), - "NaiveDate::default() must be 1970-01-01" + "date mid_pivot must be the epoch (1970-01-01)" ); + assert_eq!(mid, origin, "for a signed scalar mid_pivot == origin"); } } @@ -343,7 +483,7 @@ pub async fn fetch_fixture_payload(pool: &PgPool, plaintext: T) - let sql = format!( "SELECT payload::text FROM {table} WHERE plaintext = {lit}", table = T::fixture_table_name(), - lit = T::to_sql_literal(plaintext), + lit = T::to_sql_literal(&plaintext), ); sqlx::query_scalar(&sql) .fetch_one(pool) diff --git a/tests/sqlx/src/scalar_types.rs b/tests/sqlx/src/scalar_types.rs index 6747900a..85c02885 100644 --- a/tests/sqlx/src/scalar_types.rs +++ b/tests/sqlx/src/scalar_types.rs @@ -53,6 +53,7 @@ macro_rules!
scalar_types { int2 => i16, int8 => i64, date => chrono::NaiveDate [temporal], + text => String [text], } }; } diff --git a/tests/sqlx/tests/encrypted_domain.rs b/tests/sqlx/tests/encrypted_domain.rs index 0ac40b22..5c82b5fc 100644 --- a/tests/sqlx/tests/encrypted_domain.rs +++ b/tests/sqlx/tests/encrypted_domain.rs @@ -10,3 +10,19 @@ mod family; #[path = "encrypted_domain/scalars/mod.rs"] mod scalars; + +// Text-specific behavioural suites (literal-payload smoke + fixture-backed +// match-containment). Deliberately NOT under `scalars::` — the matrix-inventory +// gate treats every `scalars::::` prefix as a scalar type, so these would be +// mis-discovered as types `text_smoke` / `text_match`. +#[path = "encrypted_domain/text/text_smoke.rs"] +mod text_smoke; + +#[path = "encrypted_domain/text/text_match.rs"] +mod text_match; + +// Signed-only sign-boundary suite (`int`, `date`). Like the text suites it +// lives outside `scalars::` so the matrix-inventory snapshot (which pins the +// uniform per-type set) does not see the signed-only delta. +#[path = "encrypted_domain/signed.rs"] +mod signed; diff --git a/tests/sqlx/tests/encrypted_domain/family/sem.rs b/tests/sqlx/tests/encrypted_domain/family/sem.rs index b7d2543c..71e2f058 100644 --- a/tests/sqlx/tests/encrypted_domain/family/sem.rs +++ b/tests/sqlx/tests/encrypted_domain/family/sem.rs @@ -393,3 +393,33 @@ async fn jsonb_array_to_ore_block_input_shapes(pool: PgPool) -> Result<()> { Ok(()) } + +/// T7 — Bloom-filter SEM extractor (`eql_v3.bloom_filter(jsonb)`): reads the +/// `bf` array out of a payload. Inlinable SQL mirroring `hmac_256` — NULL on a +/// missing key, not a raise (the `match` capability is tied to the domain, +/// whose CHECK guarantees `bf`). 
+#[sqlx::test] +async fn bloom_filter_extractor_reads_bf_array(pool: PgPool) -> Result<()> { + let got: Vec<i16> = + sqlx::query_scalar("SELECT eql_v3.bloom_filter('{\"bf\":[1,2,3]}'::jsonb)::smallint[]") + .fetch_one(&pool) + .await?; + assert_eq!(got, vec![1i16, 2, 3]); + Ok(()) +} + +#[sqlx::test] +async fn bloom_filter_extractor_returns_null_without_bf(pool: PgPool) -> Result<()> { + // Inlinable SQL extractor (like hmac_256): a payload without `bf` yields + // NULL, not an exception. The RAISE is redundant because the `text_match` + // domain CHECK already guarantees `bf` is present on the typed path. + let got: Option<Vec<i16>> = + sqlx::query_scalar("SELECT eql_v3.bloom_filter('{\"hm\":\"x\"}'::jsonb)::smallint[]") + .fetch_one(&pool) + .await?; + assert!( + got.is_none(), + "absent bf must return NULL (capability is tied to the domain)" + ); + Ok(()) +} diff --git a/tests/sqlx/tests/encrypted_domain/scalars/mod.rs b/tests/sqlx/tests/encrypted_domain/scalars/mod.rs index 72c64f87..9ef10f0e 100644 --- a/tests/sqlx/tests/encrypted_domain/scalars/mod.rs +++ b/tests/sqlx/tests/encrypted_domain/scalars/mod.rs @@ -7,3 +7,10 @@ //! automatically. The old per-type `scalars/.rs` files are gone. eql_tests::scalar_types!(matrix_suites); + +// NOTE: the `text_match` / `text_smoke` behavioural suites live in the sibling +// `encrypted_domain/text/` module tree, NOT here. The matrix-inventory gate +// discovers scalar types from every `scalars::::` test-name prefix +// (`tasks`/`mise.toml`), so a `scalars::text_smoke::` module would be +// mis-discovered as a scalar type `text_smoke` and pollute the snapshot / +// catalog cross-check. Keeping them out of `scalars::` avoids that. diff --git a/tests/sqlx/tests/encrypted_domain/signed.rs b/tests/sqlx/tests/encrypted_domain/signed.rs new file mode 100644 index 00000000..e00f9299 --- /dev/null +++ b/tests/sqlx/tests/encrypted_domain/signed.rs @@ -0,0 +1,53 @@ +//! Sign-boundary coverage for **signed** scalars (`int`, `date`) — the +//!
`SignedScalar` delta on top of the uniform ordered matrix. +//! +//! ORE encrypts signed values as an offset from a numeric origin (`0` for +//! integers, the epoch for dates). This suite asserts the ORE block ordering is +//! **monotonic across that origin**: a fixture below the origin orders before +//! the origin, which orders before a fixture above it — through the encrypted +//! `_ord` domain, with no decryption. +//! +//! It is deliberately **outside** the `scalars::::` namespace (like the +//! `text_match` suites) so the matrix-inventory snapshot — which pins the +//! *uniform* per-type test set — does not see it. The generic body bounds on +//! `eql_tests::scalar_domains::SignedScalar`, so a `text` (`!SignedScalar`) +//! instantiation is a **compile error**: lexicographic text has no origin. +// `SignedScalar` is the bound; its supertrait methods (`min_pivot`/`max_pivot` +// from `OrderedScalar`, `PG_TYPE` from `ScalarType`) are reachable through it. +use eql_tests::scalar_domains::{fetch_fixture_payload, sql_string_literal, SignedScalar}; +use sqlx::PgPool; + +/// `min_pivot() < origin() < max_pivot()` holds through the encrypted `_ord` +/// domain's `<` operator (ORE block comparison), spanning the sign boundary. +async fn sign_boundary_is_monotonic<T: SignedScalar>(pool: &PgPool) -> anyhow::Result<()> { + let d = format!("eql_v3.{}_ord", T::PG_TYPE); + + // Fixtures straddling the origin: min is below it, max above it.
+ let below = sql_string_literal(&fetch_fixture_payload::<T>(pool, T::min_pivot()).await?); + let origin = sql_string_literal(&fetch_fixture_payload::<T>(pool, T::origin()).await?); + let above = sql_string_literal(&fetch_fixture_payload::<T>(pool, T::max_pivot()).await?); + + let sql = format!( + "SELECT \ + ({below}::jsonb::{d} < {origin}::jsonb::{d}) AND \ + ({origin}::jsonb::{d} < {above}::jsonb::{d}) AND \ + ({below}::jsonb::{d} < {above}::jsonb::{d})" + ); + let monotonic: bool = sqlx::query_scalar(&sql).fetch_one(pool).await?; + assert!( + monotonic, + "{}: ORE ordering must be monotonic across the origin (below < origin < above):\n{sql}", + T::PG_TYPE, + ); + Ok(()) +} + +#[sqlx::test(fixtures(path = "../../fixtures", scripts("eql_v2_int4")))] +async fn int4_sign_boundary(pool: PgPool) -> anyhow::Result<()> { + sign_boundary_is_monotonic::(&pool).await +} + +#[sqlx::test(fixtures(path = "../../fixtures", scripts("eql_v2_date")))] +async fn date_sign_boundary(pool: PgPool) -> anyhow::Result<()> { + sign_boundary_is_monotonic::(&pool).await +} diff --git a/tests/sqlx/tests/encrypted_domain/text/text_match.rs b/tests/sqlx/tests/encrypted_domain/text/text_match.rs new file mode 100644 index 00000000..171edfcf --- /dev/null +++ b/tests/sqlx/tests/encrypted_domain/text/text_match.rs @@ -0,0 +1,135 @@ +//! Match-containment coverage for `eql_v3.text_match` — separate from the +//! ordered matrix because `@>` is asymmetric/probabilistic, not a total order. +//! Asserts against the generated `eql_v2_text` fixtures (which carry `bf`). +use sqlx::PgPool; + +const TABLE: &str = "fixtures.eql_v2_text"; + +async fn payload_for(pool: &PgPool, plaintext: &str) -> anyhow::Result<serde_json::Value> { + Ok(sqlx::query_scalar::<_, serde_json::Value>(&format!( + "SELECT payload::jsonb FROM {TABLE} WHERE plaintext = $1" + )) + .bind(plaintext) + .fetch_one(pool) + .await?)
+} + +#[sqlx::test(fixtures(path = "../../../fixtures", scripts("eql_v2_text")))] +async fn value_matches_itself(pool: PgPool) -> anyhow::Result<()> { + let p = payload_for(&pool, "aardvark").await?; + let hit: bool = sqlx::query_scalar( + "SELECT ($1::jsonb::eql_v3.text_match) @> ($1::jsonb::eql_v3.text_match)", + ) + .bind(&p) + .fetch_one(&pool) + .await?; + assert!(hit, "a value's bloom filter must contain itself"); + Ok(()) +} + +#[sqlx::test(fixtures(path = "../../../fixtures", scripts("eql_v2_text")))] +async fn haystack_contains_substring_needle(pool: PgPool) -> anyhow::Result<()> { + let hay = payload_for(&pool, "aardvark").await?; + let needle = payload_for(&pool, "aard").await?; + let hit: bool = sqlx::query_scalar( + "SELECT ($1::jsonb::eql_v3.text_match) @> ($2::jsonb::eql_v3.text_match)", + ) + .bind(&hay) + .bind(&needle) + .fetch_one(&pool) + .await?; + assert!(hit, "'aardvark' bloom must contain 'aard' (shared ngrams)"); + Ok(()) +} + +#[sqlx::test(fixtures(path = "../../../fixtures", scripts("eql_v2_text")))] +async fn disjoint_value_does_not_match(pool: PgPool) -> anyhow::Result<()> { + // A bloom filter is probabilistic and admits false positives, so a true + // negative is only deterministic for inputs that share no n-grams. "aard" + // (3-grams `aar`, `ard`) and "zzzz" (`zzz`) are chosen ngram-disjoint in + // TEXT_FIXTURES (crates/eql-scalars/src/lib.rs) precisely for this assertion; + // keep them disjoint if the fixture list changes. 
+ let hay = payload_for(&pool, "aard").await?; + let needle = payload_for(&pool, "zzzz").await?; + let hit: bool = sqlx::query_scalar( + "SELECT ($1::jsonb::eql_v3.text_match) @> ($2::jsonb::eql_v3.text_match)", + ) + .bind(&hay) + .bind(&needle) + .fetch_one(&pool) + .await?; + assert!( + !hit, + "'aard' must not contain disjoint 'zzzz' (no shared ngrams)" + ); + Ok(()) +} + +#[sqlx::test(fixtures(path = "../../../fixtures", scripts("eql_v2_text")))] +async fn match_uses_functional_index(pool: PgPool) -> anyhow::Result<()> { + // Explicit extractor form `match_term(col) @> match_term(needle)`. Forces + // `enable_seqscan = off` so this is an index-VALIDITY proof on the small + // fixture (not a cost-preference one), and uses the node-type-aware + // `assert_index_scan_uses` rather than a plan substring match. + let mut tx = pool.begin().await?; + sqlx::query("SET LOCAL enable_seqscan = off") + .execute(&mut *tx) + .await?; + sqlx::query(&format!( + "CREATE INDEX text_match_idx ON {TABLE} USING gin (eql_v3.match_term(payload::eql_v3.text_match))" + )) + .execute(&mut *tx) + .await?; + + // Needle embedded via an uncorrelated subquery so the helper receives a + // hardcoded query (it interpolates directly and takes no binds). + let query = format!( + "SELECT 1 FROM {TABLE} \ + WHERE eql_v3.match_term(payload::eql_v3.text_match) \ + @> eql_v3.match_term((SELECT payload::jsonb FROM {TABLE} WHERE plaintext = 'aard')::eql_v3.text_match)" + ); + eql_tests::matrix::assert_index_scan_uses( + &mut *tx, + &query, + "text_match_idx", + "explicit match_term(col) @> match_term(needle) must engage the functional GIN index", + ) + .await?; + Ok(()) +} + +/// Companion to `match_uses_functional_index` proving the **bare operator** form +/// `WHERE col @> needle` (not the explicit `match_term(col) @> match_term(needle)`) +/// reaches the GIN index — i.e. the generated `@>` wrapper inlines through +/// `match_term` to the native array-containment the index supports. 
Forces +/// `enable_seqscan = off` so this is an index-**validity** proof on the small +/// fixture, not a cost-preference one, and uses the node-type-aware +/// `assert_index_scan_uses` rather than a plan substring match. +#[sqlx::test(fixtures(path = "../../../fixtures", scripts("eql_v2_text")))] +async fn bare_operator_uses_functional_index(pool: PgPool) -> anyhow::Result<()> { + let mut tx = pool.begin().await?; + sqlx::query("SET LOCAL enable_seqscan = off") + .execute(&mut *tx) + .await?; + sqlx::query(&format!( + "CREATE INDEX text_match_idx ON {TABLE} USING gin (eql_v3.match_term(payload::eql_v3.text_match))" + )) + .execute(&mut *tx) + .await?; + + // The needle is embedded via an uncorrelated subquery so the helper receives + // a hardcoded query string (it interpolates directly and takes no binds). + let query = format!( + "SELECT 1 FROM {TABLE} \ + WHERE (payload::eql_v3.text_match) \ + @> ((SELECT payload::jsonb FROM {TABLE} WHERE plaintext = 'aard')::eql_v3.text_match)" + ); + eql_tests::matrix::assert_index_scan_uses( + &mut *tx, + &query, + "text_match_idx", + "bare `@>` operator on text_match must engage the functional GIN index", + ) + .await?; + Ok(()) +} diff --git a/tests/sqlx/tests/encrypted_domain/text/text_smoke.rs b/tests/sqlx/tests/encrypted_domain/text/text_smoke.rs new file mode 100644 index 00000000..b62fab67 --- /dev/null +++ b/tests/sqlx/tests/encrypted_domain/text/text_smoke.rs @@ -0,0 +1,65 @@ +//! Literal-payload smoke tests for the generated `eql_v3.text_match` surface: +//! `@>` containment engages (supported wrapper) and `=` raises (blocker). +//! Uses hand-written jsonb payloads carrying `bf` — no encryption/fixtures +//! needed. The fixture-backed containment behaviour lives in `text_match.rs`. 
+use sqlx::PgPool; + +#[sqlx::test] +async fn text_match_at_contains_engages(pool: PgPool) -> anyhow::Result<()> { + // subset containment: a filter contains any subset of its own set bits + let hit: bool = sqlx::query_scalar( + "SELECT ('{\"v\":\"2\",\"i\":{},\"c\":\"x\",\"bf\":[1,2,3]}'::jsonb::eql_v3.text_match) + @> ('{\"v\":\"2\",\"i\":{},\"c\":\"x\",\"bf\":[2]}'::jsonb::eql_v3.text_match)", + ) + .fetch_one(&pool) + .await?; + assert!(hit, "[1,2,3] @> [2] must hold"); + Ok(()) +} + +#[sqlx::test] +async fn text_match_eq_is_blocked(pool: PgPool) -> anyhow::Result<()> { + let err = sqlx::query( + "SELECT ('{\"v\":\"2\",\"i\":{},\"c\":\"x\",\"bf\":[1]}'::jsonb::eql_v3.text_match) + = ('{\"v\":\"2\",\"i\":{},\"c\":\"x\",\"bf\":[1]}'::jsonb::eql_v3.text_match)", + ) + .execute(&pool) + .await + .unwrap_err(); + assert!( + format!("{err}").contains("not supported"), + "= must be blocked on text_match" + ); + Ok(()) +} + +#[sqlx::test] +async fn empty_bloom_has_empty_set_semantics(pool: PgPool) -> anyhow::Result<()> { + // A value too short to tokenize (e.g. the empty string) yields an empty + // bloom filter (`bf: []`). Containment then follows empty-set semantics: + // every filter contains the empty set; the empty set contains no non-empty + // filter. Uses literal payloads so the assertion is deterministic and + // independent of how the encryptor renders a `bf` for a degenerate plaintext.
+ const NON_EMPTY: &str = + "'{\"v\":\"2\",\"i\":{},\"c\":\"x\",\"bf\":[1,2,3]}'::jsonb::eql_v3.text_match"; + const EMPTY: &str = "'{\"v\":\"2\",\"i\":{},\"c\":\"x\",\"bf\":[]}'::jsonb::eql_v3.text_match"; + + let everything_contains_empty: bool = + sqlx::query_scalar(&format!("SELECT ({NON_EMPTY}) @> ({EMPTY})")) + .fetch_one(&pool) + .await?; + assert!( + everything_contains_empty, + "every filter must contain the empty filter" + ); + + let empty_contains_nothing: bool = + sqlx::query_scalar(&format!("SELECT ({EMPTY}) @> ({NON_EMPTY})")) + .fetch_one(&pool) + .await?; + assert!( + !empty_contains_nothing, + "empty filter must not contain a non-empty one" + ); + Ok(()) +}
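The containment semantics that the `text_match` and `text_smoke` suites assert can be modelled outside PostgreSQL. The following is a minimal, std-only Rust sketch under stated assumptions: the lower-cased character 3-gram tokenizer, the `FILTER_BITS` and `HASHES` constants, and the `DefaultHasher`-based bit selection are hypothetical stand-ins, not the CipherStash/EQL match-index implementation. Only `contains` is meant to mirror the native array containment that the generated `@>` wrapper inlines to: every element of the needle must appear in the haystack, so any filter contains the empty filter and the empty filter contains no non-empty one.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::{Hash, Hasher};

/// Hypothetical filter width (bit positions 0..FILTER_BITS); not EQL's real parameter.
const FILTER_BITS: u64 = 256;
/// Hypothetical number of hash functions applied to each n-gram.
const HASHES: u64 = 3;

/// Lower-cased character 3-grams of `s`. Inputs shorter than three
/// characters produce no n-grams (assumed tokenizer, for illustration only).
fn trigrams(s: &str) -> Vec<String> {
    let chars: Vec<char> = s.to_lowercase().chars().collect();
    chars.windows(3).map(|w| w.iter().collect::<String>()).collect()
}

/// Set bit positions for `s`, the analogue of a payload's `bf` array.
/// Identical n-grams always set identical bits, so a substring's bits are a
/// subset of the containing string's bits.
fn bloom(s: &str) -> BTreeSet<i16> {
    let mut bits = BTreeSet::new();
    for gram in trigrams(s) {
        for seed in 0..HASHES {
            let mut hasher = DefaultHasher::new();
            (seed, &gram).hash(&mut hasher);
            // FILTER_BITS fits in i16, so the cast cannot truncate.
            bits.insert((hasher.finish() % FILTER_BITS) as i16);
        }
    }
    bits
}

/// Mirrors array `@>`: true when every bit set in `needle` is also set in `haystack`.
fn contains(haystack: &BTreeSet<i16>, needle: &BTreeSet<i16>) -> bool {
    needle.is_subset(haystack)
}
```

A negative result is only guaranteed when the needle sets a bit the haystack lacks; distinct n-grams can still land on the same bits, which is why a bloom-filter `@>` admits false positives.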