From 80b8683836065d1551611f6dc6b67d56403ea483 Mon Sep 17 00:00:00 2001 From: anandgupta42 Date: Mon, 8 Jun 2026 12:01:09 -0700 Subject: [PATCH 01/11] Document dbt review improvement loop --- ...-08-dbt-pr-review-self-improvement-loop.md | 263 ++++++++++++++++++ .../src/altimate/review/orchestrate.ts | 3 +- .../opencode/test/altimate/review.test.ts | 124 +++++++++ 3 files changed, 389 insertions(+), 1 deletion(-) create mode 100644 docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md diff --git a/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md new file mode 100644 index 000000000..bddce7b35 --- /dev/null +++ b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md @@ -0,0 +1,263 @@ +# dbt PR Review Self-Improvement Loop + +Date: 2026-06-08 + +## Goal + +Make `altimate review` the highest-signal dbt PR reviewer by adding only checks +that survive an end-to-end value test: + +1. The finding is backed by deterministic evidence, warehouse execution, or a + clearly bounded advisory rule. +2. It catches a real dbt PR risk in the public demo or sourced benchmark corpus. +3. It does not add false positives to corrected/safe cases. +4. It produces a review comment that a data engineer would act on. +5. It degrades loudly when required artifacts, schema, or warehouse access are + missing. + +## Recommended `/goal` Command + +```text +/goal Objective: make the dbt PR reviewer demo-parity reliable and measurably +higher-signal by closing the current public-demo gaps while preserving the +existing benchmark floor. + +Quantified acceptance target: +- Public demo matrix reaches expected verdicts for all 6 demo branches: + safe-refactor=APPROVE, join-key-breakage=REQUEST_CHANGES, + test-removal=COMMENT, new-pii-exposure=REQUEST_CHANGES, + mart-select-star=COMMENT, incremental-without-guard=COMMENT. +- Real-world corpus remains at 15/15 caught bad cases and 0/5 false positives + on corrected cases. +- At least one warehouse-backed DuckDB e2e proves a real dbt build/test pass + that the reviewer catches pre-merge. +- Every blocking finding is produced by deterministic core evidence or + warehouse-observed impact, not advisory AI or heuristic equivalence. + +Operating rules: +- Always start by pulling latest main of + /Users/anandgupta/codebase/altimate-core-internal with fast-forward only. +- Prefer implementing deterministic SQL/dbt analysis in altimate-core, not + altimate-code. Avoid regex/custom parsing in altimate-code when core + AST/parser support can own it. +- altimate-code should mostly orchestrate, map core findings to review + categories/severity, format verdicts, and run GitHub/CI integration. +- For every candidate improvement: + 1. Write a short spec: risk, deterministic evidence, what must not be flagged. + 2. Add bad and safe/corrected fixtures. + 3. Implement in altimate-core when possible. + 4. Build local core Node package and link altimate-code to it only when e2e + needs the local native package. + 5. Validate with the narrowest useful tests first, then corpus/demo/warehouse + only at acceptance checkpoints. + 6. Keep only if it improves real review value with no new false positives; + otherwise revert/reject or make advisory-only. +- Use /Users/anandgupta/codebase/altimate-code/demo/dbt-pr-review-demo for demo + branch e2e. +- Compile base dbt artifacts into target-base and head artifacts into target + before review when structural/equivalence lanes need base-vs-head compiled SQL. +- Use local DuckDB first; use BigQuery where warehouse behavior matters and + credentials are available. Trino/Postgres/ClickHouse/DuckDB can be spawned + locally. Snowflake later. +- Treat altimate-core equivalence as heuristic unless there is a sound proof + path. Do not block or approve solely from heuristic equivalence. Block only on + reproducible deterministic facts or warehouse-observed impact. +- Track results in + docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md. + +Iteration speed: +- Optimize for fast learning loops. Do not run full test suites or expensive + builds by default. +- Start with the narrowest relevant unit test or direct function smoke test. +- Only run broader tests after a candidate passes the focused check. +- Avoid rebuilding altimate-core unless the change affects native exports, NAPI + bindings, or e2e verification requires the local Node package. +- Prefer `cargo test -p altimate-core ` over full cargo + test. +- Prefer `bun test test/altimate/.ts` over full package + tests. +- Use direct Node smoke tests against `crates/altimate-core-node` before + relinking altimate-code. +- Run the real-world corpus and demo matrix only at acceptance checkpoints, not + after every edit. +- Run warehouse e2e only when the value claim depends on observed data + behavior. +- Keep a short validation ladder for each candidate: smoke -> focused unit -> + focused integration -> corpus/demo -> warehouse e2e if needed. + +Initial backlog: +1. Remove safe-refactor noise: base compiled artifact support and/or equivalence + degradation handling. +2. Tighten PII precision: do not flag low-confidence Name such as + customer_name, but still flag email/SSN/phone/payment identifiers. +3. Reduce low-value missing_table_alias comments for dbt CTE-heavy SQL. +4. Expand core structural diff rules for high-signal dbt semantic regressions. +5. Add warehouse data-diff e2e cases that prove real row/value impact. +``` + +## Skill vs Goal + +Create a skill once this loop stabilizes across several iterations. A skill is +the right home for reusable operating instructions: how to pull core, choose +core-vs-code ownership, build local NAPI, run the demo matrix, and apply the +validation ladder. The active `/goal` should still include the quantified +objective above because it provides the stop condition and prevents open-ended +"make it better" work. + +## Baseline From Dry Run + +Commands run from `packages/opencode` unless noted: + +```bash +bun run --conditions=browser script/review-realworld-eval.ts +``` + +Result: + +- Sourced bad-case catch rate: 15/15. +- Corrected-case false positives: 0/5. + +Demo matrix run against `demo/dbt-pr-review-demo` branches with the local dbt +virtualenv and `altimate review --mode gate --json`: + +| Branch | Expected | Actual | Finding categories | +|---|---:|---:|---| +| `demo/safe-refactor` | `APPROVE` | `COMMENT` | `semantic_change`, `pii_exposure`, `sql_quality` | +| `demo/join-key-breakage` | `REQUEST_CHANGES` | `COMMENT` | `semantic_change`, `pii_exposure`, `sql_correctness`, `sql_quality` | +| `demo/test-removal` | `COMMENT` | `COMMENT` | `test_coverage` | +| `demo/new-pii-exposure` | `REQUEST_CHANGES` | `REQUEST_CHANGES` | `pii_exposure`, `semantic_change`, `sql_quality` | +| `demo/mart-select-star` | `COMMENT` | `COMMENT` | `semantic_change`, `pii_exposure`, `sql_quality`, `test_coverage` | +| `demo/incremental-without-guard` | `COMMENT` | `COMMENT` | `semantic_change`, `pii_exposure`, `join_risk`, `warehouse_cost`, `sql_quality`, `materialization`, `sql_correctness`, `test_coverage` | + +Observed problems: + +- `safe-refactor` is noisy. The semantic lane cannot decide equivalence because + base compiled SQL is unavailable, `Name` classification flags `customer_name` + as PII, and `missing_table_alias` flags harmless CTE aliasing. +- `join-key-breakage` is not blocked. The useful join-key mismatch appears from + the advisory AI lane, not the deterministic lane, so it cannot gate. +- Several demo branches are degraded because base-vs-head compiled artifacts are + incomplete. A safe review must not pretend heuristic equivalence is a proof. + +## Loop Contract + +For each candidate improvement: + +1. **Spec**: one paragraph stating the risk, the deterministic evidence, and + what must not be flagged. +2. **Fixtures**: at least one bad case and one corrected/safe countercase. +3. **Implementation**: smallest scoped change in the reviewer or demo workflow. +4. **Validation**: + - Unit or harness test for the detector. + - `bun run --conditions=browser script/review-realworld-eval.ts`. + - Demo branch matrix for affected branches. + - Warehouse-backed run when the value claim depends on actual data impact. +5. **Decision**: + - Keep only if it improves catch rate or demo parity with zero new false + positives. + - Reject or keep advisory-only if evidence is heuristic, schema-dependent, + or too broad. + +## First Candidate Backlog + +### 1. Deterministic Join-Key Regression Detector + +Status: pushed down to `altimate-core-internal` as structural diff rule +`SC010 join_key_regression` in +`crates/altimate-core/src/review/structural_diff.rs`. `altimate-code` only maps +the core finding into the review verdict. + +Risk: a PR changes a join predicate from matching the same business key to +joining unrelated identifiers, e.g. `orders.customer_id = customers.customer_id` +becomes `orders.order_id = customers.customer_id`. + +Evidence: + +- Base and head SQL are both available. +- A changed `JOIN ... ON` equality compares identifier columns ending in `_id`. +- The base predicate joined same-name/same-stem keys, while the head predicate + joins different stems. + +Do not flag: + +- Intentional bridge joins such as `orders.order_id = order_items.order_id`. +- Joins where the changed key stems still match. +- Cases without a base predicate to compare. + +Expected result: + +- `demo/join-key-breakage` gets a deterministic `join_risk` or + `sql_correctness` critical finding and blocks in gate mode. +- `demo/safe-refactor` remains quiet for this detector. + +Validation: + +- `git pull --ff-only origin main` completed in + `/Users/anandgupta/codebase/altimate-core-internal`; latest core main was + `a413804` before this local rule change. +- `cargo test -p altimate-core review::structural_diff --lib` passed with the + new `SC010` bad case and safe countercases. +- `bun test --timeout 30000 test/altimate/review.test.ts` passed. +- `bun run --conditions=browser script/review-realworld-eval.ts` stayed at + 15/15 sourced catches and 0/5 corrected false positives. +- Demo matrix: `demo/join-key-breakage` improved from `COMMENT` to + `REQUEST_CHANGES` with deterministic `join_risk`. +- The detector itself stayed quiet on `demo/safe-refactor` and on the + same-key bridge-join countercase. + +### 2. Base Compiled Artifact Support In Demo CI + +Risk: safe refactors degrade to "could not prove equivalent" because the review +has only head-side compiled SQL. + +Evidence: + +- `compiled.ts` already supports `target-base/compiled//`. +- The demo workflow currently runs only head-side `dbt compile`. + +Do not flag: + +- A safe CTE rename where base and head compiled SQL are equivalent or where no + deterministic structural difference exists. + +Expected result: + +- The demo workflow compiles base into `target-base` before compiling head. +- `demo/safe-refactor` no longer emits an unknown semantic-change warning due + solely to missing base compiled SQL. + +### 3. PII Classification Precision Floor + +Risk: high-noise PII comments on ordinary names such as `customer_name` reduce +trust and make safe PRs look risky. + +Evidence: + +- `classify_pii` labels `customer_name` as `Name` at 75% confidence in the demo. +- The stronger demo/security value is email/SSN/phone/payment identifiers and + source-propagated sensitive columns. + +Do not flag: + +- `Name` classification below a high confidence threshold. +- Pre-existing PII columns not introduced by the PR. + +Expected result: + +- `demo/safe-refactor` no longer reports `customer_name`. +- `demo/new-pii-exposure` still reports `email`. + +## Warehouse E2E Policy + +Use local DuckDB first for fast end-to-end checks, then add BigQuery runs when a +candidate needs production-style warehouse behavior. Trino, Postgres, +ClickHouse, and DuckDB can be spawned locally. Snowflake is deferred until a +separate account is available. + +Warehouse-backed findings must include: + +- The connection used. +- Base/head SQL or model relation names. +- Key columns. +- Row/value delta summary. +- A skip/degraded state when credentials or drivers are unavailable. diff --git a/packages/opencode/src/altimate/review/orchestrate.ts b/packages/opencode/src/altimate/review/orchestrate.ts index a72778a64..ef546845d 100644 --- a/packages/opencode/src/altimate/review/orchestrate.ts +++ b/packages/opencode/src/altimate/review/orchestrate.ts @@ -813,6 +813,7 @@ const STRUCTURAL_CATEGORY: Record = { coalesce_removed: "semantic_change", removed_predicate: "semantic_change", type_narrowing: "contract_violation", + join_key_regression: "join_risk", } async function structuralChangeLane(ctx: ModelContext, runner: ReviewRunner): Promise { @@ -828,7 +829,7 @@ async function structuralChangeLane(ctx: ModelContext, runner: ReviewRunner): Pr const out: Finding[] = [] for (const f of raw) { const cat: ReviewCategory = STRUCTURAL_CATEGORY[f.rule] ?? "semantic_change" - const sev = clampSeverity(cat, "warning", "high") + const sev = clampSeverity(cat, f.severity === "error" ? "critical" : "warning", "high") out.push( makeFinding({ severity: sev, diff --git a/packages/opencode/test/altimate/review.test.ts b/packages/opencode/test/altimate/review.test.ts index 0faf6a191..881821d06 100644 --- a/packages/opencode/test/altimate/review.test.ts +++ b/packages/opencode/test/altimate/review.test.ts @@ -569,6 +569,130 @@ describe("orchestrate", () => { ).toBe(true) }) + test("core SC010 join-key regression maps to critical join_risk and blocks", async () => { + const oldSql = ` + select * + from orders + left join customers + on orders.customer_id = customers.customer_id + ` + const newSql = ` + select * + from orders + left join customers + on orders.order_id = customers.customer_id + ` + let sawBase = "" + let sawHead = "" + const runner: ReviewRunner = { + ...fakeRunner({}), + async structuralDiff(baseSql, headSql) { + sawBase = baseSql + sawHead = headSql + return [ + { + code: "SC010", + rule: "join_key_regression", + severity: "error", + message: + "A join key changed from matching the same identifier stem to `orders.order_id = customers.customer_id`.", + }, + ] + }, + } + const env = await runReview({ + changedFiles: [ + { + path: "models/marts/fct_customer_orders.sql", + status: "modified", + diff: "+on orders.order_id = customers.customer_id\n-on orders.customer_id = customers.customer_id\n", + }, + ], + config: { ...DEFAULT_REVIEW_CONFIG, reviewers: ["semantic_change"], ai: false }, + rubric: DEFAULT_RUBRIC, + mode: "gate", + runner, + getContent: async (_f, side) => (side === "new" ? newSql : oldSql), + getCompiled: async (_f, side) => (side === "new" ? newSql : oldSql), + generatedAt: "2026-06-08T00:00:00Z", + }) + expect(sawBase).toBe(oldSql) + expect(sawHead).toBe(newSql) + const f = env.findings.find( + (x) => x.evidence?.tool === "altimate_core.structural_diff" && (x.evidence?.result as any)?.code === "SC010", + ) + expect(f).toBeDefined() + expect(f!.severity).toBe("critical") + expect(f!.category).toBe("join_risk") + expect(env.verdict).toBe("REQUEST_CHANGES") + }) + + test("core structural diff quiet on CTE rename and bridge join countercases", async () => { + const runner: ReviewRunner = { + ...fakeRunner({}), + async structuralDiff() { + return [] + }, + } + const safeEnv = await runReview({ + changedFiles: [ + { + path: "models/marts/fct_customer_orders.sql", + status: "modified", + diff: + "+on order_records.customer_id = customer_records.customer_id\n" + + "-on orders.customer_id = customers.customer_id\n", + }, + ], + config: { ...DEFAULT_REVIEW_CONFIG, reviewers: ["semantic_change"], ai: false }, + rubric: DEFAULT_RUBRIC, + mode: "gate", + runner, + getContent: async (_f, side) => + side === "new" + ? "select * from order_records left join customer_records on order_records.customer_id = customer_records.customer_id" + : "select * from orders left join customers on orders.customer_id = customers.customer_id", + getCompiled: async (_f, side) => + side === "new" + ? "select * from order_records left join customer_records on order_records.customer_id = customer_records.customer_id" + : "select * from orders left join customers on orders.customer_id = customers.customer_id", + generatedAt: "2026-06-08T00:00:00Z", + }) + expect( + safeEnv.findings.some( + (x) => x.evidence?.tool === "altimate_core.structural_diff" && (x.evidence?.result as any)?.code === "SC010", + ), + ).toBe(false) + + const bridgeEnv = await runReview({ + changedFiles: [ + { + path: "models/marts/fct_order_items.sql", + status: "modified", + diff: "+on orders.order_id = order_items.order_id\n-on orders.customer_id = customers.customer_id\n", + }, + ], + config: { ...DEFAULT_REVIEW_CONFIG, reviewers: ["semantic_change"], ai: false }, + rubric: DEFAULT_RUBRIC, + mode: "gate", + runner, + getContent: async (_f, side) => + side === "new" + ? "select * from orders left join order_items on orders.order_id = order_items.order_id" + : "select * from orders left join customers on orders.customer_id = customers.customer_id", + getCompiled: async (_f, side) => + side === "new" + ? "select * from orders left join order_items on orders.order_id = order_items.order_id" + : "select * from orders left join customers on orders.customer_id = customers.customer_id", + generatedAt: "2026-06-08T00:00:00Z", + }) + expect( + bridgeEnv.findings.some( + (x) => x.evidence?.tool === "altimate_core.structural_diff" && (x.evidence?.result as any)?.code === "SC010", + ), + ).toBe(false) + }) + test("core L033 portability → regex portability twin suppressed, idempotency concern survives", async () => { const sql = "select getdate() as loaded_at from {{ ref('a') }}" const files: ChangedFile[] = [{ path: "models/marts/m.sql", status: "modified", diff: "+ , getdate() as loaded_at\n" }] From b5060e0ce6aaf90e568823bf7f199193aab982a8 Mon Sep 17 00:00:00 2001 From: anandgupta42 Date: Mon, 8 Jun 2026 12:20:38 -0700 Subject: [PATCH 02/11] Reduce safe refactor review noise --- docs/docs/usage/dbt-pr-review.md | 7 ++- ...-08-dbt-pr-review-self-improvement-loop.md | 26 +++++++-- .../src/altimate/native/altimate-core.ts | 2 +- .../opencode/src/altimate/native/types.ts | 1 + .../opencode/src/altimate/review/runner.ts | 3 +- packages/opencode/src/cli/cmd/review.ts | 2 +- .../test/altimate/review-runner.test.ts | 53 ++++++++++++++++++- 7 files changed, 85 insertions(+), 9 deletions(-) diff --git a/docs/docs/usage/dbt-pr-review.md b/docs/docs/usage/dbt-pr-review.md index e4b63774b..dbf97d4d3 100644 --- a/docs/docs/usage/dbt-pr-review.md +++ b/docs/docs/usage/dbt-pr-review.md @@ -332,8 +332,11 @@ reviewer does **not** re-implement Jinja — it consumes dbt's own compiled outp from `target/compiled//…` (HEAD) and `target-base/compiled/…` (BASE), produced by `dbt compile`. To enable full equivalence verdicts, compile both the base and head refs in CI (the base into `target-base/`, the Recce - convention). Without compiled SQL the engine lanes fall back to raw and stay - *undecidable* (never fabricated) — the `dbt-patterns` lane still runs. + convention). For proof-grade equivalence, also produce `target/catalog.json` + with `dbt docs generate`; the catalog carries complete warehouse columns that + `manifest.json` often lacks. Without compiled SQL or complete columns the + engine lanes stay *undecidable* (never fabricated) — the `dbt-patterns` lane + still runs. ## What it checks — deterministic rule catalog diff --git a/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md index bddce7b35..58c74bfe4 100644 --- a/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md +++ b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md @@ -208,12 +208,16 @@ Validation: ### 2. Base Compiled Artifact Support In Demo CI Risk: safe refactors degrade to "could not prove equivalent" because the review -has only head-side compiled SQL. +has incomplete proof inputs: missing base compiled SQL or incomplete schema +columns. Evidence: - `compiled.ts` already supports `target-base/compiled//`. - The demo workflow currently runs only head-side `dbt compile`. +- `demo/safe-refactor` became decidable after `dbt docs generate` produced + `target/catalog.json`, but core equivalence then falsely compared CTE alias + names inside the join filter. Do not flag: @@ -223,8 +227,24 @@ Do not flag: Expected result: - The demo workflow compiles base into `target-base` before compiling head. -- `demo/safe-refactor` no longer emits an unknown semantic-change warning due - solely to missing base compiled SQL. +- The demo workflow produces `target/catalog.json` when a warehouse-backed dbt + project is available, so equivalence has complete columns. +- `demo/safe-refactor` emits no findings with local core and DuckDB catalog + artifacts. + +Implemented result: + +- Core `L012 missing_table_alias` now ignores CTE references, removing the + `order_records/customer_records` alias noise while preserving real table alias + lint. +- Core equivalence now resolves join predicate columns through CTE aliases to + base `table.column` provenance, so CTE alias renames are equivalent but a + changed join key remains material. +- `altimate-code` now threads the detected dialect into + `altimate_core.equivalence` and `--no-ai` correctly disables the advisory lane. +- Local DuckDB safe-refactor validation: `APPROVE`, zero findings. +- Real-world corpus floor preserved: 15/15 bad cases caught, 0/5 false + positives. ### 3. PII Classification Precision Floor diff --git a/packages/opencode/src/altimate/native/altimate-core.ts b/packages/opencode/src/altimate/native/altimate-core.ts index a41296ee7..2d58bad38 100644 --- a/packages/opencode/src/altimate/native/altimate-core.ts +++ b/packages/opencode/src/altimate/native/altimate-core.ts @@ -320,7 +320,7 @@ export function registerAll(): void { register("altimate_core.equivalence", async (params) => { try { const schema = schemaOrEmpty(params.schema_path, params.schema_context) - const raw = await core.checkEquivalence(params.sql1, params.sql2, schema) + const raw = await core.checkEquivalence(params.sql1, params.sql2, schema, params.dialect) const data = toData(raw) return ok(true, data) } catch (e) { diff --git a/packages/opencode/src/altimate/native/types.ts b/packages/opencode/src/altimate/native/types.ts index 7b47f3068..f3e72e562 100644 --- a/packages/opencode/src/altimate/native/types.ts +++ b/packages/opencode/src/altimate/native/types.ts @@ -879,6 +879,7 @@ export interface AltimateCoreTestgenParams { export interface AltimateCoreEquivalenceParams { sql1: string sql2: string + dialect?: string schema_path?: string schema_context?: Record } diff --git a/packages/opencode/src/altimate/review/runner.ts b/packages/opencode/src/altimate/review/runner.ts index 1ffda22e6..f26b6add6 100644 --- a/packages/opencode/src/altimate/review/runner.ts +++ b/packages/opencode/src/altimate/review/runner.ts @@ -470,12 +470,13 @@ export function createDispatcherRunner(opts: DispatcherRunnerOptions): ReviewRun } }, - async equivalence(oldSql: string, newSql: string): Promise { + async equivalence(oldSql: string, newSql: string, dialect?: string): Promise { try { const schema = await resolveSchema() const res = await Dispatcher.call("altimate_core.equivalence", { sql1: oldSql, sql2: newSql, + dialect, schema_context: schema, }) const data = (res.data ?? {}) as Record diff --git a/packages/opencode/src/cli/cmd/review.ts b/packages/opencode/src/cli/cmd/review.ts index f3bd9094f..7b594025c 100644 --- a/packages/opencode/src/cli/cmd/review.ts +++ b/packages/opencode/src/cli/cmd/review.ts @@ -58,7 +58,7 @@ export const ReviewCommand = cmd({ manifestPath: args.manifest as string | undefined, mode: args.mode as ReviewMode | undefined, severityThreshold: args.severity as Severity | undefined, - noAi: args.ai === false, + noAi: args.noAi === true || args.ai === false, }) if (args.output) await fs.writeFile(args.output as string, JSON.stringify(env, null, 2)) diff --git a/packages/opencode/test/altimate/review-runner.test.ts b/packages/opencode/test/altimate/review-runner.test.ts index f055149fb..40e38c8ad 100644 --- a/packages/opencode/test/altimate/review-runner.test.ts +++ b/packages/opencode/test/altimate/review-runner.test.ts @@ -1,9 +1,17 @@ -import { describe, expect, test } from "bun:test" +import { afterEach, describe, expect, spyOn, test } from "bun:test" import { writeFileSync } from "node:fs" import path from "node:path" import { createDispatcherRunner } from "../../src/altimate/review/runner" +import { Dispatcher } from "../../src/altimate/native" import { tmpdir } from "../fixture/fixture" +let dispatcherSpy: ReturnType | undefined + +afterEach(() => { + dispatcherSpy?.mockRestore() + dispatcherSpy = undefined +}) + describe("review manifest loading", () => { test("loads a valid manifest without initializing the native dispatcher", async () => { await using tmp = await tmpdir() @@ -36,4 +44,47 @@ describe("review manifest loading", () => { testCount: 0, }) }) + + test("threads adapter dialect into core equivalence", async () => { + await using tmp = await tmpdir() + const manifestPath = path.join(tmp.path, "manifest.json") + writeFileSync( + manifestPath, + JSON.stringify({ + metadata: { adapter_type: "duckdb" }, + nodes: { + "model.demo.orders": { + resource_type: "model", + name: "orders", + original_file_path: "models/orders.sql", + config: { materialized: "table" }, + depends_on: { nodes: [] }, + columns: { id: { name: "id", data_type: "integer" } }, + }, + }, + sources: {}, + }), + ) + + let seenParams: any + dispatcherSpy = spyOn(Dispatcher, "call").mockImplementation((async (method: string, params: any) => { + expect(method).toBe("altimate_core.equivalence") + seenParams = params + return { + success: true, + data: { + equivalent: true, + confidence: 0.95, + differences: [], + validation_errors: [], + }, + } + }) as any) + + const runner = createDispatcherRunner({ manifestPath }) + const result = await runner.equivalence("select id from orders", "select id from orders", "duckdb") + + expect(result).toEqual({ decided: true, equivalent: true, differences: [], confidence: "high" }) + expect(seenParams.dialect).toBe("duckdb") + }) }) From 94b452655533f82e8936b8058bd9c2fea141d59d Mon Sep 17 00:00:00 2001 From: anandgupta42 Date: Mon, 8 Jun 2026 12:26:24 -0700 Subject: [PATCH 03/11] Tighten PII review precision --- ...-08-dbt-pr-review-self-improvement-loop.md | 18 +++++ .../src/altimate/review/orchestrate.ts | 34 +++++++-- .../opencode/test/altimate/review.test.ts | 73 +++++++++++++++++++ 3 files changed, 119 insertions(+), 6 deletions(-) diff --git a/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md index 58c74bfe4..00f6ef4e2 100644 --- a/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md +++ b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md @@ -267,6 +267,24 @@ Expected result: - `demo/safe-refactor` no longer reports `customer_name`. - `demo/new-pii-exposure` still reports `email`. +Implemented result: + +- Diff-scoped core PII classification is the authoritative PR-review PII + comment when lineage/classification are available. +- High-confidence non-low-risk PII introduced into `marts/` or `reporting/` + is critical and blockable from `altimate_core.classify_pii` evidence. +- Low-risk `Name`/`Address` classifications require at least 90% confidence to + surface, so `first_name`/`customer_name`-style weak signals do not create + noisy comments. +- Fallback `dbt-patterns`/`rule-catalog` PII twins are suppressed for files + where the diff-scoped core classifier ran; fallback remains available when + core lineage/classification is unavailable. +- `demo/new-pii-exposure` now reports one critical core PII finding for + `email` plus the core equivalence warning, and still returns + `REQUEST_CHANGES`. +- Real-world corpus floor preserved: 15/15 bad cases caught, 0/5 false + positives. + ## Warehouse E2E Policy Use local DuckDB first for fast end-to-end checks, then add BigQuery runs when a diff --git a/packages/opencode/src/altimate/review/orchestrate.ts b/packages/opencode/src/altimate/review/orchestrate.ts index ef546845d..8ddb5bedf 100644 --- a/packages/opencode/src/altimate/review/orchestrate.ts +++ b/packages/opencode/src/altimate/review/orchestrate.ts @@ -933,12 +933,18 @@ async function piiClassifyLane(ctx: ModelContext, runner: ReviewRunner, dialect: if (!newCols.length) return [] const pii = await runner.classifyPii(newCols) const out: Finding[] = [] + const highExposureLayer = /(^|\/)(marts|reporting)\//.test(file.path) for (const p of pii) { - if (p.confidence < 0.7 || !p.column) continue + const classification = String(p.classification ?? "") + const lowRiskClass = /^(name|address)$/i.test(classification) + const minConfidence = lowRiskClass ? 0.9 : 0.7 + if (p.confidence < minConfidence || !p.column) continue const col = p.column.toLowerCase() + const highConfidence = p.confidence >= 0.9 + const severity: Severity = highExposureLayer && highConfidence && !lowRiskClass ? "critical" : "warning" out.push( makeFinding({ - severity: "warning", + severity: clampSeverity("pii_exposure", severity, highConfidence ? "high" : "medium"), category: "pii_exposure", title: `${model}: exposes ${p.classification} column \`${p.column}\``, body: @@ -947,7 +953,7 @@ async function piiClassifyLane(ctx: ModelContext, runner: ReviewRunner, dialect: (p.masking ? `; suggested masking: \`${p.masking}\`.` : "."), file: file.path, model, - confidence: p.confidence >= 0.9 ? "high" : "medium", + confidence: highConfidence ? "high" : "medium", evidence: { tool: "altimate_core.classify_pii", result: p }, ruleKey: `pii_exposure:${col}`, }), @@ -1071,7 +1077,8 @@ export async function runReview(input: OrchestrateInput): Promise() + if (input.runner.classifyPii && input.runner.columnLineage) { + for (const ctx of ctxByPath.values()) { + if (ctx.engineNewSql && ctx.file.status !== "deleted") diffScopedPiiFiles.add(ctx.file.path) + } + } // Per file, the concatenated rule names core's AST lint actually emitted — // so we suppress a regex twin only when core genuinely covered that concern. const coreRulesByFile = new Map() @@ -1161,13 +1174,22 @@ export async function runReview(input: OrchestrateInput): Promise { - if (!coreRanFiles.has(f.file)) return true const tool = f.evidence?.tool + const fallbackRule = String((f.evidence?.result as any)?.rule ?? "").replace(/_/g, "-") + if ( + (tool === "dbt-patterns" || tool === "rule-catalog") && + diffScopedPiiFiles.has(f.file) && + f.category === "pii_exposure" && + (fallbackRule === "pii-into-mart" || fallbackRule === "select-pii-columns") + ) { + return false + } + if (!coreRanFiles.has(f.file)) return true if (tool !== "dbt-patterns" && tool !== "rule-catalog") return true // Structural twins: the check code survives on evidence.result.rule (ruleKey // is stripped by the zod schema). Suppress ONLY if core actually emitted the // equivalent rule for this file (else the issue would fall through). - const code = String((f.evidence?.result as any)?.rule ?? "").replace(/_/g, "-") + const code = fallbackRule const coreMatcher = CORE_AST_COVERED[code] if (coreMatcher && coreMatcher.test(coreRulesByFile.get(f.file) ?? "")) return false // Portability twins: drop a regex finding ONLY IF it is itself a portability diff --git a/packages/opencode/test/altimate/review.test.ts b/packages/opencode/test/altimate/review.test.ts index 881821d06..036687969 100644 --- a/packages/opencode/test/altimate/review.test.ts +++ b/packages/opencode/test/altimate/review.test.ts @@ -1333,6 +1333,79 @@ describe("orchestrate", () => { expect(env.verdict).toBe("REQUEST_CHANGES") }) + test("diff-scoped core PII is the single blocker for newly introduced mart email", async () => { + const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+ email\n" }] + const runner: ReviewRunner = { + ...fakeRunner({}), + async detectPii() { + return { columns: ["email"] } + }, + async columnLineage(sql) { + return sql.includes("email") + ? [ + { source: "customers.customer_name", target: "customer_name" }, + { source: "customers.email", target: "email" }, + ] + : [{ source: "customers.customer_name", target: "customer_name" }] + }, + async classifyPii(columns) { + return columns.map((column) => ({ + column, + classification: "Email", + confidence: 0.95, + masking: "'***MASKED***'", + })) + }, + } + const env = await runReview({ + changedFiles: files, + config: { ...DEFAULT_REVIEW_CONFIG }, + rubric: DEFAULT_RUBRIC, + mode: "gate", + runner, + getContent: content("select customer_name, email from customers", "select customer_name from customers"), + }) + const pii = env.findings.filter((f) => f.category === "pii_exposure") + expect(pii).toHaveLength(1) + expect(pii[0].severity).toBe("critical") + expect(pii[0].evidence?.tool).toBe("altimate_core.classify_pii") + expect(env.verdict).toBe("REQUEST_CHANGES") + }) + + test("low-confidence name PII does not surface or fall back to regex PII comments", async () => { + const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+ first_name\n" }] + const runner: ReviewRunner = { + ...fakeRunner({}), + async detectPii() { + return { columns: ["first_name"] } + }, + async columnLineage(sql) { + return sql.includes("first_name") + ? [ + { source: "customers.customer_id", target: "customer_id" }, + { source: "customers.first_name", target: "first_name" }, + ] + : [{ source: "customers.customer_id", target: "customer_id" }] + }, + async classifyPii(columns) { + return columns.map((column) => ({ + column, + classification: "Name", + confidence: 0.85, + })) + }, + } + const env = await runReview({ + changedFiles: files, + config: { ...DEFAULT_REVIEW_CONFIG }, + rubric: DEFAULT_RUBRIC, + mode: "gate", + runner, + getContent: content("select customer_id, first_name from customers", "select customer_id from customers"), + }) + expect(env.findings.some((f) => f.category === "pii_exposure")).toBe(false) + }) + test("clean change with no manifest → degraded, APPROVE/COMMENT, lint-only labeled", async () => { const files: ChangedFile[] = [{ path: "models/staging/stg_x.sql", status: "modified", diff: "+select 1\n" }] const runner: ReviewRunner = { From 1b050926f9c51d5d67a96408ca9fe68d54eebec2 Mon Sep 17 00:00:00 2001 From: anandgupta42 Date: Mon, 8 Jun 2026 12:32:56 -0700 Subject: [PATCH 04/11] Add DuckDB data diff e2e coverage --- ...-08-dbt-pr-review-self-improvement-loop.md | 37 ++++++++++++++++ packages/drivers/src/duckdb.ts | 37 +++++++++------- .../altimate/data-diff-duckdb-e2e.test.ts | 44 +++++++++++++++++++ 3 files changed, 101 insertions(+), 17 deletions(-) create mode 100644 packages/opencode/test/altimate/data-diff-duckdb-e2e.test.ts diff --git a/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md index 00f6ef4e2..841ae5441 100644 --- a/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md +++ b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md @@ -285,6 +285,43 @@ Implemented result: - Real-world corpus floor preserved: 15/15 bad cases caught, 0/5 false positives. +## Current Acceptance Checkpoint + +Demo matrix with local core, base artifacts in `target-base`, head artifacts in +`target`, and AI disabled: + +| Branch | Expected | Actual | Deterministic evidence | +|---|---:|---:|---| +| `demo/safe-refactor` | `APPROVE` | `APPROVE` | no findings | +| `demo/join-key-breakage` | `REQUEST_CHANGES` | `REQUEST_CHANGES` | `altimate_core.structural_diff:SC010`, core equivalence | +| `demo/test-removal` | `COMMENT` | `COMMENT` | `dbt-patterns:removed_tests` | +| `demo/new-pii-exposure` | `REQUEST_CHANGES` | `REQUEST_CHANGES` | `altimate_core.classify_pii`, core equivalence | +| `demo/mart-select-star` | `COMMENT` | `COMMENT` | core equivalence warning | +| `demo/incremental-without-guard` | `COMMENT` | `COMMENT` | core dbt config lint + deterministic catalog cost notes | + +DuckDB e2e proof: + +- Branch: `demo/join-key-breakage`. +- `dbt build --profiles-dir . --target dev`: passed `PASS=14 WARN=0 ERROR=0`. +- Reviewer: `REQUEST_CHANGES` with critical `join_risk` from + `altimate_core.structural_diff` rule `join_key_regression` (`SC010`). + +Warehouse data-diff e2e proof: + +- Test: `packages/opencode/test/altimate/data-diff-duckdb-e2e.test.ts`. +- Connection: local DuckDB via `@altimateai/drivers/duckdb`. +- Base/head relations: `base_orders` and `head_orders`. +- Key columns: `order_id`; compared value column: `amount`. +- Observed warehouse delta: one row only in head and one updated value. +- Validation: + `ALTIMATE_RUN_WAREHOUSE_E2E=1 bun test --timeout 30000 test/altimate/data-diff-duckdb-e2e.test.ts`. +- Default fast-loop behavior: + `bun test --timeout 30000 test/altimate/data-diff-duckdb-e2e.test.ts` + skips unless `ALTIMATE_RUN_WAREHOUSE_E2E=1` is set. +- Implementation note: fixed the DuckDB connector to use the two-argument + constructor when no open options are required; Bun's native binding can miss + the callback when `undefined` is passed as the second argument. + ## Warehouse E2E Policy Use local DuckDB first for fast end-to-end checks, then add BigQuery runs when a diff --git a/packages/drivers/src/duckdb.ts b/packages/drivers/src/duckdb.ts index 3ccca467a..d177adc93 100644 --- a/packages/drivers/src/duckdb.ts +++ b/packages/drivers/src/duckdb.ts @@ -56,26 +56,29 @@ export async function connect(config: ConnectionConfig): Promise { new Promise((resolve, reject) => { let resolved = false let timeout: ReturnType | undefined + let instance: any const opts = accessMode ? { access_mode: accessMode } : undefined - const instance = new duckdb.Database( - dbPath, - opts, - (err: Error | null) => { - if (resolved) { if (instance && typeof instance.close === "function") instance.close(); return } - resolved = true - if (timeout) clearTimeout(timeout) - if (err) { - const msg = err.message || String(err) - if (msg.toLowerCase().includes("locked") || msg.includes("SQLITE_BUSY") || msg.includes("DUCKDB_LOCKED")) { - reject(new Error("DUCKDB_LOCKED")) - } else { - reject(err) - } + const onOpen = (err: Error | null) => { + if (resolved) { + if (instance && typeof instance.close === "function") instance.close() + return + } + resolved = true + if (timeout) clearTimeout(timeout) + if (err) { + const msg = err.message || String(err) + if (msg.toLowerCase().includes("locked") || msg.includes("SQLITE_BUSY") || msg.includes("DUCKDB_LOCKED")) { + reject(new Error("DUCKDB_LOCKED")) } else { - resolve(instance) + reject(err) } - }, - ) + } else { + resolve(instance) + } + } + instance = opts + ? new duckdb.Database(dbPath, opts, onOpen) + : new duckdb.Database(dbPath, onOpen) // Bun: native callback may not fire; fall back after 2s timeout = setTimeout(() => { if (!resolved) { diff --git a/packages/opencode/test/altimate/data-diff-duckdb-e2e.test.ts b/packages/opencode/test/altimate/data-diff-duckdb-e2e.test.ts new file mode 100644 index 000000000..3da67fec9 --- /dev/null +++ b/packages/opencode/test/altimate/data-diff-duckdb-e2e.test.ts @@ -0,0 +1,44 @@ +import { afterAll, beforeAll, describe, expect, test } from "bun:test" + +import { runDataDiff } from "../../src/altimate/native/connections/data-diff" +import * as Registry from "../../src/altimate/native/connections/registry" + +const RUN = process.env.ALTIMATE_RUN_WAREHOUSE_E2E === "1" +const e2eTest = RUN ? test : test.skip + +describe("data.diff DuckDB e2e", () => { + beforeAll(() => { + process.env.ALTIMATE_TELEMETRY_DISABLED = "true" + }) + + afterAll(() => { + delete process.env.ALTIMATE_TELEMETRY_DISABLED + Registry.reset() + }) + + e2eTest("detects value and row deltas through a real DuckDB warehouse", async () => { + Registry.reset() + Registry.setConfigs({ duck_e2e: { type: "duckdb", path: ":memory:" } }) + const conn = await Registry.get("duck_e2e") + + await conn.execute("CREATE TABLE base_orders (order_id INTEGER, amount INTEGER)") + await conn.execute("CREATE TABLE head_orders (order_id INTEGER, amount INTEGER)") + await conn.execute("INSERT INTO base_orders VALUES (1, 100), (2, 200)") + await conn.execute("INSERT INTO head_orders VALUES (1, 100), (2, 250), (3, 300)") + + const result = await runDataDiff({ + source: "base_orders", + target: "head_orders", + key_columns: ["order_id"], + extra_columns: ["amount"], + source_warehouse: "duck_e2e", + target_warehouse: "duck_e2e", + algorithm: "joindiff", + }) + + expect(result.success).toBe(true) + expect((result.outcome as any)?.mode).toBe("diff") + expect((result.outcome as any)?.stats?.exclusive_table2).toBe(1) + expect((result.outcome as any)?.stats?.updated).toBe(1) + }) +}) From 49fc784cd725c7e584a41b94607c2c180b5a066e Mon Sep 17 00:00:00 2001 From: anandgupta42 Date: Tue, 9 Jun 2026 17:51:48 -0700 Subject: [PATCH 05/11] fix: address dbt review parity comments --- docs/docs/usage/dbt-pr-review.md | 6 +- ...-08-dbt-pr-review-self-improvement-loop.md | 8 +-- packages/drivers/src/duckdb.ts | 6 ++ packages/drivers/test/driver-security.test.ts | 56 +++++++++++++++---- .../src/altimate/review/orchestrate.ts | 21 ++++++- .../opencode/test/altimate/review.test.ts | 38 +++++++++++++ 6 files changed, 115 insertions(+), 20 deletions(-) diff --git a/docs/docs/usage/dbt-pr-review.md b/docs/docs/usage/dbt-pr-review.md index dbf97d4d3..e23cd8ddd 100644 --- a/docs/docs/usage/dbt-pr-review.md +++ b/docs/docs/usage/dbt-pr-review.md @@ -334,9 +334,9 @@ reviewer does **not** re-implement Jinja — it consumes dbt's own compiled outp the base and head refs in CI (the base into `target-base/`, the Recce convention). For proof-grade equivalence, also produce `target/catalog.json` with `dbt docs generate`; the catalog carries complete warehouse columns that - `manifest.json` often lacks. Without compiled SQL or complete columns the - engine lanes stay *undecidable* (never fabricated) — the `dbt-patterns` lane - still runs. + `manifest.json` often lacks. Without compiled SQL or complete columns, + schema-dependent engine lanes such as equivalence and PII stay *undecidable* + (never fabricated) — the `dbt-patterns` lane still runs. ## What it checks — deterministic rule catalog diff --git a/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md index 841ae5441..8823d2da6 100644 --- a/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md +++ b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md @@ -35,8 +35,8 @@ Quantified acceptance target: warehouse-observed impact, not advisory AI or heuristic equivalence. Operating rules: -- Always start by pulling latest main of - /Users/anandgupta/codebase/altimate-core-internal with fast-forward only. +- Always start by pulling latest main of the sibling `altimate-core-internal` + checkout with fast-forward only. - Prefer implementing deterministic SQL/dbt analysis in altimate-core, not altimate-code. Avoid regex/custom parsing in altimate-code when core AST/parser support can own it. @@ -52,8 +52,8 @@ Operating rules: only at acceptance checkpoints. 6. Keep only if it improves real review value with no new false positives; otherwise revert/reject or make advisory-only. -- Use /Users/anandgupta/codebase/altimate-code/demo/dbt-pr-review-demo for demo - branch e2e. +- Use the `demo/dbt-pr-review-demo` checkout under the altimate-code workspace + for demo branch e2e. - Compile base dbt artifacts into target-base and head artifacts into target before review when structural/equivalence lanes need base-vs-head compiled SQL. - Use local DuckDB first; use BigQuery where warehouse behavior matters and diff --git a/packages/drivers/src/duckdb.ts b/packages/drivers/src/duckdb.ts index d177adc93..1aa2fdba7 100644 --- a/packages/drivers/src/duckdb.ts +++ b/packages/drivers/src/duckdb.ts @@ -57,8 +57,13 @@ export async function connect(config: ConnectionConfig): Promise { let resolved = false let timeout: ReturnType | undefined let instance: any + let pendingOpen: Error | null | undefined const opts = accessMode ? { access_mode: accessMode } : undefined const onOpen = (err: Error | null) => { + if (!instance) { + pendingOpen = err + return + } if (resolved) { if (instance && typeof instance.close === "function") instance.close() return @@ -79,6 +84,7 @@ export async function connect(config: ConnectionConfig): Promise { instance = opts ? new duckdb.Database(dbPath, opts, onOpen) : new duckdb.Database(dbPath, onOpen) + if (pendingOpen !== undefined) onOpen(pendingOpen) // Bun: native callback may not fire; fall back after 2s timeout = setTimeout(() => { if (!resolved) { diff --git a/packages/drivers/test/driver-security.test.ts b/packages/drivers/test/driver-security.test.ts index 1fd4a9435..a590235ff 100644 --- a/packages/drivers/test/driver-security.test.ts +++ b/packages/drivers/test/driver-security.test.ts @@ -12,6 +12,10 @@ import { describe, test, expect, mock, beforeEach, afterEach } from "bun:test" // DuckDB: wrapDuckDBError + lock retry // --------------------------------------------------------------------------- describe("DuckDB driver", () => { + function openCallback(optsOrCb: any, cb?: (err: Error | null) => void): (err: Error | null) => void { + return typeof optsOrCb === "function" ? optsOrCb : cb! + } + // Test wrapDuckDBError logic inline (it's a closure, so we test via connect behavior) describe("wrapDuckDBError", () => { test("wraps SQLITE_BUSY errors with user-friendly message", async () => { @@ -28,8 +32,8 @@ describe("DuckDB driver", () => { mock.module("duckdb", () => ({ default: { Database: class { - constructor(_path: string, _opts: any, cb: (err: Error | null) => void) { - setTimeout(() => cb(null), 0) + constructor(_path: string, optsOrCb: any, cb?: (err: Error | null) => void) { + setTimeout(() => openCallback(optsOrCb, cb)(null), 0) } connect() { return mockDb.connect() @@ -62,8 +66,8 @@ describe("DuckDB driver", () => { mock.module("duckdb", () => ({ default: { Database: class { - constructor(_path: string, _opts: any, cb: (err: Error | null) => void) { - setTimeout(() => cb(null), 0) + constructor(_path: string, optsOrCb: any, cb?: (err: Error | null) => void) { + setTimeout(() => openCallback(optsOrCb, cb)(null), 0) } connect() { return { @@ -97,8 +101,8 @@ describe("DuckDB driver", () => { mock.module("duckdb", () => ({ default: { Database: class { - constructor(_path: string, _opts: any, cb: (err: Error | null) => void) { - setTimeout(() => cb(null), 0) + constructor(_path: string, optsOrCb: any, cb?: (err: Error | null) => void) { + setTimeout(() => openCallback(optsOrCb, cb)(null), 0) } connect() { return { @@ -135,14 +139,16 @@ describe("DuckDB driver", () => { mock.module("duckdb", () => ({ default: { Database: class { - constructor(_path: string, opts: any, cb: (err: Error | null) => void) { + constructor(_path: string, optsOrCb: any, cb?: (err: Error | null) => void) { + const opts = typeof optsOrCb === "function" ? undefined : optsOrCb + const done = openCallback(optsOrCb, cb) connectAttempts++ if (connectAttempts === 1 && !opts?.access_mode) { // First attempt fails with lock error - setTimeout(() => cb(new Error("DUCKDB_LOCKED: file is locked")), 0) + setTimeout(() => done(new Error("DUCKDB_LOCKED: file is locked")), 0) } else { // READ_ONLY retry succeeds - setTimeout(() => cb(null), 0) + setTimeout(() => done(null), 0) } } connect() { @@ -171,8 +177,8 @@ describe("DuckDB driver", () => { mock.module("duckdb", () => ({ default: { Database: class { - constructor(_path: string, _opts: any, cb: (err: Error | null) => void) { - setTimeout(() => cb(new Error("DUCKDB_LOCKED")), 0) + constructor(_path: string, optsOrCb: any, cb?: (err: Error | null) => void) { + setTimeout(() => openCallback(optsOrCb, cb)(new Error("DUCKDB_LOCKED")), 0) } connect() { return {} @@ -195,6 +201,34 @@ describe("DuckDB driver", () => { expect(e.message).toBe("DUCKDB_LOCKED") } }) + + test("handles native open callback invoked synchronously", async () => { + mock.module("duckdb", () => ({ + default: { + Database: class { + constructor(_path: string, optsOrCb: any, cb?: (err: Error | null) => void) { + openCallback(optsOrCb, cb)(null) + } + connect() { + return { + all: (_sql: string, cb: (err: Error | null, rows: any[]) => void) => { + cb(null, [{ ok: 1 }]) + }, + } + } + close(cb: any) { + if (cb) cb(null) + } + }, + }, + })) + + const { connect } = await import("../src/duckdb") + const connector = await connect({ type: "duckdb", path: ":memory:" }) + await connector.connect() + expect(await connector.execute("SELECT 1")).toMatchObject({ columns: ["ok"], rows: [[1]], row_count: 1 }) + await connector.close() + }) }) }) diff --git a/packages/opencode/src/altimate/review/orchestrate.ts b/packages/opencode/src/altimate/review/orchestrate.ts index 8ddb5bedf..e5341faea 100644 --- a/packages/opencode/src/altimate/review/orchestrate.ts +++ b/packages/opencode/src/altimate/review/orchestrate.ts @@ -962,6 +962,15 @@ async function piiClassifyLane(ctx: ModelContext, runner: ReviewRunner, dialect: return out } +async function introducedOutputColumns(ctx: ModelContext, runner: ReviewRunner, dialect: string): Promise | undefined> { + if (!runner.columnLineage || !ctx.engineNewSql || ctx.file.status === "deleted") return undefined + const headCols = [...new Set((await runner.columnLineage(ctx.engineNewSql, dialect)).map((e) => e.target).filter(Boolean))] + if (!headCols.length) return new Set() + if (!ctx.engineOldSql) return new Set(headCols.map((c) => c.toLowerCase())) + const baseCols = new Set((await runner.columnLineage(ctx.engineOldSql, dialect)).map((e) => e.target.toLowerCase())) + return new Set(headCols.filter((c) => !baseCols.has(c.toLowerCase())).map((c) => c.toLowerCase())) +} + function piiLane(file: ChangedFile & { kind: string }, columns: string[]): Finding[] { if (file.status === "deleted" || !columns.length) return [] const model = modelNameFromPath(file.path) @@ -1077,8 +1086,16 @@ export async function runReview(input: OrchestrateInput): Promise !diffScopedPiiCols.has(col.toLowerCase())) + : ctx.pii + all.push(piiLane(ctx.file, fallbackPii)) + } // PII classification always runs (cheap name-pattern check, diff-scoped to // newly-introduced columns) — exposing PII is a hard-floor concern that // shouldn't depend on the cost tier enabling the pii_exposure lane. diff --git a/packages/opencode/test/altimate/review.test.ts b/packages/opencode/test/altimate/review.test.ts index 036687969..a6c06fd1c 100644 --- a/packages/opencode/test/altimate/review.test.ts +++ b/packages/opencode/test/altimate/review.test.ts @@ -1372,6 +1372,44 @@ describe("orchestrate", () => { expect(env.verdict).toBe("REQUEST_CHANGES") }) + test("broad PII detector remains fallback for existing output columns", async () => { + const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+ lower(email) as email\n" }] + const runner: ReviewRunner = { + ...fakeRunner({}), + async detectPii() { + return { columns: ["email"] } + }, + async columnLineage(sql) { + return sql.includes("email") + ? [ + { source: "customers.customer_id", target: "customer_id" }, + { source: "customers.email", target: "email" }, + ] + : [{ source: "customers.customer_id", target: "customer_id" }] + }, + async classifyPii(columns) { + return columns.map((column) => ({ + column, + classification: "Email", + confidence: 0.95, + masking: "'***MASKED***'", + })) + }, + } + const env = await runReview({ + changedFiles: files, + config: { ...DEFAULT_REVIEW_CONFIG }, + rubric: DEFAULT_RUBRIC, + mode: "gate", + runner, + getContent: content("select customer_id, lower(email) as email from customers", "select customer_id, email from customers"), + }) + const pii = env.findings.filter((f) => f.category === "pii_exposure") + expect(pii).toHaveLength(1) + expect(pii[0].evidence?.tool).toBe("schema.detect_pii") + expect(env.verdict).toBe("REQUEST_CHANGES") + }) + test("low-confidence name PII does not surface or fall back to regex PII comments", async () => { const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+ first_name\n" }] const runner: ReviewRunner = { From 21e5adff70675e670f7ae9804e48f4bbb6bc8165 Mon Sep 17 00:00:00 2001 From: anandgupta42 Date: Tue, 9 Jun 2026 18:06:25 -0700 Subject: [PATCH 06/11] chore: remove non-functional planning docs from dbt review PR --- ...-08-dbt-pr-review-self-improvement-loop.md | 338 ------------------ 1 file changed, 338 deletions(-) delete mode 100644 docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md diff --git a/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md deleted file mode 100644 index 8823d2da6..000000000 --- a/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md +++ /dev/null @@ -1,338 +0,0 @@ -# dbt PR Review Self-Improvement Loop - -Date: 2026-06-08 - -## Goal - -Make `altimate review` the highest-signal dbt PR reviewer by adding only checks -that survive an end-to-end value test: - -1. The finding is backed by deterministic evidence, warehouse execution, or a - clearly bounded advisory rule. -2. It catches a real dbt PR risk in the public demo or sourced benchmark corpus. -3. It does not add false positives to corrected/safe cases. -4. It produces a review comment that a data engineer would act on. -5. It degrades loudly when required artifacts, schema, or warehouse access are - missing. - -## Recommended `/goal` Command - -```text -/goal Objective: make the dbt PR reviewer demo-parity reliable and measurably -higher-signal by closing the current public-demo gaps while preserving the -existing benchmark floor. - -Quantified acceptance target: -- Public demo matrix reaches expected verdicts for all 6 demo branches: - safe-refactor=APPROVE, join-key-breakage=REQUEST_CHANGES, - test-removal=COMMENT, new-pii-exposure=REQUEST_CHANGES, - mart-select-star=COMMENT, incremental-without-guard=COMMENT. -- Real-world corpus remains at 15/15 caught bad cases and 0/5 false positives - on corrected cases. -- At least one warehouse-backed DuckDB e2e proves a real dbt build/test pass - that the reviewer catches pre-merge. -- Every blocking finding is produced by deterministic core evidence or - warehouse-observed impact, not advisory AI or heuristic equivalence. - -Operating rules: -- Always start by pulling latest main of the sibling `altimate-core-internal` - checkout with fast-forward only. -- Prefer implementing deterministic SQL/dbt analysis in altimate-core, not - altimate-code. Avoid regex/custom parsing in altimate-code when core - AST/parser support can own it. -- altimate-code should mostly orchestrate, map core findings to review - categories/severity, format verdicts, and run GitHub/CI integration. -- For every candidate improvement: - 1. Write a short spec: risk, deterministic evidence, what must not be flagged. - 2. Add bad and safe/corrected fixtures. - 3. Implement in altimate-core when possible. - 4. Build local core Node package and link altimate-code to it only when e2e - needs the local native package. - 5. Validate with the narrowest useful tests first, then corpus/demo/warehouse - only at acceptance checkpoints. - 6. Keep only if it improves real review value with no new false positives; - otherwise revert/reject or make advisory-only. -- Use the `demo/dbt-pr-review-demo` checkout under the altimate-code workspace - for demo branch e2e. -- Compile base dbt artifacts into target-base and head artifacts into target - before review when structural/equivalence lanes need base-vs-head compiled SQL. -- Use local DuckDB first; use BigQuery where warehouse behavior matters and - credentials are available. Trino/Postgres/ClickHouse/DuckDB can be spawned - locally. Snowflake later. -- Treat altimate-core equivalence as heuristic unless there is a sound proof - path. Do not block or approve solely from heuristic equivalence. Block only on - reproducible deterministic facts or warehouse-observed impact. -- Track results in - docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md. - -Iteration speed: -- Optimize for fast learning loops. Do not run full test suites or expensive - builds by default. -- Start with the narrowest relevant unit test or direct function smoke test. -- Only run broader tests after a candidate passes the focused check. -- Avoid rebuilding altimate-core unless the change affects native exports, NAPI - bindings, or e2e verification requires the local Node package. -- Prefer `cargo test -p altimate-core ` over full cargo - test. -- Prefer `bun test test/altimate/.ts` over full package - tests. -- Use direct Node smoke tests against `crates/altimate-core-node` before - relinking altimate-code. -- Run the real-world corpus and demo matrix only at acceptance checkpoints, not - after every edit. -- Run warehouse e2e only when the value claim depends on observed data - behavior. -- Keep a short validation ladder for each candidate: smoke -> focused unit -> - focused integration -> corpus/demo -> warehouse e2e if needed. - -Initial backlog: -1. Remove safe-refactor noise: base compiled artifact support and/or equivalence - degradation handling. -2. Tighten PII precision: do not flag low-confidence Name such as - customer_name, but still flag email/SSN/phone/payment identifiers. -3. Reduce low-value missing_table_alias comments for dbt CTE-heavy SQL. -4. Expand core structural diff rules for high-signal dbt semantic regressions. -5. Add warehouse data-diff e2e cases that prove real row/value impact. -``` - -## Skill vs Goal - -Create a skill once this loop stabilizes across several iterations. A skill is -the right home for reusable operating instructions: how to pull core, choose -core-vs-code ownership, build local NAPI, run the demo matrix, and apply the -validation ladder. The active `/goal` should still include the quantified -objective above because it provides the stop condition and prevents open-ended -"make it better" work. - -## Baseline From Dry Run - -Commands run from `packages/opencode` unless noted: - -```bash -bun run --conditions=browser script/review-realworld-eval.ts -``` - -Result: - -- Sourced bad-case catch rate: 15/15. -- Corrected-case false positives: 0/5. - -Demo matrix run against `demo/dbt-pr-review-demo` branches with the local dbt -virtualenv and `altimate review --mode gate --json`: - -| Branch | Expected | Actual | Finding categories | -|---|---:|---:|---| -| `demo/safe-refactor` | `APPROVE` | `COMMENT` | `semantic_change`, `pii_exposure`, `sql_quality` | -| `demo/join-key-breakage` | `REQUEST_CHANGES` | `COMMENT` | `semantic_change`, `pii_exposure`, `sql_correctness`, `sql_quality` | -| `demo/test-removal` | `COMMENT` | `COMMENT` | `test_coverage` | -| `demo/new-pii-exposure` | `REQUEST_CHANGES` | `REQUEST_CHANGES` | `pii_exposure`, `semantic_change`, `sql_quality` | -| `demo/mart-select-star` | `COMMENT` | `COMMENT` | `semantic_change`, `pii_exposure`, `sql_quality`, `test_coverage` | -| `demo/incremental-without-guard` | `COMMENT` | `COMMENT` | `semantic_change`, `pii_exposure`, `join_risk`, `warehouse_cost`, `sql_quality`, `materialization`, `sql_correctness`, `test_coverage` | - -Observed problems: - -- `safe-refactor` is noisy. The semantic lane cannot decide equivalence because - base compiled SQL is unavailable, `Name` classification flags `customer_name` - as PII, and `missing_table_alias` flags harmless CTE aliasing. -- `join-key-breakage` is not blocked. The useful join-key mismatch appears from - the advisory AI lane, not the deterministic lane, so it cannot gate. -- Several demo branches are degraded because base-vs-head compiled artifacts are - incomplete. A safe review must not pretend heuristic equivalence is a proof. - -## Loop Contract - -For each candidate improvement: - -1. **Spec**: one paragraph stating the risk, the deterministic evidence, and - what must not be flagged. -2. **Fixtures**: at least one bad case and one corrected/safe countercase. -3. **Implementation**: smallest scoped change in the reviewer or demo workflow. -4. **Validation**: - - Unit or harness test for the detector. - - `bun run --conditions=browser script/review-realworld-eval.ts`. - - Demo branch matrix for affected branches. - - Warehouse-backed run when the value claim depends on actual data impact. -5. **Decision**: - - Keep only if it improves catch rate or demo parity with zero new false - positives. - - Reject or keep advisory-only if evidence is heuristic, schema-dependent, - or too broad. - -## First Candidate Backlog - -### 1. Deterministic Join-Key Regression Detector - -Status: pushed down to `altimate-core-internal` as structural diff rule -`SC010 join_key_regression` in -`crates/altimate-core/src/review/structural_diff.rs`. `altimate-code` only maps -the core finding into the review verdict. - -Risk: a PR changes a join predicate from matching the same business key to -joining unrelated identifiers, e.g. `orders.customer_id = customers.customer_id` -becomes `orders.order_id = customers.customer_id`. - -Evidence: - -- Base and head SQL are both available. -- A changed `JOIN ... ON` equality compares identifier columns ending in `_id`. -- The base predicate joined same-name/same-stem keys, while the head predicate - joins different stems. - -Do not flag: - -- Intentional bridge joins such as `orders.order_id = order_items.order_id`. -- Joins where the changed key stems still match. -- Cases without a base predicate to compare. - -Expected result: - -- `demo/join-key-breakage` gets a deterministic `join_risk` or - `sql_correctness` critical finding and blocks in gate mode. -- `demo/safe-refactor` remains quiet for this detector. - -Validation: - -- `git pull --ff-only origin main` completed in - `/Users/anandgupta/codebase/altimate-core-internal`; latest core main was - `a413804` before this local rule change. -- `cargo test -p altimate-core review::structural_diff --lib` passed with the - new `SC010` bad case and safe countercases. -- `bun test --timeout 30000 test/altimate/review.test.ts` passed. -- `bun run --conditions=browser script/review-realworld-eval.ts` stayed at - 15/15 sourced catches and 0/5 corrected false positives. -- Demo matrix: `demo/join-key-breakage` improved from `COMMENT` to - `REQUEST_CHANGES` with deterministic `join_risk`. -- The detector itself stayed quiet on `demo/safe-refactor` and on the - same-key bridge-join countercase. - -### 2. Base Compiled Artifact Support In Demo CI - -Risk: safe refactors degrade to "could not prove equivalent" because the review -has incomplete proof inputs: missing base compiled SQL or incomplete schema -columns. - -Evidence: - -- `compiled.ts` already supports `target-base/compiled//`. -- The demo workflow currently runs only head-side `dbt compile`. -- `demo/safe-refactor` became decidable after `dbt docs generate` produced - `target/catalog.json`, but core equivalence then falsely compared CTE alias - names inside the join filter. - -Do not flag: - -- A safe CTE rename where base and head compiled SQL are equivalent or where no - deterministic structural difference exists. - -Expected result: - -- The demo workflow compiles base into `target-base` before compiling head. -- The demo workflow produces `target/catalog.json` when a warehouse-backed dbt - project is available, so equivalence has complete columns. -- `demo/safe-refactor` emits no findings with local core and DuckDB catalog - artifacts. - -Implemented result: - -- Core `L012 missing_table_alias` now ignores CTE references, removing the - `order_records/customer_records` alias noise while preserving real table alias - lint. -- Core equivalence now resolves join predicate columns through CTE aliases to - base `table.column` provenance, so CTE alias renames are equivalent but a - changed join key remains material. -- `altimate-code` now threads the detected dialect into - `altimate_core.equivalence` and `--no-ai` correctly disables the advisory lane. -- Local DuckDB safe-refactor validation: `APPROVE`, zero findings. -- Real-world corpus floor preserved: 15/15 bad cases caught, 0/5 false - positives. - -### 3. PII Classification Precision Floor - -Risk: high-noise PII comments on ordinary names such as `customer_name` reduce -trust and make safe PRs look risky. - -Evidence: - -- `classify_pii` labels `customer_name` as `Name` at 75% confidence in the demo. -- The stronger demo/security value is email/SSN/phone/payment identifiers and - source-propagated sensitive columns. - -Do not flag: - -- `Name` classification below a high confidence threshold. -- Pre-existing PII columns not introduced by the PR. - -Expected result: - -- `demo/safe-refactor` no longer reports `customer_name`. -- `demo/new-pii-exposure` still reports `email`. - -Implemented result: - -- Diff-scoped core PII classification is the authoritative PR-review PII - comment when lineage/classification are available. -- High-confidence non-low-risk PII introduced into `marts/` or `reporting/` - is critical and blockable from `altimate_core.classify_pii` evidence. -- Low-risk `Name`/`Address` classifications require at least 90% confidence to - surface, so `first_name`/`customer_name`-style weak signals do not create - noisy comments. -- Fallback `dbt-patterns`/`rule-catalog` PII twins are suppressed for files - where the diff-scoped core classifier ran; fallback remains available when - core lineage/classification is unavailable. -- `demo/new-pii-exposure` now reports one critical core PII finding for - `email` plus the core equivalence warning, and still returns - `REQUEST_CHANGES`. -- Real-world corpus floor preserved: 15/15 bad cases caught, 0/5 false - positives. - -## Current Acceptance Checkpoint - -Demo matrix with local core, base artifacts in `target-base`, head artifacts in -`target`, and AI disabled: - -| Branch | Expected | Actual | Deterministic evidence | -|---|---:|---:|---| -| `demo/safe-refactor` | `APPROVE` | `APPROVE` | no findings | -| `demo/join-key-breakage` | `REQUEST_CHANGES` | `REQUEST_CHANGES` | `altimate_core.structural_diff:SC010`, core equivalence | -| `demo/test-removal` | `COMMENT` | `COMMENT` | `dbt-patterns:removed_tests` | -| `demo/new-pii-exposure` | `REQUEST_CHANGES` | `REQUEST_CHANGES` | `altimate_core.classify_pii`, core equivalence | -| `demo/mart-select-star` | `COMMENT` | `COMMENT` | core equivalence warning | -| `demo/incremental-without-guard` | `COMMENT` | `COMMENT` | core dbt config lint + deterministic catalog cost notes | - -DuckDB e2e proof: - -- Branch: `demo/join-key-breakage`. -- `dbt build --profiles-dir . --target dev`: passed `PASS=14 WARN=0 ERROR=0`. -- Reviewer: `REQUEST_CHANGES` with critical `join_risk` from - `altimate_core.structural_diff` rule `join_key_regression` (`SC010`). - -Warehouse data-diff e2e proof: - -- Test: `packages/opencode/test/altimate/data-diff-duckdb-e2e.test.ts`. -- Connection: local DuckDB via `@altimateai/drivers/duckdb`. -- Base/head relations: `base_orders` and `head_orders`. -- Key columns: `order_id`; compared value column: `amount`. -- Observed warehouse delta: one row only in head and one updated value. -- Validation: - `ALTIMATE_RUN_WAREHOUSE_E2E=1 bun test --timeout 30000 test/altimate/data-diff-duckdb-e2e.test.ts`. -- Default fast-loop behavior: - `bun test --timeout 30000 test/altimate/data-diff-duckdb-e2e.test.ts` - skips unless `ALTIMATE_RUN_WAREHOUSE_E2E=1` is set. -- Implementation note: fixed the DuckDB connector to use the two-argument - constructor when no open options are required; Bun's native binding can miss - the callback when `undefined` is passed as the second argument. - -## Warehouse E2E Policy - -Use local DuckDB first for fast end-to-end checks, then add BigQuery runs when a -candidate needs production-style warehouse behavior. Trino, Postgres, -ClickHouse, and DuckDB can be spawned locally. Snowflake is deferred until a -separate account is available. - -Warehouse-backed findings must include: - -- The connection used. -- Base/head SQL or model relation names. -- Key columns. -- Row/value delta summary. -- A skip/degraded state when credentials or drivers are unavailable. From 173f9d94ca146be2c6f32fe6db7fd325c892dcc1 Mon Sep 17 00:00:00 2001 From: anandgupta42 Date: Tue, 9 Jun 2026 18:06:44 -0700 Subject: [PATCH 07/11] chore: drop non-functional dbt review docs edit --- docs/docs/usage/dbt-pr-review.md | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/docs/docs/usage/dbt-pr-review.md b/docs/docs/usage/dbt-pr-review.md index e23cd8ddd..e4b63774b 100644 --- a/docs/docs/usage/dbt-pr-review.md +++ b/docs/docs/usage/dbt-pr-review.md @@ -332,11 +332,8 @@ reviewer does **not** re-implement Jinja — it consumes dbt's own compiled outp from `target/compiled//…` (HEAD) and `target-base/compiled/…` (BASE), produced by `dbt compile`. To enable full equivalence verdicts, compile both the base and head refs in CI (the base into `target-base/`, the Recce - convention). For proof-grade equivalence, also produce `target/catalog.json` - with `dbt docs generate`; the catalog carries complete warehouse columns that - `manifest.json` often lacks. Without compiled SQL or complete columns, - schema-dependent engine lanes such as equivalence and PII stay *undecidable* - (never fabricated) — the `dbt-patterns` lane still runs. + convention). Without compiled SQL the engine lanes fall back to raw and stay + *undecidable* (never fabricated) — the `dbt-patterns` lane still runs. ## What it checks — deterministic rule catalog From 7c7b6dbabe5acb609190a73087aeb1f7f760af99 Mon Sep 17 00:00:00 2001 From: anandgupta42 Date: Tue, 9 Jun 2026 18:23:23 -0700 Subject: [PATCH 08/11] test: harden dbt review parity edge cases --- packages/drivers/test/driver-security.test.ts | 55 ++++++++++ .../src/altimate/review/orchestrate.ts | 82 +++++++------- .../altimate/data-diff-duckdb-e2e.test.ts | 50 +++++++++ .../opencode/test/altimate/review.test.ts | 101 ++++++++++++++++++ 4 files changed, 249 insertions(+), 39 deletions(-) diff --git a/packages/drivers/test/driver-security.test.ts b/packages/drivers/test/driver-security.test.ts index a590235ff..1b48965b4 100644 --- a/packages/drivers/test/driver-security.test.ts +++ b/packages/drivers/test/driver-security.test.ts @@ -229,6 +229,61 @@ describe("DuckDB driver", () => { expect(await connector.execute("SELECT 1")).toMatchObject({ columns: ["ok"], rows: [[1]], row_count: 1 }) await connector.close() }) + + test("retries read-only when native open synchronously reports a file lock", async () => { + let connectAttempts = 0 + mock.module("duckdb", () => ({ + default: { + Database: class { + constructor(_path: string, optsOrCb: any, cb?: (err: Error | null) => void) { + const opts = typeof optsOrCb === "function" ? undefined : optsOrCb + const done = openCallback(optsOrCb, cb) + connectAttempts++ + done(connectAttempts === 1 && !opts?.access_mode ? new Error("DUCKDB_LOCKED: file is locked") : null) + } + connect() { + return { + all: (_sql: string, cb: (err: Error | null, rows: any[]) => void) => { + cb(null, [{ ok: 1 }]) + }, + } + } + close(cb: any) { + if (cb) cb(null) + } + }, + }, + })) + + const { connect } = await import("../src/duckdb") + const connector = await connect({ type: "duckdb", path: "/tmp/sync-lock.duckdb" }) + await connector.connect() + expect(connectAttempts).toBe(2) + expect(await connector.execute("SELECT 1")).toMatchObject({ columns: ["ok"], rows: [[1]], row_count: 1 }) + await connector.close() + }) + + test("propagates synchronous non-lock open errors without connecting undefined db", async () => { + mock.module("duckdb", () => ({ + default: { + Database: class { + constructor(_path: string, optsOrCb: any, cb?: (err: Error | null) => void) { + openCallback(optsOrCb, cb)(new Error("catalog is corrupt")) + } + connect() { + throw new Error("should not connect") + } + close(cb: any) { + if (cb) cb(null) + } + }, + }, + })) + + const { connect } = await import("../src/duckdb") + const connector = await connect({ type: "duckdb", path: "/tmp/corrupt.duckdb" }) + await expect(connector.connect()).rejects.toThrow("catalog is corrupt") + }) }) }) diff --git a/packages/opencode/src/altimate/review/orchestrate.ts b/packages/opencode/src/altimate/review/orchestrate.ts index e5341faea..aa37efcfb 100644 --- a/packages/opencode/src/altimate/review/orchestrate.ts +++ b/packages/opencode/src/altimate/review/orchestrate.ts @@ -916,22 +916,38 @@ async function siblingConsistencyLane(ctxs: ModelContext[], runner: ReviewRunner * only NEW PII (a column whose name appears in the added diff), so it never nags * about pre-existing PII columns the PR didn't touch. */ -async function piiClassifyLane(ctx: ModelContext, runner: ReviewRunner, dialect: string): Promise { - if (!runner.classifyPii) return [] +async function piiClassifyLane( + ctx: ModelContext, + runner: ReviewRunner, + dialect: string, +): Promise<{ findings: Finding[]; introducedColumns: Set; completed: boolean }> { + const empty = { findings: [], introducedColumns: new Set(), completed: false } + if (!runner.classifyPii) return empty const { file, engineNewSql } = ctx - if (!engineNewSql || file.status === "deleted") return [] + if (!engineNewSql || file.status === "deleted") return empty const model = modelNameFromPath(file.path) - if (!runner.columnLineage) return [] + if (!runner.columnLineage) return empty // Output columns the change INTRODUCED = head targets − base targets (precise; // a pre-existing column merely mentioned in the diff is NOT counted). - const headCols = [...new Set((await runner.columnLineage(engineNewSql, dialect)).map((e) => e.target).filter(Boolean))] - if (!headCols.length) return [] - const baseCols = ctx.engineOldSql - ? new Set((await runner.columnLineage(ctx.engineOldSql, dialect)).map((e) => e.target.toLowerCase())) - : undefined - const newCols = baseCols ? headCols.filter((c) => !baseCols.has(c.toLowerCase())) : headCols - if (!newCols.length) return [] - const pii = await runner.classifyPii(newCols) + let newCols: string[] + try { + const headCols = [...new Set((await runner.columnLineage(engineNewSql, dialect)).map((e) => e.target).filter(Boolean))] + if (!headCols.length) return { ...empty, completed: true } + const baseCols = ctx.engineOldSql + ? new Set((await runner.columnLineage(ctx.engineOldSql, dialect)).map((e) => e.target.toLowerCase())) + : undefined + newCols = baseCols ? headCols.filter((c) => !baseCols.has(c.toLowerCase())) : headCols + } catch { + return empty + } + const introducedColumns = new Set(newCols.map((c) => c.toLowerCase())) + if (!newCols.length) return { findings: [], introducedColumns, completed: true } + let pii + try { + pii = await runner.classifyPii(newCols) + } catch { + return empty + } const out: Finding[] = [] const highExposureLayer = /(^|\/)(marts|reporting)\//.test(file.path) for (const p of pii) { @@ -959,16 +975,7 @@ async function piiClassifyLane(ctx: ModelContext, runner: ReviewRunner, dialect: }), ) } - return out -} - -async function introducedOutputColumns(ctx: ModelContext, runner: ReviewRunner, dialect: string): Promise | undefined> { - if (!runner.columnLineage || !ctx.engineNewSql || ctx.file.status === "deleted") return undefined - const headCols = [...new Set((await runner.columnLineage(ctx.engineNewSql, dialect)).map((e) => e.target).filter(Boolean))] - if (!headCols.length) return new Set() - if (!ctx.engineOldSql) return new Set(headCols.map((c) => c.toLowerCase())) - const baseCols = new Set((await runner.columnLineage(ctx.engineOldSql, dialect)).map((e) => e.target.toLowerCase())) - return new Set(headCols.filter((c) => !baseCols.has(c.toLowerCase())).map((c) => c.toLowerCase())) + return { findings: out, introducedColumns, completed: true } } function piiLane(file: ChangedFile & { kind: string }, columns: string[]): Finding[] { @@ -1061,6 +1068,7 @@ export async function runReview(input: OrchestrateInput): Promise() for (const ctx of ctxByPath.values()) { const tasks: Promise[] = [] // Engine lanes consume COMPILED SQL (rendered by dbt) when available. @@ -1086,20 +1094,19 @@ export async function runReview(input: OrchestrateInput): Promise !diffScopedPiiCols.has(col.toLowerCase())) + const fallbackPii = diffScopedPii.completed + ? ctx.pii.filter((col) => !diffScopedPii.introducedColumns.has(col.toLowerCase())) : ctx.pii all.push(piiLane(ctx.file, fallbackPii)) } - // PII classification always runs (cheap name-pattern check, diff-scoped to - // newly-introduced columns) — exposing PII is a hard-floor concern that - // shouldn't depend on the cost tier enabling the pii_exposure lane. - tasks.push(piiClassifyLane(ctx, input.runner, dialect)) + all.push(diffScopedPii.findings) // Deterministic dbt anti-pattern detectors run on RAW SQL + diff (need Jinja). if (lanes.has("dbt_patterns")) { all.push(detectModelPatterns(ctx.file, ctx.newSql, input.rubric)) @@ -1163,12 +1170,9 @@ export async function runReview(input: OrchestrateInput): Promise() - if (input.runner.classifyPii && input.runner.columnLineage) { - for (const ctx of ctxByPath.values()) { - if (ctx.engineNewSql && ctx.file.status !== "deleted") diffScopedPiiFiles.add(ctx.file.path) - } - } + const schemaPiiFiles = new Set( + flat.filter((f) => f.category === "pii_exposure" && f.evidence?.tool === "schema.detect_pii").map((f) => f.file), + ) // Per file, the concatenated rule names core's AST lint actually emitted — // so we suppress a regex twin only when core genuinely covered that concern. const coreRulesByFile = new Map() @@ -1195,7 +1199,7 @@ export async function runReview(input: OrchestrateInput): Promise { expect((result.outcome as any)?.stats?.exclusive_table2).toBe(1) expect((result.outcome as any)?.stats?.updated).toBe(1) }) + + e2eTest("does not report differences for identical DuckDB tables", async () => { + Registry.reset() + Registry.setConfigs({ duck_e2e: { type: "duckdb", path: ":memory:" } }) + const conn = await Registry.get("duck_e2e") + + await conn.execute("CREATE TABLE base_orders (order_id INTEGER, amount INTEGER)") + await conn.execute("CREATE TABLE head_orders (order_id INTEGER, amount INTEGER)") + await conn.execute("INSERT INTO base_orders VALUES (1, 100), (2, 200)") + await conn.execute("INSERT INTO head_orders VALUES (1, 100), (2, 200)") + + const result = await runDataDiff({ + source: "base_orders", + target: "head_orders", + key_columns: ["order_id"], + extra_columns: ["amount"], + source_warehouse: "duck_e2e", + target_warehouse: "duck_e2e", + algorithm: "joindiff", + }) + + expect(result.success).toBe(true) + expect((result.outcome as any)?.stats?.exclusive_table1).toBe(0) + expect((result.outcome as any)?.stats?.exclusive_table2).toBe(0) + expect((result.outcome as any)?.stats?.updated).toBe(0) + }) + + e2eTest("auto-discovers comparable columns when extra_columns is omitted", async () => { + Registry.reset() + Registry.setConfigs({ duck_e2e: { type: "duckdb", path: ":memory:" } }) + const conn = await Registry.get("duck_e2e") + + await conn.execute("CREATE TABLE base_orders (order_id INTEGER, amount INTEGER, updated_at TIMESTAMP)") + await conn.execute("CREATE TABLE head_orders (order_id INTEGER, amount INTEGER, updated_at TIMESTAMP)") + await conn.execute("INSERT INTO base_orders VALUES (1, 100, TIMESTAMP '2026-01-01 00:00:00'), (2, 200, TIMESTAMP '2026-01-01 00:00:00')") + await conn.execute("INSERT INTO head_orders VALUES (1, 100, TIMESTAMP '2026-01-02 00:00:00'), (2, 250, TIMESTAMP '2026-01-02 00:00:00')") + + const result = await runDataDiff({ + source: "base_orders", + target: "head_orders", + key_columns: ["order_id"], + source_warehouse: "duck_e2e", + target_warehouse: "duck_e2e", + algorithm: "joindiff", + }) + + expect(result.success).toBe(true) + expect((result.outcome as any)?.stats?.updated).toBe(1) + expect(result.excluded_audit_columns).toContain("updated_at") + }) }) diff --git a/packages/opencode/test/altimate/review.test.ts b/packages/opencode/test/altimate/review.test.ts index a6c06fd1c..88363b6a7 100644 --- a/packages/opencode/test/altimate/review.test.ts +++ b/packages/opencode/test/altimate/review.test.ts @@ -1410,6 +1410,107 @@ describe("orchestrate", () => { expect(env.verdict).toBe("REQUEST_CHANGES") }) + test("PII fallback survives core lineage failure instead of aborting review", async () => { + const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+ email\n" }] + const runner: ReviewRunner = { + ...fakeRunner({}), + async detectPii() { + return { columns: ["email"] } + }, + async columnLineage() { + throw new Error("lineage parser crashed") + }, + async classifyPii(columns) { + return columns.map((column) => ({ + column, + classification: "Email", + confidence: 0.95, + })) + }, + } + const env = await runReview({ + changedFiles: files, + config: { ...DEFAULT_REVIEW_CONFIG }, + rubric: DEFAULT_RUBRIC, + mode: "gate", + runner, + getContent: content("select customer_id, email from customers", "select customer_id from customers"), + }) + const pii = env.findings.filter((f) => f.category === "pii_exposure") + expect(pii).toHaveLength(1) + expect(pii[0].evidence?.tool).toBe("schema.detect_pii") + expect(env.verdict).toBe("REQUEST_CHANGES") + }) + + test("PII fallback survives core classification failure instead of hiding risk", async () => { + const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+ email\n" }] + const runner: ReviewRunner = { + ...fakeRunner({}), + async detectPii() { + return { columns: ["email"] } + }, + async columnLineage(sql) { + return sql.includes("email") + ? [ + { source: "customers.customer_id", target: "customer_id" }, + { source: "customers.email", target: "email" }, + ] + : [{ source: "customers.customer_id", target: "customer_id" }] + }, + async classifyPii() { + throw new Error("classifier unavailable") + }, + } + const env = await runReview({ + changedFiles: files, + config: { ...DEFAULT_REVIEW_CONFIG }, + rubric: DEFAULT_RUBRIC, + mode: "gate", + runner, + getContent: content("select customer_id, email from customers", "select customer_id from customers"), + }) + const pii = env.findings.filter((f) => f.category === "pii_exposure") + expect(pii).toHaveLength(1) + expect(pii[0].evidence?.tool).toBe("schema.detect_pii") + expect(env.verdict).toBe("REQUEST_CHANGES") + }) + + test("case-variant PII detector output does not duplicate core diff-scoped finding", async () => { + const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+ Email as EMAIL\n" }] + const runner: ReviewRunner = { + ...fakeRunner({}), + async detectPii() { + return { columns: ["EMAIL", "email"] } + }, + async columnLineage(sql) { + return sql.toLowerCase().includes("email") + ? [ + { source: "customers.customer_id", target: "customer_id" }, + { source: "customers.email", target: "EMAIL" }, + ] + : [{ source: "customers.customer_id", target: "customer_id" }] + }, + async classifyPii(columns) { + return columns.map((column) => ({ + column, + classification: "Email", + confidence: 0.95, + })) + }, + } + const env = await runReview({ + changedFiles: files, + config: { ...DEFAULT_REVIEW_CONFIG }, + rubric: DEFAULT_RUBRIC, + mode: "gate", + runner, + getContent: content("select customer_id, Email as EMAIL from customers", "select customer_id from customers"), + }) + const pii = env.findings.filter((f) => f.category === "pii_exposure") + expect(pii).toHaveLength(1) + expect(pii[0].evidence?.tool).toBe("altimate_core.classify_pii") + }) + test("low-confidence name PII does not surface or fall back to regex PII comments", async () => { const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+ first_name\n" }] const runner: ReviewRunner = { From 85f103f7b15d05c738149cc0db9ce3390e947262 Mon Sep 17 00:00:00 2001 From: anandgupta42 Date: Tue, 9 Jun 2026 20:40:40 -0700 Subject: [PATCH 09/11] fix: address consensus review findings on dbt PR reviewer (PR #919) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fixes raised by the multi-model review of #919, prioritizing the PII false-negative that 5 models flagged. - PII recall (MAJOR): the diff-scoped PII dedup keyed off whether the classifier *ran* and off ALL introduced columns, so a high-risk column core's classifier MISSED (e.g. `ssn`) was dropped from the broad `schema.detect_pii` fallback AND had its regex twin suppressed — silently. Now dedup against the columns core actually CLASSIFIED (`classifiedColumns`), and make the regex-twin suppression column-aware (`pii_into_mart` via its matched `result.line`; `select-pii-columns` only when a precise detector actually emitted a finding). A column core never classified now stays eligible for the broad detector and the regex net — including on trivial/lite tiers that don't run the `pii_exposure` lane. Low-confidence name/address suppression is preserved (core *considered* those columns). - Structural escalation (MAJOR): only `join_risk` (SC010 join-key regression) escalates a core `error` to critical/blocking. Other structural rules core marks `error` stay advisory (warning), so the lane never over-blocks on a rule it wasn't designed to gate. - DuckDB (MINOR): arm the 2s fallback timer BEFORE replaying a synchronous open callback so a sync resolve/reject can clear it (no dangling timer delaying process exit); release the half-open handle on open error. - Concurrency (MINOR): run the diff-scoped PII lane as a concurrent task instead of an inline `await`, so its native lineage/classify calls overlap the other per-file engine lanes. - Polish: align `ReviewRunner.equivalence` dialect param to optional (matches the impl); named constants for the PII confidence thresholds; document that PII exposure severity intentionally escalates only for `marts/reporting`; comment the `--no-ai`/`--ai=false` dual check. Adds regression tests: core-missed column surfaces via broad fallback; core-missed high-risk PII still caught by the regex twin when the broad PII lane is off; non-join `error` structural rule stays advisory; sub-error join-key regression stays a warning. Existing 76 review + 26 driver tests still pass. Co-Authored-By: Claude Opus 4.8 (1M context) --- packages/drivers/src/duckdb.ts | 21 ++- .../src/altimate/review/orchestrate.ts | 112 +++++++++++--- packages/opencode/src/cli/cmd/review.ts | 3 + .../opencode/test/altimate/review.test.ts | 137 ++++++++++++++++++ 4 files changed, 249 insertions(+), 24 deletions(-) diff --git a/packages/drivers/src/duckdb.ts b/packages/drivers/src/duckdb.ts index 1aa2fdba7..867840d0a 100644 --- a/packages/drivers/src/duckdb.ts +++ b/packages/drivers/src/duckdb.ts @@ -57,20 +57,33 @@ export async function connect(config: ConnectionConfig): Promise { let resolved = false let timeout: ReturnType | undefined let instance: any + // Sentinel for an open callback that fired synchronously (before + // `instance` was assigned): `undefined` = not yet fired, `null` = + // fired with success, `Error` = fired with failure. Replayed once + // `instance` exists. let pendingOpen: Error | null | undefined const opts = accessMode ? { access_mode: accessMode } : undefined + const closeQuietly = () => { + try { + if (instance && typeof instance.close === "function") instance.close() + } catch { + // best-effort cleanup of a half-open handle + } + } const onOpen = (err: Error | null) => { if (!instance) { pendingOpen = err return } if (resolved) { - if (instance && typeof instance.close === "function") instance.close() + closeQuietly() return } resolved = true if (timeout) clearTimeout(timeout) if (err) { + // Open failed — release the half-open handle so it doesn't leak. + closeQuietly() const msg = err.message || String(err) if (msg.toLowerCase().includes("locked") || msg.includes("SQLITE_BUSY") || msg.includes("DUCKDB_LOCKED")) { reject(new Error("DUCKDB_LOCKED")) @@ -84,14 +97,16 @@ export async function connect(config: ConnectionConfig): Promise { instance = opts ? new duckdb.Database(dbPath, opts, onOpen) : new duckdb.Database(dbPath, onOpen) - if (pendingOpen !== undefined) onOpen(pendingOpen) - // Bun: native callback may not fire; fall back after 2s + // Bun: native callback may not fire; fall back after 2s. Arm the timer + // BEFORE replaying a synchronous callback so a sync resolve/reject can + // actually clear it (otherwise it lingers ~2s and delays process exit). timeout = setTimeout(() => { if (!resolved) { resolved = true reject(new Error(`Timed out opening DuckDB database "${dbPath}"`)) } }, 2000) + if (pendingOpen !== undefined) onOpen(pendingOpen) }) try { diff --git a/packages/opencode/src/altimate/review/orchestrate.ts b/packages/opencode/src/altimate/review/orchestrate.ts index aa37efcfb..ffa0c5615 100644 --- a/packages/opencode/src/altimate/review/orchestrate.ts +++ b/packages/opencode/src/altimate/review/orchestrate.ts @@ -117,7 +117,7 @@ export interface ReviewRunner { impact(model: string): Promise grade(sql: string, dialect: string): Promise check(sql: string, dialect: string, baseSql?: string): Promise - equivalence(oldSql: string, newSql: string, dialect: string): Promise + equivalence(oldSql: string, newSql: string, dialect?: string): Promise detectPii(sql: string, dialect: string): Promise<{ columns: string[] }> /** * Lexical scan (reserved-word aliases + dialect operators) — the curated lists @@ -816,6 +816,20 @@ const STRUCTURAL_CATEGORY: Record = { join_key_regression: "join_risk", } +// PII classification thresholds for the diff-scoped lane. Names/addresses are +// common, lower-sensitivity PII that the broad name-pattern detector over-flags, +// so the precise lane requires higher confidence before surfacing them. Kept a +// named constant (not a magic number) and deliberately narrow: broadening the +// low-risk set to variants would suppress MORE real columns, hurting recall. +const PII_LOW_RISK_CLASS = /^(name|address)$/i +const PII_LOW_RISK_MIN_CONFIDENCE = 0.9 +const PII_DEFAULT_MIN_CONFIDENCE = 0.7 +// Column-name tokens the regex PII twins (`pii_into_mart`) key off — used to +// recover which column a coarse regex finding flagged so suppression can be +// column-aware (suppress only when core actually classified THAT column). +const PII_TOKEN_RE = + /\b(email|ssn|social_security|phone_number|first_name|last_name|full_name|street_address|date_of_birth|dob|passport|credit_card)\b/i + async function structuralChangeLane(ctx: ModelContext, runner: ReviewRunner): Promise { if (!runner.structuralDiff) return [] // Use the ENGINE (compiled) SQL — structural_diff parses with a real SQL parser, @@ -829,7 +843,11 @@ async function structuralChangeLane(ctx: ModelContext, runner: ReviewRunner): Pr const out: Finding[] = [] for (const f of raw) { const cat: ReviewCategory = STRUCTURAL_CATEGORY[f.rule] ?? "semantic_change" - const sev = clampSeverity(cat, f.severity === "error" ? "critical" : "warning", "high") + // Only `join_risk` (SC010 join-key regression) is allowed to block: a changed + // join key silently corrupts row sets. Other structural rules core marks + // `error` stay advisory (warning) so this lane never over-blocks on a rule + // it wasn't designed to gate. + const sev = clampSeverity(cat, cat === "join_risk" && f.severity === "error" ? "critical" : "warning", "high") out.push( makeFinding({ severity: sev, @@ -920,8 +938,8 @@ async function piiClassifyLane( ctx: ModelContext, runner: ReviewRunner, dialect: string, -): Promise<{ findings: Finding[]; introducedColumns: Set; completed: boolean }> { - const empty = { findings: [], introducedColumns: new Set(), completed: false } +): Promise<{ findings: Finding[]; classifiedColumns: Set; completed: boolean }> { + const empty = { findings: [], classifiedColumns: new Set(), completed: false } if (!runner.classifyPii) return empty const { file, engineNewSql } = ctx if (!engineNewSql || file.status === "deleted") return empty @@ -940,20 +958,29 @@ async function piiClassifyLane( } catch { return empty } - const introducedColumns = new Set(newCols.map((c) => c.toLowerCase())) - if (!newCols.length) return { findings: [], introducedColumns, completed: true } + if (!newCols.length) return { findings: [], classifiedColumns: new Set(), completed: true } let pii try { pii = await runner.classifyPii(newCols) } catch { return empty } + // Columns core's classifier actually CONSIDERED (returned a verdict for), even + // if below threshold. The broad fallback and the regex twins dedup against THIS + // set — not all introduced columns — so a column core simply MISSED (never + // classified) stays eligible for the broad name-pattern detector instead of + // being silently dropped. + const classifiedColumns = new Set(pii.map((p) => String(p.column ?? "").toLowerCase()).filter(Boolean)) const out: Finding[] = [] + // "Exposure" severity only escalates for published/broadly-read layers + // (marts/reporting). This is intentionally narrower than where grain tests + // apply (`missingGrainTestLane` also includes `intermediate`): an internal + // intermediate model surfacing PII is a warning, not a hard block. const highExposureLayer = /(^|\/)(marts|reporting)\//.test(file.path) for (const p of pii) { const classification = String(p.classification ?? "") - const lowRiskClass = /^(name|address)$/i.test(classification) - const minConfidence = lowRiskClass ? 0.9 : 0.7 + const lowRiskClass = PII_LOW_RISK_CLASS.test(classification) + const minConfidence = lowRiskClass ? PII_LOW_RISK_MIN_CONFIDENCE : PII_DEFAULT_MIN_CONFIDENCE if (p.confidence < minConfidence || !p.column) continue const col = p.column.toLowerCase() const highConfidence = p.confidence >= 0.9 @@ -975,7 +1002,7 @@ async function piiClassifyLane( }), ) } - return { findings: out, introducedColumns, completed: true } + return { findings: out, classifiedColumns, completed: true } } function piiLane(file: ChangedFile & { kind: string }, columns: string[]): Finding[] { @@ -1068,7 +1095,13 @@ export async function runReview(input: OrchestrateInput): Promise() + // Files where the diff-scoped PII classifier completed (looked at the change), + // those where it actually emitted a finding, and — per file — the lowercased + // columns it classified. These drive column-aware dedup of the coarse regex + // PII twins in the merge step below. + const diffScopedPiiCompletedFiles = new Set() + const diffScopedPiiFindingFiles = new Set() + const classifiedPiiColumnsByFile = new Map>() for (const ctx of ctxByPath.values()) { const tasks: Promise[] = [] // Engine lanes consume COMPILED SQL (rendered by dbt) when available. @@ -1098,15 +1131,31 @@ export async function runReview(input: OrchestrateInput): Promise !diffScopedPii.introducedColumns.has(col.toLowerCase())) - : ctx.pii - all.push(piiLane(ctx.file, fallbackPii)) - } - all.push(diffScopedPii.findings) + // Run it as a concurrent task (not an inline await) so the heavy native + // lineage/classify calls overlap the other engine lanes for this file. + const currentCtx = ctx + tasks.push( + (async () => { + const diffScopedPii = await piiClassifyLane(currentCtx, input.runner, dialect) + if (diffScopedPii.completed) { + diffScopedPiiCompletedFiles.add(currentCtx.file.path) + classifiedPiiColumnsByFile.set(currentCtx.file.path, diffScopedPii.classifiedColumns) + } + if (diffScopedPii.findings.length) diffScopedPiiFindingFiles.add(currentCtx.file.path) + const findings = [...diffScopedPii.findings] + if (lanes.has("pii_exposure")) { + // The broad fallback covers output PII columns the precise lane did NOT + // classify (pre-existing columns, or ones core's classifier missed) — + // dedup against the columns core actually CONSIDERED, never all + // introduced columns, so a missed column still surfaces here. + const fallbackPii = diffScopedPii.completed + ? currentCtx.pii.filter((col) => !diffScopedPii.classifiedColumns.has(col.toLowerCase())) + : currentCtx.pii + findings.push(...piiLane(currentCtx.file, fallbackPii)) + } + return findings + })(), + ) // Deterministic dbt anti-pattern detectors run on RAW SQL + diff (need Jinja). if (lanes.has("dbt_patterns")) { all.push(detectModelPatterns(ctx.file, ctx.newSql, input.rubric)) @@ -1197,13 +1246,34 @@ export async function runReview(input: OrchestrateInput): Promise { const tool = f.evidence?.tool const fallbackRule = String((f.evidence?.result as any)?.rule ?? "").replace(/_/g, "-") + // Dedup the coarse regex PII twins against the precise deterministic PII + // detectors — but only when a better detector genuinely COVERS the same + // column. Suppressing whenever the classifier merely "ran" hid real PII + // (a high-risk column core's classifier missed but the regex would catch, + // especially on tiers without the broad `pii_exposure` lane). if ( (tool === "dbt-patterns" || tool === "rule-catalog") && - (diffScopedPiiFiles.has(f.file) || schemaPiiFiles.has(f.file)) && f.category === "pii_exposure" && (fallbackRule === "pii-into-mart" || fallbackRule === "select-pii-columns") ) { - return false + // The broad `schema.detect_pii` lane already flagged this file → redundant. + if (schemaPiiFiles.has(f.file)) return false + if (fallbackRule === "pii-into-mart") { + // `pii_into_mart` carries the matched column in `result.line`: suppress + // only when core's classifier actually CONSIDERED that column (flagged or + // deliberately muted). A column core never classified falls through. + const token = PII_TOKEN_RE.exec(String((f.evidence?.result as any)?.line ?? ""))?.[1]?.toLowerCase() + if (token && classifiedPiiColumnsByFile.get(f.file)?.has(token)) return false + // No recoverable column: defer to the prior file-level behavior so the + // twin isn't double-reported when core ran for the file. + if (!token && diffScopedPiiCompletedFiles.has(f.file)) return false + return true + } + // `select-pii-columns` doesn't preserve its column. Suppress only when a + // precise detector actually EMITTED a PII finding for the file (so the + // exposure is reported by the better source, not silently dropped). + if (diffScopedPiiFindingFiles.has(f.file)) return false + return true } if (!coreRanFiles.has(f.file)) return true if (tool !== "dbt-patterns" && tool !== "rule-catalog") return true diff --git a/packages/opencode/src/cli/cmd/review.ts b/packages/opencode/src/cli/cmd/review.ts index 7b594025c..ce766cd2e 100644 --- a/packages/opencode/src/cli/cmd/review.ts +++ b/packages/opencode/src/cli/cmd/review.ts @@ -58,6 +58,9 @@ export const ReviewCommand = cmd({ manifestPath: args.manifest as string | undefined, mode: args.mode as ReviewMode | undefined, severityThreshold: args.severity as Severity | undefined, + // The flag is registered as `--no-ai`, so yargs sets `args.noAi`. We also + // accept `args.ai === false` to cover yargs' boolean auto-negation + // (`--ai=false`) and programmatic callers that pass `ai: false`. noAi: args.noAi === true || args.ai === false, }) diff --git a/packages/opencode/test/altimate/review.test.ts b/packages/opencode/test/altimate/review.test.ts index 88363b6a7..251c4f469 100644 --- a/packages/opencode/test/altimate/review.test.ts +++ b/packages/opencode/test/altimate/review.test.ts @@ -1545,6 +1545,143 @@ describe("orchestrate", () => { expect(env.findings.some((f) => f.category === "pii_exposure")).toBe(false) }) + test("diff-scoped PII recall: a column core's classifier MISSED still surfaces via the broad fallback", async () => { + // Regression guard: when core `classify_pii` returns NOTHING for an introduced + // column (it missed it), that column must remain eligible for the broad + // `schema.detect_pii` detector — it must not be silently dropped just because + // the diff-scoped lane ran for the file. + const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+ ssn\n" }] + const runner: ReviewRunner = { + ...fakeRunner({}), + async detectPii() { + return { columns: ["ssn"] } + }, + async columnLineage(sql) { + return sql.includes("ssn") + ? [ + { source: "customers.customer_id", target: "customer_id" }, + { source: "customers.ssn", target: "ssn" }, + ] + : [{ source: "customers.customer_id", target: "customer_id" }] + }, + async classifyPii() { + return [] + }, + } + const env = await runReview({ + changedFiles: files, + config: { ...DEFAULT_REVIEW_CONFIG, reviewers: ["pii_exposure"] }, + rubric: DEFAULT_RUBRIC, + mode: "gate", + runner, + getContent: content("select customer_id, ssn from customers", "select customer_id from customers"), + }) + const pii = env.findings.filter((f) => f.category === "pii_exposure") + expect(pii).toHaveLength(1) + expect(pii[0].evidence?.tool).toBe("schema.detect_pii") + expect(env.verdict).toBe("REQUEST_CHANGES") + }) + + test("core-missed high-risk PII still caught by the regex twin when the broad PII lane is off", async () => { + // Lower tiers (trivial/lite) run `dbt_patterns` but not `pii_exposure`. If core + // classification ran but missed a high-risk column, the regex twin is the only + // safety net — it must NOT be suppressed for a column core never classified. + const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+ ssn\n" }] + const runner: ReviewRunner = { + ...fakeRunner({}), + async detectPii() { + return { columns: ["ssn"] } + }, + async columnLineage(sql) { + return sql.includes("ssn") + ? [ + { source: "customers.customer_id", target: "customer_id" }, + { source: "customers.ssn", target: "ssn" }, + ] + : [{ source: "customers.customer_id", target: "customer_id" }] + }, + async classifyPii() { + return [] + }, + } + const env = await runReview({ + changedFiles: files, + config: { ...DEFAULT_REVIEW_CONFIG, reviewers: ["dbt_patterns"] }, + rubric: DEFAULT_RUBRIC, + mode: "gate", + runner, + getContent: content("select customer_id, ssn from customers", "select customer_id from customers"), + }) + const pii = env.findings.filter((f) => f.category === "pii_exposure") + expect( + pii.some((f) => f.evidence?.tool === "dbt-patterns" && (f.evidence?.result as any)?.rule === "pii_into_mart"), + ).toBe(true) + expect(env.verdict).toBe("REQUEST_CHANGES") + }) + + test("structural error severity blocks only for join_risk; other rules stay advisory", async () => { + // SC003 group_by_change is `semantic_change`, not `join_risk`. Even at core + // `severity: "error"` it must NOT be escalated to critical — only join-key + // regressions are allowed to hard-block. + const oldSql = "select g, sum(x) as x from t group by g" + const newSql = "select g, h, sum(x) as x from t group by g, h" + const runner: ReviewRunner = { + ...fakeRunner({}), + async structuralDiff() { + return [{ code: "SC003", rule: "group_by_change", severity: "error", message: "grain changed" }] + }, + } + const env = await runReview({ + changedFiles: [{ path: "models/marts/fct_orders.sql", status: "modified", diff: "+group by g, h\n" }], + config: { ...DEFAULT_REVIEW_CONFIG, reviewers: ["semantic_change"], ai: false }, + rubric: DEFAULT_RUBRIC, + mode: "gate", + runner, + getContent: async (_f, side) => (side === "new" ? newSql : oldSql), + getCompiled: async (_f, side) => (side === "new" ? newSql : oldSql), + generatedAt: "2026-06-08T00:00:00Z", + }) + const f = env.findings.find( + (x) => x.evidence?.tool === "altimate_core.structural_diff" && (x.evidence?.result as any)?.code === "SC003", + ) + expect(f).toBeDefined() + expect(f!.severity).toBe("warning") + expect(f!.category).toBe("semantic_change") + }) + + test("join_key_regression below error severity stays advisory (warning, not critical)", async () => { + const oldSql = "select * from orders left join customers on orders.customer_id = customers.customer_id" + const newSql = "select * from orders left join customers on orders.order_id = customers.customer_id" + const runner: ReviewRunner = { + ...fakeRunner({}), + async structuralDiff() { + return [{ code: "SC010", rule: "join_key_regression", severity: "warning", message: "join key changed" }] + }, + } + const env = await runReview({ + changedFiles: [ + { + path: "models/marts/fct_customer_orders.sql", + status: "modified", + diff: "+on orders.order_id = customers.customer_id\n", + }, + ], + config: { ...DEFAULT_REVIEW_CONFIG, reviewers: ["semantic_change"], ai: false }, + rubric: DEFAULT_RUBRIC, + mode: "gate", + runner, + getContent: async (_f, side) => (side === "new" ? newSql : oldSql), + getCompiled: async (_f, side) => (side === "new" ? newSql : oldSql), + generatedAt: "2026-06-08T00:00:00Z", + }) + const f = env.findings.find( + (x) => x.evidence?.tool === "altimate_core.structural_diff" && (x.evidence?.result as any)?.code === "SC010", + ) + expect(f).toBeDefined() + expect(f!.severity).toBe("warning") + expect(f!.category).toBe("join_risk") + }) + test("clean change with no manifest → degraded, APPROVE/COMMENT, lint-only labeled", async () => { const files: ChangedFile[] = [{ path: "models/staging/stg_x.sql", status: "modified", diff: "+select 1\n" }] const runner: ReviewRunner = { From 842ebf12cc90fe6e288ea01850744823b175807f Mon Sep 17 00:00:00 2001 From: anandgupta42 Date: Wed, 10 Jun 2026 00:39:37 -0700 Subject: [PATCH 10/11] fix: address PR #919 review comments on dbt PR reviewer MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Resolve the three unresolved bot-review threads on PR #919: - `orchestrate.ts` (`piiClassifyLane`): when `columnLineage` resolves but derives zero output targets, return `empty` (`completed: false`) instead of `completed: true`. Zero head targets is lineage *uncertainty* (the classifier considered nothing), not a confident "no new PII", so the file must not be marked PII-complete — keeping the deterministic regex fallback alive. Distinct from the `!newCols.length` case, where lineage resolved columns but none are newly introduced (genuinely done). Defensive/intent fix: behavior-neutral for the current detectors (the tokenless suppression path is unreachable because `pii_into_mart` always carries a recoverable column token), but it makes the `completed` contract honest and removes a latent footgun. - `review.ts`: correct the misleading `noAi` comment. Only `--no-ai` is registered, so `--ai=false` is not a supported CLI flag; `args.ai === false` only covers programmatic callers. - `driver-security.test.ts`: assert the DuckDB lock retry actually requests `READ_ONLY` (record `access_mode` per attempt), not just that two attempts occurred. Tests: add two `review.test.ts` regressions — a zero-target-lineage guard and a token-gated suppression guard (core muted a low-confidence column → the coarse regex twin is suppressed). Review suite 82 pass, driver suite 26 pass. Reviewed via multi-model consensus (Claude + 7 models); the lone objection was test completeness, now incorporated. Co-Authored-By: Claude Opus 4.8 (1M context) --- packages/drivers/test/driver-security.test.ts | 6 ++ .../src/altimate/review/orchestrate.ts | 7 +- packages/opencode/src/cli/cmd/review.ts | 7 +- .../opencode/test/altimate/review.test.ts | 78 +++++++++++++++++++ 4 files changed, 94 insertions(+), 4 deletions(-) diff --git a/packages/drivers/test/driver-security.test.ts b/packages/drivers/test/driver-security.test.ts index 1b48965b4..6a0f8d8c5 100644 --- a/packages/drivers/test/driver-security.test.ts +++ b/packages/drivers/test/driver-security.test.ts @@ -136,12 +136,14 @@ describe("DuckDB driver", () => { describe("connect retry with READ_ONLY", () => { test("retries with READ_ONLY when file DB is locked on initial connect", async () => { let connectAttempts = 0 + const accessModes: Array = [] mock.module("duckdb", () => ({ default: { Database: class { constructor(_path: string, optsOrCb: any, cb?: (err: Error | null) => void) { const opts = typeof optsOrCb === "function" ? undefined : optsOrCb const done = openCallback(optsOrCb, cb) + accessModes.push(opts?.access_mode) connectAttempts++ if (connectAttempts === 1 && !opts?.access_mode) { // First attempt fails with lock error @@ -169,6 +171,10 @@ describe("DuckDB driver", () => { const connector = await connect({ type: "duckdb", path: "/tmp/test.duckdb" }) await connector.connect() expect(connectAttempts).toBe(2) // First failed, second succeeded in READ_ONLY + // The retry must specifically request READ_ONLY — two attempts alone don't + // prove the lock was worked around correctly. + expect(accessModes[0]).toBeUndefined() + expect(accessModes[1]).toBe("READ_ONLY") await connector.close() }) diff --git a/packages/opencode/src/altimate/review/orchestrate.ts b/packages/opencode/src/altimate/review/orchestrate.ts index ffa0c5615..68cefafde 100644 --- a/packages/opencode/src/altimate/review/orchestrate.ts +++ b/packages/opencode/src/altimate/review/orchestrate.ts @@ -950,7 +950,12 @@ async function piiClassifyLane( let newCols: string[] try { const headCols = [...new Set((await runner.columnLineage(engineNewSql, dialect)).map((e) => e.target).filter(Boolean))] - if (!headCols.length) return { ...empty, completed: true } + // Zero head targets means lineage could NOT resolve this model's output + // columns — the classifier considered nothing, so this is uncertainty, not a + // confident "no new PII". Returning `completed: false` keeps the deterministic + // regex fallback alive (vs. the `!newCols.length` case below, where lineage + // DID resolve columns but none are newly introduced — that is genuinely done). + if (!headCols.length) return empty const baseCols = ctx.engineOldSql ? new Set((await runner.columnLineage(ctx.engineOldSql, dialect)).map((e) => e.target.toLowerCase())) : undefined diff --git a/packages/opencode/src/cli/cmd/review.ts b/packages/opencode/src/cli/cmd/review.ts index ce766cd2e..c271dbe5f 100644 --- a/packages/opencode/src/cli/cmd/review.ts +++ b/packages/opencode/src/cli/cmd/review.ts @@ -58,9 +58,10 @@ export const ReviewCommand = cmd({ manifestPath: args.manifest as string | undefined, mode: args.mode as ReviewMode | undefined, severityThreshold: args.severity as Severity | undefined, - // The flag is registered as `--no-ai`, so yargs sets `args.noAi`. We also - // accept `args.ai === false` to cover yargs' boolean auto-negation - // (`--ai=false`) and programmatic callers that pass `ai: false`. + // The flag is registered as `--no-ai`, so yargs sets `args.noAi`. No `ai` + // option is declared, so `--ai=false` is NOT a supported CLI flag; the + // `args.ai === false` check only covers programmatic callers that pass + // `ai: false` directly. noAi: args.noAi === true || args.ai === false, }) diff --git a/packages/opencode/test/altimate/review.test.ts b/packages/opencode/test/altimate/review.test.ts index 251c4f469..0e94f0ff4 100644 --- a/packages/opencode/test/altimate/review.test.ts +++ b/packages/opencode/test/altimate/review.test.ts @@ -1619,6 +1619,84 @@ describe("orchestrate", () => { expect(env.verdict).toBe("REQUEST_CHANGES") }) + test("zero-target lineage does not mark the file PII-complete, so the regex twin survives", async () => { + // Regression guard: when `columnLineage` RESOLVES but yields no output targets + // (the parser couldn't derive columns — uncertainty, not "no new PII"), the + // diff-scoped lane must not flag the file as completed. Otherwise the file + // enters the PII-completed/classified sets and a deterministic regex twin could + // be suppressed for a column core never actually classified. + const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+ ssn\n" }] + const runner: ReviewRunner = { + ...fakeRunner({}), + async detectPii() { + return { columns: ["ssn"] } + }, + async columnLineage() { + // Lineage resolves but derives zero output columns for every input. + return [] + }, + async classifyPii() { + return [] + }, + } + const env = await runReview({ + changedFiles: files, + config: { ...DEFAULT_REVIEW_CONFIG, reviewers: ["dbt_patterns"] }, + rubric: DEFAULT_RUBRIC, + mode: "gate", + runner, + getContent: content("select customer_id, ssn from customers", "select customer_id from customers"), + }) + const pii = env.findings.filter((f) => f.category === "pii_exposure") + expect( + pii.some((f) => f.evidence?.tool === "dbt-patterns" && (f.evidence?.result as any)?.rule === "pii_into_mart"), + ).toBe(true) + expect(env.verdict).toBe("REQUEST_CHANGES") + }) + + test("core considered the matched column (muted as low-confidence) → the regex twin is suppressed", async () => { + // Inverse of the recall guards above: suppression must be COLUMN-aware, not + // merely "the classifier ran". Here core's `classify_pii` DID consider `ssn` + // but muted it (confidence below threshold → no diff-scoped finding). Because + // the column was considered, the coarse `pii_into_mart` regex twin for the + // same column is redundant and must be suppressed — otherwise the same column + // is double-reported. This exercises the token-gated suppression branch + // (`classifiedPiiColumnsByFile.has(token)`), distinct from the "never + // classified → twin survives" path. + const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+ ssn\n" }] + const runner: ReviewRunner = { + ...fakeRunner({}), + async detectPii() { + return { columns: ["ssn"] } + }, + async columnLineage(sql) { + return sql.includes("ssn") + ? [ + { source: "customers.customer_id", target: "customer_id" }, + { source: "customers.ssn", target: "ssn" }, + ] + : [{ source: "customers.customer_id", target: "customer_id" }] + }, + async classifyPii() { + // Considered `ssn` but is not confident → muted, no diff-scoped finding, + // yet `ssn` still lands in the classified-columns set. + return [{ column: "ssn", classification: "SSN", confidence: 0.1 }] + }, + } + const env = await runReview({ + changedFiles: files, + config: { ...DEFAULT_REVIEW_CONFIG, reviewers: ["dbt_patterns"] }, + rubric: DEFAULT_RUBRIC, + mode: "gate", + runner, + getContent: content("select customer_id, ssn from customers", "select customer_id from customers"), + }) + const piiIntoMart = env.findings.filter( + (f) => f.evidence?.tool === "dbt-patterns" && (f.evidence?.result as any)?.rule === "pii_into_mart", + ) + expect(piiIntoMart).toHaveLength(0) + }) + test("structural error severity blocks only for join_risk; other rules stay advisory", async () => { // SC003 group_by_change is `semantic_change`, not `join_risk`. Even at core // `severity: "error"` it must NOT be escalated to critical — only join-key From 89f77a75cacad2e82ea76729bf6d7a8349ac3a3f Mon Sep 17 00:00:00 2001 From: anandgupta42 Date: Wed, 10 Jun 2026 00:53:57 -0700 Subject: [PATCH 11/11] fix: drop unsupported dialect arg from checkEquivalence call MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `core.checkEquivalence` (altimate-core@0.4.0) is typed `(sqlA, sqlB, schema)` — 3 params, no dialect (unlike `columnLineage`/`compareQueries`). A 4th `dialect` arg was added to this call earlier on the branch (b5060e0ce6), which the napi binding silently ignores at runtime but which fails `tsgo` typecheck (TS2554) and blocks the pre-push hook. Align with the binding contract and the sibling call site in `native/sql/register.ts`. Behavior-neutral. Co-Authored-By: Claude Opus 4.8 (1M context) --- packages/opencode/src/altimate/native/altimate-core.ts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/packages/opencode/src/altimate/native/altimate-core.ts b/packages/opencode/src/altimate/native/altimate-core.ts index 2d58bad38..a41296ee7 100644 --- a/packages/opencode/src/altimate/native/altimate-core.ts +++ b/packages/opencode/src/altimate/native/altimate-core.ts @@ -320,7 +320,7 @@ export function registerAll(): void { register("altimate_core.equivalence", async (params) => { try { const schema = schemaOrEmpty(params.schema_path, params.schema_context) - const raw = await core.checkEquivalence(params.sql1, params.sql2, schema, params.dialect) + const raw = await core.checkEquivalence(params.sql1, params.sql2, schema) const data = toData(raw) return ok(true, data) } catch (e) {