From 80b8683836065d1551611f6dc6b67d56403ea483 Mon Sep 17 00:00:00 2001
From: anandgupta42 <anand@altimate.ai>
Date: Mon, 8 Jun 2026 12:01:09 -0700
Subject: [PATCH 01/11] Document dbt review improvement loop

---
 ...-08-dbt-pr-review-self-improvement-loop.md | 263 ++++++++++++++++++
 .../src/altimate/review/orchestrate.ts        |   3 +-
 .../opencode/test/altimate/review.test.ts     | 124 +++++++++
 3 files changed, 389 insertions(+), 1 deletion(-)
 create mode 100644 docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md

diff --git a/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md
new file mode 100644
index 000000000..bddce7b35
--- /dev/null
+++ b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md
@@ -0,0 +1,263 @@
+# dbt PR Review Self-Improvement Loop
+
+Date: 2026-06-08
+
+## Goal
+
+Make `altimate review` the highest-signal dbt PR reviewer by adding only checks
+that survive an end-to-end value test:
+
+1. The finding is backed by deterministic evidence, warehouse execution, or a
+   clearly bounded advisory rule.
+2. It catches a real dbt PR risk in the public demo or sourced benchmark corpus.
+3. It does not add false positives to corrected/safe cases.
+4. It produces a review comment that a data engineer would act on.
+5. It degrades loudly when required artifacts, schema, or warehouse access are
+   missing.
+
+## Recommended `/goal` Command
+
+```text
+/goal Objective: make the dbt PR reviewer demo-parity reliable and measurably
+higher-signal by closing the current public-demo gaps while preserving the
+existing benchmark floor.
+
+Quantified acceptance target:
+- Public demo matrix reaches expected verdicts for all 6 demo branches:
+  safe-refactor=APPROVE, join-key-breakage=REQUEST_CHANGES,
+  test-removal=COMMENT, new-pii-exposure=REQUEST_CHANGES,
+  mart-select-star=COMMENT, incremental-without-guard=COMMENT.
+- Real-world corpus remains at 15/15 caught bad cases and 0/5 false positives
+  on corrected cases.
+- At least one warehouse-backed DuckDB e2e proves a real dbt build/test pass
+  that the reviewer catches pre-merge.
+- Every blocking finding is produced by deterministic core evidence or
+  warehouse-observed impact, not advisory AI or heuristic equivalence.
+
+Operating rules:
+- Always start by pulling latest main of
+  /Users/anandgupta/codebase/altimate-core-internal with fast-forward only.
+- Prefer implementing deterministic SQL/dbt analysis in altimate-core, not
+  altimate-code. Avoid regex/custom parsing in altimate-code when core
+  AST/parser support can own it.
+- altimate-code should mostly orchestrate, map core findings to review
+  categories/severity, format verdicts, and run GitHub/CI integration.
+- For every candidate improvement:
+  1. Write a short spec: risk, deterministic evidence, what must not be flagged.
+  2. Add bad and safe/corrected fixtures.
+  3. Implement in altimate-core when possible.
+  4. Build local core Node package and link altimate-code to it only when e2e
+     needs the local native package.
+  5. Validate with the narrowest useful tests first, then corpus/demo/warehouse
+     only at acceptance checkpoints.
+  6. Keep only if it improves real review value with no new false positives;
+     otherwise revert/reject or make advisory-only.
+- Use /Users/anandgupta/codebase/altimate-code/demo/dbt-pr-review-demo for demo
+  branch e2e.
+- Compile base dbt artifacts into target-base and head artifacts into target
+  before review when structural/equivalence lanes need base-vs-head compiled SQL.
+- Use local DuckDB first; use BigQuery where warehouse behavior matters and
+  credentials are available. Trino/Postgres/ClickHouse/DuckDB can be spawned
+  locally. Snowflake later.
+- Treat altimate-core equivalence as heuristic unless there is a sound proof
+  path. Do not block or approve solely from heuristic equivalence. Block only on
+  reproducible deterministic facts or warehouse-observed impact.
+- Track results in
+  docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md.
+
+Iteration speed:
+- Optimize for fast learning loops. Do not run full test suites or expensive
+  builds by default.
+- Start with the narrowest relevant unit test or direct function smoke test.
+- Only run broader tests after a candidate passes the focused check.
+- Avoid rebuilding altimate-core unless the change affects native exports, NAPI
+  bindings, or e2e verification requires the local Node package.
+- Prefer `cargo test -p altimate-core <module_or_test_name>` over full cargo
+  test.
+- Prefer `bun test test/altimate/<specific-test-file>.ts` over full package
+  tests.
+- Use direct Node smoke tests against `crates/altimate-core-node` before
+  relinking altimate-code.
+- Run the real-world corpus and demo matrix only at acceptance checkpoints, not
+  after every edit.
+- Run warehouse e2e only when the value claim depends on observed data
+  behavior.
+- Keep a short validation ladder for each candidate: smoke -> focused unit ->
+  focused integration -> corpus/demo -> warehouse e2e if needed.
+
+Initial backlog:
+1. Remove safe-refactor noise: base compiled artifact support and/or equivalence
+   degradation handling.
+2. Tighten PII precision: do not flag low-confidence Name such as
+   customer_name, but still flag email/SSN/phone/payment identifiers.
+3. Reduce low-value missing_table_alias comments for dbt CTE-heavy SQL.
+4. Expand core structural diff rules for high-signal dbt semantic regressions.
+5. Add warehouse data-diff e2e cases that prove real row/value impact.
+```
+
+## Skill vs Goal
+
+Create a skill once this loop stabilizes across several iterations. A skill is
+the right home for reusable operating instructions: how to pull core, choose
+core-vs-code ownership, build local NAPI, run the demo matrix, and apply the
+validation ladder. The active `/goal` should still include the quantified
+objective above because it provides the stop condition and prevents open-ended
+"make it better" work.
+
+## Baseline From Dry Run
+
+Commands run from `packages/opencode` unless noted:
+
+```bash
+bun run --conditions=browser script/review-realworld-eval.ts
+```
+
+Result:
+
+- Sourced bad-case catch rate: 15/15.
+- Corrected-case false positives: 0/5.
+
+Demo matrix run against `demo/dbt-pr-review-demo` branches with the local dbt
+virtualenv and `altimate review --mode gate --json`:
+
+| Branch | Expected | Actual | Finding categories |
+|---|---:|---:|---|
+| `demo/safe-refactor` | `APPROVE` | `COMMENT` | `semantic_change`, `pii_exposure`, `sql_quality` |
+| `demo/join-key-breakage` | `REQUEST_CHANGES` | `COMMENT` | `semantic_change`, `pii_exposure`, `sql_correctness`, `sql_quality` |
+| `demo/test-removal` | `COMMENT` | `COMMENT` | `test_coverage` |
+| `demo/new-pii-exposure` | `REQUEST_CHANGES` | `REQUEST_CHANGES` | `pii_exposure`, `semantic_change`, `sql_quality` |
+| `demo/mart-select-star` | `COMMENT` | `COMMENT` | `semantic_change`, `pii_exposure`, `sql_quality`, `test_coverage` |
+| `demo/incremental-without-guard` | `COMMENT` | `COMMENT` | `semantic_change`, `pii_exposure`, `join_risk`, `warehouse_cost`, `sql_quality`, `materialization`, `sql_correctness`, `test_coverage` |
+
+Observed problems:
+
+- `safe-refactor` is noisy. The semantic lane cannot decide equivalence because
+  base compiled SQL is unavailable, `Name` classification flags `customer_name`
+  as PII, and `missing_table_alias` flags harmless CTE aliasing.
+- `join-key-breakage` is not blocked. The useful join-key mismatch appears from
+  the advisory AI lane, not the deterministic lane, so it cannot gate.
+- Several demo branches are degraded because base-vs-head compiled artifacts are
+  incomplete. A safe review must not pretend heuristic equivalence is a proof.
+
+## Loop Contract
+
+For each candidate improvement:
+
+1. **Spec**: one paragraph stating the risk, the deterministic evidence, and
+   what must not be flagged.
+2. **Fixtures**: at least one bad case and one corrected/safe countercase.
+3. **Implementation**: smallest scoped change in the reviewer or demo workflow.
+4. **Validation**:
+   - Unit or harness test for the detector.
+   - `bun run --conditions=browser script/review-realworld-eval.ts`.
+   - Demo branch matrix for affected branches.
+   - Warehouse-backed run when the value claim depends on actual data impact.
+5. **Decision**:
+   - Keep only if it improves catch rate or demo parity with zero new false
+     positives.
+   - Reject or keep advisory-only if evidence is heuristic, schema-dependent,
+     or too broad.
+
+## First Candidate Backlog
+
+### 1. Deterministic Join-Key Regression Detector
+
+Status: pushed down to `altimate-core-internal` as structural diff rule
+`SC010 join_key_regression` in
+`crates/altimate-core/src/review/structural_diff.rs`. `altimate-code` only maps
+the core finding into the review verdict.
+
+Risk: a PR changes a join predicate from matching the same business key to
+joining unrelated identifiers, e.g. `orders.customer_id = customers.customer_id`
+becomes `orders.order_id = customers.customer_id`.
+
+Evidence:
+
+- Base and head SQL are both available.
+- A changed `JOIN ... ON` equality compares identifier columns ending in `_id`.
+- The base predicate joined same-name/same-stem keys, while the head predicate
+  joins different stems.
+
+Do not flag:
+
+- Intentional bridge joins such as `orders.order_id = order_items.order_id`.
+- Joins where the changed key stems still match.
+- Cases without a base predicate to compare.
+
+Expected result:
+
+- `demo/join-key-breakage` gets a deterministic `join_risk` or
+  `sql_correctness` critical finding and blocks in gate mode.
+- `demo/safe-refactor` remains quiet for this detector.
+
+Validation:
+
+- `git pull --ff-only origin main` completed in
+  `/Users/anandgupta/codebase/altimate-core-internal`; latest core main was
+  `a413804` before this local rule change.
+- `cargo test -p altimate-core review::structural_diff --lib` passed with the
+  new `SC010` bad case and safe countercases.
+- `bun test --timeout 30000 test/altimate/review.test.ts` passed.
+- `bun run --conditions=browser script/review-realworld-eval.ts` stayed at
+  15/15 sourced catches and 0/5 corrected false positives.
+- Demo matrix: `demo/join-key-breakage` improved from `COMMENT` to
+  `REQUEST_CHANGES` with deterministic `join_risk`.
+- The detector itself stayed quiet on `demo/safe-refactor` and on the
+  same-key bridge-join countercase.
+
+### 2. Base Compiled Artifact Support In Demo CI
+
+Risk: safe refactors degrade to "could not prove equivalent" because the review
+has only head-side compiled SQL.
+
+Evidence:
+
+- `compiled.ts` already supports `target-base/compiled/<project>/<path>`.
+- The demo workflow currently runs only head-side `dbt compile`.
+
+Do not flag:
+
+- A safe CTE rename where base and head compiled SQL are equivalent or where no
+  deterministic structural difference exists.
+
+Expected result:
+
+- The demo workflow compiles base into `target-base` before compiling head.
+- `demo/safe-refactor` no longer emits an unknown semantic-change warning due
+  solely to missing base compiled SQL.
+
+### 3. PII Classification Precision Floor
+
+Risk: high-noise PII comments on ordinary names such as `customer_name` reduce
+trust and make safe PRs look risky.
+
+Evidence:
+
+- `classify_pii` labels `customer_name` as `Name` at 75% confidence in the demo.
+- The stronger demo/security value is email/SSN/phone/payment identifiers and
+  source-propagated sensitive columns.
+
+Do not flag:
+
+- `Name` classification below a high confidence threshold.
+- Pre-existing PII columns not introduced by the PR.
+
+Expected result:
+
+- `demo/safe-refactor` no longer reports `customer_name`.
+- `demo/new-pii-exposure` still reports `email`.
+
+## Warehouse E2E Policy
+
+Use local DuckDB first for fast end-to-end checks, then add BigQuery runs when a
+candidate needs production-style warehouse behavior. Trino, Postgres,
+ClickHouse, and DuckDB can be spawned locally. Snowflake is deferred until a
+separate account is available.
+
+Warehouse-backed findings must include:
+
+- The connection used.
+- Base/head SQL or model relation names.
+- Key columns.
+- Row/value delta summary.
+- A skip/degraded state when credentials or drivers are unavailable.
diff --git a/packages/opencode/src/altimate/review/orchestrate.ts b/packages/opencode/src/altimate/review/orchestrate.ts
index a72778a64..ef546845d 100644
--- a/packages/opencode/src/altimate/review/orchestrate.ts
+++ b/packages/opencode/src/altimate/review/orchestrate.ts
@@ -813,6 +813,7 @@ const STRUCTURAL_CATEGORY: Record<string, ReviewCategory> = {
   coalesce_removed: "semantic_change",
   removed_predicate: "semantic_change",
   type_narrowing: "contract_violation",
+  join_key_regression: "join_risk",
 }
 
 async function structuralChangeLane(ctx: ModelContext, runner: ReviewRunner): Promise<Finding[]> {
@@ -828,7 +829,7 @@ async function structuralChangeLane(ctx: ModelContext, runner: ReviewRunner): Pr
   const out: Finding[] = []
   for (const f of raw) {
     const cat: ReviewCategory = STRUCTURAL_CATEGORY[f.rule] ?? "semantic_change"
-    const sev = clampSeverity(cat, "warning", "high")
+    const sev = clampSeverity(cat, f.severity === "error" ? "critical" : "warning", "high")
     out.push(
       makeFinding({
         severity: sev,
diff --git a/packages/opencode/test/altimate/review.test.ts b/packages/opencode/test/altimate/review.test.ts
index 0faf6a191..881821d06 100644
--- a/packages/opencode/test/altimate/review.test.ts
+++ b/packages/opencode/test/altimate/review.test.ts
@@ -569,6 +569,130 @@ describe("orchestrate", () => {
     ).toBe(true)
   })
 
+  test("core SC010 join-key regression maps to critical join_risk and blocks", async () => {
+    const oldSql = `
+      select *
+      from orders
+      left join customers
+        on orders.customer_id = customers.customer_id
+    `
+    const newSql = `
+      select *
+      from orders
+      left join customers
+        on orders.order_id = customers.customer_id
+    `
+    let sawBase = ""
+    let sawHead = ""
+    const runner: ReviewRunner = {
+      ...fakeRunner({}),
+      async structuralDiff(baseSql, headSql) {
+        sawBase = baseSql
+        sawHead = headSql
+        return [
+          {
+            code: "SC010",
+            rule: "join_key_regression",
+            severity: "error",
+            message:
+              "A join key changed from matching the same identifier stem to `orders.order_id = customers.customer_id`.",
+          },
+        ]
+      },
+    }
+    const env = await runReview({
+      changedFiles: [
+        {
+          path: "models/marts/fct_customer_orders.sql",
+          status: "modified",
+          diff: "+on orders.order_id = customers.customer_id\n-on orders.customer_id = customers.customer_id\n",
+        },
+      ],
+      config: { ...DEFAULT_REVIEW_CONFIG, reviewers: ["semantic_change"], ai: false },
+      rubric: DEFAULT_RUBRIC,
+      mode: "gate",
+      runner,
+      getContent: async (_f, side) => (side === "new" ? newSql : oldSql),
+      getCompiled: async (_f, side) => (side === "new" ? newSql : oldSql),
+      generatedAt: "2026-06-08T00:00:00Z",
+    })
+    expect(sawBase).toBe(oldSql)
+    expect(sawHead).toBe(newSql)
+    const f = env.findings.find(
+      (x) => x.evidence?.tool === "altimate_core.structural_diff" && (x.evidence?.result as any)?.code === "SC010",
+    )
+    expect(f).toBeDefined()
+    expect(f!.severity).toBe("critical")
+    expect(f!.category).toBe("join_risk")
+    expect(env.verdict).toBe("REQUEST_CHANGES")
+  })
+
+  test("core structural diff quiet on CTE rename and bridge join countercases", async () => {
+    const runner: ReviewRunner = {
+      ...fakeRunner({}),
+      async structuralDiff() {
+        return []
+      },
+    }
+    const safeEnv = await runReview({
+      changedFiles: [
+        {
+          path: "models/marts/fct_customer_orders.sql",
+          status: "modified",
+          diff:
+            "+on order_records.customer_id = customer_records.customer_id\n" +
+            "-on orders.customer_id = customers.customer_id\n",
+        },
+      ],
+      config: { ...DEFAULT_REVIEW_CONFIG, reviewers: ["semantic_change"], ai: false },
+      rubric: DEFAULT_RUBRIC,
+      mode: "gate",
+      runner,
+      getContent: async (_f, side) =>
+        side === "new"
+          ? "select * from order_records left join customer_records on order_records.customer_id = customer_records.customer_id"
+          : "select * from orders left join customers on orders.customer_id = customers.customer_id",
+      getCompiled: async (_f, side) =>
+        side === "new"
+          ? "select * from order_records left join customer_records on order_records.customer_id = customer_records.customer_id"
+          : "select * from orders left join customers on orders.customer_id = customers.customer_id",
+      generatedAt: "2026-06-08T00:00:00Z",
+    })
+    expect(
+      safeEnv.findings.some(
+        (x) => x.evidence?.tool === "altimate_core.structural_diff" && (x.evidence?.result as any)?.code === "SC010",
+      ),
+    ).toBe(false)
+
+    const bridgeEnv = await runReview({
+      changedFiles: [
+        {
+          path: "models/marts/fct_order_items.sql",
+          status: "modified",
+          diff: "+on orders.order_id = order_items.order_id\n-on orders.customer_id = customers.customer_id\n",
+        },
+      ],
+      config: { ...DEFAULT_REVIEW_CONFIG, reviewers: ["semantic_change"], ai: false },
+      rubric: DEFAULT_RUBRIC,
+      mode: "gate",
+      runner,
+      getContent: async (_f, side) =>
+        side === "new"
+          ? "select * from orders left join order_items on orders.order_id = order_items.order_id"
+          : "select * from orders left join customers on orders.customer_id = customers.customer_id",
+      getCompiled: async (_f, side) =>
+        side === "new"
+          ? "select * from orders left join order_items on orders.order_id = order_items.order_id"
+          : "select * from orders left join customers on orders.customer_id = customers.customer_id",
+      generatedAt: "2026-06-08T00:00:00Z",
+    })
+    expect(
+      bridgeEnv.findings.some(
+        (x) => x.evidence?.tool === "altimate_core.structural_diff" && (x.evidence?.result as any)?.code === "SC010",
+      ),
+    ).toBe(false)
+  })
+
   test("core L033 portability → regex portability twin suppressed, idempotency concern survives", async () => {
     const sql = "select getdate() as loaded_at from {{ ref('a') }}"
     const files: ChangedFile[] = [{ path: "models/marts/m.sql", status: "modified", diff: "+    , getdate() as loaded_at\n" }]

From b5060e0ce6aaf90e568823bf7f199193aab982a8 Mon Sep 17 00:00:00 2001
From: anandgupta42 <anand@altimate.ai>
Date: Mon, 8 Jun 2026 12:20:38 -0700
Subject: [PATCH 02/11] Reduce safe refactor review noise

---
 docs/docs/usage/dbt-pr-review.md              |  7 ++-
 ...-08-dbt-pr-review-self-improvement-loop.md | 26 +++++++--
 .../src/altimate/native/altimate-core.ts      |  2 +-
 .../opencode/src/altimate/native/types.ts     |  1 +
 .../opencode/src/altimate/review/runner.ts    |  3 +-
 packages/opencode/src/cli/cmd/review.ts       |  2 +-
 .../test/altimate/review-runner.test.ts       | 53 ++++++++++++++++++-
 7 files changed, 85 insertions(+), 9 deletions(-)

diff --git a/docs/docs/usage/dbt-pr-review.md b/docs/docs/usage/dbt-pr-review.md
index e4b63774b..dbf97d4d3 100644
--- a/docs/docs/usage/dbt-pr-review.md
+++ b/docs/docs/usage/dbt-pr-review.md
@@ -332,8 +332,11 @@ reviewer does **not** re-implement Jinja — it consumes dbt's own compiled outp
    from `target/compiled/<project>/…` (HEAD) and `target-base/compiled/…` (BASE),
    produced by `dbt compile`. To enable full equivalence verdicts, compile both
    the base and head refs in CI (the base into `target-base/`, the Recce
-   convention). Without compiled SQL the engine lanes fall back to raw and stay
-   *undecidable* (never fabricated) — the `dbt-patterns` lane still runs.
+   convention). For proof-grade equivalence, also produce `target/catalog.json`
+   with `dbt docs generate`; the catalog carries complete warehouse columns that
+   `manifest.json` often lacks. Without compiled SQL or complete columns the
+   engine lanes stay *undecidable* (never fabricated) — the `dbt-patterns` lane
+   still runs.
 
 ## What it checks — deterministic rule catalog
 
diff --git a/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md
index bddce7b35..58c74bfe4 100644
--- a/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md
+++ b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md
@@ -208,12 +208,16 @@ Validation:
 ### 2. Base Compiled Artifact Support In Demo CI
 
 Risk: safe refactors degrade to "could not prove equivalent" because the review
-has only head-side compiled SQL.
+has incomplete proof inputs: missing base compiled SQL or incomplete schema
+columns.
 
 Evidence:
 
 - `compiled.ts` already supports `target-base/compiled/<project>/<path>`.
 - The demo workflow currently runs only head-side `dbt compile`.
+- `demo/safe-refactor` became decidable after `dbt docs generate` produced
+  `target/catalog.json`, but core equivalence then falsely compared CTE alias
+  names inside the join filter.
 
 Do not flag:
 
@@ -223,8 +227,24 @@ Do not flag:
 Expected result:
 
 - The demo workflow compiles base into `target-base` before compiling head.
-- `demo/safe-refactor` no longer emits an unknown semantic-change warning due
-  solely to missing base compiled SQL.
+- The demo workflow produces `target/catalog.json` when a warehouse-backed dbt
+  project is available, so equivalence has complete columns.
+- `demo/safe-refactor` emits no findings with local core and DuckDB catalog
+  artifacts.
+
+Implemented result:
+
+- Core `L012 missing_table_alias` now ignores CTE references, removing the
+  `order_records/customer_records` alias noise while preserving real table alias
+  lint.
+- Core equivalence now resolves join predicate columns through CTE aliases to
+  base `table.column` provenance, so CTE alias renames are equivalent but a
+  changed join key remains material.
+- `altimate-code` now threads the detected dialect into
+  `altimate_core.equivalence` and `--no-ai` correctly disables the advisory lane.
+- Local DuckDB safe-refactor validation: `APPROVE`, zero findings.
+- Real-world corpus floor preserved: 15/15 bad cases caught, 0/5 false
+  positives.
 
 ### 3. PII Classification Precision Floor
 
diff --git a/packages/opencode/src/altimate/native/altimate-core.ts b/packages/opencode/src/altimate/native/altimate-core.ts
index a41296ee7..2d58bad38 100644
--- a/packages/opencode/src/altimate/native/altimate-core.ts
+++ b/packages/opencode/src/altimate/native/altimate-core.ts
@@ -320,7 +320,7 @@ export function registerAll(): void {
   register("altimate_core.equivalence", async (params) => {
     try {
       const schema = schemaOrEmpty(params.schema_path, params.schema_context)
-      const raw = await core.checkEquivalence(params.sql1, params.sql2, schema)
+      const raw = await core.checkEquivalence(params.sql1, params.sql2, schema, params.dialect)
       const data = toData(raw)
       return ok(true, data)
     } catch (e) {
diff --git a/packages/opencode/src/altimate/native/types.ts b/packages/opencode/src/altimate/native/types.ts
index 7b47f3068..f3e72e562 100644
--- a/packages/opencode/src/altimate/native/types.ts
+++ b/packages/opencode/src/altimate/native/types.ts
@@ -879,6 +879,7 @@ export interface AltimateCoreTestgenParams {
 export interface AltimateCoreEquivalenceParams {
   sql1: string
   sql2: string
+  dialect?: string
   schema_path?: string
   schema_context?: Record<string, any>
 }
diff --git a/packages/opencode/src/altimate/review/runner.ts b/packages/opencode/src/altimate/review/runner.ts
index 1ffda22e6..f26b6add6 100644
--- a/packages/opencode/src/altimate/review/runner.ts
+++ b/packages/opencode/src/altimate/review/runner.ts
@@ -470,12 +470,13 @@ export function createDispatcherRunner(opts: DispatcherRunnerOptions): ReviewRun
       }
     },
 
-    async equivalence(oldSql: string, newSql: string): Promise<EquivalenceResult> {
+    async equivalence(oldSql: string, newSql: string, dialect?: string): Promise<EquivalenceResult> {
       try {
         const schema = await resolveSchema()
         const res = await Dispatcher.call("altimate_core.equivalence", {
           sql1: oldSql,
           sql2: newSql,
+          dialect,
           schema_context: schema,
         })
         const data = (res.data ?? {}) as Record<string, any>
diff --git a/packages/opencode/src/cli/cmd/review.ts b/packages/opencode/src/cli/cmd/review.ts
index f3bd9094f..7b594025c 100644
--- a/packages/opencode/src/cli/cmd/review.ts
+++ b/packages/opencode/src/cli/cmd/review.ts
@@ -58,7 +58,7 @@ export const ReviewCommand = cmd({
         manifestPath: args.manifest as string | undefined,
         mode: args.mode as ReviewMode | undefined,
         severityThreshold: args.severity as Severity | undefined,
-        noAi: args.ai === false,
+        noAi: args.noAi === true || args.ai === false,
       })
 
       if (args.output) await fs.writeFile(args.output as string, JSON.stringify(env, null, 2))
diff --git a/packages/opencode/test/altimate/review-runner.test.ts b/packages/opencode/test/altimate/review-runner.test.ts
index f055149fb..40e38c8ad 100644
--- a/packages/opencode/test/altimate/review-runner.test.ts
+++ b/packages/opencode/test/altimate/review-runner.test.ts
@@ -1,9 +1,17 @@
-import { describe, expect, test } from "bun:test"
+import { afterEach, describe, expect, spyOn, test } from "bun:test"
 import { writeFileSync } from "node:fs"
 import path from "node:path"
 import { createDispatcherRunner } from "../../src/altimate/review/runner"
+import { Dispatcher } from "../../src/altimate/native"
 import { tmpdir } from "../fixture/fixture"
 
+let dispatcherSpy: ReturnType<typeof spyOn> | undefined
+
+afterEach(() => {
+  dispatcherSpy?.mockRestore()
+  dispatcherSpy = undefined
+})
+
 describe("review manifest loading", () => {
   test("loads a valid manifest without initializing the native dispatcher", async () => {
     await using tmp = await tmpdir()
@@ -36,4 +44,47 @@ describe("review manifest loading", () => {
       testCount: 0,
     })
   })
+
+  test("threads adapter dialect into core equivalence", async () => {
+    await using tmp = await tmpdir()
+    const manifestPath = path.join(tmp.path, "manifest.json")
+    writeFileSync(
+      manifestPath,
+      JSON.stringify({
+        metadata: { adapter_type: "duckdb" },
+        nodes: {
+          "model.demo.orders": {
+            resource_type: "model",
+            name: "orders",
+            original_file_path: "models/orders.sql",
+            config: { materialized: "table" },
+            depends_on: { nodes: [] },
+            columns: { id: { name: "id", data_type: "integer" } },
+          },
+        },
+        sources: {},
+      }),
+    )
+
+    let seenParams: any
+    dispatcherSpy = spyOn(Dispatcher, "call").mockImplementation((async (method: string, params: any) => {
+      expect(method).toBe("altimate_core.equivalence")
+      seenParams = params
+      return {
+        success: true,
+        data: {
+          equivalent: true,
+          confidence: 0.95,
+          differences: [],
+          validation_errors: [],
+        },
+      }
+    }) as any)
+
+    const runner = createDispatcherRunner({ manifestPath })
+    const result = await runner.equivalence("select id from orders", "select id from orders", "duckdb")
+
+    expect(result).toEqual({ decided: true, equivalent: true, differences: [], confidence: "high" })
+    expect(seenParams.dialect).toBe("duckdb")
+  })
 })

From 94b452655533f82e8936b8058bd9c2fea141d59d Mon Sep 17 00:00:00 2001
From: anandgupta42 <anand@altimate.ai>
Date: Mon, 8 Jun 2026 12:26:24 -0700
Subject: [PATCH 03/11] Tighten PII review precision

---
 ...-08-dbt-pr-review-self-improvement-loop.md | 18 +++++
 .../src/altimate/review/orchestrate.ts        | 34 +++++++--
 .../opencode/test/altimate/review.test.ts     | 73 +++++++++++++++++++
 3 files changed, 119 insertions(+), 6 deletions(-)

diff --git a/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md
index 58c74bfe4..00f6ef4e2 100644
--- a/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md
+++ b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md
@@ -267,6 +267,24 @@ Expected result:
 - `demo/safe-refactor` no longer reports `customer_name`.
 - `demo/new-pii-exposure` still reports `email`.
 
+Implemented result:
+
+- Diff-scoped core PII classification is the authoritative PR-review PII
+  comment when lineage/classification are available.
+- High-confidence non-low-risk PII introduced into `marts/` or `reporting/`
+  is critical and blockable from `altimate_core.classify_pii` evidence.
+- Low-risk `Name`/`Address` classifications require at least 90% confidence to
+  surface, so `first_name`/`customer_name`-style weak signals do not create
+  noisy comments.
+- Fallback `dbt-patterns`/`rule-catalog` PII twins are suppressed for files
+  where the diff-scoped core classifier ran; fallback remains available when
+  core lineage/classification is unavailable.
+- `demo/new-pii-exposure` now reports one critical core PII finding for
+  `email` plus the core equivalence warning, and still returns
+  `REQUEST_CHANGES`.
+- Real-world corpus floor preserved: 15/15 bad cases caught, 0/5 false
+  positives.
+
 ## Warehouse E2E Policy
 
 Use local DuckDB first for fast end-to-end checks, then add BigQuery runs when a
diff --git a/packages/opencode/src/altimate/review/orchestrate.ts b/packages/opencode/src/altimate/review/orchestrate.ts
index ef546845d..8ddb5bedf 100644
--- a/packages/opencode/src/altimate/review/orchestrate.ts
+++ b/packages/opencode/src/altimate/review/orchestrate.ts
@@ -933,12 +933,18 @@ async function piiClassifyLane(ctx: ModelContext, runner: ReviewRunner, dialect:
   if (!newCols.length) return []
   const pii = await runner.classifyPii(newCols)
   const out: Finding[] = []
+  const highExposureLayer = /(^|\/)(marts|reporting)\//.test(file.path)
   for (const p of pii) {
-    if (p.confidence < 0.7 || !p.column) continue
+    const classification = String(p.classification ?? "")
+    const lowRiskClass = /^(name|address)$/i.test(classification)
+    const minConfidence = lowRiskClass ? 0.9 : 0.7
+    if (p.confidence < minConfidence || !p.column) continue
     const col = p.column.toLowerCase()
+    const highConfidence = p.confidence >= 0.9
+    const severity: Severity = highExposureLayer && highConfidence && !lowRiskClass ? "critical" : "warning"
     out.push(
       makeFinding({
-        severity: "warning",
+        severity: clampSeverity("pii_exposure", severity, highConfidence ? "high" : "medium"),
         category: "pii_exposure",
         title: `${model}: exposes ${p.classification} column \`${p.column}\``,
         body:
@@ -947,7 +953,7 @@ async function piiClassifyLane(ctx: ModelContext, runner: ReviewRunner, dialect:
           (p.masking ? `; suggested masking: \`${p.masking}\`.` : "."),
         file: file.path,
         model,
-        confidence: p.confidence >= 0.9 ? "high" : "medium",
+        confidence: highConfidence ? "high" : "medium",
         evidence: { tool: "altimate_core.classify_pii", result: p },
         ruleKey: `pii_exposure:${col}`,
       }),
@@ -1071,7 +1077,8 @@ export async function runReview(input: OrchestrateInput): Promise<VerdictEnvelop
       tasks.push(columnBreakageLane(ctx, input.runner, getCompiled, dialect))
     }
     if (lanes.has("test_coverage")) tasks.push(missingGrainTestLane(ctx, input.runner))
-    if (lanes.has("pii_exposure")) all.push(piiLane(ctx.file, ctx.pii))
+    const hasDiffScopedPii = !!input.runner.classifyPii && !!input.runner.columnLineage
+    if (lanes.has("pii_exposure") && !hasDiffScopedPii) all.push(piiLane(ctx.file, ctx.pii))
     // PII classification always runs (cheap name-pattern check, diff-scoped to
     // newly-introduced columns) — exposing PII is a hard-floor concern that
     // shouldn't depend on the cost tier enabling the pii_exposure lane.
@@ -1139,6 +1146,12 @@ export async function runReview(input: OrchestrateInput): Promise<VerdictEnvelop
   // regex still covers functions core didn't flag (dialect-suppressed / not in
   // core's set) plus reserved words / types / operators that L033 doesn't do.
   const flat = all.flat()
+  const diffScopedPiiFiles = new Set<string>()
+  if (input.runner.classifyPii && input.runner.columnLineage) {
+    for (const ctx of ctxByPath.values()) {
+      if (ctx.engineNewSql && ctx.file.status !== "deleted") diffScopedPiiFiles.add(ctx.file.path)
+    }
+  }
   // Per file, the concatenated rule names core's AST lint actually emitted —
   // so we suppress a regex twin only when core genuinely covered that concern.
   const coreRulesByFile = new Map<string, string>()
@@ -1161,13 +1174,22 @@ export async function runReview(input: OrchestrateInput): Promise<VerdictEnvelop
     }
   }
   const merged = flat.filter((f) => {
-    if (!coreRanFiles.has(f.file)) return true
     const tool = f.evidence?.tool
+    const fallbackRule = String((f.evidence?.result as any)?.rule ?? "").replace(/_/g, "-")
+    if (
+      (tool === "dbt-patterns" || tool === "rule-catalog") &&
+      diffScopedPiiFiles.has(f.file) &&
+      f.category === "pii_exposure" &&
+      (fallbackRule === "pii-into-mart" || fallbackRule === "select-pii-columns")
+    ) {
+      return false
+    }
+    if (!coreRanFiles.has(f.file)) return true
     if (tool !== "dbt-patterns" && tool !== "rule-catalog") return true
     // Structural twins: the check code survives on evidence.result.rule (ruleKey
     // is stripped by the zod schema). Suppress ONLY if core actually emitted the
     // equivalent rule for this file (else the issue would fall through).
-    const code = String((f.evidence?.result as any)?.rule ?? "").replace(/_/g, "-")
+    const code = fallbackRule
     const coreMatcher = CORE_AST_COVERED[code]
     if (coreMatcher && coreMatcher.test(coreRulesByFile.get(f.file) ?? "")) return false
     // Portability twins: drop a regex finding ONLY IF it is itself a portability
diff --git a/packages/opencode/test/altimate/review.test.ts b/packages/opencode/test/altimate/review.test.ts
index 881821d06..036687969 100644
--- a/packages/opencode/test/altimate/review.test.ts
+++ b/packages/opencode/test/altimate/review.test.ts
@@ -1333,6 +1333,79 @@ describe("orchestrate", () => {
     expect(env.verdict).toBe("REQUEST_CHANGES")
   })
 
+  test("diff-scoped core PII is the single blocker for newly introduced mart email", async () => {
+    const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+    email\n" }]
+    const runner: ReviewRunner = {
+      ...fakeRunner({}),
+      async detectPii() {
+        return { columns: ["email"] }
+      },
+      async columnLineage(sql) {
+        return sql.includes("email")
+          ? [
+              { source: "customers.customer_name", target: "customer_name" },
+              { source: "customers.email", target: "email" },
+            ]
+          : [{ source: "customers.customer_name", target: "customer_name" }]
+      },
+      async classifyPii(columns) {
+        return columns.map((column) => ({
+          column,
+          classification: "Email",
+          confidence: 0.95,
+          masking: "'***MASKED***'",
+        }))
+      },
+    }
+    const env = await runReview({
+      changedFiles: files,
+      config: { ...DEFAULT_REVIEW_CONFIG },
+      rubric: DEFAULT_RUBRIC,
+      mode: "gate",
+      runner,
+      getContent: content("select customer_name, email from customers", "select customer_name from customers"),
+    })
+    const pii = env.findings.filter((f) => f.category === "pii_exposure")
+    expect(pii).toHaveLength(1)
+    expect(pii[0].severity).toBe("critical")
+    expect(pii[0].evidence?.tool).toBe("altimate_core.classify_pii")
+    expect(env.verdict).toBe("REQUEST_CHANGES")
+  })
+
+  test("low-confidence name PII does not surface or fall back to regex PII comments", async () => {
+    const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+    first_name\n" }]
+    const runner: ReviewRunner = {
+      ...fakeRunner({}),
+      async detectPii() {
+        return { columns: ["first_name"] }
+      },
+      async columnLineage(sql) {
+        return sql.includes("first_name")
+          ? [
+              { source: "customers.customer_id", target: "customer_id" },
+              { source: "customers.first_name", target: "first_name" },
+            ]
+          : [{ source: "customers.customer_id", target: "customer_id" }]
+      },
+      async classifyPii(columns) {
+        return columns.map((column) => ({
+          column,
+          classification: "Name",
+          confidence: 0.85,
+        }))
+      },
+    }
+    const env = await runReview({
+      changedFiles: files,
+      config: { ...DEFAULT_REVIEW_CONFIG },
+      rubric: DEFAULT_RUBRIC,
+      mode: "gate",
+      runner,
+      getContent: content("select customer_id, first_name from customers", "select customer_id from customers"),
+    })
+    expect(env.findings.some((f) => f.category === "pii_exposure")).toBe(false)
+  })
+
   test("clean change with no manifest → degraded, APPROVE/COMMENT, lint-only labeled", async () => {
     const files: ChangedFile[] = [{ path: "models/staging/stg_x.sql", status: "modified", diff: "+select 1\n" }]
     const runner: ReviewRunner = {

From 1b050926f9c51d5d67a96408ca9fe68d54eebec2 Mon Sep 17 00:00:00 2001
From: anandgupta42 <anand@altimate.ai>
Date: Mon, 8 Jun 2026 12:32:56 -0700
Subject: [PATCH 04/11] Add DuckDB data diff e2e coverage

---
 ...-08-dbt-pr-review-self-improvement-loop.md | 37 ++++++++++++++++
 packages/drivers/src/duckdb.ts                | 37 +++++++++-------
 .../altimate/data-diff-duckdb-e2e.test.ts     | 44 +++++++++++++++++++
 3 files changed, 101 insertions(+), 17 deletions(-)
 create mode 100644 packages/opencode/test/altimate/data-diff-duckdb-e2e.test.ts

diff --git a/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md
index 00f6ef4e2..841ae5441 100644
--- a/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md
+++ b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md
@@ -285,6 +285,43 @@ Implemented result:
 - Real-world corpus floor preserved: 15/15 bad cases caught, 0/5 false
   positives.
 
+## Current Acceptance Checkpoint
+
+Demo matrix with local core, base artifacts in `target-base`, head artifacts in
+`target`, and AI disabled:
+
+| Branch | Expected | Actual | Deterministic evidence |
+|---|---:|---:|---|
+| `demo/safe-refactor` | `APPROVE` | `APPROVE` | no findings |
+| `demo/join-key-breakage` | `REQUEST_CHANGES` | `REQUEST_CHANGES` | `altimate_core.structural_diff:SC010`, core equivalence |
+| `demo/test-removal` | `COMMENT` | `COMMENT` | `dbt-patterns:removed_tests` |
+| `demo/new-pii-exposure` | `REQUEST_CHANGES` | `REQUEST_CHANGES` | `altimate_core.classify_pii`, core equivalence |
+| `demo/mart-select-star` | `COMMENT` | `COMMENT` | core equivalence warning |
+| `demo/incremental-without-guard` | `COMMENT` | `COMMENT` | core dbt config lint + deterministic catalog cost notes |
+
+DuckDB e2e proof:
+
+- Branch: `demo/join-key-breakage`.
+- `dbt build --profiles-dir . --target dev`: passed `PASS=14 WARN=0 ERROR=0`.
+- Reviewer: `REQUEST_CHANGES` with critical `join_risk` from
+  `altimate_core.structural_diff` rule `join_key_regression` (`SC010`).
+
+Warehouse data-diff e2e proof:
+
+- Test: `packages/opencode/test/altimate/data-diff-duckdb-e2e.test.ts`.
+- Connection: local DuckDB via `@altimateai/drivers/duckdb`.
+- Base/head relations: `base_orders` and `head_orders`.
+- Key columns: `order_id`; compared value column: `amount`.
+- Observed warehouse delta: one row only in head and one updated value.
+- Validation:
+  `ALTIMATE_RUN_WAREHOUSE_E2E=1 bun test --timeout 30000 test/altimate/data-diff-duckdb-e2e.test.ts`.
+- Default fast-loop behavior:
+  `bun test --timeout 30000 test/altimate/data-diff-duckdb-e2e.test.ts`
+  skips unless `ALTIMATE_RUN_WAREHOUSE_E2E=1` is set.
+- Implementation note: fixed the DuckDB connector to use the two-argument
+  constructor when no open options are required; Bun's native binding can miss
+  the callback when `undefined` is passed as the second argument.
+
 ## Warehouse E2E Policy
 
 Use local DuckDB first for fast end-to-end checks, then add BigQuery runs when a
diff --git a/packages/drivers/src/duckdb.ts b/packages/drivers/src/duckdb.ts
index 3ccca467a..d177adc93 100644
--- a/packages/drivers/src/duckdb.ts
+++ b/packages/drivers/src/duckdb.ts
@@ -56,26 +56,29 @@ export async function connect(config: ConnectionConfig): Promise<Connector> {
         new Promise<any>((resolve, reject) => {
           let resolved = false
           let timeout: ReturnType<typeof setTimeout> | undefined
+          let instance: any
           const opts = accessMode ? { access_mode: accessMode } : undefined
-          const instance = new duckdb.Database(
-            dbPath,
-            opts,
-            (err: Error | null) => {
-              if (resolved) { if (instance && typeof instance.close === "function") instance.close(); return }
-              resolved = true
-              if (timeout) clearTimeout(timeout)
-              if (err) {
-                const msg = err.message || String(err)
-                if (msg.toLowerCase().includes("locked") || msg.includes("SQLITE_BUSY") || msg.includes("DUCKDB_LOCKED")) {
-                  reject(new Error("DUCKDB_LOCKED"))
-                } else {
-                  reject(err)
-                }
+          const onOpen = (err: Error | null) => {
+            if (resolved) {
+              if (instance && typeof instance.close === "function") instance.close()
+              return
+            }
+            resolved = true
+            if (timeout) clearTimeout(timeout)
+            if (err) {
+              const msg = err.message || String(err)
+              if (msg.toLowerCase().includes("locked") || msg.includes("SQLITE_BUSY") || msg.includes("DUCKDB_LOCKED")) {
+                reject(new Error("DUCKDB_LOCKED"))
               } else {
-                resolve(instance)
+                reject(err)
               }
-            },
-          )
+            } else {
+              resolve(instance)
+            }
+          }
+          instance = opts
+            ? new duckdb.Database(dbPath, opts, onOpen)
+            : new duckdb.Database(dbPath, onOpen)
           // Bun: native callback may not fire; fall back after 2s
           timeout = setTimeout(() => {
             if (!resolved) {
diff --git a/packages/opencode/test/altimate/data-diff-duckdb-e2e.test.ts b/packages/opencode/test/altimate/data-diff-duckdb-e2e.test.ts
new file mode 100644
index 000000000..3da67fec9
--- /dev/null
+++ b/packages/opencode/test/altimate/data-diff-duckdb-e2e.test.ts
@@ -0,0 +1,44 @@
+import { afterAll, beforeAll, describe, expect, test } from "bun:test"
+
+import { runDataDiff } from "../../src/altimate/native/connections/data-diff"
+import * as Registry from "../../src/altimate/native/connections/registry"
+
+const RUN = process.env.ALTIMATE_RUN_WAREHOUSE_E2E === "1"
+const e2eTest = RUN ? test : test.skip
+
+describe("data.diff DuckDB e2e", () => {
+  beforeAll(() => {
+    process.env.ALTIMATE_TELEMETRY_DISABLED = "true"
+  })
+
+  afterAll(() => {
+    delete process.env.ALTIMATE_TELEMETRY_DISABLED
+    Registry.reset()
+  })
+
+  e2eTest("detects value and row deltas through a real DuckDB warehouse", async () => {
+    Registry.reset()
+    Registry.setConfigs({ duck_e2e: { type: "duckdb", path: ":memory:" } })
+    const conn = await Registry.get("duck_e2e")
+
+    await conn.execute("CREATE TABLE base_orders (order_id INTEGER, amount INTEGER)")
+    await conn.execute("CREATE TABLE head_orders (order_id INTEGER, amount INTEGER)")
+    await conn.execute("INSERT INTO base_orders VALUES (1, 100), (2, 200)")
+    await conn.execute("INSERT INTO head_orders VALUES (1, 100), (2, 250), (3, 300)")
+
+    const result = await runDataDiff({
+      source: "base_orders",
+      target: "head_orders",
+      key_columns: ["order_id"],
+      extra_columns: ["amount"],
+      source_warehouse: "duck_e2e",
+      target_warehouse: "duck_e2e",
+      algorithm: "joindiff",
+    })
+
+    expect(result.success).toBe(true)
+    expect((result.outcome as any)?.mode).toBe("diff")
+    expect((result.outcome as any)?.stats?.exclusive_table2).toBe(1)
+    expect((result.outcome as any)?.stats?.updated).toBe(1)
+  })
+})

From 49fc784cd725c7e584a41b94607c2c180b5a066e Mon Sep 17 00:00:00 2001
From: anandgupta42 <anand@altimate.ai>
Date: Tue, 9 Jun 2026 17:51:48 -0700
Subject: [PATCH 05/11] fix: address dbt review parity comments

---
 docs/docs/usage/dbt-pr-review.md              |  6 +-
 ...-08-dbt-pr-review-self-improvement-loop.md |  8 +--
 packages/drivers/src/duckdb.ts                |  6 ++
 packages/drivers/test/driver-security.test.ts | 56 +++++++++++++++----
 .../src/altimate/review/orchestrate.ts        | 21 ++++++-
 .../opencode/test/altimate/review.test.ts     | 38 +++++++++++++
 6 files changed, 115 insertions(+), 20 deletions(-)

diff --git a/docs/docs/usage/dbt-pr-review.md b/docs/docs/usage/dbt-pr-review.md
index dbf97d4d3..e23cd8ddd 100644
--- a/docs/docs/usage/dbt-pr-review.md
+++ b/docs/docs/usage/dbt-pr-review.md
@@ -334,9 +334,9 @@ reviewer does **not** re-implement Jinja — it consumes dbt's own compiled outp
    the base and head refs in CI (the base into `target-base/`, the Recce
    convention). For proof-grade equivalence, also produce `target/catalog.json`
    with `dbt docs generate`; the catalog carries complete warehouse columns that
-   `manifest.json` often lacks. Without compiled SQL or complete columns the
-   engine lanes stay *undecidable* (never fabricated) — the `dbt-patterns` lane
-   still runs.
+   `manifest.json` often lacks. Without compiled SQL or complete columns,
+   schema-dependent engine lanes such as equivalence and PII stay *undecidable*
+   (never fabricated) — the `dbt-patterns` lane still runs.
 
 ## What it checks — deterministic rule catalog
 
diff --git a/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md
index 841ae5441..8823d2da6 100644
--- a/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md
+++ b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md
@@ -35,8 +35,8 @@ Quantified acceptance target:
   warehouse-observed impact, not advisory AI or heuristic equivalence.
 
 Operating rules:
-- Always start by pulling latest main of
-  /Users/anandgupta/codebase/altimate-core-internal with fast-forward only.
+- Always start by pulling latest main of the sibling `altimate-core-internal`
+  checkout with fast-forward only.
 - Prefer implementing deterministic SQL/dbt analysis in altimate-core, not
   altimate-code. Avoid regex/custom parsing in altimate-code when core
   AST/parser support can own it.
@@ -52,8 +52,8 @@ Operating rules:
      only at acceptance checkpoints.
   6. Keep only if it improves real review value with no new false positives;
      otherwise revert/reject or make advisory-only.
-- Use /Users/anandgupta/codebase/altimate-code/demo/dbt-pr-review-demo for demo
-  branch e2e.
+- Use the `demo/dbt-pr-review-demo` checkout under the altimate-code workspace
+  for demo branch e2e.
 - Compile base dbt artifacts into target-base and head artifacts into target
   before review when structural/equivalence lanes need base-vs-head compiled SQL.
 - Use local DuckDB first; use BigQuery where warehouse behavior matters and
diff --git a/packages/drivers/src/duckdb.ts b/packages/drivers/src/duckdb.ts
index d177adc93..1aa2fdba7 100644
--- a/packages/drivers/src/duckdb.ts
+++ b/packages/drivers/src/duckdb.ts
@@ -57,8 +57,13 @@ export async function connect(config: ConnectionConfig): Promise<Connector> {
           let resolved = false
           let timeout: ReturnType<typeof setTimeout> | undefined
           let instance: any
+          let pendingOpen: Error | null | undefined
           const opts = accessMode ? { access_mode: accessMode } : undefined
           const onOpen = (err: Error | null) => {
+            if (!instance) {
+              pendingOpen = err
+              return
+            }
             if (resolved) {
               if (instance && typeof instance.close === "function") instance.close()
               return
@@ -79,6 +84,7 @@ export async function connect(config: ConnectionConfig): Promise<Connector> {
           instance = opts
             ? new duckdb.Database(dbPath, opts, onOpen)
             : new duckdb.Database(dbPath, onOpen)
+          if (pendingOpen !== undefined) onOpen(pendingOpen)
           // Bun: native callback may not fire; fall back after 2s
           timeout = setTimeout(() => {
             if (!resolved) {
diff --git a/packages/drivers/test/driver-security.test.ts b/packages/drivers/test/driver-security.test.ts
index 1fd4a9435..a590235ff 100644
--- a/packages/drivers/test/driver-security.test.ts
+++ b/packages/drivers/test/driver-security.test.ts
@@ -12,6 +12,10 @@ import { describe, test, expect, mock, beforeEach, afterEach } from "bun:test"
 // DuckDB: wrapDuckDBError + lock retry
 // ---------------------------------------------------------------------------
 describe("DuckDB driver", () => {
+  function openCallback(optsOrCb: any, cb?: (err: Error | null) => void): (err: Error | null) => void {
+    return typeof optsOrCb === "function" ? optsOrCb : cb!
+  }
+
   // Test wrapDuckDBError logic inline (it's a closure, so we test via connect behavior)
   describe("wrapDuckDBError", () => {
     test("wraps SQLITE_BUSY errors with user-friendly message", async () => {
@@ -28,8 +32,8 @@ describe("DuckDB driver", () => {
       mock.module("duckdb", () => ({
         default: {
           Database: class {
-            constructor(_path: string, _opts: any, cb: (err: Error | null) => void) {
-              setTimeout(() => cb(null), 0)
+            constructor(_path: string, optsOrCb: any, cb?: (err: Error | null) => void) {
+              setTimeout(() => openCallback(optsOrCb, cb)(null), 0)
             }
             connect() {
               return mockDb.connect()
@@ -62,8 +66,8 @@ describe("DuckDB driver", () => {
       mock.module("duckdb", () => ({
         default: {
           Database: class {
-            constructor(_path: string, _opts: any, cb: (err: Error | null) => void) {
-              setTimeout(() => cb(null), 0)
+            constructor(_path: string, optsOrCb: any, cb?: (err: Error | null) => void) {
+              setTimeout(() => openCallback(optsOrCb, cb)(null), 0)
             }
             connect() {
               return {
@@ -97,8 +101,8 @@ describe("DuckDB driver", () => {
       mock.module("duckdb", () => ({
         default: {
           Database: class {
-            constructor(_path: string, _opts: any, cb: (err: Error | null) => void) {
-              setTimeout(() => cb(null), 0)
+            constructor(_path: string, optsOrCb: any, cb?: (err: Error | null) => void) {
+              setTimeout(() => openCallback(optsOrCb, cb)(null), 0)
             }
             connect() {
               return {
@@ -135,14 +139,16 @@ describe("DuckDB driver", () => {
       mock.module("duckdb", () => ({
         default: {
           Database: class {
-            constructor(_path: string, opts: any, cb: (err: Error | null) => void) {
+            constructor(_path: string, optsOrCb: any, cb?: (err: Error | null) => void) {
+              const opts = typeof optsOrCb === "function" ? undefined : optsOrCb
+              const done = openCallback(optsOrCb, cb)
               connectAttempts++
               if (connectAttempts === 1 && !opts?.access_mode) {
                 // First attempt fails with lock error
-                setTimeout(() => cb(new Error("DUCKDB_LOCKED: file is locked")), 0)
+                setTimeout(() => done(new Error("DUCKDB_LOCKED: file is locked")), 0)
               } else {
                 // READ_ONLY retry succeeds
-                setTimeout(() => cb(null), 0)
+                setTimeout(() => done(null), 0)
               }
             }
             connect() {
@@ -171,8 +177,8 @@ describe("DuckDB driver", () => {
       mock.module("duckdb", () => ({
         default: {
           Database: class {
-            constructor(_path: string, _opts: any, cb: (err: Error | null) => void) {
-              setTimeout(() => cb(new Error("DUCKDB_LOCKED")), 0)
+            constructor(_path: string, optsOrCb: any, cb?: (err: Error | null) => void) {
+              setTimeout(() => openCallback(optsOrCb, cb)(new Error("DUCKDB_LOCKED")), 0)
             }
             connect() {
               return {}
@@ -195,6 +201,34 @@ describe("DuckDB driver", () => {
         expect(e.message).toBe("DUCKDB_LOCKED")
       }
     })
+
+    test("handles native open callback invoked synchronously", async () => {
+      mock.module("duckdb", () => ({
+        default: {
+          Database: class {
+            constructor(_path: string, optsOrCb: any, cb?: (err: Error | null) => void) {
+              openCallback(optsOrCb, cb)(null)
+            }
+            connect() {
+              return {
+                all: (_sql: string, cb: (err: Error | null, rows: any[]) => void) => {
+                  cb(null, [{ ok: 1 }])
+                },
+              }
+            }
+            close(cb: any) {
+              if (cb) cb(null)
+            }
+          },
+        },
+      }))
+
+      const { connect } = await import("../src/duckdb")
+      const connector = await connect({ type: "duckdb", path: ":memory:" })
+      await connector.connect()
+      expect(await connector.execute("SELECT 1")).toMatchObject({ columns: ["ok"], rows: [[1]], row_count: 1 })
+      await connector.close()
+    })
   })
 })
 
diff --git a/packages/opencode/src/altimate/review/orchestrate.ts b/packages/opencode/src/altimate/review/orchestrate.ts
index 8ddb5bedf..e5341faea 100644
--- a/packages/opencode/src/altimate/review/orchestrate.ts
+++ b/packages/opencode/src/altimate/review/orchestrate.ts
@@ -962,6 +962,15 @@ async function piiClassifyLane(ctx: ModelContext, runner: ReviewRunner, dialect:
   return out
 }
 
+async function introducedOutputColumns(ctx: ModelContext, runner: ReviewRunner, dialect: string): Promise<Set<string> | undefined> {
+  if (!runner.columnLineage || !ctx.engineNewSql || ctx.file.status === "deleted") return undefined
+  const headCols = [...new Set((await runner.columnLineage(ctx.engineNewSql, dialect)).map((e) => e.target).filter(Boolean))]
+  if (!headCols.length) return new Set()
+  if (!ctx.engineOldSql) return new Set(headCols.map((c) => c.toLowerCase()))
+  const baseCols = new Set((await runner.columnLineage(ctx.engineOldSql, dialect)).map((e) => e.target.toLowerCase()))
+  return new Set(headCols.filter((c) => !baseCols.has(c.toLowerCase())).map((c) => c.toLowerCase()))
+}
+
 function piiLane(file: ChangedFile & { kind: string }, columns: string[]): Finding[] {
   if (file.status === "deleted" || !columns.length) return []
   const model = modelNameFromPath(file.path)
@@ -1077,8 +1086,16 @@ export async function runReview(input: OrchestrateInput): Promise<VerdictEnvelop
       tasks.push(columnBreakageLane(ctx, input.runner, getCompiled, dialect))
     }
     if (lanes.has("test_coverage")) tasks.push(missingGrainTestLane(ctx, input.runner))
-    const hasDiffScopedPii = !!input.runner.classifyPii && !!input.runner.columnLineage
-    if (lanes.has("pii_exposure") && !hasDiffScopedPii) all.push(piiLane(ctx.file, ctx.pii))
+    const diffScopedPiiCols =
+      input.runner.classifyPii && input.runner.columnLineage
+        ? await introducedOutputColumns(ctx, input.runner, dialect)
+        : undefined
+    if (lanes.has("pii_exposure")) {
+      const fallbackPii = diffScopedPiiCols
+        ? ctx.pii.filter((col) => !diffScopedPiiCols.has(col.toLowerCase()))
+        : ctx.pii
+      all.push(piiLane(ctx.file, fallbackPii))
+    }
     // PII classification always runs (cheap name-pattern check, diff-scoped to
     // newly-introduced columns) — exposing PII is a hard-floor concern that
     // shouldn't depend on the cost tier enabling the pii_exposure lane.
diff --git a/packages/opencode/test/altimate/review.test.ts b/packages/opencode/test/altimate/review.test.ts
index 036687969..a6c06fd1c 100644
--- a/packages/opencode/test/altimate/review.test.ts
+++ b/packages/opencode/test/altimate/review.test.ts
@@ -1372,6 +1372,44 @@ describe("orchestrate", () => {
     expect(env.verdict).toBe("REQUEST_CHANGES")
   })
 
+  test("broad PII detector remains fallback for existing output columns", async () => {
+    const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+    lower(email) as email\n" }]
+    const runner: ReviewRunner = {
+      ...fakeRunner({}),
+      async detectPii() {
+        return { columns: ["email"] }
+      },
+      async columnLineage(sql) {
+        return sql.includes("email")
+          ? [
+              { source: "customers.customer_id", target: "customer_id" },
+              { source: "customers.email", target: "email" },
+            ]
+          : [{ source: "customers.customer_id", target: "customer_id" }]
+      },
+      async classifyPii(columns) {
+        return columns.map((column) => ({
+          column,
+          classification: "Email",
+          confidence: 0.95,
+          masking: "'***MASKED***'",
+        }))
+      },
+    }
+    const env = await runReview({
+      changedFiles: files,
+      config: { ...DEFAULT_REVIEW_CONFIG },
+      rubric: DEFAULT_RUBRIC,
+      mode: "gate",
+      runner,
+      getContent: content("select customer_id, lower(email) as email from customers", "select customer_id, email from customers"),
+    })
+    const pii = env.findings.filter((f) => f.category === "pii_exposure")
+    expect(pii).toHaveLength(1)
+    expect(pii[0].evidence?.tool).toBe("schema.detect_pii")
+    expect(env.verdict).toBe("REQUEST_CHANGES")
+  })
+
   test("low-confidence name PII does not surface or fall back to regex PII comments", async () => {
     const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+    first_name\n" }]
     const runner: ReviewRunner = {

From 21e5adff70675e670f7ae9804e48f4bbb6bc8165 Mon Sep 17 00:00:00 2001
From: anandgupta42 <anand@altimate.ai>
Date: Tue, 9 Jun 2026 18:06:25 -0700
Subject: [PATCH 06/11] chore: remove non-functional planning docs from dbt
 review PR

---
 ...-08-dbt-pr-review-self-improvement-loop.md | 338 ------------------
 1 file changed, 338 deletions(-)
 delete mode 100644 docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md

diff --git a/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md b/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md
deleted file mode 100644
index 8823d2da6..000000000
--- a/docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md
+++ /dev/null
@@ -1,338 +0,0 @@
-# dbt PR Review Self-Improvement Loop
-
-Date: 2026-06-08
-
-## Goal
-
-Make `altimate review` the highest-signal dbt PR reviewer by adding only checks
-that survive an end-to-end value test:
-
-1. The finding is backed by deterministic evidence, warehouse execution, or a
-   clearly bounded advisory rule.
-2. It catches a real dbt PR risk in the public demo or sourced benchmark corpus.
-3. It does not add false positives to corrected/safe cases.
-4. It produces a review comment that a data engineer would act on.
-5. It degrades loudly when required artifacts, schema, or warehouse access are
-   missing.
-
-## Recommended `/goal` Command
-
-```text
-/goal Objective: make the dbt PR reviewer demo-parity reliable and measurably
-higher-signal by closing the current public-demo gaps while preserving the
-existing benchmark floor.
-
-Quantified acceptance target:
-- Public demo matrix reaches expected verdicts for all 6 demo branches:
-  safe-refactor=APPROVE, join-key-breakage=REQUEST_CHANGES,
-  test-removal=COMMENT, new-pii-exposure=REQUEST_CHANGES,
-  mart-select-star=COMMENT, incremental-without-guard=COMMENT.
-- Real-world corpus remains at 15/15 caught bad cases and 0/5 false positives
-  on corrected cases.
-- At least one warehouse-backed DuckDB e2e proves a real dbt build/test pass
-  that the reviewer catches pre-merge.
-- Every blocking finding is produced by deterministic core evidence or
-  warehouse-observed impact, not advisory AI or heuristic equivalence.
-
-Operating rules:
-- Always start by pulling latest main of the sibling `altimate-core-internal`
-  checkout with fast-forward only.
-- Prefer implementing deterministic SQL/dbt analysis in altimate-core, not
-  altimate-code. Avoid regex/custom parsing in altimate-code when core
-  AST/parser support can own it.
-- altimate-code should mostly orchestrate, map core findings to review
-  categories/severity, format verdicts, and run GitHub/CI integration.
-- For every candidate improvement:
-  1. Write a short spec: risk, deterministic evidence, what must not be flagged.
-  2. Add bad and safe/corrected fixtures.
-  3. Implement in altimate-core when possible.
-  4. Build local core Node package and link altimate-code to it only when e2e
-     needs the local native package.
-  5. Validate with the narrowest useful tests first, then corpus/demo/warehouse
-     only at acceptance checkpoints.
-  6. Keep only if it improves real review value with no new false positives;
-     otherwise revert/reject or make advisory-only.
-- Use the `demo/dbt-pr-review-demo` checkout under the altimate-code workspace
-  for demo branch e2e.
-- Compile base dbt artifacts into target-base and head artifacts into target
-  before review when structural/equivalence lanes need base-vs-head compiled SQL.
-- Use local DuckDB first; use BigQuery where warehouse behavior matters and
-  credentials are available. Trino/Postgres/ClickHouse/DuckDB can be spawned
-  locally. Snowflake later.
-- Treat altimate-core equivalence as heuristic unless there is a sound proof
-  path. Do not block or approve solely from heuristic equivalence. Block only on
-  reproducible deterministic facts or warehouse-observed impact.
-- Track results in
-  docs/internal/2026-06-08-dbt-pr-review-self-improvement-loop.md.
-
-Iteration speed:
-- Optimize for fast learning loops. Do not run full test suites or expensive
-  builds by default.
-- Start with the narrowest relevant unit test or direct function smoke test.
-- Only run broader tests after a candidate passes the focused check.
-- Avoid rebuilding altimate-core unless the change affects native exports, NAPI
-  bindings, or e2e verification requires the local Node package.
-- Prefer `cargo test -p altimate-core <module_or_test_name>` over full cargo
-  test.
-- Prefer `bun test test/altimate/<specific-test-file>.ts` over full package
-  tests.
-- Use direct Node smoke tests against `crates/altimate-core-node` before
-  relinking altimate-code.
-- Run the real-world corpus and demo matrix only at acceptance checkpoints, not
-  after every edit.
-- Run warehouse e2e only when the value claim depends on observed data
-  behavior.
-- Keep a short validation ladder for each candidate: smoke -> focused unit ->
-  focused integration -> corpus/demo -> warehouse e2e if needed.
-
-Initial backlog:
-1. Remove safe-refactor noise: base compiled artifact support and/or equivalence
-   degradation handling.
-2. Tighten PII precision: do not flag low-confidence Name such as
-   customer_name, but still flag email/SSN/phone/payment identifiers.
-3. Reduce low-value missing_table_alias comments for dbt CTE-heavy SQL.
-4. Expand core structural diff rules for high-signal dbt semantic regressions.
-5. Add warehouse data-diff e2e cases that prove real row/value impact.
-```
-
-## Skill vs Goal
-
-Create a skill once this loop stabilizes across several iterations. A skill is
-the right home for reusable operating instructions: how to pull core, choose
-core-vs-code ownership, build local NAPI, run the demo matrix, and apply the
-validation ladder. The active `/goal` should still include the quantified
-objective above because it provides the stop condition and prevents open-ended
-"make it better" work.
-
-## Baseline From Dry Run
-
-Commands run from `packages/opencode` unless noted:
-
-```bash
-bun run --conditions=browser script/review-realworld-eval.ts
-```
-
-Result:
-
-- Sourced bad-case catch rate: 15/15.
-- Corrected-case false positives: 0/5.
-
-Demo matrix run against `demo/dbt-pr-review-demo` branches with the local dbt
-virtualenv and `altimate review --mode gate --json`:
-
-| Branch | Expected | Actual | Finding categories |
-|---|---:|---:|---|
-| `demo/safe-refactor` | `APPROVE` | `COMMENT` | `semantic_change`, `pii_exposure`, `sql_quality` |
-| `demo/join-key-breakage` | `REQUEST_CHANGES` | `COMMENT` | `semantic_change`, `pii_exposure`, `sql_correctness`, `sql_quality` |
-| `demo/test-removal` | `COMMENT` | `COMMENT` | `test_coverage` |
-| `demo/new-pii-exposure` | `REQUEST_CHANGES` | `REQUEST_CHANGES` | `pii_exposure`, `semantic_change`, `sql_quality` |
-| `demo/mart-select-star` | `COMMENT` | `COMMENT` | `semantic_change`, `pii_exposure`, `sql_quality`, `test_coverage` |
-| `demo/incremental-without-guard` | `COMMENT` | `COMMENT` | `semantic_change`, `pii_exposure`, `join_risk`, `warehouse_cost`, `sql_quality`, `materialization`, `sql_correctness`, `test_coverage` |
-
-Observed problems:
-
-- `safe-refactor` is noisy. The semantic lane cannot decide equivalence because
-  base compiled SQL is unavailable, `Name` classification flags `customer_name`
-  as PII, and `missing_table_alias` flags harmless CTE aliasing.
-- `join-key-breakage` is not blocked. The useful join-key mismatch appears from
-  the advisory AI lane, not the deterministic lane, so it cannot gate.
-- Several demo branches are degraded because base-vs-head compiled artifacts are
-  incomplete. A safe review must not pretend heuristic equivalence is a proof.
-
-## Loop Contract
-
-For each candidate improvement:
-
-1. **Spec**: one paragraph stating the risk, the deterministic evidence, and
-   what must not be flagged.
-2. **Fixtures**: at least one bad case and one corrected/safe countercase.
-3. **Implementation**: smallest scoped change in the reviewer or demo workflow.
-4. **Validation**:
-   - Unit or harness test for the detector.
-   - `bun run --conditions=browser script/review-realworld-eval.ts`.
-   - Demo branch matrix for affected branches.
-   - Warehouse-backed run when the value claim depends on actual data impact.
-5. **Decision**:
-   - Keep only if it improves catch rate or demo parity with zero new false
-     positives.
-   - Reject or keep advisory-only if evidence is heuristic, schema-dependent,
-     or too broad.
-
-## First Candidate Backlog
-
-### 1. Deterministic Join-Key Regression Detector
-
-Status: pushed down to `altimate-core-internal` as structural diff rule
-`SC010 join_key_regression` in
-`crates/altimate-core/src/review/structural_diff.rs`. `altimate-code` only maps
-the core finding into the review verdict.
-
-Risk: a PR changes a join predicate from matching the same business key to
-joining unrelated identifiers, e.g. `orders.customer_id = customers.customer_id`
-becomes `orders.order_id = customers.customer_id`.
-
-Evidence:
-
-- Base and head SQL are both available.
-- A changed `JOIN ... ON` equality compares identifier columns ending in `_id`.
-- The base predicate joined same-name/same-stem keys, while the head predicate
-  joins different stems.
-
-Do not flag:
-
-- Intentional bridge joins such as `orders.order_id = order_items.order_id`.
-- Joins where the changed key stems still match.
-- Cases without a base predicate to compare.
-
-Expected result:
-
-- `demo/join-key-breakage` gets a deterministic `join_risk` or
-  `sql_correctness` critical finding and blocks in gate mode.
-- `demo/safe-refactor` remains quiet for this detector.
-
-Validation:
-
-- `git pull --ff-only origin main` completed in
-  `/Users/anandgupta/codebase/altimate-core-internal`; latest core main was
-  `a413804` before this local rule change.
-- `cargo test -p altimate-core review::structural_diff --lib` passed with the
-  new `SC010` bad case and safe countercases.
-- `bun test --timeout 30000 test/altimate/review.test.ts` passed.
-- `bun run --conditions=browser script/review-realworld-eval.ts` stayed at
-  15/15 sourced catches and 0/5 corrected false positives.
-- Demo matrix: `demo/join-key-breakage` improved from `COMMENT` to
-  `REQUEST_CHANGES` with deterministic `join_risk`.
-- The detector itself stayed quiet on `demo/safe-refactor` and on the
-  same-key bridge-join countercase.
-
-### 2. Base Compiled Artifact Support In Demo CI
-
-Risk: safe refactors degrade to "could not prove equivalent" because the review
-has incomplete proof inputs: missing base compiled SQL or incomplete schema
-columns.
-
-Evidence:
-
-- `compiled.ts` already supports `target-base/compiled/<project>/<path>`.
-- The demo workflow currently runs only head-side `dbt compile`.
-- `demo/safe-refactor` became decidable after `dbt docs generate` produced
-  `target/catalog.json`, but core equivalence then falsely compared CTE alias
-  names inside the join filter.
-
-Do not flag:
-
-- A safe CTE rename where base and head compiled SQL are equivalent or where no
-  deterministic structural difference exists.
-
-Expected result:
-
-- The demo workflow compiles base into `target-base` before compiling head.
-- The demo workflow produces `target/catalog.json` when a warehouse-backed dbt
-  project is available, so equivalence has complete columns.
-- `demo/safe-refactor` emits no findings with local core and DuckDB catalog
-  artifacts.
-
-Implemented result:
-
-- Core `L012 missing_table_alias` now ignores CTE references, removing the
-  `order_records/customer_records` alias noise while preserving real table alias
-  lint.
-- Core equivalence now resolves join predicate columns through CTE aliases to
-  base `table.column` provenance, so CTE alias renames are equivalent but a
-  changed join key remains material.
-- `altimate-code` now threads the detected dialect into
-  `altimate_core.equivalence` and `--no-ai` correctly disables the advisory lane.
-- Local DuckDB safe-refactor validation: `APPROVE`, zero findings.
-- Real-world corpus floor preserved: 15/15 bad cases caught, 0/5 false
-  positives.
-
-### 3. PII Classification Precision Floor
-
-Risk: high-noise PII comments on ordinary names such as `customer_name` reduce
-trust and make safe PRs look risky.
-
-Evidence:
-
-- `classify_pii` labels `customer_name` as `Name` at 75% confidence in the demo.
-- The stronger demo/security value is email/SSN/phone/payment identifiers and
-  source-propagated sensitive columns.
-
-Do not flag:
-
-- `Name` classification below a high confidence threshold.
-- Pre-existing PII columns not introduced by the PR.
-
-Expected result:
-
-- `demo/safe-refactor` no longer reports `customer_name`.
-- `demo/new-pii-exposure` still reports `email`.
-
-Implemented result:
-
-- Diff-scoped core PII classification is the authoritative PR-review PII
-  comment when lineage/classification are available.
-- High-confidence non-low-risk PII introduced into `marts/` or `reporting/`
-  is critical and blockable from `altimate_core.classify_pii` evidence.
-- Low-risk `Name`/`Address` classifications require at least 90% confidence to
-  surface, so `first_name`/`customer_name`-style weak signals do not create
-  noisy comments.
-- Fallback `dbt-patterns`/`rule-catalog` PII twins are suppressed for files
-  where the diff-scoped core classifier ran; fallback remains available when
-  core lineage/classification is unavailable.
-- `demo/new-pii-exposure` now reports one critical core PII finding for
-  `email` plus the core equivalence warning, and still returns
-  `REQUEST_CHANGES`.
-- Real-world corpus floor preserved: 15/15 bad cases caught, 0/5 false
-  positives.
-
-## Current Acceptance Checkpoint
-
-Demo matrix with local core, base artifacts in `target-base`, head artifacts in
-`target`, and AI disabled:
-
-| Branch | Expected | Actual | Deterministic evidence |
-|---|---:|---:|---|
-| `demo/safe-refactor` | `APPROVE` | `APPROVE` | no findings |
-| `demo/join-key-breakage` | `REQUEST_CHANGES` | `REQUEST_CHANGES` | `altimate_core.structural_diff:SC010`, core equivalence |
-| `demo/test-removal` | `COMMENT` | `COMMENT` | `dbt-patterns:removed_tests` |
-| `demo/new-pii-exposure` | `REQUEST_CHANGES` | `REQUEST_CHANGES` | `altimate_core.classify_pii`, core equivalence |
-| `demo/mart-select-star` | `COMMENT` | `COMMENT` | core equivalence warning |
-| `demo/incremental-without-guard` | `COMMENT` | `COMMENT` | core dbt config lint + deterministic catalog cost notes |
-
-DuckDB e2e proof:
-
-- Branch: `demo/join-key-breakage`.
-- `dbt build --profiles-dir . --target dev`: passed `PASS=14 WARN=0 ERROR=0`.
-- Reviewer: `REQUEST_CHANGES` with critical `join_risk` from
-  `altimate_core.structural_diff` rule `join_key_regression` (`SC010`).
-
-Warehouse data-diff e2e proof:
-
-- Test: `packages/opencode/test/altimate/data-diff-duckdb-e2e.test.ts`.
-- Connection: local DuckDB via `@altimateai/drivers/duckdb`.
-- Base/head relations: `base_orders` and `head_orders`.
-- Key columns: `order_id`; compared value column: `amount`.
-- Observed warehouse delta: one row only in head and one updated value.
-- Validation:
-  `ALTIMATE_RUN_WAREHOUSE_E2E=1 bun test --timeout 30000 test/altimate/data-diff-duckdb-e2e.test.ts`.
-- Default fast-loop behavior:
-  `bun test --timeout 30000 test/altimate/data-diff-duckdb-e2e.test.ts`
-  skips unless `ALTIMATE_RUN_WAREHOUSE_E2E=1` is set.
-- Implementation note: fixed the DuckDB connector to use the two-argument
-  constructor when no open options are required; Bun's native binding can miss
-  the callback when `undefined` is passed as the second argument.
-
-## Warehouse E2E Policy
-
-Use local DuckDB first for fast end-to-end checks, then add BigQuery runs when a
-candidate needs production-style warehouse behavior. Trino, Postgres,
-ClickHouse, and DuckDB can be spawned locally. Snowflake is deferred until a
-separate account is available.
-
-Warehouse-backed findings must include:
-
-- The connection used.
-- Base/head SQL or model relation names.
-- Key columns.
-- Row/value delta summary.
-- A skip/degraded state when credentials or drivers are unavailable.

From 173f9d94ca146be2c6f32fe6db7fd325c892dcc1 Mon Sep 17 00:00:00 2001
From: anandgupta42 <anand@altimate.ai>
Date: Tue, 9 Jun 2026 18:06:44 -0700
Subject: [PATCH 07/11] chore: drop non-functional dbt review docs edit

---
 docs/docs/usage/dbt-pr-review.md | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/docs/docs/usage/dbt-pr-review.md b/docs/docs/usage/dbt-pr-review.md
index e23cd8ddd..e4b63774b 100644
--- a/docs/docs/usage/dbt-pr-review.md
+++ b/docs/docs/usage/dbt-pr-review.md
@@ -332,11 +332,8 @@ reviewer does **not** re-implement Jinja — it consumes dbt's own compiled outp
    from `target/compiled/<project>/…` (HEAD) and `target-base/compiled/…` (BASE),
    produced by `dbt compile`. To enable full equivalence verdicts, compile both
    the base and head refs in CI (the base into `target-base/`, the Recce
-   convention). For proof-grade equivalence, also produce `target/catalog.json`
-   with `dbt docs generate`; the catalog carries complete warehouse columns that
-   `manifest.json` often lacks. Without compiled SQL or complete columns,
-   schema-dependent engine lanes such as equivalence and PII stay *undecidable*
-   (never fabricated) — the `dbt-patterns` lane still runs.
+   convention). Without compiled SQL the engine lanes fall back to raw and stay
+   *undecidable* (never fabricated) — the `dbt-patterns` lane still runs.
 
 ## What it checks — deterministic rule catalog
 

From 7c7b6dbabe5acb609190a73087aeb1f7f760af99 Mon Sep 17 00:00:00 2001
From: anandgupta42 <anand@altimate.ai>
Date: Tue, 9 Jun 2026 18:23:23 -0700
Subject: [PATCH 08/11] test: harden dbt review parity edge cases

---
 packages/drivers/test/driver-security.test.ts |  55 ++++++++++
 .../src/altimate/review/orchestrate.ts        |  82 +++++++-------
 .../altimate/data-diff-duckdb-e2e.test.ts     |  50 +++++++++
 .../opencode/test/altimate/review.test.ts     | 101 ++++++++++++++++++
 4 files changed, 249 insertions(+), 39 deletions(-)

diff --git a/packages/drivers/test/driver-security.test.ts b/packages/drivers/test/driver-security.test.ts
index a590235ff..1b48965b4 100644
--- a/packages/drivers/test/driver-security.test.ts
+++ b/packages/drivers/test/driver-security.test.ts
@@ -229,6 +229,61 @@ describe("DuckDB driver", () => {
       expect(await connector.execute("SELECT 1")).toMatchObject({ columns: ["ok"], rows: [[1]], row_count: 1 })
       await connector.close()
     })
+
+    test("retries read-only when native open synchronously reports a file lock", async () => {
+      let connectAttempts = 0
+      mock.module("duckdb", () => ({
+        default: {
+          Database: class {
+            constructor(_path: string, optsOrCb: any, cb?: (err: Error | null) => void) {
+              const opts = typeof optsOrCb === "function" ? undefined : optsOrCb
+              const done = openCallback(optsOrCb, cb)
+              connectAttempts++
+              done(connectAttempts === 1 && !opts?.access_mode ? new Error("DUCKDB_LOCKED: file is locked") : null)
+            }
+            connect() {
+              return {
+                all: (_sql: string, cb: (err: Error | null, rows: any[]) => void) => {
+                  cb(null, [{ ok: 1 }])
+                },
+              }
+            }
+            close(cb: any) {
+              if (cb) cb(null)
+            }
+          },
+        },
+      }))
+
+      const { connect } = await import("../src/duckdb")
+      const connector = await connect({ type: "duckdb", path: "/tmp/sync-lock.duckdb" })
+      await connector.connect()
+      expect(connectAttempts).toBe(2)
+      expect(await connector.execute("SELECT 1")).toMatchObject({ columns: ["ok"], rows: [[1]], row_count: 1 })
+      await connector.close()
+    })
+
+    test("propagates synchronous non-lock open errors without connecting undefined db", async () => {
+      mock.module("duckdb", () => ({
+        default: {
+          Database: class {
+            constructor(_path: string, optsOrCb: any, cb?: (err: Error | null) => void) {
+              openCallback(optsOrCb, cb)(new Error("catalog is corrupt"))
+            }
+            connect() {
+              throw new Error("should not connect")
+            }
+            close(cb: any) {
+              if (cb) cb(null)
+            }
+          },
+        },
+      }))
+
+      const { connect } = await import("../src/duckdb")
+      const connector = await connect({ type: "duckdb", path: "/tmp/corrupt.duckdb" })
+      await expect(connector.connect()).rejects.toThrow("catalog is corrupt")
+    })
   })
 })
 
diff --git a/packages/opencode/src/altimate/review/orchestrate.ts b/packages/opencode/src/altimate/review/orchestrate.ts
index e5341faea..aa37efcfb 100644
--- a/packages/opencode/src/altimate/review/orchestrate.ts
+++ b/packages/opencode/src/altimate/review/orchestrate.ts
@@ -916,22 +916,38 @@ async function siblingConsistencyLane(ctxs: ModelContext[], runner: ReviewRunner
  * only NEW PII (a column whose name appears in the added diff), so it never nags
  * about pre-existing PII columns the PR didn't touch.
  */
-async function piiClassifyLane(ctx: ModelContext, runner: ReviewRunner, dialect: string): Promise<Finding[]> {
-  if (!runner.classifyPii) return []
+async function piiClassifyLane(
+  ctx: ModelContext,
+  runner: ReviewRunner,
+  dialect: string,
+): Promise<{ findings: Finding[]; introducedColumns: Set<string>; completed: boolean }> {
+  const empty = { findings: [], introducedColumns: new Set<string>(), completed: false }
+  if (!runner.classifyPii) return empty
   const { file, engineNewSql } = ctx
-  if (!engineNewSql || file.status === "deleted") return []
+  if (!engineNewSql || file.status === "deleted") return empty
   const model = modelNameFromPath(file.path)
-  if (!runner.columnLineage) return []
+  if (!runner.columnLineage) return empty
   // Output columns the change INTRODUCED = head targets − base targets (precise;
   // a pre-existing column merely mentioned in the diff is NOT counted).
-  const headCols = [...new Set((await runner.columnLineage(engineNewSql, dialect)).map((e) => e.target).filter(Boolean))]
-  if (!headCols.length) return []
-  const baseCols = ctx.engineOldSql
-    ? new Set((await runner.columnLineage(ctx.engineOldSql, dialect)).map((e) => e.target.toLowerCase()))
-    : undefined
-  const newCols = baseCols ? headCols.filter((c) => !baseCols.has(c.toLowerCase())) : headCols
-  if (!newCols.length) return []
-  const pii = await runner.classifyPii(newCols)
+  let newCols: string[]
+  try {
+    const headCols = [...new Set((await runner.columnLineage(engineNewSql, dialect)).map((e) => e.target).filter(Boolean))]
+    if (!headCols.length) return { ...empty, completed: true }
+    const baseCols = ctx.engineOldSql
+      ? new Set((await runner.columnLineage(ctx.engineOldSql, dialect)).map((e) => e.target.toLowerCase()))
+      : undefined
+    newCols = baseCols ? headCols.filter((c) => !baseCols.has(c.toLowerCase())) : headCols
+  } catch {
+    return empty
+  }
+  const introducedColumns = new Set(newCols.map((c) => c.toLowerCase()))
+  if (!newCols.length) return { findings: [], introducedColumns, completed: true }
+  let pii
+  try {
+    pii = await runner.classifyPii(newCols)
+  } catch {
+    return empty
+  }
   const out: Finding[] = []
   const highExposureLayer = /(^|\/)(marts|reporting)\//.test(file.path)
   for (const p of pii) {
@@ -959,16 +975,7 @@ async function piiClassifyLane(ctx: ModelContext, runner: ReviewRunner, dialect:
       }),
     )
   }
-  return out
-}
-
-async function introducedOutputColumns(ctx: ModelContext, runner: ReviewRunner, dialect: string): Promise<Set<string> | undefined> {
-  if (!runner.columnLineage || !ctx.engineNewSql || ctx.file.status === "deleted") return undefined
-  const headCols = [...new Set((await runner.columnLineage(ctx.engineNewSql, dialect)).map((e) => e.target).filter(Boolean))]
-  if (!headCols.length) return new Set()
-  if (!ctx.engineOldSql) return new Set(headCols.map((c) => c.toLowerCase()))
-  const baseCols = new Set((await runner.columnLineage(ctx.engineOldSql, dialect)).map((e) => e.target.toLowerCase()))
-  return new Set(headCols.filter((c) => !baseCols.has(c.toLowerCase())).map((c) => c.toLowerCase()))
+  return { findings: out, introducedColumns, completed: true }
 }
 
 function piiLane(file: ChangedFile & { kind: string }, columns: string[]): Finding[] {
@@ -1061,6 +1068,7 @@ export async function runReview(input: OrchestrateInput): Promise<VerdictEnvelop
   const lanes = new Set(input.config.reviewers.length ? input.config.reviewers : TIER_LANES[tier])
 
   const all: Finding[][] = []
+  const diffScopedPiiFiles = new Set<string>()
   for (const ctx of ctxByPath.values()) {
     const tasks: Promise<Finding[]>[] = []
     // Engine lanes consume COMPILED SQL (rendered by dbt) when available.
@@ -1086,20 +1094,19 @@ export async function runReview(input: OrchestrateInput): Promise<VerdictEnvelop
       tasks.push(columnBreakageLane(ctx, input.runner, getCompiled, dialect))
     }
     if (lanes.has("test_coverage")) tasks.push(missingGrainTestLane(ctx, input.runner))
-    const diffScopedPiiCols =
-      input.runner.classifyPii && input.runner.columnLineage
-        ? await introducedOutputColumns(ctx, input.runner, dialect)
-        : undefined
+    // PII classification always runs (cheap name-pattern check, diff-scoped to
+    // newly-introduced columns) — exposing PII is a hard-floor concern that
+    // shouldn't depend on the cost tier enabling the pii_exposure lane. If core
+    // lineage/classification fails, keep the broad deterministic fallback.
+    const diffScopedPii = await piiClassifyLane(ctx, input.runner, dialect)
+    if (diffScopedPii.completed) diffScopedPiiFiles.add(ctx.file.path)
     if (lanes.has("pii_exposure")) {
-      const fallbackPii = diffScopedPiiCols
-        ? ctx.pii.filter((col) => !diffScopedPiiCols.has(col.toLowerCase()))
+      const fallbackPii = diffScopedPii.completed
+        ? ctx.pii.filter((col) => !diffScopedPii.introducedColumns.has(col.toLowerCase()))
         : ctx.pii
       all.push(piiLane(ctx.file, fallbackPii))
     }
-    // PII classification always runs (cheap name-pattern check, diff-scoped to
-    // newly-introduced columns) — exposing PII is a hard-floor concern that
-    // shouldn't depend on the cost tier enabling the pii_exposure lane.
-    tasks.push(piiClassifyLane(ctx, input.runner, dialect))
+    all.push(diffScopedPii.findings)
     // Deterministic dbt anti-pattern detectors run on RAW SQL + diff (need Jinja).
     if (lanes.has("dbt_patterns")) {
       all.push(detectModelPatterns(ctx.file, ctx.newSql, input.rubric))
@@ -1163,12 +1170,9 @@ export async function runReview(input: OrchestrateInput): Promise<VerdictEnvelop
   // regex still covers functions core didn't flag (dialect-suppressed / not in
   // core's set) plus reserved words / types / operators that L033 doesn't do.
   const flat = all.flat()
-  const diffScopedPiiFiles = new Set<string>()
-  if (input.runner.classifyPii && input.runner.columnLineage) {
-    for (const ctx of ctxByPath.values()) {
-      if (ctx.engineNewSql && ctx.file.status !== "deleted") diffScopedPiiFiles.add(ctx.file.path)
-    }
-  }
+  const schemaPiiFiles = new Set(
+    flat.filter((f) => f.category === "pii_exposure" && f.evidence?.tool === "schema.detect_pii").map((f) => f.file),
+  )
   // Per file, the concatenated rule names core's AST lint actually emitted —
   // so we suppress a regex twin only when core genuinely covered that concern.
   const coreRulesByFile = new Map<string, string>()
@@ -1195,7 +1199,7 @@ export async function runReview(input: OrchestrateInput): Promise<VerdictEnvelop
     const fallbackRule = String((f.evidence?.result as any)?.rule ?? "").replace(/_/g, "-")
     if (
       (tool === "dbt-patterns" || tool === "rule-catalog") &&
-      diffScopedPiiFiles.has(f.file) &&
+      (diffScopedPiiFiles.has(f.file) || schemaPiiFiles.has(f.file)) &&
       f.category === "pii_exposure" &&
       (fallbackRule === "pii-into-mart" || fallbackRule === "select-pii-columns")
     ) {
diff --git a/packages/opencode/test/altimate/data-diff-duckdb-e2e.test.ts b/packages/opencode/test/altimate/data-diff-duckdb-e2e.test.ts
index 3da67fec9..dfde505a5 100644
--- a/packages/opencode/test/altimate/data-diff-duckdb-e2e.test.ts
+++ b/packages/opencode/test/altimate/data-diff-duckdb-e2e.test.ts
@@ -41,4 +41,54 @@ describe("data.diff DuckDB e2e", () => {
     expect((result.outcome as any)?.stats?.exclusive_table2).toBe(1)
     expect((result.outcome as any)?.stats?.updated).toBe(1)
   })
+
+  e2eTest("does not report differences for identical DuckDB tables", async () => {
+    Registry.reset()
+    Registry.setConfigs({ duck_e2e: { type: "duckdb", path: ":memory:" } })
+    const conn = await Registry.get("duck_e2e")
+
+    await conn.execute("CREATE TABLE base_orders (order_id INTEGER, amount INTEGER)")
+    await conn.execute("CREATE TABLE head_orders (order_id INTEGER, amount INTEGER)")
+    await conn.execute("INSERT INTO base_orders VALUES (1, 100), (2, 200)")
+    await conn.execute("INSERT INTO head_orders VALUES (1, 100), (2, 200)")
+
+    const result = await runDataDiff({
+      source: "base_orders",
+      target: "head_orders",
+      key_columns: ["order_id"],
+      extra_columns: ["amount"],
+      source_warehouse: "duck_e2e",
+      target_warehouse: "duck_e2e",
+      algorithm: "joindiff",
+    })
+
+    expect(result.success).toBe(true)
+    expect((result.outcome as any)?.stats?.exclusive_table1).toBe(0)
+    expect((result.outcome as any)?.stats?.exclusive_table2).toBe(0)
+    expect((result.outcome as any)?.stats?.updated).toBe(0)
+  })
+
+  e2eTest("auto-discovers comparable columns when extra_columns is omitted", async () => {
+    Registry.reset()
+    Registry.setConfigs({ duck_e2e: { type: "duckdb", path: ":memory:" } })
+    const conn = await Registry.get("duck_e2e")
+
+    await conn.execute("CREATE TABLE base_orders (order_id INTEGER, amount INTEGER, updated_at TIMESTAMP)")
+    await conn.execute("CREATE TABLE head_orders (order_id INTEGER, amount INTEGER, updated_at TIMESTAMP)")
+    await conn.execute("INSERT INTO base_orders VALUES (1, 100, TIMESTAMP '2026-01-01 00:00:00'), (2, 200, TIMESTAMP '2026-01-01 00:00:00')")
+    await conn.execute("INSERT INTO head_orders VALUES (1, 100, TIMESTAMP '2026-01-02 00:00:00'), (2, 250, TIMESTAMP '2026-01-02 00:00:00')")
+
+    const result = await runDataDiff({
+      source: "base_orders",
+      target: "head_orders",
+      key_columns: ["order_id"],
+      source_warehouse: "duck_e2e",
+      target_warehouse: "duck_e2e",
+      algorithm: "joindiff",
+    })
+
+    expect(result.success).toBe(true)
+    expect((result.outcome as any)?.stats?.updated).toBe(1)
+    expect(result.excluded_audit_columns).toContain("updated_at")
+  })
 })
diff --git a/packages/opencode/test/altimate/review.test.ts b/packages/opencode/test/altimate/review.test.ts
index a6c06fd1c..88363b6a7 100644
--- a/packages/opencode/test/altimate/review.test.ts
+++ b/packages/opencode/test/altimate/review.test.ts
@@ -1410,6 +1410,107 @@ describe("orchestrate", () => {
     expect(env.verdict).toBe("REQUEST_CHANGES")
   })
 
+  test("PII fallback survives core lineage failure instead of aborting review", async () => {
+    const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+    email\n" }]
+    const runner: ReviewRunner = {
+      ...fakeRunner({}),
+      async detectPii() {
+        return { columns: ["email"] }
+      },
+      async columnLineage() {
+        throw new Error("lineage parser crashed")
+      },
+      async classifyPii(columns) {
+        return columns.map((column) => ({
+          column,
+          classification: "Email",
+          confidence: 0.95,
+        }))
+      },
+    }
+    const env = await runReview({
+      changedFiles: files,
+      config: { ...DEFAULT_REVIEW_CONFIG },
+      rubric: DEFAULT_RUBRIC,
+      mode: "gate",
+      runner,
+      getContent: content("select customer_id, email from customers", "select customer_id from customers"),
+    })
+    const pii = env.findings.filter((f) => f.category === "pii_exposure")
+    expect(pii).toHaveLength(1)
+    expect(pii[0].evidence?.tool).toBe("schema.detect_pii")
+    expect(env.verdict).toBe("REQUEST_CHANGES")
+  })
+
+  test("PII fallback survives core classification failure instead of hiding risk", async () => {
+    const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+    email\n" }]
+    const runner: ReviewRunner = {
+      ...fakeRunner({}),
+      async detectPii() {
+        return { columns: ["email"] }
+      },
+      async columnLineage(sql) {
+        return sql.includes("email")
+          ? [
+              { source: "customers.customer_id", target: "customer_id" },
+              { source: "customers.email", target: "email" },
+            ]
+          : [{ source: "customers.customer_id", target: "customer_id" }]
+      },
+      async classifyPii() {
+        throw new Error("classifier unavailable")
+      },
+    }
+    const env = await runReview({
+      changedFiles: files,
+      config: { ...DEFAULT_REVIEW_CONFIG },
+      rubric: DEFAULT_RUBRIC,
+      mode: "gate",
+      runner,
+      getContent: content("select customer_id, email from customers", "select customer_id from customers"),
+    })
+    const pii = env.findings.filter((f) => f.category === "pii_exposure")
+    expect(pii).toHaveLength(1)
+    expect(pii[0].evidence?.tool).toBe("schema.detect_pii")
+    expect(env.verdict).toBe("REQUEST_CHANGES")
+  })
+
+  test("case-variant PII detector output does not duplicate core diff-scoped finding", async () => {
+    const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+    Email as EMAIL\n" }]
+    const runner: ReviewRunner = {
+      ...fakeRunner({}),
+      async detectPii() {
+        return { columns: ["EMAIL", "email"] }
+      },
+      async columnLineage(sql) {
+        return sql.toLowerCase().includes("email")
+          ? [
+              { source: "customers.customer_id", target: "customer_id" },
+              { source: "customers.email", target: "EMAIL" },
+            ]
+          : [{ source: "customers.customer_id", target: "customer_id" }]
+      },
+      async classifyPii(columns) {
+        return columns.map((column) => ({
+          column,
+          classification: "Email",
+          confidence: 0.95,
+        }))
+      },
+    }
+    const env = await runReview({
+      changedFiles: files,
+      config: { ...DEFAULT_REVIEW_CONFIG },
+      rubric: DEFAULT_RUBRIC,
+      mode: "gate",
+      runner,
+      getContent: content("select customer_id, Email as EMAIL from customers", "select customer_id from customers"),
+    })
+    const pii = env.findings.filter((f) => f.category === "pii_exposure")
+    expect(pii).toHaveLength(1)
+    expect(pii[0].evidence?.tool).toBe("altimate_core.classify_pii")
+  })
+
   test("low-confidence name PII does not surface or fall back to regex PII comments", async () => {
     const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+    first_name\n" }]
     const runner: ReviewRunner = {

From 85f103f7b15d05c738149cc0db9ce3390e947262 Mon Sep 17 00:00:00 2001
From: anandgupta42 <anand@altimate.ai>
Date: Tue, 9 Jun 2026 20:40:40 -0700
Subject: [PATCH 09/11] fix: address consensus review findings on dbt PR
 reviewer (PR #919)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fixes raised by the multi-model review of #919, prioritizing the PII
false-negative that 5 models flagged.

- PII recall (MAJOR): the diff-scoped PII dedup keyed off whether the
  classifier *ran* and off ALL introduced columns, so a high-risk column
  core's classifier MISSED (e.g. `ssn`) was dropped from the broad
  `schema.detect_pii` fallback AND had its regex twin suppressed — silently.
  Now dedup against the columns core actually CLASSIFIED (`classifiedColumns`),
  and make the regex-twin suppression column-aware (`pii_into_mart` via its
  matched `result.line`; `select-pii-columns` only when a precise detector
  actually emitted a finding). A column core never classified now stays
  eligible for the broad detector and the regex net — including on
  trivial/lite tiers that don't run the `pii_exposure` lane. Low-confidence
  name/address suppression is preserved (core *considered* those columns).
- Structural escalation (MAJOR): only `join_risk` (SC010 join-key regression)
  escalates a core `error` to critical/blocking. Other structural rules core
  marks `error` stay advisory (warning), so the lane never over-blocks on a
  rule it wasn't designed to gate.
- DuckDB (MINOR): arm the 2s fallback timer BEFORE replaying a synchronous
  open callback so a sync resolve/reject can clear it (no dangling timer
  delaying process exit); release the half-open handle on open error.
- Concurrency (MINOR): run the diff-scoped PII lane as a concurrent task
  instead of an inline `await`, so its native lineage/classify calls overlap
  the other per-file engine lanes.
- Polish: align `ReviewRunner.equivalence` dialect param to optional (matches
  the impl); named constants for the PII confidence thresholds; document that
  PII exposure severity intentionally escalates only for `marts/reporting`;
  comment the `--no-ai`/`--ai=false` dual check.

Adds regression tests: core-missed column surfaces via broad fallback;
core-missed high-risk PII still caught by the regex twin when the broad PII
lane is off; non-join `error` structural rule stays advisory; sub-error
join-key regression stays a warning. Existing 76 review + 26 driver tests
still pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 packages/drivers/src/duckdb.ts                |  21 ++-
 .../src/altimate/review/orchestrate.ts        | 112 +++++++++++---
 packages/opencode/src/cli/cmd/review.ts       |   3 +
 .../opencode/test/altimate/review.test.ts     | 137 ++++++++++++++++++
 4 files changed, 249 insertions(+), 24 deletions(-)

diff --git a/packages/drivers/src/duckdb.ts b/packages/drivers/src/duckdb.ts
index 1aa2fdba7..867840d0a 100644
--- a/packages/drivers/src/duckdb.ts
+++ b/packages/drivers/src/duckdb.ts
@@ -57,20 +57,33 @@ export async function connect(config: ConnectionConfig): Promise<Connector> {
           let resolved = false
           let timeout: ReturnType<typeof setTimeout> | undefined
           let instance: any
+          // Sentinel for an open callback that fired synchronously (before
+          // `instance` was assigned): `undefined` = not yet fired, `null` =
+          // fired with success, `Error` = fired with failure. Replayed once
+          // `instance` exists.
           let pendingOpen: Error | null | undefined
           const opts = accessMode ? { access_mode: accessMode } : undefined
+          const closeQuietly = () => {
+            try {
+              if (instance && typeof instance.close === "function") instance.close()
+            } catch {
+              // best-effort cleanup of a half-open handle
+            }
+          }
           const onOpen = (err: Error | null) => {
             if (!instance) {
               pendingOpen = err
               return
             }
             if (resolved) {
-              if (instance && typeof instance.close === "function") instance.close()
+              closeQuietly()
               return
             }
             resolved = true
             if (timeout) clearTimeout(timeout)
             if (err) {
+              // Open failed — release the half-open handle so it doesn't leak.
+              closeQuietly()
               const msg = err.message || String(err)
               if (msg.toLowerCase().includes("locked") || msg.includes("SQLITE_BUSY") || msg.includes("DUCKDB_LOCKED")) {
                 reject(new Error("DUCKDB_LOCKED"))
@@ -84,14 +97,16 @@ export async function connect(config: ConnectionConfig): Promise<Connector> {
           instance = opts
             ? new duckdb.Database(dbPath, opts, onOpen)
             : new duckdb.Database(dbPath, onOpen)
-          if (pendingOpen !== undefined) onOpen(pendingOpen)
-          // Bun: native callback may not fire; fall back after 2s
+          // Bun: native callback may not fire; fall back after 2s. Arm the timer
+          // BEFORE replaying a synchronous callback so a sync resolve/reject can
+          // actually clear it (otherwise it lingers ~2s and delays process exit).
           timeout = setTimeout(() => {
             if (!resolved) {
               resolved = true
               reject(new Error(`Timed out opening DuckDB database "${dbPath}"`))
             }
           }, 2000)
+          if (pendingOpen !== undefined) onOpen(pendingOpen)
         })
 
       try {
diff --git a/packages/opencode/src/altimate/review/orchestrate.ts b/packages/opencode/src/altimate/review/orchestrate.ts
index aa37efcfb..ffa0c5615 100644
--- a/packages/opencode/src/altimate/review/orchestrate.ts
+++ b/packages/opencode/src/altimate/review/orchestrate.ts
@@ -117,7 +117,7 @@ export interface ReviewRunner {
   impact(model: string): Promise<ImpactResult>
   grade(sql: string, dialect: string): Promise<GradeResult>
   check(sql: string, dialect: string, baseSql?: string): Promise<CheckResult>
-  equivalence(oldSql: string, newSql: string, dialect: string): Promise<EquivalenceResult>
+  equivalence(oldSql: string, newSql: string, dialect?: string): Promise<EquivalenceResult>
   detectPii(sql: string, dialect: string): Promise<{ columns: string[] }>
   /**
    * Lexical scan (reserved-word aliases + dialect operators) — the curated lists
@@ -816,6 +816,20 @@ const STRUCTURAL_CATEGORY: Record<string, ReviewCategory> = {
   join_key_regression: "join_risk",
 }
 
+// PII classification thresholds for the diff-scoped lane. Names/addresses are
+// common, lower-sensitivity PII that the broad name-pattern detector over-flags,
+// so the precise lane requires higher confidence before surfacing them. Kept a
+// named constant (not a magic number) and deliberately narrow: broadening the
+// low-risk set to variants would suppress MORE real columns, hurting recall.
+const PII_LOW_RISK_CLASS = /^(name|address)$/i
+const PII_LOW_RISK_MIN_CONFIDENCE = 0.9
+const PII_DEFAULT_MIN_CONFIDENCE = 0.7
+// Column-name tokens the regex PII twins (`pii_into_mart`) key off — used to
+// recover which column a coarse regex finding flagged so suppression can be
+// column-aware (suppress only when core actually classified THAT column).
+const PII_TOKEN_RE =
+  /\b(email|ssn|social_security|phone_number|first_name|last_name|full_name|street_address|date_of_birth|dob|passport|credit_card)\b/i
+
 async function structuralChangeLane(ctx: ModelContext, runner: ReviewRunner): Promise<Finding[]> {
   if (!runner.structuralDiff) return []
   // Use the ENGINE (compiled) SQL — structural_diff parses with a real SQL parser,
@@ -829,7 +843,11 @@ async function structuralChangeLane(ctx: ModelContext, runner: ReviewRunner): Pr
   const out: Finding[] = []
   for (const f of raw) {
     const cat: ReviewCategory = STRUCTURAL_CATEGORY[f.rule] ?? "semantic_change"
-    const sev = clampSeverity(cat, f.severity === "error" ? "critical" : "warning", "high")
+    // Only `join_risk` (SC010 join-key regression) is allowed to block: a changed
+    // join key silently corrupts row sets. Other structural rules core marks
+    // `error` stay advisory (warning) so this lane never over-blocks on a rule
+    // it wasn't designed to gate.
+    const sev = clampSeverity(cat, cat === "join_risk" && f.severity === "error" ? "critical" : "warning", "high")
     out.push(
       makeFinding({
         severity: sev,
@@ -920,8 +938,8 @@ async function piiClassifyLane(
   ctx: ModelContext,
   runner: ReviewRunner,
   dialect: string,
-): Promise<{ findings: Finding[]; introducedColumns: Set<string>; completed: boolean }> {
-  const empty = { findings: [], introducedColumns: new Set<string>(), completed: false }
+): Promise<{ findings: Finding[]; classifiedColumns: Set<string>; completed: boolean }> {
+  const empty = { findings: [], classifiedColumns: new Set<string>(), completed: false }
   if (!runner.classifyPii) return empty
   const { file, engineNewSql } = ctx
   if (!engineNewSql || file.status === "deleted") return empty
@@ -940,20 +958,29 @@ async function piiClassifyLane(
   } catch {
     return empty
   }
-  const introducedColumns = new Set(newCols.map((c) => c.toLowerCase()))
-  if (!newCols.length) return { findings: [], introducedColumns, completed: true }
+  if (!newCols.length) return { findings: [], classifiedColumns: new Set<string>(), completed: true }
   let pii
   try {
     pii = await runner.classifyPii(newCols)
   } catch {
     return empty
   }
+  // Columns core's classifier actually CONSIDERED (returned a verdict for), even
+  // if below threshold. The broad fallback and the regex twins dedup against THIS
+  // set — not all introduced columns — so a column core simply MISSED (never
+  // classified) stays eligible for the broad name-pattern detector instead of
+  // being silently dropped.
+  const classifiedColumns = new Set(pii.map((p) => String(p.column ?? "").toLowerCase()).filter(Boolean))
   const out: Finding[] = []
+  // "Exposure" severity only escalates for published/broadly-read layers
+  // (marts/reporting). This is intentionally narrower than where grain tests
+  // apply (`missingGrainTestLane` also includes `intermediate`): an internal
+  // intermediate model surfacing PII is a warning, not a hard block.
   const highExposureLayer = /(^|\/)(marts|reporting)\//.test(file.path)
   for (const p of pii) {
     const classification = String(p.classification ?? "")
-    const lowRiskClass = /^(name|address)$/i.test(classification)
-    const minConfidence = lowRiskClass ? 0.9 : 0.7
+    const lowRiskClass = PII_LOW_RISK_CLASS.test(classification)
+    const minConfidence = lowRiskClass ? PII_LOW_RISK_MIN_CONFIDENCE : PII_DEFAULT_MIN_CONFIDENCE
     if (p.confidence < minConfidence || !p.column) continue
     const col = p.column.toLowerCase()
     const highConfidence = p.confidence >= 0.9
@@ -975,7 +1002,7 @@ async function piiClassifyLane(
       }),
     )
   }
-  return { findings: out, introducedColumns, completed: true }
+  return { findings: out, classifiedColumns, completed: true }
 }
 
 function piiLane(file: ChangedFile & { kind: string }, columns: string[]): Finding[] {
@@ -1068,7 +1095,13 @@ export async function runReview(input: OrchestrateInput): Promise<VerdictEnvelop
   const lanes = new Set(input.config.reviewers.length ? input.config.reviewers : TIER_LANES[tier])
 
   const all: Finding[][] = []
-  const diffScopedPiiFiles = new Set<string>()
+  // Files where the diff-scoped PII classifier completed (looked at the change),
+  // those where it actually emitted a finding, and — per file — the lowercased
+  // columns it classified. These drive column-aware dedup of the coarse regex
+  // PII twins in the merge step below.
+  const diffScopedPiiCompletedFiles = new Set<string>()
+  const diffScopedPiiFindingFiles = new Set<string>()
+  const classifiedPiiColumnsByFile = new Map<string, Set<string>>()
   for (const ctx of ctxByPath.values()) {
     const tasks: Promise<Finding[]>[] = []
     // Engine lanes consume COMPILED SQL (rendered by dbt) when available.
@@ -1098,15 +1131,31 @@ export async function runReview(input: OrchestrateInput): Promise<VerdictEnvelop
     // newly-introduced columns) — exposing PII is a hard-floor concern that
     // shouldn't depend on the cost tier enabling the pii_exposure lane. If core
     // lineage/classification fails, keep the broad deterministic fallback.
-    const diffScopedPii = await piiClassifyLane(ctx, input.runner, dialect)
-    if (diffScopedPii.completed) diffScopedPiiFiles.add(ctx.file.path)
-    if (lanes.has("pii_exposure")) {
-      const fallbackPii = diffScopedPii.completed
-        ? ctx.pii.filter((col) => !diffScopedPii.introducedColumns.has(col.toLowerCase()))
-        : ctx.pii
-      all.push(piiLane(ctx.file, fallbackPii))
-    }
-    all.push(diffScopedPii.findings)
+    // Run it as a concurrent task (not an inline await) so the heavy native
+    // lineage/classify calls overlap the other engine lanes for this file.
+    const currentCtx = ctx
+    tasks.push(
+      (async () => {
+        const diffScopedPii = await piiClassifyLane(currentCtx, input.runner, dialect)
+        if (diffScopedPii.completed) {
+          diffScopedPiiCompletedFiles.add(currentCtx.file.path)
+          classifiedPiiColumnsByFile.set(currentCtx.file.path, diffScopedPii.classifiedColumns)
+        }
+        if (diffScopedPii.findings.length) diffScopedPiiFindingFiles.add(currentCtx.file.path)
+        const findings = [...diffScopedPii.findings]
+        if (lanes.has("pii_exposure")) {
+          // The broad fallback covers output PII columns the precise lane did NOT
+          // classify (pre-existing columns, or ones core's classifier missed) —
+          // dedup against the columns core actually CONSIDERED, never all
+          // introduced columns, so a missed column still surfaces here.
+          const fallbackPii = diffScopedPii.completed
+            ? currentCtx.pii.filter((col) => !diffScopedPii.classifiedColumns.has(col.toLowerCase()))
+            : currentCtx.pii
+          findings.push(...piiLane(currentCtx.file, fallbackPii))
+        }
+        return findings
+      })(),
+    )
     // Deterministic dbt anti-pattern detectors run on RAW SQL + diff (need Jinja).
     if (lanes.has("dbt_patterns")) {
       all.push(detectModelPatterns(ctx.file, ctx.newSql, input.rubric))
@@ -1197,13 +1246,34 @@ export async function runReview(input: OrchestrateInput): Promise<VerdictEnvelop
   const merged = flat.filter((f) => {
     const tool = f.evidence?.tool
     const fallbackRule = String((f.evidence?.result as any)?.rule ?? "").replace(/_/g, "-")
+    // Dedup the coarse regex PII twins against the precise deterministic PII
+    // detectors — but only when a better detector genuinely COVERS the same
+    // column. Suppressing whenever the classifier merely "ran" hid real PII
+    // (a high-risk column core's classifier missed but the regex would catch,
+    // especially on tiers without the broad `pii_exposure` lane).
     if (
       (tool === "dbt-patterns" || tool === "rule-catalog") &&
-      (diffScopedPiiFiles.has(f.file) || schemaPiiFiles.has(f.file)) &&
       f.category === "pii_exposure" &&
       (fallbackRule === "pii-into-mart" || fallbackRule === "select-pii-columns")
     ) {
-      return false
+      // The broad `schema.detect_pii` lane already flagged this file → redundant.
+      if (schemaPiiFiles.has(f.file)) return false
+      if (fallbackRule === "pii-into-mart") {
+        // `pii_into_mart` carries the matched column in `result.line`: suppress
+        // only when core's classifier actually CONSIDERED that column (flagged or
+        // deliberately muted). A column core never classified falls through.
+        const token = PII_TOKEN_RE.exec(String((f.evidence?.result as any)?.line ?? ""))?.[1]?.toLowerCase()
+        if (token && classifiedPiiColumnsByFile.get(f.file)?.has(token)) return false
+        // No recoverable column: defer to the prior file-level behavior so the
+        // twin isn't double-reported when core ran for the file.
+        if (!token && diffScopedPiiCompletedFiles.has(f.file)) return false
+        return true
+      }
+      // `select-pii-columns` doesn't preserve its column. Suppress only when a
+      // precise detector actually EMITTED a PII finding for the file (so the
+      // exposure is reported by the better source, not silently dropped).
+      if (diffScopedPiiFindingFiles.has(f.file)) return false
+      return true
     }
     if (!coreRanFiles.has(f.file)) return true
     if (tool !== "dbt-patterns" && tool !== "rule-catalog") return true
diff --git a/packages/opencode/src/cli/cmd/review.ts b/packages/opencode/src/cli/cmd/review.ts
index 7b594025c..ce766cd2e 100644
--- a/packages/opencode/src/cli/cmd/review.ts
+++ b/packages/opencode/src/cli/cmd/review.ts
@@ -58,6 +58,9 @@ export const ReviewCommand = cmd({
         manifestPath: args.manifest as string | undefined,
         mode: args.mode as ReviewMode | undefined,
         severityThreshold: args.severity as Severity | undefined,
+        // The flag is registered as `--no-ai`, so yargs sets `args.noAi`. We also
+        // accept `args.ai === false` to cover yargs' boolean auto-negation
+        // (`--ai=false`) and programmatic callers that pass `ai: false`.
         noAi: args.noAi === true || args.ai === false,
       })
 
diff --git a/packages/opencode/test/altimate/review.test.ts b/packages/opencode/test/altimate/review.test.ts
index 88363b6a7..251c4f469 100644
--- a/packages/opencode/test/altimate/review.test.ts
+++ b/packages/opencode/test/altimate/review.test.ts
@@ -1545,6 +1545,143 @@ describe("orchestrate", () => {
     expect(env.findings.some((f) => f.category === "pii_exposure")).toBe(false)
   })
 
+  test("diff-scoped PII recall: a column core's classifier MISSED still surfaces via the broad fallback", async () => {
+    // Regression guard: when core `classify_pii` returns NOTHING for an introduced
+    // column (it missed it), that column must remain eligible for the broad
+    // `schema.detect_pii` detector — it must not be silently dropped just because
+    // the diff-scoped lane ran for the file.
+    const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+    ssn\n" }]
+    const runner: ReviewRunner = {
+      ...fakeRunner({}),
+      async detectPii() {
+        return { columns: ["ssn"] }
+      },
+      async columnLineage(sql) {
+        return sql.includes("ssn")
+          ? [
+              { source: "customers.customer_id", target: "customer_id" },
+              { source: "customers.ssn", target: "ssn" },
+            ]
+          : [{ source: "customers.customer_id", target: "customer_id" }]
+      },
+      async classifyPii() {
+        return []
+      },
+    }
+    const env = await runReview({
+      changedFiles: files,
+      config: { ...DEFAULT_REVIEW_CONFIG, reviewers: ["pii_exposure"] },
+      rubric: DEFAULT_RUBRIC,
+      mode: "gate",
+      runner,
+      getContent: content("select customer_id, ssn from customers", "select customer_id from customers"),
+    })
+    const pii = env.findings.filter((f) => f.category === "pii_exposure")
+    expect(pii).toHaveLength(1)
+    expect(pii[0].evidence?.tool).toBe("schema.detect_pii")
+    expect(env.verdict).toBe("REQUEST_CHANGES")
+  })
+
+  test("core-missed high-risk PII still caught by the regex twin when the broad PII lane is off", async () => {
+    // Lower tiers (trivial/lite) run `dbt_patterns` but not `pii_exposure`. If core
+    // classification ran but missed a high-risk column, the regex twin is the only
+    // safety net — it must NOT be suppressed for a column core never classified.
+    const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+    ssn\n" }]
+    const runner: ReviewRunner = {
+      ...fakeRunner({}),
+      async detectPii() {
+        return { columns: ["ssn"] }
+      },
+      async columnLineage(sql) {
+        return sql.includes("ssn")
+          ? [
+              { source: "customers.customer_id", target: "customer_id" },
+              { source: "customers.ssn", target: "ssn" },
+            ]
+          : [{ source: "customers.customer_id", target: "customer_id" }]
+      },
+      async classifyPii() {
+        return []
+      },
+    }
+    const env = await runReview({
+      changedFiles: files,
+      config: { ...DEFAULT_REVIEW_CONFIG, reviewers: ["dbt_patterns"] },
+      rubric: DEFAULT_RUBRIC,
+      mode: "gate",
+      runner,
+      getContent: content("select customer_id, ssn from customers", "select customer_id from customers"),
+    })
+    const pii = env.findings.filter((f) => f.category === "pii_exposure")
+    expect(
+      pii.some((f) => f.evidence?.tool === "dbt-patterns" && (f.evidence?.result as any)?.rule === "pii_into_mart"),
+    ).toBe(true)
+    expect(env.verdict).toBe("REQUEST_CHANGES")
+  })
+
+  test("structural error severity blocks only for join_risk; other rules stay advisory", async () => {
+    // SC003 group_by_change is `semantic_change`, not `join_risk`. Even at core
+    // `severity: "error"` it must NOT be escalated to critical — only join-key
+    // regressions are allowed to hard-block.
+    const oldSql = "select g, sum(x) as x from t group by g"
+    const newSql = "select g, h, sum(x) as x from t group by g, h"
+    const runner: ReviewRunner = {
+      ...fakeRunner({}),
+      async structuralDiff() {
+        return [{ code: "SC003", rule: "group_by_change", severity: "error", message: "grain changed" }]
+      },
+    }
+    const env = await runReview({
+      changedFiles: [{ path: "models/marts/fct_orders.sql", status: "modified", diff: "+group by g, h\n" }],
+      config: { ...DEFAULT_REVIEW_CONFIG, reviewers: ["semantic_change"], ai: false },
+      rubric: DEFAULT_RUBRIC,
+      mode: "gate",
+      runner,
+      getContent: async (_f, side) => (side === "new" ? newSql : oldSql),
+      getCompiled: async (_f, side) => (side === "new" ? newSql : oldSql),
+      generatedAt: "2026-06-08T00:00:00Z",
+    })
+    const f = env.findings.find(
+      (x) => x.evidence?.tool === "altimate_core.structural_diff" && (x.evidence?.result as any)?.code === "SC003",
+    )
+    expect(f).toBeDefined()
+    expect(f!.severity).toBe("warning")
+    expect(f!.category).toBe("semantic_change")
+  })
+
+  test("join_key_regression below error severity stays advisory (warning, not critical)", async () => {
+    const oldSql = "select * from orders left join customers on orders.customer_id = customers.customer_id"
+    const newSql = "select * from orders left join customers on orders.order_id = customers.customer_id"
+    const runner: ReviewRunner = {
+      ...fakeRunner({}),
+      async structuralDiff() {
+        return [{ code: "SC010", rule: "join_key_regression", severity: "warning", message: "join key changed" }]
+      },
+    }
+    const env = await runReview({
+      changedFiles: [
+        {
+          path: "models/marts/fct_customer_orders.sql",
+          status: "modified",
+          diff: "+on orders.order_id = customers.customer_id\n",
+        },
+      ],
+      config: { ...DEFAULT_REVIEW_CONFIG, reviewers: ["semantic_change"], ai: false },
+      rubric: DEFAULT_RUBRIC,
+      mode: "gate",
+      runner,
+      getContent: async (_f, side) => (side === "new" ? newSql : oldSql),
+      getCompiled: async (_f, side) => (side === "new" ? newSql : oldSql),
+      generatedAt: "2026-06-08T00:00:00Z",
+    })
+    const f = env.findings.find(
+      (x) => x.evidence?.tool === "altimate_core.structural_diff" && (x.evidence?.result as any)?.code === "SC010",
+    )
+    expect(f).toBeDefined()
+    expect(f!.severity).toBe("warning")
+    expect(f!.category).toBe("join_risk")
+  })
+
   test("clean change with no manifest → degraded, APPROVE/COMMENT, lint-only labeled", async () => {
     const files: ChangedFile[] = [{ path: "models/staging/stg_x.sql", status: "modified", diff: "+select 1\n" }]
     const runner: ReviewRunner = {

From 842ebf12cc90fe6e288ea01850744823b175807f Mon Sep 17 00:00:00 2001
From: anandgupta42 <anand@altimate.ai>
Date: Wed, 10 Jun 2026 00:39:37 -0700
Subject: [PATCH 10/11] fix: address PR #919 review comments on dbt PR reviewer
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Resolve the three unresolved bot-review threads on PR #919:

- `orchestrate.ts` (`piiClassifyLane`): when `columnLineage` resolves but
  derives zero output targets, return `empty` (`completed: false`) instead of
  `completed: true`. Zero head targets is lineage *uncertainty* (the classifier
  considered nothing), not a confident "no new PII", so the file must not be
  marked PII-complete — keeping the deterministic regex fallback alive. Distinct
  from the `!newCols.length` case, where lineage resolved columns but none are
  newly introduced (genuinely done). Defensive/intent fix: behavior-neutral for
  the current detectors (the tokenless suppression path is unreachable because
  `pii_into_mart` always carries a recoverable column token), but it makes the
  `completed` contract honest and removes a latent footgun.

- `review.ts`: correct the misleading `noAi` comment. Only `--no-ai` is
  registered, so `--ai=false` is not a supported CLI flag; `args.ai === false`
  only covers programmatic callers.

- `driver-security.test.ts`: assert the DuckDB lock retry actually requests
  `READ_ONLY` (record `access_mode` per attempt), not just that two attempts
  occurred.

Tests: add two `review.test.ts` regressions — a zero-target-lineage guard and a
token-gated suppression guard (core muted a low-confidence column → the coarse
regex twin is suppressed). Review suite 82 pass, driver suite 26 pass.

Reviewed via multi-model consensus (Claude + 7 models); the lone objection was
test completeness, now incorporated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 packages/drivers/test/driver-security.test.ts |  6 ++
 .../src/altimate/review/orchestrate.ts        |  7 +-
 packages/opencode/src/cli/cmd/review.ts       |  7 +-
 .../opencode/test/altimate/review.test.ts     | 78 +++++++++++++++++++
 4 files changed, 94 insertions(+), 4 deletions(-)

diff --git a/packages/drivers/test/driver-security.test.ts b/packages/drivers/test/driver-security.test.ts
index 1b48965b4..6a0f8d8c5 100644
--- a/packages/drivers/test/driver-security.test.ts
+++ b/packages/drivers/test/driver-security.test.ts
@@ -136,12 +136,14 @@ describe("DuckDB driver", () => {
   describe("connect retry with READ_ONLY", () => {
     test("retries with READ_ONLY when file DB is locked on initial connect", async () => {
       let connectAttempts = 0
+      const accessModes: Array<string | undefined> = []
       mock.module("duckdb", () => ({
         default: {
           Database: class {
             constructor(_path: string, optsOrCb: any, cb?: (err: Error | null) => void) {
               const opts = typeof optsOrCb === "function" ? undefined : optsOrCb
               const done = openCallback(optsOrCb, cb)
+              accessModes.push(opts?.access_mode)
               connectAttempts++
               if (connectAttempts === 1 && !opts?.access_mode) {
                 // First attempt fails with lock error
@@ -169,6 +171,10 @@ describe("DuckDB driver", () => {
       const connector = await connect({ type: "duckdb", path: "/tmp/test.duckdb" })
       await connector.connect()
       expect(connectAttempts).toBe(2) // First failed, second succeeded in READ_ONLY
+      // The retry must specifically request READ_ONLY — two attempts alone don't
+      // prove the lock was worked around correctly.
+      expect(accessModes[0]).toBeUndefined()
+      expect(accessModes[1]).toBe("READ_ONLY")
 
       await connector.close()
     })
diff --git a/packages/opencode/src/altimate/review/orchestrate.ts b/packages/opencode/src/altimate/review/orchestrate.ts
index ffa0c5615..68cefafde 100644
--- a/packages/opencode/src/altimate/review/orchestrate.ts
+++ b/packages/opencode/src/altimate/review/orchestrate.ts
@@ -950,7 +950,12 @@ async function piiClassifyLane(
   let newCols: string[]
   try {
     const headCols = [...new Set((await runner.columnLineage(engineNewSql, dialect)).map((e) => e.target).filter(Boolean))]
-    if (!headCols.length) return { ...empty, completed: true }
+    // Zero head targets means lineage could NOT resolve this model's output
+    // columns — the classifier considered nothing, so this is uncertainty, not a
+    // confident "no new PII". Returning `completed: false` keeps the deterministic
+    // regex fallback alive (vs. the `!newCols.length` case below, where lineage
+    // DID resolve columns but none are newly introduced — that is genuinely done).
+    if (!headCols.length) return empty
     const baseCols = ctx.engineOldSql
       ? new Set((await runner.columnLineage(ctx.engineOldSql, dialect)).map((e) => e.target.toLowerCase()))
       : undefined
diff --git a/packages/opencode/src/cli/cmd/review.ts b/packages/opencode/src/cli/cmd/review.ts
index ce766cd2e..c271dbe5f 100644
--- a/packages/opencode/src/cli/cmd/review.ts
+++ b/packages/opencode/src/cli/cmd/review.ts
@@ -58,9 +58,10 @@ export const ReviewCommand = cmd({
         manifestPath: args.manifest as string | undefined,
         mode: args.mode as ReviewMode | undefined,
         severityThreshold: args.severity as Severity | undefined,
-        // The flag is registered as `--no-ai`, so yargs sets `args.noAi`. We also
-        // accept `args.ai === false` to cover yargs' boolean auto-negation
-        // (`--ai=false`) and programmatic callers that pass `ai: false`.
+        // The flag is registered as `--no-ai`, so yargs sets `args.noAi`. No `ai`
+        // option is declared, so `--ai=false` is NOT a supported CLI flag; the
+        // `args.ai === false` check only covers programmatic callers that pass
+        // `ai: false` directly.
         noAi: args.noAi === true || args.ai === false,
       })
 
diff --git a/packages/opencode/test/altimate/review.test.ts b/packages/opencode/test/altimate/review.test.ts
index 251c4f469..0e94f0ff4 100644
--- a/packages/opencode/test/altimate/review.test.ts
+++ b/packages/opencode/test/altimate/review.test.ts
@@ -1619,6 +1619,84 @@ describe("orchestrate", () => {
     expect(env.verdict).toBe("REQUEST_CHANGES")
   })
 
+  test("zero-target lineage does not mark the file PII-complete, so the regex twin survives", async () => {
+    // Regression guard: when `columnLineage` RESOLVES but yields no output targets
+    // (the parser couldn't derive columns — uncertainty, not "no new PII"), the
+    // diff-scoped lane must not flag the file as completed. Otherwise the file
+    // enters the PII-completed/classified sets and a deterministic regex twin could
+    // be suppressed for a column core never actually classified.
+    const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+    ssn\n" }]
+    const runner: ReviewRunner = {
+      ...fakeRunner({}),
+      async detectPii() {
+        return { columns: ["ssn"] }
+      },
+      async columnLineage() {
+        // Lineage resolves but derives zero output columns for every input.
+        return []
+      },
+      async classifyPii() {
+        return []
+      },
+    }
+    const env = await runReview({
+      changedFiles: files,
+      config: { ...DEFAULT_REVIEW_CONFIG, reviewers: ["dbt_patterns"] },
+      rubric: DEFAULT_RUBRIC,
+      mode: "gate",
+      runner,
+      getContent: content("select customer_id, ssn from customers", "select customer_id from customers"),
+    })
+    const pii = env.findings.filter((f) => f.category === "pii_exposure")
+    expect(
+      pii.some((f) => f.evidence?.tool === "dbt-patterns" && (f.evidence?.result as any)?.rule === "pii_into_mart"),
+    ).toBe(true)
+    expect(env.verdict).toBe("REQUEST_CHANGES")
+  })
+
+  test("core considered the matched column (muted as low-confidence) → the regex twin is suppressed", async () => {
+    // Inverse of the recall guards above: suppression must be COLUMN-aware, not
+    // merely "the classifier ran". Here core's `classify_pii` DID consider `ssn`
+    // but muted it (confidence below threshold → no diff-scoped finding). Because
+    // the column was considered, the coarse `pii_into_mart` regex twin for the
+    // same column is redundant and must be suppressed — otherwise the same column
+    // is double-reported. This exercises the token-gated suppression branch
+    // (`classifiedPiiColumnsByFile.has(token)`), distinct from the "never
+    // classified → twin survives" path.
+    const files: ChangedFile[] = [{ path: "models/marts/dim_customers.sql", status: "modified", diff: "+    ssn\n" }]
+    const runner: ReviewRunner = {
+      ...fakeRunner({}),
+      async detectPii() {
+        return { columns: ["ssn"] }
+      },
+      async columnLineage(sql) {
+        return sql.includes("ssn")
+          ? [
+              { source: "customers.customer_id", target: "customer_id" },
+              { source: "customers.ssn", target: "ssn" },
+            ]
+          : [{ source: "customers.customer_id", target: "customer_id" }]
+      },
+      async classifyPii() {
+        // Considered `ssn` but is not confident → muted, no diff-scoped finding,
+        // yet `ssn` still lands in the classified-columns set.
+        return [{ column: "ssn", classification: "SSN", confidence: 0.1 }]
+      },
+    }
+    const env = await runReview({
+      changedFiles: files,
+      config: { ...DEFAULT_REVIEW_CONFIG, reviewers: ["dbt_patterns"] },
+      rubric: DEFAULT_RUBRIC,
+      mode: "gate",
+      runner,
+      getContent: content("select customer_id, ssn from customers", "select customer_id from customers"),
+    })
+    const piiIntoMart = env.findings.filter(
+      (f) => f.evidence?.tool === "dbt-patterns" && (f.evidence?.result as any)?.rule === "pii_into_mart",
+    )
+    expect(piiIntoMart).toHaveLength(0)
+  })
+
   test("structural error severity blocks only for join_risk; other rules stay advisory", async () => {
     // SC003 group_by_change is `semantic_change`, not `join_risk`. Even at core
     // `severity: "error"` it must NOT be escalated to critical — only join-key

From 89f77a75cacad2e82ea76729bf6d7a8349ac3a3f Mon Sep 17 00:00:00 2001
From: anandgupta42 <anand@altimate.ai>
Date: Wed, 10 Jun 2026 00:53:57 -0700
Subject: [PATCH 11/11] fix: drop unsupported dialect arg from checkEquivalence
 call
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

`core.checkEquivalence` (altimate-core@0.4.0) is typed `(sqlA, sqlB, schema)` —
3 params, no dialect (unlike `columnLineage`/`compareQueries`). A 4th `dialect`
arg was added to this call earlier on the branch (b5060e0ce6), which the napi
binding silently ignores at runtime but which fails `tsgo` typecheck (TS2554)
and blocks the pre-push hook. Align with the binding contract and the sibling
call site in `native/sql/register.ts`. Behavior-neutral.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 packages/opencode/src/altimate/native/altimate-core.ts | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/packages/opencode/src/altimate/native/altimate-core.ts b/packages/opencode/src/altimate/native/altimate-core.ts
index 2d58bad38..a41296ee7 100644
--- a/packages/opencode/src/altimate/native/altimate-core.ts
+++ b/packages/opencode/src/altimate/native/altimate-core.ts
@@ -320,7 +320,7 @@ export function registerAll(): void {
   register("altimate_core.equivalence", async (params) => {
     try {
       const schema = schemaOrEmpty(params.schema_path, params.schema_context)
-      const raw = await core.checkEquivalence(params.sql1, params.sql2, schema, params.dialect)
+      const raw = await core.checkEquivalence(params.sql1, params.sql2, schema)
       const data = toData(raw)
       return ok(true, data)
     } catch (e) {