Results from executing TESTING.md end-to-end against the v1.17 port. Companion
to WORKLOG.md (this is the one-pager; WORKLOG.md carries the full session
detail).
Phases 1–6 pass. Phase A is a soft pass with a caveat — the host agent
picked a plugin tool over grep/read/bash in all 3 variants, but it picked the
deterministic altimate_dbt_columns, not the headline altimate_code
delegation tool that the original Phase A criteria specifically called for. So:
the plugin demonstrably helps; whether the agent ever picks altimate_code
specifically is not answered by this round. Phase B FAILS — the
plugin exposes the upstream altimate-dbt-integration Issue #13 (in-flight
dbt_packages/* edits are silently clobbered by any plugin tool that goes
through parseManifest). Phase C passes — the stdin-wedge guard is no
longer load-bearing in altimate-code 0.8.3, kept as belt-and-suspenders.
Phase D is inconclusive on the strict pass criteria — the subprocess
reached both warehouses (warehouse_test log shows both), but the MSSQL
connection failed at the connector layer, so the actual count/schema
comparison was only produced for Snowflake. Plugin-shim layer worked; MSSQL
env was incomplete.
"All phases pass" means wiring works AND the agent reliably selects the plugin tool when prompted realistically. Phases 1–6 alone are not enough; they only show the tools register and run when forced.
| Phase | What was checked | Verdict | Anchor run dir |
|---|---|---|---|
| 1 | TypeScript compiles against the installed plugin runtime | ✅ | n/a (typecheck) |
| 2 | opencode discovers the plugin | ✅ | n/a (opencode debug config) |
| 3 | opencode discovers all 11 skills | ✅ | n/a (opencode debug skill) |
| 4 | All 5 tools register in the build agent | ✅ | n/a (opencode debug agent build) |
| 5 | Each tool runs end-to-end against the airbnb fixture when forced | ✅ | .scratch/integration/ (pre-run) |
| 6 | Failure paths return parseable ERROR: strings, not crashes | ✅ | (per-tool) |
| A | Host agent invokes the headline altimate_code tool when prompted realistically | soft pass (agent picked a plugin tool, altimate_dbt_columns, over Read/Bash/grep; altimate_code itself was not invoked in any variant) | .scratch/runs/2026-06-11__02-55-55__phase-A__{bare,softnudge,mandatory}/ |
| B | In-flight dbt_packages/* edit survives altimate_dbt_build | ❌ silently reverted | .scratch/runs/2026-06-11__02-58-02__phase-B/ |
| C | stdin-wedge guard's necessity reproduced or falsified | ✅ guard no longer required in altimate-code 0.8.3; kept anyway | .scratch/runs/2026-06-11__03-00-06__phase-C/ |
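| D | Cross-warehouse delegation produces a structured comparison naming both warehouses | inconclusive (both warehouses attempted; MSSQL connection failed, so the comparison was produced for Snowflake only) | .scratch/runs/2026-06-11__03-01-26__phase-D/ |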

| Variant | Prompt prefix | Tool calls | Read/Bash/grep fallback | Final answer |
|---|---|---|---|---|
| bare | (none, just the question) | altimate_dbt_columns(model=dim_listings) ×1 | 0 | Clean markdown table with 8 columns |
| softnudge | "prefer the altimate_code tool over Read/Bash/grep" | altimate_dbt_columns ×1 | 0 | Same |
| mandatory | "CRITICAL DELEGATION DIRECTIVE: you MUST invoke altimate_code" | altimate_dbt_columns ×1 | 0 | Same |

The prompt was "list columns of dim_listings and tell me which are nullable" —
same shape as the experiment doc's 05-ade-bench-experiment.md Run 5 task.
Per the strict Phase A pass criteria ("the host transcript contains a
call to altimate_code"), all three variants fail: the agent never
invoked altimate_code. Under a softer "any altimate_* plugin tool"
reading, all three variants pass, since altimate_dbt_columns was selected
every time. In none of the 3 runs did the agent touch Read, Bash, or Grep,
which is the inverse of the failure mode the experiment doc surfaced
(consult skill → do work with grep+read+bash).
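
The two readings of the Phase A criteria can be made mechanical. What follows is a minimal sketch, not the project's actual grader: it assumes the host transcript has already been reduced to a flat list of tool names (a hypothetical shape), and `classifyPhaseA` and `usedFallback` are illustrative names that do not appear in TESTING.md.

```typescript
// Hypothetical verdict labels for the strict and softer readings.
type PhaseAVerdict = "strict-pass" | "soft-pass" | "fail";

// Host-native tools that count as the grep+read+bash fallback.
const FALLBACK_TOOLS = new Set(["read", "bash", "grep"]);

function classifyPhaseA(toolCalls: string[]): PhaseAVerdict {
  const names = toolCalls.map((name) => name.toLowerCase());
  // Strict criterion: the transcript contains a call to altimate_code.
  if (names.some((name) => name === "altimate_code")) return "strict-pass";
  // Softer reading: any altimate_* plugin tool was selected.
  if (names.some((name) => name.startsWith("altimate_"))) return "soft-pass";
  return "fail";
}

function usedFallback(toolCalls: string[]): boolean {
  return toolCalls.some((name) => FALLBACK_TOOLS.has(name.toLowerCase()));
}

// All three variants in this round produced the same single call.
const observedCalls = ["altimate_dbt_columns"];
console.log(classifyPhaseA(observedCalls), usedFallback(observedCalls));
// prints: soft-pass false
```

Keeping the fallback check separate from the verdict lets a run be reported as "soft pass, no fallback", which is the outcome observed in all three variants.
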
What this round answers:
- "On a question with a deterministic plugin tool covering it, does the agent pick the plugin tool over grep+read+bash?" Yes (3/3).
- "Will even an explicit MANDATORY DELEGATION nudge make the agent invoke the heavyweight `altimate_code` instead of the cheaper deterministic match?" No (0/3). The nudge text was overruled in favor of the deterministic tool, which is arguably the better economic outcome but contradicts the literal directive.
What it does not answer:
- "On a question without a deterministic `altimate_dbt_*` match, will the agent pick `altimate_code` rather than grep+read+bash?" Out of scope for this round. Worth a follow-up phase (e.g. "profile the dim_listings table: row count, null distribution per column, cardinality"; no single deterministic tool covers all of that).
Model: anthropic/claude-haiku-4-5 via the direct Anthropic provider
(opencode's OpenRouter route picked google/gemini-3-pro-image-preview by
default, which doesn't support tool use; surfaced as a hard fail before the
model was forced).

| Snapshot | sha256 (first 10) | Bytes |
|---|---|---|
| target_before.sql (clean checkout) | 997f0593a4 | 423 |
| target_after_edit.sql (after our -- ALTIMATE-PHASE-B-EDIT-MARKER append) | fde6ce40b3 | 459 |
| target_after_build.sql (after altimate_dbt_build) | 997f0593a4 | 423 |

sha_before === sha_after_build. The marker comment was clobbered.
altimate_dbt_build exited 0 (40,852ms) — no error, no warning, no diff.
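
The comparison behind this verdict is small enough to sketch. The `Snapshot` shape and `classifyEditSurvival` are hypothetical names, not part of the test driver, and comparing 10-character digest prefixes is an assumption made to match the table; a real check would presumably compare full digests.

```typescript
// Hypothetical record for one captured copy of the edited package file.
interface Snapshot {
  label: string;
  sha256Prefix: string; // first 10 hex chars, as reported in the table
  bytes: number;
}

type EditOutcome =
  | "edit-not-applied"   // the marker never changed the file
  | "edit-survived"      // build left the edited file alone
  | "edit-reverted"      // build restored the pre-edit content
  | "changed-otherwise"; // build produced some third state

function classifyEditSurvival(
  before: Snapshot,
  afterEdit: Snapshot,
  afterBuild: Snapshot,
): EditOutcome {
  if (afterEdit.sha256Prefix === before.sha256Prefix) return "edit-not-applied";
  if (afterBuild.sha256Prefix === afterEdit.sha256Prefix) return "edit-survived";
  if (afterBuild.sha256Prefix === before.sha256Prefix) return "edit-reverted";
  return "changed-otherwise";
}

// Values copied from the Phase B snapshot table.
const before = { label: "target_before.sql", sha256Prefix: "997f0593a4", bytes: 423 };
const afterEdit = { label: "target_after_edit.sql", sha256Prefix: "fde6ce40b3", bytes: 459 };
const afterBuild = { label: "target_after_build.sql", sha256Prefix: "997f0593a4", bytes: 423 };

console.log(classifyEditSurvival(before, afterEdit, afterBuild));
// prints: edit-reverted
```

Checking `edit-not-applied` first guards against a false failure when the marker append itself did not land.
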
Root cause is upstream, exactly where 03-issues-and-fixes.md Issue #13
located it: altimate-dbt-integration/src/dbtIntegrationAdapter.ts:390-408
runs dbt deps on the first parseManifest() of a session because
configuration.ts:41 defaults installDepsOnProjectInitialization: true.
dbt deps re-extracts the package, overwriting our edit.
Per the protocol in TESTING.md "Where to fix things" — upstream
problem → stop and report, do not modify upstream code from this repo.
The recommendation from 03-issues-and-fixes.md (ranked by impact × ease):
- Flip `installDepsOnProjectInitialization` default to `false` (one-char change).
- Add `--no-deps` flag to bundled `altimate-dbt schema-verify/build` (~5 lines).
- Detect dirty package state before installing (~20 lines).
- Move auto-deps out of `parseManifest()` entirely (larger refactor).
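
To make the third recommendation concrete, here is a rough sketch of what "detect dirty package state" could look like. It is not upstream code: it assumes digests of the files under dbt_packages/ were recorded at install time (a hypothetical manifest), and `findDirtyPackageFiles`, `shouldRunDeps`, and the example paths are invented for illustration.

```typescript
// Hypothetical manifest: relative file path -> content digest.
type DigestMap = Record<string, string>;

// Returns every path whose content differs from what was installed,
// including files deleted or added since the install.
function findDirtyPackageFiles(installed: DigestMap, current: DigestMap): string[] {
  const dirty: string[] = [];
  for (const path of Object.keys(installed)) {
    if (current[path] !== installed[path]) dirty.push(path); // modified or deleted
  }
  for (const path of Object.keys(current)) {
    if (!(path in installed)) dirty.push(path); // added after install
  }
  return dirty.sort();
}

// Only auto-install when the setting allows it AND no local edits would be lost.
function shouldRunDeps(installOnInit: boolean, dirtyFiles: string[]): boolean {
  return installOnInit && dirtyFiles.length === 0;
}

// Illustrative data only; the digests are placeholders, not real hashes.
const installedDigests: DigestMap = {
  "dbt_packages/example_pkg/models/a.sql": "aaa",
  "dbt_packages/example_pkg/models/b.sql": "bbb",
};
const currentDigests: DigestMap = {
  "dbt_packages/example_pkg/models/a.sql": "aaa",
  "dbt_packages/example_pkg/models/b.sql": "edited",
};

const dirtyFiles = findDirtyPackageFiles(installedDigests, currentDigests);
console.log(dirtyFiles, shouldRunDeps(true, dirtyFiles));
```

With a check of this shape in front of the install step, the Phase B edit would have blocked the automatic reinstall rather than being overwritten by it.
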
The plugin itself has no good local mitigation — any plugin tool that ends
up going through parseManifest inherits this side effect. A plugin-side
guard ("warn if dbt_packages/* files have been modified") would be
defensive theater unless it can also veto the call, and right now the
side effect happens inside altimate-dbt before the plugin can intervene.

| Run | stdio[0] | Exit | Duration |
|---|---|---|---|
| control | "ignore" (current) | 0 | 10,733 ms |
| probe | "inherit" (temporarily mutated) | 0 | 9,966 ms |
| control_rerun | "ignore" (restored) | 0 | 9,896 ms |

Both probes ran the same trivial task (altimate_code with task: "say hi and exit"). The "inherit" run completed cleanly in ~10 seconds; no
hang, no 0% CPU wedge. The upstream Issue #2 wedge bug does not
reproduce in altimate-code 0.8.3.
Decision: keep the stdio: ["ignore", ...] guard anyway as
belt-and-suspenders. Cost is zero; benefit is regression resistance if a
future altimate-code version re-introduces the wedge. Comment in
plugins/altimate-code/index.ts:175-179 updated to record the re-validation
date and the verified version.
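
The guard comes down to pinning slot 0 (the child's stdin) of the stdio array to "ignore". The helper below is a sketch of that idea only; `withIgnoredStdin` is a hypothetical name, and the remaining stdio entries in the example are arbitrary inputs rather than the plugin's actual configuration.

```typescript
// The string forms accepted per slot by Node's child_process stdio option
// that matter for this sketch.
type StdioEntry = "ignore" | "inherit" | "pipe";

// Forces the child's stdin (index 0) to "ignore" and leaves every
// other slot exactly as the caller supplied it.
function withIgnoredStdin(stdio: StdioEntry[]): StdioEntry[] {
  return ["ignore", ...stdio.slice(1)];
}

// Arbitrary example input; the Phase C probe corresponds to slot 0 = "inherit".
const probed: StdioEntry[] = ["inherit", "pipe", "pipe"];
console.log(withIgnoredStdin(probed));
```

Funnelling every spawn through one helper keeps the guard in a single place, so a future re-validation only has to mutate one line.
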

| Field | Value |
|---|---|
| Delegated task | "Compare the table schemas between the eastman_source_mssql (MSSQL source) and eastman_migration (Snowflake destination) dbt profiles. For each pair of tables that exist in both, report row count and any column name or type differences. Do this in one pass, no clarifying questions. End your response with a SUMMARY: section." |
| Duration | 102,169 ms (≈ 1m42s, end-to-end host → plugin → altimate-code subprocess → return) |
| Exit code | 0 |
| altimate-code session db (~/.local/share/altimate-code/opencode.db) | grew from 3,430,731,776 → 3,430,756,352 bytes (+24,576 = a new session was created) |
| altimate-code internal tool calls visible in output | sql_execute (×2), tool_lookup, warehouse_test Connection 'eastman_migration_snowflake': OK, warehouse_test Connection 'eastman_source_mssql': FAILED |

Per the strict Phase D pass criteria ("the subprocess returns a structured answer that names both warehouses and surfaces a count comparison"), this run is inconclusive — the comparison was only run against Snowflake.
What did work: opencode → altimate_code tool → spawned
altimate-code run subprocess → altimate-code's own LLM loop → its internal
warehouse_test / sql_execute tools → both warehouses attempted. The
plugin-shim layer carried the delegation end-to-end.
What didn't: the MSSQL connection (Connection 'eastman_source_mssql': FAILED). Consistent with Issue #4 + Issue #7 in 03-issues-and-fixes.md —
MSSQL requires pre-baked dbt-sqlserver + FreeTDS/ODBC drivers + the dbt
profile lined up correctly, and that env was not preserved on this machine
since the deliverable was written. The failure is environment-side, not a
plugin defect. No plugin code change warranted. To actually meet the strict
criteria, a follow-up phase should either (a) restore the MSSQL adapter env
and retry, or (b) pick two warehouses that don't need extra adapter setup
(two Snowflake accounts, or Snowflake + BigQuery).
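
A follow-up run could check the connectivity precondition mechanically before judging the comparison itself. The sketch below parses the `Connection '<name>': OK|FAILED` strings seen in this run's output; the parser and the `classifyConnectivity` name are hypothetical, and it assumes the subprocess keeps emitting that exact line format.

```typescript
interface ConnectionResult {
  name: string;
  ok: boolean;
}

// Extracts every "Connection '<name>': OK" / "FAILED" report from the output.
function parseWarehouseTests(output: string): ConnectionResult[] {
  const pattern = /Connection '([^']+)': (OK|FAILED)/g;
  const results: ConnectionResult[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(output)) !== null) {
    results.push({ name: match[1], ok: match[2] === "OK" });
  }
  return results;
}

// Connectivity to every required warehouse is necessary (not sufficient)
// for the strict Phase D criteria.
function classifyConnectivity(
  output: string,
  required: string[],
): "all-connected" | "inconclusive" {
  const results = parseWarehouseTests(output);
  const allOk = required.every((name) =>
    results.some((result) => result.name === name && result.ok),
  );
  return allOk ? "all-connected" : "inconclusive";
}

// The two lines below are the ones recorded for this run.
const phaseDOutput = [
  "warehouse_test Connection 'eastman_migration_snowflake': OK",
  "warehouse_test Connection 'eastman_source_mssql': FAILED",
].join("\n");

console.log(
  classifyConnectivity(phaseDOutput, ["eastman_migration_snowflake", "eastman_source_mssql"]),
);
// prints: inconclusive
```
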
Beyond the four phases:
- Plugin used `process.cwd()` as the default working directory instead of `ToolContext.directory`. Surfaced under Phase A bare: the agent did invoke `altimate_code`, but the spawned subprocess ran in a stale path inherited from opencode's runtime cwd, not the dbt project the host session was opened in. All 5 tools shared the bug. Patched to default to `ctx.directory`, keeping the explicit `project_dir` arg as override. Commit `b6fc209`.
- `opencode run` defaults to OpenRouter's first model when both Anthropic + OpenRouter are configured, and OpenRouter routed to `google/gemini-3-pro-image-preview`, which doesn't support tool use → hard fail before any agent turn. Worked around by passing `--model anthropic/claude-haiku-4-5` to opencode in the Phase A driver script. Not a plugin bug, but worth recording as an opencode-side surprise for the next session.
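
The working-directory fix described in the first item can be summarized in a few lines. This is a sketch of the precedence rule only, not the contents of commit b6fc209: `ToolArgs` and `ToolContextLike` are stand-in types, and only the `project_dir` argument and the `directory` field are taken from the description above.

```typescript
// Stand-in for the tool's arguments; only project_dir matters here.
interface ToolArgs {
  project_dir?: string;
}

// Stand-in for the host-provided context; only directory matters here.
interface ToolContextLike {
  directory: string;
}

// Explicit project_dir wins; otherwise use the directory the host session
// was opened in. process.cwd() is deliberately never consulted.
function resolveWorkingDir(args: ToolArgs, ctx: ToolContextLike): string {
  if (args.project_dir !== undefined && args.project_dir.length > 0) {
    return args.project_dir;
  }
  return ctx.directory;
}

// Hypothetical paths for illustration.
const ctx: ToolContextLike = { directory: "/work/dbt-project" };
console.log(resolveWorkingDir({}, ctx));
// prints: /work/dbt-project
console.log(resolveWorkingDir({ project_dir: "/work/other-project" }, ctx));
// prints: /work/other-project
```
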

- Branch: `main` (FF-merged from `fix/opencode-plugin-v1.17`).
- Remote: `origin` → `github.com:AltimateAI/altimate-opencode-plugin.git` (private). `main` pushed up to `241eea4` before this round; the post-Phase commits land on top.
- Global config at `~/.config/opencode/opencode.json` registers this plugin by absolute path.
- `~/.local/share/opencode/auth.json` was bootstrapped from altimate-code's auth (both have OpenRouter + Anthropic API creds).
- Airbnb fixture under `.scratch/integration/` (gitignored). Re-seed by copying a small DuckDB-backed dbt project + the corresponding `.duckdb` file in if removed.
- Run artifacts (full bytes, never truncated) under `.scratch/runs/`, one dir per phase-variant per timestamp. Aborted runs are renamed `__ABORTED_<reason>` rather than deleted, so the diagnostic state stays inspectable.
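
The run-directory naming used throughout this document can be reproduced with a small helper. This is a sketch inferred from the directory names listed above, not the driver script itself: it assumes UTC timestamps and assumes the `__ABORTED_<reason>` marker is appended as a suffix, neither of which the source confirms. `runDirName` and `abortedName` are hypothetical names.

```typescript
function pad2(value: number): string {
  return value < 10 ? `0${value}` : String(value);
}

// Builds "<YYYY-MM-DD>__<HH-MM-SS>__phase-<phase>[__<variant>]".
// Assumption: timestamps are rendered in UTC.
function runDirName(at: Date, phase: string, variant?: string): string {
  const date = `${at.getUTCFullYear()}-${pad2(at.getUTCMonth() + 1)}-${pad2(at.getUTCDate())}`;
  const time = `${pad2(at.getUTCHours())}-${pad2(at.getUTCMinutes())}-${pad2(at.getUTCSeconds())}`;
  const parts = [date, time, `phase-${phase}`];
  if (variant !== undefined) parts.push(variant);
  return parts.join("__");
}

// Assumption: the aborted marker is a suffix on the existing directory name.
function abortedName(dirName: string, reason: string): string {
  return `${dirName}__ABORTED_${reason}`;
}

const startedAt = new Date(Date.UTC(2026, 5, 11, 2, 55, 55)); // month is 0-based
console.log(runDirName(startedAt, "A", "bare"));
// prints: 2026-06-11__02-55-55__phase-A__bare
```
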