Testing plan — altimate-opencode-plugin

For the agent reading this: this plan is self-contained. The plugin source was scaffolded in a prior session. Your job is to validate it end-to-end. Work through the phases in order; stop at the first phase that fails and report.

What you're testing

A TypeScript opencode plugin (built on @opencode-ai/plugin v1.17) that bundles:

  • 11 skills under skills/ (dbt + snowflake + altimate-code), SKILL.md format with YAML frontmatter
  • 5 tools in plugins/altimate-code/index.ts: altimate_dbt_columns, altimate_dbt_source, altimate_dbt_compile, altimate_dbt_build, altimate_code
  • Skill content that matches ~/code/altimateai/data-engineering-skills/skills/ byte-for-byte. The SKILL.md format is intentionally identical across Hermes, opencode, and Claude Code (per the agentskills.io standard claim).

Prerequisites

# 1. Node.js 20+
node --version

# 2. bun (used by opencode)
which bun || curl -fsSL https://bun.sh/install | bash

# 3. opencode CLI
which opencode || npm install -g @opencode-ai/cli   # verify package name first

# 4. altimate-code (provides the altimate-dbt subprocess the tools spawn)
which altimate-code || npm install -g altimate-code
which altimate-dbt   # should exist after the above

# 5. A small DuckDB-backed dbt project for integration tests.
#    Phase 5 examples below assume an "airbnb"-shaped fixture exposing models
#    `dim_listings` and `dim_listings_hosts`, plus a source `airbnb.hosts`. If
#    you substitute your own fixture, adapt the model/source names accordingly.
ls <path-to-dbt-project>
ls <path-to-database.duckdb>

If any prerequisite is missing or the package name above turns out wrong, stop and report rather than guess.

Phase 1 — Static checks (no runtime needed)

cd ~/code/altimateai/altimate-opencode-plugin
bun install
bun run typecheck      # TypeScript should compile clean against @opencode-ai/plugin types

Pass criteria: typecheck exits 0, no type errors. If @opencode-ai/plugin resolves to a version with a different API shape, report the diff — do not silently patch the plugin code.

Phase 2 — Plugin discovery by opencode

# Install the plugin into opencode's global plugin directory
mkdir -p ~/.config/opencode/plugins
ln -sfn ~/code/altimateai/altimate-opencode-plugin ~/.config/opencode/plugins/altimate

# Verify opencode finds it. Exact command varies by opencode version:
opencode --help | grep -i plugin
opencode plugins list 2>/dev/null || opencode --list-plugins 2>/dev/null

Pass criteria: altimate plugin appears in opencode's plugin list. If the discovery command is named differently, find it via opencode --help and report the actual command.

Phase 3 — Skill discovery

opencode reads SKILL.md from ~/.config/opencode/skills/, .opencode/skills/, and Claude-compatible .claude/skills/ paths. Verify all 11 skills load:

# Start opencode in a scratch dir; ask it to list available skills
mkdir -p /tmp/oc-test && cd /tmp/oc-test
opencode 2>&1 << 'EOF'
list available skills
EOF

Expected skills (frontmatter name: field):

  • altimate-code
  • creating-dbt-models, debugging-dbt-errors, developing-incremental-models, documenting-dbt-models, migrating-sql-to-dbt, refactoring-dbt-models, testing-dbt-models
  • finding-expensive-queries, optimizing-query-by-id, optimizing-query-text

Pass criteria: all 11 appear by their frontmatter name: value. If only some show up, check that opencode's skill loader honors the ./skills/ subdirectory structure used by the manifest.
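The 11-name check can be automated rather than eyeballed. The sketch below assumes the session output was saved to a text file; `missing_skills` and `EXPECTED_SKILLS` are hypothetical names, not part of the plugin. It does a plain substring match, so a name that only appears inside an unrelated sentence would still count as present.

```shell
# Expected frontmatter names, copied from the list above (1 + 7 + 3 = 11).
EXPECTED_SKILLS="altimate-code creating-dbt-models debugging-dbt-errors developing-incremental-models documenting-dbt-models migrating-sql-to-dbt refactoring-dbt-models testing-dbt-models finding-expensive-queries optimizing-query-by-id optimizing-query-text"

# missing_skills FILE: print every expected skill name that does not occur in FILE.
# FILE is assumed to hold the saved output of the "list available skills" session.
missing_skills() {
  for skill in $EXPECTED_SKILLS; do
    # -F: fixed-string match, -q: no output, only the exit status matters.
    grep -Fq -- "$skill" "$1" || printf '%s\n' "$skill"
  done
}

# Example usage (not executed here):
#   missing_skills /tmp/oc-test/skills.txt
```

An empty result means all 11 names were found; any printed name is a skill to investigate.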

Phase 4 — Tool surface registration

In an opencode session, prompt: "What tools do you have available that start with altimate_?"

Pass criteria: all 5 tools listed:

  • altimate_dbt_columns(model, project_dir?)
  • altimate_dbt_source(source, table, project_dir?)
  • altimate_dbt_compile(model, project_dir?)
  • altimate_dbt_build(project_dir?, select?)
  • altimate_code(task, project_dir?, yolo?, timeout_sec?)

Phase 5 — Tool integration on a real dbt project

Copy the airbnb dbt fixture to a scratch dir:

TEST_DIR=./.scratch/integration   # repo-local, gitignored
rm -rf "$TEST_DIR" && mkdir -p "$TEST_DIR"
cp -r <path-to-dbt-project>/* "$TEST_DIR/"
cp <path-to-database.duckdb> "$TEST_DIR/"
cd "$TEST_DIR"
# Sanity-check altimate-dbt works directly first:
altimate-dbt columns --model dim_listings 2>&1 | head -20

If the direct CLI works, run each tool through opencode:

| Tool | Prompt to opencode | Expected |
| --- | --- | --- |
| altimate_dbt_columns | "Use altimate_dbt_columns with model=dim_listings" | JSON with column names: LISTING_ID, LISTING_NAME, MINIMUM_NIGHTS, etc. |
| altimate_dbt_source | "Use altimate_dbt_source with source=airbnb table=hosts" | JSON with: ID, NAME, IS_SUPERHOST, CREATED_AT, UPDATED_AT |
| altimate_dbt_compile | "Use altimate_dbt_compile with model=dim_listings_hosts" | Compiled SQL text containing select ... from, referencing dim_listings and dim_hosts |
| altimate_dbt_build | "Use altimate_dbt_build" | Build summary text; on failure the message is prefixed ERROR: altimate-dbt build failed (<exit_code>): ... |
| altimate_code | "Use altimate_code with task='list columns of dim_listings'" | Subprocess spawns altimate-code, returns its final text output |

Pass criteria: each tool returns sensible text output without crashing. Note the per-tool runtime — altimate_code will be slow (spawns a fresh agent session).

v1.17 return-shape note. Plugin tools in @opencode-ai/plugin v1.17 are wrapped by the opencode host (tool/registry.ts → toModelOutput in session/message-v2.ts). The host puts the plugin's return value into its own output field and overwrites metadata with truncation info. As a result, anything the plugin places in a top-level metadata field is discarded before the model sees it. The contract is therefore: return a plain string. Errors are encoded as a string prefixed with ERROR: ..., which is still parseable but is not a JSON object.

Phase 6 — Failure-mode contracts

The plugin promises that failure paths return a parseable error string (prefix ERROR: ...), never a raw stack trace or a process crash.

# 1. altimate-dbt missing → ERROR string
ALTIMATE_DBT_PATH="$(which altimate-dbt)"   # record the location before moving the binary
sudo mv "$ALTIMATE_DBT_PATH" /tmp/altimate-dbt.bak   # or use PATH trick
# In opencode: call altimate_dbt_columns
# Expected text starts with: "ERROR: altimate-dbt binary not found on PATH. ..."
sudo mv /tmp/altimate-dbt.bak "$ALTIMATE_DBT_PATH"   # restore to the recorded location

# 2. Bad model name → propagated error from altimate-dbt
# In opencode: altimate_dbt_columns with model=does_not_exist
# Expected: "ERROR: altimate-dbt columns failed (<exit_code>): ..." mentioning the model

# 3. altimate-code missing for altimate_code
# Expected: "ERROR: altimate-code binary not found on PATH. ..."

Pass criteria: errors arrive as ERROR: ... prefixed strings, never raw stack traces or process crashes.
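Two small helpers can make this phase less invasive. This is a sketch, and both function names are hypothetical. `path_without_dir` is one way to do the "PATH trick" without sudo: it hides a binary by dropping its directory from PATH for a single command. `is_structured_error` checks the ERROR: prefix the contract promises. Note that dropping a directory hides every binary in it, so this only works when altimate-dbt does not share a directory with opencode or node.

```shell
# path_without_dir PATH_VALUE DIR: print PATH_VALUE with every entry equal to DIR removed.
path_without_dir() {
  printf '%s' "$1" | tr ':' '\n' | grep -Fxv -- "$2" | paste -sd ':' -
}

# is_structured_error TEXT: succeed only if TEXT begins with the "ERROR: " prefix.
is_structured_error() {
  case "$1" in
    "ERROR: "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage (not executed here; the prompt text is illustrative):
#   dbt_dir="$(dirname "$(command -v altimate-dbt)")"
#   PATH="$(path_without_dir "$PATH" "$dbt_dir")" \
#     opencode run "Use altimate_dbt_columns with model=dim_listings"
```

Because the modified PATH applies to one command only, nothing needs to be restored afterwards.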

Known caveats — don't "fix" these without checking

  1. Subprocess closes stdin (stdio: ["ignore", ...]). This is a deliberate guard against the upstream altimate-code stdin-wedge bug. Do not change to "inherit".
  2. @opencode-ai/plugin API shape. The plugin code targets v1.17 (tool({ args: ... }) raw zod shape, Plugin = async (input) => ({ tool: {...} }) function, return type string). The earlier v1.2.20 shape (tool({ parameters: z.object(...) }), Plugin = { name, tools } object literal) is incompatible. If the installed runtime version changes again, report the diff before patching.
  3. opencode skill-loader paths. Project-level .opencode/skills/ should take precedence over global ~/.config/opencode/skills/. Confirm by placing a skill named identically in both and seeing which wins.
  4. opencode.json schema. Top-level key is plugin (singular array of module/path specifiers) and skills is { paths: string[] } — not plugins (plural) or skills: string[]. opencode hard-fails on the wrong shape.
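As an illustration of caveat 4, the snippet below writes a sample opencode.json into a scratch directory and runs a cheap shape check on it. Only the key names and value shapes come from the caveat; the two paths are placeholders, and `config_shape_ok` is a hypothetical helper that greps for key names rather than parsing JSON.

```shell
# Write an illustrative opencode.json into a scratch directory (not a real config).
CONFIG_DIR="$(mktemp -d)"
cat > "$CONFIG_DIR/opencode.json" <<'EOF'
{
  "plugin": ["./plugins/altimate-code/index.ts"],
  "skills": { "paths": ["./skills"] }
}
EOF

# config_shape_ok FILE: succeed if the singular "plugin" key is present
# and the plural "plugins" key is absent.
config_shape_ok() {
  grep -q '"plugin"[[:space:]]*:' "$1" && ! grep -q '"plugins"[[:space:]]*:' "$1"
}

# Example usage (not executed here):
#   config_shape_ok ./opencode.json || echo "opencode.json uses the wrong top-level key"
```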

Phases A–D — Agent behavior, side-effect protection, guard validity, cross-warehouse

Phases 1–6 above prove the plugin wires up correctly: the tools are typesafe, the host loads them, they execute end-to-end when forced via opencode debug agent --tool .... Those are necessary, not sufficient. The failure modes documented in our plugin-skill-experiments/ deliverable series (Issue #13 in the issues-and-fixes report and Runs 5–6 in the ADE-Bench experiment report) show that even with correctly wired tools, the host agent often consults the skill and does the work itself with grep/read/bash. Phases A–D probe four specific gaps:

  • A — does the agent actually choose to invoke altimate_code when prompted realistically?
  • B — does altimate_dbt_build (or any other plugin tool that goes through parseManifest) silently clobber in-flight edits under dbt_packages/*?
  • C — is the stdio: ["ignore", ...] guard against the altimate-code stdin-wedge bug still necessary, or has the upstream been fixed?
  • D — does delegation across two configured warehouses actually work?

Run-artifact discipline

Every phase execution writes its raw artifacts into a unique per-run directory:

.scratch/runs/<YYYY-MM-DD>__<HH-MM-SS>__phase-<A|B|C|D>[__<variant>]/

Capture, in full bytes (do not truncate at write time):

  • subprocess stdout + stderr
  • the exact command invoked (with args + env diff vs ambient)
  • the host-agent transcript (if applicable)
  • file diffs (before/after) — git diff against a known clean state
  • a summary.json with: timestamp, phase, variant, pass/fail, duration, citations to the raw files

A previous run's directory is never overwritten. TESTING_RESULTS.md cites the run directory each verdict draws from. This way every re-test produces a new snapshot we can diff against history, and a flaky or environment-dependent failure has trace evidence next to it.
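One possible way to enforce the naming scheme and the never-overwrite rule is sketched below. The function names and the summary.json key names are assumptions chosen for illustration; the plan only fixes which facts the summary must carry. `write_summary` does no JSON escaping, so it assumes its arguments are plain tokens.

```shell
# make_run_dir PHASE [VARIANT]: create a fresh run directory and print its path.
make_run_dir() {
  phase="$1"
  variant="${2:-}"
  stamp="$(date '+%Y-%m-%d__%H-%M-%S')"
  dir=".scratch/runs/${stamp}__phase-${phase}"
  if [ -n "$variant" ]; then
    dir="${dir}__${variant}"
  fi
  # Refuse to reuse an existing directory so earlier runs are never overwritten.
  if [ -e "$dir" ]; then
    echo "run directory already exists: $dir" >&2
    return 1
  fi
  mkdir -p "$dir" && printf '%s\n' "$dir"
}

# write_summary DIR PHASE VARIANT VERDICT DURATION_SEC: write DIR/summary.json.
# Key names are illustrative; citations point at the raw files captured in DIR.
write_summary() {
  printf '{"timestamp":"%s","phase":"%s","variant":"%s","verdict":"%s","duration_sec":%s,"citations":["stdout.txt","stderr.txt"]}\n' \
    "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$2" "$3" "$4" "$5" > "$1/summary.json"
}

# Example usage (not executed here):
#   RUN_DIR="$(make_run_dir A bare)" || exit 1
#   write_summary "$RUN_DIR" A bare pass 42
```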

Where to fix things

  • Plugin-side problem (manifest, tool definitions, wrappers, schemas, plugin-side error handling): patch directly in this repo and document the change in WORKLOG.md.
  • Upstream problem (altimate-code, altimate-core, altimate-dbt-integration, the host opencode runtime): stop, capture the failing trace, and report it as a finding with a recommendation. Do not modify upstream code from here.
  • Fixture-side problem (the dbt project / DuckDB / warehouse config we're using to drive the phase): adjust the fixture if needed, but call the change out explicitly in WORKLOG.md so future re-tests know whether a green result reflects a plugin fix or a fixture accommodation.

Phase A — Agent-as-decider

Premise. The host agent has the altimate-code skill loaded and the altimate_code tool registered. Given a realistic prompt that should lead the agent to invoke altimate_code, does it?

Variants. Run all three, capture each as a separate run dir:

| Variant | System-prompt nudge | Expectation if plugin's value is real |
| --- | --- | --- |
| bare | none beyond the loaded skill description | Agent decides on its own |
| softnudge | "When the user asks about dbt columns / sources / compile, prefer the altimate_code tool over Read/Bash/grep" | Agent biased but not forced |
| mandatory | "CRITICAL DELEGATION DIRECTIVE: you MUST invoke altimate_code for this task. Do NOT do the work yourself with Read/Bash/grep/Edit. Failing to delegate is an error." | Agent forced |

Prompt (same across variants):

"list columns of dim_listings and tell me which are nullable"

(This is the same shape as Run 5's analytics_engineering006/asana003 prompts — a realistic dbt-introspection ask that the agent could answer either by delegating or by reading SQL files + grepping.)

How to run. From the airbnb fixture (./.scratch/integration/):

cd ./.scratch/integration
opencode run "<prompt>" \
  --print-logs --log-level INFO \
  > "$RUN_DIR/stdout.txt" 2> "$RUN_DIR/stderr.txt"

For softnudge/mandatory, prepend the nudge to the prompt or pass it via opencode's system-prompt augmentation (whichever the installed opencode version supports). Record the exact invocation in summary.json.

Pass criteria (on tool selection, not answer correctness):

| Variant | Pass = |
| --- | --- |
| bare | The host transcript contains a call to altimate_code |
| softnudge | Same |
| mandatory | Same |

If the agent skips altimate_code in any variant, capture the self-narration that led to the skip (the model's text reasoning before its chosen tool call). That narration is the diagnostic — it tells us why the plugin's headline tool is being ignored.

Phase B — In-flight dbt_packages/* edit protection

Premise. Issue #13 in 03-issues-and-fixes.md traces a destructive side effect to altimate-dbt-integration/src/dbtIntegrationAdapter.ts:390-408: when installDepsOnProjectInitialization: true (the default per configuration.ts:41), the adapter runs dbt deps on the first parseManifest() call of a session, overwriting any in-flight edits inside dbt_packages/<pkg>/. The bundled altimate-dbt CLI exposes this through schema-verify and build. Any plugin tool that goes through parseManifest inherits the side effect.

How to run.

  1. Pick a file under ./.scratch/integration/dbt_packages/dbt_utils/macros/*.sql (or any other installed package's file). Record its sha256 + content.
  2. Edit the file in place — append a comment or rename a column reference. Record the new sha256 + diff.
  3. Invoke altimate_dbt_build via opencode debug agent build. Capture stdout + stderr.
  4. Diff the file against the post-edit state.
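The record-and-compare part of these steps can be sketched as follows. The helper names are hypothetical, and the sketch assumes either `sha256sum` (common on Linux) or `shasum` (common on macOS) is installed.

```shell
# file_sha256 FILE: print the SHA-256 hex digest of FILE.
file_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1
  fi
}

# edit_survived FILE EXPECTED_SHA: succeed if FILE still has the digest
# that was recorded right after the in-place edit.
edit_survived() {
  [ "$(file_sha256 "$1")" = "$2" ]
}

# Example usage (not executed here; $target is the package file chosen in step 1):
#   before="$(file_sha256 "$target")"
#   printf '%s\n' '-- phase B probe' >> "$target"
#   after="$(file_sha256 "$target")"
#   ... invoke altimate_dbt_build ...
#   edit_survived "$target" "$after" && echo "edit survived" || echo "edit reverted"
```

Comparing against the post-edit digest (not the original) is what distinguishes "edit survived" from "file restored to its pristine state".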

Pass criteria.

  • Pass: the edit survives the build, or the build refuses with a structured ERROR: ... string (i.e. the plugin protects the user from the upstream side effect).
  • Fail: the edit silently reverts. In that case, this plugin is exposing the upstream bug to its users — record the trace and report as an upstream finding (do not patch altimate-dbt-integration from this repo).

Phase C — Stdin-wedge guard reality check

Premise. The plugin keeps stdio: ["ignore", ...] on the altimate-code subprocess (plugins/altimate-code/index.ts:174) as a guard against Issue #2 in 03-issues-and-fixes.md (altimate-code's run command unconditionally reads inherited stdin and wedges at 0% CPU). Phase 5 never verified that the guard is still necessary — i.e. whether the upstream wedge bug still reproduces.

How to run.

  1. Control run. Verify the trivial altimate_code task still runs cleanly with the current guard. Capture timing + exit + stdout.
  2. Probe run. Temporarily flip stdio[0] from "ignore" to "inherit" in plugins/altimate-code/index.ts. Re-typecheck. Re-invoke the same trivial task via the plugin under a hard wall-clock cap (e.g. timeout 60s). Capture timing + exit + whether it exited cleanly, hung, or was killed by timeout.
  3. Restore the file. Confirm stdio[0] === "ignore" again. Re-run the control to confirm parity with step 1.
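For the probe run, the exit status of `timeout` already separates the three outcomes the steps ask for. The sketch below maps it to a label; `classify_probe_exit` is a hypothetical helper, and it relies on the GNU coreutils convention that `timeout` exits with 124 when it had to stop the command.

```shell
# classify_probe_exit CODE: map the exit status of a `timeout`-wrapped run
# to a Phase C outcome label.
classify_probe_exit() {
  case "$1" in
    0)   echo "clean" ;;      # task finished on its own
    124) echo "timed-out" ;;  # GNU timeout killed the command at the cap
    *)   echo "failed" ;;     # any other non-zero exit
  esac
}

# Example usage (not executed here; the prompt is a placeholder):
#   timeout 60s opencode run "<trivial task prompt>" \
#     > "$RUN_DIR/stdout.txt" 2> "$RUN_DIR/stderr.txt"
#   classify_probe_exit "$?"
```

A "timed-out" label on the probe together with "clean" on both control runs is the pattern that supports keeping the guard.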

Pass criteria.

  • Probe hangs / times out: guard is still necessary; record the trace and keep the guard. Replace the "we never reproduced the bug" caveat in the report with the reproduced trace.
  • Probe runs cleanly within a few seconds: the upstream bug is fixed in this version of altimate-code. Record the trace; consider whether to remove the guard. Do not remove it on this branch without a follow-up decision.

Phase D — Cross-warehouse smoke

Premise. The README's primary justification for altimate_code is cross-warehouse parity diffs / multi-step migrations / any work the current session's tools cannot drive directly — but Phase 5 only exercised it against a single DuckDB project.

How to run.

  1. Verify at least two distinct warehouses are configured in altimate-code (altimate-code providers list + altimate-code warehouses list, or the dbt profile equivalent). If only one warehouse is configured, this phase is blocked: report the gap and stop, do not auto-configure a second warehouse from this branch.
  2. Delegate a cross-warehouse task via altimate_code. For example:

    "Compare row counts of the same table in <warehouse-A> and <warehouse-B>, list any tables whose counts differ by more than 1%."

  3. Capture: the full subprocess stdout / stderr, the altimate-code internal session id, the number of sub-sessions / tool calls / step-finishes from altimate-code's sqlite trace (~/.local/share/altimate-code/opencode-*.db), and the wall-clock duration.

Pass criteria.

  • Pass: the subprocess returns a structured answer that names both warehouses and surfaces a count comparison (correctness of the numbers is out of scope; what's in scope is that the delegation went somewhere meaningful).
  • Soft fail / inconclusive: the subprocess returns text but doesn't show evidence of touching both warehouses. Record what it did do.
  • Hard fail: subprocess errors out or times out. Capture the trace.
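A first-pass triage of these three outcomes could look like the sketch below. `phase_d_verdict` is a hypothetical helper. Seeing both warehouse names in the output is only weak evidence that both were touched, so a "pass" from this check still needs confirmation against the captured trace.

```shell
# phase_d_verdict EXIT_CODE OUTPUT_FILE WAREHOUSE_A WAREHOUSE_B
# Prints hard-fail, pass, or soft-fail following the criteria above.
phase_d_verdict() {
  # A non-zero exit (error or timeout) is a hard fail regardless of output.
  if [ "$1" -ne 0 ]; then
    echo "hard-fail"
    return 0
  fi
  # Both warehouse names must appear somewhere in the captured output.
  if grep -Fq -- "$3" "$2" && grep -Fq -- "$4" "$2"; then
    echo "pass"
  else
    echo "soft-fail"
  fi
}

# Example usage (not executed here; warehouse names are placeholders):
#   phase_d_verdict "$exit_code" "$RUN_DIR/stdout.txt" "<warehouse-A>" "<warehouse-B>"
```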

If altimate-code's configured warehouses aren't dbt-shaped (e.g. they're all LLM providers, no actual warehouse connections), that itself is the finding — the README claim is unbacked by config. Report and stop.

What "all passing" looks like

  • Phase 1: bun run typecheck exits 0
  • Phase 2: opencode lists the altimate plugin
  • Phase 3: all 11 skills discoverable by name:
  • Phase 4: all 5 tools registered in agent context
  • Phase 5: each tool returns sensible output on the airbnb fixture when invoked directly
  • Phase 6: missing binaries / bad inputs produce structured ERROR: strings, not crashes
  • Phase A: the host agent selects altimate_code when prompted realistically — at minimum under the mandatory variant
  • Phase B: an in-flight dbt_packages/* edit survives an altimate_dbt_build call (or the build refuses with a structured error)
  • Phase C: the stdin-wedge guard's necessity is confirmed or falsified with a captured trace, not assumed
  • Phase D: cross-warehouse delegation produces evidence of touching both configured warehouses

"All phases pass" means tool wiring works and the agent reliably selects the tool when prompted realistically. Phases 1–6 passing alone is insufficient.


Report results in a brief table and call out any phase that needed workarounds.