Skip to content

Multi-CLI sub-agent runtime cleanup (closes #90, #91)#92

Open
Yambr wants to merge 39 commits intomainfrom
feat/sub-agent-multi-cli-cleanup
Open

Multi-CLI sub-agent runtime cleanup (closes #90, #91)#92
Yambr wants to merge 39 commits intomainfrom
feat/sub-agent-multi-cli-cleanup

Conversation

@Yambr
Copy link
Copy Markdown
Owner

@Yambr Yambr commented May 3, 2026

Summary

  • Closes [v0.9.2.3] sub_agent MCP tool advertises Claude-only aliases (sonnet/opus/haiku) regardless of SUBAGENT_CLI — non-Claude operators can't onboard callers reliably #90: sub_agent MCP tool no longer leaks Claude model aliases (sonnet/opus/haiku) when SUBAGENT_CLI=opencode|codex. Per-CLI docstring assigned at registration time. Per-CLI default model envs (CLAUDE_/OPENCODE_/CODEX_SUB_AGENT_DEFAULT_MODEL); legacy global removed.
  • Closes [v0.9.2.3] sub_agent output: opencode error events bypass error path and dump raw NDJSON into Sub-Agent Completed (success) banner #91: opencode adapter now detects event.type=="error" events (rc=0 application errors) and surfaces data.message as a structured error instead of dumping raw NDJSON into a green success banner.
  • New CLI tool list-subagent-models — host-friendly model discovery (no sandbox container required). Reads OPENCODE_CONFIG_EXTRA / CODEX_CONFIG_EXTRA env if set, else canonical computer-use-server/cli-defaults/{opencode,codex}.json. claude returns static aliases (no native CLI subcommand exists). --inside-container flag preserves CLI-shelling as escape hatch.
  • sub-agent skill reworked (mirrors gitlab-explorer pattern): mandatory bash scripts/list_subagent_models.sh first-step. Hardcoded model identifiers stripped from SKILL.md and references/usage.md; load-bearing examples wrapped in <!-- canonical-example --> markers. CI audit script tests/test-skill-no-hardcoded-models.sh enforces.
  • Anthropic baseline dropped from _OPENCODE_ALIAS_MAP (now {} by default; aliases come exclusively from OPENCODE_MODEL_ALIASES env). opencode/codex no longer have hardcoded defaults — caller passes model or sets <CLI>_SUB_AGENT_DEFAULT_MODEL env, otherwise raises ValueError with actionable message pointing to list-subagent-models. claude default stays sonnet.
  • Single source of truth for canonical CLI defaultscomputer-use-server/cli-defaults/{opencode,codex}.json now feeds BOTH the sandbox Dockerfile entrypoint (when *_CONFIG_EXTRA env unset) AND list-subagent-models. No more duplicated provider/model lists.
  • Adapter test coveragetests/orchestrator/test_cli_adapters.py extended with 11 gap-closure tests across opencode/codex/claude (multi-event streams, empty stdout, malformed JSON, error events, usage extraction). Total: 30 tests.
  • 13 requirements (REQ-MCP-01..13) across two phases (M1).
  • 0 code-review critical findings; 3 warnings auto-fixed before this PR (Dockerfile TOML inline-table conversion, codex docstring drift, ValueError surfacing in sub_agent).

Test plan

  • python3 -m pytest tests/orchestrator/test_cli_adapters.py -v — expect 30 passed
  • python3 -m pytest tests/test-default-model-resolution.py tests/test_default_resolver_no_legacy_global.py tests/test_subagent_docstring.py tests/test-opencode-error-mapping.py tests/test_opencode_alias_map_drop.py tests/test_codex_toml_converter.py -v — full sub-agent surface, expect ~67 passed
  • bash tests/test-subagent-runtime.sh — umbrella, expect 6/6
  • bash tests/test-no-corporate.sh — expect 14/14
  • bash tests/test-project-structure.sh — expect 23/23
  • docker build --platform linux/amd64 -t open-computer-use:latest . && ./tests/test-docker-image.sh — verify Dockerfile refactor
  • Smoke: with SUBAGENT_CLI=opencode and no OPENCODE_SUB_AGENT_DEFAULT_MODEL env, calling sub_agent without a model arg surfaces the structured error (not an Anthropic-default silent fallback)
  • Smoke: opencode error event in stream is surfaced as red MCP tool error with the original data.message text
  • Smoke: list-subagent-models runs on host (orchestrator side, no sandbox container required) and prints the canonical opencode default

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added a model-discovery utility to list available sub-agent models.
    • Docker image now ships canonical CLI default configs and a helper for model listing.
  • Bug Fixes

    • Better mapping of opencode error events so application errors surface clearly.
  • Behavior Changes

    • Stricter per-CLI default model resolution; some CLIs now require explicit defaults or discovery.
  • Documentation

    • Updated sub-agent docs to require running model discovery and updated selection guidance.
  • Tests

    • Numerous tests added for adapters, resolver logic, CLI tools, and image validation.

Nick and others added 30 commits May 3, 2026 13:33
- Creates computer-use-server/bin/list-subagent-models (Python, BUSL-1.1)
- Dispatches per SUBAGENT_CLI: claude (static alias_map), opencode (opencode models), codex (codex debug models --bundled)
- Outputs stable JSON to stdout on success; structured JSON error to stderr on failure
- Exit codes: 0=success, 2=CLI missing/unknown, 3=parse/exec error
- Includes whitelist validation (T-01-01) and subprocess list-form argv (T-01-02)
- Truncates stderr to 200 chars to avoid info leakage (T-01-03)
- 30s timeout on all subprocesses (T-01-04)
- 9 tests covering et == 'error' branch (5 failing pre-fix)
- Tests verify: is_error promotion, message surfacing, null/missing data fallbacks
- Tests verify existing success/rc paths are preserved (4 passing pre-fix)
- REQ-MCP-03 acceptance criteria
…lve_opencode

- Add json + logging imports; add _LOG = logging.getLogger('cli_runtime')
- Add _merge_opencode_alias_map() after _OPENCODE_ALIAS_MAP: merges built-in
  Anthropic alias baseline with operator overrides from OPENCODE_MODEL_ALIASES
  env (JSON object string); read at call time so env changes take effect without
  restart (D-07)
- Validate override: malformed JSON, non-dict payload, non-string entries are
  each rejected with a logged warning; built-in map preserved on any error
- Update _resolve_opencode to call _merge_opencode_alias_map() at call time
  instead of referencing module-level _OPENCODE_ALIAS_MAP directly
- Built-in _OPENCODE_ALIAS_MAP (Anthropic baseline) preserved unchanged
…loop

- Handles {"type":"error","data":{...}} events that opencode emits with rc=0
- Promotes is_error=True even when subprocess exit code is 0 (defense-in-depth)
- Surfaces data.message as 'opencode error: <msg>' instead of dumping raw NDJSON
- Falls back to str(data) when message key is missing or data is non-dict/null
- continue skips text-extraction branches for error events (error wins)
- Existing rc-based is_error initialization at line 110 preserved (D-12)
- codex adapter untouched (D-13)
- Fixes Bug #91: opencode errors no longer appear as green success banners
… skill

- Create skills/public/sub-agent/scripts/list_subagent_models.sh
- Thin Bash wrapper mirroring gitlab-explorer/check_gitlab_auth.sh pattern
- MIT SPDX header per CLAUDE.md rule for skills/public/sub-agent/
- Fails fast with exit 127 if list-subagent-models is not on PATH
- Execs canonical Python tool (single source of truth, no logic duplication)
- Banner echo to stderr (stdout reserved for clean JSON pipe)
- Add '## Before Any sub_agent Call' as first H2 (mirrors gitlab-explorer pattern)
- Update frontmatter description to mention model discovery script
- Section includes: bash command block, 'This script will:' bullets, Why this matters
  paragraph with per-CLI model id syntax, default behavior per CLI table
- References OPENCODE_MODEL_ALIASES, OPENCODE_SUB_AGENT_DEFAULT_MODEL, CODEX_SUB_AGENT_DEFAULT_MODEL
- Satisfies REQ-MCP-05 / D-14 acceptance criteria
- COPY computer-use-server/bin/list-subagent-models to /usr/local/bin/
- chmod +x to make it executable inside the sandbox image
- Placed alongside extract-text COPY+chmod pattern (lines 590-595)
- computer-use-server/Dockerfile NOT modified (server image unchanged)
- Satisfies REQ-MCP-04: list-subagent-models reachable via command -v in sandbox
…ng registration

- Add _SUBAGENT_DOC_CLAUDE, _SUBAGENT_DOC_OPENCODE, _SUBAGENT_DOC_CODEX constants
- Add _subagent_docstring_for_cli(cli) helper that selects variant by CLI
- Remove @mcp.tool() decorator from sub_agent; replace with explicit mcp.add_tool()
- Assign sub_agent.__doc__ before mcp.add_tool() so FastMCP captures correct text
- Route default-model fallback through resolve_subagent_model('', resolve_cli())
- Add tests/test_subagent_docstring.py and test_subagent_deprecation.py
- Add one-shot deprecation warning for SUB_AGENT_DEFAULT_MODEL when SUBAGENT_CLI is opencode/codex

Reconstructed from worktree-agent-a6239975e979b2349 commits 610fde9 + 48aef3a; original cherry-pick would have deleted wave-1 files (worktree was based on pre-wave-1 commit) so additive files were extracted surgically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- computer-use-server/requirements.txt: add pytest>=8.0.0
- tests/test-subagent-cli-surface.py: per-CLI docstring assertions (8 tests)
- tests/test-default-model-resolution.py: D-08/D-09 resolution order (8 tests)
- tests/test-list-subagent-models.sh: script invocation per CLI (exit code + JSON)
…roject-structure.sh

- tests/test-subagent-runtime.sh: umbrella runner for all 6 sub-agent test suites
- tests/test-project-structure.sh: add [13/13] section invoking umbrella
…to test-docker-image.sh

- Adds dedicated PATH presence check (which list-subagent-models)
- Adds executability check (test -x) after the CLI tools loop
- Mirrors Phase 1 Plan 01-05 Dockerfile install line verification
…ADME

- Extract opencode canonical config verbatim from Dockerfile:484-500 heredoc
- Add codex.json as structured JSON baseline (empty model_providers = public OpenAI defaults)
- Both JSON files carry _spdx/_copyright keys per D-10 BUSL-1.1 convention
- README documents the _spdx key convention and consumer strip requirement
- Single source of truth for Plan 02-04 (Dockerfile refactor) and Plan 02-05 (list-subagent-models)
…as (D-05/D-06)

- _OPENCODE_ALIAS_MAP = {} — no Anthropic baseline (D-05)
- _resolve_opencode raises ValueError with actionable message when alias not
  in map (D-06) and when no default env is configured (D-02)
- Provider/model strings containing '/' still pass through unchanged
- New tests/test_opencode_alias_map_drop.py: 6 tests covering D-05/D-06
- Updated test_cli_runtime.py: replaced 4 stale opencode tests with
  D-05/D-06 behavior (raises on missing alias, raises on empty with no env)
…inside-container flag

- Default mode reads OPENCODE_CONFIG_EXTRA env (JSON) or cli-defaults/opencode.json
- codex mode reads CODEX_CONFIG_EXTRA env (TOML via tomllib) or cli-defaults/codex.json
- claude path unchanged: static alias list, always succeeds
- Add --inside-container flag to preserve Phase 1 CLI-shelling behavior (D-12)
- Add OPENCODE_MODEL_ALIASES env merge into top-level aliases{} output (D-11)
- Add _strip_spdx() to remove _spdx/_copyright/_doc keys before consumption
- Add _resolve_defaults_path(): LIST_SUBAGENT_CLI_DEFAULTS_DIR > /opt > repo-local
- All error paths emit structured stderr JSON, exit 2 (parse error) or 3 (missing file)
- Extend tests with 13 sections (24 assertions) covering all new modes
…CLI explicit defaults

D-01..D-04 from .planning/phases/02-sub-agent-runtime-cleanup/02-CONTEXT.md:

- Remove SUB_AGENT_DEFAULT_MODEL global from docker_manager.py (no deprecation, removed)
- Delete tests/test_subagent_deprecation.py (warning is gone)
- claude default stays 'sonnet' when no caller model and CLAUDE_SUB_AGENT_DEFAULT_MODEL unset
- opencode/codex raise ValueError with actionable message when no caller model and no <CLI>_SUB_AGENT_DEFAULT_MODEL env
- mcp_tools.py call site handles new ValueError as MCP tool error (no server crash)
- Update orchestrator tests (test_sub_agent_dispatch, test_subagent_claude_compat) to scrub deprecated env
- Add tests/test_default_resolver_no_legacy_global.py covering D-01..D-04 paths
- Update tests/test-subagent-runtime.sh umbrella

Reconstructed from worktree-agent-a5518713265b43649 (agent timed out at the commit step;
all work was complete in the working tree, 70/70 tests pass).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
D-09: replace inline heredoc for opencode config with a file read from
/opt/cli-defaults/opencode.json (COPY'd into image). _spdx/_copyright
underscore-prefixed keys are stripped via python3 json.pop before
writing /tmp/opencode.json. Codex baseline now reads codex.json and
converts to TOML (empty model_providers -> empty TOML, same behavior
as previous `: > config.toml` path). OPENAI_BASE_URL custom-gateway
path and all *_CONFIG_EXTRA operator overrides preserved verbatim.

- COPY computer-use-server/cli-defaults/ /opt/cli-defaults/ added
- Inline opencode JSON heredoc (OCEOF tag) removed
- Codex case: elif reads /opt/cli-defaults/codex.json when no OPENAI_BASE_URL
- tests/test-docker-image.sh: new section [13/14] verifies file presence + JSON validity
- All sections renumbered to /14 (was /12 and /13 inconsistently)
Phase 1 tests asserted opencode/codex resolved to hardcoded defaults
(anthropic/claude-sonnet-4-6 / gpt-5-codex) when no env override.
Phase 2 D-02 explicitly drops those — opencode/codex now raise ValueError
with actionable message pointing operator at list-subagent-models and the
per-CLI env var.

Two test functions renamed to reflect new contract:
  test_opencode_hardcoded_default → test_opencode_no_default_raises
  test_codex_hardcoded_default    → test_codex_no_default_raises

Module docstring updated to reflect the Phase 2 contract.

8/8 tests pass. Found during wave-2 cross-merge integration check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Remove (claude/opencode/codex) enum from SKILL.md frontmatter description
- Replace Why-this-matters paragraph specific model examples with <provider>/<model-id> placeholders
- Replace Default-behavior section explicit list with list-subagent-models pointer
- Update Parameters table model row default from 'sonnet' to 'discover via list-subagent-models'
- Wrap load-bearing claude baseline alias mentions in <!-- canonical-example --> markers
- Replace model="opus" in usage.md Code Review template with placeholder
- Replace Model Selection table with CLI-agnostic guidance + canonical-example markers
- Update Before-You-Start model-selection bullet to reference list-subagent-models
- Create tests/test-skill-no-hardcoded-models.sh (D-16): greps skill files
  for forbidden hardcoded model tokens, excludes canonical-example markers
  and list-subagent-models references, exits non-zero on violations
- Wire audit as step [6/6] in tests/test-subagent-runtime.sh umbrella
- Renumber all umbrella steps from [N/5] to [N/6]
- Update umbrella header comment to document 6 sub-tests
Nick added 6 commits May 3, 2026 20:04
- codex: usage/token-cost extraction, empty event stream, error event in stream (documents rc-only detection), partial truncated JSON
- opencode: multi-event stream (last-seen-wins), empty stdout, malformed JSON between valid events, error-event message extraction (Phase 1 D-11 regression guard)
- claude: malformed JSON fallback, empty stdout rc=0, event with null result field
- total: 30 tests collected (19 existing + 11 new), all PASSED
…ict values

Replace json.dumps(cv) with _to_toml_value(cv) helper that produces valid
TOML inline-table syntax {k = v} for nested dicts instead of JSON object
notation {"k": "v"} which tomllib rejects. Adds regression test covering
the nested-dict provider config case and confirming the old code was broken.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 3, 2026

Warning

Rate limit exceeded

@Yambr has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 37 minutes and 3 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dd3d079d-74d8-406f-9dd2-e8edf8c53031

📥 Commits

Reviewing files that changed from the base of the PR and between c1e72d0 and 26d1a81.

📒 Files selected for processing (12)
  • Dockerfile
  • computer-use-server/docker_manager.py
  • computer-use-server/mcp_tools.py
  • tests/orchestrator/test_runtime_cli_endpoint.py
  • tests/test-docker-image.sh
  • tests/test-list-subagent-models.sh
  • tests/test-mcp-endpoint-live.sh
  • tests/test-project-structure.sh
  • tests/test-skill-no-hardcoded-models.sh
  • tests/test_codex_toml_converter.py
  • tests/test_default_resolver_no_legacy_global.py
  • tests/test_subagent_docstring.py
📝 Walkthrough

Walkthrough

This PR externalizes canonical CLI defaults to cli-defaults/, adds a host/container model-discovery tool, tightens per-CLI model resolution and env-var defaults, updates the Docker entrypoint to source those defaults, makes OpenCode error events set run-level errors, and adds comprehensive tests and docs for multi-CLI sub-agent behavior.

Changes

Multi-CLI Sub-Agent Runtime & Model Discovery Overhaul

Layer / File(s) Summary
Data Shape & Canonical Defaults
computer-use-server/cli-defaults/README.md, computer-use-server/cli-defaults/opencode.json, computer-use-server/cli-defaults/codex.json
Adds canonical per-CLI JSON defaults and documents SPDX metadata keys (_spdx, _copyright) that must be stripped by consumers.
Dockerfile Integration & Entrypoint
Dockerfile
Copies computer-use-server/cli-defaults/ into /opt/cli-defaults/, installs bin/list-subagent-models, and changes entrypoint runtime config generation: opencode now reads /opt/cli-defaults/opencode.json (strips _ keys) when OPENCODE_CONFIG_EXTRA is unset; codex reads /opt/cli-defaults/codex.json and converts JSON→TOML (or falls back to empty TOML).
Model Discovery CLI
computer-use-server/bin/list-subagent-models
New CLI providing host and --inside-container modes. Host mode reads *_CONFIG_EXTRA overrides or canonical defaults, exposes cli, models, default_model, aliases, source as JSON; inside-container mode queries installed sub-agent CLIs and parses subprocess output. Emits structured stderr JSON on errors.
Core Model Resolution
computer-use-server/cli_runtime.py
Removes global hardcoded default; introduces per-CLI behavior: Claude honors CLAUDE_SUB_AGENT_DEFAULT_MODEL and alias map fallback; Codex requires caller or CODEX_SUB_AGENT_DEFAULT_MODEL (no hardcoded default); Opencode starts with empty alias map and requires OPENCODE_MODEL_ALIASES/env-provided default for alias resolution, raising ValueError for bare aliases when unmapped.
Entrypoint / Docker Manager Env vars
computer-use-server/docker_manager.py
Removes SUB_AGENT_DEFAULT_MODEL; adds CLAUDE_SUB_AGENT_DEFAULT_MODEL (empty default) and supports per-CLI default env vars.
OpenCode Adapter Error Mapping
computer-use-server/cli_adapters/opencode.py
OpenCodeAdapter.parse_result now treats {"type":"error",...} events as run-level errors even when returncode==0, sets is_error=True, extracts/sets an opencode error: ... last message text, and skips normal final-text extraction for those events.
Sub-Agent MCP Tool & Docstrings
computer-use-server/mcp_tools.py
sub_agent now resolves the active CLI and calls resolve_subagent_model("", cli) when model is empty; per-CLI docstring variants added and assigned to sub_agent.__doc__ before registration. Legacy single default doc entries/imports removed.
Bash Wrapper & Skill Docs
skills/public/sub-agent/scripts/list_subagent_models.sh, skills/public/sub-agent/SKILL.md, skills/public/sub-agent/references/usage.md
Adds wrapper that verifies list-subagent-models availability and execs it. Updates skill docs to require running discovery, changes parameter docs to recommend discovery-first model selection, and removes hardcoded model guidance.
Testing: Adapters, Resolver, Integration
(many files) tests/orchestrator/test_cli_adapters.py, tests/orchestrator/test_cli_runtime.py, tests/orchestrator/*, tests/test-* etc.
Adds extensive tests: OpenCode error-event mapping, stricter opencode/codex resolution expectations, list-subagent-models unit/integration/bash tests, TOML converter tests, alias-map drop tests, docstring surface tests, and several new bash orchestration/umbrella runners and project-structure updates.
Dev Dependency
computer-use-server/requirements.txt
Adds pytest>=8.0.0 for the new Python test modules.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Caller as MCP Caller
    participant SubAgent as sub_agent MCP Tool
    participant Resolver as cli_runtime.resolve_subagent_model
    participant Adapter as cli_adapters (Opencode/Codex/Claude)
    participant CLI as Sub-Agent CLI (subprocess)
    
    Caller->>SubAgent: call sub_agent(task, model="")
    SubAgent->>SubAgent: resolve_cli() → SUBAGENT_CLI env
    SubAgent->>Resolver: resolve_subagent_model("", cli)
    alt caller model empty + no env default
        Resolver-->>SubAgent: ValueError (must use list-subagent-models)
    else env default or alias mapped
        Resolver-->>SubAgent: resolved model id
    end
    SubAgent->>CLI: dispatch(model, task, ...)
    CLI->>Adapter: stdout/stderr events
    alt Opencode: error event in stream
        Adapter->>Adapter: detect {"type":"error"} -> is_error=True, extract message
        Adapter-->>SubAgent: SubAgentResult(is_error=True, text="opencode error: ...")
    else normal completion
        Adapter-->>SubAgent: SubAgentResult(is_error=False, text=..., cost_usd=...)
    end
    SubAgent-->>Caller: {text, is_error, cost_usd, ...}
Loading
sequenceDiagram
    autonumber
    participant User as Operator/Developer
    participant Discovery as list-subagent-models CLI
    participant CliDefaults as /opt/cli-defaults/*.json
    participant EnvVar as Environment (OPENCODE_CONFIG_EXTRA, CODEX_CONFIG_EXTRA)
    participant Output as stdout JSON
    
    User->>Discovery: list-subagent-models [--inside-container]
    
    alt Claude mode
        Discovery-->>Output: {cli: "claude", models: [...], default_model, aliases: {...}}
    else Opencode host-mode
        Discovery->>EnvVar: read OPENCODE_CONFIG_EXTRA
        alt env override provided
            Discovery->>Discovery: parse JSON, extract model
        else no override
            Discovery->>CliDefaults: read opencode.json and strip _* keys
            Discovery->>Discovery: extract model
        end
        Discovery->>EnvVar: read OPENCODE_MODEL_ALIASES and merge
        Discovery-->>Output: {models: [model_id], default_model, aliases: {...}, source}
    else Codex host-mode
        Discovery->>EnvVar: read CODEX_CONFIG_EXTRA (TOML) or CliDefaults
        Discovery->>Discovery: convert JSON->TOML if needed, extract default_model
        Discovery-->>Output: {models: [model_id|null], default_model, source}
    else --inside-container + CLI present
        Discovery->>CLI: run opencode/codex subprocess
        Discovery->>Discovery: parse output, collect model IDs
        Discovery-->>Output: {models: [...], default_model, source: "cli_subprocess"}
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🐰 Hop hop, the defaults now dance by CLI,
No more Claude whispers when Codex walks by,
Error events sing their tune loud and clear,
Models are found — no more guessing here!

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/sub-agent-multi-cli-cleanup

…e/cli + tests

CI surfaced 6 test failures from Phase 1/2 contract change:

1. /api/runtime/cli endpoint called resolve_subagent_model('', cli) without
   try/except. Phase 2 D-02 made opencode/codex raise ValueError when no
   default env set — endpoint crashed with 500 instead of returning a
   clean default_model: null. Fix: catch ValueError, return null so the
   Preview SPA badge can render 'no default — set <CLI>_SUB_AGENT_DEFAULT_MODEL'.

2. test_runtime_cli_endpoint.test_codex / test_opencode asserted
   default_model truthy. Updated to assertIsNone since no env is set in
   the test client.

3. test_sub_agent_dispatch.test_dispatch_routes_to_correct_adapter and
   test_cost_rendering_unavailable_for_none called sub_agent without a
   model arg; Phase 2 made resolve_subagent_model raise before
   fake_dispatch was reached. Fix: set per-CLI env in the test setup so
   resolution succeeds and dispatch is exercised.

Local sweep: 344 pytest passed, 6/6 umbrella, 1 pre-existing test_filter
failure (Open WebUI filter, untouched by Phase 1/2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
computer-use-server/cli_adapters/opencode.py (1)

107-148: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Error message can be silently overwritten by events arriving after the error event.

continue on Line 134 only exits the current loop iteration for the "error" event itself. If opencode emits further events after that error event (e.g., a "step-finish" with a non-empty text_field), Lines 137–148 will overwrite last_message_text, replacing the extracted error message while is_error stays True. The caller would receive is_error=True with a non-error text body — defeating the intent stated in the docstring comment.

🛡️ Proposed fix: track error state and guard text extraction
     events: list[dict] = []
     last_message_text = ""
     cost_usd: float | None = None
     is_error = (returncode != 0)
+    error_event_seen = False

     for line in stdout.strip().split("\n"):
         line = line.strip()
         if not line:
             continue
         try:
             event = json.loads(line)
         except json.JSONDecodeError:
             continue
         events.append(event)
         et = event.get("type", "")

         # Application-level error events (opencode exits rc=0 but emits these).
         # Must be checked first so the error branch wins over any text extraction
         # that might otherwise overwrite last_message_text with empty/garbage.
         if et == "error":
             is_error = True
+            error_event_seen = True
             err_data = event.get("data", {}) or {}
             if isinstance(err_data, dict):
                 err_msg = err_data.get("message") or str(err_data)
             else:
                 err_msg = str(err_data)
             last_message_text = f"opencode error: {err_msg}"
             continue

         # Capture the final assistant text (last seen wins).
-        if et in ("assistant-message-completed", "step-finish", "message-completed"):
+        if not error_event_seen and et in ("assistant-message-completed", "step-finish", "message-completed"):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@computer-use-server/cli_adapters/opencode.py` around lines 107 - 148, The
loop can overwrite the extracted error message because later events still update
last_message_text; to fix, after detecting an "error" event in the for-loop
(where you set is_error = True and last_message_text = ...), prevent subsequent
event handlers from clobbering that value by guarding the text-extraction branch
(the block that reads et in ("assistant-message-completed", "step-finish",
"message-completed") and computes text_field) with a check for the error state
(e.g., if is_error: continue or skip assignment) or by storing the error text in
a separate error_message variable and returning that when is_error is True;
refer to symbols last_message_text, is_error, et, and text_field to locate where
to add the guard.
skills/public/sub-agent/SKILL.md (1)

30-33: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make the warning runtime-neutral.

The file now documents claude, opencode, and codex, but this warning still says every sub_agent call spawns a Claude session. That is misleading for non-Claude runtimes.

Suggested fix
-WARNING: Each sub_agent call spawns a SEPARATE Claude CLI session consuming significant API resources. Use as a LAST RESORT.
+WARNING: Each sub_agent call spawns a SEPARATE sub-agent CLI session consuming significant API resources. Use as a LAST RESORT.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/public/sub-agent/SKILL.md` around lines 30 - 33, The warning in
SKILL.md incorrectly hardcodes "Claude" as the runtime started by each sub_agent
call; update the warning to be runtime-neutral by replacing references to
"Claude CLI session" with a generic phrase such as "a separate sub-agent session
in the configured runtime (e.g., claude, opencode, codex)", and ensure the
paragraph around the header "When to Use (ONLY complex CODE tasks requiring 10+
iterative tool calls)" and the term "sub_agent" reflect that the resource cost
depends on the chosen runtime rather than always being Claude.
🧹 Nitpick comments (3)
computer-use-server/cli_runtime.py (1)

103-112: 💤 Low value

Stale docstring reference to "Anthropic baseline".

The docstring mentions "Built-in keys remain as the Anthropic baseline unless overridden" (line 107), but _OPENCODE_ALIAS_MAP is now an empty dict (line 100). The baseline no longer exists, so this comment is misleading.

📝 Suggested docstring fix
 def _merge_opencode_alias_map() -> dict[str, str]:
     """Return _OPENCODE_ALIAS_MAP merged with operator overrides from
     OPENCODE_MODEL_ALIASES env (JSON object string).
 
-    Built-in keys remain as the Anthropic baseline unless overridden. Malformed
+    Aliases come exclusively from OPENCODE_MODEL_ALIASES env. Malformed
     JSON, non-dict payloads, or non-string values are logged and ignored — never
     silently half-merged. Read at call time so env changes take effect without
     restart. See D-07 (CONTEXT.md) and OPENCODE_MODEL_ALIASES note in
     docs/multi-cli.md (deferred docs phase).
     """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@computer-use-server/cli_runtime.py` around lines 103 - 112, Update the
docstring for _merge_opencode_alias_map to remove the stale "Anthropic baseline"
reference and instead describe current behavior given that _OPENCODE_ALIAS_MAP
may be empty: state that built-in keys from _OPENCODE_ALIAS_MAP are preserved
unless overridden by the OPENCODE_MODEL_ALIASES env JSON, and that malformed
JSON, non-dict payloads, or non-string values are logged and ignored; keep the
notes about reading at call time and references to CONTEXT.md/docs/multi-cli.md
intact.
tests/test_subagent_docstring.py (1)

167-167: 💤 Low value

Remove redundant import.

re is already imported at module level (line 12), so this inner import is unnecessary.

Proposed fix
-        import re
         # The legacy line was "- SUB_AGENT_DEFAULT_MODEL: Default model ..."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_subagent_docstring.py` at line 167, Remove the redundant inner
import of the "re" module in tests/test_subagent_docstring.py (the duplicate
import inside the test body); since "re" is already imported at module level,
delete the inner "import re" statement so the test uses the top-level import
instead.
tests/test_default_resolver_no_legacy_global.py (1)

81-84: 💤 Low value

Consider using cr.Cli instead of re-importing after fresh module load.

Each test calls _fresh_cli_runtime(monkeypatch) which returns the freshly imported module as cr, then immediately does from cli_runtime import Cli. Since the module is already loaded, you could use cr.Cli directly and avoid the redundant import statement.

Example
 def test_resolve_claude_empty_no_env_returns_sonnet(monkeypatch):
     """D-01: claude falls back to 'sonnet' when no caller model and no env."""
     _scrub_all_model_envs(monkeypatch)
     cr = _fresh_cli_runtime(monkeypatch)
-    from cli_runtime import Cli
-    model_id, display = cr.resolve_subagent_model("", Cli.CLAUDE)
+    model_id, display = cr.resolve_subagent_model("", cr.Cli.CLAUDE)

Also applies to: 96-99, 107-110, 121-127, 135-138, 145-148

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_default_resolver_no_legacy_global.py` around lines 81 - 84, The
test redundantly re-imports Cli after loading a fresh module; update each
occurrence (e.g. lines using "from cli_runtime import Cli") to reference the
freshly imported module's attribute (use cr.Cli) instead: in tests that call
_fresh_cli_runtime(monkeypatch) and receive cr, replace the direct import with
cr.Cli when invoking resolve_subagent_model and related calls (e.g.,
resolve_subagent_model("", cr.Cli.CLAUDE) or using cr.Cli where the imported Cli
was used) to avoid duplicate imports and ensure the test uses the fresh module
instance.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@computer-use-server/docker_manager.py`:
- Around line 71-78: The three per-CLI env constants
CLAUDE_SUB_AGENT_DEFAULT_MODEL, CODEX_SUB_AGENT_DEFAULT_MODEL, and
OPENCODE_SUB_AGENT_DEFAULT_MODEL in docker_manager.py are defined but unused;
either remove them to avoid dead code, or explicitly document they're present
solely for configuration/documentation and ensure the resolver in cli_runtime.py
uses these constants (import them) instead of calling os.getenv directly; pick
one approach and apply consistently so the symbols are either removed or
intentionally exported and referenced.

In `@computer-use-server/mcp_tools.py`:
- Around line 960-962: When resolving an implicit default model, the code
currently calls resolve_subagent_model("", cli) and drops the returned display
name, which causes the friendly name to be lost; update the branch where model
is falsy to capture both returned values (e.g., model_id and display_name) and
assign the resolver's display_name into the variable used by the result/response
banner (the same place other paths put the friendly name) so that implicit
defaults preserve the human-friendly name in completion banners and result
blobs; specifically, keep the tuple return from resolve_subagent_model and
propagate the display_name instead of discarding it.

In `@tests/test_codex_toml_converter.py`:
- Around line 68-70: The test shows _to_toml_value(None) returns the string
"null", which is not valid TOML; change _to_toml_value to explicitly guard
against None by detecting value is None and raising a clear exception (e.g.,
ValueError or TypeError) with a message indicating TOML has no null literal (or
alternatively update the function docstring to explicitly forbid None inputs),
and update any callers/tests to expect the exception; reference the
_to_toml_value function to locate and modify the logic that currently uses
json.dumps().

In `@tests/test_subagent_docstring.py`:
- Around line 152-155: Rename the ambiguous single-letter variable `l` used in
the list comprehension that builds `code_lines` to a clearer name like `line` to
satisfy static analysis (E741) and improve readability; update the list
comprehension expression "l for l in self.src.splitlines()" to "line for line in
self.src.splitlines()" and ensure any other occurrences in that test (such as
the filter "if ... in l" and the f-string capture) use the new `line` identifier
consistently.

In `@tests/test-list-subagent-models.sh`:
- Line 136: The trap on EXIT currently set to remove "$TMPDIR_HOSTS" is being
overwritten later by another trap for "$EMPTY_TMPDIR"; update the script to
ensure both temp dirs are cleaned by either (a) replacing both separate traps
with a single trap that removes "$TMPDIR_HOSTS" and "$EMPTY_TMPDIR", or (b)
create a cleanup function (e.g., cleanup) that runs rm -rf "$TMPDIR_HOSTS"
"$EMPTY_TMPDIR" and have the trap call that function, then remove the explicit
rm -rf "$EMPTY_TMPDIR" invocation so cleanup is only handled by the trap.

In `@tests/test-skill-no-hardcoded-models.sh`:
- Around line 38-40: The grep pipeline that computes hits is suppressing matches
by excluding 'list-subagent-models', causing false negatives; remove the
unnecessary exclusion from the pipeline so the command that sets hits (the grep
-nE "$FORBIDDEN_PATTERN" ... | grep -v 'canonical-example' | grep -v
'list-subagent-models' ...) no longer filters out 'list-subagent-models'—leave
the canonical-example exemption if intended but delete the grep -v
'list-subagent-models' segment so FORBIDDEN_PATTERN can surface real violations.

---

Outside diff comments:
In `@computer-use-server/cli_adapters/opencode.py`:
- Around line 107-148: The loop can overwrite the extracted error message
because later events still update last_message_text; to fix, after detecting an
"error" event in the for-loop (where you set is_error = True and
last_message_text = ...), prevent subsequent event handlers from clobbering that
value by guarding the text-extraction branch (the block that reads et in
("assistant-message-completed", "step-finish", "message-completed") and computes
text_field) with a check for the error state (e.g., if is_error: continue or
skip assignment) or by storing the error text in a separate error_message
variable and returning that when is_error is True; refer to symbols
last_message_text, is_error, et, and text_field to locate where to add the
guard.

In `@skills/public/sub-agent/SKILL.md`:
- Around line 30-33: The warning in SKILL.md incorrectly hardcodes "Claude" as
the runtime started by each sub_agent call; update the warning to be
runtime-neutral by replacing references to "Claude CLI session" with a generic
phrase such as "a separate sub-agent session in the configured runtime (e.g.,
claude, opencode, codex)", and ensure the paragraph around the header "When to
Use (ONLY complex CODE tasks requiring 10+ iterative tool calls)" and the term
"sub_agent" reflect that the resource cost depends on the chosen runtime rather
than always being Claude.

---

Nitpick comments:
In `@computer-use-server/cli_runtime.py`:
- Around line 103-112: Update the docstring for _merge_opencode_alias_map to
remove the stale "Anthropic baseline" reference and instead describe current
behavior given that _OPENCODE_ALIAS_MAP may be empty: state that built-in keys
from _OPENCODE_ALIAS_MAP are preserved unless overridden by the
OPENCODE_MODEL_ALIASES env JSON, and that malformed JSON, non-dict payloads, or
non-string values are logged and ignored; keep the notes about reading at call
time and references to CONTEXT.md/docs/multi-cli.md intact.

In `@tests/test_default_resolver_no_legacy_global.py`:
- Around line 81-84: The test redundantly re-imports Cli after loading a fresh
module; update each occurrence (e.g. lines using "from cli_runtime import Cli")
to reference the freshly imported module's attribute (use cr.Cli) instead: in
tests that call _fresh_cli_runtime(monkeypatch) and receive cr, replace the
direct import with cr.Cli when invoking resolve_subagent_model and related calls
(e.g., resolve_subagent_model("", cr.Cli.CLAUDE) or using cr.Cli where the
imported Cli was used) to avoid duplicate imports and ensure the test uses the
fresh module instance.

In `@tests/test_subagent_docstring.py`:
- Line 167: Remove the redundant inner import of the "re" module in
tests/test_subagent_docstring.py (the duplicate import inside the test body);
since "re" is already imported at module level, delete the inner "import re"
statement so the test uses the top-level import instead.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: af8e18cb-93bd-40df-b298-104f17e95cd2

📥 Commits

Reviewing files that changed from the base of the PR and between d29621e and 0067a19.

📒 Files selected for processing (29)
  • Dockerfile
  • computer-use-server/bin/list-subagent-models
  • computer-use-server/cli-defaults/README.md
  • computer-use-server/cli-defaults/codex.json
  • computer-use-server/cli-defaults/opencode.json
  • computer-use-server/cli_adapters/opencode.py
  • computer-use-server/cli_runtime.py
  • computer-use-server/docker_manager.py
  • computer-use-server/mcp_tools.py
  • computer-use-server/requirements.txt
  • skills/public/sub-agent/SKILL.md
  • skills/public/sub-agent/references/usage.md
  • skills/public/sub-agent/scripts/list_subagent_models.sh
  • tests/orchestrator/test_cli_adapters.py
  • tests/orchestrator/test_cli_runtime.py
  • tests/orchestrator/test_sub_agent_dispatch.py
  • tests/orchestrator/test_subagent_claude_compat.py
  • tests/test-default-model-resolution.py
  • tests/test-docker-image.sh
  • tests/test-list-subagent-models.sh
  • tests/test-opencode-error-mapping.py
  • tests/test-project-structure.sh
  • tests/test-skill-no-hardcoded-models.sh
  • tests/test-subagent-cli-surface.py
  • tests/test-subagent-runtime.sh
  • tests/test_codex_toml_converter.py
  • tests/test_default_resolver_no_legacy_global.py
  • tests/test_opencode_alias_map_drop.py
  • tests/test_subagent_docstring.py

Comment thread computer-use-server/docker_manager.py Outdated
Comment thread computer-use-server/mcp_tools.py Outdated
Comment thread tests/test_codex_toml_converter.py Outdated
Comment thread tests/test_subagent_docstring.py Outdated
Comment thread tests/test-list-subagent-models.sh Outdated
Comment thread tests/test-skill-no-hardcoded-models.sh Outdated
…thon deps)

CI Test job runs only bash scripts (no setup-python step). The Phase 1
umbrella tests/test-subagent-runtime.sh runs 4 pytest sub-tests, which
require mcp/cli_runtime imports — pip install never ran in this job, so
the umbrella crashed with ModuleNotFoundError.

Split:
- tests/test-project-structure.sh now invokes ONLY the bash sub-tests
  directly (test-skill-no-hardcoded-models.sh, test-list-subagent-models.sh)
- pytest sub-tests are exercised by the existing 'Pytest — orchestrator' CI
  job which installs Python deps from requirements.txt
- tests/test-subagent-runtime.sh stays as a local-dev convenience runner
  that exercises everything end-to-end when Python deps are available

Project-structure now reports 24/24 (was 23/13 before, with [13/13] failing
in CI due to missing deps).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/orchestrator/test_runtime_cli_endpoint.py`:
- Around line 72-90: Update the module-level docstring to reflect that for CLIs
"codex" and "opencode" the "default_model" field may be null (None) when the
corresponding env vars (CODEX_SUB_AGENT_DEFAULT_MODEL /
OPENCODE_SUB_AGENT_DEFAULT_MODEL) are not set; locate the docstring near the top
of this test module and change any statement claiming "default_model must be a
non-empty string" to state it can be a string or null/None for these CLIs (refer
to test_codex and test_opencode and the default_model assertions to guide the
wording).

In `@tests/test-project-structure.sh`:
- Line 155: The step counter in tests/test-project-structure.sh is incorrect:
update the echoed label in the line that currently prints "[12/13] No werf.yaml"
to reflect the new total of 14 checks by changing it to "[12/14] No werf.yaml"
so the numbered output stays consistent with the actual sequence.
- Around line 162-182: The test-suite added by invoking
test-list-subagent-models.sh from test-project-structure.sh pulls in Python
(calls to python3 inside tests/test-list-subagent-models.sh) which violates the
shell-only constraint of the "test" job; either ensure the "test" job in
.github/workflows/build.yml installs Python (add a setup-python step) or move
the bash call to the pytest-orchestrator job that already configures Python.
Locate the invocation of test-list-subagent-models.sh in
test-project-structure.sh and either (A) update the workflow job named "test" to
include a setup-python action so python3 is available for scripts that call
python3, or (B) remove the call from test-project-structure.sh and add it to the
pytest-orchestrator job so test-list-subagent-models.sh runs where Python deps
are guaranteed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 580bc98c-9421-47f4-b993-4ce80e6455af

📥 Commits

Reviewing files that changed from the base of the PR and between 0067a19 and c1e72d0.

📒 Files selected for processing (4)
  • computer-use-server/app.py
  • tests/orchestrator/test_runtime_cli_endpoint.py
  • tests/orchestrator/test_sub_agent_dispatch.py
  • tests/test-project-structure.sh

Comment thread tests/orchestrator/test_runtime_cli_endpoint.py
Comment thread tests/test-project-structure.sh Outdated
Comment thread tests/test-project-structure.sh Outdated
Comment on lines +162 to +182
# 13. Sub-agent runtime — shell-only sub-tests (Phase 1 + Phase 2)
# NB: pytest sub-tests live in tests/orchestrator/ and are run by the
# "Pytest — orchestrator" CI job which installs Python deps. The Test job
# (this script) runs ONLY bash sub-tests so it stays self-contained and
# doesn't require a Python toolchain. The full umbrella
# (tests/test-subagent-runtime.sh) is for local dev convenience only.
echo ""
echo "[13/14] Sub-agent skill audit (no hardcoded model names)"
if bash "$(dirname "$0")/test-skill-no-hardcoded-models.sh" >/tmp/skill-audit.log 2>&1; then
pass "test-skill-no-hardcoded-models.sh exits 0"
else
fail "test-skill-no-hardcoded-models.sh exits non-zero — see /tmp/skill-audit.log"
fi

echo ""
echo "[14/14] list-subagent-models script invocation"
if bash "$(dirname "$0")/test-list-subagent-models.sh" >/tmp/list-subagent.log 2>&1; then
pass "test-list-subagent-models.sh exits 0"
else
fail "test-list-subagent-models.sh exits non-zero — see /tmp/list-subagent.log"
fi
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the test-list-subagent-models.sh file
find . -name "test-list-subagent-models.sh" -type f

Repository: Yambr/open-computer-use

Length of output: 104


🏁 Script executed:

# Also find the CI configuration
find . -name "*.yml" -o -name "*.yaml" | grep -E "(github|ci|action)" | head -20

Repository: Yambr/open-computer-use

Length of output: 377


🏁 Script executed:

# Check the test-project-structure.sh file around the mentioned lines
cat -n tests/test-project-structure.sh | sed -n '150,190p'

Repository: Yambr/open-computer-use

Length of output: 1762


🏁 Script executed:

# Examine the test-list-subagent-models.sh file
cat -n tests/test-list-subagent-models.sh

Repository: Yambr/open-computer-use

Length of output: 18611


🏁 Script executed:

# Check the build workflow to see what's configured for the project-structure job
cat .github/workflows/build.yml

Repository: Yambr/open-computer-use

Length of output: 8507


This test reintroduces a Python dependency into a shell-only job.

Lines 69, 81, 89, 106, 114, 142, 150, 171, 179, 206, 225, 233, 251, 259, 281, 289, 309, 317, 340, 348, 372, and 380 of tests/test-list-subagent-models.sh call python3 extensively for test assertions and validation. The test job in .github/workflows/build.yml does not include a setup-python step and should remain self-contained per the inline comment (lines 164–166 of test-project-structure.sh). Either add explicit Python setup to the test job or move test-list-subagent-models.sh execution to the pytest-orchestrator job, which already configures Python dependencies.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test-project-structure.sh` around lines 162 - 182, The test-suite added
by invoking test-list-subagent-models.sh from test-project-structure.sh pulls in
Python (calls to python3 inside tests/test-list-subagent-models.sh) which
violates the shell-only constraint of the "test" job; either ensure the "test"
job in .github/workflows/build.yml installs Python (add a setup-python step) or
move the bash call to the pytest-orchestrator job that already configures
Python. Locate the invocation of test-list-subagent-models.sh in
test-project-structure.sh and either (A) update the workflow job named "test" to
include a setup-python action so python3 is available for scripts that call
python3, or (B) remove the call from test-project-structure.sh and add it to the
pytest-orchestrator job so test-list-subagent-models.sh runs where Python deps
are guaranteed.

…overage

CodeRabbit findings (9 comments):

1. (major) docker_manager.py:71-78 — three per-CLI default-model env constants
   were unused dead code (cli_runtime.py reads env directly). REMOVED. Updated
   test_docker_manager_has_claude_sub_agent_default_model →
   test_docker_manager_has_no_per_cli_constants asserting absence.

2. (minor) mcp_tools.py:962 — implicit-default resolution discarded the
   resolver's display_name, so the completion banner showed full model id
   ('claude-sonnet-4-6') instead of the friendly alias ('sonnet'). Now
   captured as default_display_name and overrides model_display after dispatch.

3. (minor) test_codex_toml_converter.py + Dockerfile — _to_toml_value(None)
   used to silently produce 'null' (invalid TOML). Both reference helper and
   Dockerfile inline copy now raise ValueError explicitly. Test updated.

4. (minor) test_subagent_docstring.py:152 — renamed ambiguous variable l → line.

5. (minor) test-list-subagent-models.sh:136 — second trap on EXIT was
   overwriting the first, leaking TMPDIR_HOSTS. Now uses a unified
   cleanup_tmpdirs() function called by a single trap.

6. (minor) test-skill-no-hardcoded-models.sh:40 — removed the
   'list-subagent-models' grep -v exemption that was hiding real violations
   (FORBIDDEN_PATTERN doesn't match the discovery command anyway).

7. (minor) test_runtime_cli_endpoint.py:90 — module docstring still claimed
   default_model was a non-empty string; updated to reflect the Phase 2
   contract where opencode/codex return null when no env is set.

8. (minor) test-project-structure.sh:155 — fixed [12/13] → [12/14] step counter
   after splitting the umbrella into two bash subtests.

9. (major) test-project-structure.sh:182 — comment claimed shell-only but the
   sub-tests shell out to python3 (stdlib only — JSON validation). Updated
   the comment to be accurate (stdlib python3 is preinstalled on every
   ubuntu-latest runner; only third-party deps belong in the pytest job).

Smoke coverage additions for Phase 2 (per user 'проверь что все смоками покрыто'):

10. test-docker-image.sh — list-subagent-models is now actually invoked inside
    the image (claude path: assert valid JSON with cli=claude and >=3 models;
    opencode path: assert valid JSON from canonical cli-defaults fallback).
    Catches runtime errors that file-presence checks miss.

11. test-mcp-endpoint-live.sh — added /api/runtime/cli probe that asserts
    the response is 200 with contract-stable JSON (cli, default_model
    str|null, supports_cost bool). Phase 2 D-02 dropped opencode/codex
    hardcoded fallback; the endpoint must catch ValueError, not crash 500.

Local sweep: 152 pytest passed, 6/6 umbrella, 24/24 structure, 14/14 no-corp.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant