Multi-CLI sub-agent runtime cleanup (closes #90, #91)#92
Multi-CLI sub-agent runtime cleanup (closes #90, #91)#92
Conversation
- Creates computer-use-server/bin/list-subagent-models (Python, BUSL-1.1) - Dispatches per SUBAGENT_CLI: claude (static alias_map), opencode (opencode models), codex (codex debug models --bundled) - Outputs stable JSON to stdout on success; structured JSON error to stderr on failure - Exit codes: 0=success, 2=CLI missing/unknown, 3=parse/exec error - Includes whitelist validation (T-01-01) and subprocess list-form argv (T-01-02) - Truncates stderr to 200 chars to avoid info leakage (T-01-03) - 30s timeout on all subprocesses (T-01-04)
- 9 tests covering et == 'error' branch (5 failing pre-fix) - Tests verify: is_error promotion, message surfacing, null/missing data fallbacks - Tests verify existing success/rc paths are preserved (4 passing pre-fix) - REQ-MCP-03 acceptance criteria
…lve_opencode
- Add json + logging imports; add _LOG = logging.getLogger('cli_runtime')
- Add _merge_opencode_alias_map() after _OPENCODE_ALIAS_MAP: merges built-in
Anthropic alias baseline with operator overrides from OPENCODE_MODEL_ALIASES
env (JSON object string); read at call time so env changes take effect without
restart (D-07)
- Validate override: malformed JSON, non-dict payload, non-string entries are
each rejected with a logged warning; built-in map preserved on any error
- Update _resolve_opencode to call _merge_opencode_alias_map() at call time
instead of referencing module-level _OPENCODE_ALIAS_MAP directly
- Built-in _OPENCODE_ALIAS_MAP (Anthropic baseline) preserved unchanged
…loop
- Handles {"type":"error","data":{...}} events that opencode emits with rc=0
- Promotes is_error=True even when subprocess exit code is 0 (defense-in-depth)
- Surfaces data.message as 'opencode error: <msg>' instead of dumping raw NDJSON
- Falls back to str(data) when message key is missing or data is non-dict/null
- continue skips text-extraction branches for error events (error wins)
- Existing rc-based is_error initialization at line 110 preserved (D-12)
- codex adapter untouched (D-13)
- Fixes Bug #91: opencode errors no longer appear as green success banners
… skill - Create skills/public/sub-agent/scripts/list_subagent_models.sh - Thin Bash wrapper mirroring gitlab-explorer/check_gitlab_auth.sh pattern - MIT SPDX header per CLAUDE.md rule for skills/public/sub-agent/ - Fails fast with exit 127 if list-subagent-models is not on PATH - Execs canonical Python tool (single source of truth, no logic duplication) - Banner echo to stderr (stdout reserved for clean JSON pipe)
- Add '## Before Any sub_agent Call' as first H2 (mirrors gitlab-explorer pattern) - Update frontmatter description to mention model discovery script - Section includes: bash command block, 'This script will:' bullets, Why this matters paragraph with per-CLI model id syntax, default behavior per CLI table - References OPENCODE_MODEL_ALIASES, OPENCODE_SUB_AGENT_DEFAULT_MODEL, CODEX_SUB_AGENT_DEFAULT_MODEL - Satisfies REQ-MCP-05 / D-14 acceptance criteria
- COPY computer-use-server/bin/list-subagent-models to /usr/local/bin/ - chmod +x to make it executable inside the sandbox image - Placed alongside extract-text COPY+chmod pattern (lines 590-595) - computer-use-server/Dockerfile NOT modified (server image unchanged) - Satisfies REQ-MCP-04: list-subagent-models reachable via command -v in sandbox
…ng registration
- Add _SUBAGENT_DOC_CLAUDE, _SUBAGENT_DOC_OPENCODE, _SUBAGENT_DOC_CODEX constants
- Add _subagent_docstring_for_cli(cli) helper that selects variant by CLI
- Remove @mcp.tool() decorator from sub_agent; replace with explicit mcp.add_tool()
- Assign sub_agent.__doc__ before mcp.add_tool() so FastMCP captures correct text
- Route default-model fallback through resolve_subagent_model('', resolve_cli())
- Add tests/test_subagent_docstring.py and test_subagent_deprecation.py
- Add one-shot deprecation warning for SUB_AGENT_DEFAULT_MODEL when SUBAGENT_CLI is opencode/codex
Reconstructed from worktree-agent-a6239975e979b2349 commits 610fde9 + 48aef3a; original cherry-pick would have deleted wave-1 files (worktree was based on pre-wave-1 commit) so additive files were extracted surgically.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- computer-use-server/requirements.txt: add pytest>=8.0.0 - tests/test-subagent-cli-surface.py: per-CLI docstring assertions (8 tests) - tests/test-default-model-resolution.py: D-08/D-09 resolution order (8 tests) - tests/test-list-subagent-models.sh: script invocation per CLI (exit code + JSON)
…roject-structure.sh - tests/test-subagent-runtime.sh: umbrella runner for all 6 sub-agent test suites - tests/test-project-structure.sh: add [13/13] section invoking umbrella
…to test-docker-image.sh - Adds dedicated PATH presence check (which list-subagent-models) - Adds executability check (test -x) after the CLI tools loop - Mirrors Phase 1 Plan 01-05 Dockerfile install line verification
…ADME - Extract opencode canonical config verbatim from Dockerfile:484-500 heredoc - Add codex.json as structured JSON baseline (empty model_providers = public OpenAI defaults) - Both JSON files carry _spdx/_copyright keys per D-10 BUSL-1.1 convention - README documents the _spdx key convention and consumer strip requirement - Single source of truth for Plan 02-04 (Dockerfile refactor) and Plan 02-05 (list-subagent-models)
…as (D-05/D-06)
- _OPENCODE_ALIAS_MAP = {} — no Anthropic baseline (D-05)
- _resolve_opencode raises ValueError with actionable message when alias not
in map (D-06) and when no default env is configured (D-02)
- Provider/model strings containing '/' still pass through unchanged
- New tests/test_opencode_alias_map_drop.py: 6 tests covering D-05/D-06
- Updated test_cli_runtime.py: replaced 4 stale opencode tests with
D-05/D-06 behavior (raises on missing alias, raises on empty with no env)
…ine + raise on missing alias)
…inside-container flag
- Default mode reads OPENCODE_CONFIG_EXTRA env (JSON) or cli-defaults/opencode.json
- codex mode reads CODEX_CONFIG_EXTRA env (TOML via tomllib) or cli-defaults/codex.json
- claude path unchanged: static alias list, always succeeds
- Add --inside-container flag to preserve Phase 1 CLI-shelling behavior (D-12)
- Add OPENCODE_MODEL_ALIASES env merge into top-level aliases{} output (D-11)
- Add _strip_spdx() to remove _spdx/_copyright/_doc keys before consumption
- Add _resolve_defaults_path(): LIST_SUBAGENT_CLI_DEFAULTS_DIR > /opt > repo-local
- All error paths emit structured stderr JSON, exit 2 (parse error) or 3 (missing file)
- Extend tests with 13 sections (24 assertions) covering all new modes
…CLI explicit defaults D-01..D-04 from .planning/phases/02-sub-agent-runtime-cleanup/02-CONTEXT.md: - Remove SUB_AGENT_DEFAULT_MODEL global from docker_manager.py (no deprecation, removed) - Delete tests/test_subagent_deprecation.py (warning is gone) - claude default stays 'sonnet' when no caller model and CLAUDE_SUB_AGENT_DEFAULT_MODEL unset - opencode/codex raise ValueError with actionable message when no caller model and no <CLI>_SUB_AGENT_DEFAULT_MODEL env - mcp_tools.py call site handles new ValueError as MCP tool error (no server crash) - Update orchestrator tests (test_sub_agent_dispatch, test_subagent_claude_compat) to scrub deprecated env - Add tests/test_default_resolver_no_legacy_global.py covering D-01..D-04 paths - Update tests/test-subagent-runtime.sh umbrella Reconstructed from worktree-agent-a5518713265b43649 (agent timed out at the commit step; all work was complete in the working tree, 70/70 tests pass). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
D-09: replace inline heredoc for opencode config with a file read from /opt/cli-defaults/opencode.json (COPY'd into image). _spdx/_copyright underscore-prefixed keys are stripped via python3 json.pop before writing /tmp/opencode.json. Codex baseline now reads codex.json and converts to TOML (empty model_providers -> empty TOML, same behavior as previous `: > config.toml` path). OPENAI_BASE_URL custom-gateway path and all *_CONFIG_EXTRA operator overrides preserved verbatim. - COPY computer-use-server/cli-defaults/ /opt/cli-defaults/ added - Inline opencode JSON heredoc (OCEOF tag) removed - Codex case: elif reads /opt/cli-defaults/codex.json when no OPENAI_BASE_URL - tests/test-docker-image.sh: new section [13/14] verifies file presence + JSON validity - All sections renumbered to /14 (was /12 and /13 inconsistently)
…li-defaults JSON)
Phase 1 tests asserted opencode/codex resolved to hardcoded defaults (anthropic/claude-sonnet-4-6 / gpt-5-codex) when no env override. Phase 2 D-02 explicitly drops those — opencode/codex now raise ValueError with actionable message pointing operator at list-subagent-models and the per-CLI env var. Two test functions renamed to reflect new contract: test_opencode_hardcoded_default → test_opencode_no_default_raises test_codex_hardcoded_default → test_codex_no_default_raises Module docstring updated to reflect the Phase 2 contract. 8/8 tests pass. Found during wave-2 cross-merge integration check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Remove (claude/opencode/codex) enum from SKILL.md frontmatter description - Replace Why-this-matters paragraph specific model examples with <provider>/<model-id> placeholders - Replace Default-behavior section explicit list with list-subagent-models pointer - Update Parameters table model row default from 'sonnet' to 'discover via list-subagent-models' - Wrap load-bearing claude baseline alias mentions in <!-- canonical-example --> markers - Replace model="opus" in usage.md Code Review template with placeholder - Replace Model Selection table with CLI-agnostic guidance + canonical-example markers - Update Before-You-Start model-selection bullet to reference list-subagent-models
- Create tests/test-skill-no-hardcoded-models.sh (D-16): greps skill files for forbidden hardcoded model tokens, excludes canonical-example markers and list-subagent-models references, exits non-zero on violations - Wire audit as step [6/6] in tests/test-subagent-runtime.sh umbrella - Renumber all umbrella steps from [N/5] to [N/6] - Update umbrella header comment to document 6 sub-tests
- codex: usage/token-cost extraction, empty event stream, error event in stream (documents rc-only detection), partial truncated JSON - opencode: multi-event stream (last-seen-wins), empty stdout, malformed JSON between valid events, error-event message extraction (Phase 1 D-11 regression guard) - claude: malformed JSON fallback, empty stdout rc=0, event with null result field - total: 30 tests collected (19 existing + 11 new), all PASSED
…ict values
Replace json.dumps(cv) with _to_toml_value(cv) helper that produces valid
TOML inline-table syntax {k = v} for nested dicts instead of JSON object
notation {"k": "v"} which tomllib rejects. Adds regression test covering
the nested-dict provider config case and confirming the old code was broken.
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (12)
📝 WalkthroughWalkthroughThis PR externalizes canonical CLI defaults to ChangesMulti-CLI Sub-Agent Runtime & Model Discovery Overhaul
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller as MCP Caller
participant SubAgent as sub_agent MCP Tool
participant Resolver as cli_runtime.resolve_subagent_model
participant Adapter as cli_adapters (Opencode/Codex/Claude)
participant CLI as Sub-Agent CLI (subprocess)
Caller->>SubAgent: call sub_agent(task, model="")
SubAgent->>SubAgent: resolve_cli() → SUBAGENT_CLI env
SubAgent->>Resolver: resolve_subagent_model("", cli)
alt caller model empty + no env default
Resolver-->>SubAgent: ValueError (must use list-subagent-models)
else env default or alias mapped
Resolver-->>SubAgent: resolved model id
end
SubAgent->>CLI: dispatch(model, task, ...)
CLI->>Adapter: stdout/stderr events
alt Opencode: error event in stream
Adapter->>Adapter: detect {"type":"error"} -> is_error=True, extract message
Adapter-->>SubAgent: SubAgentResult(is_error=True, text="opencode error: ...")
else normal completion
Adapter-->>SubAgent: SubAgentResult(is_error=False, text=..., cost_usd=...)
end
SubAgent-->>Caller: {text, is_error, cost_usd, ...}
sequenceDiagram
autonumber
participant User as Operator/Developer
participant Discovery as list-subagent-models CLI
participant CliDefaults as /opt/cli-defaults/*.json
participant EnvVar as Environment (OPENCODE_CONFIG_EXTRA, CODEX_CONFIG_EXTRA)
participant Output as stdout JSON
User->>Discovery: list-subagent-models [--inside-container]
alt Claude mode
Discovery-->>Output: {cli: "claude", models: [...], default_model, aliases: {...}}
else Opencode host-mode
Discovery->>EnvVar: read OPENCODE_CONFIG_EXTRA
alt env override provided
Discovery->>Discovery: parse JSON, extract model
else no override
Discovery->>CliDefaults: read opencode.json and strip _* keys
Discovery->>Discovery: extract model
end
Discovery->>EnvVar: read OPENCODE_MODEL_ALIASES and merge
Discovery-->>Output: {models: [model_id], default_model, aliases: {...}, source}
else Codex host-mode
Discovery->>EnvVar: read CODEX_CONFIG_EXTRA (TOML) or CliDefaults
Discovery->>Discovery: convert JSON->TOML if needed, extract default_model
Discovery-->>Output: {models: [model_id|null], default_model, source}
else --inside-container + CLI present
Discovery->>CLI: run opencode/codex subprocess
Discovery->>Discovery: parse output, collect model IDs
Discovery-->>Output: {models: [...], default_model, source: "cli_subprocess"}
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
✨ Finishing Touches🧪 Generate unit tests (beta)
|
…e/cli + tests
CI surfaced 6 test failures from Phase 1/2 contract change:
1. /api/runtime/cli endpoint called resolve_subagent_model('', cli) without
try/except. Phase 2 D-02 made opencode/codex raise ValueError when no
default env set — endpoint crashed with 500 instead of returning a
clean default_model: null. Fix: catch ValueError, return null so the
Preview SPA badge can render 'no default — set <CLI>_SUB_AGENT_DEFAULT_MODEL'.
2. test_runtime_cli_endpoint.test_codex / test_opencode asserted
default_model truthy. Updated to assertIsNone since no env is set in
the test client.
3. test_sub_agent_dispatch.test_dispatch_routes_to_correct_adapter and
test_cost_rendering_unavailable_for_none called sub_agent without a
model arg; Phase 2 made resolve_subagent_model raise before
fake_dispatch was reached. Fix: set per-CLI env in the test setup so
resolution succeeds and dispatch is exercised.
Local sweep: 344 pytest passed, 6/6 umbrella, 1 pre-existing test_filter
failure (Open WebUI filter, untouched by Phase 1/2).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
computer-use-server/cli_adapters/opencode.py (1)
107-148:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winError message can be silently overwritten by events arriving after the error event.
continueon Line 134 only exits the current loop iteration for the"error"event itself. If opencode emits further events after that error event (e.g., a"step-finish"with a non-emptytext_field), Lines 137–148 will overwritelast_message_text, replacing the extracted error message whileis_errorstaysTrue. The caller would receiveis_error=Truewith a non-error text body — defeating the intent stated in the docstring comment.🛡️ Proposed fix: track error state and guard text extraction
events: list[dict] = [] last_message_text = "" cost_usd: float | None = None is_error = (returncode != 0) + error_event_seen = False for line in stdout.strip().split("\n"): line = line.strip() if not line: continue try: event = json.loads(line) except json.JSONDecodeError: continue events.append(event) et = event.get("type", "") # Application-level error events (opencode exits rc=0 but emits these). # Must be checked first so the error branch wins over any text extraction # that might otherwise overwrite last_message_text with empty/garbage. if et == "error": is_error = True + error_event_seen = True err_data = event.get("data", {}) or {} if isinstance(err_data, dict): err_msg = err_data.get("message") or str(err_data) else: err_msg = str(err_data) last_message_text = f"opencode error: {err_msg}" continue # Capture the final assistant text (last seen wins). - if et in ("assistant-message-completed", "step-finish", "message-completed"): + if not error_event_seen and et in ("assistant-message-completed", "step-finish", "message-completed"):🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@computer-use-server/cli_adapters/opencode.py` around lines 107 - 148, The loop can overwrite the extracted error message because later events still update last_message_text; to fix, after detecting an "error" event in the for-loop (where you set is_error = True and last_message_text = ...), prevent subsequent event handlers from clobbering that value by guarding the text-extraction branch (the block that reads et in ("assistant-message-completed", "step-finish", "message-completed") and computes text_field) with a check for the error state (e.g., if is_error: continue or skip assignment) or by storing the error text in a separate error_message variable and returning that when is_error is True; refer to symbols last_message_text, is_error, et, and text_field to locate where to add the guard.skills/public/sub-agent/SKILL.md (1)
30-33:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMake the warning runtime-neutral.
The file now documents
claude,opencode, andcodex, but this warning still says everysub_agentcall spawns a Claude session. That is misleading for non-Claude runtimes.Suggested fix
-WARNING: Each sub_agent call spawns a SEPARATE Claude CLI session consuming significant API resources. Use as a LAST RESORT. +WARNING: Each sub_agent call spawns a SEPARATE sub-agent CLI session consuming significant API resources. Use as a LAST RESORT.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/public/sub-agent/SKILL.md` around lines 30 - 33, The warning in SKILL.md incorrectly hardcodes "Claude" as the runtime started by each sub_agent call; update the warning to be runtime-neutral by replacing references to "Claude CLI session" with a generic phrase such as "a separate sub-agent session in the configured runtime (e.g., claude, opencode, codex)", and ensure the paragraph around the header "When to Use (ONLY complex CODE tasks requiring 10+ iterative tool calls)" and the term "sub_agent" reflect that the resource cost depends on the chosen runtime rather than always being Claude.
🧹 Nitpick comments (3)
computer-use-server/cli_runtime.py (1)
103-112: 💤 Low valueStale docstring reference to "Anthropic baseline".
The docstring mentions "Built-in keys remain as the Anthropic baseline unless overridden" (line 107), but
_OPENCODE_ALIAS_MAPis now an empty dict (line 100). The baseline no longer exists, so this comment is misleading.📝 Suggested docstring fix
def _merge_opencode_alias_map() -> dict[str, str]: """Return _OPENCODE_ALIAS_MAP merged with operator overrides from OPENCODE_MODEL_ALIASES env (JSON object string). - Built-in keys remain as the Anthropic baseline unless overridden. Malformed + Aliases come exclusively from OPENCODE_MODEL_ALIASES env. Malformed JSON, non-dict payloads, or non-string values are logged and ignored — never silently half-merged. Read at call time so env changes take effect without restart. See D-07 (CONTEXT.md) and OPENCODE_MODEL_ALIASES note in docs/multi-cli.md (deferred docs phase). """🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@computer-use-server/cli_runtime.py` around lines 103 - 112, Update the docstring for _merge_opencode_alias_map to remove the stale "Anthropic baseline" reference and instead describe current behavior given that _OPENCODE_ALIAS_MAP may be empty: state that built-in keys from _OPENCODE_ALIAS_MAP are preserved unless overridden by the OPENCODE_MODEL_ALIASES env JSON, and that malformed JSON, non-dict payloads, or non-string values are logged and ignored; keep the notes about reading at call time and references to CONTEXT.md/docs/multi-cli.md intact.tests/test_subagent_docstring.py (1)
167-167: 💤 Low valueRemove redundant import.
reis already imported at module level (line 12), so this inner import is unnecessary.Proposed fix
- import re # The legacy line was "- SUB_AGENT_DEFAULT_MODEL: Default model ..."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_subagent_docstring.py` at line 167, Remove the redundant inner import of the "re" module in tests/test_subagent_docstring.py (the duplicate import inside the test body); since "re" is already imported at module level, delete the inner "import re" statement so the test uses the top-level import instead.tests/test_default_resolver_no_legacy_global.py (1)
81-84: 💤 Low valueConsider using
cr.Cliinstead of re-importing after fresh module load.Each test calls
_fresh_cli_runtime(monkeypatch)which returns the freshly imported module ascr, then immediately doesfrom cli_runtime import Cli. Since the module is already loaded, you could usecr.Clidirectly and avoid the redundant import statement.Example
def test_resolve_claude_empty_no_env_returns_sonnet(monkeypatch): """D-01: claude falls back to 'sonnet' when no caller model and no env.""" _scrub_all_model_envs(monkeypatch) cr = _fresh_cli_runtime(monkeypatch) - from cli_runtime import Cli - model_id, display = cr.resolve_subagent_model("", Cli.CLAUDE) + model_id, display = cr.resolve_subagent_model("", cr.Cli.CLAUDE)Also applies to: 96-99, 107-110, 121-127, 135-138, 145-148
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_default_resolver_no_legacy_global.py` around lines 81 - 84, The test redundantly re-imports Cli after loading a fresh module; update each occurrence (e.g. lines using "from cli_runtime import Cli") to reference the freshly imported module's attribute (use cr.Cli) instead: in tests that call _fresh_cli_runtime(monkeypatch) and receive cr, replace the direct import with cr.Cli when invoking resolve_subagent_model and related calls (e.g., resolve_subagent_model("", cr.Cli.CLAUDE) or using cr.Cli where the imported Cli was used) to avoid duplicate imports and ensure the test uses the fresh module instance.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@computer-use-server/docker_manager.py`:
- Around line 71-78: The three per-CLI env constants
CLAUDE_SUB_AGENT_DEFAULT_MODEL, CODEX_SUB_AGENT_DEFAULT_MODEL, and
OPENCODE_SUB_AGENT_DEFAULT_MODEL in docker_manager.py are defined but unused;
either remove them to avoid dead code, or explicitly document they're present
solely for configuration/documentation and ensure the resolver in cli_runtime.py
uses these constants (import them) instead of calling os.getenv directly; pick
one approach and apply consistently so the symbols are either removed or
intentionally exported and referenced.
In `@computer-use-server/mcp_tools.py`:
- Around line 960-962: When resolving an implicit default model, the code
currently calls resolve_subagent_model("", cli) and drops the returned display
name, which causes the friendly name to be lost; update the branch where model
is falsy to capture both returned values (e.g., model_id and display_name) and
assign the resolver's display_name into the variable used by the result/response
banner (the same place other paths put the friendly name) so that implicit
defaults preserve the human-friendly name in completion banners and result
blobs; specifically, keep the tuple return from resolve_subagent_model and
propagate the display_name instead of discarding it.
In `@tests/test_codex_toml_converter.py`:
- Around line 68-70: The test shows _to_toml_value(None) returns the string
"null", which is not valid TOML; change _to_toml_value to explicitly guard
against None by detecting value is None and raising a clear exception (e.g.,
ValueError or TypeError) with a message indicating TOML has no null literal (or
alternatively update the function docstring to explicitly forbid None inputs),
and update any callers/tests to expect the exception; reference the
_to_toml_value function to locate and modify the logic that currently uses
json.dumps().
In `@tests/test_subagent_docstring.py`:
- Around line 152-155: Rename the ambiguous single-letter variable `l` used in
the list comprehension that builds `code_lines` to a clearer name like `line` to
satisfy static analysis (E741) and improve readability; update the list
comprehension expression "l for l in self.src.splitlines()" to "line for line in
self.src.splitlines()" and ensure any other occurrences in that test (such as
the filter "if ... in l" and the f-string capture) use the new `line` identifier
consistently.
In `@tests/test-list-subagent-models.sh`:
- Line 136: The trap on EXIT currently set to remove "$TMPDIR_HOSTS" is being
overwritten later by another trap for "$EMPTY_TMPDIR"; update the script to
ensure both temp dirs are cleaned by either (a) replacing both separate traps
with a single trap that removes "$TMPDIR_HOSTS" and "$EMPTY_TMPDIR", or (b)
create a cleanup function (e.g., cleanup) that runs rm -rf "$TMPDIR_HOSTS"
"$EMPTY_TMPDIR" and have the trap call that function, then remove the explicit
rm -rf "$EMPTY_TMPDIR" invocation so cleanup is only handled by the trap.
In `@tests/test-skill-no-hardcoded-models.sh`:
- Around line 38-40: The grep pipeline that computes hits is suppressing matches
by excluding 'list-subagent-models', causing false negatives; remove the
unnecessary exclusion from the pipeline so the command that sets hits (the grep
-nE "$FORBIDDEN_PATTERN" ... | grep -v 'canonical-example' | grep -v
'list-subagent-models' ...) no longer filters out 'list-subagent-models'—leave
the canonical-example exemption if intended but delete the grep -v
'list-subagent-models' segment so FORBIDDEN_PATTERN can surface real violations.
---
Outside diff comments:
In `@computer-use-server/cli_adapters/opencode.py`:
- Around line 107-148: The loop can overwrite the extracted error message
because later events still update last_message_text; to fix, after detecting an
"error" event in the for-loop (where you set is_error = True and
last_message_text = ...), prevent subsequent event handlers from clobbering that
value by guarding the text-extraction branch (the block that reads et in
("assistant-message-completed", "step-finish", "message-completed") and computes
text_field) with a check for the error state (e.g., if is_error: continue or
skip assignment) or by storing the error text in a separate error_message
variable and returning that when is_error is True; refer to symbols
last_message_text, is_error, et, and text_field to locate where to add the
guard.
In `@skills/public/sub-agent/SKILL.md`:
- Around line 30-33: The warning in SKILL.md incorrectly hardcodes "Claude" as
the runtime started by each sub_agent call; update the warning to be
runtime-neutral by replacing references to "Claude CLI session" with a generic
phrase such as "a separate sub-agent session in the configured runtime (e.g.,
claude, opencode, codex)", and ensure the paragraph around the header "When to
Use (ONLY complex CODE tasks requiring 10+ iterative tool calls)" and the term
"sub_agent" reflect that the resource cost depends on the chosen runtime rather
than always being Claude.
---
Nitpick comments:
In `@computer-use-server/cli_runtime.py`:
- Around line 103-112: Update the docstring for _merge_opencode_alias_map to
remove the stale "Anthropic baseline" reference and instead describe current
behavior given that _OPENCODE_ALIAS_MAP may be empty: state that built-in keys
from _OPENCODE_ALIAS_MAP are preserved unless overridden by the
OPENCODE_MODEL_ALIASES env JSON, and that malformed JSON, non-dict payloads, or
non-string values are logged and ignored; keep the notes about reading at call
time and references to CONTEXT.md/docs/multi-cli.md intact.
In `@tests/test_default_resolver_no_legacy_global.py`:
- Around line 81-84: The test redundantly re-imports Cli after loading a fresh
module; update each occurrence (e.g. lines using "from cli_runtime import Cli")
to reference the freshly imported module's attribute (use cr.Cli) instead: in
tests that call _fresh_cli_runtime(monkeypatch) and receive cr, replace the
direct import with cr.Cli when invoking resolve_subagent_model and related calls
(e.g., resolve_subagent_model("", cr.Cli.CLAUDE) or using cr.Cli where the
imported Cli was used) to avoid duplicate imports and ensure the test uses the
fresh module instance.
In `@tests/test_subagent_docstring.py`:
- Line 167: Remove the redundant inner import of the "re" module in
tests/test_subagent_docstring.py (the duplicate import inside the test body);
since "re" is already imported at module level, delete the inner "import re"
statement so the test uses the top-level import instead.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: af8e18cb-93bd-40df-b298-104f17e95cd2
📒 Files selected for processing (29)
Dockerfilecomputer-use-server/bin/list-subagent-modelscomputer-use-server/cli-defaults/README.mdcomputer-use-server/cli-defaults/codex.jsoncomputer-use-server/cli-defaults/opencode.jsoncomputer-use-server/cli_adapters/opencode.pycomputer-use-server/cli_runtime.pycomputer-use-server/docker_manager.pycomputer-use-server/mcp_tools.pycomputer-use-server/requirements.txtskills/public/sub-agent/SKILL.mdskills/public/sub-agent/references/usage.mdskills/public/sub-agent/scripts/list_subagent_models.shtests/orchestrator/test_cli_adapters.pytests/orchestrator/test_cli_runtime.pytests/orchestrator/test_sub_agent_dispatch.pytests/orchestrator/test_subagent_claude_compat.pytests/test-default-model-resolution.pytests/test-docker-image.shtests/test-list-subagent-models.shtests/test-opencode-error-mapping.pytests/test-project-structure.shtests/test-skill-no-hardcoded-models.shtests/test-subagent-cli-surface.pytests/test-subagent-runtime.shtests/test_codex_toml_converter.pytests/test_default_resolver_no_legacy_global.pytests/test_opencode_alias_map_drop.pytests/test_subagent_docstring.py
…thon deps) CI Test job runs only bash scripts (no setup-python step). The Phase 1 umbrella tests/test-subagent-runtime.sh runs 4 pytest sub-tests, which require mcp/cli_runtime imports — pip install never ran in this job, so the umbrella crashed with ModuleNotFoundError. Split: - tests/test-project-structure.sh now invokes ONLY the bash sub-tests directly (test-skill-no-hardcoded-models.sh, test-list-subagent-models.sh) - pytest sub-tests are exercised by the existing 'Pytest — orchestrator' CI job which installs Python deps from requirements.txt - tests/test-subagent-runtime.sh stays as a local-dev convenience runner that exercises everything end-to-end when Python deps are available Project-structure now reports 24/24 (was 23/13 before, with [13/13] failing in CI due to missing deps). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/orchestrator/test_runtime_cli_endpoint.py`:
- Around line 72-90: Update the module-level docstring to reflect that for CLIs
"codex" and "opencode" the "default_model" field may be null (None) when the
corresponding env vars (CODEX_SUB_AGENT_DEFAULT_MODEL /
OPENCODE_SUB_AGENT_DEFAULT_MODEL) are not set; locate the docstring near the top
of this test module and change any statement claiming "default_model must be a
non-empty string" to state it can be a string or null/None for these CLIs (refer
to test_codex and test_opencode and the default_model assertions to guide the
wording).
In `@tests/test-project-structure.sh`:
- Line 155: The step counter in tests/test-project-structure.sh is incorrect:
update the echoed label in the line that currently prints "[12/13] No werf.yaml"
to reflect the new total of 14 checks by changing it to "[12/14] No werf.yaml"
so the numbered output stays consistent with the actual sequence.
- Around line 162-182: The test-suite added by invoking
test-list-subagent-models.sh from test-project-structure.sh pulls in Python
(calls to python3 inside tests/test-list-subagent-models.sh) which violates the
shell-only constraint of the "test" job; either ensure the "test" job in
.github/workflows/build.yml installs Python (add a setup-python step) or move
the bash call to the pytest-orchestrator job that already configures Python.
Locate the invocation of test-list-subagent-models.sh in
test-project-structure.sh and either (A) update the workflow job named "test" to
include a setup-python action so python3 is available for scripts that call
python3, or (B) remove the call from test-project-structure.sh and add it to the
pytest-orchestrator job so test-list-subagent-models.sh runs where Python deps
are guaranteed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 580bc98c-9421-47f4-b993-4ce80e6455af
📒 Files selected for processing (4)
computer-use-server/app.pytests/orchestrator/test_runtime_cli_endpoint.pytests/orchestrator/test_sub_agent_dispatch.pytests/test-project-structure.sh
| # 13. Sub-agent runtime — shell-only sub-tests (Phase 1 + Phase 2) | ||
| # NB: pytest sub-tests live in tests/orchestrator/ and are run by the | ||
| # "Pytest — orchestrator" CI job which installs Python deps. The Test job | ||
| # (this script) runs ONLY bash sub-tests so it stays self-contained and | ||
| # doesn't require a Python toolchain. The full umbrella | ||
| # (tests/test-subagent-runtime.sh) is for local dev convenience only. | ||
| echo "" | ||
| echo "[13/14] Sub-agent skill audit (no hardcoded model names)" | ||
| if bash "$(dirname "$0")/test-skill-no-hardcoded-models.sh" >/tmp/skill-audit.log 2>&1; then | ||
| pass "test-skill-no-hardcoded-models.sh exits 0" | ||
| else | ||
| fail "test-skill-no-hardcoded-models.sh exits non-zero — see /tmp/skill-audit.log" | ||
| fi | ||
|
|
||
| echo "" | ||
| echo "[14/14] list-subagent-models script invocation" | ||
| if bash "$(dirname "$0")/test-list-subagent-models.sh" >/tmp/list-subagent.log 2>&1; then | ||
| pass "test-list-subagent-models.sh exits 0" | ||
| else | ||
| fail "test-list-subagent-models.sh exits non-zero — see /tmp/list-subagent.log" | ||
| fi |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the test-list-subagent-models.sh file
find . -name "test-list-subagent-models.sh" -type fRepository: Yambr/open-computer-use
Length of output: 104
🏁 Script executed:
# Also find the CI configuration
find . -name "*.yml" -o -name "*.yaml" | grep -E "(github|ci|action)" | head -20Repository: Yambr/open-computer-use
Length of output: 377
🏁 Script executed:
# Check the test-project-structure.sh file around the mentioned lines
cat -n tests/test-project-structure.sh | sed -n '150,190p'Repository: Yambr/open-computer-use
Length of output: 1762
🏁 Script executed:
# Examine the test-list-subagent-models.sh file
cat -n tests/test-list-subagent-models.shRepository: Yambr/open-computer-use
Length of output: 18611
🏁 Script executed:
# Check the build workflow to see what's configured for the project-structure job
cat .github/workflows/build.ymlRepository: Yambr/open-computer-use
Length of output: 8507
This test reintroduces a Python dependency into a shell-only job.
Lines 69, 81, 89, 106, 114, 142, 150, 171, 179, 206, 225, 233, 251, 259, 281, 289, 309, 317, 340, 348, 372, and 380 of tests/test-list-subagent-models.sh call python3 extensively for test assertions and validation. The test job in .github/workflows/build.yml does not include a setup-python step and should remain self-contained per the inline comment (lines 164–166 of test-project-structure.sh). Either add explicit Python setup to the test job or move test-list-subagent-models.sh execution to the pytest-orchestrator job, which already configures Python dependencies.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test-project-structure.sh` around lines 162 - 182, The test-suite added
by invoking test-list-subagent-models.sh from test-project-structure.sh pulls in
Python (calls to python3 inside tests/test-list-subagent-models.sh) which
violates the shell-only constraint of the "test" job; either ensure the "test"
job in .github/workflows/build.yml installs Python (add a setup-python step) or
move the bash call to the pytest-orchestrator job that already configures
Python. Locate the invocation of test-list-subagent-models.sh in
test-project-structure.sh and either (A) update the workflow job named "test" to
include a setup-python action so python3 is available for scripts that call
python3, or (B) remove the call from test-project-structure.sh and add it to the
pytest-orchestrator job so test-list-subagent-models.sh runs where Python deps
are guaranteed.
…overage
CodeRabbit findings (9 comments):
1. (major) docker_manager.py:71-78 — three per-CLI default-model env constants
were unused dead code (cli_runtime.py reads env directly). REMOVED. Updated
test_docker_manager_has_claude_sub_agent_default_model →
test_docker_manager_has_no_per_cli_constants asserting absence.
2. (minor) mcp_tools.py:962 — implicit-default resolution discarded the
resolver's display_name, so the completion banner showed full model id
('claude-sonnet-4-6') instead of the friendly alias ('sonnet'). Now
captured as default_display_name and overrides model_display after dispatch.
3. (minor) test_codex_toml_converter.py + Dockerfile — _to_toml_value(None)
used to silently produce 'null' (invalid TOML). Both reference helper and
Dockerfile inline copy now raise ValueError explicitly. Test updated.
4. (minor) test_subagent_docstring.py:152 — renamed ambiguous variable l → line.
5. (minor) test-list-subagent-models.sh:136 — second trap on EXIT was
overwriting the first, leaking TMPDIR_HOSTS. Now uses a unified
cleanup_tmpdirs() function called by a single trap.
6. (minor) test-skill-no-hardcoded-models.sh:40 — removed the
'list-subagent-models' grep -v exemption that was hiding real violations
(FORBIDDEN_PATTERN doesn't match the discovery command anyway).
7. (minor) test_runtime_cli_endpoint.py:90 — module docstring still claimed
default_model was a non-empty string; updated to reflect the Phase 2
contract where opencode/codex return null when no env is set.
8. (minor) test-project-structure.sh:155 — fixed [12/13] → [12/14] step counter
after splitting the umbrella into two bash subtests.
9. (major) test-project-structure.sh:182 — comment claimed shell-only but the
sub-tests shell out to python3 (stdlib only — JSON validation). Updated
the comment to be accurate (stdlib python3 is preinstalled on every
ubuntu-latest runner; only third-party deps belong in the pytest job).
Smoke coverage additions for Phase 2 (per user 'проверь что все смоками покрыто'):
10. test-docker-image.sh — list-subagent-models is now actually invoked inside
the image (claude path: assert valid JSON with cli=claude and >=3 models;
opencode path: assert valid JSON from canonical cli-defaults fallback).
Catches runtime errors that file-presence checks miss.
11. test-mcp-endpoint-live.sh — added /api/runtime/cli probe that asserts
the response is 200 with contract-stable JSON (cli, default_model
str|null, supports_cost bool). Phase 2 D-02 dropped opencode/codex
hardcoded fallback; the endpoint must catch ValueError, not crash 500.
Local sweep: 152 pytest passed, 6/6 umbrella, 24/24 structure, 14/14 no-corp.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
sub_agentMCP tool no longer leaks Claude model aliases (sonnet/opus/haiku) whenSUBAGENT_CLI=opencode|codex. Per-CLI docstring assigned at registration time. Per-CLI default model envs (CLAUDE_/OPENCODE_/CODEX_SUB_AGENT_DEFAULT_MODEL); legacy global removed.event.type=="error"events (rc=0 application errors) and surfacesdata.messageas a structured error instead of dumping raw NDJSON into a green success banner.list-subagent-models— host-friendly model discovery (no sandbox container required). ReadsOPENCODE_CONFIG_EXTRA/CODEX_CONFIG_EXTRAenv if set, else canonicalcomputer-use-server/cli-defaults/{opencode,codex}.json. claude returns static aliases (no native CLI subcommand exists).--inside-containerflag preserves CLI-shelling as escape hatch.sub-agentskill reworked (mirrorsgitlab-explorerpattern): mandatorybash scripts/list_subagent_models.shfirst-step. Hardcoded model identifiers stripped fromSKILL.mdandreferences/usage.md; load-bearing examples wrapped in<!-- canonical-example -->markers. CI audit scripttests/test-skill-no-hardcoded-models.shenforces._OPENCODE_ALIAS_MAP(now{}by default; aliases come exclusively fromOPENCODE_MODEL_ALIASESenv). opencode/codex no longer have hardcoded defaults — caller passesmodelor sets<CLI>_SUB_AGENT_DEFAULT_MODELenv, otherwise raisesValueErrorwith actionable message pointing tolist-subagent-models. claude default stayssonnet.computer-use-server/cli-defaults/{opencode,codex}.jsonnow feeds BOTH the sandbox Dockerfile entrypoint (when*_CONFIG_EXTRAenv unset) ANDlist-subagent-models. No more duplicated provider/model lists.tests/orchestrator/test_cli_adapters.pyextended with 11 gap-closure tests across opencode/codex/claude (multi-event streams, empty stdout, malformed JSON, error events, usage extraction). Total: 30 tests.sub_agent).Test plan
python3 -m pytest tests/orchestrator/test_cli_adapters.py -v— expect 30 passedpython3 -m pytest tests/test-default-model-resolution.py tests/test_default_resolver_no_legacy_global.py tests/test_subagent_docstring.py tests/test-opencode-error-mapping.py tests/test_opencode_alias_map_drop.py tests/test_codex_toml_converter.py -v— full sub-agent surface, expect ~67 passedbash tests/test-subagent-runtime.sh— umbrella, expect 6/6bash tests/test-no-corporate.sh— expect 14/14bash tests/test-project-structure.sh— expect 23/23docker build --platform linux/amd64 -t open-computer-use:latest . && ./tests/test-docker-image.sh— verify Dockerfile refactorSUBAGENT_CLI=opencodeand noOPENCODE_SUB_AGENT_DEFAULT_MODELenv, callingsub_agentwithout amodelarg surfaces the structured error (not an Anthropic-default silent fallback)data.messagetextlist-subagent-modelsruns on host (orchestrator side, no sandbox container required) and prints the canonical opencode default🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Behavior Changes
Documentation
Tests