⚠️ CORRECTION (2026-06-27, see latest comment): A runtime frame capture plus a RUST_LOG=debug sidecar log trace contradicts the stale-response / read-loop-death root cause described below. The sidecar stays alive for the whole hang (doing TLS to the LLM endpoint), and the in-VM agent's ext turn simply never completes; the sidecar itself even warns "ext request still pending — possible stall before response frame". Treat the analysis below as a superseded hypothesis; the accurate symptom is described in the latest comment.
claude agent: every tool routed through the ext extension runtime (Write / Bash / Skill / sub-agent) hangs with "timed out waiting for sidecar protocol frame for ext"
Summary
With a claude session, the agent boots, authenticates, and can use the Read tool and reply normally. But any tool that goes through the ext extension runtime (the Write tool, the Bash tool, the Skill tool, or spawning a sub-agent via Task) hangs for ~120s and then fails with:
```
timed out waiting for sidecar protocol frame for ext
```
stderr:
```
… level=info message="sidecar process started"
… level=info message="vm phase" phase=create_vm elapsed_ms=42
```
(no further frames; the ext request never returns an `ext_result`)
So a real coding-agent run can't complete (it can't write files, run shell commands, use skills, or fan out). Read-only / trivial sessions work fine, which is why this isn't obvious until the agent first reaches for a write/exec tool.
Confirmed NOT a usage error
- Reproduces with the canonical setup from examples/docs/agents/claude/: default HOME=/home/agentos, user agentos, createSession("claude", {cwd, env}), no overrides. (I initially suspected my HOME=/root override, but ruled it out: the same failure occurs with the documented home.)
- The Read tool and trivial prompts succeed with the same session config, so auth, the VM, and the SDK bridge are healthy. The failure is specific to the ext extension request path.
Reproduces on Node 22 and 25
Identical failure under fresh installs on Node v22.22.3 and v25.9.0 (macOS 15.7.4, arm64). Not a Node-version/ABI artifact.
Reproduction
```shell
npm i @rivet-dev/agentos-core @agentos-software/common @agentos-software/claude-code
export ANTHROPIC_API_KEY=sk-...
node repro.mjs
```

```javascript
// repro.mjs
import { AgentOs } from "@rivet-dev/agentos-core";
import common from "@agentos-software/common";
import claude from "@agentos-software/claude-code";

// Reject with "TIMEOUT" if `p` has not settled within `ms`. The timer is
// cleared afterwards so it does not keep Node alive once `p` settles.
const race = (p, ms) => {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("TIMEOUT")), ms);
  });
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
};

const vm = await AgentOs.create({ software: [common, claude] });
const { sessionId } = await vm.createSession("claude", {
  cwd: "/home/agentos",
  env: { ANTHROPIC_API_KEY: process.env.ANTHROPIC_API_KEY },
});

// WORKS: read-only / trivial prompt returns normally.
console.log(await vm.prompt(sessionId, "Reply with exactly: OK"));

// HANGS ~120s, then fails with `timed out waiting for sidecar protocol frame for ext`.
try {
  console.log(await race(
    vm.prompt(sessionId, "Write HELLO to /home/agentos/out.txt with your file tool, then reply DONE."),
    150_000,
  ));
} catch (e) {
  console.error("FAILED:", e.message);
}
await vm.dispose();
```
What works vs fails
| Works (in-process, no ext) | Fails: ext request times out (~120s) |
|---|---|
| AgentOs.create(), vm.exec() (direct) | agent Write tool |
| auth + trivial agent reply | agent Bash tool |
| agent Read tool | Skill tool (using a discovered skill) |
| | sub-agent (Task) spawn |
Root cause (superseded hypothesis; see the correction at the top)
This is a shared-sidecar lifecycle bug in secure-exec-sidecar 0.3.1 that was already fixed in 0.3.2 (secure-exec PR #133 / tag v0.3.2, commit d8a4435) — but 0.3.1 is what @agentos-software/claude-code@0.2.0 drags onto the tool path via its exact pin on @rivet-dev/agentos-core@0.2.0.
Mechanism:
- Write/Bash/Skill/Task each provision a VM on the shared secure-exec sidecar and dispose it (phase=create_vm). The ext extension that backs them is dev.rivet.agent-os.acp (agentos crates/agentos-protocol/src/lib.rs:7, AcpExtension in crates/agentos-sidecar/src/acp_extension.rs), baked into the sidecar binary. (An unregistered namespace would be rejected fast, see service.rs:1483-1492, so this is not a missing-registration / protocol-version issue; those fail immediately, not after 120s.)
- When a per-VM sidecar_response arrives after that VM was torn down, SidecarResponseTracker::accept_response returns UnmatchedResponse/DuplicateResponse (secure-exec crates/sidecar/src/protocol.rs:1899-1913).
- Pre-0.3.2, that error propagates: accept_wire_sidecar_response → stdio.rs:364 …? → the main read loop's handle_protocol_frame(...).await? (stdio.rs:232,252) exits the read loop. The sidecar stops servicing all frames.
- The host's in-flight ext request never gets an ext_result and times out after DEFAULT_SIDECAR_FRAME_TIMEOUT_MS = 120_000 (secure-exec packages/core/src/native-client.ts:17), producing the exact error in protocol-client.ts:122-124. This matches the 120s delay and the started → create_vm → silence signature precisely.
- 0.3.2 fixes it by tolerating stale responses (drop + warn instead of dying): secure-exec crates/sidecar/src/service.rs:2720-2742 ("a per-VM sidecar_request can be answered by the host after that VM has been torn down (multiple VMs share one sidecar process)").
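To make the hypothesized failure mode concrete, here is a minimal JavaScript model of a read loop shared by several VMs. It is not the sidecar's Rust code: ResponseTracker, runReadLoop, and the frame shape are hypothetical stand-ins. Per the correction at the top of this issue, traces did not confirm this mechanism, so treat the model as an illustration of the hypothesis only. In strict mode one unmatched response ends the loop, so a later, legitimate ext response is never serviced; in tolerant mode the stale response is dropped with a warning.

```javascript
// Hypothetical model of a read loop shared by several VMs. Names and frame
// shapes are illustrative; they are not the secure-exec Rust types.
class ResponseTracker {
  constructor() {
    this.pending = new Set(); // ids of requests still awaiting a response
  }
  expect(id) {
    this.pending.add(id);
  }
  accept(id) {
    if (!this.pending.has(id)) {
      // Stand-in for UnmatchedResponse / DuplicateResponse.
      throw new Error(`unmatched response ${id}`);
    }
    this.pending.delete(id);
  }
}

// Strict mode: the first unmatched response propagates and ends the loop
// (the pre-0.3.2 behaviour described above). Tolerant mode: the stale
// response is dropped with a warning and the loop keeps going (the 0.3.2 fix).
function runReadLoop(frames, tracker, { tolerant, warn = () => {} }) {
  const serviced = [];
  for (const frame of frames) {
    try {
      tracker.accept(frame.id);
    } catch (err) {
      if (!tolerant) return { serviced, exited: true, reason: err.message };
      warn(`dropping stale response: ${err.message}`);
      continue;
    }
    serviced.push(frame.id);
  }
  return { serviced, exited: false };
}

// A response for an already-disposed VM arrives before the ext response.
const frames = [{ id: "reply-for-disposed-vm" }, { id: "ext-1" }];

const strict = new ResponseTracker();
strict.expect("ext-1");
const strictResult = runReadLoop(frames, strict, { tolerant: false });
// strictResult.exited is true and "ext-1" is never serviced, so the host waits.

const tolerant = new ResponseTracker();
tolerant.expect("ext-1");
const tolerantResult = runReadLoop(frames, tolerant, { tolerant: true, warn: console.warn });
// tolerantResult.serviced contains "ext-1".
```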
This is a same-version-lockstep break
Per AGENTS.md: "The protocol has no backwards compatibility. Clients and the sidecar ship in same-version lockstep... the single same-version wire handshake is the only version check." And the package tracks: "agent-os product/API (@rivet-dev/agentos*)... pins compatible secure-exec and registry package versions", while "@agentos-software/* registry packages... [are] versioned independently."
The @agentos-software/claude-code ACP adapter therefore must run in lockstep with the agentos-core/sidecar it talks to. But @agentos-software/claude-code@0.2.0 exact-pins @rivet-dev/agentos-core@0.2.0, so the README install npm i @rivet-dev/agentos-core @agentos-software/claude-code resolves core 0.2.2 (latest, with the fixed secure-exec-sidecar 0.3.2) for the host alongside the adapter's pinned core 0.2.0 (unfixed 0.3.1) — a lockstep break that routes tool execution through the unfixed sidecar.
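One way to spot this skew locally is to inspect package-lock.json. A minimal sketch, assuming an npm lockfile v2 or v3, whose packages map is keyed by install path, so a nested copy appears under node_modules/@agentos-software/claude-code/node_modules/... . installedVersions and hasVersionSkew are hypothetical helpers, not part of agentos.

```javascript
// Hypothetical helpers: list every installed copy of `name` recorded in an
// npm v2/v3 lockfile object, and report whether those copies disagree.
function installedVersions(lock, name) {
  const key = `node_modules/${name}`;
  const found = [];
  for (const [path, meta] of Object.entries(lock.packages ?? {})) {
    // Match the top-level copy ("node_modules/<name>") and nested copies
    // (".../node_modules/<name>"), but not packages that merely share a prefix.
    if (path === key || path.endsWith(`/${key}`)) {
      found.push({ path, version: meta.version });
    }
  }
  return found;
}

function hasVersionSkew(found) {
  return new Set(found.map((entry) => entry.version)).size > 1;
}

// Usage (CommonJS, from the project root):
//   const lock = JSON.parse(require("node:fs").readFileSync("package-lock.json", "utf8"));
//   for (const name of ["@rivet-dev/agentos-core", "@secure-exec/core"]) {
//     const found = installedVersions(lock, name);
//     console.log(name, hasVersionSkew(found) ? "SKEWED" : "ok", found);
//   }
```

In a lockstep install, each package should report a single version; two entries with different versions is the break described above.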
The fix (maintainer-side)
Publish a @agentos-software/claude-code that depends on the same @rivet-dev/agentos-core version agent-os ships (0.2.2), so the adapter and sidecar are in lockstep on the fixed secure-exec-sidecar 0.3.2. (That package has no public repo, so this requires maintainer action; per AGENTS.md the registry packages are version-managed via just agentos-pkgs-set-version.)
An npm overrides workaround is NOT valid here and is not recommended. Forcing agentos-core@0.2.2 under a claude-code@0.2.0 build removes the 120s hang (confirming the root cause: the Write prompt returns end_turn instead of timing out), but it leaves a 0.2.0-built adapter running against a 0.2.2 protocol, which is exactly the lockstep break AGENTS.md forbids. In testing, tool calls also did not persist cleanly in that mixed state. Only a lockstep-rebuilt adapter fixes it.
Secondary: the silent 120s hang is an observability gap (per AGENTS.md)
Independent of the version fix: AGENTS.md → Limits, Bounds & Observability states "The default 120s ACP method timeout is the adapter-stall failure mode — make it observable, not a silent 120s hang," and that ACP timeouts carry data.kind === "acp_timeout" while "the native-sidecar frame timeout... [should] emit a structured near-threshold signal (default ≥80%) and fail with a typed error that names the limit and how to raise it."
This bug is precisely that failure mode: the native-sidecar ext frame timeout (secure-exec packages/core/src/protocol-client.ts:122-124, DEFAULT_SIDECAR_FRAME_TIMEOUT_MS = 120_000) surfaces as a bare 120s silent hang with no near-threshold warning and no typed error/kind. Even after the lockstep fix, giving the native-sidecar frame timeout the same typed-error + warn-on-approach treatment as acp_timeout would have turned this from a 120s silent hang into an immediately actionable error.
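A minimal sketch of that treatment, assuming a promise-based frame wait like the one the host uses. awaitFrame, SidecarFrameTimeoutError, and the kind "sidecar_frame_timeout" are hypothetical names modeled on the documented acp_timeout pattern, not existing secure-exec or agentos APIs. The wrapper fires a near-threshold callback at 80% of the limit and then rejects with a typed error whose message names the limit.

```javascript
// Hypothetical typed error for a native-sidecar frame timeout.
class SidecarFrameTimeoutError extends Error {
  constructor(label, limitMs) {
    super(
      `timed out after ${limitMs}ms waiting for sidecar protocol frame for ${label}; ` +
        `raise the sidecar frame timeout above ${limitMs}ms if this request is legitimately slow`,
    );
    this.name = "SidecarFrameTimeoutError";
    // Hypothetical kind, mirroring the documented "acp_timeout".
    this.data = { kind: "sidecar_frame_timeout", label, limitMs };
  }
}

// Wait for `framePromise`, emitting a structured warning at warnRatio * limitMs
// and rejecting with the typed error at limitMs.
function awaitFrame(framePromise, { label, limitMs = 120_000, warnRatio = 0.8, onNearThreshold = () => {} } = {}) {
  let warnTimer;
  let failTimer;
  const deadline = new Promise((_, reject) => {
    const thresholdMs = Math.floor(limitMs * warnRatio);
    warnTimer = setTimeout(() => onNearThreshold({ label, thresholdMs, limitMs }), thresholdMs);
    failTimer = setTimeout(() => reject(new SidecarFrameTimeoutError(label, limitMs)), limitMs);
  });
  // Whichever settles first wins; both timers are cleared either way.
  return Promise.race([framePromise, deadline]).finally(() => {
    clearTimeout(warnTimer);
    clearTimeout(failTimer);
  });
}
```

With the defaults, a stalled ext request would produce a structured warning at 96s and a typed, self-describing error at 120s instead of a bare timeout.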
Notes
- Reproduces on Node 22 and 25 → not a Node/ABI artifact. macOS arm64.
- The slash-skill "Unknown skill" behaviour (a /100x:... or personal ~/.claude/skills skill not resolving) is a separate matter from this hang and is not covered here.
Environment
| Component | Version |
|---|---|
| OS | macOS 15.7.4, arm64 |
| Node | v25.9.0 and v22.22.3 (same result) |
| @rivet-dev/agentos-core | 0.2.2 (README install); also tried 0.2.0 |
| @agentos-software/claude-code | 0.2.0 (pins agentos-core 0.2.0) |
| @secure-exec/core | 0.3.2 (top level) + 0.3.1 (nested under claude-code) |
| @rivet-dev/agentos-sidecar | 0.2.2 |
| @anthropic-ai/claude-agent-sdk | 0.2.141 |