Idle-stream timeout for remote LLM calls by SuuBro · Pull Request #575 · SuuBro/bobbit

SuuBro · 2026-05-13T11:41:35Z

Summary

Adds a node --require preload that wraps undici's global dispatcher in agent subprocesses so remote LLM SSE streams get an idle-gap timeout (default 60s headers, 120s body), while preserving the no-timeout behaviour pi-coding-agent needs for local vLLM / Ollama / LM Studio buffered tool-call responses.

Fixes the silent-stream hang observed on session 035f7442, where the agent stayed pinned in the streaming state for 4+ minutes with no error after an LLM SSE stream went silent.

Root cause

@earendil-works/pi-coding-agent/dist/cli.js runs at startup:

setGlobalDispatcher(new EnvHttpProxyAgent({ bodyTimeout: 0, headersTimeout: 0 }));

This globally disables undici's idle-gap timeout for all outbound HTTP from the agent subprocess, including remote LLM streams. When a stream goes silent there's no built-in protection — the SDK waits forever.

Approach

Bobbit injects a CommonJS preload via NODE_OPTIONS=--require=... into every agent subprocess (direct and Docker-exec). The preload monkey-patches undici.setGlobalDispatcher so when pi installs its EnvHttpProxyAgent, the dispatcher gets wrapped in an IdleTimeoutDispatcher that injects bodyTimeout / headersTimeout per-request — but only when opts.origin is non-local and non-trusted.
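
The wrapping described above can be sketched roughly as follows. This is a minimal illustration, not the preload's actual code: DispatchOptions, DispatcherLike, IdleTimeoutConfig, pickTimeout and patchSetGlobalDispatcher are hypothetical stand-ins defined here so the example runs without undici installed, and the isRemote predicate is assumed to combine the local-origin and trusted-origin checks.

```typescript
// Minimal stand-ins for the parts of undici's Dispatcher used here (assumed shapes).
interface DispatchOptions {
  origin?: string | URL;
  path: string;
  method: string;
  bodyTimeout?: number;
  headersTimeout?: number;
}

interface DispatcherLike {
  dispatch(opts: DispatchOptions, handler: unknown): boolean;
  close(): Promise<void>;
  destroy(): Promise<void>;
}

interface IdleTimeoutConfig {
  bodyTimeoutMs: number;
  headersTimeoutMs: number;
  // Hypothetical predicate: true for origins that are neither local nor trusted.
  isRemote: (origin: string) => boolean;
}

// A caller-supplied positive value wins; otherwise the configured default applies.
function pickTimeout(callerValue: number | undefined, fallback: number): number {
  return typeof callerValue === "number" && callerValue > 0 ? callerValue : fallback;
}

class IdleTimeoutDispatcher implements DispatcherLike {
  private readonly inner: DispatcherLike;
  private readonly cfg: IdleTimeoutConfig;

  constructor(inner: DispatcherLike, cfg: IdleTimeoutConfig) {
    this.inner = inner;
    this.cfg = cfg;
  }

  dispatch(opts: DispatchOptions, handler: unknown): boolean {
    const origin = opts.origin === undefined ? "" : String(opts.origin);
    if (origin === "" || !this.cfg.isRemote(origin)) {
      // Local or trusted origin: forward untouched so no timeout is imposed.
      return this.inner.dispatch(opts, handler);
    }
    return this.inner.dispatch(
      {
        ...opts,
        bodyTimeout: pickTimeout(opts.bodyTimeout, this.cfg.bodyTimeoutMs),
        headersTimeout: pickTimeout(opts.headersTimeout, this.cfg.headersTimeoutMs),
      },
      handler,
    );
  }

  // close/destroy are forwarded to the wrapped dispatcher.
  close(): Promise<void> { return this.inner.close(); }
  destroy(): Promise<void> { return this.inner.destroy(); }
}

// Monkey-patch: every dispatcher installed after this call gets wrapped first.
function patchSetGlobalDispatcher(
  undiciLike: { setGlobalDispatcher: (d: DispatcherLike) => void },
  cfg: IdleTimeoutConfig,
): void {
  const original = undiciLike.setGlobalDispatcher;
  undiciLike.setGlobalDispatcher = (d: DispatcherLike): void => {
    original.call(undiciLike, new IdleTimeoutDispatcher(d, cfg));
  };
}
```

Injecting the timeouts per request rather than on the agent leaves pi's own dispatcher configuration untouched, which is what keeps local backends on the no-timeout path.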

Origin classification covers loopback, RFC1918, IPv6 ULA, Tailscale CGNAT, and .local / .localhost hostnames. Public-DNS AI gateways (Anthropic, OpenAI, ai-gateway.c3.zone, Bedrock, Vercel/Cloudflare AI Gateway) are treated as remote — operators opt them out via BOBBIT_TRUSTED_NO_TIMEOUT_ORIGINS if their gateway fronts a buffering on-prem backend.
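
As an illustration of the classification above, a local-origin check might look something like this. It is a sketch under assumptions: the function name matches the isLocalOrigin mentioned in the commit message, but the body is a guess at one reasonable implementation, and treating unparseable origins as remote is a choice made here rather than something the PR states.

```typescript
// Returns true when the origin's host looks local: loopback, RFC1918,
// IPv6 ULA (fc00::/7), Tailscale CGNAT (100.64.0.0/10), or a
// localhost / .localhost / .local hostname.
function isLocalOrigin(origin: string): boolean {
  let host: string;
  try {
    host = new URL(origin).hostname.toLowerCase();
  } catch {
    return false; // assumption: unparseable origins are treated as remote
  }

  // WHATWG URL keeps the brackets around IPv6 literals, e.g. "[::1]".
  if (host.startsWith("[") && host.endsWith("]")) {
    const v6 = host.slice(1, -1);
    if (v6 === "::1") return true; // IPv6 loopback
    return /^f[cd][0-9a-f]{2}:/.test(v6); // ULA: first byte is 0xfc or 0xfd
  }

  if (host === "localhost" || host.endsWith(".localhost") || host.endsWith(".local")) {
    return true;
  }

  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false; // any other DNS name is treated as remote
  const a = Number(m[1]);
  const b = Number(m[2]);
  if (a === 127) return true; // 127.0.0.0/8 loopback
  if (a === 10) return true; // 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return true; // 172.16.0.0/12
  if (a === 192 && b === 168) return true; // 192.168.0.0/16
  if (a === 100 && b >= 64 && b <= 127) return true; // 100.64.0.0/10
  return false;
}
```

Because the check works on the literal hostname only, a public DNS name that resolves to a private address would still count as remote in this sketch, which is consistent with the PR's treatment of public-DNS gateways.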

For Docker, the preload is bind-mounted into containers and exec-time --require is gated on a probe (docker exec <id> test -f ...) so stale pre-upgrade containers don't crash. Project / sandbox containers carry a CONTAINER_FEATURE_VERSION=preload-1 label so old containers are treated as not-found and auto-recreated with the new mount.
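
The probe gating could be structured along these lines. This is a hedged sketch, not the code in rpc-bridge.ts: PreloadGate and FileProbe are hypothetical names, the probe is injected as a plain function so the example runs without Docker, and it is synchronous only for brevity. The container path is the one named in the commit message.

```typescript
// Path where the preload is bind-mounted inside the container (from the PR text).
const PRELOAD_IN_CONTAINER = "/bobbit-preload/undici-idle-timeouts.cjs";

// Hypothetical probe signature: returns true when the file exists in the
// container. A real probe might shell out to
// `docker exec <id> test -f <path>` with a short timeout.
type FileProbe = (containerId: string, path: string) => boolean;

class PreloadGate {
  private readonly results = new Map<string, boolean>();
  private readonly probe: FileProbe;
  private readonly hostPreloadExists: boolean;

  constructor(probe: FileProbe, hostPreloadExists: boolean) {
    this.probe = probe;
    this.hostPreloadExists = hostPreloadExists;
  }

  // Probes each container at most once and caches the answer per container ID.
  hasPreload(containerId: string): boolean {
    if (!this.hostPreloadExists) return false; // nothing could have been mounted
    const cached = this.results.get(containerId);
    if (cached !== undefined) return cached;

    let present = false;
    try {
      present = this.probe(containerId, PRELOAD_IN_CONTAINER);
    } catch {
      present = false; // a failed probe is treated as "not mounted"
    }
    if (!present) {
      console.warn(`preload not found in container ${containerId}; skipping --require`);
    }
    this.results.set(containerId, present);
    return present;
  }

  // Prepends --require only when the container actually has the preload.
  nodeArgs(containerId: string, scriptArgs: string[]): string[] {
    return this.hasPreload(containerId)
      ? [`--require=${PRELOAD_IN_CONTAINER}`, ...scriptArgs]
      : [...scriptArgs];
  }
}
```

Omitting the flag for containers without the mount trades idle-timeout protection for a working session, which matches the PR's stated goal of not crashing stale containers.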

Public contract

Three env vars (all optional):

BOBBIT_REMOTE_BODY_TIMEOUT_MS (default 120000)
BOBBIT_REMOTE_HEADERS_TIMEOUT_MS (default 60000)
BOBBIT_TRUSTED_NO_TIMEOUT_ORIGINS (comma-separated URL origins, default empty)
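
One way these three variables might be read is sketched below. The variable names and defaults come from the list above; everything else is an assumption made for illustration, including the helper names (readSettings, parsePositiveInt, normalizeOrigin), the choice to fall back to the default on non-numeric or non-positive input, and the use of URL origin normalisation for port-aware matching.

```typescript
interface IdleTimeoutSettings {
  bodyTimeoutMs: number;
  headersTimeoutMs: number;
  trustedOrigins: Set<string>;
}

// Assumption: anything that is not a positive integer falls back to the default.
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Normalises to scheme://host[:port]; default ports are dropped by URL itself.
function normalizeOrigin(raw: string): string | null {
  try {
    return new URL(raw.trim()).origin;
  } catch {
    return null; // empty or malformed entries are ignored
  }
}

function readSettings(env: Record<string, string | undefined>): IdleTimeoutSettings {
  const trustedOrigins = new Set<string>();
  for (const part of (env["BOBBIT_TRUSTED_NO_TIMEOUT_ORIGINS"] ?? "").split(",")) {
    const origin = normalizeOrigin(part);
    if (origin !== null) trustedOrigins.add(origin);
  }
  return {
    bodyTimeoutMs: parsePositiveInt(env["BOBBIT_REMOTE_BODY_TIMEOUT_MS"], 120000),
    headersTimeoutMs: parsePositiveInt(env["BOBBIT_REMOTE_HEADERS_TIMEOUT_MS"], 60000),
    trustedOrigins,
  };
}

// Port-aware match: https://gw.example.com:8443 does not match https://gw.example.com.
function isTrustedNoTimeout(origin: string, settings: IdleTimeoutSettings): boolean {
  const normalized = normalizeOrigin(origin);
  return normalized !== null && settings.trustedOrigins.has(normalized);
}
```

In an agent subprocess the function would presumably be called with process.env; passing the environment in as a parameter keeps the sketch easy to test.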

Files changed

defaults/agent-preload/undici-idle-timeouts.cjs (new)
src/server/agent/rpc-bridge.ts (preload wiring, direct + docker branches)
src/server/agent/docker-args.ts (preload mount, CONTAINER_FEATURE_VERSION)
src/server/agent/project-sandbox.ts (version-aware container reuse)
tests/undici-idle-timeouts.test.ts (new — 39 tests)
tests/container-feature-version.test.ts (new)
docs/internals.md, docs/debugging.md (documentation)

Validation

npm run check — pass
npm run test:unit — pass (1044 passed, 1 skipped, including all 39 new tests)
E2E + LLM gap-analysis + code-quality + security + QA verification all pass

Out of scope

Recovery of the already-hung session 035f7442 — preload only affects newly spawned subprocesses.
Gateway-side stall watchdog for in-flight sessions (separate concern).

🤖 Generated with Bobbit

pi-coding-agent's cli.js disables undici bodyTimeout/headersTimeout globally to accommodate buffered local vLLM responses, which leaves remote LLM SSE streams unprotected. When a remote stream goes silent the agent hangs indefinitely with status: streaming and no error surfaced.

Add a CommonJS preload (defaults/agent-preload/undici-idle-timeouts.cjs) that monkey-patches undici.setGlobalDispatcher to wrap pi's dispatcher in an IdleTimeoutDispatcher. The wrapper injects bodyTimeout/headersTimeout on per-request opts only for non-local, non-trusted origins, preserving the existing no-timeout behaviour for localhost/RFC1918/Tailscale CGNAT/.local backends.

Wire the preload via NODE_OPTIONS=--require=... in rpc-bridge.ts for both direct-spawn and docker-exec branches. docker-args.ts bind-mounts the .cjs file read-only at /bobbit-preload/.

Env vars:

BOBBIT_REMOTE_BODY_TIMEOUT_MS default 120000
BOBBIT_REMOTE_HEADERS_TIMEOUT_MS default 60000
BOBBIT_TRUSTED_NO_TIMEOUT_ORIGINS default ''

scripts/copy-defaults.mjs already copies the whole defaults/ tree so the new agent-preload/ subdir lands in dist automatically.

Adds 39 unit tests covering isLocalOrigin (IPv4/IPv6/RFC1918/Tailscale/.local), isTrustedNoTimeout (env parsing, port-aware matching), and IdleTimeoutDispatcher.dispatch (injection, passthrough, trusted-origin opt-out, caller-supplied positive value wins, close/destroy forwarding).

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>

… labels

Two coordinated fixes for the brittleness in the docker-exec idle-timeout preload wiring:

A. Probe-gate exec-time --require flag in rpc-bridge.spawnDockerExec. Previously the flag was injected unconditionally, but the corresponding bind mount in docker-args.buildDockerRunArgs is guarded by fs.statSync, so a missing host preload (dev tree pre-build) produced a container without /bobbit-preload, then 'node --require=<missing>' exited immediately, breaking every session in that container.

New behaviour: on first exec into each container, probe 'docker exec <id> test -f /bobbit-preload/undici-idle-timeouts.cjs' (5s timeout, cached per containerId). If present → inject --require. If absent → emit a one-line warn and OMIT --require (timeout env vars are still exported; harmless without the preload). Short-circuits to false if the host PRELOAD_PATH is missing.

B. Stamp containers with a feature-version label so stale containers from older bobbit get recreated on upgrade. Project-sandbox._initContainer was matching solely on 'bobbit-project=<id>' and reusing any hit; pre-preload containers therefore got re-exec'd without the bind mount and hit the same 'Cannot find module' failure.

New behaviour: docker-args exports CONTAINER_FEATURE_VERSION ('preload-1'). _initContainer passes it as labelVersion to buildDockerRunArgs (so new containers get '<prefix>-version=preload-1') AND filters _findContainerByLabel on BOTH the project label and the version label. Old containers lack the version label, fall through to not-found, get a fresh container created. _findContainerByLabel signature widened to accept string | string[]; multiple --filter label=… args are emitted (docker AND-joins them).

Tests: tests/container-feature-version.test.ts asserts buildDockerRunArgs emits the '<prefix>-version=<v>' label when labelVersion is passed and omits it otherwise, for both 'bobbit-project' and 'bobbit-sandbox' prefixes. npm run check and npm run test:unit (targeted subset) pass.

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
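
The version-aware lookup in fix B might be built roughly like this. It is a sketch only: versionLabel and buildFindContainerArgs are hypothetical helpers, and the use of `docker ps -a -q` as the lookup command is an assumption, since the commit message only states that multiple `--filter label=…` arguments are emitted.

```typescript
// Feature version stamped on new containers (value from the commit message).
const CONTAINER_FEATURE_VERSION = "preload-1";

// Builds the '<prefix>-version=<v>' label, e.g. 'bobbit-project-version=preload-1'.
function versionLabel(prefix: string, version: string): string {
  return `${prefix}-version=${version}`;
}

// Accepts one label or several; each becomes its own --filter argument.
// Docker combines repeated label filters with AND, so a container must carry
// every label to be returned.
function buildFindContainerArgs(labels: string | string[]): string[] {
  const list = Array.isArray(labels) ? labels : [labels];
  const args = ["ps", "-a", "-q"]; // assumed lookup command
  for (const label of list) {
    args.push("--filter", `label=${label}`);
  }
  return args;
}

// Example: look up a project container that also carries the current version.
const exampleArgs = buildFindContainerArgs([
  "bobbit-project=my-project", // 'my-project' is a placeholder ID
  versionLabel("bobbit-project", CONTAINER_FEATURE_VERSION),
]);
console.log(exampleArgs.join(" "));
```

A container created before the upgrade has the project label but not the version label, so it drops out of the result and a fresh container is created instead.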

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>

…f4d75

SuuBro and others added 6 commits May 13, 2026 11:54

Merge coder: gate docker-exec preload --require on actual mount

da8756c

Document undici idle-stream timeout preload

11081a6

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>

Merge docs-writer: document undici idle-stream timeout preload

b423535

Merge remote-tracking branch 'origin/master' into goal/idle-strea-cc5…

57dbe67

…f4d75

SuuBro wants to merge 6 commits into master from goal/idle-strea-cc5f4d75
