Idle-stream timeout for remote LLM calls #575
Open
SuuBro wants to merge 6 commits into
pi-coding-agent's cli.js disables undici bodyTimeout/headersTimeout
globally to accommodate buffered local vLLM responses, which leaves remote
LLM SSE streams unprotected. When a remote stream goes silent the agent
hangs indefinitely with status: streaming and no error surfaced.

Add a CommonJS preload (defaults/agent-preload/undici-idle-timeouts.cjs)
that monkey-patches undici.setGlobalDispatcher to wrap pi's dispatcher in
an IdleTimeoutDispatcher. The wrapper injects bodyTimeout/headersTimeout
on per-request opts only for non-local, non-trusted origins, preserving
the existing no-timeout behaviour for localhost/RFC1918/Tailscale CGNAT/
.local backends.

Wire the preload via NODE_OPTIONS=--require=... in rpc-bridge.ts for both
direct-spawn and docker-exec branches. docker-args.ts bind-mounts the .cjs
file read-only at /bobbit-preload/.

Env vars:
  BOBBIT_REMOTE_BODY_TIMEOUT_MS      default 120000
  BOBBIT_REMOTE_HEADERS_TIMEOUT_MS   default 60000
  BOBBIT_TRUSTED_NO_TIMEOUT_ORIGINS  default ''

scripts/copy-defaults.mjs already copies the whole defaults/ tree so the
new agent-preload/ subdir lands in dist automatically.

Adds 39 unit tests covering isLocalOrigin (IPv4/IPv6/RFC1918/Tailscale/
.local), isTrustedNoTimeout (env parsing, port-aware matching), and
IdleTimeoutDispatcher.dispatch (injection, passthrough, trusted-origin
opt-out, caller-supplied positive value wins, close/destroy forwarding).

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
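The per-request decision described in this commit message can be sketched roughly as below. This is a minimal illustration, not the contents of undici-idle-timeouts.cjs: the helper names hostOf, positive, and withIdleTimeouts, the RequestOpts shape, and the exact-string trusted-origin match are all assumptions made for the example, and it leaves out the undici dispatcher wiring entirely.

```typescript
// Hypothetical sketch: names follow the commit message, bodies are assumptions.
interface RequestOpts {
  origin?: string;
  bodyTimeout?: number;
  headersTimeout?: number;
}

interface IdleTimeouts {
  bodyTimeoutMs: number;
  headersTimeoutMs: number;
}

// Extract the lowercase host from an origin such as "http://[fd00::1]:8000".
function hostOf(origin: string): string | null {
  const m = /^[a-z][a-z0-9+.-]*:\/\/(\[[0-9a-f:.]+\]|[^\/:?#]+)/i.exec(origin);
  return m ? m[1].replace(/^\[|\]$/g, "").toLowerCase() : null;
}

function isLocalOrigin(origin: string): boolean {
  const host = hostOf(origin);
  if (host === null) return false; // assumption: unparseable origins count as remote
  if (/(^|\.)localhost$/.test(host) || /\.local$/.test(host)) return true;
  if (host === "::1" || /^f[cd][0-9a-f]{2}:/.test(host)) return true; // IPv6 loopback, ULA fc00::/7
  const m = /^(\d{1,3})\.(\d{1,3})\.\d{1,3}\.\d{1,3}$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 127 ||                         // IPv4 loopback
    a === 10 ||                          // RFC1918 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // RFC1918 172.16.0.0/12
    (a === 192 && b === 168) ||          // RFC1918 192.168.0.0/16
    (a === 100 && b >= 64 && b <= 127)   // CGNAT 100.64.0.0/10 (Tailscale)
  );
}

// A caller-supplied timeout only counts when it is a positive number.
function positive(n: number | undefined): number | undefined {
  return typeof n === "number" && n > 0 ? n : undefined;
}

// Returns opts untouched for local or trusted origins, otherwise a copy
// with idle timeouts filled in (caller-supplied positive values win).
function withIdleTimeouts(
  opts: RequestOpts,
  timeouts: IdleTimeouts,
  trustedOrigins: string[],
): RequestOpts {
  const origin = opts.origin ?? "";
  if (isLocalOrigin(origin) || trustedOrigins.indexOf(origin) !== -1) return opts;
  return {
    ...opts,
    bodyTimeout: positive(opts.bodyTimeout) ?? timeouts.bodyTimeoutMs,
    headersTimeout: positive(opts.headersTimeout) ?? timeouts.headersTimeoutMs,
  };
}
```

In this sketch a request to a public origin picks up the 120000/60000 defaults, while a request to 127.0.0.1 or to an origin listed as trusted is passed through unchanged.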
… labels
Two coordinated fixes for the brittleness in the docker-exec idle-timeout
preload wiring:
A. Probe-gate exec-time --require flag in rpc-bridge.spawnDockerExec.
Previously the flag was injected unconditionally, but the corresponding
bind mount in docker-args.buildDockerRunArgs is guarded by fs.statSync —
so a missing host preload (dev tree pre-build) produced a container
without /bobbit-preload, then 'node --require=<missing>' exited
immediately, breaking every session in that container.
New behaviour: on first exec into each container, probe
'docker exec <id> test -f /bobbit-preload/undici-idle-timeouts.cjs'
(5s timeout, cached per containerId). If present → inject --require.
If absent → emit a one-line warn and OMIT --require (timeout env vars
are still exported; harmless without the preload). Short-circuits to
false if the host PRELOAD_PATH is missing.
B. Stamp containers with a feature-version label so stale containers from
older bobbit get recreated on upgrade. Project-sandbox._initContainer
was matching solely on 'bobbit-project=<id>' and reusing any hit; pre-
preload containers therefore got re-exec'd without the bind mount and
hit the same 'Cannot find module' failure.
New behaviour: docker-args exports CONTAINER_FEATURE_VERSION
('preload-1'). _initContainer passes it as labelVersion to
buildDockerRunArgs (so new containers get '<prefix>-version=preload-1')
AND filters _findContainerByLabel on BOTH the project label and the
version label. Old containers lack the version label, fall through to
not-found, get a fresh container created.
_findContainerByLabel signature widened to accept string | string[];
multiple --filter label=… args are emitted (docker AND-joins them).
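As a rough illustration of fix B, the widened string | string[] parameter could expand into docker filter arguments as below. The function names labelFilterArgs and versionLabel and the project id "abc" are hypothetical, chosen only for this sketch; the actual _findContainerByLabel implementation is not shown in this PR text.

```typescript
// Hypothetical sketch of expanding one or more labels into docker CLI filters.
function labelFilterArgs(labels: string | string[]): string[] {
  const list = typeof labels === "string" ? [labels] : labels;
  const args: string[] = [];
  for (const label of list) {
    // One --filter per label; per the commit message docker AND-joins them.
    args.push("--filter", `label=${label}`);
  }
  return args;
}

// Builds the '<prefix>-version=<v>' label described in the commit message.
function versionLabel(prefix: string, version: string): string {
  return `${prefix}-version=${version}`;
}

// Example: match only containers stamped with the current feature version.
const exampleArgs = labelFilterArgs([
  "bobbit-project=abc", // "abc" is a placeholder project id
  versionLabel("bobbit-project", "preload-1"),
]);
```

A container created before the upgrade carries only the project label, so it fails the second filter and the lookup reports not-found.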
Tests: tests/container-feature-version.test.ts asserts buildDockerRunArgs
emits the '<prefix>-version=<v>' label when labelVersion is passed and
omits it otherwise, for both 'bobbit-project' and 'bobbit-sandbox' prefixes.
npm run check and npm run test:unit (targeted subset) pass.
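The probe gate from fix A can be sketched in a similar spirit. The factory name makePreloadGate and the injected probe and warn callbacks are assumptions for the example; in the real code the probe would be the 'docker exec <id> test -f ...' call with its 5s timeout, which is replaced here by a plain function so the caching logic stands alone.

```typescript
// Hypothetical sketch: per-container cached probe that decides whether to
// pass --require at exec time.
const CONTAINER_PRELOAD = "/bobbit-preload/undici-idle-timeouts.cjs";

function makePreloadGate(
  hostPreloadExists: boolean,
  probe: (containerId: string) => boolean, // stands in for the docker exec test -f probe
  warn: (message: string) => void,
): (containerId: string) => string[] {
  const cache: Record<string, boolean | undefined> = {};
  return function requireArgsFor(containerId: string): string[] {
    // Short-circuit: without the host file the bind mount was never created.
    if (!hostPreloadExists) return [];
    let present = cache[containerId];
    if (present === undefined) {
      present = probe(containerId); // runs once per container id
      cache[containerId] = present;
      if (!present) {
        warn(`preload missing in container ${containerId}; omitting --require`);
      }
    }
    return present ? [`--require=${CONTAINER_PRELOAD}`] : [];
  };
}
```

Caching the result per container id keeps the probe cost to the first exec, and returning an empty list leaves node's arguments untouched for containers that lack the mount.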
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Summary
Adds a node --require preload that wraps undici's global dispatcher in agent subprocesses so remote LLM SSE streams get an idle-gap timeout (default 60s headers, 120s body), while preserving the no-timeout behaviour pi-coding-agent needs for local vLLM / Ollama / LM Studio buffered tool-call responses.
Fixes the silent-stream hang observed on session 035f7442, where the agent stayed pinned as streaming for 4+ minutes with no error after an LLM SSE stream went silent.
Root cause
At startup, @earendil-works/pi-coding-agent/dist/cli.js installs a global undici dispatcher with bodyTimeout and headersTimeout disabled. This removes undici's idle-gap timeout for all outbound HTTP from the agent subprocess, including remote LLM streams. When a stream goes silent there is no built-in protection and the SDK waits forever.
Approach
Bobbit injects a CommonJS preload via NODE_OPTIONS=--require=... into every agent subprocess (direct and Docker-exec). The preload monkey-patches undici.setGlobalDispatcher so that when pi installs its EnvHttpProxyAgent, the dispatcher gets wrapped in an IdleTimeoutDispatcher that injects bodyTimeout/headersTimeout per-request, but only when opts.origin is non-local and non-trusted.
Origin classification covers loopback, RFC1918, IPv6 ULA, Tailscale CGNAT, and .local/.localhost hostnames. Public-DNS AI gateways (Anthropic, OpenAI, ai-gateway.c3.zone, Bedrock, Vercel/Cloudflare AI Gateway) are treated as remote; operators opt them out via BOBBIT_TRUSTED_NO_TIMEOUT_ORIGINS if their gateway fronts a buffering on-prem backend.
For Docker, the preload is bind-mounted into containers and exec-time --require is gated on a probe (docker exec <id> test -f ...) so stale pre-upgrade containers don't crash. Project / sandbox containers carry a '<prefix>-version=preload-1' label (the value comes from CONTAINER_FEATURE_VERSION), so old containers are treated as not-found and auto-recreated with the new mount.
Public contract
Three env vars (all optional):
- BOBBIT_REMOTE_BODY_TIMEOUT_MS (default 120000)
- BOBBIT_REMOTE_HEADERS_TIMEOUT_MS (default 60000)
- BOBBIT_TRUSTED_NO_TIMEOUT_ORIGINS (comma-separated URL origins, default empty)
Files changed
- defaults/agent-preload/undici-idle-timeouts.cjs (new)
- src/server/agent/rpc-bridge.ts (preload wiring, direct + docker branches)
- src/server/agent/docker-args.ts (preload mount, CONTAINER_FEATURE_VERSION)
- src/server/agent/project-sandbox.ts (version-aware container reuse)
- tests/undici-idle-timeouts.test.ts (new, 39 tests)
- tests/container-feature-version.test.ts (new)
- docs/internals.md, docs/debugging.md (documentation)
Validation
- npm run check (pass)
- npm run test:unit (pass; 1044 passed, 1 skipped, including all 39 new tests)
Out of scope
- Session 035f7442: the preload only affects newly spawned subprocesses.
🤖 Generated with Bobbit