fix(core): shared sidecar must not hang one-shot scripts; se 0.3.3 + agent×pkg-mgr e2e matrix#1551
Merged
Conversation
|
🚅 Deployed to the agentos-pr-1551 environment in agentos
🚅 Deployed to the agentos-pr-1551 environment in rivet-frontend
|
749f73b to
6d601e8
Compare
Member
Author
|
Ran a second adversarial review pass over the counted-hold rewrite (it was written after the first review, so it was itself unreviewed). Fixed everything it surfaced and re-verified:
A separate verification pass confirmed the earlier fixes (counted hold, Re-verified: |
6d601e8 to
446625a
Compare
The default shared sidecar pool kept its child process + stdio referenced after vm.dispose() released the last VM lease, so every standalone script (all the quickstart examples) hung on exit and had to be SIGINT'd. Fix: a counted event-loop hold on the shared sidecar, taken for the WHOLE create→use→dispose lifetime of each VM lease. Child + stdio are ref'd while holds > 0 and unref'd at 0, so in-flight VM work keeps the host alive while an idle program can exit. unref != kill: the sidecar stays pooled/reusable and is re-ref'd on the next lease. A one-time synchronous process 'exit' hook SIGKILLs pooled sidecars so a clean exit reaps them instantly. No SIGINT/SIGTERM handlers — the host owns signals. Hardened after adversarial review: - counted (not boolean) hold survives concurrent create/dispose - child handle cached BEFORE the handshake await so a failed authenticateAndOpenSession() can still reap it (no orphan / pinned loop) - rejected spawn clears nativeProcess so a later create() retries - dispose no longer force-zeros the hold counter; unbalanced release warns - loud warning if the secure-exec child handle can't be resolved Also: - tests/shared-sidecar-clean-exit.test.ts: regression for the hang - tests/agent-pkg-matrix.e2e.test.ts: skipped-by-default 4 pkg mgr × 4 agent real-API streaming matrix (AGENTOS_MATRIX_E2E) - fix(agentos-sidecar): embed AGENTOS_SYSTEM_PROMPT.md in the crate so cargo publish can package it; drop now-empty fixtures dir from agentos-core files - docs(agents): document OpenCode model config on its per-agent page; Sessions points to the per-agent docs for agent-specific configuration
446625a to
42dce13
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Headline fix: the default shared sidecar pool no longer hangs one-shot scripts on exit.
AgentOs.create()uses a process-global shared sidecar;vm.dispose()released the VM lease but left the sidecar's child process + stdio referenced, pinning the Node event loop — so every standalone script (all quickstart examples) completed its work and then hung until SIGINT.How it works now
stdin/stdout/stderrareref()'d while holds > 0 andunref()'d at 0. A counter (not a boolean) so concurrent create/dispose can't clobber each other.unref()≠ kill. The sidecar stays pooled and reusable across VM disposal and is re-ref()'d on the next lease. It's tied to the host-process lifetime, not the VM lifetime — so a long-running server keeps it; a finished one-shot script lets the loop drain.process.on("exit")SIGKILLs pooled sidecars, so a clean exit reaps them immediately (no orphan, no stdin-EOF grace wait). NoSIGINT/SIGTERMhandlers — the host owns signals (SIGINT still flows via the process group; SIGTERM-driven exit still closes stdin).Also in this branch
tests/agent-pkg-matrix.e2e.test.ts— skipped-by-default (gated byAGENTOS_MATRIX_E2E) real-API matrix: 4 package managers × 4 agents, fresh installs, asserts live token streaming. Codifies the issues hit shipping the preview (stale model ids, OpenCode config, permission keys, gap-based streaming, ACP-bootstrap flakiness).tests/shared-sidecar-clean-exit.test.ts— regression for the hang above (spawns a realcreate()+dispose()script with noprocess.exit()and asserts it terminates on its own).agentos-sidecarcrate packaging fix —AGENTOS_SYSTEM_PROMPT.mdmoved into the crate (src/) socargo publishcan package it (the prior out-of-crateinclude_str!broke the isolated package-verify build); dropped the now-emptyfixturesdir from@rivet-dev/agentos-core'sfiles.baseURLending in/v1+cwd), the one real "didn't work out of the box" gap.Review
Reviewed by two subagents (correctness/concurrency + branch/tests/docs). Addressed: counted (not boolean) hold to survive concurrent create/dispose; clear cached child + reset hold count on full dispose; loud warning if the secure-exec child handle can't be resolved (so the optimization can't silently regress); removed the empty
fixturesdir fromfiles;{ recursive: true }in the docsmkdir;dist-existence skip guard on the clean-exit test.Test evidence
agent-pkg-matrixagainst the published release: 16/16 live-streaming.🤖 Generated with Claude Code