fix(sandbox): non-destructive stop + reap; auto-start on reattach (bd openlock-27e)#40
Merged
Conversation
…enlock-27e) Bug: `openlock stop <name>` and the 30-minute idle-reaper both routed through `deleteSandbox`, which tears down the container, the workspace volume, and the session-scoped handshake secret in one call. Help text claimed "preserves state" and "no removal" — both lies. A workspace that survived a `Ctrl-C` would silently disappear after half an hour of being idle. Fix: - container.ts: add buildSandboxStopArgv / buildSandboxStartArgv and stopSandbox / startSandbox wrappers around the new openshell sandbox stop / start verbs (vessux/OpenShell#3). - session-ops.ts: stopSession and reapIdleStaleSessions now call stopSandbox instead of deleteSandbox. cleanSession keeps the destructive deleteSandbox call (that is the explicit-teardown path). - session.ts: when reattaching to a session whose container is in "exited" state, call startSandbox before waitForSandboxReady. The supervisor lives inside the container, so without an explicit start the existing wait path would time out. The existing _descriptions.ts already advertises "preserves state" and "no removal" — those statements are now finally true. Other: - package.json: scope `bun test` to `./src/ ./tests/` so the test runner does not pick up the openshell-fork checkout's z3-sys build artifacts (a TypeScript test file shipped inside the vendored z3 source). The wired-up TS layer talks to the fork's new Stop/Start RPCs via the openshell CLI. Dev-mode builds (this repo plus a sibling openshell-fork checkout) pick up the binaries from openshell-fork/target/debug; production installs will need a fork release tag bump once the upstream fork PR merges.
Fork v0.5.0 ships the `openshell sandbox stop` / `openshell sandbox start` verbs that this branch's new `stopSandbox`/`startSandbox` wrappers call. Production installs need the binary tarballs that v0.5.0 publishes; dev-mode picks up the local `openshell-fork/target/debug` checkout regardless and ran fine without this bump. See vessux/OpenShell#3 + release notes for v0.5.0.
`openshell sandbox <verb>` operates on the gateway-registered sandbox
name (the one passed to `--name` at create), not the underlying podman
container name `openshell-sandbox-<name>`. openlock was prefixing every
name with `SANDBOX_PREFIX` before invoking the binary, so the gateway
always answered NotFound:
$ openshell sandbox get openshell-sandbox-foo → NotFound
$ openshell sandbox get foo → Ready
`deleteSandbox` masked this by silently swallowing stderr/exit code
(`{stdout:"ignore", stderr:"ignore"}`), so `openlock clean` reported
success while leaving the podman container orphaned — discovered when
trying to smoke-test PR-#40's auto-start path which depends on
`getSandboxState` returning a non-`missing` value.
Drop the prefix from every callsite that talks to the openshell binary
(session.ts / session-ops.ts / cli/exec / cli/shell), drop the unused
`SANDBOX_PREFIX` constant + file (integration tests hardcode the
container name where they hit podman directly), and surface
non-NotFound delete failures instead of swallowing them.
Also fixes `getSandboxState`:
- `openshell sandbox get` has never accepted `-o json` (only `--policy-only`
on top of `[name]`), so the JSON.parse path always exit-coded to
`"missing"`. Drop the flag, parse the human "Phase: X" line instead.
- Add ANSI-stripping + a `parseSandboxGetPhase` unit test.
Without this, PR-#40's `if (state === "exited") await startSandbox(...)`
in `reattachSession` is dead code: `getSandboxState` returns `"missing"`
first and we exit before reaching the start.
Fork v0.5.0 has no SandboxPhase::Stopped — an explicit `openshell sandbox stop` transitions phase Ready -> Error because the watch loop treats any ContainerExited as a terminal failure. Before this, parseSandboxGetPhase mapped Error to "other"; reattachSession then printed "Attaching to running" but did not invoke startSandbox, and waitForSandboxReady hung probing a stopped container. Mapping Error to "exited" lets reattach trigger startSandbox; genuine provisioning failures still surface when startSandbox/waitForSandboxReady fail. Tracked as bd openlock-z9i (fork-side fix: add Stopped variant or intentional-stop flag).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
openlock stopand the 30-minute idle reaper no longer destroy workspace volumes — they call the new forkopenshell sandbox stopverb instead.openshell sandbox startbefore waiting for the supervisor._descriptions.tsalready advertised "preserves state" / "no removal" — those statements are now finally accurate.Why
session-ops.tswas routing bothstopSessionandreapIdleStaleSessionsthroughdeleteSandbox, which callsopenshell sandbox deleteand tears down the container, the workspace volume, and the session-scoped handshake secret atomically. A workspace that survived a Ctrl-C would silently disappear after half an hour of inactivity. Discovered while reviewing post-#39 sidebar items.Filed as
bd openlock-27e.Changes
src/sandbox/container.ts: newbuildSandboxStopArgv/buildSandboxStartArgvargv builders andstopSandbox/startSandboxasync wrappers mirroringdeleteSandbox.src/sandbox/session-ops.ts:stopSessionnow callsstopSandbox(wasdeleteSandbox).reapIdleStaleSessionsnow callsstopSandbox.cleanSessionstill callsdeleteSandbox— that path is the explicit-teardown verb.src/sandbox/session.ts:reattachSessioncallsstartSandboxwhen the existing container's state isexited, beforewaitForSandboxReady. The supervisor lives inside the container, so without an explicit start the existing wait path would time out.src/sandbox/container.test.ts: argv-shape tests for both new builders.package.json: scopebun testto./src/ ./tests/so the runner does not pick up the openshell-fork checkout's z3-sys build artifacts (a.test.tsfile ships inside vendored z3 sources).Fork dependency
The CLI verbs
openshell sandbox stop/openshell sandbox startare added by vessux/OpenShell#3. Dev-mode openlock builds pick up the new binaries from the localopenshell-fork/target/debugcheckout, but a production install will need a fork release tag bump (OPENSHELL_FORK_TAGinsrc/sandbox/fork-binaries.ts) once that fork PR merges and a release is cut.This PR intentionally does not bump the fork tag — that should land as a separate small commit after the fork release is available.
Test plan
bun run lint— only pre-existing biome warning intests/integration/post-create-openrouter-real.test.ts:154; no new warnings.bun run typecheck— clean.bun run knip— clean.bun run test— 579 pass / 0 fail / 6 skip across 70 src files.openlock sandbox foo, exit,openlock stop foo,openlock sandbox fooshould reattach with workspace intact instead of failing.openlock reapfollowed by reattach — same expectation.