Skip to content

fix(sandbox): non-destructive stop + reap; auto-start on reattach (bd openlock-27e)#40

Merged
vessux merged 4 commits into
mainfrom
feat/sandbox-stop-start
May 26, 2026
Merged

fix(sandbox): non-destructive stop + reap; auto-start on reattach (bd openlock-27e)#40
vessux merged 4 commits into
mainfrom
feat/sandbox-stop-start

Conversation

@vessux
Copy link
Copy Markdown
Owner

@vessux vessux commented May 26, 2026

Summary

  • openlock stop and the 30-minute idle reaper no longer destroy workspace volumes — they call the new fork openshell sandbox stop verb instead.
  • Reattaching to a session whose container has been stopped now auto-starts it via openshell sandbox start before waiting for the supervisor.
  • Help text in _descriptions.ts already advertised "preserves state" / "no removal" — those statements are now finally accurate.

Why

session-ops.ts was routing both stopSession and reapIdleStaleSessions through deleteSandbox, which calls openshell sandbox delete and tears down the container, the workspace volume, and the session-scoped handshake secret atomically. A workspace that survived a Ctrl-C would silently disappear after half an hour of inactivity. Discovered while reviewing post-#39 sidebar items.

Filed as bd openlock-27e.

Changes

  • src/sandbox/container.ts: new buildSandboxStopArgv / buildSandboxStartArgv argv builders and stopSandbox / startSandbox async wrappers mirroring deleteSandbox.
  • src/sandbox/session-ops.ts:
    • stopSession now calls stopSandbox (was deleteSandbox).
    • reapIdleStaleSessions now calls stopSandbox.
    • cleanSession still calls deleteSandbox — that path is the explicit-teardown verb.
  • src/sandbox/session.ts: reattachSession calls startSandbox when the existing container's state is exited, before waitForSandboxReady. The supervisor lives inside the container, so without an explicit start the existing wait path would time out.
  • src/sandbox/container.test.ts: argv-shape tests for both new builders.
  • package.json: scope bun test to ./src/ ./tests/ so the runner does not pick up the openshell-fork checkout's z3-sys build artifacts (a .test.ts file ships inside vendored z3 sources).

Fork dependency

The CLI verbs openshell sandbox stop / openshell sandbox start are added by vessux/OpenShell#3. Dev-mode openlock builds pick up the new binaries from the local openshell-fork/target/debug checkout, but a production install will need a fork release tag bump (OPENSHELL_FORK_TAG in src/sandbox/fork-binaries.ts) once that fork PR merges and a release is cut.

This PR intentionally does not bump the fork tag — that should land as a separate small commit after the fork release is available.

Test plan

  • bun run lint — only pre-existing biome warning in tests/integration/post-create-openrouter-real.test.ts:154; no new warnings.
  • bun run typecheck — clean.
  • bun run knip — clean.
  • bun run test — 579 pass / 0 fail / 6 skip across 70 src files.
  • Manual smoke once the fork tag is bumped: openlock sandbox foo, exit, openlock stop foo, openlock sandbox foo should reattach with workspace intact instead of failing.
  • Manual smoke: openlock reap followed by reattach — same expectation.

vessux added 3 commits May 26, 2026 09:08
…enlock-27e)

Bug: `openlock stop <name>` and the 30-minute idle-reaper both routed
through `deleteSandbox`, which tears down the container, the workspace
volume, and the session-scoped handshake secret in one call. Help text
claimed "preserves state" and "no removal" — both lies. A workspace
that survived a `Ctrl-C` would silently disappear after half an hour
of being idle.

Fix:

- container.ts: add buildSandboxStopArgv / buildSandboxStartArgv and
  stopSandbox / startSandbox wrappers around the new openshell sandbox
  stop / start verbs (vessux/OpenShell#3).
- session-ops.ts: stopSession and reapIdleStaleSessions now call
  stopSandbox instead of deleteSandbox. cleanSession keeps the
  destructive deleteSandbox call (that is the explicit-teardown path).
- session.ts: when reattaching to a session whose container is in
  "exited" state, call startSandbox before waitForSandboxReady. The
  supervisor lives inside the container, so without an explicit start
  the existing wait path would time out.

The existing _descriptions.ts already advertises "preserves state" and
"no removal" — those statements are now finally true.

Other:

- package.json: scope `bun test` to `./src/ ./tests/` so the test
  runner does not pick up the openshell-fork checkout's z3-sys build
  artifacts (a TypeScript test file shipped inside the vendored z3
  source).

The wired-up TS layer talks to the fork's new Stop/Start RPCs via
the openshell CLI. Dev-mode builds (this repo plus a sibling
openshell-fork checkout) pick up the binaries from
openshell-fork/target/debug; production installs will need a fork
release tag bump once the upstream fork PR merges.
Fork v0.5.0 ships the `openshell sandbox stop` / `openshell sandbox start`
verbs that this branch's new `stopSandbox`/`startSandbox` wrappers call.
Production installs need the binary tarballs that v0.5.0 publishes;
dev-mode picks up the local `openshell-fork/target/debug` checkout
regardless and ran fine without this bump.

See vessux/OpenShell#3 + release notes for v0.5.0.
`openshell sandbox <verb>` operates on the gateway-registered sandbox
name (the one passed to `--name` at create), not the underlying podman
container name `openshell-sandbox-<name>`. openlock was prefixing every
name with `SANDBOX_PREFIX` before invoking the binary, so the gateway
always answered NotFound:

  $ openshell sandbox get openshell-sandbox-foo  → NotFound
  $ openshell sandbox get foo                    → Ready

`deleteSandbox` masked this by silently swallowing stderr/exit code
(`{stdout:"ignore", stderr:"ignore"}`), so `openlock clean` reported
success while leaving the podman container orphaned — discovered when
trying to smoke-test PR-#40's auto-start path which depends on
`getSandboxState` returning a non-`missing` value.

Drop the prefix from every callsite that talks to the openshell binary
(session.ts / session-ops.ts / cli/exec / cli/shell), drop the unused
`SANDBOX_PREFIX` constant + file (integration tests hardcode the
container name where they hit podman directly), and surface
non-NotFound delete failures instead of swallowing them.

Also fixes `getSandboxState`:
- `openshell sandbox get` has never accepted `-o json` (only `--policy-only`
  on top of `[name]`), so the JSON.parse path always exit-coded to
  `"missing"`. Drop the flag, parse the human "Phase: X" line instead.
- Add ANSI-stripping + a `parseSandboxGetPhase` unit test.

Without this, PR-#40's `if (state === "exited") await startSandbox(...)`
in `reattachSession` is dead code: `getSandboxState` returns `"missing"`
first and we exit before reaching the start.
@vessux vessux closed this May 26, 2026
@vessux vessux reopened this May 26, 2026
Fork v0.5.0 has no SandboxPhase::Stopped — an explicit `openshell sandbox
stop` transitions phase Ready -> Error because the watch loop treats any
ContainerExited as a terminal failure. Before this, parseSandboxGetPhase
mapped Error to "other"; reattachSession then printed "Attaching to
running" but did not invoke startSandbox, and waitForSandboxReady hung
probing a stopped container.

Mapping Error to "exited" lets reattach trigger startSandbox; genuine
provisioning failures still surface when startSandbox/waitForSandboxReady
fail. Tracked as bd openlock-z9i (fork-side fix: add Stopped variant or
intentional-stop flag).
@vessux vessux merged commit c0d7241 into main May 26, 2026
5 checks passed
@vessux vessux deleted the feat/sandbox-stop-start branch May 26, 2026 13:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant