feat(sandbox): expose Stop/Start RPCs in gateway API + CLI#3
Merged
Conversation
The driver-layer stop_sandbox already exists for every compute backend (podman, docker, kubernetes, vm) and the internal compute_driver.proto exposes a StopSandbox RPC. But the gateway-facing openshell.proto only surfaces DeleteSandbox, which destroys the workspace volume alongside the container. Operators who want to halt a sandbox for resource reasons have to delete it and lose state. This patch adds: - StopSandbox and StartSandbox RPCs to proto/openshell.proto with matching Request/Response messages. - Compute-runtime wrappers (stop_sandbox, start_sandbox) that look up the sandbox by name, forward stop to the driver, and forward start to the StartupResume hook so the existing Docker resume path is reused. - A resume_sandbox method on PodmanComputeDriver mirroring the Docker implementation, plus a StartupResume impl wired in new_podman so the Start verb works on podman too. - gRPC handlers (handle_stop_sandbox, handle_start_sandbox), trait wires, and matching authz entries (both gated on sandbox:write). - CLI verbs: openshell sandbox stop [--all] and openshell sandbox start with completion for sandbox names and the same forward-cleanup behavior as delete. Phase is left to the existing watch loop to update as containers transition; no new SandboxPhase variant is required. Restart-from- stopped works mid-session via Start and at gateway boot via the existing resume_persisted_sandboxes sweep. Tests: - Podman driver: resume_sandbox returns Ok(false) when container is missing, is a no-op when already running, and issues POST /start when stopped. - Compute wrappers: NotFound for missing sandboxes, Unimplemented when no StartupResume hook is configured, forwards to the hook with the correct id/name, surfaces the hook's Ok(false) verdict. - gRPC handlers: empty-name InvalidArgument and missing-sandbox NotFound for both Stop and Start. The seven mock OpenShell impls in integration tests are updated to satisfy the expanded trait surface.
This was referenced May 26, 2026
vessux
added a commit
to vessux/openlock
that referenced
this pull request
May 26, 2026
Fork v0.5.0 ships the `openshell sandbox stop` / `openshell sandbox start` verbs that this branch's new `stopSandbox`/`startSandbox` wrappers call. Production installs need the binary tarballs that v0.5.0 publishes; dev-mode picks up the local `openshell-fork/target/debug` checkout regardless and ran fine without this bump. See vessux/OpenShell#3 + release notes for v0.5.0.
vessux
added a commit
to vessux/openlock
that referenced
this pull request
May 26, 2026
… openlock-27e) (#40) * fix(sandbox): non-destructive stop + reap; auto-start on reattach (openlock-27e) Bug: `openlock stop <name>` and the 30-minute idle-reaper both routed through `deleteSandbox`, which tears down the container, the workspace volume, and the session-scoped handshake secret in one call. Help text claimed "preserves state" and "no removal" — both lies. A workspace that survived a `Ctrl-C` would silently disappear after half an hour of being idle. Fix: - container.ts: add buildSandboxStopArgv / buildSandboxStartArgv and stopSandbox / startSandbox wrappers around the new openshell sandbox stop / start verbs (vessux/OpenShell#3). - session-ops.ts: stopSession and reapIdleStaleSessions now call stopSandbox instead of deleteSandbox. cleanSession keeps the destructive deleteSandbox call (that is the explicit-teardown path). - session.ts: when reattaching to a session whose container is in "exited" state, call startSandbox before waitForSandboxReady. The supervisor lives inside the container, so without an explicit start the existing wait path would time out. The existing _descriptions.ts already advertises "preserves state" and "no removal" — those statements are now finally true. Other: - package.json: scope `bun test` to `./src/ ./tests/` so the test runner does not pick up the openshell-fork checkout's z3-sys build artifacts (a TypeScript test file shipped inside the vendored z3 source). The wired-up TS layer talks to the fork's new Stop/Start RPCs via the openshell CLI. Dev-mode builds (this repo plus a sibling openshell-fork checkout) pick up the binaries from openshell-fork/target/debug; production installs will need a fork release tag bump once the upstream fork PR merges. * chore(deps): bump OPENSHELL_FORK_TAG to v0.5.0 Fork v0.5.0 ships the `openshell sandbox stop` / `openshell sandbox start` verbs that this branch's new `stopSandbox`/`startSandbox` wrappers call. Production installs need the binary tarballs that v0.5.0 publishes; dev-mode picks up the local `openshell-fork/target/debug` checkout regardless and ran fine without this bump. See vessux/OpenShell#3 + release notes for v0.5.0. * fix(sandbox): name openshell sandboxes without podman prefix `openshell sandbox <verb>` operates on the gateway-registered sandbox name (the one passed to `--name` at create), not the underlying podman container name `openshell-sandbox-<name>`. openlock was prefixing every name with `SANDBOX_PREFIX` before invoking the binary, so the gateway always answered NotFound: $ openshell sandbox get openshell-sandbox-foo → NotFound $ openshell sandbox get foo → Ready `deleteSandbox` masked this by silently swallowing stderr/exit code (`{stdout:"ignore", stderr:"ignore"}`), so `openlock clean` reported success while leaving the podman container orphaned — discovered when trying to smoke-test PR-#40's auto-start path which depends on `getSandboxState` returning a non-`missing` value. Drop the prefix from every callsite that talks to the openshell binary (session.ts / session-ops.ts / cli/exec / cli/shell), drop the unused `SANDBOX_PREFIX` constant + file (integration tests hardcode the container name where they hit podman directly), and surface non-NotFound delete failures instead of swallowing them. Also fixes `getSandboxState`: - `openshell sandbox get` has never accepted `-o json` (only `--policy-only` on top of `[name]`), so the JSON.parse path always exit-coded to `"missing"`. Drop the flag, parse the human "Phase: X" line instead. - Add ANSI-stripping + a `parseSandboxGetPhase` unit test. Without this, PR-#40's `if (state === "exited") await startSandbox(...)` in `reattachSession` is dead code: `getSandboxState` returns `"missing"` first and we exit before reaching the start. * fix(sandbox): map openshell Phase=Error to "exited" (no Stopped variant) Fork v0.5.0 has no SandboxPhase::Stopped — an explicit `openshell sandbox stop` transitions phase Ready -> Error because the watch loop treats any ContainerExited as a terminal failure. Before this, parseSandboxGetPhase mapped Error to "other"; reattachSession then printed "Attaching to running" but did not invoke startSandbox, and waitForSandboxReady hung probing a stopped container. Mapping Error to "exited" lets reattach trigger startSandbox; genuine provisioning failures still surface when startSandbox/waitForSandboxReady fail. Tracked as bd openlock-z9i (fork-side fix: add Stopped variant or intentional-stop flag).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
StopSandbox/StartSandboxgateway RPCs that halt a sandbox container without destroying its workspace volume, then bring it back live.stop_sandbox(podman/docker/kubernetes/vm) andStartupResume::resume_sandbox(Docker today; this PR adds Podman).openshell sandbox stop|startCLI verbs with--allparity todelete.Why
Driver-layer Stop and the internal
compute_driver.protoRPC are already in place, but the gateway-facingopenshell.protoonly exposesDeleteSandbox. That path tears down the workspace volume along with the container, so anyone wanting to halt a sandbox for resource reasons has to lose state.Changes
proto/openshell.proto: addStopSandbox+StartSandboxRPCs and Request/Response messages.openshell-server:compute/mod.rs:pub async fn stop_sandbox(name)(forwards to driver) andpub async fn start_sandbox(name)(forwards toStartupResume).grpc/sandbox.rs:handle_stop_sandbox/handle_start_sandboxwith empty-name guard and not-found mapping.grpc/mod.rs: trait wires.auth/authz.rs: both RPCs gated onsandbox:write.openshell-driver-podman:driver.rs: newresume_sandbox(sandbox_id, sandbox_name)mirroring the Docker implementation (inspect → start if not running; returnOk(false)when missing).StartupResumefornew_podmanso the Start verb works on podman without a gateway restart.openshell-cli:SandboxCommands::Stop { names, all }/SandboxCommands::Start { names }enum variants and match arms.run::sandbox_stop/run::sandbox_startrunners with sandbox-name completion and the same forward-cleanup behavior as delete.Phase is left to the existing watch loop to update as containers transition; no new
SandboxPhasevariant is required. Restart-from-stopped works mid-session via the new Start verb, and at gateway boot via the existingresume_persisted_sandboxessweep.Testing
cargo test -p openshell-driver-podman -p openshell-server -p openshell-cli(all green).cargo clippy --tests --no-depsclean.cargo fmt --checkclean.New unit tests:
resume_sandboxnot-found / already-running / stopped-starts paths.stop_sandboxnot-found / ok,start_sandboxnot-found / unimplemented-without-hook / forwards-to-hook / backend-missing-reports-false.InvalidArgumentand missing-sandboxNotFoundfor both verbs.The seven mock
OpenShellimpls in integration tests are updated to satisfy the expanded trait surface.Checklist
cargo fmt,cargo clippy --tests,cargo testall clean for affected crates.