Commit f819f7d

fix(vm): restore sandboxes after gateway restart (#1407)
1 parent 403c754 commit f819f7d

15 files changed

Lines changed: 778 additions & 94 deletions

.github/workflows/driver-vm-linux.yml

Lines changed: 4 additions & 18 deletions
@@ -134,24 +134,10 @@ jobs:
 run: |
 set -euo pipefail
 COMPRESSED_DIR="${PWD}/target/vm-runtime-compressed"
-mkdir -p "$COMPRESSED_DIR"
-
-EXTRACT_DIR=$(mktemp -d)
-zstd -d "runtime-download/vm-runtime-${{ matrix.platform }}.tar.zst" --stdout \
-| tar -xf - -C "$EXTRACT_DIR"
-
-echo "Extracted runtime files:"
-ls -lah "$EXTRACT_DIR"
-
-for file in "$EXTRACT_DIR"/*; do
-[ -f "$file" ] || continue
-name=$(basename "$file")
-[ "$name" = "provenance.json" ] && continue
-zstd -19 -f -q -T0 -o "${COMPRESSED_DIR}/${name}.zst" "$file"
-done
-
-echo "Staged compressed runtime artifacts:"
-ls -lah "$COMPRESSED_DIR"
+VM_RUNTIME_TARBALL="${PWD}/runtime-download/vm-runtime-${{ matrix.platform }}.tar.zst" \
+VM_RUNTIME_PLATFORM="${{ matrix.platform }}" \
+OPENSHELL_VM_RUNTIME_COMPRESSED_DIR="$COMPRESSED_DIR" \
+tasks/scripts/vm/compress-vm-runtime.sh
 
 - name: Build bundled supervisor
 run: |

.github/workflows/driver-vm-macos.yml

Lines changed: 4 additions & 18 deletions
@@ -165,24 +165,10 @@ jobs:
 run: |
 set -euo pipefail
 COMPRESSED_DIR="${PWD}/target/vm-runtime-compressed-macos"
-mkdir -p "$COMPRESSED_DIR"
-
-EXTRACT_DIR=$(mktemp -d)
-zstd -d "runtime-download/vm-runtime-darwin-aarch64.tar.zst" --stdout \
-| tar -xf - -C "$EXTRACT_DIR"
-
-echo "Extracted darwin runtime files:"
-ls -lah "$EXTRACT_DIR"
-
-for file in "$EXTRACT_DIR"/*; do
-[ -f "$file" ] || continue
-name=$(basename "$file")
-[ "$name" = "provenance.json" ] && continue
-zstd -19 -f -q -T0 -o "${COMPRESSED_DIR}/${name}.zst" "$file"
-done
-
-echo "Staged macOS compressed runtime artifacts:"
-ls -lah "$COMPRESSED_DIR"
+VM_RUNTIME_TARBALL="${PWD}/runtime-download/vm-runtime-darwin-aarch64.tar.zst" \
+VM_RUNTIME_PLATFORM="darwin-aarch64" \
+OPENSHELL_VM_RUNTIME_COMPRESSED_DIR="$COMPRESSED_DIR" \
+tasks/scripts/vm/compress-vm-runtime.sh
 
 - name: Download bundled supervisor
 uses: actions/download-artifact@v4
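
Both workflow hunks above replace the same inline extract-and-recompress steps with a call to `tasks/scripts/vm/compress-vm-runtime.sh`. The script itself is not part of the rendered diff, so the following is only a sketch of what it might do, assuming it wraps the removed steps and reads the three environment variables the workflows set. The function name `compress_vm_runtime`, the temp-directory cleanup, and the use of `VM_RUNTIME_PLATFORM` only for the log message are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of tasks/scripts/vm/compress-vm-runtime.sh.
# The real script is not shown in this diff; this mirrors the removed inline steps.
set -euo pipefail

compress_vm_runtime() {
  # Variable names come from the workflows; treating the first two as required is an assumption.
  local tarball="${VM_RUNTIME_TARBALL:?set VM_RUNTIME_TARBALL}"
  local out_dir="${OPENSHELL_VM_RUNTIME_COMPRESSED_DIR:?set OPENSHELL_VM_RUNTIME_COMPRESSED_DIR}"
  local platform="${VM_RUNTIME_PLATFORM:-unknown}"

  mkdir -p "$out_dir"
  local extract_dir
  extract_dir=$(mktemp -d)

  # Unpack the downloaded runtime tarball into a scratch directory.
  zstd -d "$tarball" --stdout | tar -xf - -C "$extract_dir"

  # Recompress each regular file on its own, skipping the provenance record.
  local file name
  for file in "$extract_dir"/*; do
    [ -f "$file" ] || continue
    name=$(basename "$file")
    [ "$name" = "provenance.json" ] && continue
    zstd -19 -f -q -T0 -o "${out_dir}/${name}.zst" "$file"
  done

  rm -rf "$extract_dir"
  echo "Staged compressed runtime artifacts for ${platform}:"
  ls -lah "$out_dir"
}
```

Keeping the logic in one script means the Linux and macOS workflows differ only in the tarball path, platform string, and output directory they pass in.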

Cargo.lock

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default.

architecture/compute-runtimes.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ reason strings.
 | Docker | Local development with Docker available. | Container plus nested sandbox namespace. | Uses host networking so loopback gateway endpoints work from the supervisor. |
 | Podman | Rootless or single-machine deployments. | Container plus nested sandbox namespace. | Uses the Podman REST API, OCI image volumes, and CDI GPU devices when available. |
 | Kubernetes | Cluster deployment through Helm. | Pod plus nested sandbox namespace. | Uses Kubernetes API objects, service accounts, secrets, PVC-backed workspace storage, and GPU resources. |
-| VM | Experimental microVM isolation. | Per-sandbox libkrun VM. | Gateway spawns `openshell-driver-vm` as a subprocess over a private, state-local Unix socket. The VM driver boots a cached bootstrap `rootfs.ext4`, prepares requested OCI images inside a bootstrap VM with `umoci`, attaches the prepared image disk read-only, and gives each sandbox a writable `overlay.ext4` for merged-root changes and runtime material. |
+| VM | Experimental microVM isolation. | Per-sandbox libkrun VM. | Gateway spawns `openshell-driver-vm` as a subprocess over a private, state-local Unix socket. The VM driver boots a cached bootstrap `rootfs.ext4`, prepares requested OCI images inside a bootstrap VM with `umoci`, attaches the prepared image disk read-only, and gives each sandbox a writable `overlay.ext4` for merged-root changes and runtime material. The driver persists each accepted launch request beside the overlay and restarts those VMs on driver startup without recreating the overlay. |
 
 Per-sandbox CPU and memory values currently enter the driver layer through
 template resource limits. Docker and Podman apply them as runtime limits.

crates/openshell-driver-vm/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ openshell-vfio = { path = "../openshell-vfio" }
 bollard = { version = "0.20", features = ["ssh"] }
 tokio = { workspace = true }
 tonic = { workspace = true, features = ["transport"] }
+prost = { workspace = true }
 prost-types = { workspace = true }
 futures = { workspace = true }
 tokio-stream = { workspace = true, features = ["net"] }

crates/openshell-driver-vm/README.md

Lines changed: 7 additions & 0 deletions
@@ -189,6 +189,13 @@ the overlay while cached image disks remain unchanged. The overlay disk must be
 large enough to hold the compressed payload, unpacked rootfs, and sandbox writes
 during the first prepare.
 
+The driver also writes the accepted `DriverSandbox` launch request to
+`<state-dir>/sandboxes/<id>/sandbox.pb`. If the gateway restarts, it starts a
+new VM driver process; that process scans the sandbox state directories,
+restarts each persisted VM launcher, and preserves any existing `overlay.ext4`
+instead of cloning a fresh overlay template. If a restart happened before the
+overlay was created, the driver creates it during the resume attempt.
+
 ## Logs and debugging
 
 Raise log verbosity for both processes:
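
The README paragraph added in this hunk describes a scan-and-resume pass over `<state-dir>/sandboxes/<id>/`. The driver's actual code is not included in the rendered diff, so the shell sketch below only illustrates the decision that paragraph describes: resume sandboxes that have a persisted `sandbox.pb`, keep an existing `overlay.ext4`, and create one from a template only when it is missing. The function name `resume_sandboxes`, the template argument, and the printed messages are hypothetical, and the VM launch itself is replaced by an `echo`.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; names and messages are assumptions, not the driver's API.
set -euo pipefail

# resume_sandboxes STATE_DIR OVERLAY_TEMPLATE
# Prints one line per sandbox that has a persisted launch request.
resume_sandboxes() {
  local state_dir="$1" template="$2"
  local dir id
  for dir in "$state_dir"/sandboxes/*/; do
    [ -d "$dir" ] || continue
    # Only sandboxes with a persisted launch request are resumed.
    [ -f "${dir}sandbox.pb" ] || continue
    id=$(basename "$dir")
    if [ -f "${dir}overlay.ext4" ]; then
      # Preserve sandbox writes: never replace an existing overlay.
      echo "resume ${id}: keeping existing overlay"
    else
      # The restart happened before the overlay existed, so create it now.
      cp "$template" "${dir}overlay.ext4"
      echo "resume ${id}: created overlay"
    fi
    # A real driver would restart the VM launcher for this sandbox here.
  done
}
```

Checking for `sandbox.pb` first means a directory without a persisted request is left untouched rather than given a new overlay.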
