You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: .agents/skills/debug-openshell-cluster/SKILL.md
+1-1Lines changed: 1 addition & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -312,7 +312,7 @@ If DNS is broken, all image pulls from the distribution registry will fail, as w
312
312
|`metrics-server` errors in logs | Normal k3s noise, not the root cause | These errors are benign — look for the actual failing health check component |
313
313
| Stale NotReady nodes from previous deploys | Volume reused across container recreations | The deploy flow now auto-cleans stale nodes; if it still fails, manually delete NotReady nodes (see Step 2) or choose "Recreate" when prompted |
314
314
| gRPC `UNIMPLEMENTED` for newer RPCs in push mode | Helm values still point at older pulled images instead of the pushed refs | Verify rendered `openshell-helmchart.yaml` uses the expected push refs (`server`, `sandbox`, `pki-job`) and not `:latest`|
315
-
| Sandbox pods crash with `/opt/openshell/bin/openshell-sandbox: no such file or directory`| Supervisor binary missing from cluster image | The cluster image was built/published without the `supervisor-builder`stage. Rebuild with `mise run docker:build:cluster` and recreate gateway. Bootstrap auto-detects via `HEALTHCHECK_MISSING_SUPERVISOR` marker |
315
+
| Sandbox pods crash with `/opt/openshell/bin/openshell-sandbox: no such file or directory`| Supervisor binary missing from cluster image | The cluster image was built/published without the `supervisor-builder`target in `deploy/docker/Dockerfile.images`. Rebuild with `mise run docker:build:cluster` and recreate gateway. Bootstrap auto-detects via `HEALTHCHECK_MISSING_SUPERVISOR` marker |
316
316
|`HEALTHCHECK_MISSING_SUPERVISOR` in health check logs |`/opt/openshell/bin/openshell-sandbox` not found in gateway container | Rebuild cluster image: `mise run docker:build:cluster`, then `openshell gateway destroy <name> && openshell gateway start`|
Copy file name to clipboardExpand all lines: AGENTS.md
+1-1Lines changed: 1 addition & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -99,7 +99,7 @@ These pipelines connect skills into end-to-end workflows. Individual skill files
99
99
100
100
## Cluster Infrastructure Changes
101
101
102
-
- If you change cluster bootstrap infrastructure (e.g., `openshell-bootstrap` crate, `Dockerfile.cluster`, `cluster-entrypoint.sh`, `cluster-healthcheck.sh`, deploy logic in `openshell-cli`), update the `debug-openshell-cluster` skill in `.agents/skills/debug-openshell-cluster/SKILL.md` to reflect those changes.
102
+
- If you change cluster bootstrap infrastructure (e.g., `openshell-bootstrap` crate, `deploy/docker/Dockerfile.images`, `cluster-entrypoint.sh`, `cluster-healthcheck.sh`, deploy logic in `openshell-cli`), update the `debug-openshell-cluster` skill in `.agents/skills/debug-openshell-cluster/SKILL.md` to reflect those changes.
@@ -15,11 +15,11 @@ The gateway runs the control plane API server. It is deployed as a StatefulSet i
15
15
16
16
The cluster image is a single-container Kubernetes distribution that bundles the Helm charts, Kubernetes manifests, and the `openshell-sandbox` supervisor binary needed to bootstrap the control plane.
The supervisor binary (`openshell-sandbox`) is cross-compiled in a build stage and placed at `/opt/openshell/bin/openshell-sandbox`. It is exposed to sandbox pods at runtime via a read-only `hostPath` volume mount — it is not baked into sandbox images.
22
+
The supervisor binary (`openshell-sandbox`) is built by the shared `supervisor-builder`stage in `deploy/docker/Dockerfile.images` and placed at `/opt/openshell/bin/openshell-sandbox`. It is exposed to sandbox pods at runtime via a read-only `hostPath` volume mount — it is not baked into sandbox images.
23
23
24
24
## Sandbox Images
25
25
@@ -42,7 +42,7 @@ The incremental deploy (`cluster-deploy-fast.sh`) fingerprints local Git changes
@@ -226,7 +226,7 @@ After deploy, the CLI calls `save_active_gateway(name)`, writing the gateway nam
226
226
227
227
## Container Image
228
228
229
-
The gateway image is defined in `deploy/docker/Dockerfile.cluster`:
229
+
The cluster image is defined by target `cluster`in `deploy/docker/Dockerfile.images`:
230
230
231
231
```
232
232
Base: rancher/k3s:v1.35.2-k3s1
@@ -296,7 +296,7 @@ GPU support is part of the single-node gateway bootstrap path rather than a sepa
296
296
297
297
-`openshell gateway start --gpu` threads a boolean deploy option through `crates/openshell-cli`, `crates/openshell-bootstrap`, and `crates/openshell-bootstrap/src/docker.rs`.
298
298
- When enabled, the cluster container is created with Docker `DeviceRequests`, which is the API equivalent of `docker run --gpus all`.
299
-
-`deploy/docker/Dockerfile.cluster` installs NVIDIA Container Toolkit packages in a dedicated Ubuntu stage and copies the runtime binaries, config, and `libnvidia-container` shared libraries into the final Ubuntu-based cluster image.
299
+
-`deploy/docker/Dockerfile.images` installs NVIDIA Container Toolkit packages in a dedicated Ubuntu stage and copies the runtime binaries, config, and `libnvidia-container` shared libraries into the final Ubuntu-based cluster image.
300
300
-`deploy/docker/cluster-entrypoint.sh` checks `GPU_ENABLED=true` and copies GPU-only manifests from `/opt/openshell/gpu-manifests/` into k3s's manifests directory.
301
301
-`deploy/kube/gpu-manifests/nvidia-device-plugin-helmchart.yaml` installs the NVIDIA device plugin chart, currently pinned to `0.18.2`, along with GPU Feature Discovery and Node Feature Discovery.
302
302
- k3s auto-detects `nvidia-container-runtime` on `PATH`, registers the `nvidia` containerd runtime, and creates the `nvidia``RuntimeClass` automatically.
0 commit comments