**.agents/skills/debug-openshell-cluster/SKILL.md** (2 additions, 0 deletions)

@@ -104,6 +104,7 @@ Look for:
- k3s startup errors (certificate issues, port binding failures)
- Manifest copy errors from `/opt/openshell/manifests/`
- `iptables` or `cgroup` errors (privilege/capability issues)
- `Warning: br_netfilter does not appear to be loaded` — this is advisory only; many kernels work without the explicit module. Only act on it if you also see DNS failures or pod-to-service connectivity problems (see Common Failure Patterns).
### Step 2: Check k3s Cluster Health
@@ -344,6 +345,7 @@ If DNS is broken, all image pulls from the distribution registry will fail, as w
| Port conflict | Another service on the configured gateway host port (default 8080) | Stop conflicting service or use `--port` on `openshell gateway start` to pick a different host port |
| gRPC connect refused to `127.0.0.1:443` in CI | Docker daemon is remote (`DOCKER_HOST=tcp://...`) but metadata still points to loopback | Verify metadata endpoint host matches `DOCKER_HOST` and includes non-loopback host |
| DNS failures inside container | Entrypoint DNS detection failed | `openshell doctor exec -- cat /etc/rancher/k3s/resolv.conf` and `openshell doctor logs --lines 20` |
| Pods can't reach kube-dns / ClusterIP services | `br_netfilter` not loaded; bridge traffic bypasses iptables DNAT rules | `sudo modprobe br_netfilter` on the host, then `echo br_netfilter \| sudo tee /etc/modules-load.d/br_netfilter.conf` to persist. Known to be required on Jetson Linux 5.15-tegra; other kernels (e.g. standard x86/aarch64 Linux) may have bridge netfilter built in and work without the module. The entrypoint logs a warning when `/proc/sys/net/bridge/bridge-nf-call-iptables` is absent but does not abort — only act on it if DNS or service connectivity is actually broken. |
| Node DiskPressure / MemoryPressure / PIDPressure | Insufficient disk, memory, or PIDs on host | Free disk (`docker system prune -a --volumes`), increase memory, or expand host resources. Bootstrap auto-detects via `HEALTHCHECK_NODE_PRESSURE` marker |
| Pods evicted with "The node had condition: [DiskPressure]" | Host disk full, kubelet evicting pods | Free disk space on host, then `openshell gateway destroy <name> && openshell gateway start` |
| `metrics-server` errors in logs | Normal k3s noise, not the root cause | These errors are benign — look for the actual failing health check component |

| `.agents/` | Agent skills and persona definitions |
## RFCs
For cross-cutting architectural decisions, API contract changes, or process proposals that need broad consensus, use the RFC process. RFCs live in `rfc/` — copy the template, fill it in, and open a PR for discussion. See [rfc/README.md](rfc/README.md) for the full lifecycle and guidelines on when to write an RFC versus a spike issue or architecture doc.
## Documentation
If your change affects user-facing behavior (new flags, changed defaults, new features, bug fixes that contradict existing docs), update the relevant pages under `docs/` in the same PR.

**README.md** (1 addition, 1 deletion)

@@ -128,7 +128,7 @@ OpenShell can pass host GPUs into sandboxes for local inference, fine-tuning, or
```bash
openshell sandbox create --gpu --from [gpu-enabled-sandbox] -- claude
```
The CLI auto-bootstraps a GPU-enabled gateway on first use, auto-selecting CDI when available and otherwise falling back to Docker's NVIDIA GPU request path (`--gpus all`). GPU intent is also inferred automatically for community images with `gpu` in the name.
**Requirements:** NVIDIA drivers and the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) must be installed on the host. The sandbox image itself must include the appropriate GPU drivers and libraries for your workload — the default `base` image does not. See the [BYOC example](https://github.com/NVIDIA/OpenShell/tree/main/examples/bring-your-own-container) for building a custom sandbox image with GPU support.

**architecture/gateway-single-node.md** (12 additions, 6 deletions)

@@ -296,8 +296,10 @@ When environment variables are set, the entrypoint modifies the HelmChart manife
GPU support is part of the single-node gateway bootstrap path rather than a separate architecture.
- `openshell gateway start --gpu` threads GPU device options through `crates/openshell-cli`, `crates/openshell-bootstrap`, and `crates/openshell-bootstrap/src/docker.rs`.
- When enabled, the cluster container is created with Docker `DeviceRequests`. The injection mechanism is selected based on whether CDI is enabled on the daemon (`SystemInfo.CDISpecDirs` via `GET /info`):
  - **CDI enabled** (daemon reports non-empty `CDISpecDirs`): CDI device injection — `driver="cdi"` with `nvidia.com/gpu=all`. Specs are expected to be pre-generated on the host (e.g. automatically by the `nvidia-cdi-refresh.service` or manually via `nvidia-ctk generate`).
  - **CDI not enabled**: `--gpus all` device request — `driver="nvidia"`, `count=-1`, which relies on the NVIDIA Container Runtime hook.
- `deploy/docker/Dockerfile.images` installs NVIDIA Container Toolkit packages in a dedicated Ubuntu stage and copies the runtime binaries, config, and `libnvidia-container` shared libraries into the final Ubuntu-based cluster image.
- `deploy/docker/cluster-entrypoint.sh` checks `GPU_ENABLED=true` and copies GPU-only manifests from `/opt/openshell/gpu-manifests/` into k3s's manifests directory.
- `deploy/kube/gpu-manifests/nvidia-device-plugin-helmchart.yaml` installs the NVIDIA device plugin chart, currently pinned to `0.18.2`. NFD and GFD are disabled; the device plugin's default `nodeAffinity` (which requires `feature.node.kubernetes.io/pci-10de.present=true` or `nvidia.com/gpu.present=true` from NFD/GFD) is overridden to empty so the DaemonSet schedules on the single-node cluster without requiring those labels. The chart is configured with `deviceListStrategy: cdi-cri` so the device plugin injects devices via direct CDI device requests in the CRI.
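
The CDI-or-fallback selection above can be sketched as a small decision function. The enum and function names here are illustrative, not the actual `openshell-bootstrap` code:

```rust
/// Which Docker device-request mode to use for GPU injection.
/// Illustrative types only; the real crate threads this through the
/// bollard `DeviceRequest` structs.
#[derive(Debug, PartialEq)]
enum GpuRequestMode {
    /// CDI injection: driver "cdi", device "nvidia.com/gpu=all".
    Cdi,
    /// Fallback path: driver "nvidia", count -1 (equivalent of `--gpus all`).
    GpusAll,
}

/// Mirror of the selection rule: CDI when the daemon reports non-empty
/// `CDISpecDirs` in `GET /info`, otherwise the NVIDIA runtime hook path.
fn select_gpu_mode(cdi_spec_dirs: &[String]) -> GpuRequestMode {
    if cdi_spec_dirs.is_empty() {
        GpuRequestMode::GpusAll
    } else {
        GpuRequestMode::Cdi
    }
}
```

A daemon with `/etc/cdi` in `CDISpecDirs` would select `Cdi`; an empty list selects `GpusAll`.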
@@ -308,15 +310,19 @@ The runtime chain is:
```text
Host GPU drivers & NVIDIA Container Toolkit
  └─ Docker: DeviceRequests (CDI when enabled, --gpus all otherwise)
    └─ k3s/containerd: nvidia-container-runtime on PATH -> auto-detected
      └─ Pods: request nvidia.com/gpu in resource limits (CDI injection — no runtimeClassName needed)
```
### `--gpu` flag
The `--gpu` flag on `gateway start` enables GPU passthrough. OpenShell auto-selects CDI when enabled on the daemon and falls back to Docker's NVIDIA GPU request path (`--gpus all`) otherwise.
Device injection uses CDI (`deviceListStrategy: cdi-cri`): the device plugin injects devices via direct CDI device requests in the CRI. Sandbox pods only need `nvidia.com/gpu: 1` in their resource limits, and GPU pods do not set `runtimeClassName`.
The expected smoke test is a plain pod requesting `nvidia.com/gpu: 1` without `runtimeClassName` and running `nvidia-smi`.
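
That smoke test can be written as a minimal manifest. The pod name and CUDA base image below are placeholders for any GPU-capable image, not values from the repo:

```yaml
# Hedged sketch of the smoke-test pod described above.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test            # illustrative name
spec:
  restartPolicy: Never
  # Note: no runtimeClassName -- CDI injection makes it unnecessary.
  containers:
    - name: smoke
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # any GPU-capable image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
```

If the device plugin is healthy, the pod should run to completion and its logs should show the `nvidia-smi` table.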
## Remote Image Transfer
@@ -383,7 +389,7 @@ When `openshell sandbox create` cannot connect to a gateway (connection refused,
1. `should_attempt_bootstrap()` in `crates/openshell-cli/src/bootstrap.rs` checks the error type. It returns `true` for connectivity errors and missing default TLS materials, but `false` for TLS handshake/auth errors.
2. If running in a terminal, the user is prompted to confirm.
3. `run_bootstrap()` deploys a gateway named `"openshell"`, sets it as active, and returns fresh `TlsOptions` pointing to the newly-written mTLS certs.
4. When `sandbox create` requests GPU explicitly (`--gpu`) or infers it from an image whose final name component contains `gpu` (such as `nvidia-gpu`), the bootstrap path enables gateway GPU support before retrying sandbox creation, using the same CDI-or-fallback selection as `gateway start --gpu`.
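
Step 1's gate can be sketched as a predicate over error classes. The enum here is an illustrative stand-in, not the actual error type in `crates/openshell-cli`:

```rust
/// Illustrative stand-in for the CLI's gateway connection error type.
enum GatewayError {
    ConnectionRefused,
    MissingTlsMaterials,
    TlsHandshakeFailed,
    AuthRejected,
}

/// Bootstrap only makes sense when the gateway is absent, not when it is
/// present but rejecting us: connectivity and missing-certs errors qualify,
/// TLS handshake/auth errors do not.
fn should_attempt_bootstrap(err: &GatewayError) -> bool {
    matches!(
        err,
        GatewayError::ConnectionRefused | GatewayError::MissingTlsMaterials
    )
}
```

The design choice is that auth failures mean a gateway exists and bootstrapping a second one would mask the real problem.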
In cluster mode, the sandbox starts a background refresh loop as soon as the inference context is created. The loop polls the gateway every 5 seconds by default (`OPENSHELL_ROUTE_REFRESH_INTERVAL_SECS` override) and uses the bundle revision hash to skip no-op cache writes. The revision hash covers all route fields including `timeout_secs`, so any configuration change (provider, model, or timeout) triggers a cache update on the next poll.
### Interception flow
@@ -143,7 +143,7 @@ If no pattern matches, the proxy returns `403 Forbidden` with `{"error": "connec
### Route cache
- `InferenceContext` holds a `Router`, the pattern list, and an `Arc<RwLock<Vec<ResolvedRoute>>>` route cache.
- In cluster mode, `spawn_route_refresh()` polls `GetInferenceBundle` every 5 seconds (`OPENSHELL_ROUTE_REFRESH_INTERVAL_SECS`). On failure, stale routes are kept.
- In file mode (`--inference-routes`), routes load once at startup from YAML. No refresh task is spawned.
- In cluster mode, an empty initial bundle still enables the inference context so the refresh task can pick up later configuration.
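
The refresh loop's no-op skip can be sketched as a compare on the bundle revision hash. Types are illustrative; the real cache holds `ResolvedRoute`s behind an `Arc<RwLock<...>>`:

```rust
/// Illustrative route cache keyed by the bundle revision hash.
struct RouteCache {
    revision: u64,
    routes: Vec<String>, // stand-in for Vec<ResolvedRoute>
}

/// Apply a polled bundle only when its revision differs; returns whether
/// the cache was written. On poll failure the caller keeps the stale
/// routes by simply not calling this.
fn apply_bundle(cache: &mut RouteCache, revision: u64, routes: Vec<String>) -> bool {
    if revision == cache.revision {
        return false; // no-op poll: skip the cache write
    }
    cache.revision = revision;
    cache.routes = routes;
    true
}
```

Because the revision hash covers all route fields, any configuration change forces one write on the next poll, while identical polls are free.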
Routes with `mock://` scheme endpoints return canned responses without making HTTP requests. Mock responses are protocol-aware (OpenAI chat completion, OpenAI completion, Anthropic messages, or generic JSON). Mock routes include an `x-openshell-mock: true` response header.
### Per-request timeout
Each `ResolvedRoute` carries a `timeout` field (`Duration`). The `reqwest::Client` has no global timeout; instead, each outgoing request applies `.timeout(route.timeout)` on the request builder. When `timeout_secs` is `0` in the proto message, the default of 60 seconds is used (defined as `DEFAULT_ROUTE_TIMEOUT` in `config.rs`). Timeouts and connection failures map to `RouterError::UpstreamUnavailable`.
Timeout changes propagate dynamically to running sandboxes. The bundle revision hash includes `timeout_secs`, so when the timeout is updated via `openshell inference update --timeout`, the refresh loop detects the revision change and updates the route cache within one polling interval (5 seconds by default).
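
The zero-means-default rule can be sketched directly. The constant name comes from the text; the helper function is illustrative:

```rust
use std::time::Duration;

/// Default per-request timeout, per `DEFAULT_ROUTE_TIMEOUT` in `config.rs`.
const DEFAULT_ROUTE_TIMEOUT: Duration = Duration::from_secs(60);

/// A `timeout_secs` of 0 in the proto message means "use the default";
/// anything else is taken literally. The resulting Duration is what gets
/// applied per request via `.timeout(...)` on the reqwest request builder.
fn route_timeout(timeout_secs: u64) -> Duration {
    if timeout_secs == 0 {
        DEFAULT_ROUTE_TIMEOUT
    } else {
        Duration::from_secs(timeout_secs)
    }
}
```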
## Standalone Route File
@@ -297,13 +299,16 @@ The system route is stored as a separate `InferenceRoute` record in the gateway
- `openshell inference get` -- displays both user and system inference configuration
- `openshell inference get --system` -- displays only the system inference configuration
The `--provider` flag references a provider record name (not a provider type). The provider must already exist in the cluster and have a supported inference type (`openai`, `anthropic`, or `nvidia`).
The `--timeout` flag sets the per-request timeout in seconds for upstream inference calls. When omitted or set to `0`, the default of 60 seconds applies. Timeout changes propagate to running sandboxes within the route refresh interval (5 seconds by default).
Inference writes verify by default. `--no-verify` is the explicit opt-out for endpoints that are not up yet.

**architecture/sandbox.md** (12 additions, 6 deletions)

@@ -431,15 +431,21 @@ Landlock restricts the child process's filesystem access to an explicit allowlis
1. Build path lists from `filesystem.read_only` and `filesystem.read_write`
2. If `include_workdir` is true, add the working directory to `read_write`
3. If both lists are empty, skip Landlock entirely (no-op)
4. Create a Landlock ruleset targeting ABI V2:
   - Read-only paths receive `AccessFs::from_read(abi)` rights
   - Read-write paths receive `AccessFs::from_all(abi)` rights
5. For each path, attempt `PathFd::new()`. If it fails:
   - `BestEffort`: Log a warning with the error classification (not found, permission denied, symlink loop, etc.) and skip the path. Continue building the ruleset from remaining valid paths.
   - `HardRequirement`: Return a fatal error, aborting the sandbox.
6. If all paths failed (zero rules applied), return an error rather than calling `restrict_self()` on an empty ruleset (which would block all filesystem access)
7. Call `ruleset.restrict_self()` -- this applies to the calling process and all descendants
Kernel-level error behavior (e.g., Landlock ABI unavailable) depends on `LandlockCompatibility`:
- `BestEffort`: Log a warning and continue without filesystem isolation
- `HardRequirement`: Return a fatal error, aborting the sandbox
**Baseline path filtering**: System-injected baseline paths (e.g., `/app`) are pre-filtered by `enrich_proto_baseline_paths()` / `enrich_sandbox_baseline_paths()` using `Path::exists()` before they reach Landlock. User-specified paths are not pre-filtered -- they are evaluated at Landlock apply time so misconfigurations surface as warnings or errors.
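
Steps 5 and 6 can be sketched as a path-collection pass. Here `std::fs::metadata` stands in for `PathFd::new()`, and the types are illustrative rather than the actual crate API:

```rust
enum LandlockCompatibility {
    BestEffort,
    HardRequirement,
}

/// Collect the paths that can actually be opened, honoring the compatibility
/// mode for per-path failures (step 5) and refusing to proceed with zero
/// valid paths (step 6), since restricting with an empty ruleset would block
/// all filesystem access.
fn collect_rule_paths(
    paths: &[&str],
    compat: &LandlockCompatibility,
) -> Result<Vec<String>, String> {
    let mut valid = Vec::new();
    for path in paths {
        match std::fs::metadata(path) { // stand-in for PathFd::new()
            Ok(_) => valid.push(path.to_string()),
            Err(err) => match compat {
                LandlockCompatibility::BestEffort => {
                    eprintln!("warning: skipping {path}: {err}");
                }
                LandlockCompatibility::HardRequirement => {
                    return Err(format!("required path {path}: {err}"));
                }
            },
        }
    }
    if valid.is_empty() {
        return Err("no valid paths; refusing to apply an empty ruleset".to_string());
    }
    Ok(valid)
}
```

Only after this pass succeeds would `restrict_self()` be called with the surviving paths.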

| `L7RequestInfo` | `{ action, target, query_params }` | HTTP method, path, and decoded query multimap for policy evaluation |
### Access presets
@@ -1041,7 +1047,7 @@ This enables credential injection on all HTTPS endpoints automatically, without
Implements `L7Provider` for HTTP/1.1:
- **`parse_request()`**: Reads up to 16 KiB of headers, parses the request line (method, path), decodes query parameters into a multimap, determines body framing from `Content-Length` or `Transfer-Encoding: chunked` headers. Returns `L7Request` with raw header bytes (may include overflow body bytes).
- **`relay()`**: Forwards request headers and body to upstream (handling Content-Length, chunked, and no-body cases), then reads and relays the full response back to the client.
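
The query multimap that `parse_request()` is described as producing can be sketched like this. Percent-decoding is omitted in this sketch for brevity, even though the real parser yields a decoded multimap:

```rust
use std::collections::HashMap;

/// Illustrative multimap decode of a raw query string: repeated keys
/// accumulate values in order. Percent-decoding is intentionally omitted
/// in this sketch.
fn query_multimap(query: &str) -> HashMap<String, Vec<String>> {
    let mut map: HashMap<String, Vec<String>> = HashMap::new();
    for pair in query.split('&').filter(|p| !p.is_empty()) {
        // A bare key with no '=' maps to an empty value.
        let (key, value) = pair.split_once('=').unwrap_or((pair, ""));
        map.entry(key.to_string()).or_default().push(value.to_string());
    }
    map
}
```

A repeated key such as `tag=a&tag=b` yields both values under `tag`, which is what lets policies match on any occurrence.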
@@ -1054,7 +1060,7 @@ Implements `L7Provider` for HTTP/1.1:
`relay_with_inspection()` in `crates/openshell-sandbox/src/l7/relay.rs` is the main relay loop:
1. Parse one HTTP request from client via the provider
2. Build L7 input JSON with `request.method`, `request.path`, `request.query_params`, plus the CONNECT-level context (host, port, binary, ancestors, cmdline)
3. Evaluate `data.openshell.sandbox.allow_request` and `data.openshell.sandbox.request_deny_reason`
4. Log the L7 decision (tagged `L7_REQUEST`)
5. If allowed (or audit mode): relay request to upstream and response back to client, then loop
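
Putting steps 1 and 2 together, the policy input for one request might look like the following. The field layout is an illustration assembled from the names listed above, not a captured payload:

```json
{
  "host": "api.example.com",
  "port": 443,
  "binary": "/usr/bin/curl",
  "ancestors": ["/bin/bash"],
  "cmdline": "curl https://api.example.com/v1/items?tag=a&tag=b",
  "request": {
    "method": "GET",
    "path": "/v1/items",
    "query_params": { "tag": ["a", "b"] }
  }
}
```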