Commit f362963

feat(sandbox): add provider entity to support configuring tools such as claude, outlook, etc (!23)
## Summary

- Add `Provider` entity for managing 3p deps from a sandbox
- Add provider CRUD API/server persistence and new CLI workflows (`nav provider create/get/list/update/delete`), including `--from-existing` laptop discovery.
- Integrate providers into sandbox create flow: infer from command (`-- claude`), support repeatable `--provider <type>`, prompt before auto-create, and allow manual in-sandbox setup.
- Add a dedicated `navigator-providers` crate with per-provider modules and mockable discovery test helpers.

## Key UX Changes

- `nav sandbox create --provider gitlab -- claude`
- Missing provider prompt now asks before creating from local state.
- `nav provider list --names` for scripting/cleanup.

## Test Plan

- `mise run cluster:deploy`
- `mise run test:e2e:sandbox`
- `mise run pre-commit`

Closes NVIDIA#19
Closes NVIDIA#22
Closes NVIDIA#11

1 parent 58beed8 commit f362963

55 files changed: 3244 additions, 292 deletions

.agent/skills/debug-navigator-cluster/SKILL.md

Lines changed: 11 additions & 11 deletions
@@ -20,7 +20,7 @@ Diagnose why a navigator cluster failed to start after `nav cluster admin deploy
 7. **Prepare local images** (if `NAVIGATOR_PUSH_IMAGES` is set): In `internal` registry mode, bootstrap waits for the in-cluster registry and pushes tagged images there. In `external` mode, bootstrap uses legacy `ctr -n k8s.io images import` push-mode behavior.
 8. Wait for cluster health checks to pass (up to 6 min):
    - k3s API server readiness (`/readyz`)
-   - `navigator` deployment available in `navigator` namespace
+   - `navigator` statefulset ready in `navigator` namespace
    - `navigator-gateway` Gateway programmed in `navigator` namespace
    - If TLS enabled: `navigator-cli-client` secret exists with cert data
 9. Extract mTLS credentials if TLS is enabled (up to 3 min)
@@ -129,19 +129,19 @@ If `/readyz` fails, k3s is still starting or has crashed. Check container logs (
 
 If pods are in `CrashLoopBackOff`, `ImagePullBackOff`, or `Pending`, investigate those pods specifically.
 
-### Step 4: Check Navigator Server Deployment
+### Step 4: Check Navigator Server StatefulSet
 
-The Navigator server is deployed via a HelmChart CR. Check its status:
+The Navigator server is deployed via a HelmChart CR as a StatefulSet with persistent storage. Check its status:
 
 ```bash
-# Deployment status
-docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n navigator get deployment/navigator -o wide'
+# StatefulSet status
+docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n navigator get statefulset/navigator -o wide'
 
 # Navigator pod logs
-docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n navigator logs deployment/navigator --tail=100'
+docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n navigator logs statefulset/navigator --tail=100'
 
-# Describe deployment for events
-docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n navigator describe deployment/navigator'
+# Describe statefulset for events
+docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n navigator describe statefulset/navigator'
 
 # Helm install job logs (the job that installs the Navigator chart)
 docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n kube-system logs -l job-name=helm-install-navigator --tail=200'
@@ -286,7 +286,7 @@ If DNS is broken, all image pulls from the distribution registry will fail, as w
 | Container exited, OOMKilled | Insufficient memory | Increase host memory or reduce workload |
 | Container exited, non-zero exit | k3s crash, port conflict, privilege issue | Check `docker logs` and `docker inspect` for details |
 | `/readyz` fails | k3s still starting or crashed | Wait longer or check container logs for k3s errors |
-| Navigator pods `Pending` | Insufficient CPU/memory for scheduling | Check `kubectl describe pod` for scheduling failures |
+| Navigator pods `Pending` | Insufficient CPU/memory for scheduling, or PVC not bound | Check `kubectl describe pod` for scheduling failures and `kubectl get pvc -n navigator` for volume status |
 | Navigator pods `CrashLoopBackOff` | Server application error | Check `kubectl logs` on the crashing pod |
 | Navigator pods `ImagePullBackOff` (push mode) | Images not imported or wrong containerd namespace | Check `k3s ctr -n k8s.io images ls` for component images (Step 6) |
 | Navigator pods `ImagePullBackOff` (pull mode) | Registry auth or DNS issue | Check `/etc/rancher/k3s/registries.yaml` credentials and DNS (Step 8) |
@@ -364,8 +364,8 @@ run docker exec "${CONTAINER}" sh -lc "${KCFG} kubectl get pods -A -o wide" 2>&1
 echo "=== Failing Pods ==="
 run docker exec "${CONTAINER}" sh -lc "${KCFG} kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded" 2>&1
 
-echo "=== Navigator Deployment ==="
-run docker exec "${CONTAINER}" sh -lc "${KCFG} kubectl -n navigator get deployment/navigator -o wide" 2>&1
+echo "=== Navigator StatefulSet ==="
+run docker exec "${CONTAINER}" sh -lc "${KCFG} kubectl -n navigator get statefulset/navigator -o wide" 2>&1
 
 echo "=== Navigator Gateway ==="
 run docker exec "${CONTAINER}" sh -lc "${KCFG} kubectl -n navigator get gateway/navigator-gateway" 2>&1

.gitlab-ci.yml

Lines changed: 52 additions & 27 deletions
@@ -140,28 +140,7 @@ build_ci_image:
 # =============================================================================
 # Lint Jobs
 # =============================================================================
-fmt_check:
-  extends: .rust_job_rules
-  stage: lint
-  rules:
-    - if: $CI_COMMIT_TAG
-      when: never
-    - when: on_success
-  script:
-    - mise run fmt:check
-
-clippy:
-  extends: .rust_job_rules
-  stage: lint
-  rules:
-    - if: $CI_COMMIT_TAG
-      when: never
-    - when: on_success
-  script:
-    - mise run clippy
-
-python_lint:
-  extends: .python_job_rules
+lint:
   stage: lint
   rules:
     - if: $CI_COMMIT_TAG
@@ -170,7 +149,7 @@ python_lint:
   before_script:
     - uv sync --frozen
   script:
-    - mise run python:lint
+    - mise run lint
 
 # =============================================================================
 # Test Jobs
@@ -183,7 +162,7 @@ rust_test:
       when: never
     - when: on_success
   script:
-    - mise run test:rust
+    - PATH="/root/.cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" /root/.cargo/bin/cargo test --workspace
 
 python_test:
   extends: .python_job_rules
@@ -195,7 +174,53 @@ python_test:
   before_script:
     - uv sync --frozen
   script:
-    - mise run test:python
+    - |
+      uv run python -m grpc_tools.protoc \
+        -Iproto \
+        --python_out=python/navigator/_proto \
+        --pyi_out=python/navigator/_proto \
+        --grpc_python_out=python/navigator/_proto \
+        proto/inference.proto \
+        proto/navigator.proto \
+        proto/datamodel.proto \
+        proto/sandbox.proto
+    - |
+      uv run python - <<'PY'
+      from pathlib import Path
+      import re
+
+      line_rewrites = {
+          "python/navigator/_proto/inference_pb2_grpc.py": [
+              (r"^import inference_pb2 as inference__pb2$", "from . import inference_pb2 as inference__pb2"),
+          ],
+          "python/navigator/_proto/navigator_pb2_grpc.py": [
+              (r"^import navigator_pb2 as navigator__pb2$", "from . import navigator_pb2 as navigator__pb2"),
+              (r"^import sandbox_pb2 as sandbox__pb2$", "from . import sandbox_pb2 as sandbox__pb2"),
+          ],
+          "python/navigator/_proto/navigator_pb2.py": [
+              (r"^import datamodel_pb2 as datamodel__pb2$", "from . import datamodel_pb2 as datamodel__pb2"),
+              (r"^import sandbox_pb2 as sandbox__pb2$", "from . import sandbox_pb2 as sandbox__pb2"),
+          ],
+          "python/navigator/_proto/datamodel_pb2.py": [
+              (r"^import sandbox_pb2 as sandbox__pb2$", "from . import sandbox_pb2 as sandbox__pb2"),
+          ],
+          "python/navigator/_proto/datamodel_pb2_grpc.py": [
+              (r"^import datamodel_pb2 as datamodel__pb2$", "from . import datamodel_pb2 as datamodel__pb2"),
+          ],
+          "python/navigator/_proto/sandbox_pb2_grpc.py": [
+              (r"^import sandbox_pb2 as sandbox__pb2$", "from . import sandbox_pb2 as sandbox__pb2"),
+          ],
+      }
+
+      for path, rules in line_rewrites.items():
+          file_path = Path(path)
+          text = file_path.read_text()
+          text = text.replace("from . from . import", "from . import")
+          for pattern, replacement in rules:
+              text = re.sub(pattern, replacement, text, flags=re.MULTILINE)
+          file_path.write_text(text)
+      PY
+    - uv run pytest python/
 
 python_e2e_sandbox_test:
   extends: .e2e_job_rules
@@ -222,8 +247,8 @@ python_e2e_sandbox_test:
   script:
     - socat UNIX-LISTEN:/var/run/docker.sock,fork,reuseaddr TCP:docker:2375 &
     - sleep 1
-    - mise run cluster
-    - mise run test:e2e:sandbox
+    - mise run --no-prepare cluster
+    - mise run --no-prepare test:e2e:sandbox
 
 # =============================================================================
 # Publish Jobs
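
The `python_test` job above rewrites absolute imports in the generated protobuf modules into package-relative ones. A minimal sketch of that rewrite on an in-memory string (the `rewrite_imports` helper and the sample input are illustrative, not part of the commit) shows why the anchored patterns make the step safe to run more than once:

```python
import re

# One of the rules from the CI script; the helper wrapping it is hypothetical.
RULES = [
    (r"^import sandbox_pb2 as sandbox__pb2$", "from . import sandbox_pb2 as sandbox__pb2"),
]


def rewrite_imports(text: str, rules: list[tuple[str, str]]) -> str:
    """Apply the same two-stage rewrite the CI script applies to each file."""
    # Collapse an accidental double prefix before applying the rules.
    text = text.replace("from . from . import", "from . import")
    for pattern, replacement in rules:
        # MULTILINE makes ^ and $ match at every line, so only whole
        # "import x_pb2 as x__pb2" lines are rewritten.
        text = re.sub(pattern, replacement, text, flags=re.MULTILINE)
    return text


generated = "import grpc\nimport sandbox_pb2 as sandbox__pb2\n"
once = rewrite_imports(generated, RULES)
twice = rewrite_imports(once, RULES)
```

Because each pattern is anchored at the start of the line, an already rewritten `from . import ...` line no longer matches, so a second pass leaves the text unchanged.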

Cargo.lock

Lines changed: 13 additions & 0 deletions
Generated file; diff not rendered by default.

architecture/providers.md

Lines changed: 150 additions & 0 deletions

# Providers

## Overview

Navigator uses a first-class `Provider` entity to represent external tool credentials and
configuration (for example `claude`, `gitlab`, `github`, `outlook`).

Providers exist as an abstraction layer for configuring tools that rely on third-party
access. Rather than each tool managing its own credentials and service configuration,
providers centralize that concern: a user configures a provider once, and any sandbox that
needs that external service can reference it.

At sandbox creation time, providers configure the sandbox environment with the
credentials and settings the tool needs. Access is then enforced through the sandbox
policy: the policy decides which outbound requests are allowed or denied based on
the providers attached to that sandbox.

Core goals:

- manage providers directly via CLI,
- discover provider data from the local machine automatically,
- require providers during sandbox creation,
- project provider context into sandbox runtime,
- drive sandbox policy to allow or deny outbound access to third-party services.

## Data Model

`Provider` is defined in `proto/datamodel.proto`:

- `id`: unique entity id
- `name`: user-managed name
- `type`: canonical provider slug (`claude`, `gitlab`, `github`, etc.)
- `credentials`: `map<string, string>` for secret values
- `config`: `map<string, string>` for non-secret settings

The gRPC surface is defined in `proto/navigator.proto`:

- `CreateProvider`
- `GetProvider`
- `ListProviders`
- `UpdateProvider`
- `DeleteProvider`

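
To make the field list above concrete, here is a rough Python mirror of the `Provider` message. It is only a sketch: the real type is generated from `proto/datamodel.proto`, and the example id, name, and map keys below are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """Illustrative stand-in for the Provider message, not the generated class."""

    id: str    # unique entity id
    name: str  # user-managed name
    type: str  # canonical provider slug, e.g. "claude" or "gitlab"
    credentials: dict[str, str] = field(default_factory=dict)  # secret values
    config: dict[str, str] = field(default_factory=dict)       # non-secret settings


# Every literal below is a hypothetical example value.
example = Provider(
    id="prov-123",
    name="work-gitlab",
    type="gitlab",
    credentials={"EXAMPLE_TOKEN": "<redacted>"},
    config={"host": "gitlab.example.com"},
)
```

Keeping secrets and settings in two separate maps is what lets callers print or log `config` while leaving `credentials` untouched.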
## Components

- `crates/navigator-providers`
  - canonical provider type normalization and command detection,
  - provider registry and per-provider discovery plugins,
  - shared discovery engine and context abstraction for testability.
- `crates/navigator-cli`
  - `nav provider ...` command handlers,
  - sandbox provider requirement resolution in `sandbox create`.
- `crates/navigator-server`
  - provider CRUD gRPC handlers,
  - persistence using `object_type = "provider"`.

## Provider Plugins

Each provider has its own module under `crates/navigator-providers/src/providers/`.

Current modules:

- `claude.rs`
- `codex.rs`
- `opencode.rs`
- `openclaw.rs`
- `gitlab.rs`
- `github.rs`
- `outlook.rs`

Each plugin defines:

- canonical `id()`,
- discovery spec (env vars + config paths),
- `discover_existing()` behavior.

The registry is assembled in `ProviderRegistry::new()` by registering each provider module.

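
The plugin and registry shape described above might look roughly like the following. This is a Python sketch of a Rust crate, so the class layout, the `get` and `known_types` methods, and every environment variable or path name are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProviderPlugin:
    """Hypothetical stand-in for one provider module."""

    provider_id: str                    # canonical id, e.g. "claude"
    env_vars: tuple[str, ...] = ()      # discovery spec: env vars to read
    config_paths: tuple[str, ...] = ()  # discovery spec: files to look for


class ProviderRegistry:
    def __init__(self) -> None:
        self._plugins: dict[str, ProviderPlugin] = {}
        # Register each provider module at construction time.
        # The env var and path names are placeholders, not the real ones.
        for plugin in (
            ProviderPlugin("claude", env_vars=("EXAMPLE_CLAUDE_KEY",)),
            ProviderPlugin("gitlab", env_vars=("EXAMPLE_GITLAB_TOKEN",),
                           config_paths=("~/.example/gitlab.yml",)),
            ProviderPlugin("github"),
        ):
            self.register(plugin)

    def register(self, plugin: ProviderPlugin) -> None:
        self._plugins[plugin.provider_id] = plugin

    def get(self, raw_type: str) -> Optional[ProviderPlugin]:
        # Normalize user input before lookup so " GitLab " finds "gitlab".
        return self._plugins.get(raw_type.strip().lower())

    def known_types(self) -> list[str]:
        return sorted(self._plugins)


registry = ProviderRegistry()
```

Keying the registry by canonical id means type normalization and plugin lookup share one code path.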
## Discovery Architecture

Discovery behavior is split into three layers:

1. provider module defines static spec (`ProviderDiscoverySpec`),
2. shared engine (`discover_with_spec`) performs env/file scanning,
3. runtime context (`DiscoveryContext`) supplies filesystem/environment reads.

`DiscoveryContext` has two implementations:

- `RealDiscoveryContext` for production runtime,
- `MockDiscoveryContext` test helper for deterministic tests.

This keeps provider tests isolated from the host environment and filesystem.

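
The three layers above can be sketched in Python as follows. The type names are borrowed from the text, but the method names (`env_var`, `path_exists`), the shape of the returned dictionary, and the `config_path` key are assumptions; the actual crate is written in Rust and may differ.

```python
import os
from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass(frozen=True)
class ProviderDiscoverySpec:
    """Layer 1: static description of where a provider's data may live."""
    env_vars: tuple[str, ...] = ()
    config_paths: tuple[str, ...] = ()


class DiscoveryContext(Protocol):
    """Layer 3: the only way the engine touches the host."""
    def env_var(self, name: str) -> Optional[str]: ...
    def path_exists(self, path: str) -> bool: ...


class RealDiscoveryContext:
    def env_var(self, name: str) -> Optional[str]:
        return os.environ.get(name)

    def path_exists(self, path: str) -> bool:
        return os.path.exists(os.path.expanduser(path))


@dataclass
class MockDiscoveryContext:
    env: dict[str, str] = field(default_factory=dict)
    paths: set[str] = field(default_factory=set)

    def env_var(self, name: str) -> Optional[str]:
        return self.env.get(name)

    def path_exists(self, path: str) -> bool:
        return path in self.paths


def discover_with_spec(
    spec: ProviderDiscoverySpec, ctx: DiscoveryContext
) -> Optional[dict[str, dict[str, str]]]:
    """Layer 2: shared engine that scans env vars and config paths."""
    credentials: dict[str, str] = {}
    for name in spec.env_vars:
        value = ctx.env_var(name)
        if value:
            credentials[name] = value
    config: dict[str, str] = {}
    for path in spec.config_paths:
        if ctx.path_exists(path):
            config["config_path"] = path  # keep the first match only
            break
    if not credentials and not config:
        return None  # nothing found on this machine
    return {"credentials": credentials, "config": config}


spec = ProviderDiscoverySpec(
    env_vars=("EXAMPLE_TOKEN",),              # placeholder names
    config_paths=("~/.example/config.json",),
)
found = discover_with_spec(spec, MockDiscoveryContext(env={"EXAMPLE_TOKEN": "abc"}))
```

Because the engine only sees the context interface, a test can describe the "machine" entirely through `MockDiscoveryContext` fields.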
## CLI Flows

### Provider CRUD

`nav provider create --type <type> --name <name> [--from-existing] [--credential k=v]... [--config k=v]...`

- `--from-existing` merges discovered laptop data into explicit CLI key-value args.
- Explicit `--credential` / `--config` values take precedence.

Also supported:

- `nav provider get <name>`
- `nav provider list`
- `nav provider update <name> ...`
- `nav provider delete <name> [<name>...]`

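
The `--from-existing` precedence rule can be illustrated with a small merge helper. This is a sketch under assumptions: the helper names and the error message are invented, and the real CLI parses its arguments in Rust.

```python
def parse_key_values(pairs: list[str]) -> dict[str, str]:
    """Parse repeated k=v arguments such as --credential TOKEN=abc."""
    result: dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        result[key] = value  # a value may itself contain "="
    return result


def merge_provider_data(discovered: dict[str, str], explicit: dict[str, str]) -> dict[str, str]:
    """Start from discovered data, then let explicit CLI values override it."""
    return {**discovered, **explicit}


discovered = {"TOKEN": "from-laptop", "HOST": "gitlab.example.com"}  # example data
explicit = parse_key_values(["TOKEN=from-cli"])
merged = merge_provider_data(discovered, explicit)
```

Applying the explicit map last is what makes a value typed on the command line win over anything found on the machine, while untouched discovered keys are kept.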
### Sandbox Create

`nav sandbox create --provider gitlab -- claude`

Resolution logic:

1. infer provider from command token after `--` (for example `claude`),
2. union with explicit `--provider <type>` flags,
3. ensure each required provider type exists,
4. if interactive and a provider is missing, prompt, then create it from existing local state,
5. set `NAVIGATOR_PROVIDER_TYPES` in sandbox spec environment.

Non-interactive mode fails with a clear missing-provider error.

> **Note:** Providers can also be configured from within the sandbox itself. This allows
> sandbox users to set up or update provider credentials and configuration at runtime,
> without requiring them to be fully resolved before sandbox creation.

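
The resolution steps above could be sketched as follows. Several details are assumptions rather than documented behavior: the comma-separated encoding of `NAVIGATOR_PROVIDER_TYPES`, the choice to continue when the user declines the prompt (leaving setup to be done inside the sandbox), and all function names.

```python
from typing import Callable, Iterable, Optional

# Slugs taken from the provider module list in this document.
KNOWN_TYPES = {"claude", "codex", "opencode", "openclaw", "gitlab", "github", "outlook"}


class MissingProviderError(Exception):
    """Raised when a required provider is absent and no prompt is possible."""


def infer_from_command(command: list[str]) -> Optional[str]:
    """Step 1: infer a provider from the first token after `--`."""
    if not command:
        return None
    token = command[0].rsplit("/", 1)[-1].lower()  # basename of the executable
    return token if token in KNOWN_TYPES else None


def resolve_provider_types(
    command: list[str],
    explicit_types: Iterable[str],
    existing_types: set[str],
    interactive: bool,
    confirm: Callable[[str], bool],
    create_from_existing: Callable[[str], None],
) -> list[str]:
    # Steps 1-2: inferred type plus explicit --provider flags,
    # de-duplicated while preserving order.
    inferred = infer_from_command(command)
    candidates = ([inferred] if inferred else []) + list(explicit_types)
    required = list(dict.fromkeys(candidates))

    # Steps 3-4: make sure each required provider type exists.
    for provider_type in required:
        if provider_type in existing_types:
            continue
        if not interactive:
            raise MissingProviderError(f"missing provider: {provider_type}")
        if confirm(provider_type):
            create_from_existing(provider_type)
        # Assumption: declining leaves the provider to be set up in the sandbox.
    return required


def sandbox_environment(required: list[str]) -> dict[str, str]:
    # Step 5. The comma-separated format is an assumption.
    return {"NAVIGATOR_PROVIDER_TYPES": ",".join(required)}


created: list[str] = []
types = resolve_provider_types(
    ["claude"], ["gitlab", "claude"], {"gitlab"}, True, lambda t: True, created.append
)
```

Passing `confirm` and `create_from_existing` as callables keeps the resolution logic testable without a terminal or a running server.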
## Persistence and Validation

Server enforces:

- `provider.type` must be non-empty,
- name uniqueness for providers,
- generated `id` on create,
- id preservation on update.

Providers are stored with `object_type = "provider"` in the shared object store.

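
The four server-side rules above can be illustrated with an in-memory stand-in for the object store. This is a sketch only: the `ProviderStore` class, the UUID id format, and the error types are assumptions, and the real server persists to a database.

```python
import uuid
from typing import Optional


class ProviderStore:
    """Hypothetical in-memory store that applies the listed validation rules."""

    def __init__(self) -> None:
        self._by_name: dict[str, dict] = {}

    def create(
        self,
        name: str,
        provider_type: str,
        credentials: Optional[dict[str, str]] = None,
        config: Optional[dict[str, str]] = None,
    ) -> dict:
        if not provider_type.strip():
            raise ValueError("provider.type must be non-empty")
        if name in self._by_name:
            raise ValueError(f"provider {name!r} already exists")
        record = {
            "object_type": "provider",
            "id": str(uuid.uuid4()),  # id is generated on create (format assumed)
            "name": name,
            "type": provider_type,
            "credentials": dict(credentials or {}),
            "config": dict(config or {}),
        }
        self._by_name[name] = record
        return record

    def update(self, name: str, **changes) -> dict:
        existing = self._by_name[name]  # raises KeyError if the provider is missing
        # Caller-supplied "id" or "name" values are ignored so the id is preserved.
        updated = {**existing, **changes, "id": existing["id"], "name": name}
        if not updated["type"].strip():
            raise ValueError("provider.type must be non-empty")
        self._by_name[name] = updated
        return updated


store = ProviderStore()
first = store.create("work-gitlab", "gitlab")
after_update = store.update("work-gitlab", id="attempted-override", config={"host": "example"})
```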
## Security Notes

- Provider credentials are stored in the `credentials` map and treated as sensitive.
- CLI output intentionally avoids printing credential values.
- CLI displays only non-sensitive summaries (counts/key names where relevant).

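
A summary that follows these notes might print credential key names and counts but never the values. The function below is a hypothetical sketch, not the CLI's actual output format.

```python
def summarize_provider(provider: dict) -> str:
    """Render a provider for display without exposing secret values."""
    credential_keys = sorted(provider.get("credentials", {}))
    config = provider.get("config", {})
    config_text = ", ".join(f"{key}={config[key]}" for key in sorted(config)) or "none"
    return (
        f"{provider['name']} ({provider['type']}): "
        f"{len(credential_keys)} credential(s) [{', '.join(credential_keys)}]; "
        f"config: {config_text}"
    )


summary = summarize_provider(
    {
        "name": "work-gitlab",  # example data only
        "type": "gitlab",
        "credentials": {"EXAMPLE_TOKEN": "s3cr3t"},
        "config": {"host": "gitlab.example.com"},
    }
)
```

Config values are shown in full because the data model treats them as non-secret; only the `credentials` map is reduced to key names.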
## Test Strategy

- Per-provider unit tests in each provider module.
- Shared normalization/command-detection tests in `crates/navigator-providers/src/lib.rs`.
- Mocked discovery context tests cover env and path-based behavior.
- CLI and server integration tests validate end-to-end RPC compatibility.

architecture/server-persistence.md

Lines changed: 11 additions & 3 deletions
@@ -8,13 +8,21 @@ database URL scheme.
 
 Supported backends:
 - Postgres (`postgres://` or `postgresql://`)
-- SQLite (`sqlite:`)
+- SQLite (`sqlite:`): file-backed (default) or in-memory
 
 The server requires a database URL. The CLI enforces `--db-url` / `NAVIGATOR_DB_URL`, and
 `run_server` will reject an empty value.
 
-Example in-memory SQLite URL:
-`sqlite::memory:?cache=shared`
+The default database URL is `sqlite:/var/navigator/navigator.db`, which stores data on a persistent
+volume. In-memory SQLite (`sqlite::memory:?cache=shared`) can be used for ephemeral environments
+but data will be lost on pod restart.
+
+## Deployment Storage
+
+The Navigator server runs as a **StatefulSet** with a `volumeClaimTemplate` that provisions a 1Gi
+`ReadWriteOnce` PersistentVolumeClaim mounted at `/var/navigator`. On k3s clusters this uses the
+built-in `local-path-provisioner` StorageClass. The SQLite database file is stored at
+`/var/navigator/navigator.db` and survives pod restarts and rescheduling.
 
 ## Data Model
 
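
The backend selection by URL scheme described in the `server-persistence.md` change above can be sketched as a small classifier. The function name and return labels are hypothetical; the server's actual parsing is not shown in this commit view.

```python
def classify_db_url(url: str) -> str:
    """Map a database URL to a backend label based on its scheme."""
    if not url.strip():
        # Mirrors the documented rule that an empty URL is rejected.
        raise ValueError("database URL must not be empty")
    if url.startswith(("postgres://", "postgresql://")):
        return "postgres"
    if url.startswith("sqlite:"):
        # In-memory databases lose their data when the process (pod) restarts.
        return "sqlite-memory" if ":memory:" in url else "sqlite-file"
    raise ValueError(f"unsupported database URL: {url!r}")


default_backend = classify_db_url("sqlite:/var/navigator/navigator.db")
ephemeral_backend = classify_db_url("sqlite::memory:?cache=shared")
```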
