Commit c094769

feat(server): add an inference router (!13)
Closes NVIDIA#3
1 parent cacadd7 commit c094769

39 files changed

Lines changed: 4295 additions & 190 deletions

Cargo.lock

Lines changed: 277 additions & 0 deletions

Cargo.toml

Lines changed: 4 additions & 0 deletions
@@ -58,6 +58,10 @@ nix = { version = "0.29", features = ["signal", "process", "user", "fs", "term"]
5858
serde = { version = "1", features = ["derive"] }
5959
serde_json = "1"
6060
serde_yaml = "0.9"
61+
toml = "0.8"
62+
63+
# HTTP client
64+
reqwest = { version = "0.12", default-features = false, features = ["json", "rustls-tls"] }
6165

6266
# Utilities
6367
futures = "0.3"
Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
1+
# Inference Router Plan
2+
3+
Status: In Progress
4+
Date: 2026-02-09
5+
6+
## Goals
7+
8+
- Use entity-managed inference configuration.
9+
- Use `routing_hint` as the user-facing signal.
10+
- Start with a simple and strict v1 model.
11+
- Use fail-closed sandbox authorization.
12+
13+
## Design principles
14+
15+
- `routing_hint` is advisory intent from userland, not an internal route ID.
16+
- Use a single entity in v1 (`InferenceRoute`) to keep the control plane simple.
17+
- v1 is intentionally small: 1:1 route-to-model mapping.
18+
- Add richer route modes only after route-only CRUD + tests are solid.
19+
20+
## Naming
21+
22+
- `Inference`: top-level namespace/service name.
23+
- `routing_hint`: request field used by callers.
24+
- `InferenceRoute`: maps hint -> routing behavior.
25+
- `InferencePolicy`: sandbox authorization settings.
26+
27+
## Entity model
28+
29+
### InferenceRoute
30+
31+
Represents how a `routing_hint` resolves.
32+
33+
Fields:
34+
- `id`
35+
- `spec`
36+
- `routing_hint`
37+
- `base_url`
38+
- `protocol` (`openai_chat_completions` initially)
39+
- `api_key` (plaintext for now)
40+
- `model_id`
41+
- `enabled`
42+
43+
Persistence:
44+
- Store as protobuf payloads in the existing `objects` table.
45+
- Route auth is plaintext in v1; future update will replace `api_key` with secret references.
46+
47+
## Request semantics
48+
49+
`CompletionRequest` remains centered on `routing_hint`.
50+
51+
- `routing_hint` is optional/advisory; server resolves an effective route.
52+
- `model_id` is resolved from route `spec.model_id` in v1.
53+
54+
v1 behavior:
55+
- Route contains full upstream target + model config.
56+
- Userland sends `routing_hint` + messages only.
57+
58+
v2 behavior:
59+
- Add optional passthrough model mode for downstream-router scenarios.
60+
61+
## Sandbox policy model
62+
63+
v1 control:
64+
- `allowed_routing_hints`
65+
66+
Planned extension:
67+
- optional `allowed_model_ids`
68+
- optional provider/capability dimensions if needed later
69+
70+
Enforcement order:
71+
1. Authenticate request identity.
72+
2. Resolve route from `routing_hint`.
73+
3. Apply sandbox policy checks.
74+
4. Call upstream.
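
The enforcement order above can be sketched in a few lines of Rust. This is only an illustration, not the code from this commit: `RouteSpec`, `RouteError`, and `authorize_route` are hypothetical names, the structs are simplified stand-ins for the protobuf entities, and treating an unknown or disabled hint as a strict error is an assumption (the plan lists that as an open decision).

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified stand-in for the `InferenceRoute` spec.
#[derive(Debug, Clone, PartialEq)]
pub struct RouteSpec {
    pub routing_hint: String,
    pub base_url: String,
    pub model_id: String,
    pub enabled: bool,
}

/// Hypothetical stand-in for `InferencePolicy`.
#[derive(Debug, Default)]
pub struct InferencePolicy {
    pub allowed_routing_hints: HashSet<String>,
}

#[derive(Debug, PartialEq)]
pub enum RouteError {
    Unauthenticated,
    UnknownHint(String),
    Denied(String),
}

/// Steps 1-3 of the enforcement order; step 4 (the upstream call) is left
/// to the caller once a route has been returned.
pub fn authorize_route<'a>(
    sandbox_id: Option<&str>,
    policy: Option<&InferencePolicy>,
    routes: &'a HashMap<String, RouteSpec>,
    routing_hint: &str,
) -> Result<&'a RouteSpec, RouteError> {
    // 1. Authenticate request identity (here: an identity must be present).
    if sandbox_id.is_none() {
        return Err(RouteError::Unauthenticated);
    }
    // 2. Resolve the route. Assumption: unknown or disabled hints are errors.
    let route = routes
        .get(routing_hint)
        .filter(|route| route.enabled)
        .ok_or_else(|| RouteError::UnknownHint(routing_hint.to_string()))?;
    // 3. Fail closed: a missing policy or an unlisted hint both deny.
    let allowed = policy.map_or(false, |p| p.allowed_routing_hints.contains(routing_hint));
    if !allowed {
        return Err(RouteError::Denied(routing_hint.to_string()));
    }
    Ok(route)
}
```

Returning a borrowed route keeps the check free of side effects, so the caller decides how to perform the upstream call.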
75+
76+
## API plan (entity management)
77+
78+
Routes:
79+
- `CreateInferenceRoute`
80+
- `UpdateInferenceRoute`
81+
- `DeleteInferenceRoute`
82+
- `ListInferenceRoutes`
83+
84+
CLI:
85+
- Use `nav inference create|update|delete|list` for route CRUD and inspection.
86+
87+
## Router behavior
88+
89+
- Resolve by `routing_hint`.
90+
- Load active route from entity store.
91+
- Support dynamic refresh without restart (watch or polling).
92+
- Perform protocol mapping to upstream API shape.
93+
94+
## Responsibility split
95+
96+
- Server responsibilities:
97+
- authenticate sandbox request
98+
- enforce `InferencePolicy` (`allowed_routing_hints`)
99+
- load enabled, policy-allowed route candidates from store
100+
- Router responsibilities:
101+
- select route from candidate set using request context (`routing_hint` today)
102+
- execute upstream inference call
103+
- remain the single place for future routing logic (fallbacks, scoring, policy-aware strategy)
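
As a rough illustration of this split, the server could narrow the store to enabled, policy-allowed candidates and hand only those to the router. The `Route` struct and both function names below are hypothetical and simplified; the real entities are protobuf messages.

```rust
use std::collections::HashSet;

/// Hypothetical, trimmed-down route record.
#[derive(Debug, Clone, PartialEq)]
pub struct Route {
    pub routing_hint: String,
    pub model_id: String,
    pub enabled: bool,
}

/// Server side: keep only enabled routes whose hint the sandbox policy allows.
pub fn load_candidates(store: &[Route], allowed_routing_hints: &HashSet<String>) -> Vec<Route> {
    store
        .iter()
        .filter(|r| r.enabled && allowed_routing_hints.contains(&r.routing_hint))
        .cloned()
        .collect()
}

/// Router side: pick a route from the candidates using request context
/// (only `routing_hint` today; fallbacks or scoring would live here later).
pub fn select_route<'a>(candidates: &'a [Route], routing_hint: &str) -> Option<&'a Route> {
    candidates.iter().find(|r| r.routing_hint == routing_hint)
}
```

Because the router only ever sees the filtered candidate set, a routing bug cannot select a route the sandbox policy forbids.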
104+
105+
## AuthZ and governance
106+
107+
Required decisions:
108+
- RBAC for inference entity CRUD.
109+
- Audit trail for route mutations.
110+
111+
## Implementation phases
112+
113+
### Phase 1
114+
- Define entities and protobufs.
115+
- Implement CRUD APIs for routes.
116+
- Implement router resolution from entities.
117+
- Use fixed route-to-model mapping only.
118+
- Enforce sandbox `allowed_routing_hints`.
119+
120+
### Phase 2
121+
- Add optional passthrough-model route mode.
122+
- Add optional per-sandbox `allowed_model_ids`.
123+
- Add richer provider/capability policy dimensions if needed.
124+
125+
## Implementation status
126+
127+
- Done: `InferenceRoute` entity with `spec` shape (`id + spec`).
128+
- Done: route-only gRPC CRUD (`CreateInferenceRoute`, `UpdateInferenceRoute`, `DeleteInferenceRoute`, `ListInferenceRoutes`).
129+
- Done: completion path resolves route from store by `routing_hint`.
130+
- Done: CLI route CRUD commands (`nav inference create|update|delete|list`).
131+
- Done: sandbox policy enforcement via `allowed_routing_hints` (unchanged).
132+
- Pending: expand integration tests for CRUD + completion + policy edges.
133+
134+
## Validation and testing
135+
136+
- Unit: entity validation, route resolution, policy evaluation.
137+
- Integration: CRUD, router refresh, completion path.
138+
- Security: API key redaction in logs/outputs.
139+
140+
## Open decisions
141+
142+
- Missing/unknown `routing_hint`: return a strict error or fall back to a default route?
143+
- Whether `routing_hint` must be globally unique or namespaced.
144+
- Principal identity source and RBAC model for non-sandbox clients.
Lines changed: 178 additions & 0 deletions
@@ -0,0 +1,178 @@
1+
# Sandbox Policy Refactor: Single YAML + Typed Proto + Baked Rules
2+
3+
**Status:** Implemented
4+
**Date:** 2026-02-11
5+
6+
## Goal
7+
8+
Consolidate sandbox policy into a single YAML file parsed by the CLI, transmitted as a fully-typed proto, and consumed by the sandbox with baked-in OPA rules. Eliminate the separate rego data file as a user-facing artifact.
9+
10+
## Design Summary
11+
12+
### Single YAML policy file
13+
14+
The user maintains one file (`dev-sandbox-policy.yaml`) containing everything:
15+
16+
```yaml
17+
version: 1
18+
inference:
19+
allowed_routing_hints:
20+
- local
21+
filesystem:
22+
include_workdir: true
23+
read_only: ["/usr", "/lib"]
24+
read_write: ["/sandbox", "/tmp"]
25+
landlock:
26+
compatibility: best_effort
27+
process:
28+
run_as_user: sandbox
29+
run_as_group: sandbox
30+
network_policies:
31+
claude_code:
32+
endpoints:
33+
- { host: api.anthropic.com, port: 443 }
34+
- { host: statsig.anthropic.com, port: 443 }
35+
- { host: sentry.io, port: 443 }
36+
- { host: raw.githubusercontent.com, port: 443 }
37+
- { host: platform.claude.com, port: 443 }
38+
binaries:
39+
- { path: /usr/local/bin/claude }
40+
gitlab:
41+
endpoints:
42+
- { host: gitlab.com, port: 443 }
43+
- { host: gitlab.mycorp.com, port: 443 }
44+
binaries:
45+
- { path: /usr/bin/glab }
46+
```
47+
48+
This file is **baked into the CLI** as the default (via `include_str!`). Users can override with `--sandbox-policy <path>` or `NAVIGATOR_SANDBOX_POLICY` env var.
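
A minimal sketch of how the policy source might be chosen. The names `PolicySource` and `resolve_policy_source` are hypothetical, the constant stands in for `include_str!("dev-sandbox-policy.yaml")` so the snippet compiles on its own, and giving the flag precedence over the environment variable is an assumption that the plan does not state.

```rust
use std::path::PathBuf;

/// Stand-in for the baked default; the real CLI would use `include_str!`.
const BAKED_POLICY_YAML: &str = "version: 1\n";

#[derive(Debug, PartialEq)]
pub enum PolicySource {
    /// Use the YAML compiled into the binary.
    Baked(&'static str),
    /// Read the YAML from a user-supplied path.
    File(PathBuf),
}

/// Treat empty strings the same as "not set".
fn non_empty(value: Option<&str>) -> Option<&str> {
    value.filter(|v| !v.is_empty())
}

/// Assumed precedence: `--sandbox-policy` flag, then the
/// `NAVIGATOR_SANDBOX_POLICY` value, then the baked default.
pub fn resolve_policy_source(flag: Option<&str>, env_value: Option<&str>) -> PolicySource {
    non_empty(flag)
        .or_else(|| non_empty(env_value))
        .map(|path| PolicySource::File(PathBuf::from(path)))
        .unwrap_or(PolicySource::Baked(BAKED_POLICY_YAML))
}
```

A caller would typically pass `std::env::var("NAVIGATOR_SANDBOX_POLICY").ok().as_deref()` as the second argument.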
49+
50+
### Proto (`sandbox.proto`)
51+
52+
Fully typed, reusing tags 2-5 (no backward-compat constraint):
53+
54+
```protobuf
55+
message SandboxPolicy {
56+
uint32 version = 1;
57+
FilesystemPolicy filesystem = 2;
58+
LandlockPolicy landlock = 3;
59+
ProcessPolicy process = 4;
60+
map<string, NetworkPolicyRule> network_policies = 5;
61+
InferencePolicy inference = 6;
62+
}
63+
```
64+
65+
New messages for network policies: `NetworkPolicyRule`, `NetworkEndpoint`, `NetworkBinary`.
66+
`LandlockPolicy.compatibility` changes from enum to string.
67+
Old `NetworkPolicy`/`NetworkMode`/`ProxyPolicy` removed from proto (sandbox-internal concern).
68+
69+
### Data flow
70+
71+
```
72+
YAML ──[CLI]──> Proto (typed) ──[server stores]──> Proto ──[sandbox fetches via gRPC]──> OPA engine
73+
```
74+
75+
1. **CLI**: Parses YAML, populates typed `SandboxPolicy` proto, sends to server at sandbox creation.
76+
2. **Server**: Stores proto as-is. Reads `inference` field directly for routing authorization. Returns full proto on `GetSandboxPolicy`.
77+
3. **Sandbox**: Fetches proto via gRPC. Converts typed proto fields to JSON, wraps under `{"sandbox": {...}}` key, feeds to `engine.add_data_json()`. Uses baked-in rego rules (via `include_str!`). Rego rules are unchanged — they still reference `data.sandbox.*`.
78+
79+
### Baked-in rego rules
80+
81+
The rego rules file (`dev-sandbox-policy.rego`) is baked into the **sandbox binary** via `include_str!`. The OPA engine is constructed from baked rules + JSON data derived from the proto. The `--rego-policy`/`--rego-data` CLI args on the sandbox binary are kept as dev-only overrides.
82+
83+
### TODO (future)
84+
85+
- Drop rego passthrough rules for filesystem/landlock/process — deserialize directly from proto with serde instead of querying OPA for static config.
86+
- Remove the `--rego-policy`/`--rego-data` sandbox args once the gRPC path is fully proven.
87+
- Delete `dev-sandbox-policy-data.rego` once all tests are migrated to use `from_proto` or inline rego data.
88+
89+
### Questions for review
90+
91+
1. **`dev-sandbox-policy-data.rego` kept for now** — it's still used by the existing `from_strings` OPA tests and the `--rego-policy`/`--rego-data` dev override path. Should we migrate the `from_strings` tests to `from_proto` and remove the file?
92+
2. **`NetworkMode`/`ProxyPolicy` still internal to sandbox** — the sandbox derives `NetworkMode::Proxy` when `network_policies` is non-empty in the proto. The proxy's bind address is still hardcoded/auto-detected. Is this the right default, or should there be an explicit way to set proxy config?
93+
3. **`name` field in `NetworkPolicyRule`** — the proto has both the map key and a `name` field inside the message. The CLI defaults `name` to the map key if not set. Should we remove the `name` field from the proto and just use the map key?
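
For question 3, the defaulting behaviour described for the CLI could look roughly like the following. `NetworkPolicyRule` here is a hypothetical one-field placeholder for the generated proto type, and `apply_default_names` is an illustrative name; the sketch relies on proto3 strings defaulting to the empty string when unset.

```rust
use std::collections::HashMap;

/// Hypothetical placeholder for the generated `NetworkPolicyRule` message.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct NetworkPolicyRule {
    pub name: String,
}

/// Fill in `name` from the map key wherever it was left unset (empty).
pub fn apply_default_names(rules: &mut HashMap<String, NetworkPolicyRule>) {
    for (key, rule) in rules.iter_mut() {
        if rule.name.is_empty() {
            rule.name = key.clone();
        }
    }
}
```

If the `name` field were dropped from the proto, this pass would disappear and consumers would read the map key directly.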
94+
95+
## Implementation Steps
96+
97+
### Step 1: Update `sandbox.proto`
98+
99+
- Rewrite `SandboxPolicy` with typed fields on tags 1-6
100+
- Add new messages: `NetworkPolicyRule`, `NetworkEndpoint`, `NetworkBinary`
101+
- Change `LandlockPolicy.compatibility` from enum to string
102+
- Remove old `NetworkPolicy`, `NetworkMode`, `ProxyPolicy`, `ProxyConfig` messages from proto
103+
- Keep `InferencePolicy`, `GetSandboxPolicyRequest`, `GetSandboxPolicyResponse` as-is
104+
- Remove the `LandlockCompatibility` enum (replaced by the string field)
105+
106+
### Step 2: Regenerate proto code
107+
108+
- Run `mise run build` (or `cargo build -p navigator-core`) to trigger `tonic_build` codegen
109+
- The generated `navigator.sandbox.v1.rs` will reflect the new proto shape
110+
111+
### Step 3: Update `dev-sandbox-policy.yaml`
112+
113+
- Expand to include all policy fields (filesystem, landlock, process, network_policies)
114+
- This becomes the single source of truth
115+
116+
### Step 4: Update CLI (`navigator-cli`)
117+
118+
- Bake `dev-sandbox-policy.yaml` via `include_str!` in `run.rs`
119+
- Rewrite `DevSandboxPolicyFile` struct and `load_dev_sandbox_policy()` to match new YAML shape
120+
- Convert parsed YAML → typed `SandboxPolicy` proto (using new proto messages)
121+
- Support `--sandbox-policy <path>` flag / `NAVIGATOR_SANDBOX_POLICY` env var to override
122+
- Update `print_sandbox_policy()` and `policy_to_yaml()` for the new proto shape
123+
- Remove old `DevFilesystemPolicy`, `DevNetworkPolicy`, `DevProxyPolicy`, `DevLandlockPolicy`, `DevProcessPolicy` structs (replace with new ones matching the flat YAML)
124+
125+
### Step 5: Update sandbox (`navigator-sandbox`)
126+
127+
**`policy.rs`:**
128+
- Update `SandboxPolicy` (internal) and `TryFrom<ProtoSandboxPolicy>` conversion
129+
- `NetworkMode` / `NetworkPolicy` / `ProxyPolicy` remain as internal types but are no longer derived from proto — instead, the sandbox sets `NetworkMode::Proxy` when `network_policies` is non-empty, `NetworkMode::Block` otherwise
130+
- Update `FilesystemPolicy`, `LandlockPolicy`, `ProcessPolicy` conversions for new proto shape
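
The mode derivation described for `policy.rs` can be sketched as below. Both types are simplified, hypothetical stand-ins: the real `NetworkMode` may have more variants, and the real rule type is the generated proto message.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the sandbox-internal `NetworkMode`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NetworkMode {
    /// No network access.
    Block,
    /// Traffic goes through the policy-enforcing proxy.
    Proxy,
}

/// Hypothetical placeholder for the proto's `NetworkPolicyRule`.
#[derive(Debug, Default, Clone)]
pub struct NetworkPolicyRule {
    pub name: String,
}

/// The mode is derived, not transmitted: any network policy implies proxying.
pub fn derive_network_mode(network_policies: &HashMap<String, NetworkPolicyRule>) -> NetworkMode {
    if network_policies.is_empty() {
        NetworkMode::Block
    } else {
        NetworkMode::Proxy
    }
}
```

Deriving the mode this way means a policy without `network_policies` blocks all network access by default.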
131+
132+
**`opa.rs`:**
133+
- Bake `dev-sandbox-policy.rego` via `include_str!` as `const BAKED_POLICY_RULES: &str`
134+
- Add a new constructor: `OpaEngine::from_policy_proto(proto: &ProtoSandboxPolicy) -> Result<Self>` that:
135+
1. Loads baked rules via `engine.add_policy()`
136+
2. Converts proto `network_policies` (and filesystem/landlock/process for rego passthrough compatibility) to JSON matching the `data.sandbox.*` shape the rego rules expect
137+
3. Loads the JSON via `engine.add_data_json()`
138+
- Keep `from_files()` and `from_strings()` for dev/testing
139+
140+
**`lib.rs` (`load_policy`):**
141+
- In gRPC mode: after fetching proto, construct `OpaEngine` from proto (using new constructor) instead of returning `None`
142+
- In rego file mode: keep as-is (dev override)
143+
144+
### Step 6: Update server (`navigator-server`)
145+
146+
**`inference.rs`:**
147+
- The `InferencePolicy` extraction path stays the same (it reads `sandbox.spec.policy.inference`)
148+
- No changes needed — the server is a passthrough for the policy proto
149+
150+
**`grpc.rs`:**
151+
- `GetSandboxPolicy` handler stays the same — returns stored proto
152+
153+
### Step 7: Fix compilation across crates
154+
155+
- `navigator-sandbox/src/process.rs` — update references to `policy.network.mode`
156+
- `navigator-sandbox/src/sandbox/linux/seccomp.rs` — same
157+
- `navigator-sandbox/src/proxy.rs` — update `ProxyPolicy` usage
158+
- `navigator-sandbox/src/ssh.rs` — update `SandboxPolicy` usage
159+
- Test files in `navigator-cli/tests/` and `navigator-server/tests/` — update mock `SandboxPolicy` construction
160+
- `navigator-core/src/proto/mod.rs` — update re-exports if needed
161+
162+
### Step 8: Delete obsolete files
163+
164+
- `dev-sandbox-policy-data.rego` — replaced by YAML → proto → JSON flow
165+
- The rego rules file (`dev-sandbox-policy.rego`) stays in the repo but is now baked into the sandbox binary
166+
167+
### Step 9: Tests
168+
169+
- Update existing OPA engine tests to work with proto-based constructor
170+
- Update CLI policy loading tests
171+
- Update server integration test mocks for new proto shape
172+
- Verify `mise run test:rust` passes
173+
174+
### Step 10: Pre-commit and build verification
175+
176+
- `mise run pre-commit`
177+
- `mise run build`
178+
- `mise run test`

build/python.toml

Lines changed: 17 additions & 0 deletions
@@ -38,3 +38,20 @@ run = "uv run ruff format {{vars.python_paths}}"
3838
["python:typecheck"]
3939
description = "Type check Python code with ty"
4040
run = "uv run ty check {{vars.python_paths}}"
41+
42+
["python:proto"]
43+
description = "Generate Python protobuf stubs from .proto files"
44+
env = { UV_NO_SYNC = "1" }
45+
run = """
46+
#!/usr/bin/env bash
47+
set -euo pipefail
48+
uv run python -m grpc_tools.protoc \
49+
-Iproto \
50+
--python_out=python/navigator/_proto \
51+
--pyi_out=python/navigator/_proto \
52+
--grpc_python_out=python/navigator/_proto \
53+
proto/inference.proto
54+
# Fix absolute imports in generated gRPC stubs to use relative imports
55+
perl -pi -e 's/^import inference_pb2/from . import inference_pb2/' \
56+
python/navigator/_proto/inference_pb2_grpc.py
57+
"""
