Commit d9c2bcc

johntmyerslinuxdevel
authored and committed
feat(settings): gateway-to-sandbox runtime settings channel (NVIDIA#474)
* feat(gateway/sandbox): add global and sandbox runtime settings flow
1 parent 0e13a5c commit d9c2bcc

49 files changed: +6060 -500 lines changed

.github/workflows/docker-build.yml

Lines changed: 3 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -126,6 +126,9 @@ jobs:
126126
env:
127127
DOCKER_BUILDER: openshell
128128
OPENSHELL_CARGO_VERSION: ${{ steps.version.outputs.cargo_version }}
129+
# Enable dev-settings feature for test settings (dummy_bool, dummy_int)
130+
# used by e2e tests.
131+
EXTRA_CARGO_FEATURES: openshell-core/dev-settings
129132
run: mise run --no-prepare docker:build:${{ inputs.component }}
130133

131134
- name: Build cluster image
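
The workflow change above enables a Cargo feature so that test-only settings exist in CI builds. How `openshell-core` consumes that feature is not shown in this diff; the following is a minimal sketch of one common way to gate test-only keys behind a feature flag. The function `registered_keys` and its contents are hypothetical; only the feature name `dev-settings` and the key names `log_level`, `dummy_bool`, and `dummy_int` come from the commit text.

```rust
/// Hypothetical registry of setting keys. The real registry lives in
/// openshell-core and is not shown in this diff.
fn registered_keys() -> Vec<&'static str> {
    // Keys that are assumed to be always available.
    let mut keys = vec!["log_level"];

    // `cfg!` evaluates to `true` only when the crate is compiled with
    // `--features dev-settings`, so the dummy keys never reach a
    // production build.
    if cfg!(feature = "dev-settings") {
        keys.push("dummy_bool");
        keys.push("dummy_int");
    }
    keys
}
```

Under this assumption, a build without `EXTRA_CARGO_FEATURES` would simply not know about `dummy_bool` or `dummy_int`, which is why the e2e image has to opt in.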

architecture/README.md

Lines changed: 6 additions & 3 deletions
@@ -235,17 +235,19 @@ Sandbox behavior is governed by policies written in YAML and evaluated by an emb
235235

236236
Inference routing to `inference.local` is configured separately at the cluster level and does not require network policy entries. The OPA engine evaluates only explicit network policies; `inference.local` connections bypass OPA entirely and are handled by the proxy's dedicated inference interception path.
237237

238-
Policies are not intended to be hand-edited by end users in normal operation. They are associated with sandboxes at creation time and fetched by the sandbox supervisor at startup via gRPC. For development and testing, policies can also be loaded from local files.
238+
Policies are not intended to be hand-edited by end users in normal operation. They are associated with sandboxes at creation time and fetched by the sandbox supervisor at startup via gRPC. For development and testing, policies can also be loaded from local files. A gateway-global policy can override all sandbox policies via `openshell policy set --global`.
239239

240-
For more detail, see [Policy Language](security-policy.md).
240+
In addition to policy, the gateway delivers runtime **settings** -- typed key-value pairs (e.g., `log_level`) that can be configured per-sandbox or globally. Settings and policy are delivered together through the `GetSandboxSettings` RPC and tracked by a single `config_revision` fingerprint. See [Gateway Settings Channel](gateway-settings.md) for details.
241+
242+
For more detail on the policy language, see [Policy Language](security-policy.md).
241243

242244
### Command-Line Interface
243245

244246
The CLI is the primary way users interact with the platform. It provides commands organized into four groups:
245247

246248
- **Gateway management** (`openshell gateway`): Deploy, stop, destroy, and inspect clusters. Supports both local and remote (SSH) targets.
247249
- **Sandbox management** (`openshell sandbox`): Create sandboxes (with optional file upload and provider auto-discovery), connect to sandboxes via SSH, and delete sandboxes.
248-
- **Top-level commands**: `openshell status` (cluster health), `openshell logs` (sandbox logs), `openshell forward` (port forwarding), `openshell policy` (sandbox policy management).
250+
- **Top-level commands**: `openshell status` (cluster health), `openshell logs` (sandbox logs), `openshell forward` (port forwarding), `openshell policy` (sandbox policy management), `openshell settings` (effective sandbox settings and global/sandbox key updates).
249251
- **Provider management** (`openshell provider`): Create, update, list, and delete external service credentials.
250252
- **Inference management** (`openshell cluster inference`): Configure cluster-level inference by specifying a provider and model. The gateway resolves endpoint and credential details from the named provider record.
251253

@@ -312,4 +314,5 @@ This opens an interactive SSH session into the sandbox, with all provider creden
312314
| [Policy Language](security-policy.md) | The YAML/Rego policy system that governs sandbox behavior. |
313315
| [Inference Routing](inference-routing.md) | Transparent interception and sandbox-local routing of AI inference API calls to configured backends. |
314316
| [System Architecture](system-architecture.md) | Top-level system architecture diagram with all deployable components and communication flows. |
317+
| [Gateway Settings Channel](gateway-settings.md) | Runtime settings channel: two-tier key-value configuration, global policy override, settings registry, CLI/TUI commands. |
315318
| [TUI](tui.md) | Terminal user interface for sandbox interaction. |

architecture/gateway-security.md

Lines changed: 1 addition & 1 deletion
@@ -229,7 +229,7 @@ These are used to build a `tonic::transport::ClientTlsConfig` with:
229229
- `identity()` -- presents the shared client certificate for mTLS.
230230

231231
The sandbox calls two RPCs over this authenticated channel:
232-
- `GetSandboxPolicy` -- fetches the YAML policy that governs the sandbox's behavior.
232+
- `GetSandboxSettings` -- fetches the YAML policy that governs the sandbox's behavior.
233233
- `GetSandboxProviderEnvironment` -- fetches provider credentials as environment variables.
234234

235235
## SSH Tunnel Authentication

architecture/gateway-settings.md

Lines changed: 561 additions & 0 deletions
Large diffs are not rendered by default.

architecture/gateway.md

Lines changed: 23 additions & 13 deletions
@@ -82,7 +82,7 @@ Proto definitions consumed by the gateway:
8282
| `proto/openshell.proto` | `openshell.v1` | `OpenShell` service, sandbox/provider/SSH/watch messages |
8383
| `proto/inference.proto` | `openshell.inference.v1` | `Inference` service: `SetClusterInference`, `GetClusterInference`, `GetInferenceBundle` |
8484
| `proto/datamodel.proto` | `openshell.datamodel.v1` | `Sandbox`, `SandboxSpec`, `SandboxStatus`, `Provider`, `SandboxPhase` |
85-
| `proto/sandbox.proto` | `openshell.sandbox.v1` | `SandboxPolicy`, `NetworkPolicyRule` |
85+
| `proto/sandbox.proto` | `openshell.sandbox.v1` | `SandboxPolicy`, `NetworkPolicyRule`, `SettingValue`, `EffectiveSetting`, `SettingScope`, `PolicySource`, `GetSandboxSettingsRequest/Response`, `GetGatewaySettingsRequest/Response` |
8686

8787
## Startup Sequence
8888

@@ -141,6 +141,9 @@ pub struct ServerState {
141141
pub sandbox_index: SandboxIndex,
142142
pub sandbox_watch_bus: SandboxWatchBus,
143143
pub tracing_log_bus: TracingLogBus,
144+
pub ssh_connections_by_token: Mutex<HashMap<String, u32>>,
145+
pub ssh_connections_by_sandbox: Mutex<HashMap<String, u32>>,
146+
pub settings_mutex: tokio::sync::Mutex<()>,
144147
}
145148
```
146149

@@ -149,6 +152,7 @@ pub struct ServerState {
149152
- **`sandbox_index`** -- in-memory bidirectional index mapping sandbox names and agent pod names to sandbox IDs. Used by the event tailer to correlate Kubernetes events.
150153
- **`sandbox_watch_bus`** -- `broadcast`-based notification bus keyed by sandbox ID. Producers call `notify(&id)` when the persisted sandbox record changes; consumers in `WatchSandbox` streams receive `()` signals and re-read the record.
151154
- **`tracing_log_bus`** -- captures `tracing` events that include a `sandbox_id` field and republishes them as `SandboxLogLine` messages. Maintains a per-sandbox tail buffer (default 200 entries). Also contains a nested `PlatformEventBus` for Kubernetes events.
155+
- **`settings_mutex`** -- serializes settings mutations (global and sandbox) to prevent read-modify-write races. Held for the duration of any setting set/delete or global policy set/delete operation. See [Gateway Settings Channel](gateway-settings.md#global-policy-lifecycle).
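
The read-modify-write race that `settings_mutex` prevents can be illustrated with a small standalone sketch. It is not the gateway's code: it uses `std::sync::Mutex` and threads instead of `tokio::sync::Mutex` and async tasks, and the `State` type, its `store` field, and the `increment` helper are hypothetical stand-ins for the persisted settings blob.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the gateway state; only the locking pattern matters.
struct State {
    /// Guards the whole read-modify-write cycle (assumption: std Mutex here,
    /// whereas the gateway struct uses tokio::sync::Mutex<()>).
    settings_mutex: Mutex<()>,
    /// Hypothetical persisted settings, loaded and saved as a whole map.
    store: Mutex<HashMap<String, i64>>,
}

impl State {
    fn new() -> Self {
        State {
            settings_mutex: Mutex::new(()),
            store: Mutex::new(HashMap::new()),
        }
    }

    /// Read the full settings map (a copy, like reading a stored record).
    fn load(&self) -> HashMap<String, i64> {
        self.store.lock().unwrap().clone()
    }

    /// Overwrite the full settings map (like writing the record back).
    fn save(&self, settings: HashMap<String, i64>) {
        *self.store.lock().unwrap() = settings;
    }

    /// Read-modify-write under the guard. Without `_guard`, two callers could
    /// both load the same map and one update would be lost on save.
    fn increment(&self, key: &str) {
        let _guard = self.settings_mutex.lock().unwrap();
        let mut settings = self.load();
        *settings.entry(key.to_string()).or_insert(0) += 1;
        self.save(settings);
    }
}

/// Runs `threads` workers that each apply `per_thread` updates to one key
/// and returns the final stored value.
fn run_concurrent_updates(threads: usize, per_thread: usize) -> i64 {
    let state = Arc::new(State::new());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let state = Arc::clone(&state);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    state.increment("dummy_int");
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    state.load().get("dummy_int").copied().unwrap_or(0)
}
```

Because every mutation holds the guard across load and save, `run_concurrent_updates(4, 100)` ends with all 400 updates applied. Removing the guard would make lost updates possible, since `load` and `save` lock the store separately.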
152156

153157
## Protocol Multiplexing
154158

@@ -225,13 +229,14 @@ Full CRUD for `Provider` objects, which store typed credentials (e.g., API keys
225229
| `UpdateProvider` | Updates an existing provider by name. Preserves the stored `id` and `name`; replaces `type`, `credentials`, and `config`. |
226230
| `DeleteProvider` | Deletes a provider by name. Returns `deleted: true/false`. |
227231

228-
#### Policy and Provider Environment Delivery
232+
#### Policy, Settings, and Provider Environment Delivery
229233

230-
These RPCs are called by sandbox pods at startup to bootstrap themselves.
234+
These RPCs are called by sandbox pods at startup and during runtime polling.
231235

232236
| RPC | Description |
233237
|-----|-------------|
234-
| `GetSandboxPolicy` | Returns the `SandboxPolicy` from a sandbox's spec, looked up by sandbox ID. |
238+
| `GetSandboxSettings` | Returns effective sandbox config looked up by sandbox ID: policy payload, policy metadata (version, hash, source, `global_policy_version`), merged effective settings, and a `config_revision` fingerprint for change detection. Two-tier resolution: registered keys start unset, sandbox values overlay, global values override. The reserved `policy` key in global settings can override the sandbox's own policy. When a global policy is active, `policy_source` is `GLOBAL` and `global_policy_version` carries the active revision number. See [Gateway Settings Channel](gateway-settings.md). |
239+
| `GetGatewaySettings` | Returns gateway-global settings only (excluding the reserved `policy` key). Returns registered keys with empty values when unconfigured, and a monotonic `settings_revision`. |
235240
| `GetSandboxProviderEnvironment` | Resolves provider credentials into environment variables for a sandbox. Iterates the sandbox's `spec.providers` list, fetches each `Provider`, and collects credential key-value pairs. First provider wins on duplicate keys. Skips credential keys that do not match `^[A-Za-z_][A-Za-z0-9_]*$`. |
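
The two-tier resolution described for `GetSandboxSettings` (registered keys start unset, sandbox values overlay, global values override) can be sketched as follows. This is a minimal illustration, not the gateway's implementation: the `Scope` and `Resolved` types and the `resolve` function are hypothetical, values are simplified to strings, and ignoring unregistered keys is an assumption.

```rust
use std::collections::{BTreeMap, HashMap};

/// Where the effective value came from (hypothetical, not the proto enum).
#[derive(Debug, Clone, PartialEq)]
enum Scope {
    Unset,
    Sandbox,
    Global,
}

/// One resolved setting: the value, if any, and the tier that supplied it.
#[derive(Debug, Clone, PartialEq)]
struct Resolved {
    value: Option<String>,
    scope: Scope,
}

/// Resolves effective settings for one sandbox.
/// Assumption: keys missing from `registered` are ignored.
fn resolve(
    registered: &[&str],
    sandbox: &HashMap<String, String>,
    global: &HashMap<String, String>,
) -> BTreeMap<String, Resolved> {
    let mut effective = BTreeMap::new();
    for key in registered {
        // Step 1: every registered key starts unset.
        let mut entry = Resolved { value: None, scope: Scope::Unset };
        // Step 2: a sandbox-level value overlays the unset default.
        if let Some(v) = sandbox.get(*key) {
            entry = Resolved { value: Some(v.clone()), scope: Scope::Sandbox };
        }
        // Step 3: a global value overrides whatever the sandbox set.
        if let Some(v) = global.get(*key) {
            entry = Resolved { value: Some(v.clone()), scope: Scope::Global };
        }
        effective.insert(key.to_string(), entry);
    }
    effective
}
```

With a sandbox value `log_level = "debug"` and a global value `log_level = "warn"`, the sketch resolves `log_level` to `"warn"` with scope `Global`, while a key set only on the sandbox keeps its sandbox value and a key set nowhere stays unset.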
236241

237242
#### Policy Recommendation (Network Rules)
@@ -242,9 +247,9 @@ These RPCs support the sandbox-initiated policy recommendation pipeline. The san
242247
|-----|-------------|
243248
| `SubmitPolicyAnalysis` | Receives pre-formed `PolicyChunk` proposals from a sandbox. Validates each chunk, persists via upsert on `(sandbox_id, host, port, binary)` dedup key, notifies watch bus. |
244249
| `GetDraftPolicy` | Returns all draft chunks for a sandbox with current draft version. |
245-
| `ApproveDraftChunk` | Approves a pending or rejected chunk. Merges the proposed rule into the active policy (appends binary to existing rule or inserts new rule). |
246-
| `RejectDraftChunk` | Rejects a pending chunk or revokes an approved chunk. If revoking, removes the binary from the active policy rule. |
247-
| `ApproveAllDraftChunks` | Bulk approves all pending chunks for a sandbox. |
250+
| `ApproveDraftChunk` | Approves a pending or rejected chunk. Merges the proposed rule into the active policy (appends binary to existing rule or inserts new rule). **Blocked when a global policy is active** -- returns `FailedPrecondition`. |
251+
| `RejectDraftChunk` | Rejects a pending chunk or revokes an approved chunk. If revoking, removes the binary from the active policy rule. Rejection of `pending` chunks is always allowed. **Revoking approved chunks is blocked when a global policy is active** -- returns `FailedPrecondition`. |
252+
| `ApproveAllDraftChunks` | Bulk approves all pending chunks for a sandbox. **Blocked when a global policy is active** -- returns `FailedPrecondition`. |
248253
| `EditDraftChunk` | Updates the proposed rule on a pending chunk. |
249254
| `GetDraftHistory` | Returns all chunks (including rejected) for audit trail. |
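
The global-policy gating rules in the table above reduce to two small checks. The sketch below is illustrative only: `ChunkStatus`, `GateError`, `check_approve`, and `check_reject` are hypothetical names, and `GateError::FailedPrecondition` merely mirrors the gRPC status the table mentions rather than using a real `tonic` type. How the gateway treats a reject call on an already rejected chunk is not stated in the source; the sketch lets it through.

```rust
/// Draft chunk states named in the RPC descriptions (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ChunkStatus {
    Pending,
    Approved,
    Rejected,
}

/// Stand-in for a gRPC FailedPrecondition status (not a tonic type).
#[derive(Debug, PartialEq)]
enum GateError {
    FailedPrecondition(&'static str),
}

/// Approval (single or bulk) merges rules into the sandbox policy, so it is
/// blocked whenever a global policy is active.
fn check_approve(global_policy_active: bool) -> Result<(), GateError> {
    if global_policy_active {
        Err(GateError::FailedPrecondition("global policy is active"))
    } else {
        Ok(())
    }
}

/// Rejecting a pending chunk is always allowed. Revoking an approved chunk
/// edits the active policy, so it is blocked under a global policy.
fn check_reject(status: ChunkStatus, global_policy_active: bool) -> Result<(), GateError> {
    match (status, global_policy_active) {
        (ChunkStatus::Approved, true) => {
            Err(GateError::FailedPrecondition("global policy is active"))
        }
        // Assumption: every other combination proceeds.
        _ => Ok(()),
    }
}
```

The asymmetry follows from which operations touch the active policy: a pending chunk has not been merged yet, so rejecting it changes nothing that the global policy overrides.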
250255

@@ -457,12 +462,16 @@ Objects are identified by `(object_type, id)` with a unique constraint on `(obje
457462

458463
### Object Types
459464

460-
| Object type string | Proto message | Traits implemented |
461-
|--------------------|---------------|-------------------|
462-
| `"sandbox"` | `Sandbox` | `ObjectType`, `ObjectId`, `ObjectName` |
463-
| `"provider"` | `Provider` | `ObjectType`, `ObjectId`, `ObjectName` |
464-
| `"ssh_session"` | `SshSession` | `ObjectType`, `ObjectId`, `ObjectName` |
465-
| `"inference_route"` | `InferenceRoute` | `ObjectType`, `ObjectId`, `ObjectName` |
465+
| Object type string | Proto message / format | Traits implemented | Notes |
466+
|--------------------|------------------------|-------------------|-------|
467+
| `"sandbox"` | `Sandbox` | `ObjectType`, `ObjectId`, `ObjectName` | |
468+
| `"provider"` | `Provider` | `ObjectType`, `ObjectId`, `ObjectName` | |
469+
| `"ssh_session"` | `SshSession` | `ObjectType`, `ObjectId`, `ObjectName` | |
470+
| `"inference_route"` | `InferenceRoute` | `ObjectType`, `ObjectId`, `ObjectName` | |
471+
| `"gateway_settings"` | JSON `StoredSettings` | Generic `put`/`get` | Singleton, id=`"global"`. Contains the reserved `policy` key for global policy delivery. |
472+
| `"sandbox_settings"` | JSON `StoredSettings` | Generic `put`/`get` | Per-sandbox, id=`"settings:{sandbox_uuid}"` |
473+
474+
The `sandbox_policies` table stores versioned policy revisions for both sandbox-scoped and global policies. Global revisions use the sentinel `sandbox_id = "__global__"`. See [Gateway Settings Channel](gateway-settings.md#storage-model) for schema details.
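
The identifier conventions in this section can be captured in a few helpers. This is a sketch under stated assumptions: the constant and function names are hypothetical, and only the literal strings `"global"`, `"settings:{sandbox_uuid}"`, and `"__global__"` come from the text above.

```rust
/// Object id of the singleton `gateway_settings` record.
const GATEWAY_SETTINGS_ID: &str = "global";

/// Sentinel `sandbox_id` used for global revisions in `sandbox_policies`.
const GLOBAL_POLICY_SANDBOX_ID: &str = "__global__";

/// Object id of the per-sandbox `sandbox_settings` record.
fn sandbox_settings_id(sandbox_uuid: &str) -> String {
    format!("settings:{sandbox_uuid}")
}

/// Picks the `sandbox_id` column value for a policy revision:
/// the sandbox's own id, or the sentinel for a global revision.
fn policy_revision_owner(sandbox_uuid: Option<&str>) -> &str {
    sandbox_uuid.unwrap_or(GLOBAL_POLICY_SANDBOX_ID)
}
```

The `settings:` prefix keeps the settings record id distinct from the sandbox's own object id, and the sentinel lets sandbox-scoped and global revisions share one table.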
466475

467476
### Generic Protobuf Codec
468477

@@ -559,6 +568,7 @@ Updated by the sandbox watcher on every Applied event and by gRPC handlers durin
559568
## Cross-References
560569

561570
- [Sandbox Architecture](sandbox.md) -- sandbox-side policy enforcement, proxy, and isolation details
571+
- [Gateway Settings Channel](gateway-settings.md) -- runtime settings channel, two-tier resolution, CLI/TUI commands
562572
- [Inference Routing](inference-routing.md) -- end-to-end inference interception flow, sandbox-side proxy logic, and route resolution
563573
- [Container Management](build-containers.md) -- how sandbox container images are built and configured
564574
- [Sandbox Connect](sandbox-connect.md) -- client-side SSH connection flow

architecture/sandbox-providers.md

Lines changed: 1 addition & 1 deletion
@@ -304,7 +304,7 @@ variables (injected into the pod spec by the gateway's Kubernetes sandbox creati
304304

305305
In `run_sandbox()` (`crates/openshell-sandbox/src/lib.rs`):
306306

307-
1. loads the sandbox policy via gRPC (`GetSandboxPolicy`),
307+
1. loads the sandbox policy via gRPC (`GetSandboxSettings`),
308308
2. fetches provider credentials via gRPC (`GetSandboxProviderEnvironment`),
309309
3. if the fetch fails, continues with an empty map (graceful degradation with a warning).
310310
