`InferenceProviderProfile` is the single source of truth for provider-specific inference knowledge: default endpoint, supported protocols, credential key lookup order, auth header style, and default headers.
Each profile also defines `credential_key_names` (e.g. `["OPENAI_API_KEY"]`) and `base_url_config_keys` (e.g. `["OPENAI_BASE_URL"]`) used by the gateway to resolve credentials and endpoint overrides from provider records. The Ollama profile uses `OLLAMA_API_KEY` for credentials and checks both `OLLAMA_BASE_URL` and `OLLAMA_HOST` for endpoint overrides. Its default endpoint uses `host.openshell.internal` so sandboxes can reach an Ollama instance running on the gateway host.
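The first-match lookup order described above can be sketched as a scan over the profile's key names. This is an illustrative sketch, not the actual implementation; the function name and the flat key/value record shape are assumptions:

```rust
use std::collections::HashMap;

/// Sketch: resolve a credential or endpoint override by trying each
/// configured key name in order and returning the first value present
/// in the provider record's key/value map.
fn resolve_first<'a>(
    key_names: &[&str],
    record: &'a HashMap<String, String>,
) -> Option<&'a String> {
    key_names.iter().find_map(|k| record.get(*k))
}
```

With the Ollama ordering from above, `OLLAMA_BASE_URL` would win over `OLLAMA_HOST` when both are set, because it appears first in the list.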
Unknown provider types return `None` from `profile_for()` and default to `Bearer` auth with no default headers via `auth_for_provider_type()`.
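A minimal sketch of that fallback shape. The enum, function signature, and the `anthropic` arm are assumptions pieced together from other sections of this document (the `x-api-key` stripping and `anthropic-version` default header mentioned under request rewriting), not the actual code:

```rust
/// Auth header styles a profile can declare.
#[derive(Debug, PartialEq)]
enum AuthStyle {
    Bearer,  // `Authorization: Bearer <key>`
    XApiKey, // `x-api-key: <key>`
}

/// Sketch: known provider types return their profile's style and default
/// headers; anything unknown falls back to Bearer with no default headers.
fn auth_for_provider_type(provider_type: &str) -> (AuthStyle, Vec<(String, String)>) {
    match provider_type {
        "anthropic" => (
            AuthStyle::XApiKey,
            vec![("anthropic-version".to_string(), "2023-06-01".to_string())],
        ),
        // Unknown provider types: Bearer auth, no default headers.
        _ => (AuthStyle::Bearer, Vec::new()),
    }
}
```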
5. Builds a managed route spec that stores only `provider_name` and `model_id`. The spec intentionally leaves `base_url`, `api_key`, and `protocols` empty -- these are resolved dynamically at bundle time from the provider record.
6. Upserts the route with name `inference.local`. Version starts at 1 and increments monotonically on each update.
`GetClusterInference` returns `provider_name`, `model_id`, `version`, and any configured `models` entries for the managed route. Returns `NOT_FOUND` if cluster inference is not configured.
### Multi-model routes
`upsert_multi_model_route()` configures multiple provider/model pairs on a single route, each identified by a short alias:
1. Validates that each `InferenceModelEntry` has non-empty `alias`, `provider_name`, and `model_id`.
2. Checks that aliases are unique (case-insensitive).
3. Verifies each provider exists and is inference-capable.
4. Optionally probes each endpoint (skipped with `--no-verify`).
5. Stores the full `models` vector in the route config. The first entry's provider/model are also written to the legacy single-model fields for backward compatibility.
At bundle time, each `InferenceModelEntry` is resolved into a separate `ResolvedRoute` whose `name` is set to the alias. The router's alias-first selection (see Route Selection) then matches the agent's `model` field against these names.
When `models` is non-empty in a set request, the gateway uses `upsert_multi_model_route()` and ignores the legacy `provider_name`/`model_id` fields. When `models` is empty, the legacy single-model path is used.
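Validation steps 1-2 above can be sketched as follows. The struct mirrors the `InferenceModelEntry` fields named in the text; the error messages are illustrative:

```rust
use std::collections::HashSet;

struct InferenceModelEntry {
    alias: String,
    provider_name: String,
    model_id: String,
}

/// Sketch of steps 1-2: non-empty fields, then case-insensitive
/// alias uniqueness.
fn validate_entries(entries: &[InferenceModelEntry]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for e in entries {
        if e.alias.is_empty() || e.provider_name.is_empty() || e.model_id.is_empty() {
            return Err("alias, provider_name, and model_id must be non-empty".to_string());
        }
        // `insert` returns false if the lowercased alias was already present.
        if !seen.insert(e.alias.to_lowercase()) {
            return Err(format!("duplicate alias (case-insensitive): {}", e.alias));
        }
    }
    Ok(())
}
```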
## Data Plane (Sandbox)
Files:
When a `CONNECT inference.local:443` arrives:
1. Proxy responds `200 Connection Established`.
2. `handle_inference_interception()` TLS-terminates the client connection using the sandbox CA (MITM).
3. Raw HTTP requests are parsed from the TLS tunnel using `try_parse_http_request()` (supports Content-Length and chunked transfer encoding).
4. Each parsed request is passed to `route_inference_request()`. Before routing, the proxy extracts a `model_hint` from the JSON request body's `model` field (if present). This hint is passed to the router for alias-based route selection.
5. The tunnel supports HTTP keep-alive: multiple requests can be processed sequentially.
6. Buffer starts at 64 KiB (`INITIAL_INFERENCE_BUF`) and grows up to 10 MiB (`MAX_INFERENCE_BUF`). Requests exceeding the max get `413 Payload Too Large`.
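The best-effort model-hint extraction in step 4 might look like the sketch below. The real proxy would use a proper JSON parser; this naive scan (and the function name `extract_model_hint`) is only meant to illustrate the "extract if present, otherwise `None`" shape:

```rust
/// Sketch: pull the `"model"` string value out of a JSON request body.
/// Any failure (no key, non-string value, malformed body) yields None,
/// which lets routing fall back to protocol-based selection.
fn extract_model_hint(body: &str) -> Option<String> {
    let key_pos = body.find("\"model\"")?;
    let rest = &body[key_pos + "\"model\"".len()..];
    let colon = rest.find(':')?;
    let after = rest[colon + 1..].trim_start();
    let mut chars = after.chars();
    if chars.next()? != '"' {
        return None; // value is not a JSON string
    }
    let value: String = chars.take_while(|c| *c != '"').collect();
    if value.is_empty() { None } else { Some(value) }
}
```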
Query strings are stripped before matching. Path matching is exact for most patterns; `/v1/models/*` and `/v1/codex/*` match any sub-path (e.g. `/v1/models/gpt-4.1`, `/v1/codex/responses`). Absolute-form URIs (e.g. `https://inference.local/v1/chat/completions`) are normalized to path-only form by `normalize_inference_path()` before detection.
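The normalization and matching rules can be sketched as pure functions over the request target. The pattern list below is a small illustrative subset, not the full allowlist:

```rust
/// Sketch: absolute-form URIs are reduced to path-only form, then the
/// query string is stripped before pattern matching.
fn normalize_inference_path(target: &str) -> String {
    let path = if let Some(scheme_end) = target.find("://") {
        let rest = &target[scheme_end + 3..];
        match rest.find('/') {
            Some(i) => &rest[i..],
            None => "/",
        }
    } else {
        target
    };
    // Strip the query string before matching.
    path.split('?').next().unwrap_or(path).to_string()
}

/// Sketch: exact match for most patterns, prefix match for the
/// `/v1/models/*` and `/v1/codex/*` families.
fn path_allowed(target: &str) -> bool {
    let path = normalize_inference_path(target);
    path == "/v1/chat/completions"
        || path.starts_with("/v1/models/")
        || path.starts_with("/v1/codex/")
}
```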
Ollama patterns use `/api/` paths (no `/v1/` prefix), matching Ollama's native API. This allows agents to use the Ollama client library directly against `inference.local`.
If no pattern matches, the proxy returns `403 Forbidden` with `{"error": "connection not allowed by policy"}`.
### Route selection
`select_route()` picks the best route from the candidate list using a two-phase strategy:
1. **Alias match (preferred)**: If a `model_hint` is provided (extracted from the request body's `model` field), select the first candidate whose `name` equals the hint AND whose `protocols` list contains the detected source protocol.
2. **Protocol fallback**: If no alias matches, fall back to the first candidate whose `protocols` list contains the source protocol.
This enables multi-route configurations where the agent selects a backend by setting the `model` field to an alias name (e.g. `"model": "my-gpt"` routes to the aliased provider). If the model field is absent, not a known alias, or parsing fails, routing falls back to protocol-based selection.
If no route matches either phase, returns `RouterError::NoCompatibleRoute`.
`proxy_with_candidates()` and `proxy_with_candidates_streaming()` both accept an optional `model_hint: Option<&str>` parameter, passed through from the sandbox proxy.
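The two-phase selection can be sketched as follows (the `Candidate` struct and its fields are assumptions standing in for the real route type):

```rust
struct Candidate {
    name: String,           // route name; set to the alias for multi-model routes
    protocols: Vec<String>, // protocols this route can serve
}

/// Sketch of the two-phase strategy: alias match first, then
/// protocol-only fallback, else no compatible route.
fn select_route<'a>(
    candidates: &'a [Candidate],
    source_protocol: &str,
    model_hint: Option<&str>,
) -> Option<&'a Candidate> {
    let proto = source_protocol.to_lowercase();
    let has_proto = |c: &Candidate| c.protocols.iter().any(|p| p.to_lowercase() == proto);
    // Phase 1: alias match that also speaks the detected source protocol.
    if let Some(hint) = model_hint {
        if let Some(c) = candidates.iter().find(|&c| c.name == hint && has_proto(c)) {
            return Some(c);
        }
    }
    // Phase 2: first candidate compatible with the source protocol.
    candidates.iter().find(|&c| has_proto(c))
}
```

Returning `None` here corresponds to `RouterError::NoCompatibleRoute` in the description above.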
### Request rewriting
2. **Header stripping**: Removes `authorization`, `x-api-key`, `host`, and any header names that will be set from route defaults.
3. **Default headers**: Applies route-level default headers (e.g. `anthropic-version: 2023-06-01`) unless the client already sent them.
4. **Model rewrite**: Parses the request body as JSON and replaces the `model` field with the route's configured model. Non-JSON bodies are forwarded unchanged.
5. **URL construction**: `build_backend_url()` appends the request path to the route endpoint. If the request path is exactly `/v1` or starts with `/v1/`, the `/v1` prefix is always stripped before appending. This handles both `/v1`-suffixed endpoints (e.g. `api.openai.com/v1`) and non-versioned endpoints (e.g. `chatgpt.com/backend-api` for Codex) uniformly.
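The `/v1`-stripping rule in step 5 can be sketched as a simple string function (a simplified sketch assuming plain string endpoints, not the actual implementation):

```rust
/// Sketch: strip a leading `/v1` (exact or prefix) from the request path,
/// then append it to the route endpoint.
fn build_backend_url(endpoint: &str, request_path: &str) -> String {
    let path = if request_path == "/v1" {
        ""
    } else if request_path.starts_with("/v1/") {
        // Keep the leading slash on the remainder: "/v1/chat/..." -> "/chat/...".
        &request_path["/v1".len()..]
    } else {
        request_path
    };
    format!("{}{}", endpoint.trim_end_matches('/'), path)
}
```

So a `/v1`-suffixed endpoint keeps exactly one `/v1`, and a non-versioned endpoint never gains one.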
### Header sanitization
The system route is stored as a separate `InferenceRoute` record in the gateway.
- `openshell inference set --system --provider <name> --model <id>` -- configures system inference
- `openshell inference get` -- displays both user and system inference configuration
- `openshell inference get --system` -- displays only the system inference configuration
The `--provider` flag references a provider record name (not a provider type). The provider must already exist in the cluster and have a supported inference type (`openai`, `anthropic`, `nvidia`, or `ollama`).
`--model-alias` can be repeated to configure multiple providers simultaneously. It conflicts with `--provider` and `--model` -- the two modes are mutually exclusive. Example:
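A hypothetical invocation is shown below. The repeated `--model-alias` flag is from this document, but the `alias=provider/model` value syntax is an assumption for illustration only, not confirmed by the source:

```sh
# Hypothetical: the alias=provider/model value format is assumed.
openshell inference set \
  --model-alias my-gpt=openai-prod/gpt-4.1 \
  --model-alias local=ollama-local/llama3
```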