From c768e8de965fba91d2dcec86ff8687103f23338f Mon Sep 17 00:00:00 2001 From: Hanmiao Li <894876246@qq.com> Date: Tue, 2 Jun 2026 14:58:31 +0800 Subject: [PATCH 1/2] docs: provider fallback chain design (#2574) --- docs/design/provider-fallback-chain.md | 134 +++++++++++++++++++++++++ 1 file changed, 134 insertions(+) create mode 100644 docs/design/provider-fallback-chain.md diff --git a/docs/design/provider-fallback-chain.md b/docs/design/provider-fallback-chain.md new file mode 100644 index 000000000..786768a25 --- /dev/null +++ b/docs/design/provider-fallback-chain.md @@ -0,0 +1,134 @@ +# Provider Fallback Chain — Design Document (#2574) + +## Summary + +Add an automatic provider fallback chain so that when the active provider +returns a non-recoverable error (429, selected 5xx, connection timeout), +CodeWhale switches to the next configured provider without interrupting +the user's workflow. + +## Motivation + +Currently, users must manually run `/provider` to switch when their +primary provider fails. This is especially disruptive during long-running +agentic tasks. A fallback chain keeps the agent working without user +intervention. + +## Design + +### Configuration + +```toml +[providers] +active = "nvidia-nim" +fallback = ["deepseek", "openrouter"] + +[providers.nvidia-nim] +api_key = "nvapi-..." +base_url = "https://integrate.api.nvidia.com/v1" +model = "meta/llama-4" + +[providers.deepseek] +api_key = "$DEEPSEEK_API_KEY" +model = "deepseek-v4-pro" + +[providers.openrouter] +api_key = "$OPENROUTER_API_KEY" +model = "deepseek/deepseek-v4-0324" +``` + +- `fallback` — ordered list of provider names to try +- `active` — the primary provider (existing `provider` key, renamed for clarity) + +### Fallback triggers + +| Error | Fallback? | Rationale | +|---|---|---| +| 429 (rate limit) | ✅ | Quota exhausted — swap key/provider | +| 502 / 503 / 504 | ✅ | Provider infrastructure issue | +| Connection timeout / DNS failure | ✅ | Network path broken | +| 401 / 403 | ❌ | Auth issue — no other provider will help | +| 400 (bad request) | ❌ | Client error — not provider-specific | +| Stream interrupted mid-content | ❌ | Already consumed partial response | + +### Sequence + +``` +1. Try primary provider (nvidia-nim) +2. On fallback-eligible error → wait 1s → try fallback[0] (deepseek) +3. On fallback-eligible error → wait 1s → try fallback[1] (openrouter) +4. All exhausted → surface clear error to user +``` + +### Transcript / UI + +- Status toast: `DeepSeek unavailable — switched to OpenRouter` +- Transcript marker: `[provider: nvidia-nim → deepseek]` +- `/provider` command shows current chain position: `deepseek (fallback #1)` +- Original (`active`) provider is remembered so user can `/provider reset` to go back + +### Capability awareness + +Before switching, the engine checks that the fallback provider supports +the current turn's needs: + +| Capability | Check | +|---|---| +| Tools / function calling | Fallback provider must support tools | +| Reasoning effort | Must support same reasoning levels | +| Context length | Model must have ≥ current turn's token count | +| Vision | Must support image inputs if turn has images | + +If no fallback provider meets capabilities, the error is surfaced directly. + +### Retry integration + +Existing `[retry]` settings apply per-provider **before** fallback triggers. +A provider gets `max_retries` attempts with `retry_delay` between them. +Only after retry exhaustion does fallback move to the next provider. + +### Config schema validation + +On startup, validate: +- Each `fallback` entry is a known provider +- No duplicate providers in chain +- Fallback providers have valid `api_key` (or env var set) +- Warn if fallback model has different capability profile + +### Implementation plan (phased) + +#### Phase 1 (draft PR #1): Config schema + validation + +- Add `fallback` field to `ProvidersConfig` +- Startup validation of fallback chain +- Unit tests for config parsing + +#### Phase 2 (draft PR #2): Engine fallback logic + +- `try_with_fallback()` in `client.rs` +- Error classification (is fallback-eligible?) +- Saves original provider, pushes fallback status event +- Integration test with mock HTTP + +#### Phase 3 (draft PR #3): UI feedback + +- Status toast on switch +- Transcript marker +- `/provider` shows fallback position +- `/provider reset` command + +### Rejected alternatives + +- **Per-request model routing**: Too fine-grained; turns have state (system prompt, tools) that + shouldn't change mid-turn +- **Weighted random selection**: Unpredictable billing; users need deterministic behavior +- **Sub-agent-level fallback**: Complicates sub-agent lifecycle for marginal gain + +### Open questions + +1. Should fallback persist across sessions or reset each launch? + → **Reset each launch** (avoids silently staying on fallback forever) +2. Should `/compact` reset to primary provider? + → **No** — compaction changes context, not provider +3. Tool call mid-turn: if tool call succeeds but next API call fails, do we fallback? + → **Yes**, same turn can span providers as long as capabilities match From 912aa7fbed145c6e2e842d266f8c9c4fea4928c4 Mon Sep 17 00:00:00 2001 From: Hanmiao Li <894876246@qq.com> Date: Tue, 2 Jun 2026 15:04:40 +0800 Subject: [PATCH 2/2] docs: provider fallback chain design, 3-phase plan (#2574) --- docs/design/provider-fallback-chain.md | 134 ------------------------- 1 file changed, 134 deletions(-) delete mode 100644 docs/design/provider-fallback-chain.md diff --git a/docs/design/provider-fallback-chain.md b/docs/design/provider-fallback-chain.md deleted file mode 100644 index 786768a25..000000000 --- a/docs/design/provider-fallback-chain.md +++ /dev/null @@ -1,134 +0,0 @@ -# Provider Fallback Chain — Design Document (#2574) - -## Summary - -Add an automatic provider fallback chain so that when the active provider -returns a non-recoverable error (429, selected 5xx, connection timeout), -CodeWhale switches to the next configured provider without interrupting -the user's workflow. - -## Motivation - -Currently, users must manually run `/provider` to switch when their -primary provider fails. This is especially disruptive during long-running -agentic tasks. A fallback chain keeps the agent working without user -intervention. - -## Design - -### Configuration - -```toml -[providers] -active = "nvidia-nim" -fallback = ["deepseek", "openrouter"] - -[providers.nvidia-nim] -api_key = "nvapi-..." -base_url = "https://integrate.api.nvidia.com/v1" -model = "meta/llama-4" - -[providers.deepseek] -api_key = "$DEEPSEEK_API_KEY" -model = "deepseek-v4-pro" - -[providers.openrouter] -api_key = "$OPENROUTER_API_KEY" -model = "deepseek/deepseek-v4-0324" -``` - -- `fallback` — ordered list of provider names to try -- `active` — the primary provider (existing `provider` key, renamed for clarity) - -### Fallback triggers - -| Error | Fallback? | Rationale | -|---|---|---| -| 429 (rate limit) | ✅ | Quota exhausted — swap key/provider | -| 502 / 503 / 504 | ✅ | Provider infrastructure issue | -| Connection timeout / DNS failure | ✅ | Network path broken | -| 401 / 403 | ❌ | Auth issue — no other provider will help | -| 400 (bad request) | ❌ | Client error — not provider-specific | -| Stream interrupted mid-content | ❌ | Already consumed partial response | - -### Sequence - -``` -1. Try primary provider (nvidia-nim) -2. On fallback-eligible error → wait 1s → try fallback[0] (deepseek) -3. On fallback-eligible error → wait 1s → try fallback[1] (openrouter) -4. All exhausted → surface clear error to user -``` - -### Transcript / UI - -- Status toast: `DeepSeek unavailable — switched to OpenRouter` -- Transcript marker: `[provider: nvidia-nim → deepseek]` -- `/provider` command shows current chain position: `deepseek (fallback #1)` -- Original (`active`) provider is remembered so user can `/provider reset` to go back - -### Capability awareness - -Before switching, the engine checks that the fallback provider supports -the current turn's needs: - -| Capability | Check | -|---|---| -| Tools / function calling | Fallback provider must support tools | -| Reasoning effort | Must support same reasoning levels | -| Context length | Model must have ≥ current turn's token count | -| Vision | Must support image inputs if turn has images | - -If no fallback provider meets capabilities, the error is surfaced directly. - -### Retry integration - -Existing `[retry]` settings apply per-provider **before** fallback triggers. -A provider gets `max_retries` attempts with `retry_delay` between them. -Only after retry exhaustion does fallback move to the next provider. - -### Config schema validation - -On startup, validate: -- Each `fallback` entry is a known provider -- No duplicate providers in chain -- Fallback providers have valid `api_key` (or env var set) -- Warn if fallback model has different capability profile - -### Implementation plan (phased) - -#### Phase 1 (draft PR #1): Config schema + validation - -- Add `fallback` field to `ProvidersConfig` -- Startup validation of fallback chain -- Unit tests for config parsing - -#### Phase 2 (draft PR #2): Engine fallback logic - -- `try_with_fallback()` in `client.rs` -- Error classification (is fallback-eligible?) -- Saves original provider, pushes fallback status event -- Integration test with mock HTTP - -#### Phase 3 (draft PR #3): UI feedback - -- Status toast on switch -- Transcript marker -- `/provider` shows fallback position -- `/provider reset` command - -### Rejected alternatives - -- **Per-request model routing**: Too fine-grained; turns have state (system prompt, tools) that - shouldn't change mid-turn -- **Weighted random selection**: Unpredictable billing; users need deterministic behavior -- **Sub-agent-level fallback**: Complicates sub-agent lifecycle for marginal gain - -### Open questions - -1. Should fallback persist across sessions or reset each launch? - → **Reset each launch** (avoids silently staying on fallback forever) -2. Should `/compact` reset to primary provider? - → **No** — compaction changes context, not provider -3. Tool call mid-turn: if tool call succeeds but next API call fails, do we fallback? - → **Yes**, same turn can span providers as long as capabilities match