fix: use per candidate provider for model_fallbacks#2143
Conversation
73762da to
5635190
Compare
yinwm
left a comment
There was a problem hiding this comment.
Thanks for the fix! The root cause analysis is spot-on and the pre-creation approach is clean. I prefer this over #1637 (see my reasoning there).
Two blocking issues before we can merge:
1. Rebase needed — conflicts with merged #2038
This PR branches from before #2038 was merged. In the current main, the fallback closure uses activeProvider (which may be LightProvider when routing selects the light tier):
// current main (after #2038)
return activeProvider.Chat(ctx, messagesForCall, toolDefsForCall, model, llmOpts)But your change defaults to agent.Provider:
p := agent.Provider // should be activeProviderPlease rebase onto latest main and ensure the default fallback respects light model routing. The fix should be:
p := activeProvider
if cp, ok := agent.CandidateProviders[providers.ModelKey(provider, model)]; ok {
p = cp
}
return p.Chat(...)2. Model matching in registerCandidateProviders is fragile
The direct string comparison fullModel == cfg.ModelList[i].Model assumes model_list entries always use "provider/model" format. Consider reusing resolvedModelConfig() from model_resolution.go instead, which already handles alias resolution and model config lookup.
Non-blocking suggestions:
- Replace
log.Printfwithlogger.WarnCFfor consistency with the rest of the codebase - Add unit tests for cross-provider fallback resolution (this is important for a core routing fix)
Once these are addressed, I'm happy to approve.
|
Thanks for the feedback. |
6bf2ce7 to
e766036
Compare
|
|
@yinwm Would you give feedback so I can improve as per your view? Thank you! |
f318ea1 to
3e8b1cf
Compare
|
Squashed commit history, so I made the commits into one meaningful commit. |
|
@yinwm Another update here.
|
Each fallback model now uses its own api_base and api_key from model_list instead of inheriting the primary model's provider config. Previously, a single LLMProvider was created from the primary model's ModelConfig and reused for all fallback candidates — only the model ID string was swapped. This caused all fallback requests to be routed to the primary provider's endpoint, making cross-provider fallback chains non-functional (e.g., OpenRouter primary with Gemini fallback would send the Gemini request to OpenRouter's API). Fix: pre-create a per-candidate LLMProvider at agent initialization time by looking up each candidate's ModelConfig from model_list. The fallback run closure now selects the correct provider per candidate via CandidateProviders map, falling back to agent.Provider when no override is found. Fixes sipeed#2140 Made-with: Cursor test: add test for instance.go fix: fix test refactor: optimize fix: fix Golang lint issues chore: comment cleanup
9362c6e to
15181e5
Compare
yinwm
left a comment
There was a problem hiding this comment.
Great work! Both blocking issues from the first review have been properly addressed. The rebase onto latest main is clean, activeProvider is correctly used as the default, and resolvedModelConfig() is now the canonical resolution path. The test coverage is thorough — 352 lines covering the exact #2140 scenario, edge cases, and the graceful fallback to activeProvider for unregistered candidates.
One non-blocking suggestion for a follow-up: populateCandidateProvidersFromNames uses resolvedModelConfig() (which only matches by model_name), but buildModelListResolver (used by resolveModelCandidates) has an additional fallback path that also matches by Model and modelID. This means if a fallback is referenced by model ID instead of alias, the candidate will be created in the fallback chain but won't have a corresponding CandidateProviders entry. Consider aligning these two resolution paths in a follow-up PR.
|
@corevibe555 Great debugging on the model_fallbacks cross-provider issue. Pinning it down to a single LLMProvider getting reused across every candidate is exactly the kind of root cause that's easy to miss, so the per candidate rebuild is a clean fix. We're setting up the PicoClaw Dev Group on Discord for contributors to connect and collaborate. If you'd like to join, drop a note to |
* fix: use per-candidate provider for model_fallbacks Each fallback model now uses its own api_base and api_key from model_list instead of inheriting the primary model's provider config. Previously, a single LLMProvider was created from the primary model's ModelConfig and reused for all fallback candidates — only the model ID string was swapped. This caused all fallback requests to be routed to the primary provider's endpoint, making cross-provider fallback chains non-functional (e.g., OpenRouter primary with Gemini fallback would send the Gemini request to OpenRouter's API). Fix: pre-create a per-candidate LLMProvider at agent initialization time by looking up each candidate's ModelConfig from model_list. The fallback run closure now selects the correct provider per candidate via CandidateProviders map, falling back to agent.Provider when no override is found. Fixes sipeed#2140 Made-with: Cursor test: add test for instance.go fix: fix test refactor: optimize fix: fix Golang lint issues chore: comment cleanup * refactor: use resolvedModelConfig() instead of buildModelIndex() * fix
* fix: use per-candidate provider for model_fallbacks Each fallback model now uses its own api_base and api_key from model_list instead of inheriting the primary model's provider config. Previously, a single LLMProvider was created from the primary model's ModelConfig and reused for all fallback candidates — only the model ID string was swapped. This caused all fallback requests to be routed to the primary provider's endpoint, making cross-provider fallback chains non-functional (e.g., OpenRouter primary with Gemini fallback would send the Gemini request to OpenRouter's API). Fix: pre-create a per-candidate LLMProvider at agent initialization time by looking up each candidate's ModelConfig from model_list. The fallback run closure now selects the correct provider per candidate via CandidateProviders map, falling back to agent.Provider when no override is found. Fixes sipeed#2140 Made-with: Cursor test: add test for instance.go fix: fix test refactor: optimize fix: fix Golang lint issues chore: comment cleanup * refactor: use resolvedModelConfig() instead of buildModelIndex() * fix
Closes #2140
📝 Description
When using
model_fallbackswith models from different providers, all fallback requestswere sent to the primary model's
api_basewith the primary model'sapi_keyinsteadof each fallback's own configuration. This made cross-provider fallback chains
non-functional (e.g. an OpenRouter primary with a Gemini fallback would send the Gemini
request to OpenRouter's API, resulting in a 404).
Root cause: a single
LLMProviderwas constructed from the primary model'sModelConfigat startup and reused for every candidate in the fallback chain. The chainonly swapped the model ID string — the underlying HTTP client (with its baked-in
api_baseandapi_key) never changed.Fix: at agent initialization, a dedicated
LLMProvideris pre-created for eachcandidate found in
model_listand stored in a newCandidateProvidersmap onAgentInstance(keyed byprovider/model). The fallback run closure now selects thecorrect provider for the active candidate from this map, falling back to
agent.Providerwhen no override is found. This covers both primary fallback candidatesand light-model routing candidates.
🗣️ Type of Change
🤖 AI Code Generation
🔗 Related Issue
Fixes #2140
📚 Technical Context (Skip for Docs)
FallbackChain.Executecallback capturedagent.Provider(theprimary model's provider) in its closure and passed only the model ID string per
candidate. Creating providers eagerly at agent creation time (rather than lazily per
request) avoids runtime overhead while ensuring each fallback uses its own credentials.
The
CandidateProvidersmap is keyed byproviders.ModelKey(provider, model)tomatch the same key used inside the fallback chain's
runclosure.🧪 Test Environment
📸 Evidence (Optional)
Click to view Logs/Screenshots
Before fix — fallback routed to OpenRouter with wrong key: