fix: use per candidate provider for model_fallbacks#2143

Merged
yinwm merged 4 commits into sipeed:main from corevibe555:fix/model-fallbacks-per-candidate-provider
Apr 7, 2026

@corevibe555
Contributor

Closes #2140

📝 Description

When using model_fallbacks with models from different providers, all fallback requests
were sent to the primary model's api_base with the primary model's api_key instead
of each fallback's own configuration. This made cross-provider fallback chains
non-functional (e.g. an OpenRouter primary with a Gemini fallback would send the Gemini
request to OpenRouter's API, resulting in a 404).

Root cause: a single LLMProvider was constructed from the primary model's
ModelConfig at startup and reused for every candidate in the fallback chain. The chain
only swapped the model ID string — the underlying HTTP client (with its baked-in
api_base and api_key) never changed.

Fix: at agent initialization, a dedicated LLMProvider is pre-created for each
candidate found in model_list and stored in a new CandidateProviders map on
AgentInstance (keyed by provider/model). The fallback run closure now selects the
correct provider for the active candidate from this map, falling back to
agent.Provider when no override is found. This covers both primary fallback candidates
and light-model routing candidates.

🗣️ Type of Change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 📖 Documentation update
  • ⚡ Code refactoring (no functional changes, no api changes)

🤖 AI Code Generation

  • 🤖 Fully AI-generated (100% AI, 0% Human)
  • 🛠️ Mostly AI-generated (AI draft, Human verified/modified)
  • 👨‍💻 Mostly Human-written (Human lead, AI assisted or none)

🔗 Related Issue

Fixes #2140

📚 Technical Context (Skip for Docs)

  • Reference URL: [BUG] model_fallbacks inherits primary model's api_base/api_key instead of using each fallback's own config #2140
  • Reasoning: The FallbackChain.Execute callback captured agent.Provider (the
    primary model's provider) in its closure and passed only the model ID string per
    candidate. Creating providers eagerly at agent creation time (rather than lazily per
    request) avoids runtime overhead while ensuring each fallback uses its own credentials.
    The CandidateProviders map is keyed by providers.ModelKey(provider, model) to
    match the same key used inside the fallback chain's run closure.

🧪 Test Environment

  • Hardware: PC
  • OS: Linux
  • Model/Provider: OpenRouter (primary) + Google Gemini (fallback)
  • Channels:

📸 Evidence (Optional)

Before fix — fallback routed to OpenRouter with wrong key:

@CLAassistant

CLAassistant commented Mar 29, 2026

CLA assistant check
All committers have signed the CLA.

@corevibe555 corevibe555 force-pushed the fix/model-fallbacks-per-candidate-provider branch from 73762da to 5635190 Compare March 29, 2026 03:15
@sipeed-bot sipeed-bot bot added type: bug Something isn't working domain: agent domain: provider go Pull requests that update go code labels Mar 29, 2026
Collaborator

@yinwm yinwm left a comment

Thanks for the fix! The root cause analysis is spot-on and the pre-creation approach is clean. I prefer this over #1637 (see my reasoning there).

Two blocking issues before we can merge:

1. Rebase needed — conflicts with merged #2038

This PR branches from before #2038 was merged. In the current main, the fallback closure uses activeProvider (which may be LightProvider when routing selects the light tier):

// current main (after #2038)
return activeProvider.Chat(ctx, messagesForCall, toolDefsForCall, model, llmOpts)

But your change defaults to agent.Provider:

p := agent.Provider  // should be activeProvider

Please rebase onto latest main and ensure the default fallback respects light model routing. The fix should be:

p := activeProvider
if cp, ok := agent.CandidateProviders[providers.ModelKey(provider, model)]; ok {
    p = cp
}
return p.Chat(...)

2. Model matching in registerCandidateProviders is fragile

The direct string comparison fullModel == cfg.ModelList[i].Model assumes model_list entries always use "provider/model" format. Consider reusing resolvedModelConfig() from model_resolution.go instead, which already handles alias resolution and model config lookup.

Non-blocking suggestions:

  • Replace log.Printf with logger.WarnCF for consistency with the rest of the codebase
  • Add unit tests for cross-provider fallback resolution (this is important for a core routing fix)

Once these are addressed, I'm happy to approve.

@corevibe555
Contributor Author

Thanks for the feedback.
I am working on this.

@corevibe555 corevibe555 force-pushed the fix/model-fallbacks-per-candidate-provider branch 3 times, most recently from 6bf2ce7 to e766036 Compare March 29, 2026 20:33
@corevibe555
Contributor Author

corevibe555 commented Mar 29, 2026

  1. Done

  2. Done as requested (updated)

@corevibe555
Contributor Author

@yinwm Could you take another look and share your feedback so I can address it? Thank you!

@sipeed sipeed deleted a comment Mar 31, 2026
@corevibe555 corevibe555 force-pushed the fix/model-fallbacks-per-candidate-provider branch from f318ea1 to 3e8b1cf Compare March 31, 2026 05:02
@corevibe555
Contributor Author

I squashed the commit history into a single meaningful commit.

@corevibe555 corevibe555 requested a review from yinwm March 31, 2026 22:08
@corevibe555
Contributor Author

@yinwm Another update here.
I've gone through the PR again and updated it per your original request.

  1. Rebase needed — conflicts with merged fix(agent): use light provider for routed model calls #2038
    Rebase done, code updated properly as per your comment.

  2. Model matching in registerCandidateProviders is fragile
    Switched to resolvedModelConfig() and fixed the tests.

Each fallback model now uses its own api_base and api_key from
model_list instead of inheriting the primary model's provider config.

Previously, a single LLMProvider was created from the primary model's
ModelConfig and reused for all fallback candidates — only the model ID
string was swapped. This caused all fallback requests to be routed to
the primary provider's endpoint, making cross-provider fallback chains
non-functional (e.g., OpenRouter primary with Gemini fallback would
send the Gemini request to OpenRouter's API).

Fix: pre-create a per-candidate LLMProvider at agent initialization
time by looking up each candidate's ModelConfig from model_list. The
fallback run closure now selects the correct provider per candidate
via CandidateProviders map, falling back to agent.Provider when no
override is found.

Fixes sipeed#2140

Made-with: Cursor

test: add test for instance.go

fix: fix test

refactor: optimize

fix: fix Golang lint issues

chore: comment cleanup
@corevibe555 corevibe555 force-pushed the fix/model-fallbacks-per-candidate-provider branch from 9362c6e to 15181e5 Compare April 2, 2026 11:07
Collaborator

@yinwm yinwm left a comment

Great work! Both blocking issues from the first review have been properly addressed. The rebase onto latest main is clean, activeProvider is correctly used as the default, and resolvedModelConfig() is now the canonical resolution path. The test coverage is thorough — 352 lines covering the exact #2140 scenario, edge cases, and the graceful fallback to activeProvider for unregistered candidates.

One non-blocking suggestion for a follow-up: populateCandidateProvidersFromNames uses resolvedModelConfig() (which only matches by model_name), but buildModelListResolver (used by resolveModelCandidates) has an additional fallback path that also matches by Model and modelID. This means if a fallback is referenced by model ID instead of alias, the candidate will be created in the fallback chain but won't have a corresponding CandidateProviders entry. Consider aligning these two resolution paths in a follow-up PR.

@yinwm yinwm merged commit 6ce0306 into sipeed:main Apr 7, 2026
4 checks passed
@sipeed-bot

sipeed-bot bot commented Apr 8, 2026

@corevibe555 Great debugging on the model_fallbacks cross-provider issue. Pinning it down to a single LLMProvider getting reused across every candidate is exactly the kind of root cause that's easy to miss, so the per-candidate rebuild is a clean fix.

We're setting up the PicoClaw Dev Group on Discord for contributors to connect and collaborate. If you'd like to join, drop a note to support@sipeed.com with the subject [Join PicoClaw Dev Group] + corevibe555 and we'll send the invite link your way.

ra1phdd pushed a commit to ra1phdd/picoclaw-pkg that referenced this pull request Apr 12, 2026
* fix: use per-candidate provider for model_fallbacks

* refactor: use resolvedModelConfig() instead of buildModelIndex()

* fix
armmer016 pushed a commit to armmer016/khunquant that referenced this pull request Apr 14, 2026