refactor: per-agent LLM config with logical-name routing #102
Merged
Each agent now declares its own (provider, model, api_key, base_url) under `agent.llm`. The router builds one provider instance per unique (provider_type, api_key, base_url) tuple and binds each agent's name as a logical model name that resolves to (provider_instance, real_model). The harness in each container always calls back to the server's /v1/ proxy with `model=<agent_id>`; the server resolves the logical name and dispatches. Real upstream API keys never leave the server: agent identity is encoded in placeholder keys (sk-ant-<agent_id>, sk-xpressclaw-<agent_id>) that the proxy parses.

What changed:
- AgentLlmConfig gains `model` and `model_path` fields; the legacy top-level `agent.model` is migrated into `llm.model` on load and is no longer serialized.
- Global LlmConfig keeps only `custom_pricing`; the provider/key/url/model fields are removed. Old YAMLs that still contain the dropped keys load cleanly, and the keys are ignored.
- env_overrides (ANTHROPIC_API_KEY, OPENAI_API_KEY, OPENAI_BASE_URL) applies per-agent based on each agent's declared provider.
- LlmRouter::build_from_config(&Config) walks agents, deduplicates provider instances, and binds each agent's name. Unknown names are an error; there is no random "first available provider" fallback.
- LlmRouter::set_agent_model() lets budget-driven degradation re-point an agent at a cheaper real model without rebuilding the router.
- /v1/messages resolves the agent (from the model field or sk-ant-<id>) before deciding whether to direct-proxy to anthropic.com or convert-and-route. Per-agent Anthropic keys are now used correctly.
- Container env never carries real cloud keys; LLM_MODEL=<agent_id>.
- add_agent no longer copies global config into a new agent; the new agent gets a sensible local-Ollama default that the user can edit.
- update_agent_config and add_agent rebuild the router so changes take effect immediately.
- Frontend: LiveConfig.llm becomes a per-agent providers summary; the settings/llm page renders per-agent entries instead of global state.

Tests: 6 new tests cover migration, logical-name resolution, real-model resolution, provider-instance deduplication, runtime model swap, and that real keys never leak to container env. cargo fmt, clippy clean, 402 tests pass, svelte-check 0 errors.
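
To make the deduplication and logical-name binding concrete, here is a minimal sketch of the idea, not the code from this PR. The `AgentLlm` and `Router` types and the `build`, `resolve`, and `provider_count` names are hypothetical stand-ins for `AgentLlmConfig`, `LlmRouter`, and `LlmRouter::build_from_config`; a provider instance is represented only by its index, and resolving real model names (which the actual router also supports) is left out.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of one agent's `llm` block.
#[derive(Clone, Debug)]
pub struct AgentLlm {
    pub provider: String,
    pub model: String,
    pub api_key: Option<String>,
    pub base_url: Option<String>,
}

/// Identity of a provider instance: (provider_type, api_key, base_url).
type ProviderKey = (String, Option<String>, Option<String>);

/// Illustrative router; the real LlmRouter holds actual provider objects.
#[derive(Debug, Default)]
pub struct Router {
    /// One entry per unique ProviderKey; the index serves as the instance id.
    providers: Vec<ProviderKey>,
    /// Logical name (the agent's name) -> (instance id, real model).
    bindings: HashMap<String, (usize, String)>,
}

impl Router {
    /// Walk the agents, reuse an instance when the key tuple matches,
    /// and bind each agent's name to (instance, real model).
    pub fn build(agents: &[(String, AgentLlm)]) -> Router {
        let mut router = Router::default();
        for (name, llm) in agents {
            let key: ProviderKey =
                (llm.provider.clone(), llm.api_key.clone(), llm.base_url.clone());
            let existing = router.providers.iter().position(|k| *k == key);
            let idx = match existing {
                Some(i) => i,
                None => {
                    router.providers.push(key);
                    router.providers.len() - 1
                }
            };
            router.bindings.insert(name.clone(), (idx, llm.model.clone()));
        }
        router
    }

    /// Unknown names are an error; there is no fallback provider.
    pub fn resolve(&self, logical: &str) -> Result<(usize, &str), String> {
        self.bindings
            .get(logical)
            .map(|(i, m)| (*i, m.as_str()))
            .ok_or_else(|| format!("unknown model name: {}", logical))
    }

    /// Re-point an existing binding at another real model, keeping the instance.
    pub fn set_agent_model(&mut self, agent: &str, model: &str) -> Result<(), String> {
        match self.bindings.get_mut(agent) {
            Some(binding) => {
                binding.1 = model.to_string();
                Ok(())
            }
            None => Err(format!("unknown agent: {}", agent)),
        }
    }

    /// Number of distinct provider instances that were materialized.
    pub fn provider_count(&self) -> usize {
        self.providers.len()
    }
}
```

In this sketch, two agents that share a provider type, key, and base URL resolve to the same instance index, and `set_agent_model` changes only the bound model string, which is why no rebuild is needed for a runtime swap.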
wmeddie added a commit that referenced this pull request on May 7, 2026
The embedded llama.cpp path (provider=local, GGUF download via hf-hub, LazyLlamaCppProvider, metal/cuda/local-llm Cargo features) is removed. Ollama becomes the only supported local backend. ADR-023 captures the decision; ADR-011 is marked superseded.

Why: in-process llama.cpp invariant violations were taking down the server, and shipping per-platform builds (Metal, CUDA, CPU) added a lot of build-time complexity for releases. Ollama runs out-of-process with its own platform-tuned builds, and we already speak its HTTP API.

Removed:
- crates/xpressclaw-core/src/llm/llamacpp.rs
- llama-cpp-2, hf-hub, encoding_rs from workspace + core Cargo.toml
- local-llm, metal, cuda features (core, server, cli)
- AgentLlmConfig.model_path field
- The "local" arm in LlmRouter::materialize_provider
- DownloadProgress / DownloadStatus / download_status route
- use_embedded request flag
- resolve_gguf_source and the post-setup GGUF download flow
- nvcc/Metal detection in build.sh
- "--skip llamacpp" filter in CI
- Wizard's Built-in provider button + download progress UI

Kept:
- provider=ollama with HTTP proxy (LocalProvider)
- Reconciler's per-host Ollama pull loop (reconcile_models)
- Hardware detection + recommend_model for Ollama tag picking

Future (not in this PR): publish XpressAI custom GGUFs (e.g. Qwen3.6-27B-RYS-UD) to Ollama Hub so agents can pull them via the same provider=ollama path.

Stacked on #102. cargo fmt, clippy clean, 401 tests pass, svelte-check 0 errors.
Summary
Each agent now declares its own `(provider, model, api_key, base_url)` under `agent.llm`. The router builds one provider instance per unique `(provider_type, api_key, base_url)` tuple and binds each agent's name as a logical model name. The harness in each container always calls back to the server's `/v1/` proxy with `model=<agent_id>`; the server resolves the logical name and dispatches. Real upstream API keys never leave the server.

This replaces #100, which had several latent bugs (non-deterministic provider selection, dropped llamacpp registration, broken add-agent flow, frontend referencing removed fields). See the linked review for the full list. This PR addresses the same intent, but cleanly.
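
As an illustration of how the proxy could recover the agent identity from a callback, here is a minimal sketch. The function names, the `known_agents` parameter, and the lookup order (model field first, then placeholder key) are assumptions made for this example; only the `sk-ant-` and `sk-xpressclaw-` prefixes come from the PR itself.

```rust
/// Prefixes of the placeholder keys named in this PR.
const PLACEHOLDER_PREFIXES: [&str; 2] = ["sk-ant-", "sk-xpressclaw-"];

/// Extract the agent id from a placeholder key such as `sk-ant-<agent_id>`.
/// Returns None when no prefix matches or the id part is empty.
pub fn agent_from_placeholder_key(key: &str) -> Option<&str> {
    PLACEHOLDER_PREFIXES
        .iter()
        .find_map(|prefix| key.strip_prefix(*prefix))
        .filter(|id| !id.is_empty())
}

/// Hypothetical resolution order: the `model` field if it names a known
/// agent, otherwise the agent id encoded in the placeholder key.
/// Checking against `known_agents` keeps an arbitrary key that happens to
/// share a prefix from being mistaken for an agent.
pub fn resolve_agent<'a>(
    model: &'a str,
    api_key: Option<&'a str>,
    known_agents: &[&str],
) -> Option<&'a str> {
    if known_agents.contains(&model) {
        return Some(model);
    }
    api_key
        .and_then(agent_from_placeholder_key)
        .filter(|id| known_agents.contains(id))
}
```

Since the container only ever holds the placeholder, the real upstream key is looked up on the server side after the agent is resolved.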
Background
A diagnostic agent identified three real bugs that this PR fixes:
- `LlmRouter` was global-only: it ignored per-agent `llm` overrides when the harness called back via `/v1/`.
- `AgentLlmConfig` had `provider`/`api_key`/`base_url` but no `model`. The model lived at `AgentConfig.model` at the top level, splitting the LLM block across two places.
- `add_agent` copied the global provider/key/url into each new agent's config, so global edits stopped propagating to existing agents.

The user's intent for the fix: each agent in `xpressclaw.yaml` declares its own LLM, the router routes per-agent, and runtime budget controls can re-point an agent at a cheaper model without restarting the container.

What changed
- `AgentLlmConfig` gains `model` and `model_path` fields. The legacy top-level `agent.model` is migrated into `llm.model` on first load and is no longer serialized.
- `LlmConfig` keeps only `custom_pricing`. The dropped fields (`default_provider`, `openai_api_key`, etc.) are silently ignored when loading old YAMLs.
- `env_overrides` (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `OPENAI_BASE_URL`) applies per-agent based on each agent's declared provider.
- `LlmRouter::build_from_config(&Config)` walks agents, deduplicates provider instances by `provider_type|api_key|base_url`, and binds each agent's name as a logical model name. Unknown names are an error; there is no random "first available provider" fallback.
- `LlmRouter::set_agent_model()` lets budget-driven degradation re-point a binding at runtime without rebuilding the router.
- `/v1/messages` resolves the agent (from the model field, or from the `sk-ant-<agent_id>` placeholder header) before deciding whether to direct-proxy to anthropic.com or convert-and-route. Per-agent Anthropic keys are now used correctly.
- Container env: `LLM_MODEL=<agent_id>` plus placeholder API keys encoding the agent_id. The proxy is the only thing holding real upstream credentials.
- `add_agent` no longer copies global config; new agents get a sensible local-Ollama default that the user edits afterwards.
- `update_agent_config` and `add_agent` rebuild the router on edit so changes take effect without a restart. The previous behavior left the router stale until reboot.
- `LiveConfig.llm` becomes a per-agent providers summary; `settings/llm/+page.svelte` renders per-agent entries instead of removed global fields.

Tests
Six new tests, all passing alongside the existing 396:
- `migrate_legacy_top_level_model` and `does_not_overwrite_explicit_llm_model` lock the YAML migration.
- `effective_model` checks the fallback ordering.
- `router_resolves_logical_agent_name` and `router_resolves_real_model_name` cover both lookup paths.
- `router_two_agents_share_one_provider_instance` proves the dedup is real (2 agents on the same OpenAI key share one OpenAiProvider; a third with a different key gets its own instance).
- `set_agent_model_repoints_to_cheaper_model` covers the runtime swap path.
- `build_container_spec_no_real_keys_in_container` locks in that real API keys never leak into container env, even when the agent has them configured.

Test plan
- `cargo fmt --all --check`: clean
- `cargo clippy --workspace --all-targets`: clean
- `cargo test --workspace`: 402 passed
- `npx svelte-check`: 0 errors
- `OPENAI_API_KEY` env var, verify it's picked up only by openai-provider agents
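
The per-provider behaviour that the last check verifies can be sketched roughly as follows. This is an assumption-laden illustration, not the PR's `env_overrides` code: the `AgentLlm` struct and `apply_env_overrides` function are hypothetical, the provider strings `"openai"`, `"anthropic"`, and `"ollama"` are assumed spellings, the environment is passed in as a map so the logic stays testable, and an environment value is assumed to replace whatever the YAML configured.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of one agent's `llm` block.
#[derive(Debug, Clone, PartialEq)]
pub struct AgentLlm {
    pub provider: String,
    pub api_key: Option<String>,
    pub base_url: Option<String>,
}

/// Apply only the environment variables that match the agent's declared
/// provider. Agents on any other provider are left untouched.
pub fn apply_env_overrides(llm: &mut AgentLlm, env: &HashMap<String, String>) {
    match llm.provider.as_str() {
        "anthropic" => {
            if let Some(key) = env.get("ANTHROPIC_API_KEY") {
                llm.api_key = Some(key.clone());
            }
        }
        "openai" => {
            if let Some(key) = env.get("OPENAI_API_KEY") {
                llm.api_key = Some(key.clone());
            }
            if let Some(url) = env.get("OPENAI_BASE_URL") {
                llm.base_url = Some(url.clone());
            }
        }
        // Assumption: local providers such as ollama take no cloud keys.
        _ => {}
    }
}
```

Calling this once per agent during config load would give each agent its own view of the environment, so an `OPENAI_API_KEY` set on the host never ends up on an Anthropic or Ollama agent.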