refactor: per-agent LLM config with logical-name routing by wmeddie · Pull Request #102 · XpressAI/xpressclaw

wmeddie · 2026-05-07T00:16:19Z

Summary

Each agent now declares its own (provider, model, api_key, base_url) under agent.llm. The router builds one provider instance per unique (provider_type, api_key, base_url) tuple and binds each agent's name as a logical model name. The harness in each container always calls back to the server's /v1/ proxy with model=<agent_id>; the server resolves the logical name and dispatches. Real upstream API keys never leave the server.
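
The routing described above can be sketched in a few lines of Rust. This is a minimal illustration rather than the PR's actual `LlmRouter`: the `AgentLlm`, `Binding`, and `Router` types, the tuple-shaped dedup key, and the use of an index to stand in for a provider instance are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical per-agent LLM settings, mirroring the fields named in the summary.
#[derive(Clone, Debug)]
pub struct AgentLlm {
    pub provider: String,
    pub model: String,
    pub api_key: Option<String>,
    pub base_url: Option<String>,
}

/// Dedup key: one provider instance per unique (provider_type, api_key, base_url).
type ProviderKey = (String, Option<String>, Option<String>);

/// What a logical name resolves to: a provider instance (by index) and a real model.
#[derive(Clone, Debug, PartialEq)]
pub struct Binding {
    pub provider_idx: usize,
    pub real_model: String,
}

#[derive(Default)]
pub struct Router {
    providers: Vec<ProviderKey>,
    bindings: HashMap<String, Binding>,
}

impl Router {
    /// Walk the agents, reuse a provider when its key was already seen,
    /// and bind each agent's name as a logical model name.
    pub fn build(agents: &[(String, AgentLlm)]) -> Self {
        let mut router = Router::default();
        let mut seen: HashMap<ProviderKey, usize> = HashMap::new();
        for (name, llm) in agents {
            let key: ProviderKey =
                (llm.provider.clone(), llm.api_key.clone(), llm.base_url.clone());
            let next = router.providers.len();
            // Existing keys map to an index below `next`; new keys get `next`.
            let idx = *seen.entry(key.clone()).or_insert(next);
            if idx == next {
                router.providers.push(key);
            }
            router.bindings.insert(
                name.clone(),
                Binding { provider_idx: idx, real_model: llm.model.clone() },
            );
        }
        router
    }

    /// Unknown names are an error; there is no fallback provider.
    pub fn resolve(&self, logical_name: &str) -> Result<&Binding, String> {
        self.bindings
            .get(logical_name)
            .ok_or_else(|| format!("unknown model or agent name: {logical_name}"))
    }

    pub fn provider_count(&self) -> usize {
        self.providers.len()
    }
}
```

With two agents on the same key and a third on a different key, this sketch ends up with two provider entries and three bindings, which is the shape the summary describes.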

This replaces #100, which had several latent bugs (non-deterministic provider selection, dropped llamacpp registration, broken add-agent flow, frontend referencing removed fields). See the linked review for the full list. This PR pursues the same intent with a cleaner implementation.

Background

A diagnostic agent identified three real bugs that this PR fixes:

The server's LlmRouter was global-only — it ignored per-agent llm overrides when the harness called back via /v1/.
AgentLlmConfig had provider/api_key/base_url but no model — the model lived at AgentConfig.model at the top level, splitting the LLM block across two places.
add_agent copied the global provider/key/url into each new agent's config, so global edits stopped propagating to existing agents.
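
For the second bug, the fix listed under "What changed" is a load-time migration of the legacy top-level model into the llm block. A rough sketch of that idea follows, using simplified stand-in structs (the real `AgentConfig` and `AgentLlmConfig` have more fields, and the function name here is hypothetical):

```rust
/// Simplified stand-in for the per-agent llm block.
#[derive(Debug, Default, PartialEq)]
pub struct AgentLlmConfig {
    pub provider: Option<String>,
    pub model: Option<String>,
}

/// Simplified stand-in for an agent entry.
#[derive(Debug, Default)]
pub struct AgentConfig {
    pub name: String,
    /// Legacy top-level field: read from old files, cleared after migration.
    pub model: Option<String>,
    pub llm: AgentLlmConfig,
}

/// Fold the legacy top-level model into llm.model without
/// overwriting a value that was set explicitly under llm.
pub fn migrate_legacy_model(agent: &mut AgentConfig) {
    // take() clears the legacy field whether or not it ends up being used.
    if let Some(legacy) = agent.model.take() {
        if agent.llm.model.is_none() {
            agent.llm.model = Some(legacy);
        }
    }
}
```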

The user's intent for the fix: each agent in xpressclaw.yaml declares its own LLM, the router routes per-agent, and runtime budget controls can re-point an agent at a cheaper model without restarting the container.
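
The runtime re-pointing part of that intent can be illustrated with a small binding table guarded by a lock. This is only a sketch under assumptions: the `Bindings` type, the `RwLock`, and the error strings are invented for the example, and the PR's `LlmRouter::set_agent_model()` may be structured differently.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical table mapping an agent's logical name to its current real model.
pub struct Bindings {
    inner: RwLock<HashMap<String, String>>,
}

impl Bindings {
    pub fn new(initial: &[(&str, &str)]) -> Self {
        let map = initial
            .iter()
            .map(|(agent, model)| (agent.to_string(), model.to_string()))
            .collect();
        Bindings { inner: RwLock::new(map) }
    }

    /// Re-point an existing binding in place; nothing else is rebuilt.
    pub fn set_agent_model(&self, agent: &str, model: &str) -> Result<(), String> {
        let mut map = self
            .inner
            .write()
            .map_err(|_| "binding table lock poisoned".to_string())?;
        match map.get_mut(agent) {
            Some(slot) => {
                *slot = model.to_string();
                Ok(())
            }
            None => Err(format!("unknown agent: {agent}")),
        }
    }

    /// Look up the real model currently bound to an agent.
    pub fn real_model(&self, agent: &str) -> Option<String> {
        let map = self.inner.read().ok()?;
        map.get(agent).cloned()
    }
}
```

Because the container keeps sending `model=<agent_id>`, a swap like this changes which real model serves the next request without touching the container.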

What changed

AgentLlmConfig gains model and model_path fields. The legacy top-level agent.model is migrated into llm.model on first load and is no longer serialized.
Global LlmConfig keeps only custom_pricing. The dropped fields (default_provider, openai_api_key, etc.) are silently ignored when loading old YAMLs.
env_overrides (ANTHROPIC_API_KEY, OPENAI_API_KEY, OPENAI_BASE_URL) applies per-agent based on each agent's declared provider.
LlmRouter::build_from_config(&Config) walks agents, deduplicates provider instances by provider_type|api_key|base_url, and binds each agent's name as a logical model name. Unknown names error — no random "first available provider" fallback.
LlmRouter::set_agent_model() lets budget-driven degradation re-point a binding at runtime without rebuilding the router.
/v1/messages resolves the agent (from the model field, or from sk-ant-<agent_id> placeholder header) before deciding to direct-proxy to anthropic.com vs. convert-and-route. Per-agent Anthropic keys are now used correctly.
Container env never carries real cloud keys; it carries only LLM_MODEL=<agent_id> plus placeholder API keys that encode the agent_id. The proxy is the only thing holding real upstream credentials.
add_agent no longer copies global config — new agents get a sensible local-Ollama default that the user edits afterwards.
update_agent_config and add_agent rebuild the router on edit so changes take effect without a restart. The previous behavior left the router stale until reboot.
Frontend LiveConfig.llm becomes a per-agent providers summary; settings/llm/+page.svelte renders per-agent entries instead of removed global fields.
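
The agent resolution step for /v1/messages could look roughly like the sketch below. The function names and the `HashSet` of known agents are assumptions for illustration. The check against known agents matters because a genuine Anthropic key can also begin with `sk-ant-`, so stripping the prefix alone does not prove the caller is an agent.

```rust
use std::collections::HashSet;

/// Extract the agent id from a placeholder key such as `sk-ant-<agent_id>`
/// or `sk-xpressclaw-<agent_id>`. Returns None for any other shape.
pub fn agent_from_placeholder_key(key: &str) -> Option<&str> {
    for prefix in ["sk-ant-", "sk-xpressclaw-"] {
        if let Some(rest) = key.strip_prefix(prefix) {
            if !rest.is_empty() {
                return Some(rest);
            }
        }
    }
    None
}

/// Resolve the calling agent: first from the request's model field,
/// then from the placeholder key, accepting only known agent names.
pub fn resolve_agent<'a>(
    model: &'a str,
    api_key: Option<&'a str>,
    known_agents: &HashSet<String>,
) -> Option<&'a str> {
    if known_agents.contains(model) {
        return Some(model);
    }
    let candidate = api_key.and_then(agent_from_placeholder_key)?;
    known_agents.contains(candidate).then_some(candidate)
}
```

When neither path yields a known agent, the caller is left to decide how to handle the request; this sketch simply returns `None`.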

Tests

Six new tests, all passing alongside the existing 396:

migrate_legacy_top_level_model and does_not_overwrite_explicit_llm_model lock the YAML migration.
effective_model checks the fallback ordering.
router_resolves_logical_agent_name and router_resolves_real_model_name cover both lookup paths.
router_two_agents_share_one_provider_instance proves the dedup is real (2 agents on the same OpenAI key share one OpenAiProvider; a third with a different key gets its own instance).
set_agent_model_repoints_to_cheaper_model covers the runtime swap path.
build_container_spec_no_real_keys_in_container locks in that real API keys never leak into container env, even when the agent has them configured.
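
The property that the last test locks in can be sketched as follows. Only `LLM_MODEL=<agent_id>` and the two placeholder key formats come from this PR; the `Agent` struct, the other environment variable names, the pairing of placeholders with variables, and the proxy URL in the usage lines are assumptions made for the example.

```rust
/// Hypothetical agent record; the real key lives only on the server side.
pub struct Agent {
    pub id: String,
    pub real_api_key: Option<String>,
}

/// Build the container environment from the agent id alone.
/// The real key is deliberately never read here.
pub fn container_env(agent: &Agent, proxy_base_url: &str) -> Vec<(String, String)> {
    let id = &agent.id;
    vec![
        // The harness sends this as the model name on every call.
        ("LLM_MODEL".to_string(), id.clone()),
        // Placeholder keys encode the agent id, not a credential.
        ("ANTHROPIC_API_KEY".to_string(), format!("sk-ant-{id}")),
        ("OPENAI_API_KEY".to_string(), format!("sk-xpressclaw-{id}")),
        // All traffic goes back through the server's proxy.
        ("OPENAI_BASE_URL".to_string(), proxy_base_url.to_string()),
    ]
}
```

A check in the spirit of that test would build the environment for an agent that has a real key configured and assert that no value contains it:

```rust
let agent = Agent { id: "alice".to_string(), real_api_key: Some("sk-real-secret".to_string()) };
let env = container_env(&agent, "http://xpressclaw-server.example/v1");
assert!(env.iter().all(|(_, value)| !value.contains("sk-real-secret")));
```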

Test plan

cargo fmt --all --check — clean
cargo clippy --workspace --all-targets — clean
cargo test --workspace — 402 passed
npx svelte-check — 0 errors
Manual: add an agent via the wizard with provider=openai, verify the new agent has its own llm block with model+key, not a global one
Manual: edit one agent to provider=anthropic, leave another on provider=openai, verify each agent talks to its own upstream when chatting
Manual: set OPENAI_API_KEY env var, verify it's picked up only by openai-provider agents
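
The last manual check exercises the per-agent env_overrides behavior. A small sketch of how such an override might be applied is below; it takes the environment as a map so it can be tested without touching the process environment. The struct, the function name, and the choice that an environment value replaces a configured one are assumptions, not confirmed details of the PR.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the per-agent llm block.
pub struct AgentLlm {
    pub provider: String,
    pub api_key: Option<String>,
    pub base_url: Option<String>,
}

/// Apply only the environment variables that match the agent's declared provider.
pub fn apply_env_overrides(llm: &mut AgentLlm, env: &HashMap<String, String>) {
    match llm.provider.as_str() {
        "anthropic" => {
            if let Some(key) = env.get("ANTHROPIC_API_KEY") {
                llm.api_key = Some(key.clone());
            }
        }
        "openai" => {
            if let Some(key) = env.get("OPENAI_API_KEY") {
                llm.api_key = Some(key.clone());
            }
            if let Some(url) = env.get("OPENAI_BASE_URL") {
                llm.base_url = Some(url.clone());
            }
        }
        // Other providers (for example ollama) are left untouched.
        _ => {}
    }
}
```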

Each agent now declares its own (provider, model, api_key, base_url) under `agent.llm`. The router builds one provider instance per unique (provider_type, api_key, base_url) tuple and binds each agent's name as a logical model name that resolves to (provider_instance, real_model). The harness in each container always calls back to the server's /v1/ proxy with `model=<agent_id>`; the server resolves the logical name and dispatches. Real upstream API keys never leave the server: agent identity is encoded in placeholder keys (sk-ant-<agent_id>, sk-xpressclaw-<agent_id>) that the proxy parses.

What changed:

- AgentLlmConfig gains `model` and `model_path` fields; the legacy top-level `agent.model` is migrated into `llm.model` on load and is no longer serialized.
- Global LlmConfig keeps only `custom_pricing`; provider/key/url/model fields removed. Old YAMLs ignore the dropped keys cleanly.
- env_overrides (ANTHROPIC_API_KEY, OPENAI_API_KEY, OPENAI_BASE_URL) applies per-agent based on each agent's declared provider.
- LlmRouter::build_from_config(&Config) walks agents, deduplicates provider instances, and binds each agent's name. Unknown names error; no random "first available provider" fallback.
- LlmRouter::set_agent_model() lets budget-driven degradation re-point an agent at a cheaper real model without rebuilding the router.
- /v1/messages resolves the agent (from model field or sk-ant-<id>) before deciding to direct-proxy to anthropic.com vs. convert-and-route. Per-agent Anthropic keys are now used correctly.
- Container env never carries real cloud keys; LLM_MODEL=<agent_id>.
- add_agent no longer copies global config into a new agent; it gets a sensible local-Ollama default that the user can edit.
- update_agent_config and add_agent rebuild the router so changes take effect immediately.
- Frontend: LiveConfig.llm becomes a per-agent providers summary; the settings/llm page renders per-agent entries instead of global state.

Tests: 6 new tests cover migration, logical-name resolution, real-model resolution, provider-instance deduplication, runtime model swap, and that real keys never leak to container env.

cargo fmt, clippy clean, 402 tests pass, svelte-check 0 errors.

The embedded llama.cpp path (provider=local, GGUF download via hf-hub, LazyLlamaCppProvider, metal/cuda/local-llm Cargo features) is removed. Ollama becomes the only supported local backend. ADR-023 captures the decision; ADR-011 is marked superseded.

Why: in-process llama.cpp invariant violations were taking down the server, and shipping per-platform builds (Metal, CUDA, CPU) added a lot of build-time complexity for releases. Ollama runs out-of-process with its own platform-tuned builds, and we already speak its HTTP API.

Removed:

- crates/xpressclaw-core/src/llm/llamacpp.rs
- llama-cpp-2, hf-hub, encoding_rs from workspace + core Cargo.toml
- local-llm, metal, cuda features (core, server, cli)
- AgentLlmConfig.model_path field
- The "local" arm in LlmRouter::materialize_provider
- DownloadProgress / DownloadStatus / download_status route
- use_embedded request flag
- resolve_gguf_source and the post-setup GGUF download flow
- nvcc/Metal detection in build.sh
- "--skip llamacpp" filter in CI
- Wizard's Built-in provider button + download progress UI

Kept:

- provider=ollama with HTTP proxy (LocalProvider)
- Reconciler's per-host Ollama pull loop (reconcile_models)
- Hardware detection + recommend_model for Ollama tag picking

Future (not in this PR): publish XpressAI custom GGUFs (e.g. Qwen3.6-27B-RYS-UD) to Ollama Hub so agents can pull them via the same provider=ollama path.

Stacked on #102.

cargo fmt, clippy clean, 401 tests pass, svelte-check 0 errors.

wmeddie temporarily deployed to integration May 7, 2026 00:30 — with GitHub Actions Inactive

wmeddie mentioned this pull request May 7, 2026

refactor: remove embedded llama.cpp, rely on Ollama for local inference #103

Merged

7 tasks

wmeddie merged commit 95d9d7b into main May 7, 2026
4 checks passed

wmeddie deleted the fix/per-agent-llm-config-v2 branch May 7, 2026 01:23

wmeddie merged 1 commit into main from fix/per-agent-llm-config-v2
