refactor: per-agent LLM config with logical-name routing#102

Merged
wmeddie merged 1 commit into main from fix/per-agent-llm-config-v2
May 7, 2026

@wmeddie wmeddie commented May 7, 2026

Summary

Each agent now declares its own (provider, model, api_key, base_url) under agent.llm. The router builds one provider instance per unique (provider_type, api_key, base_url) tuple and binds each agent's name as a logical model name. The harness in each container always calls back to the server's /v1/ proxy with model=<agent_id>; the server resolves the logical name and dispatches. Real upstream API keys never leave the server.

This replaces #100, which had several latent bugs (non-deterministic provider selection, dropped llamacpp registration, broken add-agent flow, frontend referencing removed fields). See the linked review for the full list. This PR pursues the same intent without those bugs.
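
The routing described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ProviderKey`, `AgentLlm`, `Binding`, and `Router` are hypothetical stand-ins, the real `LlmRouter` holds live provider objects rather than keys, and the model names used with it are placeholders.

```rust
use std::collections::HashMap;

/// Identity of one upstream connection (hypothetical type).
/// Two agents with an equal key share one provider instance.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ProviderKey {
    provider_type: String,
    api_key: String,
    base_url: String,
}

/// Simplified per-agent LLM settings, mirroring the fields named in this PR.
struct AgentLlm {
    provider: String,
    model: String,
    api_key: String,
    base_url: String,
}

/// What a logical name resolves to: a provider slot plus the real model name.
#[derive(Clone, Debug, PartialEq)]
struct Binding {
    provider_idx: usize,
    real_model: String,
}

#[derive(Default)]
struct Router {
    providers: Vec<ProviderKey>,
    bindings: HashMap<String, Binding>,
}

impl Router {
    /// Walk (agent_name, llm) pairs and reuse a provider slot when the key matches.
    fn build(agents: &[(&str, AgentLlm)]) -> Router {
        let mut router = Router::default();
        for (name, llm) in agents {
            let key = ProviderKey {
                provider_type: llm.provider.clone(),
                api_key: llm.api_key.clone(),
                base_url: llm.base_url.clone(),
            };
            let idx = match router.providers.iter().position(|p| *p == key) {
                Some(i) => i,
                None => {
                    router.providers.push(key);
                    router.providers.len() - 1
                }
            };
            router.bindings.insert(
                name.to_string(),
                Binding { provider_idx: idx, real_model: llm.model.clone() },
            );
        }
        router
    }

    /// Unknown logical names are an error; there is no fallback provider.
    fn resolve(&self, logical: &str) -> Result<&Binding, String> {
        self.bindings
            .get(logical)
            .ok_or_else(|| format!("unknown model or agent: {logical}"))
    }

    /// Re-point an existing binding at a different real model, in place.
    fn set_agent_model(&mut self, agent: &str, model: &str) -> Result<(), String> {
        match self.bindings.get_mut(agent) {
            Some(binding) => {
                binding.real_model = model.to_string();
                Ok(())
            }
            None => Err(format!("unknown agent: {agent}")),
        }
    }
}
```

Under these assumptions, two agents that share a provider type, key, and base URL end up with the same `provider_idx`, and a budget controller can call `set_agent_model` on a live router without touching the provider list.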

Background

A diagnostic agent identified three real bugs that this PR fixes:

  1. The server's LlmRouter was global-only — it ignored per-agent llm overrides when the harness called back via /v1/.
  2. AgentLlmConfig had provider/api_key/base_url but no model — the model lived at AgentConfig.model at the top level, splitting the LLM block across two places.
  3. add_agent copied the global provider/key/url into each new agent's config, so global edits stopped propagating to existing agents.

The user's intent for the fix: each agent in xpressclaw.yaml declares its own LLM, the router routes per-agent, and runtime budget controls can re-point an agent at a cheaper model without restarting the container.

What changed

  • AgentLlmConfig gains model and model_path fields. The legacy top-level agent.model is migrated into llm.model on first load and is no longer serialized.
  • Global LlmConfig keeps only custom_pricing. The dropped fields (default_provider, openai_api_key, etc.) are silently ignored when loading old YAMLs.
  • env_overrides (ANTHROPIC_API_KEY, OPENAI_API_KEY, OPENAI_BASE_URL) applies per-agent based on each agent's declared provider.
  • LlmRouter::build_from_config(&Config) walks agents, deduplicates provider instances by provider_type|api_key|base_url, and binds each agent's name as a logical model name. Unknown names error — no random "first available provider" fallback.
  • LlmRouter::set_agent_model() lets budget-driven degradation re-point a binding at runtime without rebuilding the router.
  • /v1/messages resolves the agent (from the model field, or from sk-ant-<agent_id> placeholder header) before deciding to direct-proxy to anthropic.com vs. convert-and-route. Per-agent Anthropic keys are now used correctly.
  • Container env never carries real cloud keys; it carries only LLM_MODEL=<agent_id> plus placeholder API keys that encode the agent_id. The proxy is the only component holding real upstream credentials.
  • add_agent no longer copies global config — new agents get a sensible local-Ollama default that the user edits afterwards.
  • update_agent_config and add_agent rebuild the router on edit so changes take effect without a restart. The previous behavior left the router stale until reboot.
  • Frontend LiveConfig.llm becomes a per-agent providers summary; settings/llm/+page.svelte renders per-agent entries instead of removed global fields.
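
As an illustration of the first bullet, the legacy-field migration could look something like the sketch below. The struct shapes are simplified and hypothetical (the real `AgentConfig` and `AgentLlmConfig` carry more fields), and clearing the legacy field after migration is an assumption about how "no longer serialized" is achieved.

```rust
/// Simplified, hypothetical shape of the per-agent LLM block.
#[derive(Debug, Default, Clone, PartialEq)]
struct AgentLlmConfig {
    provider: Option<String>,
    model: Option<String>,
}

/// Simplified, hypothetical shape of an agent entry.
#[derive(Debug, Default, Clone, PartialEq)]
struct AgentConfig {
    name: String,
    /// Legacy top-level field; only ever populated when reading old files.
    model: Option<String>,
    llm: AgentLlmConfig,
}

/// Move a legacy top-level `model` into `llm.model` unless `llm.model` is
/// already set explicitly. The legacy field is cleared either way (an
/// assumption) so it would not be written back out.
fn migrate_legacy_model(agent: &mut AgentConfig) {
    if let Some(legacy) = agent.model.take() {
        if agent.llm.model.is_none() {
            agent.llm.model = Some(legacy);
        }
    }
}
```

The two branches correspond to the behaviors that `migrate_legacy_top_level_model` and `does_not_overwrite_explicit_llm_model` are described as locking in.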

Tests

Six new tests, all passing alongside the existing 396:

  • migrate_legacy_top_level_model and does_not_overwrite_explicit_llm_model lock the YAML migration.
  • effective_model checks the fallback ordering.
  • router_resolves_logical_agent_name and router_resolves_real_model_name cover both lookup paths.
  • router_two_agents_share_one_provider_instance proves the dedup is real (2 agents on the same OpenAI key share one OpenAiProvider; a third with a different key gets its own instance).
  • set_agent_model_repoints_to_cheaper_model covers the runtime swap path.
  • build_container_spec_no_real_keys_in_container locks in that real API keys never leak into container env, even when the agent has them configured.
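
The property checked by the last test can be illustrated with a small sketch. Only `LLM_MODEL` and the two placeholder prefixes come from this PR; the function names, the choice of `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` as the in-container variable names, and which prefix goes with which variable are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Build the env for an agent container (hypothetical helper).
/// The real key is accepted only to show that it is never copied in.
fn container_env(agent_id: &str, _real_api_key: &str) -> BTreeMap<String, String> {
    let mut env = BTreeMap::new();
    // The harness asks the proxy for the logical model named after the agent.
    env.insert("LLM_MODEL".to_string(), agent_id.to_string());
    // Placeholder keys encode the agent id and carry no upstream credential.
    // The variable names below are assumed, not taken from the PR.
    env.insert("ANTHROPIC_API_KEY".to_string(), format!("sk-ant-{agent_id}"));
    env.insert("OPENAI_API_KEY".to_string(), format!("sk-xpressclaw-{agent_id}"));
    env
}

/// True if any env value contains the given secret.
fn leaks_secret(env: &BTreeMap<String, String>, secret: &str) -> bool {
    !secret.is_empty() && env.values().any(|v| v.contains(secret))
}
```

A test in this style would build the env for an agent that has a real key configured and assert that `leaks_secret` returns false.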

Test plan

  • cargo fmt --all --check — clean
  • cargo clippy --workspace --all-targets — clean
  • cargo test --workspace — 402 passed
  • npx svelte-check — 0 errors
  • Manual: add an agent via the wizard with provider=openai, verify the new agent has its own llm block with model+key, not a global one
  • Manual: edit one agent to provider=anthropic, leave another on provider=openai, verify each agent talks to its own upstream when chatting
  • Manual: set OPENAI_API_KEY env var, verify it's picked up only by openai-provider agents
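
The per-agent env override behavior exercised by the last manual step might be sketched as below. The three variable names and the `openai` / `anthropic` provider strings come from this PR; the `AgentLlm` struct, the function name, and the rule that an env value replaces a value from the file are assumptions.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical per-agent LLM block.
struct AgentLlm {
    provider: String,
    api_key: Option<String>,
    base_url: Option<String>,
}

/// Apply env overrides to a single agent, based on its declared provider.
/// `env` is passed in as a map so the logic can be exercised without
/// touching the process environment.
fn apply_env_overrides(llm: &mut AgentLlm, env: &HashMap<String, String>) {
    match llm.provider.as_str() {
        "anthropic" => {
            if let Some(key) = env.get("ANTHROPIC_API_KEY") {
                llm.api_key = Some(key.clone());
            }
        }
        "openai" => {
            if let Some(key) = env.get("OPENAI_API_KEY") {
                llm.api_key = Some(key.clone());
            }
            if let Some(url) = env.get("OPENAI_BASE_URL") {
                llm.base_url = Some(url.clone());
            }
        }
        // Other providers (for example ollama) use none of the three variables.
        _ => {}
    }
}
```

With only `OPENAI_API_KEY` set, an `openai` agent picks up the key while an `anthropic` agent keeps whatever its own block declared.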

Each agent now declares its own (provider, model, api_key, base_url) under
`agent.llm`. The router builds one provider instance per unique
(provider_type, api_key, base_url) tuple and binds each agent's name as a
logical model name that resolves to (provider_instance, real_model). The
harness in each container always calls back to the server's /v1/ proxy
with `model=<agent_id>`; the server resolves the logical name and
dispatches. Real upstream API keys never leave the server — agent
identity is encoded in placeholder keys (sk-ant-<agent_id>,
sk-xpressclaw-<agent_id>) that the proxy parses.
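
A rough sketch of how the proxy might recover the agent from a request, assuming the two prefixes above and the "model field first, placeholder key second" order. The function names and the `known` agent list are hypothetical.

```rust
/// Placeholder prefixes named in this commit message.
const PLACEHOLDER_PREFIXES: [&str; 2] = ["sk-ant-", "sk-xpressclaw-"];

/// Recover the agent id from a placeholder key, or None for anything else.
fn agent_id_from_placeholder(key: &str) -> Option<&str> {
    PLACEHOLDER_PREFIXES
        .iter()
        .find_map(|prefix| key.strip_prefix(*prefix))
        .filter(|id| !id.is_empty())
}

/// Prefer the request's `model` field when it names a known agent; otherwise
/// fall back to the placeholder key. Checking against `known` keeps a string
/// that merely starts with `sk-ant-` from being mistaken for an agent.
fn resolve_agent<'a>(model: &'a str, api_key: &'a str, known: &[&str]) -> Option<&'a str> {
    if known.contains(&model) {
        return Some(model);
    }
    agent_id_from_placeholder(api_key).filter(|id| known.contains(id))
}
```

Under these assumptions, a request that resolves to no known agent yields `None`, which a caller could turn into an error rather than a fallback provider.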

What changed:
- AgentLlmConfig gains `model` and `model_path` fields; the legacy
  top-level `agent.model` is migrated into `llm.model` on load and is
  no longer serialized.
- Global LlmConfig keeps only `custom_pricing`; provider/key/url/model
  fields removed. Old YAMLs still load; the dropped keys are ignored.
- env_overrides (ANTHROPIC_API_KEY, OPENAI_API_KEY, OPENAI_BASE_URL)
  applies per-agent based on each agent's declared provider.
- LlmRouter::build_from_config(&Config) walks agents, deduplicates
  provider instances, and binds each agent's name. Unknown names error;
  no random "first available provider" fallback.
- LlmRouter::set_agent_model() lets budget-driven degradation re-point
  an agent at a cheaper real model without rebuilding the router.
- /v1/messages resolves the agent (from model field or sk-ant-<id>)
  before deciding to direct-proxy to anthropic.com vs. convert-and-route.
  Per-agent Anthropic keys are now used correctly.
- Container env never carries real cloud keys; LLM_MODEL=<agent_id>.
- add_agent no longer copies global config into a new agent — it gets a
  sensible local-Ollama default that the user can edit.
- update_agent_config and add_agent rebuild the router so changes take
  effect immediately.
- Frontend: LiveConfig.llm becomes a per-agent providers summary; the
  settings/llm page renders per-agent entries instead of global state.

Tests: 6 new tests cover migration, logical-name resolution, real-model
resolution, provider-instance deduplication, runtime model swap, and
that real keys never leak to container env. cargo fmt, clippy clean,
402 tests pass, svelte-check 0 errors.
@wmeddie wmeddie merged commit 95d9d7b into main May 7, 2026
4 checks passed
@wmeddie wmeddie deleted the fix/per-agent-llm-config-v2 branch May 7, 2026 01:23
wmeddie added a commit that referenced this pull request May 7, 2026
The embedded llama.cpp path (provider=local, GGUF download via hf-hub,
LazyLlamaCppProvider, metal/cuda/local-llm Cargo features) is removed.
Ollama becomes the only supported local backend. ADR-023 captures the
decision; ADR-011 is marked superseded.

Why: in-process llama.cpp invariant violations were taking down the
server, and shipping per-platform builds (Metal, CUDA, CPU) added a lot
of build-time complexity for releases. Ollama runs out-of-process with
its own platform-tuned builds, and we already speak its HTTP API.

Removed:
- crates/xpressclaw-core/src/llm/llamacpp.rs
- llama-cpp-2, hf-hub, encoding_rs from workspace + core Cargo.toml
- local-llm, metal, cuda features (core, server, cli)
- AgentLlmConfig.model_path field
- The "local" arm in LlmRouter::materialize_provider
- DownloadProgress / DownloadStatus / download_status route
- use_embedded request flag
- resolve_gguf_source and the post-setup GGUF download flow
- nvcc/Metal detection in build.sh
- "--skip llamacpp" filter in CI
- Wizard's Built-in provider button + download progress UI

Kept:
- provider=ollama with HTTP proxy (LocalProvider)
- Reconciler's per-host Ollama pull loop (reconcile_models)
- Hardware detection + recommend_model for Ollama tag picking

Future (not in this PR): publish XpressAI custom GGUFs (e.g.
Qwen3.6-27B-RYS-UD) to Ollama Hub so agents can pull them via the same
provider=ollama path.

Stacked on #102. cargo fmt, clippy clean, 401 tests pass, svelte-check
0 errors.