llama-suite is a config-driven local LLM operations toolkit built around llama.cpp and llama-swap.
It keeps machine-specific model configuration, runtime control, evaluation, benchmarking, and deployment packaging in one repo instead of scattering them across shell scripts, local notes, and one-off containers.
Running local models across multiple machines gets messy fast:
- one base config drifts into several hand-edited variants
- runtime launch scripts and eval scripts stop agreeing on ports or model names
- UI controls, deployment packaging, and benchmark outputs live in different places
llama-suite exists to make that workflow reproducible. The repo treats local LLM ops as a system, not just a server binary.
- Define the shared baseline in `configs/config.base.yaml`.
- Layer machine-specific overrides from `configs/overrides/`.
- Generate an effective runtime config for `llama-swap` and `llama.cpp`-compatible backends.
- Use the FastAPI Web UI to inspect config, launch endpoints, and run sweeps, benchmarks, memory scans, and evals.
- Package the same control plane for local containers, OrbStack, Compose, Helm, and marketplace-style deployment.
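The layering steps above amount to a recursive dictionary merge where override values win. The `deep_merge` helper and the sample config values below are an illustrative sketch, not llama-suite's actual merge and validation code.

```python
# Minimal sketch of base + override config layering. The deep_merge
# helper and the field names are hypothetical stand-ins.

def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base`; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Stand-ins for configs/config.base.yaml and one configs/overrides/*.yaml file.
base = {
    "server": {"port": 8088, "host": "127.0.0.1"},
    "models": {"default": {"ctx_size": 4096, "gpu_layers": 0}},
}
override = {
    "models": {"default": {"gpu_layers": 99}},  # machine-specific tuning
}

effective = deep_merge(base, override)
print(effective["models"]["default"])  # ctx_size kept, gpu_layers overridden
```

The generated effective config is what `llama-swap` and the backends actually consume, so the base file never needs hand-editing per machine.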
- It is ops-first, not SDK-first. The main artifact is a working local control plane.
- Config overrides are first-class, so one repo can drive macOS, Windows, and larger workstation setups.
- The Web UI is connected to real local tasks: endpoint lifecycle, config editing, download flows, sweeps, and results.
- Deployment packaging lives next to the runtime and eval tooling, so local experimentation and hosted control surfaces stay aligned.
### Dashboard
The dashboard shows active endpoint tasks, Open WebUI controls, system state, and the currently selected machine override.
### Config Studio

Config Studio exposes the merged config model rather than only raw YAML, which makes override editing and validation much faster.
### Model Inventory
The models view surfaces readiness, context size, GPU/thread settings, and missing artifacts from the same source of truth used by the runtime.
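A readiness check of that kind reduces to a config-plus-filesystem lookup. The `model_status` helper and the entry fields below are hypothetical stand-ins for llama-suite's real model schema:

```python
from pathlib import Path

# Hypothetical sketch: a model entry from the effective config is "ready"
# only when its GGUF artifact actually exists on disk. Field names here
# are assumptions, not llama-suite's real schema.
def model_status(entry: dict, models_dir: Path) -> dict:
    artifact = models_dir / entry["file"]
    return {
        "name": entry["name"],
        "ctx_size": entry.get("ctx_size", 4096),
        "ready": artifact.exists(),
        "missing": [] if artifact.exists() else [str(artifact)],
    }

entry = {"name": "example-7b", "file": "example-7b-q4_k_m.gguf", "ctx_size": 8192}
status = model_status(entry, Path("/nonexistent-models-dir"))
```

Because the same entry drives both the runtime launch and this view, a model reported as missing here is exactly one the runtime would fail to load.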
```mermaid
flowchart LR
    Base["configs/config.base.yaml"]
    Overrides["configs/overrides/*.yaml"]
    Merge["config merge + validation"]
    Effective["generated effective config"]
    Watcher["llama_swap_watch.py"]
    Swap["llama-swap"]
    Runtime["llama.cpp / ik_llama.cpp"]
    WebUI["FastAPI Web UI"]
    Bench["bench tools"]
    Eval["eval tools"]
    Memory["memory scan"]
    Results["runs/ + result artifacts"]
    Deploy["deploy/{compose,orbstack,helm,marketplace}"]
    Base --> Merge
    Overrides --> Merge
    Merge --> Effective
    Effective --> Watcher
    Watcher --> Swap
    Swap --> Runtime
    WebUI --> Effective
    WebUI --> Watcher
    WebUI --> Bench
    WebUI --> Eval
    WebUI --> Memory
    Bench --> Results
    Eval --> Results
    Memory --> Results
    Deploy --> WebUI
```
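The watcher stage in the diagram can be approximated with an mtime poll over the config files; `llama_swap_watch.py` may use a different mechanism (e.g. OS file events), so treat this as an illustration only:

```python
import time
from pathlib import Path

# Illustrative polling loop for the watcher stage: if any watched config
# file changed since the last pass, the effective config would be
# regenerated and llama-swap restarted (both elided here).
def changed_since(paths: list[Path], last_seen: float) -> bool:
    """Return True if any existing path was modified after `last_seen`."""
    return any(p.stat().st_mtime > last_seen for p in paths if p.exists())

def watch_once(paths: list[Path], last_seen: float, on_change) -> float:
    """Run one poll pass; call `on_change` and advance the mark on changes."""
    if changed_since(paths, last_seen):
        on_change()          # e.g. regenerate effective config, restart swap
        return time.time()   # new high-water mark
    return last_seen
```

In practice the pass would watch `configs/config.base.yaml` plus the active override, so edits in Config Studio propagate to the runtime without a manual restart.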
- Launch and restart `llama-swap` from the merged base plus override config.
- Expose a local Web UI for config inspection, endpoint control, model management, and results.
- Run benchmark, evaluation, sweep, and memory-scan tasks against the same configured model inventory.
- Support Open WebUI lifecycle management alongside the endpoint layer.
- Package the Web UI for Docker/Compose, OrbStack, Helm, and marketplace-style deployment.
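The launch-and-restart behavior in the first capability can be sketched as a small process wrapper. The `--config` flag and the binary invocation below are assumptions about the llama-swap CLI, not verified usage:

```python
import subprocess

# Hedged sketch of endpoint lifecycle: terminate any running llama-swap
# process and relaunch it against a newly generated effective config.
# The `--config` flag is an assumption about llama-swap's CLI.
def restart_swap(proc, config_path: str, binary: str = "llama-swap"):
    """Stop `proc` if it is still running, then start a fresh instance."""
    if proc is not None and proc.poll() is None:
        proc.terminate()
        proc.wait(timeout=10)
    return subprocess.Popen([binary, "--config", config_path])
```

A wrapper like this is what lets the Web UI and the watcher share one restart path instead of each shelling out differently.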
Use Python 3.10+.
macOS/Linux:

```bash
python tools/scripts/install.py --dev-extras
./.venv/bin/python -m llama_suite.webui.server
```

Windows PowerShell:

```powershell
python tools\scripts\install.py --dev-extras
.\.venv\Scripts\python.exe -m llama_suite.webui.server
```

The Web UI serves on http://localhost:8088.
Install or refresh the repo environment:
```bash
python tools/scripts/install.py --dev-extras
./.venv/bin/python tools/scripts/update.py --dev-extras
```

Run the watcher with a machine override:

```bash
./.venv/bin/python -m llama_suite.watchers.llama_swap_watch -o configs/overrides/mac-m3-max-36G.yaml
```

Run tests:

```bash
./.venv/bin/python -m pytest -q
```

- `deploy/orbstack/README.md`: local macOS container deployment.
- `deploy/charts/llama-suite-webui/README.md`: Helm chart for the Web UI.
- `deploy/marketplace/llama-suite-webui/README.md`: marketplace packaging.
- Changelog: `CHANGELOG.md`
- Release playbook: `docs/releasing.md`
- `models/`, `runs/`, `var/`, and generated configs are intentionally local and ignored.
- The repo vendors only the pieces needed to support local runtime workflows.
- The Web UI package includes its static assets and schema when built from this repo.