llama-suite

CI · Python 3.10+ · License: MIT

llama-suite is a config-driven local LLM operations toolkit built around llama.cpp and llama-swap. It keeps machine-specific model configuration, runtime control, evaluation, benchmarking, and deployment packaging in one repo instead of scattering them across shell scripts, local notes, and one-off containers.

Why

Running local models across multiple machines gets messy fast:

  • one base config drifts into several hand-edited variants
  • runtime launch scripts and eval scripts stop agreeing on ports or model names
  • UI controls, deployment packaging, and benchmark outputs live in different places

llama-suite exists to make that workflow reproducible. The repo treats local LLM ops as a system, not just a server binary.

How

  1. Define the shared baseline in configs/config.base.yaml.
  2. Layer machine-specific overrides from configs/overrides/ (sketched after this list).
  3. Generate an effective runtime config for llama-swap and llama.cpp-compatible backends.
  4. Use the FastAPI Web UI to inspect config, launch endpoints, and run sweeps, benchmarks, memory scans, and evals.
  5. Package the same control plane for local containers, OrbStack, Compose, Helm, and marketplace-style deployment.
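
A minimal sketch of the layering model (the model entry and field names here are illustrative, not the repo's actual schema):

# configs/config.base.yaml (illustrative)
models:
  qwen2.5-7b:
    ctx_size: 8192
    gpu_layers: 0
    threads: 8

# configs/overrides/mac-m3-max-36G.yaml (illustrative)
models:
  qwen2.5-7b:
    ctx_size: 32768
    gpu_layers: 99

The override deep-merges over the base, so the effective config keeps threads: 8 while raising the context size and GPU offload for that machine.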

What's Different

  • It is ops-first, not SDK-first. The main artifact is a working local control plane.
  • Config overrides are first-class, so one repo can drive macOS, Windows, and larger workstation setups.
  • The Web UI is connected to real local tasks: endpoint lifecycle, config editing, download flows, sweeps, and results.
  • Deployment packaging lives next to the runtime and eval tooling, so local experimentation and hosted control surfaces stay aligned.

Screenshots

Dashboard

[screenshot: llama-suite dashboard]

The dashboard shows active endpoint tasks, Open WebUI controls, system state, and the currently selected machine override.

Config Studio

[screenshot: llama-suite config studio]

Config Studio exposes the merged config model rather than just raw YAML, which makes override editing and validation much faster.

Model Inventory

[screenshot: llama-suite models view]

The models view surfaces readiness, context size, GPU/thread settings, and missing artifacts from the same source of truth used by the runtime.
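
Under the hood, llama-swap maps model names to launch commands. An illustrative entry from a generated effective config (the model name, path, and flag values are examples, and the exact shape this repo emits may differ):

models:
  qwen2.5-7b:
    cmd: llama-server --port ${PORT} -m models/qwen2.5-7b-instruct-q4_k_m.gguf -c 32768 -ngl 99 -t 8

Here -c, -ngl, and -t carry the context size, GPU offload, and thread settings that the models view surfaces.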

Architecture

flowchart LR
    Base["configs/config.base.yaml"]
    Overrides["configs/overrides/*.yaml"]
    Merge["config merge + validation"]
    Effective["generated effective config"]
    Watcher["llama_swap_watch.py"]
    Swap["llama-swap"]
    Runtime["llama.cpp / ik_llama.cpp"]
    WebUI["FastAPI Web UI"]
    Bench["bench tools"]
    Eval["eval tools"]
    Memory["memory scan"]
    Results["runs/ + result artifacts"]
    Deploy["deploy/{compose,orbstack,helm,marketplace}"]

    Base --> Merge
    Overrides --> Merge
    Merge --> Effective
    Effective --> Watcher
    Watcher --> Swap
    Swap --> Runtime

    WebUI --> Effective
    WebUI --> Watcher
    WebUI --> Bench
    WebUI --> Eval
    WebUI --> Memory

    Bench --> Results
    Eval --> Results
    Memory --> Results

    Deploy --> WebUI
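
The merge node is the heart of the pipeline. A minimal sketch of the deep-merge semantics, assuming overrides win on scalar conflicts and recurse into nested maps (the repo's actual merge and validation logic may differ):

from copy import deepcopy

def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base`; override wins on conflicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

base = {"models": {"qwen2.5-7b": {"ctx_size": 8192, "threads": 8}}}
override = {"models": {"qwen2.5-7b": {"ctx_size": 32768}}}
assert deep_merge(base, override)["models"]["qwen2.5-7b"] == {
    "ctx_size": 32768,
    "threads": 8,
}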

Core Capabilities

  • Launch and restart llama-swap from merged base plus override config.
  • Expose a local Web UI for config inspection, endpoint control, model management, and results.
  • Run benchmark, evaluation, sweep, and memory-scan tasks against the same configured model inventory.
  • Support Open WebUI lifecycle management alongside the endpoint layer.
  • Package the Web UI for Docker/Compose, OrbStack, Helm, and marketplace-style deployment.

Quick Start

Use Python 3.10+.

macOS/Linux:

python tools/scripts/install.py --dev-extras
./.venv/bin/python -m llama_suite.webui.server

Windows PowerShell:

python tools\scripts\install.py --dev-extras
.\.venv\Scripts\python.exe -m llama_suite.webui.server

The Web UI serves on http://localhost:8088.
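
A quick way to confirm the server is listening (this assumes the root path responds; adjust if the UI mounts elsewhere):

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088/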

Common Commands

Install or refresh the repo environment:

python tools/scripts/install.py --dev-extras
./.venv/bin/python tools/scripts/update.py --dev-extras

Run the watcher with a machine override:

./.venv/bin/python -m llama_suite.watchers.llama_swap_watch -o configs/overrides/mac-m3-max-36G.yaml

Run tests:

./.venv/bin/python -m pytest -q

Deployment
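
A minimal Compose sketch for the Web UI control plane, in the spirit of deploy/compose (the service name, build context, and mount paths are assumptions, not the repo's published packaging):

services:
  webui:
    build: .                    # assumes a Dockerfile at the repo root
    ports:
      - "8088:8088"             # Web UI port from Quick Start
    volumes:
      - ./configs:/app/configs  # base + override configs
      - ./models:/app/models    # local model artifacts (git-ignored)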

Release Hygiene

Notes

  • models/, runs/, var/, and generated configs are intentionally kept local and git-ignored.
  • The repo vendors only the pieces needed to support local runtime workflows.
  • The Web UI package includes its static assets and schema when built from this repo.
