A model registry and lifecycle manager for local GGUF models served via llama-swap.
Herd gives you a single source of truth for your local model collection — organized by category, with cascading defaults, automatic HuggingFace metadata fetching, and one-command config generation.
- Registry — Organize models in per-category YAML files with cascading defaults (global → category → model)
- Add — Interactively add models from HuggingFace with auto-detected metadata (type, context length, special flags)
- Download — Generate and run `huggingface-cli download` commands for your entire registry
- Build — Generate a llama-swap `config.yaml` from your registry with correct paths, flags, and sampling parameters
- Status — Dashboard showing which models are downloaded, missing, or disabled
- Validate — Check for missing files, orphaned GGUFs, and registry errors
```bash
# Option 1: pip install (creates the `herd` command)
pip install -e .

# Option 2: use directly without installing
pip install pyyaml requests
alias herd='python3 /path/to/herd/manage.py'
```

```bash
# Set your models directory
# Edit models/_defaults.yaml and set base_path to where your GGUFs live

# Add a model interactively
herd add unsloth/Qwen3-4B-Instruct-2507-GGUF

# Download it
herd download qwen3-4b

# Build llama-swap config and start serving
herd build --output config.yaml
llama-swap --config config.yaml
```

```
models/
  _defaults.yaml   # Global config: base_path, server settings, category defaults
  instruct.yaml    # Instruct/chat models
  coding.yaml      # Code generation models
  reasoning.yaml   # Chain-of-thought reasoning models
  embedding.yaml   # Embedding models (auto-adds --embedding flag)
  reranker.yaml    # Reranker models (auto-adds --reranking flag)
```
Files prefixed with `_` are configuration. Everything else is a category — the filename determines the category name. Add as many as you want: `creative.yaml`, `medical.yaml`, `multilingual.yaml`, etc.
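The filename-as-category convention can be sketched roughly like this (an illustrative sketch, not Herd's actual implementation; it only assumes the `models/` layout shown above):

```python
from pathlib import Path

def discover_categories(models_dir: str) -> list[str]:
    """Treat every top-level .yaml file as a category,
    skipping files prefixed with '_' (those are config)."""
    return sorted(
        p.stem
        for p in Path(models_dir).glob("*.yaml")
        if not p.name.startswith("_")
    )
```

Dropping a new `creative.yaml` into the directory would then surface a `creative` category with no other changes.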
```yaml
# models/instruct.yaml
qwen3-4b:
  path: instruct/qwen3-4b-q8/Qwen3-4B-Instruct-2507-UD-Q8_K_XL.gguf
  repo: unsloth/Qwen3-4B-Instruct-2507-GGUF
  file: Qwen3-4B-Instruct-2507-UD-Q8_K_XL.gguf
  ctx: 262144
  system_prompt: "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."
  sampling:
    top_p: 0.8
    top_k: 20
  tags:
    - q8
    - small
```

Config resolution: `_defaults.yaml` globals → category defaults → per-model overrides.
```yaml
# models/_defaults.yaml
defaults:
  gpu_layers: 99

categories:
  instruct:
    ctx: 32768
  embedding:
    ctx: 32768
    flags: [--embedding]
```

A model in `instruct.yaml` inherits `gpu_layers: 99` and `ctx: 32768` automatically. Override any field at the model level.
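The cascade can be pictured as a shallow merge of three dictionaries, with special handling for flags (an illustrative sketch, not Herd's actual code; field names follow the registry format above):

```python
def resolve_model(globals_: dict, category: dict, model: dict) -> dict:
    """Resolve a model's effective config: global defaults first,
    then category defaults, then per-model fields win.
    `flags` lists are concatenated; `flags_override` replaces them."""
    resolved = {**globals_, **category, **model}
    if "flags_override" in model:
        resolved["flags"] = model["flags_override"]
        resolved.pop("flags_override", None)
    else:
        resolved["flags"] = category.get("flags", []) + model.get("flags", [])
    return resolved
```

Under this sketch, a model in `embedding.yaml` with `flags: [--extra]` would resolve to `[--embedding, --extra]`, while one with `flags_override: [--only-this]` would resolve to exactly `[--only-this]`.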
Flags merge by default (category flags + model flags). Use `flags_override` to replace them entirely:
```yaml
special-model:
  flags: [--extra]                # merged with category defaults

override-model:
  flags_override: [--only-this]   # replaces category defaults entirely
```

```
herd add [repo]          # Add a model (interactive if no repo given)
herd build               # Generate llama-swap config.yaml
herd status              # Show model status dashboard
herd list                # List models with filters
herd info <model>        # Show resolved config and command for a model
herd download            # Download models from HuggingFace
herd download --dry-run  # Preview download commands
herd enable <model>      # Enable a disabled model
herd disable <model>     # Disable a model
herd validate            # Check registry health
herd scan                # Find orphaned GGUFs and propose registry entries
herd cleanup             # Remove orphaned files from disk
herd monitor             # Live TUI dashboard (requires textual, httpx)
```

```
herd build --only embedding
herd build --exclude embedding,reranker
herd list --only instruct --tags small
herd status --all        # include disabled models
```

| Field | Required | Description |
|---|---|---|
| `path` | yes | Relative path from `base_path` to the GGUF file |
| `repo` | yes | HuggingFace repository (for downloads) |
| `file` | yes | Filename or glob pattern for the GGUF |
| `ctx` | no | Context length (inherits from category/global default) |
| `system_prompt` | no | Recommended system prompt from model documentation |
| `summary` | no | Model description and notes |
| `sampling` | no | Sampling parameters (temperature, top_p, top_k, etc.) |
| `tags` | no | Tags for filtering (e.g., small, q8, general) |
| `flags` | no | Additional llama-server flags (merged with category defaults) |
| `flags_override` | no | Replace category default flags entirely |
| `mmproj` | no | Path to multimodal projection file (vision models) |
| `enabled` | no | Set to `false` to disable without removing |
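A minimal check for the required fields in this table might look like the following (an illustrative sketch, not Herd's `validate` implementation):

```python
REQUIRED_FIELDS = ("path", "repo", "file")

def validate_entry(name: str, entry: dict) -> list[str]:
    """Return human-readable errors for one registry entry:
    missing required fields, or a non-boolean `enabled` value."""
    errors = [
        f"{name}: missing required field '{field}'"
        for field in REQUIRED_FIELDS
        if field not in entry
    ]
    if entry.get("enabled") not in (None, True, False):
        errors.append(f"{name}: 'enabled' must be a boolean")
    return errors
```

An empty list means the entry is structurally valid; checks for the file actually existing on disk would layer on top of this.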
Relative paths from `base_path` follow:

```
{category}/{model-dir}/{filename.gguf}
```

Example: `instruct/qwen3-4b-q8/Qwen3-4B-Instruct-2507-UD-Q8_K_XL.gguf`
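In code, the convention is a simple three-segment join (a sketch of the convention only; the helper name is hypothetical):

```python
from pathlib import PurePosixPath

def model_path(category: str, model_dir: str, filename: str) -> str:
    """Build the base_path-relative path {category}/{model-dir}/{filename}."""
    return str(PurePosixPath(category) / model_dir / filename)
```

For the example above, `model_path("instruct", "qwen3-4b-q8", "Qwen3-4B-Instruct-2507-UD-Q8_K_XL.gguf")` reproduces the path shown.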
When adding models, Herd auto-detects:
- Model type — OCR, embedding, reranker, reasoning, coding, or instruct (from filename/repo patterns)
- Special flags — `--embedding` for embedding models, `--reranking` for rerankers, `--jinja` for Phi-4, `--chat-template chatml` for OLMo
- Context length — From the HuggingFace `config.json`
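The filename/repo pattern matching could be sketched like this (the patterns below are hypothetical stand-ins for illustration; Herd's actual heuristics may differ):

```python
import re

# Ordered (pattern, type) pairs: first match wins.
# Illustrative patterns only, not Herd's real detection table.
TYPE_PATTERNS = [
    (r"ocr", "ocr"),
    (r"embed", "embedding"),
    (r"rerank", "reranker"),
    (r"reason|think", "reasoning"),
    (r"coder|code", "coding"),
]

def detect_type(repo: str) -> str:
    """Guess a model's category from its repo name; default to instruct."""
    lowered = repo.lower()
    for pattern, model_type in TYPE_PATTERNS:
        if re.search(pattern, lowered):
            return model_type
    return "instruct"
```

The ordering matters: more specific signals (embedding, reranker) are checked before broader ones, and anything unmatched falls back to the instruct category.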
- Python 3.8+
- `pyyaml`, `requests`
- llama-swap (for serving)
- llama.cpp (`llama-server` backend)
- `huggingface-cli` (for downloads)
MIT