ndcorder/herd
Herd

A model registry and lifecycle manager for local GGUF models served via llama-swap.

Herd gives you a single source of truth for your local model collection — organized by category, with cascading defaults, automatic HuggingFace metadata fetching, and one-command config generation.

What it does

  • Registry — Organize models in per-category YAML files with cascading defaults (global → category → model)
  • Add — Interactively add models from HuggingFace with auto-detected metadata (type, context length, special flags)
  • Download — Generate and run huggingface-cli download commands for your entire registry
  • Build — Generate llama-swap config.yaml from your registry with correct paths, flags, and sampling parameters
  • Status — Dashboard showing which models are downloaded, missing, or disabled
  • Validate — Check for missing files, orphaned GGUFs, and registry errors

Install

# Option 1: pip install (creates the `herd` command)
pip install -e .

# Option 2: use directly without installing
pip install pyyaml requests
alias herd='python3 /path/to/herd/manage.py'

Quick start

# Set your models directory
# Edit models/_defaults.yaml and set base_path to where your GGUFs live

# Add a model interactively
herd add unsloth/Qwen3-4B-Instruct-2507-GGUF

# Download it
herd download qwen3-4b

# Build llama-swap config and start serving
herd build --output config.yaml
llama-swap --config config.yaml

Registry structure

models/
  _defaults.yaml     # Global config: base_path, server settings, category defaults
  instruct.yaml      # Instruct/chat models
  coding.yaml        # Code generation models
  reasoning.yaml     # Chain-of-thought reasoning models
  embedding.yaml     # Embedding models (auto-adds --embedding flag)
  reranker.yaml      # Reranker models (auto-adds --reranking flag)

Files prefixed with _ are config. Everything else is a category — the filename determines the category. Add as many as you want: creative.yaml, medical.yaml, multilingual.yaml, etc.
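For example, dropping in a new file creates a new category with no extra configuration (the model entry below is hypothetical, for illustration only):

```yaml
# models/creative.yaml -- the filename becomes the category name
storyteller-7b:
  path: creative/storyteller-7b-q6/storyteller-7b-Q6_K.gguf
  repo: example-org/storyteller-7b-GGUF
  file: storyteller-7b-Q6_K.gguf
  tags: [creative, q6]
```

Models in this file inherit any `creative` category defaults you define in _defaults.yaml, falling back to the global defaults otherwise.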

Model entries

# models/instruct.yaml
qwen3-4b:
  path: instruct/qwen3-4b-q8/Qwen3-4B-Instruct-2507-UD-Q8_K_XL.gguf
  repo: unsloth/Qwen3-4B-Instruct-2507-GGUF
  file: Qwen3-4B-Instruct-2507-UD-Q8_K_XL.gguf
  ctx: 262144
  system_prompt: "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."
  sampling:
    top_p: 0.8
    top_k: 20
  tags:
  - q8
  - small

Cascading defaults

Config resolution: _defaults.yaml globals → category defaults → per-model overrides.

# models/_defaults.yaml
defaults:
  gpu_layers: 99
  categories:
    instruct:
      ctx: 32768
    embedding:
      ctx: 32768
      flags: [--embedding]

A model in instruct.yaml inherits gpu_layers: 99 and ctx: 32768 automatically. Override any field at the model level.
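For scalar fields, the cascade behaves like three dictionary merges where later layers win. A minimal sketch (hypothetical helper, not Herd's actual code; flags merge differently, see below):

```python
def resolve(global_defaults: dict, category_defaults: dict, model: dict) -> dict:
    """Merge config layers: global -> category -> model (later layers win)."""
    merged = dict(global_defaults)
    merged.update(category_defaults)
    merged.update(model)
    return merged

globals_ = {"gpu_layers": 99}
instruct = {"ctx": 32768}            # category default
model = {"ctx": 262144}              # per-model override

print(resolve(globals_, instruct, model))
# {'gpu_layers': 99, 'ctx': 262144}
```

The model keeps the inherited `gpu_layers: 99` but its own `ctx` wins over the category default.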

Flags

Flags merge by default (category flags + model flags). Use flags_override to replace entirely:

special-model:
  flags: [--extra]           # merged with category defaults
override-model:
  flags_override: [--only-this]  # replaces category defaults entirely
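The two behaviors can be sketched as follows (hypothetical helper, not Herd's actual code):

```python
def resolve_flags(category_flags: list, model: dict) -> list:
    """flags_override replaces category defaults; flags appends to them."""
    if "flags_override" in model:
        return list(model["flags_override"])
    return list(category_flags) + list(model.get("flags", []))

category = ["--embedding"]
print(resolve_flags(category, {"flags": ["--extra"]}))
# ['--embedding', '--extra']
print(resolve_flags(category, {"flags_override": ["--only-this"]}))
# ['--only-this']
```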

Commands

herd add [repo]        # Add a model (interactive if no repo given)
herd build             # Generate llama-swap config.yaml
herd status            # Show model status dashboard
herd list              # List models with filters
herd info <model>      # Show resolved config and command for a model
herd download          # Download models from HuggingFace
herd download --dry-run # Preview download commands
herd enable <model>    # Enable a disabled model
herd disable <model>   # Disable a model
herd validate          # Check registry health
herd scan              # Find orphaned GGUFs and propose registry entries
herd cleanup           # Remove orphaned files from disk
herd monitor           # Live TUI dashboard (requires textual, httpx)

Filtering

herd build --only embedding
herd build --exclude embedding,reranker
herd list --only instruct --tags small
herd status --all  # include disabled models

Model fields

Field           Required  Description
path            yes       Relative path from base_path to the GGUF file
repo            yes       HuggingFace repository (for downloads)
file            yes       Filename or glob pattern for the GGUF
ctx             no        Context length (inherits from category/global default)
system_prompt   no        Recommended system prompt from model documentation
summary         no        Model description and notes
sampling        no        Sampling parameters (temperature, top_p, top_k, etc.)
tags            no        Tags for filtering (e.g., small, q8, general)
flags           no        Additional llama-server flags (merged with category defaults)
flags_override  no        Replace category default flags entirely
mmproj          no        Path to multimodal projection file (vision models)
enabled         no        Set to false to disable without removing

Model path format

Relative paths from base_path:

{category}/{model-dir}/{filename.gguf}

Example: instruct/qwen3-4b-q8/Qwen3-4B-Instruct-2507-UD-Q8_K_XL.gguf

Auto-detection

When adding models, Herd auto-detects:

  • Model type — OCR, embedding, reranker, reasoning, coding, or instruct (from filename/repo patterns)
  • Special flags — --embedding for embedding models, --reranking for rerankers, --jinja for Phi-4, --chat-template chatml for OLMo
  • Context length — From HuggingFace config.json
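Filename/repo pattern matching for type detection can be sketched like this. The pattern table below is an illustrative assumption; Herd's real detection rules may differ:

```python
import re

# Hypothetical pattern table, checked in order; first match wins.
TYPE_PATTERNS = [
    ("embedding", r"embed"),
    ("reranker",  r"rerank"),
    ("reasoning", r"(r1|reason|think)"),
    ("coding",    r"(coder|\bcode)"),
]

def detect_type(repo: str) -> str:
    """Guess a model's category from its HuggingFace repo name."""
    name = repo.lower()
    for model_type, pattern in TYPE_PATTERNS:
        if re.search(pattern, name):
            return model_type
    return "instruct"  # default category when nothing matches

print(detect_type("BAAI/bge-reranker-v2-m3-GGUF"))    # reranker
print(detect_type("Qwen/Qwen2.5-Coder-7B-GGUF"))      # coding
print(detect_type("unsloth/Qwen3-4B-Instruct-GGUF"))  # instruct
```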

Requirements

  • Python 3.8+
  • pyyaml, requests
  • llama-swap (for serving)
  • llama.cpp (llama-server backend)
  • huggingface-cli (for downloads)

License

MIT
