mdl — Model Downloader

A CLI toolkit for downloading, converting, quantizing, and uploading Hugging Face models. Supports full end-to-end pipelines from HF Hub to S3-compatible storage.

Note: The convert, quantize, and pipeline commands require llama.cpp. Run mdl bootstrap-llamacpp first to fetch and build it automatically (requires git, cmake, make).

Features

  • Batch downloads from Hugging Face Hub with resume and state tracking
  • GGUF conversion via llama.cpp (convert_hf_to_gguf.py)
  • Quantization to Q4_K_M, Q5_K_M, Q8_0, etc. via llama-quantize
  • Ollama Modelfile generation — auto-detects chat format from the model's actual config files (supports ChatML, LLaMA-3, Gemma, Phi, Mistral, DeepSeek, and more)
  • S3 upload to MinIO or any S3-compatible endpoint
  • Pipeline mode — download → convert → quantize → upload in one command
  • Bootstrap — fetch and build llama.cpp automatically
  • Per-model error handling, dry-run mode, disk-space checks, and YAML configuration

Requirements

  • Python 3.11+
  • uv (recommended) or pip
  • git, cmake, make (for bootstrap-llamacpp)

Installation

git clone https://github.com/fuzzylabs/mdl.git
cd mdl
uv sync            # or: pip install -e .

To enable faster Hugging Face downloads:

uv add hf_transfer   # or: pip install hf_transfer
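
Then enable it per shell session, or persist the setting in .env (see Environment Variables below):

export HF_HUB_ENABLE_HF_TRANSFER=1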

Configuration

Copy the example files and fill in your values:

cp .env.example .env
cp models.yaml.example models.yaml
cp pipeline.yaml.example pipeline.yaml

See .env.example for all available environment variables.
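
A minimal .env might look like this (placeholder values; the full variable list is documented under Environment Variables below):

HF_TOKEN=hf_xxxxxxxxxxxxxxxx
MINIO_ENDPOINT=minio.example.com:9000
MINIO_ACCESS_KEY=changeme
MINIO_SECRET_KEY=changeme
MINIO_BUCKET=models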

CLI Reference

All commands are accessed through the mdl entry point:

mdl [COMMAND] [OPTIONS]

mdl download

Batch-download models from Hugging Face Hub.

mdl download --config models.yaml                # download all models in config
mdl download -r google/gemma-3-1b-it             # download a single model
mdl download -r org/model-a -r org/model-b       # download multiple by repo ID
mdl download --config models.yaml --dry-run      # preview without downloading
mdl download --clear-state                        # reset download tracking

| Option | Description |
| --- | --- |
| -r, --repo-id | Repo ID to download (repeatable) |
| -c, --config PATH | YAML config file (models.yaml format) |
| -n, --dry-run | Preview what would be downloaded |
| --clear-state | Clear download state and exit |
| --min-disk-space INT | Minimum free disk space in GB (default: 10) |
| --delete-after | Delete model from HF cache after download |
| -v, --verbose | Enable debug logging |

models.yaml format

google:
  - gemma-3-1b-it
  - gemma-3-4b-it

meta-llama:
  - Llama-3.2-1B
  - Llama-3.2-1B-Instruct

Each entry becomes the repo ID org/model when downloaded.
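
For example, the config above expands to google/gemma-3-1b-it, google/gemma-3-4b-it, meta-llama/Llama-3.2-1B, and meta-llama/Llama-3.2-1B-Instruct. A dry run prints the expanded list without downloading anything:

mdl download --config models.yaml --dry-run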


mdl convert

Convert a downloaded Hugging Face model to F16 GGUF format.

mdl convert -m /path/to/model-dir                           # output defaults to models/<org>/<model>/<model>.f16.gguf
mdl convert -m /path/to/model-dir -o custom/path/out.gguf   # explicit output path

| Option | Description |
| --- | --- |
| -m, --model-dir PATH | Path to the downloaded HF model directory (required) |
| -o, --output PATH | Output F16 GGUF path (default: models/<org>/<model>/<model>.f16.gguf) |
| --llama-cpp-dir PATH | Override LLAMA_CPP_DIR env var |
| -v, --verbose | Enable debug logging |

mdl quantize

Quantize an F16 GGUF file to a smaller representation.

mdl quantize -i model.f16.gguf                              # output defaults to model.Q4_K_M.gguf alongside input
mdl quantize -i model.f16.gguf -o out.gguf -t Q5_K_M        # explicit output and type
mdl quantize -i model.f16.gguf --model-dir /path/to/hf-model # also generates Modelfile + config files + README

| Option | Description |
| --- | --- |
| -i, --input PATH | Input F16 GGUF file (required) |
| -o, --output PATH | Output quantized GGUF path (default: <input_dir>/<model>.<type>.gguf) |
| -t, --type TEXT | Quantization type — e.g. Q4_K_M, Q5_K_M, Q8_0 (default: Q4_K_M) |
| --llama-cpp-dir PATH | Override LLAMA_CPP_DIR env var |
| --model-dir PATH | HF model directory — generates Ollama Modelfile, copies config files, and creates a MODELFILE_README.md next to the output |
| -v, --verbose | Enable debug logging |

mdl upload

Upload a file to MinIO / S3-compatible storage.

mdl upload -f model.Q4_K_M.gguf
mdl upload -f model.Q4_K_M.gguf -p models/gemma -b my-bucket

| Option | Description |
| --- | --- |
| -f, --file PATH | Local file to upload (required) |
| -k, --s3-key TEXT | Explicit S3 object key (defaults to filename) |
| -p, --s3-prefix TEXT | Prefix (directory) in the bucket |
| -b, --bucket TEXT | Override MINIO_BUCKET env var |
| -v, --verbose | Enable debug logging |

mdl pipeline

Run the full pipeline: download → convert → quantize → upload.

mdl pipeline                                     # uses pipeline.yaml by default
mdl pipeline -c pipeline.yaml --dry-run          # preview without executing
mdl pipeline --no-upload                         # skip the S3 upload step
mdl pipeline --force                             # reprocess completed models
mdl pipeline --keep-quantized                    # keep GGUF files locally after upload
mdl pipeline --clear-state                       # reset pipeline state and exit
Option Description
-c, --config PATH Pipeline config file (default: pipeline.yaml)
-n, --dry-run Preview actions without executing
--clear-state Clear pipeline state and exit
--force Reprocess already-completed models
--no-upload Skip S3 upload step
--keep-download Keep downloaded model files after processing
--keep-quantized Keep quantized GGUF files after upload

Note: --no-upload only skips the S3 upload — it does not keep files on disk. The pipeline works in a temporary directory that is deleted after each model. To retain the quantized GGUF and related files locally, pass --keep-quantized (e.g. mdl pipeline --no-upload --keep-quantized). | --min-disk-space INT | Minimum free disk space in GB (default: 10) | | -v, --verbose | Enable debug logging |

pipeline.yaml format

models:
  - repo_id: google/gemma-3-1b-it        # required
    # quantize: true                      # default: true
    # upload: true                        # default: true
    # quantization: Q4_K_M               # default: Q4_K_M
    # output_name: custom-name.gguf      # default: model.QTYPE.gguf
    # revision: main                     # pin a git revision / branch / tag

  - repo_id: meta-llama/Llama-3.2-1B
  - repo_id: microsoft/Phi-4-mini-instruct

Output paths

All output files are organised under models/<org>/<model_name>/:

  • Local (with --keep-quantized): models/google/gemma-3-1b-it/gemma-3-1b-it.Q4_K_M.gguf
  • S3: s3://<bucket>/models/google/gemma-3-1b-it/gemma-3-1b-it.Q4_K_M.gguf

Ollama Modelfile

The pipeline automatically generates an Ollama Modelfile alongside each quantized GGUF. It reads the model's actual config files — config.json, tokenizer_config.json, and generation_config.json — to derive everything dynamically:

  • FROM — path to the GGUF file
  • TEMPLATE — Ollama Go template, detected from the model's eos_token (not just model_type). This correctly handles fine-tunes that change the chat format (e.g. Dolphin-Mistral uses ChatML despite being model_type: mistral)
  • SYSTEM — default system prompt, extracted from the Jinja2 chat_template via regex
  • PARAMETER — num_ctx, temperature, top_p, top_k, repeat_penalty, and stop tokens — all from the model's own configs

The raw Jinja2 chat_template is also included as comments at the bottom of the Modelfile for cross-reference when editing.
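
As an illustration, a generated Modelfile for a ChatML-style model might look roughly like the sketch below. This is hypothetical: the actual template, system prompt, and parameter values are all derived from each model's own config files.

FROM ./model.Q4_K_M.gguf

TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""

SYSTEM You are a helpful assistant.

# Placeholder values; mdl reads these from config.json and generation_config.json
PARAMETER num_ctx 4096
PARAMETER temperature 0.7
PARAMETER stop <|im_end|>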

Supported model families (via EOS token and model_type detection):

| Format | Models |
| --- | --- |
| ChatML (<\|im_end\|>) | Qwen, Qwen2, Qwen3, Dolphin, Yi, InternLM2, DeepSeek-V2/V3/R1, Jamba |
| LLaMA-3 (<\|eot_id\|>) | LLaMA-3, LLaMA-3.1, LLaMA-3.2, LLaMA-3.3 |
| Gemma (<end_of_turn>) | Gemma, Gemma 2, Gemma 3 |
| Phi (<\|end\|>) | Phi-3, Phi-3.5, Phi-4 |
| Mistral (</s>) | Mistral-7B, Mixtral |
| Command-R | Cohere Command-R, Command-R+ |
| Completion | StarCoder2, Falcon |

Unknown models fall back to a generic template with a warning.

Output files (alongside the GGUF):

| File | Purpose |
| --- | --- |
| Modelfile | Ollama-ready model definition |
| MODELFILE_README.md | Guide explaining each Modelfile section |
| config.json | Model architecture reference (copied from HF) |
| tokenizer_config.json | Chat template & tokens reference (copied from HF) |
| generation_config.json | Generation params reference (copied from HF) |
| special_tokens_map.json | Special tokens reference (copied from HF) |

These files are:

  • Uploaded to S3 at models/<org>/<model>/
  • Saved locally when using --keep-quantized
  • Logged at DEBUG level when using --verbose

To use the generated Modelfile with Ollama:

cd models/google/gemma-3-1b-it/
ollama create gemma3-1b -f Modelfile
ollama run gemma3-1b

The pipeline also writes a URL registry to model_urls.json locally and mirrors it to s3://<bucket>/metadata/model_urls.json. Each entry includes a download_url and a curl download reference.
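
For example, to list every download URL recorded in the local registry (this assumes only that entries carry the download_url field mentioned above; the surrounding schema may differ):

jq -r '.. | .download_url? // empty' model_urls.json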


mdl bootstrap-llamacpp

Clone, build, and extract the required llama.cpp binaries. Requires git, cmake, and make.

mdl bootstrap-llamacpp

This fetches llama.cpp from GitHub, builds it with cmake + make, and copies the required binaries and headers to a llama.cpp-dist/ directory. If llama.cpp/ already exists, the clone step is skipped.

This is a prerequisite for mdl convert, mdl quantize, and mdl pipeline.
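
If your llama.cpp build lives somewhere other than the default llama.cpp directory, point mdl at it with the LLAMA_CPP_DIR variable or the per-command --llama-cpp-dir flag (the path below is a placeholder):

export LLAMA_CPP_DIR=/opt/llama.cpp
mdl convert -m /path/to/model-dir --llama-cpp-dir /opt/llama.cpp   # flag overrides the env var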


mdl --version

Print the installed version.

mdl --version

Environment Variables

All variables are set in .env (see .env.example).

Hugging Face

| Variable | Description | Default |
| --- | --- | --- |
| HF_TOKEN | Auth token for private/gated models | |
| HF_ENDPOINT | Custom HF endpoint (mirror/proxy) | https://huggingface.co |
| HF_HOME | HF cache directory | ~/.cache/huggingface/ |
| HF_HUB_DOWNLOAD_TIMEOUT | Download timeout in seconds | 120 |
| HF_HUB_ETAG_TIMEOUT | ETag timeout in seconds | 10 |
| HF_HUB_ENABLE_HF_TRANSFER | Enable fast transfers (requires hf_transfer) | 0 |

MinIO / S3

| Variable | Description | Default |
| --- | --- | --- |
| MINIO_ENDPOINT | S3 endpoint (host:port) | |
| MINIO_ACCESS_KEY | Access key | |
| MINIO_SECRET_KEY | Secret key | |
| MINIO_BUCKET | Target bucket | models |
| MINIO_SECURE | Use HTTPS | true |
| MINIO_PUBLIC_URL | Public base URL for downloads | |
| MINIO_PRESIGN_DAYS | Presigned URL expiry in days | 7 |

llama.cpp

| Variable | Description | Default |
| --- | --- | --- |
| LLAMA_CPP_DIR | Path to llama.cpp directory | llama.cpp |

How It Works

  1. Load environment — reads .env before any HF imports
  2. Parse config — loads YAML and builds the model list
  3. Validate — checks credentials, disk space, and config structure
  4. Process models — download, convert, quantize, and upload each model
  5. Track state — persists progress to .download_state.json / .pipeline_state.json
  6. Handle errors — logs failures per model and continues with the rest
  7. Summarise — prints totals for successful, failed, and skipped models

Resume is automatic. Completed models are skipped on re-run. Use --clear-state to start fresh or --force to reprocess.
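
In practice, a typical sequence looks like this:

mdl pipeline -c pipeline.yaml    # first run: processes every model in the config
mdl pipeline -c pipeline.yaml    # re-run after an interruption: completed models are skipped
mdl pipeline --force             # reprocess everything, ignoring recorded state
mdl pipeline --clear-state       # wipe the recorded state and exit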

Troubleshooting

| Problem | Solution |
| --- | --- |
| RepositoryNotFoundError / GatedRepoError | Set HF_TOKEN in .env. For gated models, accept terms on the model page first. |
| Downloads timing out | Increase HF_HUB_DOWNLOAD_TIMEOUT in .env |
| Disk space errors | Set HF_HOME to a larger drive, or use --min-disk-space |
| Slow downloads | Install hf_transfer and set HF_HUB_ENABLE_HF_TRANSFER=1 |
| Re-downloading completed models | Don't delete .download_state.json. Use --clear-state only intentionally. |

Development

uv sync --all-extras          # install dev dependencies
uv run pytest                 # run tests
uv run pytest --cov=mdl       # run tests with coverage

Project Structure

src/mdl/
├── __init__.py               # package version
├── cli/
│   ├── __init__.py           # Click group & subcommand registration
│   ├── bootstrap.py          # mdl bootstrap-llamacpp
│   ├── convert.py            # mdl convert
│   ├── download.py           # mdl download
│   ├── pipeline.py           # mdl pipeline
│   ├── quantize.py           # mdl quantize
│   └── upload.py             # mdl upload
└── core/
    ├── config.py             # env loading & logging setup
    ├── downloader.py         # HF Hub download logic & state
    ├── modelfile.py          # Ollama Modelfile generator
    ├── quantizer.py          # llama.cpp convert & quantize
    ├── uploader.py           # MinIO / S3 upload client
    └── url_manager.py        # model URL registry

License

See LICENSE.

Contributing

Contributions welcome. Please follow the existing code style, add tests for new features, and verify with --dry-run before submitting.
