mdl — Model Downloader

A CLI toolkit for downloading, converting, quantizing, and uploading Hugging Face models. Supports full end-to-end pipelines from HF Hub to S3-compatible storage.

Note: The convert, quantize, and pipeline commands require llama.cpp. Run mdl bootstrap-llamacpp first to fetch and build it automatically (requires git, cmake, make).

Features

  • Batch downloads from Hugging Face Hub with resume and state tracking
  • GGUF conversion via llama.cpp (convert_hf_to_gguf.py)
  • Quantization to Q4_K_M, Q5_K_M, Q8_0, etc. via llama-quantize
  • Ollama Modelfile generation — auto-detects chat format from the model's actual config files (supports ChatML, LLaMA-3, Gemma, Phi, Mistral, DeepSeek, and more)
  • S3 upload to MinIO or any S3-compatible endpoint
  • Pipeline mode — download → convert → quantize → upload in one command
  • Bootstrap — fetch and build llama.cpp automatically
  • Per-model error handling, dry-run mode, disk-space checks, and YAML configuration

Requirements

  • Python 3.11+
  • uv (recommended) or pip
  • git, cmake, make (for bootstrap-llamacpp)

Installation

git clone https://github.com/fuzzylabs/mdl.git
cd mdl
uv sync            # or: pip install -e .

To enable faster Hugging Face downloads:

uv add hf_transfer   # or: pip install hf_transfer
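
Then enable it per shell session, or persist the setting in .env (see Environment Variables below):

export HF_HUB_ENABLE_HF_TRANSFER=1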

Configuration

Copy the example files and fill in your values:

cp .env.example .env
cp models.yaml.example models.yaml
cp pipeline.yaml.example pipeline.yaml

See .env.example for all available environment variables.
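
A minimal .env might look like this (placeholder values; the full variable list is documented under Environment Variables below):

HF_TOKEN=hf_xxxxxxxxxxxxxxxx
MINIO_ENDPOINT=minio.example.com:9000
MINIO_ACCESS_KEY=changeme
MINIO_SECRET_KEY=changeme
MINIO_BUCKET=models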

CLI Reference

All commands are accessed through the mdl entry point:

mdl [COMMAND] [OPTIONS]

mdl download

Batch-download models from Hugging Face Hub.

mdl download --config models.yaml                # download all models in config
mdl download -r google/gemma-3-1b-it             # download a single model
mdl download -r org/model-a -r org/model-b       # download multiple by repo ID
mdl download --config models.yaml --dry-run      # preview without downloading
mdl download --clear-state                        # reset download tracking

| Option | Description |
| --- | --- |
| -r, --repo-id | Repo ID to download (repeatable) |
| -c, --config PATH | YAML config file (models.yaml format) |
| -n, --dry-run | Preview what would be downloaded |
| --clear-state | Clear download state and exit |
| --min-disk-space INT | Minimum free disk space in GB (default: 10) |
| --delete-after | Delete model from HF cache after download |
| -v, --verbose | Enable debug logging |

models.yaml format

google:
  - gemma-3-1b-it
  - gemma-3-4b-it

meta-llama:
  - Llama-3.2-1B
  - Llama-3.2-1B-Instruct

Each entry becomes the repo ID org/model when downloaded.
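
For example, the config above expands to google/gemma-3-1b-it, google/gemma-3-4b-it, meta-llama/Llama-3.2-1B, and meta-llama/Llama-3.2-1B-Instruct. A dry run prints the expanded list without downloading anything:

mdl download --config models.yaml --dry-run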


mdl convert

Convert a downloaded Hugging Face model to F16 GGUF format.

mdl convert -m /path/to/model-dir                           # output defaults to models/<org>/<model>/<model>.f16.gguf
mdl convert -m /path/to/model-dir -o custom/path/out.gguf   # explicit output path

| Option | Description |
| --- | --- |
| -m, --model-dir PATH | Path to the downloaded HF model directory (required) |
| -o, --output PATH | Output F16 GGUF path (default: models/<org>/<model>/<model>.f16.gguf) |
| --llama-cpp-dir PATH | Override LLAMA_CPP_DIR env var |
| -v, --verbose | Enable debug logging |

mdl quantize

Quantize an F16 GGUF file to a smaller representation.

mdl quantize -i model.f16.gguf                              # output defaults to model.Q4_K_M.gguf alongside input
mdl quantize -i model.f16.gguf -o out.gguf -t Q5_K_M        # explicit output and type
mdl quantize -i model.f16.gguf --model-dir /path/to/hf-model # also generates Modelfile + config files + README

| Option | Description |
| --- | --- |
| -i, --input PATH | Input F16 GGUF file (required) |
| -o, --output PATH | Output quantized GGUF path (default: <input_dir>/<model>.<type>.gguf) |
| -t, --type TEXT | Quantization type — e.g. Q4_K_M, Q5_K_M, Q8_0 (default: Q4_K_M) |
| --llama-cpp-dir PATH | Override LLAMA_CPP_DIR env var |
| --model-dir PATH | HF model directory — generates Ollama Modelfile, copies config files, and creates a MODELFILE_README.md next to the output |
| -v, --verbose | Enable debug logging |

mdl upload

Upload a file to MinIO / S3-compatible storage.

mdl upload -f model.Q4_K_M.gguf
mdl upload -f model.Q4_K_M.gguf -p models/gemma -b my-bucket

| Option | Description |
| --- | --- |
| -f, --file PATH | Local file to upload (required) |
| -k, --s3-key TEXT | Explicit S3 object key (defaults to filename) |
| -p, --s3-prefix TEXT | Prefix (directory) in the bucket |
| -b, --bucket TEXT | Override MINIO_BUCKET env var |
| -v, --verbose | Enable debug logging |

mdl pipeline

Run the full pipeline: download → convert → quantize → upload.

mdl pipeline                                     # uses pipeline.yaml by default
mdl pipeline -c pipeline.yaml --dry-run          # preview without executing
mdl pipeline --no-upload                         # skip the S3 upload step
mdl pipeline --force                             # reprocess completed models
mdl pipeline --keep-quantized                    # keep GGUF files locally after upload
mdl pipeline --clear-state                       # reset pipeline state and exit
Option Description
-c, --config PATH Pipeline config file (default: pipeline.yaml)
-n, --dry-run Preview actions without executing
--clear-state Clear pipeline state and exit
--force Reprocess already-completed models
--no-upload Skip S3 upload step
--keep-download Keep downloaded model files after processing
--keep-quantized Keep quantized GGUF files after upload

Note: --no-upload only skips the S3 upload — it does not keep files on disk. The pipeline works in a temporary directory that is deleted after each model. To retain the quantized GGUF and related files locally, pass --keep-quantized (e.g. mdl pipeline --no-upload --keep-quantized). | --min-disk-space INT | Minimum free disk space in GB (default: 10) | | -v, --verbose | Enable debug logging |

pipeline.yaml format

models:
  - repo_id: google/gemma-3-1b-it        # required
    # quantize: true                      # default: true
    # upload: true                        # default: true
    # quantization: Q4_K_M               # default: Q4_K_M
    # output_name: custom-name.gguf      # default: model.QTYPE.gguf
    # revision: main                     # pin a git revision / branch / tag

  - repo_id: meta-llama/Llama-3.2-1B
  - repo_id: microsoft/Phi-4-mini-instruct

Output paths

All output files are organised under models/<org>/<model_name>/:

  • Local (with --keep-quantized): models/google/gemma-3-1b-it/gemma-3-1b-it.Q4_K_M.gguf
  • S3: s3://<bucket>/models/google/gemma-3-1b-it/gemma-3-1b-it.Q4_K_M.gguf

Ollama Modelfile

The pipeline automatically generates an Ollama Modelfile alongside each quantized GGUF. It reads the model's actual config files — config.json, tokenizer_config.json, and generation_config.json — to derive everything dynamically:

  • FROM — path to the GGUF file
  • TEMPLATE — Ollama Go template, detected from the model's eos_token (not just model_type). This correctly handles fine-tunes that change the chat format (e.g. Dolphin-Mistral uses ChatML despite being model_type: mistral)
  • SYSTEM — default system prompt, extracted from the Jinja2 chat_template via regex
  • PARAMETER — num_ctx, temperature, top_p, top_k, repeat_penalty, and stop tokens — all from the model's own configs

The raw Jinja2 chat_template is also included as comments at the bottom of the Modelfile for cross-reference when editing.
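
As an illustration, a generated Modelfile for a ChatML-style model might look roughly like the sketch below. This is hypothetical: the actual template, system prompt, and parameter values are all derived from each model's own config files.

FROM ./model.Q4_K_M.gguf

TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""

SYSTEM You are a helpful assistant.

# Placeholder values; mdl reads these from config.json and generation_config.json
PARAMETER num_ctx 4096
PARAMETER temperature 0.7
PARAMETER stop <|im_end|>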

Supported model families (via EOS token and model_type detection):

| Format | Models |
| --- | --- |
| ChatML (<\|im_end\|>) | Qwen, Qwen2, Qwen3, Dolphin, Yi, InternLM2, DeepSeek-V2/V3/R1, Jamba |
| LLaMA-3 (<\|eot_id\|>) | LLaMA-3, LLaMA-3.1, LLaMA-3.2, LLaMA-3.3 |
| Gemma (<end_of_turn>) | Gemma, Gemma 2, Gemma 3 |
| Phi (<\|end\|>) | Phi-3, Phi-3.5, Phi-4 |
| Mistral (</s>) | Mistral-7B, Mixtral |
| Command-R | Cohere Command-R, Command-R+ |
| Completion | StarCoder2, Falcon |

Unknown models fall back to a generic template with a warning.

Output files (alongside the GGUF):

| File | Purpose |
| --- | --- |
| Modelfile | Ollama-ready model definition |
| MODELFILE_README.md | Guide explaining each Modelfile section |
| config.json | Model architecture reference (copied from HF) |
| tokenizer_config.json | Chat template & tokens reference (copied from HF) |
| generation_config.json | Generation params reference (copied from HF) |
| special_tokens_map.json | Special tokens reference (copied from HF) |

These files are:

  • Uploaded to S3 at models/<org>/<model>/
  • Saved locally when using --keep-quantized
  • Logged at DEBUG level when using --verbose

To use the generated Modelfile with Ollama:

cd models/google/gemma-3-1b-it/
ollama create gemma3-1b -f Modelfile
ollama run gemma3-1b

The pipeline also writes a URL registry to model_urls.json locally and mirrors it to s3://<bucket>/metadata/model_urls.json. Each entry includes a download_url and a curl download reference.
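
For example, to list every download URL recorded in the local registry (this assumes only that entries carry the download_url field mentioned above; the surrounding schema may differ):

jq -r '.. | .download_url? // empty' model_urls.json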


mdl bootstrap-llamacpp

Clone, build, and extract the required llama.cpp binaries. Requires git, cmake, and make.

mdl bootstrap-llamacpp

This fetches llama.cpp from GitHub, builds it with cmake + make, and copies the required binaries and headers to a llama.cpp-dist/ directory. If llama.cpp/ already exists, the clone step is skipped.

This is a prerequisite for mdl convert, mdl quantize, and mdl pipeline.
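
If your llama.cpp build lives somewhere other than the default llama.cpp directory, point mdl at it with the LLAMA_CPP_DIR variable or the per-command --llama-cpp-dir flag (the path below is a placeholder):

export LLAMA_CPP_DIR=/opt/llama.cpp
mdl convert -m /path/to/model-dir --llama-cpp-dir /opt/llama.cpp   # flag overrides the env var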


mdl --version

Print the installed version.

mdl --version

Environment Variables

All variables are set in .env (see .env.example).

Hugging Face

| Variable | Description | Default |
| --- | --- | --- |
| HF_TOKEN | Auth token for private/gated models | |
| HF_ENDPOINT | Custom HF endpoint (mirror/proxy) | https://huggingface.co |
| HF_HOME | HF cache directory | ~/.cache/huggingface/ |
| HF_HUB_DOWNLOAD_TIMEOUT | Download timeout in seconds | 120 |
| HF_HUB_ETAG_TIMEOUT | ETag timeout in seconds | 10 |
| HF_HUB_ENABLE_HF_TRANSFER | Enable fast transfers (requires hf_transfer) | 0 |

MinIO / S3

| Variable | Description | Default |
| --- | --- | --- |
| MINIO_ENDPOINT | S3 endpoint (host:port) | |
| MINIO_ACCESS_KEY | Access key | |
| MINIO_SECRET_KEY | Secret key | |
| MINIO_BUCKET | Target bucket | models |
| MINIO_SECURE | Use HTTPS | true |
| MINIO_PUBLIC_URL | Public base URL for downloads | |
| MINIO_PRESIGN_DAYS | Presigned URL expiry in days | 7 |

llama.cpp

| Variable | Description | Default |
| --- | --- | --- |
| LLAMA_CPP_DIR | Path to llama.cpp directory | llama.cpp |

How It Works

  1. Load environment — reads .env before any HF imports
  2. Parse config — loads YAML and builds the model list
  3. Validate — checks credentials, disk space, and config structure
  4. Process models — download, convert, quantize, and upload each model
  5. Track state — persists progress to .download_state.json / .pipeline_state.json
  6. Handle errors — logs failures per model and continues with the rest
  7. Summarise — prints totals for successful, failed, and skipped models

Resume is automatic. Completed models are skipped on re-run. Use --clear-state to start fresh or --force to reprocess.
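
In practice, a typical sequence looks like this:

mdl pipeline -c pipeline.yaml    # first run: processes every model in the config
mdl pipeline -c pipeline.yaml    # re-run after an interruption: completed models are skipped
mdl pipeline --force             # reprocess everything, ignoring recorded state
mdl pipeline --clear-state       # wipe the recorded state and exit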

Troubleshooting

| Problem | Solution |
| --- | --- |
| RepositoryNotFoundError / GatedRepoError | Set HF_TOKEN in .env. For gated models, accept terms on the model page first. |
| Downloads timing out | Increase HF_HUB_DOWNLOAD_TIMEOUT in .env |
| Disk space errors | Set HF_HOME to a larger drive, or use --min-disk-space |
| Slow downloads | Install hf_transfer and set HF_HUB_ENABLE_HF_TRANSFER=1 |
| Re-downloading completed models | Don't delete .download_state.json. Use --clear-state only intentionally. |

Development

uv sync --all-extras          # install dev dependencies
uv run pytest                 # run tests
uv run pytest --cov=mdl       # run tests with coverage

Project Structure

src/mdl/
├── __init__.py               # package version
├── cli/
│   ├── __init__.py           # Click group & subcommand registration
│   ├── bootstrap.py          # mdl bootstrap-llamacpp
│   ├── convert.py            # mdl convert
│   ├── download.py           # mdl download
│   ├── pipeline.py           # mdl pipeline
│   ├── quantize.py           # mdl quantize
│   └── upload.py             # mdl upload
└── core/
    ├── config.py             # env loading & logging setup
    ├── downloader.py         # HF Hub download logic & state
    ├── modelfile.py          # Ollama Modelfile generator
    ├── quantizer.py          # llama.cpp convert & quantize
    ├── uploader.py           # MinIO / S3 upload client
    └── url_manager.py        # model URL registry

License

See LICENSE.

Contributing

Contributions welcome. Please follow the existing code style, add tests for new features, and verify with --dry-run before submitting.
