up: models already in the HuggingFace cache aren't detected (forces redundant re-download) #50

Description

@weklund

Summary

mlx-stack up refuses to start a tier whose model is already present in the standard HuggingFace cache (~/.cache/huggingface/hub) and insists on running mlx-stack pull, even though the serve command it generates resolves the model from that cache without problems.

Repro

  1. hf download mlx-community/Qwen3.5-9B-4bit (lands in ~/.cache/huggingface/hub)
  2. Stack tier: source: mlx-community/Qwen3.5-9B-4bit, model: qwen3.5-9b
  3. mlx-stack up

Actual

draft  qwen3.5-9b  8000  skipped
  Model 'qwen3.5-9b' not found locally. Run 'mlx-stack pull qwen3.5-9b' to download it.

Why it's a bug

up --dry-run shows that the generated command is vllm-mlx serve mlx-community/Qwen3.5-9B-4bit ..., i.e. it serves by HF repo id. vllm-mlx resolves that id from ~/.cache/huggingface/hub and needs nothing in ~/.mlx-stack/models/. The presence check (core/stack_up.py, ~/.mlx-stack/models/<model_id|source-basename>) is therefore stricter than the actual runtime requirement: a usable, already-downloaded model is rejected and the user is told to re-download it (~19 GB in my two-model case).

Workaround

ln -sfn ~/.cache/huggingface/hub/models--mlx-community--Qwen3.5-9B-4bit/snapshots/<hash> ~/.mlx-stack/models/Qwen3.5-9B-4bit
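
For anyone who wants to script the workaround instead of looking up <hash> by hand, here is a minimal sketch. It assumes the usual HF cache layout (models--<org>--<name>/snapshots/<commit-hash>, with refs/main holding the hash of the downloaded revision). The function name link_hf_snapshot and its parameters are made up for illustration and are not part of mlx-stack.

```python
from pathlib import Path


def link_hf_snapshot(repo_id: str, hf_hub_dir: Path, stack_models_dir: Path) -> Path:
    """Symlink the cached snapshot of repo_id into stack_models_dir (hypothetical helper)."""
    # Assumed HF cache layout: <hub>/models--<org>--<name>/snapshots/<commit-hash>/
    repo_dir = hf_hub_dir / ("models--" + repo_id.replace("/", "--"))
    snapshots_dir = repo_dir / "snapshots"
    ref_file = repo_dir / "refs" / "main"

    if ref_file.is_file():
        # refs/main stores the commit hash that "main" pointed to at download time.
        snapshot = snapshots_dir / ref_file.read_text().strip()
    else:
        # Fallback: use the most recently modified snapshot directory, if any.
        candidates = []
        if snapshots_dir.is_dir():
            candidates = sorted(
                (p for p in snapshots_dir.iterdir() if p.is_dir()),
                key=lambda p: p.stat().st_mtime,
            )
        if not candidates:
            raise FileNotFoundError(f"no cached snapshot found for {repo_id}")
        snapshot = candidates[-1]

    if not snapshot.is_dir():
        raise FileNotFoundError(f"snapshot directory missing: {snapshot}")

    # Same link name as in the manual workaround: the basename of the repo id.
    stack_models_dir.mkdir(parents=True, exist_ok=True)
    link = stack_models_dir / repo_id.split("/")[-1]
    if link.is_symlink():
        link.unlink()  # replace an existing link, like `ln -sfn`
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(snapshot, target_is_directory=True)
    return link
```

Calling it with Path.home() / ".cache/huggingface/hub" and Path.home() / ".mlx-stack/models" should reproduce the ln command above for the repo in this report. It only links; it does not check that the download is complete.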

Suggested fix

Treat a model resolvable in the HF cache (via huggingface_hub) as local, or add an adopt/link path so existing downloads register without duplicating to ~/.mlx-stack/models/.
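
A rough sketch of what the first option could look like, using only the standard library. I have not read the actual check in core/stack_up.py closely, so is_model_local, hf_cache_has_snapshot and their parameters are hypothetical names, and the HF cache layout (models--<org>--<name>/snapshots/<hash>) is assumed rather than queried through huggingface_hub.

```python
from pathlib import Path


def hf_cache_has_snapshot(repo_id: str, hf_hub_dir: Path) -> bool:
    """True if the HF cache holds at least one non-empty snapshot of repo_id."""
    # Assumed layout: <hub>/models--<org>--<name>/snapshots/<commit-hash>/
    snapshots = hf_hub_dir / ("models--" + repo_id.replace("/", "--")) / "snapshots"
    if not snapshots.is_dir():
        return False
    return any(s.is_dir() and any(s.iterdir()) for s in snapshots.iterdir())


def is_model_local(source: str, model_id: str,
                   stack_models_dir: Path, hf_hub_dir: Path) -> bool:
    """Hypothetical presence check: stack models dir first, then the HF cache."""
    # Existing behaviour as described in this issue:
    # ~/.mlx-stack/models/<model_id|source-basename>
    for name in (model_id, source.split("/")[-1]):
        if (stack_models_dir / name).exists():
            return True
    # Proposed addition: also accept a model that is already in the HF cache.
    return hf_cache_has_snapshot(source, hf_hub_dir)
```

This only checks that a snapshot directory exists and is non-empty, so a partial download would still pass. Going through huggingface_hub (for example snapshot_download with local_files_only=True) is probably the more robust route, since it also honours cache-location overrides.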

Environment

  • mlx-stack 0.3.8
  • vllm-mlx v0.2.6
  • mlx 0.31.1
  • macOS 26.2 (arm64), Apple M4 Pro, 64 GB
  • model: mlx-community/Qwen3.5-9B-4bit
