Summary
mlx-stack up refuses to start a tier whose model is already present in the standard HuggingFace cache (~/.cache/huggingface/hub), insisting on mlx-stack pull — even though the serve command it generates resolves from that cache fine.
Repro
- hf download mlx-community/Qwen3.5-9B-4bit (lands in ~/.cache/huggingface/hub)
- Stack tier: source: mlx-community/Qwen3.5-9B-4bit, model: qwen3.5-9b
- mlx-stack up
Actual
draft qwen3.5-9b 8000 skipped
Model 'qwen3.5-9b' not found locally. Run 'mlx-stack pull qwen3.5-9b' to download it.
Why it's a bug
up --dry-run shows the generated command is vllm-mlx serve mlx-community/Qwen3.5-9B-4bit ... — i.e. it serves by HF repo id, which vllm-mlx resolves from ~/.cache/huggingface/hub without anything in ~/.mlx-stack/models/. The presence check (core/stack_up.py, ~/.mlx-stack/models/<model_id|source-basename>) is stricter than the actual runtime requirement, so a usable, already-downloaded model is rejected and the user is told to re-download (~19 GB in my two-model case).
Workaround
ln -sfn ~/.cache/huggingface/hub/models--mlx-community--Qwen3.5-9B-4bit/snapshots/<hash> ~/.mlx-stack/models/Qwen3.5-9B-4bit
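The `<hash>` in the command above is whatever commit the cache's `refs/main` file points at. A minimal Python sketch of the same workaround, assuming the standard hub cache layout (`models--<org>--<name>/refs/<revision>` and `snapshots/<commit>/`). The function names `resolve_snapshot` and `link_into_models_dir` are hypothetical and not part of mlx-stack:

```python
from pathlib import Path


def resolve_snapshot(repo_id: str, hub_dir: Path, revision: str = "main") -> Path:
    """Return the snapshot directory that `revision` points at in an HF hub cache."""
    repo_dir = hub_dir / ("models--" + repo_id.replace("/", "--"))
    # refs/<revision> is a small text file holding the commit hash.
    commit = (repo_dir / "refs" / revision).read_text().strip()
    snapshot = repo_dir / "snapshots" / commit
    if not snapshot.is_dir():
        raise FileNotFoundError(f"no snapshot for {repo_id}@{revision} in {hub_dir}")
    return snapshot


def link_into_models_dir(repo_id: str, hub_dir: Path, models_dir: Path) -> Path:
    """Symlink the cached snapshot to <models_dir>/<source-basename>."""
    target = resolve_snapshot(repo_id, hub_dir)
    link = models_dir / repo_id.split("/")[-1]
    models_dir.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        link.unlink()  # replace a stale link, similar to `ln -sfn`
    # Raises FileExistsError if a real directory already sits at `link`.
    link.symlink_to(target, target_is_directory=True)
    return link
```

For this report it would be called as `link_into_models_dir("mlx-community/Qwen3.5-9B-4bit", Path.home() / ".cache/huggingface/hub", Path.home() / ".mlx-stack/models")`.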
Suggested fix
Treat a model resolvable in the HF cache (via huggingface_hub) as local, or add an adopt/link path so existing downloads register without duplicating to ~/.mlx-stack/models/.
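A rough sketch of what the relaxed presence check could look like. This is not the actual code in core/stack_up.py: `hf_cache_snapshot` and `is_model_local` are hypothetical names, and the cache lookup here only inspects the on-disk layout with the standard library. A real fix would probably go through huggingface_hub instead (for example `snapshot_download(..., local_files_only=True)` or `scan_cache_dir()`), which also handles custom cache locations.

```python
from pathlib import Path
from typing import Optional


def hf_cache_snapshot(source: str, hub_dir: Path) -> Optional[Path]:
    """Return a non-empty cached snapshot dir for an HF repo id, or None."""
    snapshots = hub_dir / ("models--" + source.replace("/", "--")) / "snapshots"
    if not snapshots.is_dir():
        return None
    for snap in sorted(snapshots.iterdir()):
        # Assumption: any non-empty snapshot counts; completeness is not verified.
        if snap.is_dir() and any(snap.iterdir()):
            return snap
    return None


def is_model_local(model_id: str, source: str, models_dir: Path, hub_dir: Path) -> bool:
    """True if the model is under models_dir (current check) or in the HF cache."""
    candidates = (models_dir / model_id, models_dir / source.split("/")[-1])
    if any(p.exists() for p in candidates):
        return True
    return hf_cache_snapshot(source, hub_dir) is not None
```

With this shape, `up` would only print the "Run 'mlx-stack pull ...'" hint when neither location has the model.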
Environment
- mlx-stack 0.3.8
- vllm-mlx v0.2.6
- mlx 0.31.1
- macOS 26.2 (arm64), Apple M4 Pro, 64 GB
- model: mlx-community/Qwen3.5-9B-4bit