Add Apple Silicon support via vllm-metal by ereid7 · Pull Request #2 · alchemystack/qr-sampler

ereid7 · 2026-03-01T01:43:59Z

Summary

Adds Apple Silicon (macOS) support via vllm-metal — same plugin, same API, no code fork
3-line fix in processor.py: always call .cpu() before .numpy() so MPS tensors convert correctly
README updated with full Apple Silicon setup guide, Open WebUI standalone instructions, and a note about the required vllm-metal PR #124

What changed

src/qr_sampler/processor.py — The _to_numpy() method previously only called .cpu() for CUDA tensors. MPS (Metal) tensors hit the else branch where .numpy() fails because the tensor isn't on CPU. Fix: always call .cpu() before .numpy() (no-op on CPU tensors).

README.md — Added:

Apple Silicon setup section with MLX-format model example (mlx-community/Qwen3-0.6B-4bit)
Verification step (check server logs for QRSamplerLogitsProcessor initialized)
Both /v1/completions and /v1/chat/completions curl examples
Prerequisite note about vllm-metal PR #124 (required for plugin discovery)
Open WebUI standalone Docker instructions for Apple Silicon
Split Web UI section into NVIDIA/Linux and Apple Silicon paths

.gitignore — Added .webui_secret_key (generated by Open WebUI, should not be committed)

Testing

308/308 unit tests pass
E2E verified on Apple Silicon: both /v1/completions and /v1/chat/completions return entropy-driven responses
Per-token sampling logs confirmed — full pipeline active (z-score, u-value, token selection via CDF)
Open WebUI verified working against vllm-metal server

Dependencies

Requires vllm-metal PR #124 to be merged for plugin discovery to work. Without it, vllm-metal silently skips custom logits processors. The PR is a 9-line patch that mirrors GPUModelRunner's existing pattern.

Always call .cpu() before .numpy() in _to_numpy() — MPS tensors are not on CPU and the previous CUDA-only check missed them. .cpu() is a no-op on CPU tensors so this is safe for all devices. Add Apple Silicon setup docs to README with vllm-metal install steps.

…PR #124 note

…omment

alchemystack · 2026-03-08T19:00:07Z

Great, thanks!

ereid7 added 3 commits February 28, 2026 13:10

docs: improve Apple Silicon setup with MLX models, verification, and …

8f013d9

…PR #124 note

docs: clarify Docker limitation on Apple Silicon and improve .cpu() c…

c4053fb

…omment

alchemystack approved these changes Mar 8, 2026

View reviewed changes

alchemystack merged commit beb1bbc into alchemystack:main Mar 8, 2026
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Apple Silicon support via vllm-metal#2

Add Apple Silicon support via vllm-metal#2
alchemystack merged 3 commits intoalchemystack:mainfrom
ereid7:fix/apple-silicon-support

ereid7 commented Mar 1, 2026

Uh oh!

alchemystack commented Mar 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ereid7 commented Mar 1, 2026

Summary

What changed

Testing

Dependencies

Uh oh!

alchemystack commented Mar 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants