
feat: Apple Silicon (Metal) deployment guide #32

@itigges22

Description

llama.cpp supports Metal for Apple Silicon acceleration. We need a deployment guide and potentially a Dockerfile for macOS users.

Requirements

  • Test Qwen3.5-9B-Q6_K on Apple Silicon (M1/M2/M3/M4 with 16GB+ unified memory)
  • Document bare-metal setup for macOS (Homebrew llama.cpp + Python services)
  • Verify self-embeddings work via Metal backend
  • Benchmark generation speed vs CUDA
  • Add macOS section to docs/SETUP.md
  • Account for Docker Desktop for Mac limitations (no GPU passthrough); the guide may need to be bare-metal only
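The bare-metal items above could be sketched roughly as follows. This is an untested outline assuming Homebrew is installed; the model path and port are placeholders, not values from this issue.

```shell
# Install llama.cpp (Homebrew builds enable Metal by default on Apple Silicon)
brew install llama.cpp

# Launch the server with all layers offloaded to the GPU (-ngl 99).
# The model path below is a placeholder. llama.cpp logs "ggml_metal_init"
# on startup when the Metal backend initializes successfully.
llama-server -m ~/models/qwen.gguf -ngl 99 --port 8080
```

For the self-embeddings check, llama-server's `--embedding` flag should expose an embeddings endpoint that can be queried to verify the Metal backend path (worth confirming against the llama.cpp server docs for the installed version).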

Context

Many developers work on macOS, and Metal support in llama.cpp is mature. The main open question is whether to document a Docker or a bare-metal workflow, since Docker Desktop does not support GPU passthrough on macOS.
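For the speed comparison against CUDA, llama.cpp ships a `llama-bench` tool that reports prompt-processing and generation throughput. A rough sketch (the model path is a placeholder):

```shell
# Benchmark 512-token prompt processing (-p) and 128-token generation (-n)
# with all layers offloaded to the GPU (-ngl 99). Running the same command
# on a CUDA machine gives comparable tokens/sec columns.
llama-bench -m ~/models/qwen.gguf -ngl 99 -p 512 -n 128
```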

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request) · help wanted (Extra attention is needed)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
