
feat: Apple Silicon (Metal) deployment guide #32

@itigges22

Description

llama.cpp supports Metal for Apple Silicon acceleration. We need a deployment guide and potentially a Dockerfile for macOS users.

Requirements

  • Test Qwen3.5-9B-Q6_K on Apple Silicon (M1/M2/M3/M4 with 16GB+ unified memory)
  • Document bare-metal setup for macOS (Homebrew llama.cpp + Python services)
  • Verify self-embeddings work via Metal backend
  • Benchmark generation speed vs CUDA
  • Add macOS section to docs/SETUP.md
  • Account for Docker Desktop for Mac limitations (no GPU passthrough); the guide may need to be bare-metal only
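The bare-metal items above could be sketched roughly as follows. This is an untested outline assuming Homebrew is installed; the model path and port are placeholders, not values from this issue.

```shell
# Install llama.cpp (Homebrew builds enable Metal by default on Apple Silicon)
brew install llama.cpp

# Launch the server with all layers offloaded to the GPU (-ngl 99).
# The model path below is a placeholder. llama.cpp logs "ggml_metal_init"
# on startup when the Metal backend initializes successfully.
llama-server -m ~/models/qwen.gguf -ngl 99 --port 8080
```

For the self-embeddings check, llama-server's `--embedding` flag should expose an embeddings endpoint that can be queried to verify the Metal backend path (worth confirming against the llama.cpp server docs for the installed version).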

Context

Many developers work on macOS, and Metal support in llama.cpp is mature. The main open question is whether to document a Docker or a bare-metal workflow, since Docker Desktop does not support GPU passthrough on macOS.
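For the speed comparison against CUDA, llama.cpp ships a `llama-bench` tool that reports prompt-processing and generation throughput. A rough sketch (the model path is a placeholder):

```shell
# Benchmark 512-token prompt processing (-p) and 128-token generation (-n)
# with all layers offloaded to the GPU (-ngl 99). Running the same command
# on a CUDA machine gives comparable tokens/sec columns.
llama-bench -m ~/models/qwen.gguf -ngl 99 -p 512 -n 128
```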

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request) · help wanted (Extra attention is needed)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
