Local OCR & image analysis for any MCP client — private, offline, no API keys.
Pre-extracts text and image data locally before your AI ever sees it — cutting token usage by ~97% on real documents. Files never leave your Mac: no cloud API, no API keys, no network requests.
- OCR for images and PDFs (JPG, PNG, HEIC, TIFF, multi-page PDF) via Apple Vision Framework.
- ~97% token reduction: a 44-page PDF costs ~2,400 tokens instead of ~73,500.
- Face detection, barcode/QR reading, and image classification — all on-device.
- Full document pipeline: OCR + faces + barcodes + rectangles in a single tool call.
- Works with Claude Code, Claude Desktop, and Cursor — any MCP-compatible client.
- No files uploaded to any server — processing stays entirely on your Mac.
- 100% offline after `npm install` — powered by the Apple Vision Framework, the same engine as Live Text in Photos.app.
❌ Without macos-vision-mcp:
- Sending a 44-page PDF costs ~73,500 tokens
- Every image, invoice, or contract goes through a cloud API
- Sensitive documents leave your machine on every request
✅ With macos-vision-mcp:
- Local Apple Vision pre-extracts text before Claude ever sees it
- ~2,400 tokens for the same 44-page PDF — 97% fewer
- Files never leave your Mac
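As a quick sanity check on the arithmetic, the reduction implied by the token counts quoted above (~73,500 raw vs ~2,400 extracted) can be computed directly:

```shell
# Token reduction implied by the figures above: ~73,500 tokens raw vs ~2,400 extracted
awk 'BEGIN { printf "%.1f%%\n", (1 - 2400/73500) * 100 }'
# → 96.7%
```

96.7% rounds to the ~97% quoted throughout.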
macos-vision-mcp acts as a local pre-processing layer between your documents and the cloud. Useful for:
- Legal documents, contracts, NDAs
- Financial reports, invoices, internal spreadsheets
- Medical records or any GDPR-sensitive content
- Any situation where you want to extract structured data locally before deciding what (if anything) to send upstream
Instead of sending the raw document to your AI, you extract the text and structure locally first. The model then works only with the extracted text — never the original file.
Step 1 — Install the package:
```bash
npm install -g macos-vision-mcp
```

Step 2 — Add to your MCP client (example for Claude Code):

```bash
claude mcp add macos-vision-mcp -- macos-vision-mcp
```

Restart your client. The tools appear automatically.
Note: The native module `macos-vision` compiles against your local Node.js at install time. If you switch Node versions, run `npm rebuild` inside the package directory.
| Tool | What it does | Example prompt |
|---|---|---|
| `ocr_image` | Extract text from an image or PDF (JPG, PNG, HEIC, TIFF, PDF). Returns plain text or structured blocks with bounding boxes. | "Read the text from ~/Desktop/screenshot.png" |
| `detect_faces` | Detect human faces and return their count and positions. | "How many people are in this photo?" |
| `detect_barcodes` | Read QR codes, EAN, UPC, Code128, PDF417, Aztec, and other 1D/2D codes. | "What does the QR code in /tmp/qr.jpg say?" |
| `classify_image` | Classify image content into 1000+ categories with confidence scores. | "What is in this image?" |
| `analyze_document` | Full pipeline: OCR + faces + barcodes + rectangles in one call. | "Extract everything from this scanned invoice" |
Use the tool name explicitly in your prompt to guarantee local processing:
Extract text from an image or PDF:

```
Use ocr_image to extract text from ~/Desktop/invoice.pdf
```

Detect faces in a photo:

```
Use detect_faces on ~/Photos/team.jpg and tell me how many people are in it
```

Classify image content:

```
Use classify_image on ~/Downloads/unknown.jpg
```

Full document analysis (OCR + faces + barcodes in one call):

```
Use analyze_document on ~/Desktop/scan.pdf and extract everything you can find
```
For Claude Code:

```bash
claude mcp add macos-vision-mcp -- macos-vision-mcp
```

For Claude Desktop, edit `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "macos-vision-mcp": {
      "command": "macos-vision-mcp"
    }
  }
}
```

For Cursor, add to `~/.cursor/mcp.json`:

```json
{
  "mcpServers": {
    "macos-vision-mcp": {
      "command": "macos-vision-mcp"
    }
  }
}
```

If you installed with npx rather than globally, replace `"command": "macos-vision-mcp"` with `"command": "npx", "args": ["macos-vision-mcp"]`.
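For the npx case, the full entry would look like this (a sketch of the substitution described above; verify against your client's config schema):

```json
{
  "mcpServers": {
    "macos-vision-mcp": {
      "command": "npx",
      "args": ["macos-vision-mcp"]
    }
  }
}
```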
Contributions are welcome. Please follow Conventional Commits for commit messages — this project uses release-it with @release-it/conventional-changelog to automate releases.
```bash
git clone <repo>
cd macos-vision-mcp
npm install
npm run dev   # watch mode
```

MIT — Adrian Wolczuk