Apple's on-device ML power, exposed as a clean REST API — self-hosted, zero cloud, zero cost.
Stop paying per-API-call for image intelligence you can run for free on your own Mac.
Vision API is a developer tool that turns your Mac into a private image-intelligence server. It wraps Apple's native Vision framework — the same ML engine powering macOS Photos, Live Text, and Shortcuts — and exposes its capabilities as a plain HTTP/JSON API. Your Mac already has these models built in; this project just gives them a network interface.
The intended use case is local or intranet self-hosting: run it on a Mac you control, call it from your app, script, or CI pipeline, and never touch a third-party cloud service. Drop it on any Mac, hit an endpoint, get results. No API keys. No usage limits. No image data leaving your machine.
Built for indie developers and small teams who want production-quality image analysis without the cloud bill or the privacy trade-off.
```shell
# Three commands from zero to running:
git clone https://github.com/tdawn0-0/vision-api && cd vision-api
swift package resolve
swift run App
# → Server live at http://localhost:9493
```

| Feature | Description | macOS |
|---|---|---|
| 📄 OCR / Text Recognition | Extract printed & handwritten text from any image | 10.15+ |
| ✂️ Background Removal | Remove backgrounds with pixel-perfect subject masking | 12+ |
| 🎨 Aesthetics Scoring | Score photo quality (blur, exposure, composition) and detect utility images | 15+ |
| 🏷️ Auto Tagging / Classification | Get 1000+ semantic labels (dog, beach, food) with confidence scores | 10.15+ |
| 🔳 Barcode / QR Detection | Detect & decode QR codes, EAN-13, and 20+ other formats | 10.13+ |
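
Because each feature has its own minimum macOS version, a deployment script may want to check the host before calling an endpoint. The following is a minimal sketch, assuming a POSIX shell with `awk`. `version_at_least` and `host_supports` are hypothetical helper names, not part of this project; `sw_vers` is the standard macOS version tool.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not shipped with Vision API.

# version_at_least <current> <required>
# Succeeds when <current> >= <required>, comparing up to three
# dot-separated numeric components (missing components count as 0).
version_at_least() {
  awk -v cur="$1" -v req="$2" 'BEGIN {
    n = split(cur, c, "."); m = split(req, r, ".")
    for (i = 1; i <= 3; i++) {
      a = (i <= n) ? c[i] + 0 : 0
      b = (i <= m) ? r[i] + 0 : 0
      if (a > b) exit 0
      if (a < b) exit 1
    }
    exit 0
  }'
}

# host_supports <minimum-macos-version>
# Compares the running macOS version (from sw_vers) against the minimum
# listed in the feature table.
host_supports() {
  version_at_least "$(sw_vers -productVersion)" "$1"
}
```

For example, `host_supports 15 && echo "Aesthetics Scoring should be available"` would gate on the macOS 15+ requirement listed in the table.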
📖 Interactive API docs are available at `http://localhost:9493/Swagger/index.html` once the server is running.
Why this beats a cloud API:
- 🔒 Private: images never leave your machine
- ⚡ Fast: no network round-trip, runs on Apple Neural Engine
- 💸 Free: no per-call pricing, no subscription
- 🧩 Simple: `multipart/form-data` upload, JSON response, done

Requirements:
- macOS (required; the Vision framework is Apple-only)
- Swift toolchain (bundled with Xcode, or available from swift.org)
1. Clone & install dependencies:

```shell
git clone https://github.com/tdawn0-0/vision-api
cd vision-api
swift package resolve
```

2. Start the server:

```shell
swift run App
```

The server starts on `http://localhost:9493` by default. All image endpoints accept `multipart/form-data` with a binary `imageFile` field.
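
The upload contract above can be wrapped in a small shell helper for scripts or CI jobs. This is only a sketch: `VISION_API_URL`, `vision_url`, and `vision_post` are hypothetical names chosen for illustration, and `/ocr` is the only endpoint path this README confirms.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not shipped with Vision API.

# Base URL of the server; falls back to the documented default address.
VISION_API_URL="${VISION_API_URL:-http://localhost:9493}"

# vision_url <endpoint>
# Joins the base URL and an endpoint path, tolerating a leading slash.
vision_url() {
  printf '%s/%s\n' "${VISION_API_URL%/}" "${1#/}"
}

# vision_post <endpoint> <image-path>
# Uploads the image as the multipart field `imageFile` and prints the
# response body on stdout.
vision_post() {
  if [ ! -f "$2" ]; then
    echo "vision_post: no such file: $2" >&2
    return 2
  fi
  # --fail makes curl exit non-zero on HTTP 4xx/5xx responses.
  curl --fail --silent --show-error -X POST \
    -F "imageFile=@$2" \
    "$(vision_url "$1")"
}
```

With the server running, `vision_post ocr /path/to/image.png` sends the same request as a hand-written `curl` call, and the missing-file check fails early instead of sending an empty upload.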
Custom port, from highest to lowest priority:

```shell
# CLI flag
swift run App serve --port 9493
# Environment variable
PORT=9493 swift run App
```

Once running, open the Swagger UI for interactive docs and live testing:

```
http://localhost:9493/Swagger/index.html
```

Or send a quick request from the terminal:

```shell
curl -X POST http://localhost:9493/ocr \
  -F "imageFile=@/path/to/image.png"
```

- Text Recognition (OCR): `VNRecognizeTextRequest`
- Background Removal: `VNGenerateForegroundInstanceMaskRequest`
- Image Aesthetics Scoring (macOS 15+): `CalculateImageAestheticsScoresRequest`
  - `overallScore` (-1 to 1): blur, exposure, color balance, composition
  - `isUtility`: separates artistic photos from screenshots / receipts / documents
- Image Classification / Auto Tagging (macOS 10.15+): `VNClassifyImageRequest`
  - 1000+ category labels, optional `confidenceThreshold` and `maxResults` filters
- Barcode & QR Detection (macOS 10.13+): QR, EAN-13, Code128, DataMatrix, and more
  - Returns `payload` and `symbology` (e.g. `VNBarcodeSymbologyQR`)
- Saliency Heatmap / Smart Crop (macOS 10.15+): attention- and objectness-based cropping hints
- Image Similarity (macOS 10.15+): feature vector comparison for reverse image search & dedup
- Face Detection & Landmarks (macOS 10.13+): bounding boxes + 65- or 76-point facial landmark constellations
- Face Capture Quality (macOS 10.15+): 0–1 quality score for ID photo validation
- Document Scanner (macOS 12+): corner detection + perspective correction
- Human / Animal Detection (macOS 10.15+): bounding boxes for people and pets
- Body & Hand Pose (macOS 11+): 19-point body skeleton, 21-point hand keypoints
- Animal Body Pose (macOS 14+): skeleton keypoints for cats & dogs
See the full Vision framework capability list for what's on the horizon.
Read this before deploying Vision API in any commercial or production context.
This project calls macOS system frameworks (primarily Apple Vision) that are licensed as part of the macOS operating system. A few things to keep in mind:
- Personal & development use — running this on your own Mac for personal projects or internal tooling is straightforward and the intended use case.
- Commercial SaaS / hosted service — if you plan to wrap Vision API into a paid product, expose it to external users, or build a business around it, you are responsible for ensuring your use complies with Apple's macOS Software License Agreement, your own jurisdiction's laws, and any relevant export regulations.
- Data & privacy — Vision API processes images locally by design, but if you deploy it on a shared or internet-facing server, you are responsible for any privacy obligations (GDPR, CCPA, etc.) that apply to the image data passing through it.
- No warranty — this project is provided as-is under the MIT license. The author(s) make no representations about fitness for any particular purpose.
The maintainers of this project accept no liability for any legal, regulatory, or commercial consequences arising from your use of Vision API. Commercial use is at your own risk.
All contributions are welcome — new endpoints, bug fixes, docs, or ideas. Open an issue to discuss or submit a pull request directly.