🔍 Vision API

Apple's on-device ML power, exposed as a clean REST API — self-hosted, zero cloud, zero cost.

Stop paying per-API-call for image intelligence you can run for free on your own Mac.

Vision API is a developer tool that turns your Mac into a private image-intelligence server. It wraps Apple's native Vision framework — the same ML engine powering macOS Photos, Live Text, and Shortcuts — and exposes its capabilities as a plain HTTP/JSON API. Your Mac already has these models built in; this project just gives them a network interface.

The intended use case is local or intranet self-hosting: run it on a Mac you control, call it from your app, script, or CI pipeline, and never touch a third-party cloud service. Drop it on any Mac, hit an endpoint, get results. No API keys. No usage limits. No image data leaving your machine.

Built for indie developers and small teams who want production-quality image analysis without the cloud bill or the privacy trade-off.

# Three commands from zero to running:
git clone https://github.com/tdawn0-0/vision-api && cd vision-api
swift package resolve
swift run App
# → Server live at http://localhost:9493

✨ What It Can Do

Feature	Description	macOS
📄 OCR / Text Recognition	Extract printed & handwritten text from any image	10.15+
✂️ Background Removal	Remove backgrounds with pixel-perfect subject masking	14+
🎨 Aesthetics Scoring	Score photo quality (blur, exposure, composition) and detect utility images	15+
🏷️ Auto Tagging / Classification	Get 1000+ semantic labels (`dog`, `beach`, `food`) with confidence scores	10.15+
🔳 Barcode / QR Detection	Detect & decode QR codes, EAN-13, and 20+ other formats	10.13+

📖 Interactive API docs available at http://localhost:9493/Swagger/index.html once running.

Why this beats a cloud API:

🔒 Private — images never leave your machine
⚡ Fast — no network round-trip, runs on Apple Neural Engine
💸 Free — no per-call pricing, no subscription
🧩 Simple — multipart/form-data upload, JSON response, done

🚀 Getting Started

Prerequisites

macOS (required — Vision framework is Apple-only)
Swift toolchain (comes with Xcode or swift.org)

Installation

1. Clone & install dependencies:

git clone https://github.com/tdawn0-0/vision-api
cd vision-api
swift package resolve

2. Start the server:

swift run App

The server starts on http://localhost:9493 by default. All image endpoints accept multipart/form-data with a binary imageFile field.

Custom port: two options, listed from highest to lowest priority (with neither set, the server uses the default, 9493):

# CLI flag
swift run App serve --port 9493

# Environment variable
PORT=9493 swift run App
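
For scripts or CI jobs that start the server and then call it, it can help to wait until the server actually responds. The sketch below is one possible approach, not part of the project: it reuses the `PORT` variable and the Swagger UI URL from this README, while `VISION_PORT`, `VISION_API_URL`, and `wait_for_vision_api` are hypothetical names introduced here for illustration.

```shell
# Resolve the port the way this README describes: PORT if set, else 9493.
# (VISION_PORT and VISION_API_URL are illustrative names, not project settings.)
VISION_PORT="${PORT:-9493}"
VISION_API_URL="http://localhost:${VISION_PORT}"

# wait_for_vision_api [timeout_seconds]
# Polls the Swagger UI page once per second until it responds with a
# success status, or gives up after the timeout (default: 60 seconds).
wait_for_vision_api() {
  timeout="${1:-60}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # --fail turns HTTP 4xx/5xx into a non-zero exit status.
    if curl -sS --fail -o /dev/null "${VISION_API_URL}/Swagger/index.html" 2>/dev/null; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "Vision API did not respond on ${VISION_API_URL} within ${timeout}s" >&2
  return 1
}

# Possible usage:
#   swift run App &
#   wait_for_vision_api 300 && echo "server is up"
```

A first `swift run App` also has to compile the package, so a generous timeout is probably sensible on a cold build.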

Try It Out

Once running, open the Swagger UI for interactive docs and live testing:

http://localhost:9493/Swagger/index.html

Or send a quick request from the terminal:

curl -X POST http://localhost:9493/ocr \
  -F "imageFile=@/path/to/image.png"
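
When the same call is repeated across a script, a small wrapper keeps the upload details in one place. This is a minimal sketch under stated assumptions: `vision_post` and `VISION_API_URL` are hypothetical names, and only the `/ocr` path and the `imageFile` field come from this README.

```shell
# Base URL of the server; override it when using a custom port or host.
# (VISION_API_URL is an illustrative variable, not a project setting.)
VISION_API_URL="${VISION_API_URL:-http://localhost:9493}"

# vision_post <endpoint> <image-path>
# Uploads one image as multipart/form-data in the imageFile field and
# prints the JSON response on stdout.
vision_post() {
  endpoint="$1"
  image="$2"
  # Fail early with a clear message instead of letting curl report it.
  if [ ! -f "$image" ]; then
    echo "vision_post: no such file: $image" >&2
    return 2
  fi
  # --fail makes curl exit non-zero on HTTP 4xx/5xx responses.
  curl -sS --fail -X POST "${VISION_API_URL}${endpoint}" \
    -F "imageFile=@${image}"
}

# Possible usage (file names are placeholders):
#   vision_post /ocr ./receipt.png > receipt.json
```

Because the function returns curl's exit status, a calling script can stop on the first failed request rather than writing an error page into its output file.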

🗺️ Roadmap

Available Now

Text Recognition (OCR) — VNRecognizeTextRequest
Background Removal — VNGenerateForegroundInstanceMaskRequest
Image Aesthetics Scoring (macOS 15+) — CalculateImageAestheticsScoresRequest
- overallScore (-1 to 1): blur, exposure, color balance, composition
- isUtility: separates artistic photos from screenshots / receipts / documents
Image Classification / Auto Tagging (macOS 10.15+) — VNClassifyImageRequest
- 1000+ category labels, optional confidenceThreshold and maxResults filters
Barcode & QR Detection (macOS 10.13+) — QR, EAN-13, Code128, DataMatrix, and more
- Returns payload and symbology (e.g. VNBarcodeSymbologyQR)
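
As a rough illustration of the classification filters listed above, the sketch below sends `confidenceThreshold` and `maxResults` along with the image. Two things here are assumptions, not facts from this README: the `/classify` path is a placeholder, and the filters are assumed to travel as extra form fields. The Swagger UI shows the real path and parameter placement. `classify_image`, `CLASSIFY_ENDPOINT`, and `VISION_API_URL` are hypothetical names.

```shell
VISION_API_URL="${VISION_API_URL:-http://localhost:9493}"
# ASSUMPTION: "/classify" is a placeholder path; check the Swagger UI.
CLASSIFY_ENDPOINT="${CLASSIFY_ENDPOINT:-/classify}"

# classify_image <image-path> [confidenceThreshold] [maxResults]
# Uploads an image and, when given, adds the two optional filters.
classify_image() {
  image="$1"
  threshold="${2:-}"
  max_results="${3:-}"

  # Start with the arguments every image endpoint needs.
  set -- -sS --fail -X POST "${VISION_API_URL}${CLASSIFY_ENDPOINT}" \
    -F "imageFile=@${image}"

  # ASSUMPTION: filters are sent as additional multipart form fields.
  if [ -n "$threshold" ]; then
    set -- "$@" -F "confidenceThreshold=${threshold}"
  fi
  if [ -n "$max_results" ]; then
    set -- "$@" -F "maxResults=${max_results}"
  fi

  curl "$@"
}

# Possible usage (file name is a placeholder):
#   classify_image ./photo.jpg 0.5 5
```

Omitting both optional arguments sends only the image, which leaves any filtering to the server's defaults.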

Coming Soon

Saliency Heatmap / Smart Crop (macOS 10.15+) — attention & objectness-based cropping hints
Image Similarity (macOS 10.15+) — feature vector comparison for reverse image search & dedup
Face Detection & Landmarks (macOS 10.13+) — bounding boxes + facial landmark keypoints (Vision provides 65- and 76-point constellations)
Face Capture Quality (macOS 10.15+) — 0–1 quality score for ID photo validation
Document Scanner (macOS 12+) — corner detection + perspective correction
Human / Animal Detection (macOS 10.15+) — bounding boxes for people and pets
Body & Hand Pose (macOS 11+) — 19-point body skeleton, 21-point hand keypoints
Animal Body Pose (macOS 14+) — skeleton keypoints for cats & dogs

See the full Vision framework capability list for what's on the horizon.

⚖️ Legal Notice

Read this before deploying Vision API in any commercial or production context.

This project calls macOS system frameworks (primarily Apple Vision) that are licensed as part of the macOS operating system. A few things to keep in mind:

Personal & development use — running this on your own Mac for personal projects or internal tooling is straightforward and the intended use case.
Commercial SaaS / hosted service — if you plan to wrap Vision API into a paid product, expose it to external users, or build a business around it, you are responsible for ensuring your use complies with Apple's macOS Software License Agreement, your own jurisdiction's laws, and any relevant export regulations.
Data & privacy — Vision API processes images locally by design, but if you deploy it on a shared or internet-facing server, you are responsible for any privacy obligations (GDPR, CCPA, etc.) that apply to the image data passing through it.
No warranty — this project is provided as-is under the MIT license. The author(s) make no representations about fitness for any particular purpose.

The maintainers of this project accept no liability for any legal, regulatory, or commercial consequences arising from your use of Vision API. Commercial use is at your own risk.

🤝 Contributing

All contributions are welcome — new endpoints, bug fixes, docs, or ideas. Open an issue to discuss or submit a pull request directly.
