
macos-vision

Apple Vision for Node.js — native, fast, offline, no API keys required.

Uses macOS's built-in Vision framework via a compiled Swift binary. Works completely offline. No cloud services, no API keys, no Python, zero runtime dependencies.

Requirements

  • macOS 12+
  • Node.js 18+
  • Xcode Command Line Tools
xcode-select --install

Installation

npm install macos-vision

The native Swift binary is compiled automatically on install.

What this is (and isn't)

macos-vision gives you raw Apple Vision results — text, coordinates, bounding boxes, labels.

It is not a document pipeline. It does not:

  • Convert PDFs or images to Markdown
  • Understand document structure (headings, tables, paragraphs)
  • Chain multiple detections into a final report

For those use cases, use the raw output as input to an LLM or a post-processing layer of your own.
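That handoff can be as simple as flattening the structured blocks into prompt text. A minimal sketch (the `VisionBlock` shape matches the interface documented under the API reference below; the sorting and formatting conventions are this example's own, not part of the library):

```typescript
// Hypothetical helper: flatten OCR blocks into plain-text context for an LLM.
// VisionBlock mirrors the shape returned by ocr(path, { format: 'blocks' }).
interface VisionBlock {
  text: string
  x: number; y: number; width: number; height: number
}

function blocksToPromptContext(blocks: VisionBlock[]): string {
  // Sort roughly top-to-bottom, then left-to-right, so the LLM
  // sees the text in approximate reading order.
  const sorted = [...blocks].sort((a, b) => a.y - b.y || a.x - b.x)
  return sorted
    .map(b => `[${b.x.toFixed(2)},${b.y.toFixed(2)}] ${b.text}`)
    .join('\n')
}

const context = blocksToPromptContext([
  { text: 'Total: $42.00', x: 0.1, y: 0.8, width: 0.3, height: 0.05 },
  { text: 'Invoice #123', x: 0.1, y: 0.1, width: 0.4, height: 0.05 },
])
// 'Invoice #123' is listed before 'Total: $42.00'
```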


CLI

# OCR — plain text (default)
npx macos-vision photo.jpg

# Structured OCR blocks with bounding boxes
npx macos-vision --blocks photo.jpg

# Detect faces
npx macos-vision --faces photo.jpg

# Detect barcodes and QR codes
npx macos-vision --barcodes photo.jpg

# Detect rectangular shapes
npx macos-vision --rectangles photo.jpg

# Find document boundary
npx macos-vision --document photo.jpg

# Classify image content
npx macos-vision --classify photo.jpg

# Run all detections at once
npx macos-vision --all photo.jpg

Multiple flags can be combined: npx macos-vision --blocks --faces --classify photo.jpg

Structured results are printed as JSON to stdout.


API

import { ocr, detectFaces, detectBarcodes, detectRectangles, detectDocument, classify, inferLayout } from 'macos-vision'

// OCR — plain text
const text = await ocr('photo.jpg')

// OCR — structured blocks with bounding boxes
const blocks = await ocr('photo.jpg', { format: 'blocks' })

// Detect faces
const faces = await detectFaces('photo.jpg')

// Detect barcodes and QR codes
const codes = await detectBarcodes('invoice.jpg')

// Detect rectangular shapes (tables, forms, cards)
const rects = await detectRectangles('document.jpg')

// Find document boundary in a photo
const doc = await detectDocument('photo.jpg') // DocumentBounds | null

// Classify image content
const labels = await classify('photo.jpg')

// Layout inference — unified reading-order-sorted representation
const layout = inferLayout({ textBlocks: blocks, faces, barcodes: codes })
// layout is LayoutBlock[] — ready to feed into a Markdown renderer or LLM context

Layout inference

inferLayout merges raw Vision results into a unified LayoutBlock[] sorted in reading order (top-to-bottom, left-to-right). Text blocks are grouped into lines and paragraphs using geometric heuristics.

import { ocr, detectFaces, detectBarcodes, inferLayout } from 'macos-vision';

const blocks   = await ocr('page.png', { format: 'blocks' });
const faces    = await detectFaces('page.png');
const barcodes = await detectBarcodes('page.png');

const layout = inferLayout({ textBlocks: blocks, faces, barcodes });

for (const block of layout) {
  if (block.kind === 'text') {
    console.log(`[p${block.paragraphId} l${block.lineId}] ${block.text}`);
  } else {
    console.log(`[${block.kind}] at (${block.x.toFixed(2)}, ${block.y.toFixed(2)})`);
  }
}
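The resulting array can be turned into Markdown with a small renderer of your own. A sketch (the `LayoutBlock` shape here mirrors the discriminated union documented below; the rendering rules, paragraph breaks, and blockquote convention are this example's own choices):

```typescript
// Sketch: render a LayoutBlock[] (as produced by inferLayout) to Markdown.
type LayoutBlock =
  | { kind: 'text'; text: string; lineId: number; paragraphId: number; x: number; y: number }
  | { kind: 'barcode'; value: string; type: string; x: number; y: number }
  | { kind: 'face' | 'rectangle' | 'document'; x: number; y: number }

function layoutToMarkdown(layout: LayoutBlock[]): string {
  const out: string[] = []
  let lastParagraph = -1
  for (const block of layout) {
    if (block.kind === 'text') {
      // Blank line between paragraphs; lines within a paragraph stay adjacent.
      if (block.paragraphId !== lastParagraph && out.length > 0) out.push('')
      out.push(block.text)
      lastParagraph = block.paragraphId
    } else if (block.kind === 'barcode') {
      out.push(`> Barcode (${block.type}): ${block.value}`)
      lastParagraph = -1
    }
    // faces, rectangles, and documents are skipped in this sketch
  }
  return out.join('\n')
}

const md = layoutToMarkdown([
  { kind: 'text', text: 'Hello', lineId: 0, paragraphId: 0, x: 0, y: 0 },
  { kind: 'text', text: 'World', lineId: 1, paragraphId: 1, x: 0, y: 0.2 },
  { kind: 'barcode', value: 'https://example.com', type: 'org.iso.QRCode', x: 0, y: 0.5 },
])
```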

LayoutBlock is a discriminated union — use block.kind to narrow the type:

kind          Extra fields
'text'        text, lineId, paragraphId
'barcode'     value, type
'face'        (none)
'rectangle'   (none)
'document'    (none)

Note: Layout inference is a heuristic layer. It does not understand multi-column layouts or rotated text. Treat it as structured input for downstream tools, not as ground truth.
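The kind of geometric heuristic involved can be illustrated as follows. This is a sketch of the general approach (blocks whose vertical centers fall within a tolerance are treated as one line), not the library's actual implementation:

```typescript
// Illustration of a reading-order sort: blocks whose vertical centers are
// within lineTolerance of each other count as one line and are ordered
// left-to-right; distinct lines are ordered top-to-bottom.
interface Box { text: string; x: number; y: number; width: number; height: number }

function readingOrder(blocks: Box[], lineTolerance = 0.01): Box[] {
  return [...blocks].sort((a, b) => {
    const dy = (a.y + a.height / 2) - (b.y + b.height / 2)
    if (Math.abs(dy) > lineTolerance) return dy  // different lines: top first
    return a.x - b.x                             // same line: left first
  })
}

const ordered = readingOrder([
  { text: 'right', x: 0.6, y: 0.100, width: 0.2, height: 0.04 },
  { text: 'below', x: 0.1, y: 0.300, width: 0.2, height: 0.04 },
  { text: 'left',  x: 0.1, y: 0.102, width: 0.2, height: 0.04 },
])
// → left, right, below
```

This simple version is exactly why multi-column layouts defeat it: a line-by-line sweep interleaves the columns.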

API reference

ocr(imagePath, options?)

Extracts text from an image.

Parameter        Type                Default      Description
imagePath        string              (required)   Path to image (PNG, JPG, JPEG, WEBP)
options.format   'text' | 'blocks'   'text'       Plain text or structured blocks with coordinates

Returns Promise<string> or Promise<VisionBlock[]>.

interface VisionBlock {
  text: string
  x: number       // 0–1 from left
  y: number       // 0–1 from top
  width: number   // 0–1
  height: number  // 0–1
}
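Because coordinates are normalized, converting a block to pixel coordinates only requires the image dimensions. A sketch (not part of the library):

```typescript
// Convert a normalized VisionBlock (0–1 coordinates, top-left origin)
// to pixel coordinates for a known image size.
interface VisionBlock { text: string; x: number; y: number; width: number; height: number }

function toPixels(block: VisionBlock, imageWidth: number, imageHeight: number) {
  return {
    text: block.text,
    x: Math.round(block.x * imageWidth),
    y: Math.round(block.y * imageHeight),
    width: Math.round(block.width * imageWidth),
    height: Math.round(block.height * imageHeight),
  }
}

const px = toPixels({ text: 'Hello', x: 0.25, y: 0.5, width: 0.5, height: 0.1 }, 800, 600)
// → { text: 'Hello', x: 200, y: 300, width: 400, height: 60 }
```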

detectFaces(imagePath)

Detects human faces and returns their bounding boxes.

interface Face {
  x: number; y: number; width: number; height: number
  confidence: number  // 0–1
}

detectBarcodes(imagePath)

Detects barcodes and QR codes and decodes their payload.

interface Barcode {
  type: string    // e.g. 'org.iso.QRCode', 'org.gs1.EAN-13'
  value: string   // decoded content
  x: number; y: number; width: number; height: number
}
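For example, picking out only the QR-code payloads from a result, using the 'org.iso.QRCode' type identifier shown above (the helper itself is a sketch, not part of the library):

```typescript
// Extract decoded QR payloads from a barcode detection result.
interface Barcode { type: string; value: string; x: number; y: number; width: number; height: number }

function qrPayloads(codes: Barcode[]): string[] {
  return codes.filter(c => c.type === 'org.iso.QRCode').map(c => c.value)
}

const payloads = qrPayloads([
  { type: 'org.iso.QRCode', value: 'https://example.com', x: 0, y: 0, width: 0.1, height: 0.1 },
  { type: 'org.gs1.EAN-13', value: '4006381333931', x: 0.5, y: 0.5, width: 0.2, height: 0.1 },
])
// → ['https://example.com']
```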

detectRectangles(imagePath)

Finds rectangular shapes (documents, tables, cards, forms).

interface Rectangle {
  topLeft: [number, number]; topRight: [number, number]
  bottomLeft: [number, number]; bottomRight: [number, number]
  confidence: number
}
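Since a Rectangle is four corner points rather than an axis-aligned box, deriving a size takes a small geometric step. A sketch computing the normalized area with the shoelace formula (an illustration; the `quadArea` helper is not part of the library):

```typescript
// Compute the normalized area of a detected quadrilateral via the
// shoelace formula. Corner fields match the Rectangle interface above.
type Point = [number, number]
interface Rectangle {
  topLeft: Point; topRight: Point
  bottomLeft: Point; bottomRight: Point
  confidence: number
}

function quadArea(r: Rectangle): number {
  // Corners in a consistent winding order.
  const pts = [r.topLeft, r.topRight, r.bottomRight, r.bottomLeft]
  let sum = 0
  for (let i = 0; i < pts.length; i++) {
    const [x1, y1] = pts[i]
    const [x2, y2] = pts[(i + 1) % pts.length]
    sum += x1 * y2 - x2 * y1
  }
  return Math.abs(sum) / 2
}

const area = quadArea({
  topLeft: [0.1, 0.1], topRight: [0.9, 0.1],
  bottomLeft: [0.1, 0.9], bottomRight: [0.9, 0.9],
  confidence: 0.95,
})
// → 0.64 (an 0.8 × 0.8 square)
```

Useful, for instance, to discard detections that cover too little of the frame.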

detectDocument(imagePath)

Finds the boundary of a document in a photo (e.g. paper on a desk). Returns null if no document is found.

interface DocumentBounds {
  topLeft: [number, number]; topRight: [number, number]
  bottomLeft: [number, number]; bottomRight: [number, number]
  confidence: number
}

classify(imagePath)

Returns top image classification labels with confidence scores.

interface Classification {
  identifier: string   // e.g. 'document', 'outdoor', 'animal'
  confidence: number   // 0–1
}
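A common follow-up is keeping only confident labels, sorted best-first. A sketch (threshold and helper are this example's own):

```typescript
// Keep classification labels above a confidence threshold, highest first.
interface Classification { identifier: string; confidence: number }

function topLabels(labels: Classification[], minConfidence = 0.5): string[] {
  return labels
    .filter(l => l.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence)
    .map(l => l.identifier)
}

const picks = topLabels([
  { identifier: 'document', confidence: 0.92 },
  { identifier: 'outdoor', confidence: 0.31 },
  { identifier: 'paper', confidence: 0.77 },
])
// → ['document', 'paper']
```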

Why macos-vision?

                          macos-vision   Tesseract.js   Cloud APIs
Offline                   ✓              ✓              ✗
No API key                ✓              ✓              ✗
Native speed              ✓              ✗              ✗
Zero runtime deps         ✓              ✗              ✗
OCR with bounding boxes   ✓              ✓              ✓
Face detection            ✓              ✗              ✓
Barcode / QR              ✓              ✗              ✓
Document detection        ✓              ✗              ✓
Image classification      ✓              ✗              ✓
macOS only                ✓              ✗              ✗

Apple Vision is the same engine used by macOS Spotlight, Live Text, and Shortcuts — highly optimized and accurate.

License

MIT
