A native, blazingly fast macOS application automation CLI designed specifically for AI agents.
Similar to Anthropic's Computer Use, application-use empowers AI agents to operate desktop applications. However, instead of relying on heavy visual inference and imprecise (x, y) coordinate clicks derived from full-screen screenshots, application-use provides a textual interface built directly on macOS's native Accessibility APIs and the Apple Vision framework.
By operating at the OS level, it retrieves a highly structured view of the UI instantly. This approach achieves superior speed, deterministic accuracy, and significantly better results for LLMs navigating complex desktop interfaces.
- Native OS APIs: Uses macOS's native Accessibility API (`AXUIElement`) for instant, reliable UI-tree extraction and precise element interaction.
- Vision Integration: Powered by Apple's built-in Vision framework for robust OCR and screen analysis, capturing and interacting with elements that standard APIs miss.
- AI-Optimized Interface: Converts visual spatial information into structured text representations (snapshots with alphabet hints like `JK`). LLMs can instantly map these hints back to specific actions.
- Fast & Lightweight: Built meticulously with Go and Swift (via a CGO bridge) for maximum native performance without heavy dependencies.
Install the native binary directly from npm:

```bash
npm install -g application-use@latest
```

Add the skill to your AI coding assistant for richer context:

```bash
npx skills add qdore/application-use
```

Building from source requires Go (1.20+) and the Xcode Command Line Tools.
```bash
git clone <repository-url>
cd application-use
# Build the Swift static bridge and Go executable
make clean && make
```

```bash
# Search for installed apps
application-use search "safari"

# Open an application
application-use open --appName "Safari"

# Take a structural snapshot (returns an annotated accessibility tree for AI)
application-use snapshot --appName "Safari"

# Click an element by its hint letters (e.g., 'JK' returned from the snapshot)
application-use click JK --appName "Safari"

# Fill text into an element
application-use fill JK "hello world" --appName "Safari"

# Send keystrokes directly
application-use sendkey cmd+t --appName "Safari"

# Take a screenshot
application-use screenshot result.png --appName "Safari"

# Close the application
application-use close --appName "Safari"
```

```bash
application-use open --appName <app>                      # Open a specific application
application-use snapshot --appName <app>                  # Snapshot the UI tree & overlay alphabet hints on interactive elements
application-use click <hint> --appName <app>              # Click element by hint (e.g. 'JK'). Supports: --right, --double
application-use fill [hint] <text> --appName <app>        # Fill text into an element or the current focus (uses clipboard paste)
application-use sendkey <key> --appName <app>             # Send a key combination (e.g., cmd+v, enter, esc)
application-use scroll <area> <dir> [px] --appName <app>  # Scroll a specific UI block (up/down/left/right)
application-use screenshot [path] --appName <app>         # Take a visual screenshot of the window
application-use search [query]                            # Search for installed applications
application-use close --appName <app>                     # Gracefully close the application
application-use upgrade                                   # Check for updates and upgrade automatically
```

Instead of relying on fragile coordinate clicks (x, y), application-use implements an LLM-friendly snapshot-and-interact pattern:
- Snapshot: `application-use snapshot` queries the native Accessibility tree and layers in Vision OCR data. It assigns a deterministic 2-3 letter hint (like `AA` or `JK`) to every clickable, editable, or readable element.
- Understand: The LLM parses the printed structural tree to understand the UI layout. `(+)` markers indicate pure text elements; `(*)` markers indicate elements discovered primarily via OCR.
- Interact: The LLM issues a command like `application-use click JK` or `application-use fill AB "admin"`. The CLI uses OS-level handles to perform the action instantly.
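The pattern above can be sketched as a shell session. This is an illustration, not a prescribed workflow: it uses Safari as the target app and a hard-coded key sequence where a real agent would read the snapshot output and choose hints; the `au` wrapper falls back to printing each command when the binary is not installed, so the sketch is safe to run anywhere.

```bash
#!/bin/sh
# Sketch of the snapshot-and-interact loop. Assumptions: "Safari" is the
# target app; if application-use is not on PATH, `au` prints the command
# (dry run) instead of executing it.
au() {
  if command -v application-use >/dev/null 2>&1; then
    application-use "$@"
  else
    echo "dry-run: application-use $*"
  fi
}

au open --appName "Safari"                          # 1. launch the target app
au snapshot --appName "Safari"                      # 2. print the annotated UI tree
au sendkey cmd+l --appName "Safari"                 # 3. Cmd+L focuses the address bar
au fill "https://example.com" --appName "Safari"    # 4. fill text into the current focus
au sendkey enter --appName "Safari"                 # 5. navigate
au screenshot /tmp/result.png --appName "Safari"    # 6. capture the result for review
au close --appName "Safari"
```

In a real agent loop, the model would parse the snapshot between steps and issue hint-based actions such as `au click JK` instead of the fixed sequence shown here.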
Because application-use interacts natively with OS controls and visual outputs, it requires two macOS permissions on the first run:
- Accessibility: `System Settings > Privacy & Security > Accessibility` (to read the UI tree and inject clicks/keys)
- Screen Recording: `System Settings > Privacy & Security > Screen Recording` (required for Vision OCR and taking snapshots)
If either permission is missing, the CLI prints a targeted error message instructing the user to enable it.
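To save the user a trip through the Settings hierarchy, both panes can be opened directly via macOS's settings URL scheme. A sketch under the assumption that these commonly documented deep-link URLs still resolve on your macOS version (Apple has changed pane identifiers between releases):

```bash
# Deep links to the two permission panes (assumed identifiers; verify on
# your macOS version). On non-macOS systems this just prints the targets.
ACCESSIBILITY_PANE="x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility"
SCREEN_PANE="x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture"

for pane in "$ACCESSIBILITY_PANE" "$SCREEN_PANE"; do
  if [ "$(uname)" = "Darwin" ]; then
    open "$pane"          # jump straight to the pane in System Settings
  else
    echo "enable: $pane"  # not macOS: print the target instead
  fi
done
```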
Currently, application-use is designed for macOS. We welcome pull requests to add support for other operating systems (Windows, Linux, etc.).