The automation layer between AI agents and iOS
Mobile browser and native app automation via iOS Simulator. Built for AI agents — start with coordinates, escalate to vision only when needed.
Numbered marks give you coordinates — call the vision API only when you're stuck.
- Browser: markElements → getMarkInfo → tap x,y → screenshot if stuck
- Native: openApp → snapshot → find/tapRef → screenshot if stuck
Tiered automation:
- Coordinates first — marks (browser) or refs (native) give you x,y without vision calls (90% of actions)
- Vision fallback — screenshot when stuck to see modals/blockers
- JS injection — direct DOM manipulation as a last resort (browser only)
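The first two tiers can be sketched as a small shell helper. This is a hypothetical wrapper, not part of Atl: it assumes the `/command` endpoint shown in the examples below, and the helper names (`tap_body`, `atl_try_tap`) as well as the `screenshot` method name are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical escalation helper (illustrative; not shipped with Atl).
SIM_URL="${SIM_URL:-http://localhost:9222}"

# Build the JSON body for a coordinate tap (tier 1).
tap_body() {
  printf '{"method":"tap","params":{"x":%d,"y":%d}}' "$1" "$2"
}

# Tier 1: tap by coordinates. Tier 2: on failure, fall back to a
# screenshot so the agent can look at the screen ("screenshot" is an
# assumed method name here).
atl_try_tap() {
  if ! curl -sf -X POST "$SIM_URL/command" -d "$(tap_body "$1" "$2")"; then
    echo "tap failed, escalating to vision" >&2
    curl -sf -X POST "$SIM_URL/command" -d '{"method":"screenshot"}'
  fi
}
```

`curl -sf` makes the fallback trigger on any HTTP error, so the agent only pays for a vision call when the cheap coordinate path fails.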
```bash
git clone https://github.com/JordanCoin/Atl.git
cd Atl/core
./bin/atl-sim start
# API ready at http://localhost:9222
```

Swift CLI for agent-friendly automation:
```bash
cd core/cli && swift build -c release
.build/release/atl ping                                 # Check server
.build/release/atl goto https://target.com              # Navigate
.build/release/atl wait --for-selector ".product-grid"  # Wait for element (SPA-safe)
.build/release/atl snapshot                             # Get marks + text
.build/release/atl click "Add to cart"                  # Click by text
.build/release/atl pdf -o cart.pdf                      # Capture PDF
```

Wait strategies (for SPAs like Target, Amazon):
- `--for-selector` — Wait for a CSS selector (most reliable)
- `--for-text` — Wait for text to appear
- `--network 500` — Wait for network idle (500 ms quiet)
- Default: DOM stability (500 ms with no mutations)
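As an illustration of how such wait strategies behave, here is a generic poll-until loop in shell. It is a sketch, not Atl's implementation: it retries a predicate command until it succeeds or a timeout elapses, much as `--network 500` waits for a quiet window.

```shell
# Generic poll-until helper (sketch only; Atl implements its wait
# strategies internally). Retries a predicate command every 0.5 s
# until it succeeds or the timeout (in seconds) elapses.
poll_until() {
  timeout=$1; shift
  deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    [ "$(date +%s)" -ge "$deadline" ] && return 1
    sleep 0.5
  done
}
```

For example, `poll_until 10 .build/release/atl ping` would retry until the server answers, assuming `atl ping` exits non-zero while the server is down.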
```bash
# Navigate
curl -X POST localhost:9222/command -d '{"method":"goto","params":{"url":"https://target.com"}}'

# Mark all interactive elements
curl -X POST localhost:9222/command -d '{"method":"markAll"}'

# Get coordinates for element #26
curl -X POST localhost:9222/command -d '{"method":"getMarkInfo","params":{"label":26}}'
# → {"x":43, "y":539, "text":"Add to cart"}

# Tap at exact coordinates
curl -X POST localhost:9222/command -d '{"method":"tap","params":{"x":214,"y":561}}'
```
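The getMarkInfo → tap handoff above can be scripted without any vision calls. A sketch, assuming the `{"x":…,"y":…,"text":…}` response shape shown above; the helper name `mark_xy` is illustrative and uses python3 for JSON parsing.

```shell
# Extract "x y" from a getMarkInfo-style JSON response on stdin
# (sketch; assumes the {"x":..,"y":..,"text":..} shape shown above).
mark_xy() {
  python3 -c 'import json,sys; d = json.load(sys.stdin); print(d["x"], d["y"])'
}

# Handoff (requires a running server):
#   read -r X Y < <(curl -s -X POST localhost:9222/command \
#     -d '{"method":"getMarkInfo","params":{"label":26}}' | mark_xy)
#   curl -s -X POST localhost:9222/command \
#     -d "{\"method\":\"tap\",\"params\":{\"x\":$X,\"y\":$Y}}"
```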
```bash
# Open the Settings app
curl -X POST localhost:9222/command -d '{"method":"openApp","params":{"bundleId":"com.apple.Preferences"}}'

# Snapshot the accessibility tree (get refs for all elements)
curl -X POST localhost:9222/command -d '{"method":"snapshot","params":{"interactiveOnly":true}}'

# Find and tap Wi-Fi
curl -X POST localhost:9222/command -d '{"method":"find","params":{"text":"Wi-Fi","action":"tap"}}'

# Switch back to the browser
curl -X POST localhost:9222/command -d '{"method":"openBrowser"}'
```

Install the OpenClaw skill:

```bash
openclaw skills install ./core/skill
```

The full skill documentation includes:
- Vision-free automation workflow
- Escalation ladder (coordinates → vision → JS)
- Touch gestures (tap, swipe, pinch)
- Helper bash functions
- Best practices & troubleshooting
See BROWSER-AUTOMATION.md for a quick-start guide.
- Full README - Complete API reference
- OpenClaw Skill - AI agent integration
- OpenAPI Spec - Machine-readable API
- macOS with Xcode (for iOS Simulator)
- That's it!
MIT - see LICENSE