The automation layer between AI agents and iOS
Fast browser and native app automation via iOS Simulator. Built for AI agents — complete tasks in minimal API calls with a mark-and-click system that doesn't break.
Numbered marks on interactive elements — no vision model needed to know what to click
git clone https://github.com/JordanCoin/Atl.git
cd atl
./bin/atl-sim startAPI ready at http://localhost:9222.
Every page interaction is 3 calls:
goto → markAll → clickMark
That's it. No CSS selectors, no XPath, no DOM inspection. Just:
- Go to page
- Mark everything (get labels + text)
- Click by number
./bin/cart bestbuy "wireless earbuds"Returns:
{
"success": true,
"merchant": "bestbuy",
"step": "complete",
"warnings": [],
"cart": {
"items": 1,
"total": "$211.99"
},
"pdf": "Screenshots/bestbuy-cart-20251230-232143.pdf"
}5 API calls total. ~10 seconds per cart. ~360 carts/hour.
Always returns a PDF of final state → example cart PDF
Supported merchants: amazon, bestbuy, ebay, target, walmart, homedepot
curl -X POST http://localhost:9222/command \
-d '{"id":"1","method":"markAll"}'Returns every interactive element with a label number:
{
"result": {
"count": 153,
"elements": [
{"label": 0, "text": "Home", "href": "..."},
{"label": 25, "text": "Add to cart", "selector": "button.add-to-cart"},
...
]
}
}Then click by label:
curl -X POST http://localhost:9222/command \
-d '{"id":"2","method":"clickMark","params":{"label":25}}'Programmatic - search the JSON:
curl ... -d '{"method":"markAll"}' | jq '[.result.elements[] | select(.text | test("add to cart"; "i"))]'Visual - look at the PDF:
curl ... -d '{"method":"screenshot","params":{"fullPage":true}}' | jq -r '.result.data' | base64 -d > page.pdfThe PDF shows numbered labels on every element.
| Command | Parameters | Description |
|---|---|---|
goto |
{url} |
Navigate to URL |
reload |
- | Reload page |
back |
- | Go back |
forward |
- | Go forward |
| Command | Parameters | Description |
|---|---|---|
markAll |
- | Label ALL elements on page (recommended) |
markElements |
- | Label viewport-visible only |
clickMark |
{label} |
Click by label number |
unmarkElements |
- | Remove labels |
| Command | Parameters | Description |
|---|---|---|
fill |
{selector, value} |
Fill input field |
type |
{text} |
Type into focused element |
press |
{key} |
Press key (Enter, Tab, etc.) |
click |
{selector} |
Click by CSS selector |
| Command | Parameters | Description |
|---|---|---|
screenshot |
{fullPage?} |
PNG viewport or PDF full page |
captureLight |
- | Text + interactives only (~99% smaller) |
| Command | Parameters | Description |
|---|---|---|
querySelector |
{selector} |
Get element info |
querySelectorAll |
{selector} |
Get all matching elements |
waitForSelector |
{selector} |
Wait for element |
| Command | Parameters | Description |
|---|---|---|
tap |
{x, y} |
Tap at coordinates |
longPress |
{x, y, duration?} |
Long press (default 0.5s) |
swipe |
{direction} or {fromX, fromY, toX, toY} |
Swipe up/down/left/right |
pinch |
{scale} |
Pinch zoom (scale > 1 = zoom in) |
getMarkInfo |
{label} |
Get element coordinates by label |
Vision-free coordinate workflow:
# 1. Mark elements
curl -X POST http://localhost:9222/command -d '{"method":"markAll"}'
# 2. Get coordinates for element 26
curl -X POST http://localhost:9222/command -d '{"method":"getMarkInfo","params":{"label":26}}'
# → {"x":43, "y":539, "width":343, "height":44, "text":"Add to cart"}
# 3. Tap at center of element
curl -X POST http://localhost:9222/command -d '{"method":"tap","params":{"x":214,"y":561}}'- Searchable text - no OCR needed
- Full page - entire scrollable content
- Smaller - text-heavy pages compress well
- Multimodal - send directly to Claude/GPT-4V for analysis
Full skill with docs, helper scripts, and best practices: skill/
# Install from this repo
git clone https://github.com/JordanCoin/Atl.git
openclaw skills install ./Atl/core/skill
# Or if you already have the repo cloned
openclaw skills install ./skillThe skill includes:
- Complete API documentation
- Vision-free automation workflow (no GPT-4V costs!)
- Escalation ladder (coordinates → vision → JS injection)
- Bash helper functions (
atl_tap,atl_swipe,atl_mark, etc.) - Real-world e-commerce checkout patterns
The HTTP API is simple enough for any agent to use directly:
- Mark elements:
POST /commandwith{"method":"markAll"}→ get labeled elements - Find by text: Search the JSON for buttons/links you want
- Click by label:
{"method":"clickMark","params":{"label":25}} - Screenshot if stuck:
{"method":"screenshot"}→ see what's blocking you
See BROWSER-AUTOMATION.md for a quick-start guide, or dive into the full skill documentation.
ATL supports native iOS app automation alongside browser automation. Switch modes seamlessly within the same session.
| Command | Parameters | Description |
|---|---|---|
openApp |
{bundleId} |
Open native app, switch to native mode |
closeApp |
- | Close current app |
appState |
- | Get current mode and bundleId |
snapshot |
{interactiveOnly?, maxDepth?} |
Get accessibility tree with refs |
tapRef |
{ref} |
Tap element by ref (e.g., "e5") |
find |
{text, action?, by?, value?} |
Find element and optionally act |
openBrowser |
- | Switch back to browser mode |
# Open Settings
curl -X POST localhost:9222/command \
-d '{"method":"openApp","params":{"bundleId":"com.apple.Preferences"}}'
# Get accessibility snapshot
curl -X POST localhost:9222/command \
-d '{"method":"snapshot","params":{"interactiveOnly":true}}'
# → {"count":42,"elements":[{"ref":"e0","type":"cell","label":"Wi-Fi",...},...]}
# Find and tap Wi-Fi
curl -X POST localhost:9222/command \
-d '{"method":"find","params":{"text":"Wi-Fi","action":"tap"}}'
# Or tap by ref directly
curl -X POST localhost:9222/command \
-d '{"method":"tapRef","params":{"ref":"e5"}}'
# Switch back to browser
curl -X POST localhost:9222/command \
-d '{"method":"openBrowser"}'These commands work identically in browser and native mode:
| Command | Description |
|---|---|
tap |
Tap at x,y coordinates |
longPress |
Long press at coordinates |
swipe |
Swipe in direction |
pinch |
Pinch zoom |
screenshot |
Capture screen |
No App Store downloads needed — use these system apps:
| App | Bundle ID |
|---|---|
| Settings | com.apple.Preferences |
| Contacts | com.apple.MobileAddressBook |
| Calculator | com.apple.calculator |
| Calendar | com.apple.mobilecal |
| Notes | com.apple.mobilenotes |
| Reminders | com.apple.reminders |
| Clock | com.apple.mobiletimer |
| Files | com.apple.DocumentsApp |
./bin/atl-sim start # Boot simulator, start server
./bin/atl-sim stop # Stop the app
./bin/atl-sim status # Check if running
./bin/cart <merchant> [search] # Run cart flowOn failure, you always get:
success: falsestep- where it failederror- what went wrongpdf- screenshot of failure state
{
"success": false,
"step": "find_add_to_cart",
"error": "No Add to Cart button found",
"pdf": "Screenshots/target-cart-20251230-232318.pdf"
}Look at the PDF to see what went wrong.
ATL runs an unauthenticated HTTP server on port 9222. This is designed for local development only.
- The server binds to
127.0.0.1(localhost) and rejects external connections - Never expose port 9222 to the network or internet
- Never run ATL on a shared machine with untrusted users
- The server has full control over the browser - treat it like a debug port
- macOS with Xcode (for iOS Simulator)
- No build step - pre-built app included
MIT