Control any phone from code. Android for all use cases, iOS for QA/testing.
A vanilla JS library that gives AI agents (or any code) control of mobile devices. Same patterns as barebrowse for web — agents learn one API, use it for both web and mobile.
No Appium. No Java server. No build step. Zero required dependencies.
| # | Module | Platform | Use case | What it does | Requirements |
|---|---|---|---|---|---|
| 1 | Core ADB | Android | QA, automation | Full screen control — accessibility tree snapshots, tap/type/swipe by ref, screenshots, app lifecycle | adb in PATH, USB debugging enabled |
| 2 | Termux ADB | Android | QA, autonomous agents | Same full screen control, but runs on the phone itself — no host machine needed | Termux app, wireless debugging |
| 3 | Termux:API | Android | QA, autonomous agents | Direct Android APIs — SMS, calls, location, camera, clipboard, contacts, notifications. No screen control. | Termux + Termux:API app |
| 4 | iOS (WDA) | iOS | QA/testing only | Same snapshot() → tap(ref) as Android. Real accessibility tree via WDA, native element click, type, scroll, screenshots. | WDA on device, USB cable, Python 3.12 (setup only) |
Modules 1 and 2 are the same API — one runs on a host machine, the other on the phone itself. Module 3 adds direct Android APIs (SMS, GPS, camera) and pairs with Module 2 for full autonomous agents. Module 4 brings the same ref-based pattern to iOS.
Who it's for: QA teams, automation engineers, AI agent builders who want to control Android devices from a host machine (laptop, server, CI runner).
How it connects: USB cable, WiFi, or emulator. Uses adb directly.
| Capability | How |
|---|---|
| Read the screen | page.snapshot() — pruned accessibility tree as YAML with [ref=N] markers |
| Tap elements | page.tap(5) — tap by ref number from snapshot |
| Type text | page.type(3, 'hello') — focus field + type |
| Navigate | page.back(), page.home(), page.press('enter') |
| Scroll | page.scroll(ref, 'down') — within any scrollable element |
| Launch apps | page.launch('com.android.settings') — by package name |
| Take screenshots | page.screenshot() — PNG buffer, ~0.5s |
| Deep link | page.intent('android.settings.BLUETOOTH_SETTINGS') |
| Wait for state | page.waitForText('Success', 5000) — poll until text appears |
| Vision fallback | page.tapXY(x, y) or page.tapGrid('C5') — when accessibility tree fails |
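
The "Wait for state" row can be pictured as a plain polling loop. The sketch below is not baremobile's implementation: `pollForText` and its `intervalMs` parameter are hypothetical names, and the only assumption about the library is that `page.snapshot()` resolves to a string.

```javascript
// Hypothetical helper: poll page.snapshot() until `text` appears or time runs out.
// Assumes only that page.snapshot() resolves to the snapshot string.
async function pollForText(page, text, timeoutMs = 5000, intervalMs = 250) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const snapshot = await page.snapshot();
    if (snapshot.includes(text)) return snapshot; // found: hand back the matching snapshot
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms waiting for "${text}"`);
    }
    // Sleep before the next poll so the device is not hammered with requests.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Returning the matching snapshot lets the caller pick a ref from it right away instead of taking a second snapshot.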
Run the interactive setup wizard — it handles adb install, SDK setup, and device connection:
baremobile setup # choose Android → pick your connection mode

4 connection modes:
| Mode | Use case | What the wizard does |
|---|---|---|
| Emulator | QA/testing without a phone | Installs SDK (~3GB), creates AVD, launches emulator |
| USB | QA/testing with a phone | Checks adb, guides USB debugging setup, detects device |
| WiFi | Personal assistant | Interactive — enables USB debugging, detects device, runs adb tcpip, auto-detects IP, connects. Auto-reconnects on DHCP changes. |
| Termux | Autonomous on-device agent | Guides Termux package install + wireless debugging |
Minimum version: Android 10+ (2019 or newer).
Emulator note: The emulator uses Google APIs system image (includes Play Store for app installs).
Manual setup (if you prefer):
- Install Android SDK platform-tools (puts adb in PATH)
- On the phone: Settings > About phone > tap "Build number" 7 times > enable USB debugging
- Connect USB, tap "Allow" on the debugging prompt
- Verify: adb devices shows your device
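
The verification step can also be scripted. This is a small sketch, not part of baremobile: `parseAdbDevices` is a hypothetical helper that parses the text printed by `adb devices`, and the sample string stands in for real command output.

```javascript
// Parse the text printed by `adb devices` into { serial, state } records.
function parseAdbDevices(output) {
  return output
    .split('\n')
    .map((line) => line.trim())
    // Skip the "List of devices attached" header, blank lines, and "* daemon ..." notices.
    .filter((line) => line && !line.startsWith('List of devices') && !line.startsWith('*'))
    .map((line) => {
      const [serial, state] = line.split(/\s+/);
      return { serial, state };
    });
}

// Sample output: one ready emulator, one phone whose debugging prompt was not accepted yet.
const sample = 'List of devices attached\nemulator-5554\tdevice\nR58M12345\tunauthorized\n';
const ready = parseAdbDevices(sample).filter((d) => d.state === 'device');
console.log(ready); // [ { serial: 'emulator-5554', state: 'device' } ]
```

A state of `unauthorized` usually means the "Allow" prompt on the phone has not been accepted.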
import { connect } from 'baremobile';
const page = await connect();
console.log(await page.snapshot()); // see what's on screen
await page.tap(5); // tap element ref 5
await page.type(3, 'hello world'); // type into ref 3
await page.launch('com.whatsapp'); // open WhatsApp
await page.screenshot(); // PNG buffer
page.close();

- ScrollView [ref=1]
  - Group
    - Text "Settings"
  - ScrollView [ref=3]
    - List
      - Group [ref=4]
        - Text "Network & internet"
        - Text "Mobile, Wi-Fi, hotspot"
      - Group [ref=5]
        - Text "Connected devices"
        - Text "Bluetooth, pairing"

Compact, token-efficient. Only interactive elements get refs. Agent reads, picks a ref, acts.
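
For code that needs the refs as data rather than text, the snapshot can be scanned line by line. The helper below is a hypothetical sketch, not a baremobile export. It assumes each interactive line looks like `- Role [ref=N]`, optionally followed by a quoted label, which matches the samples in this document but may not cover every snapshot.

```javascript
// Hypothetical helper: list the interactive elements found in a snapshot string.
// Assumes lines of the form `- Role [ref=N]` with an optional trailing "label".
function extractRefs(snapshot) {
  const refs = [];
  for (const line of snapshot.split('\n')) {
    const match = line.match(/^\s*-\s+(\w+)\s+\[ref=(\d+)\](?:\s+"([^"]*)")?/);
    if (match) {
      refs.push({ role: match[1], ref: Number(match[2]), label: match[3] ?? null });
    }
  }
  return refs;
}

const demo = '- List [ref=1]\n  - Cell [ref=2] "Wi-Fi"\n  - Text "Settings"';
console.log(extractRefs(demo));
// [ { role: 'List', ref: 1, label: null }, { role: 'Cell', ref: 2, label: 'Wi-Fi' } ]
```

Lines without a ref marker, such as plain Text nodes, are skipped because they cannot be tapped by ref.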
Who it's for: Autonomous agents that run on the phone itself (via Termux). No host machine, no USB cable. The phone controls itself.
How it connects: ADB over localhost. Same commands, same API — serial is just localhost:PORT instead of a USB address.
Everything from Core ADB works identically. The only differences:
| | Core ADB | Termux ADB |
|---|---|---|
| Runs on | Host machine (laptop, server) | The phone itself (Termux) |
| Connection | USB cable or WiFi | Localhost (wireless debugging) |
| Connect call | connect() | connect({ termux: true }) |
| Requires | adb on host | android-tools in Termux |
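
Since the connect call is the only code-level difference, one script can serve both environments by picking the options at startup. The sketch below is an assumption-laden illustration: `connectOptionsFor` is a hypothetical helper, and it relies on Termux setting the `PREFIX` environment variable to a path under `/data/data/com.termux`.

```javascript
// Hypothetical helper: choose connect() options from the environment.
// Termux sets PREFIX to a path under /data/data/com.termux; a host machine normally does not.
function connectOptionsFor(env) {
  const prefix = env.PREFIX || '';
  return prefix.includes('com.termux') ? { termux: true } : {};
}

// Usage (needs baremobile and a device, so it is left as a comment):
//   const page = await connect(connectOptionsFor(process.env));
console.log(connectOptionsFor({ PREFIX: '/data/data/com.termux/files/usr' })); // { termux: true }
console.log(connectOptionsFor({})); // {}
```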
# In Termux on the phone:
pkg install android-tools nodejs-lts
# Enable Wireless Debugging in Developer Options
# Pair (localhost works for pairing):
adb pair localhost:PAIR_PORT CODE
# Connect (must use device WiFi IP, not localhost):
adb connect <DEVICE_IP>:CONNECT_PORT
# Example: adb connect 192.168.1.42:38527

import { connect } from 'baremobile';
const page = await connect({ termux: true }); // auto-detect localhost ADB
console.log(await page.snapshot());
await page.tap(5);

An autonomous agent running on the phone itself. Reads its own screen, decides what to do, acts. Combine with Termux:API for full device access — screen control + SMS + location + camera in one agent.
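
The read, decide, act cycle can be sketched as a small loop. Everything here is illustrative: `runAgent`, the `decide` callback (for example an LLM call), and the action object shape are hypothetical. Only `page.snapshot()`, `page.tap()`, and `page.type()` come from the API shown above.

```javascript
// Sketch of a read-decide-act loop. `decide` is a hypothetical callback that maps a
// snapshot string (plus the actions taken so far) to an action object, or null when done.
async function runAgent(page, decide, maxSteps = 10) {
  const history = [];
  for (let step = 0; step < maxSteps; step++) {
    const snapshot = await page.snapshot();          // read the screen
    const action = await decide(snapshot, history);  // decide what to do
    if (!action) break;                              // the agent considers the task finished
    if (action.kind === 'tap') await page.tap(action.ref);
    else if (action.kind === 'type') await page.type(action.ref, action.text);
    else throw new Error(`Unknown action kind: ${action.kind}`);
    history.push(action);                            // remember what was done
  }
  return history;
}
```

Taking a fresh snapshot on every iteration matters because refs are only meaningful for the screen they were read from. The `maxSteps` cap keeps a confused agent from looping forever.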
Who it's for: Agents that need Android capabilities beyond the screen — send SMS, make calls, read GPS, take photos, manage clipboard. Works with or without screen control.
How it connects: termux-* CLI commands. No ADB involved. Talks directly to Android APIs through the Termux:API addon app.
| Function | What it does |
|---|---|
| smsSend(number, text) | Send an SMS |
| smsList({limit, type}) | Read SMS inbox |
| call(number) | Initiate a phone call |
| location({provider}) | Get GPS/network location |
| cameraPhoto(file) | Capture a photo (JPEG) |
| clipboardGet() / clipboardSet(text) | Read/write clipboard |
| contactList() | List all contacts as JSON |
| notify(title, content) | Show a notification |
| batteryStatus() | Battery level, charging state |
| volumeGet() / volumeSet(stream, vol) | Read/set volume |
| wifiInfo() | Connected network details |
| torch(on) | Flashlight on/off |
| vibrate() | Vibrate the device |
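
Because this module talks to termux-* CLI commands, a wrapper function can be little more than "run the command, parse the JSON". The sketch below shows that idea and is not baremobile's source: `makeTermuxApi` and `runCommand` are hypothetical names, and the runner is injectable so the wrappers can be exercised without a phone.

```javascript
// Default runner: execute a command and resolve with its stdout.
// A dynamic import keeps the sketch usable from both ES modules and CommonJS.
async function runCommand(cmd, args) {
  const { execFile } = await import('child_process');
  return new Promise((resolve, reject) => {
    execFile(cmd, args, (err, stdout) => (err ? reject(err) : resolve(stdout)));
  });
}

// Hypothetical factory: `run` defaults to the real runner but can be swapped for a fake.
function makeTermuxApi(run = runCommand) {
  return {
    // termux-battery-status prints a JSON object.
    batteryStatus: async () => JSON.parse(await run('termux-battery-status', [])),
    // termux-location accepts -p gps|network|passive and prints a JSON object.
    location: async ({ provider = 'gps' } = {}) =>
      JSON.parse(await run('termux-location', ['-p', provider])),
  };
}
```

Passing arguments as an array to `execFile` avoids shell quoting problems, which matters once user-supplied text (an SMS body, for example) is involved.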
# Install Termux from F-Droid (NOT Google Play)
# Install Termux:API addon from F-Droid
# In Termux:
pkg install termux-api nodejs-lts

import * as api from 'baremobile/src/termux-api.js';
await api.smsSend('+1555123456', 'Meeting at 3pm');
const battery = await api.batteryStatus(); // { percentage: 85, status: 'charging' }
const loc = await api.location(); // { latitude: 37.7749, longitude: -122.4194 }
await api.cameraPhoto('/tmp/photo.jpg'); // snap a photo
const contacts = await api.contactList(); // all contacts

The real power is both together — screen control + direct APIs:
import { connect } from 'baremobile';
import * as api from 'baremobile/src/termux-api.js';
const page = await connect({ termux: true });
// Read a message on screen, then send a reply via SMS API
const snapshot = await page.snapshot();
// ... agent decides to reply ...
await api.smsSend('+1555123456', 'Got it, on my way');
// Check location, then search for it in Maps
const loc = await api.location();
await page.launch('com.google.android.apps.maps');

Who it's for: QA teams wanting iPhone control from Linux — no Mac, no Xcode at runtime. Same snapshot() / tap(ref) pattern as Android, backed by WDA over HTTP.
Status: Full ref-based control working — accessibility tree, tap, type, scroll, swipe, screenshots, app lifecycle, unlock.
Important: iOS is QA/testing only. USB cable required — the WDA process depends on a USB tunnel (RemoteXPC) that cannot be established over WiFi without Xcode. For autonomous/personal-assistant use cases, use Android.
| Capability | How |
|---|---|
| Read the screen | page.snapshot() — hierarchical YAML with [ref=N] markers (same format as Android) |
| Tap elements | page.tap(1) — coordinate tap at bounds center |
| Type text | page.type(2, 'hello') — coordinate tap to focus + WDA keys |
| Navigate | page.back() (finds back button in NavBar), page.home() |
| Scroll | page.scroll(ref, 'down') — coordinate-based swipe within bounds |
| Launch apps | page.launch('com.apple.Preferences') — by bundle ID |
| Take screenshots | page.screenshot() — PNG buffer |
| Wait for state | page.waitForText('Settings', 5000) — poll until text appears |
| Unlock device | page.unlock(passcode) — unlock with passcode |
| Find by text | page.findByText('Melanie') — returns ref for a text match (no device call) |
| Scale factor | page.scaleFactor — Retina scale (e.g., 3 for iPhone 15). page.screenshotToPoint(px, py) converts screenshot pixels to logical points for tapXY(). |
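
The coordinate math behind the tap and scale-factor rows is small enough to show directly. These standalone functions are a sketch of the arithmetic only, not the library's code: the `{ x, y, width, height }` bounds shape is an assumption, and any rounding the library may apply is not modelled.

```javascript
// Screenshot pixels divided by the Retina scale factor give logical points.
function screenshotToPoint(px, py, scaleFactor) {
  return { x: px / scaleFactor, y: py / scaleFactor };
}

// Center of an element's bounds, in the same unit the bounds are expressed in.
// The { x, y, width, height } shape is assumed for illustration.
function boundsCenter({ x, y, width, height }) {
  return { x: x + width / 2, y: y + height / 2 };
}

console.log(screenshotToPoint(600, 1200, 3)); // { x: 200, y: 400 }
console.log(boundsCenter({ x: 20, y: 100, width: 335, height: 44 })); // { x: 187.5, y: 122 }
```

A vision model works on screenshot pixels while taps are expressed in points, so skipping the conversion on a 3x device would place the tap three times too far from the origin.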
import { connect } from 'baremobile/src/ios.js';
const page = await connect();
console.log(await page.snapshot());
// - App
//   - Window
//     - NavBar "Settings"
//       - Text "Settings"
//     - List [ref=1]
//       - Cell [ref=2] "Wi-Fi"
//       - Cell [ref=3] "Bluetooth"
await page.tap(2); // coordinate tap at bounds center
await page.waitForText('Wi-Fi', 10000); // verify navigation
await page.type(4, 'network-name'); // type into search field
const png = await page.screenshot(); // visual verification
page.close();

WDA XML is translated to a common node tree, then run through the same prune/format pipeline as Android — identical YAML output. Custom-UI elements (e.g., Telegram chat rows rendered as XCUIElementTypeOther) get refs when iOS marks them accessible="true" with visible text. Snapshot cleanup: keyboard subtrees stripped (agent uses type()), Unicode directional markers removed, iOS file paths stripped, internal class names filtered.
WDA XML → translateWda() → cleanText + strip keyboard/paths → node tree → prune() → formatTree() → YAML
Actions use W3C Actions API touch sequences at element bound coordinates — more reliable than WDA's /wda/tap endpoint, which silently fails on some elements. At runtime, all communication is pure HTTP to WDA. Python (pymobiledevice3) is only needed during setup for the USB tunnel, DDI mount, and WDA launch. The MCP server auto-reconnects if WDA dies mid-session, and auto-restarts WDA on second failure — tier-1 restarts just WDA in ~3 seconds using stored RSD (no pkexec popup, no manual intervention), tier-2 falls back to full tunnel restart if needed.
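
The escalating recovery described above (reconnect first, then restart WDA, then restart the tunnel) follows a general pattern: retry the action, running one more expensive recovery step after each failure. The sketch below illustrates that pattern only. `withRecovery` and the step callbacks are hypothetical and do not reflect the MCP server's actual code.

```javascript
// Run `action`; after each failure run the next recovery step and try again.
// Steps should be ordered from cheapest to most disruptive.
async function withRecovery(action, recoverySteps) {
  let lastError;
  for (let attempt = 0; attempt <= recoverySteps.length; attempt++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
      if (attempt === recoverySteps.length) break; // nothing left to try
      await recoverySteps[attempt]();              // escalate to the next tier
    }
  }
  throw lastError; // every tier was tried and the action still fails
}

// Usage with hypothetical callbacks (left as a comment because it needs a device):
//   await withRecovery(() => page.snapshot(), [reconnect, restartWda, restartTunnel]);
```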
| Requirement | Why |
|---|---|
| WDA on device | Signed with free Apple ID (7-day cert, re-sign weekly) |
| USB cable | WiFi tunnel requires Mac/Xcode — not possible on Linux |
| Developer Mode on iPhone | Required for developer services |
| pymobiledevice3 | Setup only — tunnel, DDI mount, WDA launch. Python 3.12. |
| AltServer-Linux | Re-signing WDA cert (placed at .wda/AltServer) |
What you DON'T need: No Mac, no Xcode, no Bluetooth adapter, no Python at runtime.
baremobile setup # interactive wizard — option 2 (iOS from scratch) or option 3 (start WDA)
baremobile ios resign # re-sign WDA cert (7-day Apple free cert, interactive)
baremobile ios teardown # kill tunnel/WDA/forward processes

Smart detection: Option 2 checks if WDA is already installed with a valid cert. If so, it skips the install and offers to start the server directly. Previous tunnel/WDA processes are automatically cleaned up before starting new ones.
Free Apple ID certs expire after 7 days. The MCP server auto-warns when the cert is >6 days old.
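
The expiry rule translates into a simple age check. This is a hypothetical sketch: `certStatus` and its return values are made-up names, and how baremobile actually records the signing time is not covered here.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical check: classify a free-tier WDA cert by its age.
// signedAt and now are Date objects.
function certStatus(signedAt, now = new Date()) {
  const ageDays = (now - signedAt) / DAY_MS;
  if (ageDays >= 7) return 'expired'; // free certs last 7 days
  if (ageDays > 6) return 'warn';     // more than 6 days old: re-sign soon
  return 'ok';
}

const signed = new Date('2024-01-01T00:00:00Z');
console.log(certStatus(signed, new Date('2024-01-03T00:00:00Z'))); // ok
console.log(certStatus(signed, new Date('2024-01-07T12:00:00Z'))); // warn
console.log(certStatus(signed, new Date('2024-01-09T00:00:00Z'))); // expired
```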
Create structured test plans per app using the template at test/ios-test-plan.template.md:
cp test/ios-test-plan.template.md test/plans/whatsapp.md
# Edit with app-specific scenarios

Each plan includes a navigation map — the app's top-level structure (tabs, screens, key elements) documented once so the agent doesn't waste tokens exploring. Then scenarios with steps and verify assertions.
Feed to any MCP client:
"Read test/plans/whatsapp.md and execute the test plan on iOS."
The agent reads the plan, launches the app, follows the steps using snapshot + tap, and verifies assertions. It adapts to unexpected states (popups, loading spinners) because it's using real snapshots — not hardcoded refs.
Tip: Run pymobiledevice3 apps list to discover all installed bundle IDs upfront.
All modules are also available via CLI (npx baremobile) and MCP server. The CLI starts a background daemon that holds a device session. For iOS, all MCP tools accept platform: "ios".
See the README for the full CLI command reference.
-> Core ADB. Connect via USB, run tests from your machine.
-> Termux ADB + Termux:API. Screen control + direct Android APIs, no host needed.
-> Termux:API. No screen control needed, direct API access.
-> iOS module. WDA-based — real element tree, native click, type, scroll. Same snapshot() / tap(ref) pattern as Android. USB required.
-> Core ADB for Android + iOS module for iPhone. Same agent, different devices.
Things your agent doesn't have to think about:
- Bloated UI trees — 4-step pruning: collapse wrappers, drop empty nodes, dedup list items, filter internal class names
- iOS snapshot noise — keyboard subtrees stripped, Unicode directional markers removed, file paths cleaned
- 200+ Android widget classes — mapped to 27 simple roles (Button, Text, TextInput, Image...)
- Text input quirks — API 35+ space handling, full shell character escaping (~ # % ^ * { } [ ] ! ? and more)
- Binary output corruption — exec-out for clean PNG bytes
- Multi-device setups — every command threads device serial
- Element states — [disabled], [checked], [focused], [selected] in snapshots
- Vision fallback — when accessibility tree fails (Flutter, WebViews), use screenshot() + tapXY()
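
To make the pruning idea from the list above concrete, here is a sketch of two of the four steps: dropping empty nodes and collapsing wrappers. It is an illustration under assumptions, not the library's `prune()`. The node shape `{ role, text, ref, children }` is hypothetical, and list-item dedup and class-name filtering are left out.

```javascript
// Hypothetical node shape: { role, text?, ref?, children: [] }.
// A node is "meaningful" if it carries text or a ref; everything else is structure.
function prune(node) {
  const children = (node.children || []).map(prune).filter(Boolean);
  const meaningful = Boolean(node.text) || node.ref != null;
  if (!meaningful && children.length === 0) return null;        // drop empty node
  if (!meaningful && children.length === 1) return children[0]; // collapse single-child wrapper
  return { ...node, children };
}

const tree = {
  role: 'Group',
  children: [
    { role: 'Group', children: [{ role: 'Button', ref: 1, text: 'OK', children: [] }] },
    { role: 'Group', children: [] },
  ],
};
console.log(prune(tree)); // { role: 'Button', ref: 1, text: 'OK', children: [] }
```

Pruning bottom-up means a chain of nested wrappers collapses in a single pass, which is where most of the token savings would come from.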
| Gap | Why | Workaround |
|---|---|---|
| Login / auth | App tokens are hardware-bound | Agent logs in via UI |
| WebView content | Shallow accessibility tree | Vision fallback, CDP bridge planned |
| CAPTCHAs | No programmatic solve | Vision model or skip |
| Screen unlock | Needs unlocked screen | press('power') + swipe() + type() for PIN |
| Multi-touch | ADB supports single-point only | sendevent planned |