baremobile — Customer Guide

Control any phone from code. Android for all use cases, iOS for QA/testing.

What is baremobile?

A vanilla JS library that gives AI agents (or any code) control of mobile devices. It follows the same patterns as barebrowse does for the web, so agents learn one API and use it for both web and mobile.

No Appium. No Java server. No build step. Zero required dependencies.

Modules at a Glance

| # | Module | Platform | Use case | What it does | Requirements |
|---|---|---|---|---|---|
| 1 | Core ADB | Android | QA, automation | Full screen control: accessibility tree snapshots, tap/type/swipe by ref, screenshots, app lifecycle | `adb` in PATH, USB debugging enabled |
| 2 | Termux ADB | Android | QA, autonomous agents | Same full screen control, but runs on the phone itself; no host machine needed | Termux app, wireless debugging |
| 3 | Termux:API | Android | QA, autonomous agents | Direct Android APIs: SMS, calls, location, camera, clipboard, contacts, notifications. No screen control. | Termux + Termux:API app |
| 4 | iOS (WDA) | iOS | QA/testing only | Same `snapshot()` → `tap(ref)` pattern as Android. Real accessibility tree via WDA, native element click, type, scroll, screenshots. | WDA on device, USB cable, Python 3.12 (setup only) |

Modules 1 and 2 are the same API — one runs on a host machine, the other on the phone itself. Module 3 adds direct Android APIs (SMS, GPS, camera) and pairs with Module 2 for full autonomous agents. Module 4 brings the same ref-based pattern to iOS.

Module 1: Core ADB — Full Screen Control

Who it's for: QA teams, automation engineers, AI agent builders who want to control Android devices from a host machine (laptop, server, CI runner).

How it connects: USB cable, WiFi, or emulator. Uses adb directly.

What your agent can do

| Capability | How |
|---|---|
| Read the screen | `page.snapshot()`: pruned accessibility tree as YAML with `[ref=N]` markers |
| Tap elements | `page.tap(5)`: tap by ref number from the snapshot |
| Type text | `page.type(3, 'hello')`: focus the field, then type |
| Navigate | `page.back()`, `page.home()`, `page.press('enter')` |
| Scroll | `page.scroll(ref, 'down')`: within any scrollable element |
| Launch apps | `page.launch('com.android.settings')`: by package name |
| Take screenshots | `page.screenshot()`: PNG buffer, ~0.5 s |
| Deep link | `page.intent('android.settings.BLUETOOTH_SETTINGS')` |
| Wait for state | `page.waitForText('Success', 5000)`: poll until the text appears |
| Vision fallback | `page.tapXY(x, y)` or `page.tapGrid('C5')`: when the accessibility tree fails |
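The `waitForText()` row describes a poll-until-present loop. As a rough sketch of that idea (not baremobile's actual implementation), a generic poller can call a check function at an interval until it returns something truthy or a deadline passes. `pollUntil` and `waitForTextSketch` are hypothetical names, and the 250 ms interval is an arbitrary assumption.

```javascript
// Hypothetical helper, not part of baremobile: call `check` repeatedly until it
// returns a truthy value or `timeoutMs` elapses.
async function pollUntil(check, timeoutMs = 5000, intervalMs = 250) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await check();
    if (result) return result; // condition met
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms`);
    }
    // Sleep between attempts so the device isn't flooded with requests.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Example: wait until a fresh snapshot contains `text`.
// `page` is any object with an async snapshot() returning YAML text.
async function waitForTextSketch(page, text, timeoutMs) {
  return pollUntil(async () => (await page.snapshot()).includes(text), timeoutMs);
}
```

A shorter interval reacts faster, at the cost of more device round-trips per wait.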

Setup (one-time)

Run the interactive setup wizard — it handles adb install, SDK setup, and device connection:

```bash
baremobile setup    # choose Android → pick your connection mode
```

Four connection modes:

| Mode | Use case | What the wizard does |
|---|---|---|
| Emulator | QA/testing without a phone | Installs the SDK (~3 GB), creates an AVD, launches the emulator |
| USB | QA/testing with a phone | Checks adb, guides USB debugging setup, detects the device |
| WiFi | Personal assistant | Interactive: guides you through enabling USB debugging, detects the device, runs `adb tcpip`, auto-detects the phone's IP, and connects. Auto-reconnects on DHCP changes. |
| Termux | Autonomous on-device agent | Guides the Termux package install and wireless debugging setup |

Minimum version: Android 10 (API level 29, released in 2019) or newer.

Emulator note: The emulator uses a Google APIs system image (includes the Play Store for app installs).

Manual setup (if you prefer):

1. Install Android SDK platform-tools and make sure `adb` is on your PATH.
2. On the phone: Settings > About phone > tap "Build number" 7 times to unlock Developer options, then enable USB debugging in Developer options.
3. Connect the USB cable and tap "Allow" on the USB debugging prompt.
4. Verify: `adb devices` lists your device.
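To script the Verify step, the sketch below parses `adb devices` output and guesses each connection type from the serial format: `emulator-NNNN` for emulators, `host:port` for network connections, anything else treated as USB. `parseAdbDevices` is a hypothetical helper, not part of baremobile, and the serial rules are heuristics (for example, mDNS-discovered wireless-debugging serials don't use `host:port`).

```javascript
// Hypothetical helper (not part of baremobile): turn `adb devices` output into
// { serial, state, transport } records.
function parseAdbDevices(output) {
  const devices = [];
  for (const rawLine of output.split('\n')) {
    const line = rawLine.trim(); // also strips a trailing \r on Windows
    // Skip the header, daemon start-up messages ("* daemon ..."), and blank lines.
    if (!line || line.startsWith('List of devices') || line.startsWith('*')) continue;
    const [serial, state] = line.split(/\s+/);
    if (!serial || !state) continue;
    let transport = 'usb';
    if (serial.startsWith('emulator-')) transport = 'emulator';
    else if (/:\d+$/.test(serial)) transport = 'network'; // host:port (WiFi / wireless debugging)
    devices.push({ serial, state, transport });
  }
  return devices;
}
```

A device listed as `unauthorized` usually means the USB debugging prompt on the phone hasn't been accepted yet.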

Quick start

```javascript
import { connect } from 'baremobile';

const page = await connect();
console.log(await page.snapshot());   // see what's on screen

await page.tap(5);                    // tap element ref 5
await page.type(3, 'hello world');    // type into ref 3
await page.launch('com.whatsapp');    // open WhatsApp
await page.screenshot();              // PNG buffer

page.close();
```

What the agent sees

```yaml
- ScrollView [ref=1]
  - Group
    - Text "Settings"
  - ScrollView [ref=3]
    - List
      - Group [ref=4]
        - Text "Network & internet"
        - Text "Mobile, Wi-Fi, hotspot"
      - Group [ref=5]
        - Text "Connected devices"
        - Text "Bluetooth, pairing"
```

Compact, token-efficient. Only interactive elements get refs. Agent reads, picks a ref, acts.
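Because only interactive elements carry refs, the text an agent is looking for often sits on a child line without one (as with "Connected devices" under `Group [ref=5]` above). The sketch below shows one way to map a label to the nearest ref at or above it by tracking indentation. `findRefByText` and `tapByText` are hypothetical helpers that assume the indented YAML layout shown above; on iOS, `page.findByText()` already provides a built-in lookup.

```javascript
// Hypothetical helper (not part of baremobile): return the ref of the closest
// element at or above the first snapshot line containing `label`, or null.
function findRefByText(snapshotYaml, label) {
  const stack = []; // ancestors that carry a ref: { indent, ref }
  for (const line of snapshotYaml.split('\n')) {
    if (!line.trim()) continue;
    const indent = line.length - line.trimStart().length;
    // Pop entries that are siblings or deeper; they are no longer ancestors.
    while (stack.length && stack[stack.length - 1].indent >= indent) stack.pop();
    const match = line.match(/\[ref=(\d+)\]/);
    if (match) stack.push({ indent, ref: Number(match[1]) });
    if (line.includes(label)) {
      return stack.length ? stack[stack.length - 1].ref : null;
    }
  }
  return null;
}

// Usage with a connected page: snapshot, resolve the label, tap the ref.
async function tapByText(page, label) {
  const ref = findRefByText(await page.snapshot(), label);
  if (ref === null) throw new Error(`No element with text "${label}"`);
  await page.tap(ref);
  return ref;
}
```

Taking a fresh snapshot on every call matters because refs are only valid for the screen they were read from.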

Module 2: Termux ADB — On-Device Screen Control

Who it's for: Autonomous agents that run on the phone itself (via Termux). No host machine, no USB cable. The phone controls itself.

How it connects: ADB over wireless debugging, from the phone to itself. Same commands, same API; the only difference is that the device serial is a network address (`IP:PORT`) instead of a USB serial.

What's different from Core ADB

Everything from Core ADB works identically. The only differences:

| | Core ADB | Termux ADB |
|---|---|---|
| Runs on | Host machine (laptop, server) | The phone itself (Termux) |
| Connection | USB cable or WiFi | Wireless debugging (the phone connects to itself) |
| Connect call | `connect()` | `connect({ termux: true })` |
| Requires | `adb` on host | `android-tools` in Termux |

Setup

```bash
# In Termux on the phone:
pkg install android-tools nodejs-lts

# Enable Wireless Debugging in Developer Options
# Pair (localhost works for pairing):
adb pair localhost:PAIR_PORT CODE
# Connect (must use device WiFi IP, not localhost):
adb connect <DEVICE_IP>:CONNECT_PORT
# Example: adb connect 192.168.1.42:38527
```
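The pairing and connect steps can also be checked from a script. This sketch only inspects the text that `adb pair` and `adb connect` print. `pairSucceeded`, `connectSucceeded`, and `wirelessSerial` are hypothetical helpers, and the matched phrases (`Successfully paired`, `connected to`, `already connected to`) are assumptions based on typical platform-tools output that may differ between versions.

```javascript
// Hypothetical helpers (not part of baremobile) for scripting Termux setup.
// They only look at the text adb prints; wording may vary by version.
function pairSucceeded(output) {
  // Success is typically "Successfully paired to <host:port> [guid=...]".
  return /Successfully paired/i.test(output);
}

function connectSucceeded(output) {
  // Success is typically "connected to <ip:port>" or "already connected to <ip:port>";
  // failures start with "failed to connect" or "cannot connect".
  return /^(already )?connected to \S+/m.test(output.trim());
}

// Build the serial for `adb -s` from the WiFi IP and the connect port
// shown on the Wireless debugging screen.
function wirelessSerial(ip, port) {
  if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(ip)) throw new Error(`Not an IPv4 address: ${ip}`);
  if (!Number.isInteger(port) || port < 1 || port > 65535) throw new Error(`Bad port: ${port}`);
  return `${ip}:${port}`;
}
```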

Quick start

```javascript
import { connect } from 'baremobile';

const page = await connect({ termux: true });  // auto-detect the on-device ADB connection
console.log(await page.snapshot());
await page.tap(5);
```

Use case: bareagent

An autonomous agent running on the phone itself. Reads its own screen, decides what to do, acts. Combine with Termux:API for full device access — screen control + SMS + location + camera in one agent.

Module 3: Termux:API — Direct Android APIs

Who it's for: Agents that need Android capabilities beyond the screen — send SMS, make calls, read GPS, take photos, manage clipboard. Works with or without screen control.

How it connects: termux-* CLI commands. No ADB involved. Talks directly to Android APIs through the Termux:API addon app.

Capabilities

| Function | What it does |
|---|---|
| `smsSend(number, text)` | Send an SMS |
| `smsList({limit, type})` | Read SMS inbox |
| `call(number)` | Initiate a phone call |
| `location({provider})` | Get GPS/network location |
| `cameraPhoto(file)` | Capture a photo (JPEG) |
| `clipboardGet()` / `clipboardSet(text)` | Read/write clipboard |
| `contactList()` | List all contacts as JSON |
| `notify(title, content)` | Show a notification |
| `batteryStatus()` | Battery level, charging state |
| `volumeGet()` / `volumeSet(stream, vol)` | Read/set volume |
| `wifiInfo()` | Connected network details |
| `torch(on)` | Flashlight on/off |
| `vibrate()` | Vibrate the device |
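Since this module talks to Android through `termux-*` commands, a wrapper can be as thin as running the command and parsing its JSON output. The sketch below is an assumption about how such a wrapper could look, not baremobile's source: `makeTermuxApi` is hypothetical, and it takes an injected `run(cmd, args)` function so it can be exercised without a phone. The command names (`termux-battery-status`, `termux-sms-send -n`, `termux-torch`, `termux-vibrate`) come from the Termux:API package.

```javascript
// Hypothetical sketch of a termux-* wrapper, not baremobile's source.
// `run(cmd, args)` must resolve to the command's stdout as a string.
function makeTermuxApi(run) {
  // Commands like termux-battery-status print a JSON object on stdout.
  const runJson = async (cmd, args = []) => JSON.parse(await run(cmd, args));

  return {
    batteryStatus: () => runJson('termux-battery-status'),
    smsSend: (number, text) => run('termux-sms-send', ['-n', number, text]),
    torch: (on) => run('termux-torch', [on ? 'on' : 'off']),
    vibrate: () => run('termux-vibrate', []),
  };
}
```

In Termux, `run` could be built on Node's `util.promisify(child_process.execFile)`, returning the result's `stdout`. Passing arguments as an array instead of a shell string keeps SMS text containing quotes or `$` from being interpreted by a shell.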

Setup

```bash
# Install Termux from F-Droid (NOT Google Play)
# Install Termux:API addon from F-Droid
# In Termux:
pkg install termux-api nodejs-lts
```

Quick start

```javascript
import * as api from 'baremobile/src/termux-api.js';

await api.smsSend('+1555123456', 'Meeting at 3pm');
const battery = await api.batteryStatus();    // { percentage: 85, status: 'charging' }
const loc = await api.location();             // { latitude: 37.7749, longitude: -122.4194 }
await api.cameraPhoto('/tmp/photo.jpg');      // snap a photo
const contacts = await api.contactList();     // all contacts
```

Combining with Termux ADB

The real power is both together — screen control + direct APIs:

```javascript
import { connect } from 'baremobile';
import * as api from 'baremobile/src/termux-api.js';

const page = await connect({ termux: true });

// Read a message on screen, then send a reply via the SMS API
const snapshot = await page.snapshot();
// ... agent decides to reply ...
await api.smsSend('+1555123456', 'Got it, on my way');

// Get the current location, then open Maps (the agent continues with snapshot/tap)
const loc = await api.location();
await page.launch('com.google.android.apps.maps');
```

Module 4: iOS — WebDriverAgent (WDA)

Who it's for: QA teams wanting iPhone control from Linux — no Mac, no Xcode at runtime. Same snapshot() / tap(ref) pattern as Android, backed by WDA over HTTP.

Status: Full ref-based control working — accessibility tree, tap, type, scroll, swipe, screenshots, app lifecycle, unlock.

Important: iOS is QA/testing only. USB cable required — the WDA process depends on a USB tunnel (RemoteXPC) that cannot be established over WiFi without Xcode. For autonomous/personal-assistant use cases, use Android.

What your agent can do

| Capability | How |
|---|---|
| Read the screen | `page.snapshot()`: hierarchical YAML with `[ref=N]` markers (same format as Android) |
| Tap elements | `page.tap(1)`: coordinate tap at the bounds center |
| Type text | `page.type(2, 'hello')`: coordinate tap to focus, then WDA keys |
| Navigate | `page.back()` (finds the back button in the NavBar), `page.home()` |
| Scroll | `page.scroll(ref, 'down')`: coordinate-based swipe within the element's bounds |
| Launch apps | `page.launch('com.apple.Preferences')`: by bundle ID |
| Take screenshots | `page.screenshot()`: PNG buffer |
| Wait for state | `page.waitForText('Settings', 5000)`: poll until the text appears |
| Unlock device | `page.unlock(passcode)`: unlock with the passcode |
| Find by text | `page.findByText('Melanie')`: returns the ref for a text match (no device call) |
| Scale factor | `page.scaleFactor`: Retina scale (e.g., 3 for iPhone 15). `page.screenshotToPoint(px, py)` converts screenshot pixels to logical points for `tapXY()`. |
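iOS screenshots are captured in device pixels while taps use logical points, so a coordinate picked from a screenshot has to be divided by the scale factor first. For example, an iPhone 15 screenshot is 1179 × 2556 pixels at scale 3, which is 393 × 852 points. The function below is a hypothetical illustration of that conversion; in practice, call `page.screenshotToPoint(px, py)`. Rounding to whole points is an assumption.

```javascript
// Hypothetical illustration; in practice use page.screenshotToPoint(px, py).
// Screenshot pixels divided by the Retina scale factor give logical points.
function screenshotPixelsToPoints(px, py, scaleFactor) {
  if (!(scaleFactor > 0)) throw new Error(`Invalid scale factor: ${scaleFactor}`);
  return { x: Math.round(px / scaleFactor), y: Math.round(py / scaleFactor) };
}
```

A tap at a pixel position found in a screenshot would then be `const { x, y } = screenshotPixelsToPoints(900, 1500, page.scaleFactor); await page.tapXY(x, y);`.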

Quick start

```javascript
import { connect } from 'baremobile/src/ios.js';

const page = await connect();
console.log(await page.snapshot());
// - App
//   - Window
//     - NavBar "Settings"
//       - Text "Settings"
//     - List [ref=1]
//       - Cell [ref=2] "Wi-Fi"
//       - Cell [ref=3] "Bluetooth"

await page.tap(2);                         // coordinate tap at bounds center
await page.waitForText('Wi-Fi', 10000);    // verify navigation
console.log(await page.snapshot());        // refs change after navigation; read them again
await page.type(4, 'network-name');        // type into a search field (ref 4 is illustrative)
const png = await page.screenshot();       // visual verification
page.close();
```

Architecture

WDA XML is translated to a common node tree, then run through the same prune/format pipeline as Android — identical YAML output. Custom-UI elements (e.g., Telegram chat rows rendered as XCUIElementTypeOther) get refs when iOS marks them accessible="true" with visible text. Snapshot cleanup: keyboard subtrees stripped (agent uses type()), Unicode directional markers removed, iOS file paths stripped, internal class names filtered.

```text
WDA XML  →  translateWda()  →  cleanText + strip keyboard/paths  →  node tree  →  prune()  →  formatTree()  →  YAML
```
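As an illustration of the text-cleanup step, the sketch below strips the invisible Unicode bidirectional control characters (such as U+200E LEFT-TO-RIGHT MARK and U+2068 FIRST STRONG ISOLATE) that can appear in iOS labels. `cleanTextSketch` is hypothetical and is not baremobile's `cleanText`; collapsing whitespace is an extra assumption.

```javascript
// Hypothetical sketch, not baremobile's cleanText().
// Bidirectional controls: ALM (U+061C), LRM/RLM (U+200E-U+200F),
// embeddings/overrides (U+202A-U+202E), isolates (U+2066-U+2069).
const BIDI_CONTROLS = /[\u061C\u200E\u200F\u202A-\u202E\u2066-\u2069]/g;

function cleanTextSketch(text) {
  return text
    .replace(BIDI_CONTROLS, '') // drop invisible direction markers
    .replace(/\s+/g, ' ')       // collapse runs of whitespace
    .trim();
}
```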

Actions use W3C Actions API touch sequences at the element's bounds coordinates. This is more reliable than WDA's `/wda/tap` endpoint, which silently fails on some elements. At runtime, all communication is plain HTTP to WDA; Python (pymobiledevice3) is only needed during setup for the USB tunnel, DDI mount, and WDA launch.

Recovery is automatic: the MCP server reconnects if WDA dies mid-session and restarts WDA on a second failure. Tier 1 restarts only WDA, in about 3 seconds, using the stored RSD (RemoteServiceDiscovery) tunnel address, with no pkexec popup and no manual intervention. Tier 2 falls back to a full tunnel restart if needed.
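A W3C Actions tap is a short pointer sequence: move to a point, press, pause, release. The sketch below builds that payload for the center of an element's bounds and posts it to the WebDriver `POST /session/{sessionId}/actions` endpoint. It is illustrative rather than baremobile's code: `tapAtBoundsCenter` and `performTap` are hypothetical, the 50 ms hold is an assumption, and `wdaUrl` (WDA listens on port 8100 by default) and `sessionId` depend on your setup.

```javascript
// Illustrative sketch, not baremobile's source.
// Build a W3C WebDriver Actions payload that taps the center of an element.
// `bounds` is { x, y, width, height } in logical points.
function tapAtBoundsCenter(bounds, holdMs = 50) {
  const x = Math.round(bounds.x + bounds.width / 2);
  const y = Math.round(bounds.y + bounds.height / 2);
  return {
    actions: [
      {
        type: 'pointer',
        id: 'finger1',
        parameters: { pointerType: 'touch' },
        actions: [
          { type: 'pointerMove', duration: 0, x, y }, // origin defaults to the viewport
          { type: 'pointerDown', button: 0 },
          { type: 'pause', duration: holdMs },
          { type: 'pointerUp', button: 0 },
        ],
      },
    ],
  };
}

// Send the payload over plain HTTP (global fetch, Node 18+).
// `wdaUrl` (e.g. 'http://localhost:8100') and `sessionId` are assumptions.
async function performTap(wdaUrl, sessionId, bounds) {
  const res = await fetch(`${wdaUrl}/session/${sessionId}/actions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(tapAtBoundsCenter(bounds)),
  });
  if (!res.ok) throw new Error(`WDA actions request failed: HTTP ${res.status}`);
  return res.json();
}
```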

Requirements

| Requirement | Why |
|---|---|
| WDA on device | Signed with a free Apple ID (7-day cert, re-sign weekly) |
| USB cable | A WiFi tunnel requires a Mac with Xcode; not possible on Linux |
| Developer Mode on iPhone | Required for developer services |
| pymobiledevice3 | Setup only: tunnel, DDI mount, WDA launch. Python 3.12. |
| AltServer-Linux | Re-signing the WDA cert (placed at `.wda/AltServer`) |

What you DON'T need: No Mac, no Xcode, no Bluetooth adapter, no Python at runtime.

Setup

```bash
baremobile setup              # interactive wizard: option 2 (iOS from scratch) or option 3 (start WDA)
baremobile ios resign         # re-sign WDA cert (7-day Apple free cert, interactive)
baremobile ios teardown       # kill tunnel/WDA/forward processes
```

Smart detection: Option 2 checks if WDA is already installed with a valid cert. If so, it skips the install and offers to start the server directly. Previous tunnel/WDA processes are automatically cleaned up before starting new ones.

Free Apple ID certs expire after 7 days. The MCP server auto-warns when the cert is >6 days old.
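The age check itself is simple date arithmetic. This hypothetical sketch computes a signing certificate's age in days and flags it after 6 days, with expiry at 7. `certStatus` is not a baremobile API, and where the signing timestamp comes from (`signedAt`) is an assumption, since the guide doesn't say how it is recorded.

```javascript
// Hypothetical sketch, not a baremobile API.
const DAY_MS = 24 * 60 * 60 * 1000;

// `signedAt` and `now` are Date objects. Free Apple ID certs last 7 days;
// warn once the cert is more than 6 days old.
function certStatus(signedAt, now = new Date(), validDays = 7, warnAfterDays = 6) {
  const ageDays = (now.getTime() - signedAt.getTime()) / DAY_MS;
  return {
    ageDays,
    shouldWarn: ageDays > warnAfterDays,
    expired: ageDays >= validDays,
    daysLeft: Math.max(0, validDays - ageDays),
  };
}
```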

Test plans

Create structured test plans per app using the template at `test/ios-test-plan.template.md`:

```bash
cp test/ios-test-plan.template.md test/plans/whatsapp.md
# Edit with app-specific scenarios
```

Each plan includes a navigation map — the app's top-level structure (tabs, screens, key elements) documented once so the agent doesn't waste tokens exploring. Then scenarios with steps and verify assertions.

Feed to any MCP client:

"Read test/plans/whatsapp.md and execute the test plan on iOS."

The agent reads the plan, launches the app, follows the steps using snapshot + tap, and verifies assertions. It adapts to unexpected states (popups, loading spinners) because it's using real snapshots — not hardcoded refs.

Tip: Run `pymobiledevice3 apps list` to discover all installed bundle IDs upfront.

CLI and MCP

All modules are also available through the CLI (`npx baremobile`) and the MCP server. The CLI starts a background daemon that holds a device session. For iOS, all MCP tools accept `platform: "ios"`.

See the README for the full CLI command reference.

Choosing the Right Module

"I want to automate Android UI testing from my laptop"

-> Core ADB. Connect via USB, run tests from your machine.

"I want an AI agent that lives on the phone and acts autonomously"

-> Termux ADB + Termux:API. Screen control + direct Android APIs, no host needed.

"I just need to send SMS or read GPS from code"

-> Termux:API. No screen control needed, direct API access.

"I want to test iOS apps from Linux"

-> iOS module. WDA-based — real element tree, native click, type, scroll. Same snapshot() / tap(ref) pattern as Android. USB required.

"I want cross-platform test suites"

-> Core ADB for Android + iOS module for iPhone. Same agent, different devices.

What baremobile handles for you

Things your agent doesn't have to think about:

- Bloated UI trees: 4-step pruning (collapse wrappers, drop empty nodes, dedup list items, filter internal class names)
- iOS snapshot noise: keyboard subtrees stripped, Unicode directional markers removed, file paths cleaned
- 200+ Android widget classes: mapped to 27 simple roles (Button, Text, TextInput, Image...)
- Text input quirks: API 35+ space handling, full shell character escaping (~ # % ^ * { } [ ] ! ? and more)
- Binary output corruption: screenshots are read with `adb exec-out`, which returns clean PNG bytes without the line-ending translation that can corrupt binary data sent through `adb shell`
- Multi-device setups: every command threads the device serial
- Element states: [disabled], [checked], [focused], [selected] in snapshots
- Vision fallback: when the accessibility tree fails (Flutter, WebViews), use screenshot() + tapXY()
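For the text-input item, here is a rough sketch of preparing a string for `adb shell input text`: the text passes through the device's shell, so metacharacters are backslash-escaped, and `input text` reads `%s` as a space. This is not baremobile's escaping code; `escapeForInputText` is hypothetical, it does not reproduce the API 35+ space handling mentioned above, and it cannot type a literal `%s`.

```javascript
// Hypothetical sketch, not baremobile's escaping code.
// Characters the device shell would otherwise interpret.
const SHELL_SPECIALS = /[\\'"`$&|;<>()~#%^*{}[\]!?]/g;

function escapeForInputText(text) {
  return text
    .replace(SHELL_SPECIALS, (ch) => `\\${ch}`) // backslash-escape for the device shell
    .replace(/ /g, '%s');                        // `input text` turns %s into a space
}
```

A call would then look like `execFile('adb', ['-s', serial, 'shell', 'input', 'text', escapeForInputText(text)])`; using `execFile` with an argument array means only the device shell, not a host shell, interprets the string.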

What still needs the agent

| Gap | Why | Workaround |
|---|---|---|
| Login / auth | App tokens are hardware-bound | Agent logs in via the UI |
| WebView content | Shallow accessibility tree | Vision fallback; CDP bridge planned |
| CAPTCHAs | No programmatic solve | Vision model, or skip |
| Screen unlock | Snapshots and input need an unlocked screen | `press('power')` + `swipe()` + `type()` for the PIN |
| Multi-touch | `adb shell input` supports single-point gestures only | `sendevent` planned |
