Control your browser from your terminal.
buse is a stateless CLI designed for AI agents and automation scripts. It turns complex browser interaction into simple, structured command-line primitives.
- Stateless Control: Just point the CLI at a browser and go.
- Persistent Sessions: Multiple browser instances can run simultaneously.
- Universal Primitives: Click, type, scroll, and execute JS with one-liners.
- Vision-Ready: `observe` captures semantic state plus optional screenshots and SoM labels.
- Session Migration: Export cookies/storage via `save-state` to maintain persistent logins.
Automating a browser usually means writing long, complex scripts or paying for expensive cloud services. buse changes that by letting you control a browser the way you handle any folder or file on your computer: with simple, one-word commands in your terminal.
For example, open a browser and navigate to a website:
uvx --python 3.12 buse browser-1
uvx --python 3.12 buse browser-1 navigate "https://example.com"
uvx --python 3.12 buse browser-2 # open a second browser
uvx --python 3.12 buse browser-2 search "latest tech news"
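For scripts that drive buse from Python, the same commands can be assembled as argument lists and run with `subprocess`. This is a minimal sketch, not part of buse: the helper names `buse_argv` and `run_buse` are hypothetical, and it assumes a `buse` executable is on `PATH` (for example after `pip install buse`).

```python
import subprocess
from typing import List, Optional


def buse_argv(instance_id: str, command: Optional[str] = None, *args: str) -> List[str]:
    """Build the argv for `buse <instance_id> <command> [args]`.

    With no command, the argv is just `buse <instance_id>`, which starts the instance.
    """
    argv = ["buse", instance_id]
    if command is not None:
        argv.append(command)
        argv.extend(args)
    return argv


def run_buse(instance_id: str, command: Optional[str] = None, *args: str) -> str:
    """Run one buse command and return its stdout (hypothetical wrapper)."""
    result = subprocess.run(
        buse_argv(instance_id, command, *args),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

With buse and Chrome installed, `run_buse("browser-1")` followed by `run_buse("browser-1", "navigate", "https://example.com")` would mirror the first two shell commands above. Passing an argument list instead of a shell string avoids quoting problems with URLs and search queries.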
With uv:
uvx --python 3.12 buse --help
With pip:
pip install buse
From source:
cd buse
uv pip install -e .
Requirements:
- Python 3.12
- Google Chrome (local install)
buse <instance_id> <command> [args]
| Command | Description | Example |
|---|---|---|
| `<id>` | Initialize/Start a new browser instance | `buse b1` |
| `list` | Show all active browser instances | `buse list` |
| `stop` | Stop and kill a browser instance | `buse b1 stop` |
| `save-state` | Export cookies/storage to a file | `buse b1 save-state cookies.json` |
| Command | Description | Example |
|---|---|---|
| `observe` | Snapshot page state (visual + text modes) | `buse b1 observe --visual som` |
| `extract` | LLM extraction (set `BUSE_EXTRACT_MODEL`) | `buse b1 extract "get product info"` |
- DOM indices are ephemeral; refresh with `buse <id> observe` after page changes, or use `--id`/`--class` for stability.
- Preferred flags are `--visual` (`som`, `omni`, `none`), `--text` (`ai`, `dom`, `none`), and `--mode` (`efficient`, `full`, `raw`).
- `--human` prints a human-friendly layout; JSON output is better for agents.
- Legacy flags (`--screenshot`, `--omniparser`, `--som`, `--semantic`, `--no-dom`, `--diagnostics`) are still supported for compatibility.
- `observe --visual omni` always captures a screenshot: saves `image.jpg` (input) and `image_som.jpg` (server output) in the screenshots dir or `--path`.
- When available, `screenshot_path` points to `image_som.jpg`.
- OmniParser `bbox` values are in CSS pixels (not normalized).
- Use `--text none` to skip DOM processing and return an empty `dom_minified`.
- `--max-chars 0` disables semantic truncation entirely.
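The preferred `observe` flags each accept a fixed set of values, so a calling script can validate them before launching the CLI. The following is a minimal sketch under that assumption; `observe_argv` and the constant names are hypothetical helpers, and only the flag names and values listed above come from buse.

```python
from typing import List, Optional

# Allowed values as listed in the notes above.
VISUAL_MODES = ("som", "omni", "none")
TEXT_MODES = ("ai", "dom", "none")
OUTPUT_MODES = ("efficient", "full", "raw")


def observe_argv(
    instance_id: str,
    visual: Optional[str] = None,
    text: Optional[str] = None,
    mode: Optional[str] = None,
    max_chars: Optional[int] = None,
) -> List[str]:
    """Build `buse <id> observe ...`, rejecting values the notes do not list."""
    argv = ["buse", instance_id, "observe"]
    for flag, value, allowed in (
        ("--visual", visual, VISUAL_MODES),
        ("--text", text, TEXT_MODES),
        ("--mode", mode, OUTPUT_MODES),
    ):
        if value is None:
            continue  # flag omitted: let buse apply its own default
        if value not in allowed:
            raise ValueError(f"{flag} must be one of {allowed}, got {value!r}")
        argv += [flag, value]
    if max_chars is not None:
        # 0 disables semantic truncation entirely, per the notes above.
        argv += ["--max-chars", str(max_chars)]
    return argv
```

Because DOM indices are ephemeral, an agent would typically rebuild and rerun this command after every action that changes the page, rather than reusing indices from an earlier snapshot.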
| Command | Description | Example |
|---|---|---|
| `navigate` | Load a specific URL (supports `--new-tab`) | `buse b1 navigate "https://google.com"` |
| `new-tab` | Open a URL in a new tab (alias for `navigate --new-tab`) | `buse b1 new-tab "https://example.com"` |
| `search` | Search the web (engines: google, bing, duckduckgo) | `buse b1 search "query" --engine google` |
| `click` | Click by index/ref (`eN`), selector, id/class, or coordinates (with modifiers) | `buse b1 click e3 --double` |
| `input` | Type text into a field by index/ref (`eN`) or `--id`/`--class` (supports `--slowly`, `--append`, `--submit`) | `buse b1 input e3 "Hello"` |
| `fill` | Fill multiple fields in one command (JSON payload) | `buse b1 fill '[{"ref":"e1","value":"a"}]'` |
| `drag` | Drag from one element to another (ref/index) | `buse b1 drag e1 e2` |
| `upload-file` | Upload a file to an element by index | `buse b1 upload-file 5 "./img.png"` |
| `send-keys` | Send special keys or text (use `--list-keys` for names, optional focus with `--index`/`--id`/`--class`) | `buse b1 send-keys "Enter"` |
| `find-text` | Scroll to specific text on the page | `buse b1 find-text "Contact"` |
| `dropdown-options` | List options for a select element by index or `--id`/`--class` | `buse b1 dropdown-options 12` |
| `select-dropdown` | Select dropdown option by visible text and index or `--id`/`--class` (use `--text` when no index) | `buse b1 select-dropdown 12 "Option"` |
| `hover` | Hover over an element by index or `--id`/`--class` | `buse b1 hover 5` |
| `scroll` | Scroll page or a specific element (use `--up` or `--down`) | `buse b1 scroll --up --pages 2` |
| `refresh` | Reload the current page | `buse b1 refresh` |
| `go-back` | Go back in browser history | `buse b1 go-back` |
| `wait` | Wait by time, selector, text, or network idle | `buse b1 wait 2` |
| `evaluate` | Execute custom JavaScript code | `buse b1 evaluate "alert('Hi')"` |
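The `fill` command takes its fields as a single JSON list, which is awkward to quote by hand once values contain spaces or quotes. As a minimal sketch, the payload can be generated with `json.dumps`; `fill_payload` and `fill_argv` are hypothetical helper names, and only the `ref` and `value` keys shown in the table are used.

```python
import json
from typing import Iterable, List, Tuple


def fill_payload(fields: Iterable[Tuple[str, str]]) -> str:
    """Serialize (ref, value) pairs into the JSON list that `fill` expects."""
    return json.dumps([{"ref": ref, "value": value} for ref, value in fields])


def fill_argv(instance_id: str, fields: Iterable[Tuple[str, str]]) -> List[str]:
    """Build `buse <id> fill '<json>'` as an argv list (no shell quoting needed)."""
    return ["buse", instance_id, "fill", fill_payload(fields)]
```

For example, `fill_argv("b1", [("e1", "user"), ("e2", "pass")])` yields an argument list that can be passed directly to `subprocess.run`, so the JSON never goes through shell quoting.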
| Command | Description | Example |
|---|---|---|
| `switch-tab` | Switch by 4-char tab ID | `buse b1 switch-tab "4D39"` |
| `close-tab` | Close by 4-char tab ID | `buse b1 close-tab "4D39"` |
Global (all commands):
- `--format` (`json`|`toon`, default: `json`), `-f` alias
- `--profile` (default: `false`), `-p` alias

Selected command flags:
- `observe`: `--visual`, `--text`, `--mode`, `--max-chars`, `--max-labels`, `--selector`, `--frame`, `--human`, `--path` (legacy: `--screenshot`, `--omniparser`, `--som`, `--semantic`, `--no-dom`, `--diagnostics`)
- `click`: `--selector`, `--id`, `--class`, `--x`/`--y`, `--right`, `--middle`, `--double`, `--ctrl`/`--shift`/`--alt`/`--meta`, `--force`, `--debug`
- `input`: `--text`, `--id`, `--class`, `--slowly`, `--append`, `--submit`
- `fill`: JSON list payload (positional)
- `drag`: `--html5`/`--no-html5`
- `send-keys`: `--index`, `--id`, `--class`, `--list-keys`
- `scroll`: `--down`/`--up`, `--pages`, `--index`
- `wait`: `--text`, `--selector`, `--network-idle`, `--timeout`
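Since `click` accepts several kinds of target (a ref such as `e3`, `--id`, `--class`, or `--x`/`--y` coordinates), a wrapper can make the choice explicit. This sketch uses a hypothetical helper, `click_argv`, and enforces exactly one target per call; that rule is an assumption of the sketch, not documented buse behavior.

```python
from typing import List, Optional, Tuple


def click_argv(
    instance_id: str,
    ref: Optional[str] = None,
    element_id: Optional[str] = None,
    css_class: Optional[str] = None,
    xy: Optional[Tuple[int, int]] = None,
    double: bool = False,
) -> List[str]:
    """Build a `buse <id> click ...` argv for exactly one kind of target."""
    targets = [t for t in (ref, element_id, css_class, xy) if t is not None]
    if len(targets) != 1:
        # Assumption of this sketch: one target per click keeps intent unambiguous.
        raise ValueError("pass exactly one of ref, element_id, css_class, xy")

    argv = ["buse", instance_id, "click"]
    if ref is not None:
        argv.append(ref)  # positional ref/index, e.g. "e3"
    elif element_id is not None:
        argv += ["--id", element_id]
    elif css_class is not None:
        argv += ["--class", css_class]
    else:
        x, y = xy
        argv += ["--x", str(x), "--y", str(y)]
    if double:
        argv.append("--double")
    return argv
```

Preferring `element_id` or `css_class` over a ref is useful when the page may change between `observe` and `click`, because refs and indices are only valid for the most recent snapshot.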
# Start a session
buse b1
# Observe without screenshot (JSON)
buse b1 observe
# Observe with SoM labels and semantic text (JSON + image)
buse b1 observe --visual som --text ai
# Navigate and click by coordinates
buse b1 navigate "https://example.com"
buse b1 click --x 280 --y 220
# Click by ref/id/class fallback
buse b1 click e3
buse b1 click --id "submit-button"
buse b1 click --class "cta-primary"
# Input by id with explicit --text
buse b1 input --id "email" --text "test@example.com"
# Input slowly and submit
buse b1 input --id "email" --text "test@example.com" --slowly --submit
# Fill multiple fields atomically
buse b1 fill '[{"ref":"e1","value":"user"},{"ref":"e2","value":"pass","type":"text"}]'
# Drag and drop
buse b1 drag e1 e2
# Upload a file
buse b1 upload-file 5 "./image.png"
# Send special keys
buse b1 send-keys "Enter"
# Send keys to a focused element
buse b1 send-keys --id "search" "Hello"
# List send-keys names
buse b1 send-keys --list-keys
# Find and scroll to text
buse b1 find-text "Contact Us"
# Get dropdown options and select by text
buse b1 dropdown-options --id "country"
buse b1 select-dropdown --id "country" --text "Canada"
# Scroll and wait
buse b1 scroll --down --pages 1.5
buse b1 scroll --up --pages 1
buse b1 wait 2
Expose the active browser instances via the Model Context Protocol.
buse mcp-server --host 0.0.0.0 --port 8000
- `--transport` selects `streamable-http` (default), `sse`, or `stdio`.
- `--name` changes the MCP server name, `--stateless`/`--stateful` controls HTTP mode, and `--json-response`/`--no-json-response` toggles JSON wrapping.
- `--allow-remote` permits non-local clients (default: local-only).
- `--auth-token` requires `Authorization: Bearer <token>` or `X-Buse-Token` for HTTP requests.
- `--format` (`json`|`toon`, default: `json`), `-f` alias.
- Resources:
  - `buse://sessions` returns a list of session metadata (`instance_id`, `cdp_url`, `user_data_dir`).
  - `buse://session/{id}` returns the metadata for a single session.
- Tools:
  - Supports all CLI actions: `navigate`, `click`, `input_text`, `fill`, `drag`, `send_keys`, `scroll`, `switch_tab`, `close_tab`, `search`, `upload_file`, `find_text`, `dropdown_options`, `select_dropdown`, `go_back`, `hover`, `refresh`, `wait`, `save_state`, `extract`, `evaluate`, `stop_session`, `start_session`, `observe`.
The mcp SDK ships with buse, so no extra installation is required.
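When the MCP server is started with an auth token, HTTP clients must send it as either `Authorization: Bearer <token>` or `X-Buse-Token`. The sketch below builds the server command and the matching request headers; `mcp_server_argv` and `mcp_auth_headers` are hypothetical helpers, and how a given MCP client attaches headers is outside its scope.

```python
from typing import Dict, List

# Transport names as listed for --transport.
MCP_TRANSPORTS = ("streamable-http", "sse", "stdio")


def mcp_server_argv(
    host: str = "127.0.0.1",
    port: int = 8000,
    transport: str = "streamable-http",
) -> List[str]:
    """Build `buse mcp-server --host ... --port ... --transport ...`."""
    if transport not in MCP_TRANSPORTS:
        raise ValueError(f"transport must be one of {MCP_TRANSPORTS}, got {transport!r}")
    return [
        "buse", "mcp-server",
        "--host", host,
        "--port", str(port),
        "--transport", transport,
    ]


def mcp_auth_headers(token: str, bearer: bool = True) -> Dict[str, str]:
    """Return one of the two header forms the server accepts for HTTP requests."""
    if not token:
        raise ValueError("token must be non-empty")
    if bearer:
        return {"Authorization": f"Bearer {token}"}
    return {"X-Buse-Token": token}
```

The sketch defaults the host to `127.0.0.1` rather than `0.0.0.0`, in line with the local-only default; binding to all interfaces is best paired with `--allow-remote` and an auth token.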
- `--format json|toon` to switch output format.
- `--profile` (or `-p`) includes timing data in the JSON response.
- `BUSE_EXTRACT_MODEL`: model name for `extract` (default: `gpt-4o-mini`).
- `OPENAI_API_KEY`: required for `extract`.
- `BUSE_KEEP_SESSION`: set to `1` to keep the session open within a single process.
- `BUSE_SELECTOR_CACHE_TTL`: selector-map cache TTL in seconds (default: `0`, disabled).
- `BUSE_REMOTE_ALLOW_ORIGINS`: override Chrome `--remote-allow-origins` (default: `http://localhost:<port>,http://127.0.0.1:<port>`).
- `BUSE_IMAGE_QUALITY`: JPEG quality (1-100) for OmniParser images.
- `BUSE_MCP_ALLOW_REMOTE`: set to `1` to allow non-local MCP clients.
- `BUSE_MCP_AUTH_TOKEN`: require a Bearer or X-Buse-Token header for MCP HTTP access.
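Because `extract` reads its model and API key from the environment, a script can pass them per call instead of exporting them globally. This is a minimal sketch; `extract_env` is a hypothetical helper, and the resulting dictionary is meant to be handed to `subprocess.run(..., env=...)`.

```python
import os
from typing import Dict, Mapping, Optional


def extract_env(
    api_key: str,
    model: str = "gpt-4o-mini",
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with the variables `extract` needs."""
    if not api_key:
        raise ValueError("OPENAI_API_KEY is required for extract")
    # Copy so the caller's (or the process's) environment is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    env["OPENAI_API_KEY"] = api_key
    env["BUSE_EXTRACT_MODEL"] = model
    return env
```

A call such as `subprocess.run(["buse", "b1", "extract", "get product info"], env=extract_env(key))` would then scope the key to that one child process.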
https://blog.google/innovation-and-ai/models-and-research/google-deepmind/gemini-computer-use-model/
https://www.anthropic.com/news/3-5-models-and-computer-use
https://docs.browser-use.com/introduction
- Support all operating systems: Windows, macOS, Linux (currently works on macOS 10.15 and Windows 11)
- Add automation scripting examples
- Add e2e tests
- Add optional daemon for persistent background sessions