W3Pilot

Go browser automation library using WebDriver BiDi for real-time bidirectional communication with browsers, ideal for AI-assisted automation.

Overview

This project provides:

Component	Description
Go Client SDK	Programmatic browser control
MCP Server	169 tools across 24 namespaces for AI assistants
CLI	Command-line browser automation
Script Runner	Deterministic test execution
Session Recording	Capture actions as replayable scripts

Architecture

W3Pilot uses a dual-protocol architecture connecting to a single Chrome browser via both WebDriver BiDi and Chrome DevTools Protocol (CDP):

┌──────────────────────────────────────────────────────────────┐
│                          Applications                        │
├───────────────┬───────────────┬──────────────────────────────┤
│    w3pilot    │  w3pilot-mcp  │       Your Go App            │
│     (CLI)     │ (MCP Server)  │     import "w3pilot"         │
├───────────────┴───────────────┴──────────────────────────────┤
│                                                              │
│                        W3Pilot Go SDK                        │
│                 github.com/plexusone/w3pilot                 │
│                                                              │
│  ┌────────────────────────┐  ┌────────────────────────────┐  │
│  │      BiDi Client       │  │       CDP Client           │  │
│  │   (page automation)    │  │   (profiling/debugging)    │  │
│  │                        │  │                            │  │
│  │ • Navigation           │  │ • Heap snapshots           │  │
│  │ • Element interaction  │  │ • Network emulation        │  │
│  │ • Screenshots          │  │ • CPU throttling           │  │
│  │ • Tracing              │  │ • Code coverage            │  │
│  │ • Accessibility        │  │ • Console debugging        │  │
│  └───────────┬────────────┘  └─────────────┬──────────────┘  │
│              │                             │                 │
├──────────────┼─────────────────────────────┼─────────────────┤
│              ▼                             ▼                 │
│       WebDriver BiDi                Chrome DevTools          │
│       (stdio pipe)                  (CDP WebSocket)          │
├──────────────────────────────────────────────────────────────┤
│                       Chrome / Chromium                      │
└──────────────────────────────────────────────────────────────┘

Why Dual-Protocol?

W3Pilot combines two complementary protocols for complete browser control:

Protocol	Purpose	Strengths
WebDriver BiDi	Automation & Testing	Semantic selectors, real-time events, cross-browser potential, future-proof standard
Chrome DevTools Protocol	Inspection & Profiling	Heap profiling, network bodies, CPU/network emulation, coverage analysis

BiDi Client excels at:

Page automation (navigation, clicks, typing)
Semantic element finding (by role, label, text, testid)
Screenshots and accessibility trees
Tracing and session recording
Human-in-the-loop workflows (CAPTCHA, SSO)

CDP Client excels at:

Memory profiling (heap snapshots)
Network response body capture
Performance emulation (Slow 3G, CPU throttling)
Code coverage analysis
Low-level debugging

Both protocols connect to the same Chrome browser instance, allowing you to automate with BiDi while profiling with CDP simultaneously.

Protocol-Agnostic API

The SDK automatically handles protocol selection. Some methods try BiDi first and fall back to CDP when BiDi doesn't support the feature:

Method	Tries First	Falls Back To
`SetOffline()`	BiDi	CDP network emulation
`ConsoleMessages()`	BiDi	CDP console debugger
`ClearConsoleMessages()`	BiDi	CDP console debugger

Users call the same method regardless of which protocol is used internally. When BiDi support is added upstream, the SDK will automatically use it without requiring code changes.

Prerequisites

W3Pilot requires the Clicker binary, a WebDriver BiDi browser launcher from the Vibium project.

Install Clicker

Option 1: Download from GitHub Releases

Download the latest release for your platform from vibium/releases and add it to your PATH.

Option 2: Build from Source

git clone https://github.com/VibiumDev/vibium.git
cd vibium/clicker
go build -o clicker .
mv clicker /usr/local/bin/  # or add to PATH

Option 3: Set Environment Variable

If the binary is in a custom location:

export CLICKER_BIN_PATH=/path/to/clicker

Verify Installation

clicker --version

Browser Requirements

Clicker automatically manages Chrome/Chromium. If Chrome is not installed, download it from google.com/chrome.

Installation

go get github.com/plexusone/w3pilot

Quick Start

Go Client SDK

package main

import (
    "context"
    "log"

    "github.com/plexusone/w3pilot"
)

func main() {
    ctx := context.Background()

    // Launch browser
    pilot, err := w3pilot.Launch(ctx)
    if err != nil {
        log.Fatal(err)
    }
    defer pilot.Quit(ctx)

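    // Navigate and interact
    pilot.Go(ctx, "https://example.com")

    // Check the error from Find before using the element
    link, err := pilot.Find(ctx, "a", nil)
    if err != nil {
        log.Printf("find link: %v", err)
        return // return (not log.Fatal) so the deferred Quit still runs
    }
    link.Click(ctx, nil)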
}

Session Management

Manage persistent browser sessions that can be reused across CLI commands:

import "github.com/plexusone/w3pilot/session"

// Create session manager
mgr := session.NewManager(session.Config{
    AutoReconnect: true,
})

// Get browser (launches if needed, reconnects if possible)
pilot, err := mgr.Pilot(ctx)

// Detach without closing browser
mgr.Detach()

// Later: reconnect to same browser
pilot, err = mgr.Pilot(ctx)

// When done: close browser
mgr.Close(ctx)

MCP Server

Start the MCP server for AI assistant integration:

w3pilot mcp --headless

Configure in Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "w3pilot": {
      "command": "w3pilot",
      "args": ["mcp", "--headless"]
    }
  }
}

CLI Commands

# Browser lifecycle
w3pilot browser launch --headless
w3pilot browser quit

# Page navigation and capture
w3pilot page navigate https://example.com
w3pilot page back
w3pilot page screenshot result.png
w3pilot page title

# Element interactions
w3pilot element fill "#email" "user@example.com"
w3pilot element click "#submit"
w3pilot element text "#result"

# Wait for conditions
w3pilot wait selector "#modal"
w3pilot wait url "**/dashboard"

# JavaScript execution
w3pilot js eval "document.title"

Script Runner

Execute deterministic test scripts:

w3pilot run test.json

Script format (JSON or YAML):

{
  "name": "Login Test",
  "steps": [
    {"action": "navigate", "url": "https://example.com/login"},
    {"action": "fill", "selector": "#email", "value": "user@example.com"},
    {"action": "fill", "selector": "#password", "value": "secret"},
    {"action": "click", "selector": "#submit"},
    {"action": "assertUrl", "expected": "https://example.com/dashboard"}
  ]
}
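
To show how a script in this format maps onto Go types, here is a small parsing sketch. The `Script` and `Step` structs are hypothetical and cover only the fields that appear in the example above; the real schema in the repository may define more actions and fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Step mirrors the fields used in the example script (hypothetical type).
type Step struct {
	Action   string `json:"action"`
	URL      string `json:"url,omitempty"`
	Selector string `json:"selector,omitempty"`
	Value    string `json:"value,omitempty"`
	Expected string `json:"expected,omitempty"`
}

// Script is a named, ordered list of steps (hypothetical type).
type Script struct {
	Name  string `json:"name"`
	Steps []Step `json:"steps"`
}

// ParseScript decodes a JSON script and rejects steps without an action.
func ParseScript(data []byte) (Script, error) {
	var s Script
	if err := json.Unmarshal(data, &s); err != nil {
		return Script{}, err
	}
	for i, st := range s.Steps {
		if st.Action == "" {
			return Script{}, fmt.Errorf("step %d: missing action", i+1)
		}
	}
	return s, nil
}

const example = `{
  "name": "Login Test",
  "steps": [
    {"action": "navigate", "url": "https://example.com/login"},
    {"action": "fill", "selector": "#email", "value": "user@example.com"},
    {"action": "fill", "selector": "#password", "value": "secret"},
    {"action": "click", "selector": "#submit"},
    {"action": "assertUrl", "expected": "https://example.com/dashboard"}
  ]
}`

func main() {
	s, err := ParseScript([]byte(example))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%s: %d steps\n", s.Name, len(s.Steps)) // prints: Login Test: 5 steps
}
```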

Feature Comparison

Client SDK

Feature	Status
Browser launch/quit	✅
Navigation (go, back, forward, reload)	✅
Element finding (CSS selectors)	✅
Click, type, fill	✅
Screenshots	✅
JavaScript evaluation	✅
Keyboard/mouse controllers	✅
Browser context management	✅
Network interception	✅
Tracing	✅
Clock control	✅

CDP Features (via Chrome DevTools Protocol)

Feature	Status
Heap snapshots	✅
Network emulation (Slow 3G, Fast 3G, 4G)	✅
CPU throttling	✅
Direct CDP command access	✅

Additional Features

Feature	Description
MCP Server	169 tools across 24 namespaces for AI-assisted automation
CLI	`w3pilot` command with subcommands
Script Runner	Execute JSON/YAML test scripts
Session Management	Persistent browser sessions with reconnection support
Session Recording	Capture MCP actions as replayable scripts
JSON Schema	Validated script format
Test Reporting	Structured test results with diagnostics

MCP Server Tools

The MCP server provides 169 tools across 24 namespaces. Export the full list as JSON with w3pilot mcp --list-tools.

Namespaces:

Namespace	Tools	Examples
`accessibility_`	1	`accessibility_snapshot`
`batch_`	1	`batch_execute`
`browser_`	2	`browser_launch`, `browser_quit`
`cdp_`	20	`cdp_take_heap_snapshot`, `cdp_run_lighthouse`, `cdp_start_coverage`
`config_`	1	`config_get`
`console_`	2	`console_get_messages`, `console_clear`
`dialog_`	2	`dialog_handle`, `dialog_get`
`element_`	33	`element_click`, `element_fill`, `element_get_text`, `element_is_visible`
`frame_`	2	`frame_select`, `frame_select_main`
`http_`	1	`http_request`
`human_`	1	`human_pause`
`input_`	12	`input_keyboard_press`, `input_mouse_click`, `input_touch_tap`
`js_`	4	`js_evaluate`, `js_add_script`, `js_add_style`, `js_init_script`
`network_`	6	`network_get_requests`, `network_route`, `network_set_offline`
`page_`	20	`page_navigate`, `page_go_back`, `page_screenshot`, `page_inspect`
`record_`	5	`record_start`, `record_stop`, `record_export`
`state_`	4	`state_save`, `state_load`, `state_list`, `state_delete`
`storage_`	17	`storage_get_cookies`, `storage_local_get`, `storage_session_set`
`tab_`	3	`tab_list`, `tab_select`, `tab_close`
`test_`	16	`test_assert_text`, `test_verify_value`, `test_generate_locator`
`trace_`	6	`trace_start`, `trace_stop`, `trace_chunk_start`
`video_`	2	`video_start`, `video_stop`
`wait_`	6	`wait_for_state`, `wait_for_url`, `wait_for_load`, `wait_for_text`
`workflow_`	2	`workflow_login`, `workflow_extract_table`

See docs/reference/mcp-tools.md for the complete reference.
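
Because every tool name starts with its namespace prefix, a tool list can be grouped by taking the text up to the first underscore. The sketch below works on a plain slice of names taken from the table; the JSON layout produced by `--list-tools` is not shown in this document, so decoding it is left out, and `namespaceOf` and `countByNamespace` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// namespaceOf returns the prefix up to and including the first underscore,
// e.g. "cdp_take_heap_snapshot" -> "cdp_".
func namespaceOf(tool string) string {
	i := strings.IndexByte(tool, '_')
	if i < 0 {
		return tool
	}
	return tool[:i+1]
}

// countByNamespace tallies tools per namespace prefix.
func countByNamespace(tools []string) map[string]int {
	counts := make(map[string]int)
	for _, t := range tools {
		counts[namespaceOf(t)]++
	}
	return counts
}

func main() {
	tools := []string{
		"browser_launch", "browser_quit",
		"cdp_take_heap_snapshot",
		"page_navigate", "page_go_back", "page_screenshot",
	}
	counts := countByNamespace(tools)

	// Sort namespaces so the output order is stable.
	names := make([]string, 0, len(counts))
	for ns := range counts {
		names = append(names, ns)
	}
	sort.Strings(names)
	for _, ns := range names {
		fmt.Printf("%s %d\n", ns, counts[ns])
	}
}
```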

Session Recording Workflow

Convert natural language test plans into deterministic scripts:

┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  Markdown Test   │     │   LLM + MCP      │     │   JSON Script    │
│  Plan (English)  │ ──▶ │   (exploration)  │ ──▶ │ (deterministic)  │
└──────────────────┘     └──────────────────┘     └──────────────────┘

1. Write test plan in Markdown
2. LLM executes via MCP with record_start
3. LLM explores, finds selectors, handles edge cases
4. Export with record_export to get JSON
5. Run deterministically with w3pilot run
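
The record-then-export idea can be illustrated with a tiny in-memory recorder. This is a sketch only: `Recorder`, `Step`, and their methods are hypothetical and do not describe how the `record_` tools are implemented. The exported JSON reuses the field names from the script format shown earlier.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Step is one recorded action (hypothetical type).
type Step struct {
	Action   string `json:"action"`
	URL      string `json:"url,omitempty"`
	Selector string `json:"selector,omitempty"`
	Value    string `json:"value,omitempty"`
}

// Recorder collects steps only while recording is active.
type Recorder struct {
	recording bool
	steps     []Step
}

// Start begins a fresh recording.
func (r *Recorder) Start() { r.recording = true; r.steps = nil }

// Record appends a step if a recording is in progress.
func (r *Recorder) Record(s Step) {
	if r.recording {
		r.steps = append(r.steps, s)
	}
}

// Export stops recording and returns the steps as a named JSON script.
func (r *Recorder) Export(name string) ([]byte, error) {
	r.recording = false
	script := struct {
		Name  string `json:"name"`
		Steps []Step `json:"steps"`
	}{Name: name, Steps: r.steps}
	return json.MarshalIndent(script, "", "  ")
}

func main() {
	var rec Recorder
	rec.Record(Step{Action: "click", Selector: "#ignored"}) // dropped: not recording yet
	rec.Start()
	rec.Record(Step{Action: "navigate", URL: "https://example.com/login"})
	rec.Record(Step{Action: "click", Selector: "#submit"})

	out, err := rec.Export("Login Test")
	if err != nil {
		fmt.Println("export error:", err)
		return
	}
	fmt.Println(string(out))
}
```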

API Reference

See pkg.go.dev for full API documentation.

Key Types

// Launch browser
pilot, err := w3pilot.Launch(ctx)
pilot, err := w3pilot.LaunchHeadless(ctx)

// Navigation
pilot.Go(ctx, url)
pilot.Back(ctx)
pilot.Forward(ctx)
pilot.Reload(ctx)

// Finding elements by CSS selector
elem, err := pilot.Find(ctx, selector, nil)
elems, err := pilot.FindAll(ctx, selector, nil)

// Element interactions
elem.Click(ctx, nil)
elem.Fill(ctx, value, nil)
elem.Type(ctx, text, nil)

// Input controllers
pilot.Keyboard().Press(ctx, "Enter")
pilot.Mouse().Click(ctx, x, y)

// Capture
data, err := pilot.Screenshot(ctx)

Semantic Selectors

Find elements by accessibility attributes instead of brittle CSS selectors. This is especially useful for AI-assisted automation where element structure may change but semantics remain stable.

SDK Usage

// Find by ARIA role and text content
elem, err := pilot.Find(ctx, "", &w3pilot.FindOptions{
    Role: "button",
    Text: "Submit",
})

// Find by label (for form inputs)
elem, err := pilot.Find(ctx, "", &w3pilot.FindOptions{
    Label: "Email address",
})

// Find by placeholder
elem, err := pilot.Find(ctx, "", &w3pilot.FindOptions{
    Placeholder: "Enter your email",
})

// Find by data-testid (recommended for testing)
elem, err := pilot.Find(ctx, "", &w3pilot.FindOptions{
    TestID: "login-button",
})

// Combine CSS selector with semantic filtering
elem, err := pilot.Find(ctx, "form", &w3pilot.FindOptions{
    Role: "textbox",
    Label: "Password",
})

// Find all buttons
buttons, err := pilot.FindAll(ctx, "", &w3pilot.FindOptions{Role: "button"})

// Find element near another element
elem, err := pilot.Find(ctx, "", &w3pilot.FindOptions{
    Role: "button",
    Near: "#username-input",
})

MCP Tool Usage

Semantic selectors work with the element_click, element_type, element_fill, and element_press tools:

// Click a button by role and text
{"name": "element_click", "arguments": {"role": "button", "text": "Sign In"}}

// Fill input by label
{"name": "element_fill", "arguments": {"label": "Email", "value": "user@example.com"}}

// Type in input by placeholder
{"name": "element_type", "arguments": {"placeholder": "Search...", "text": "query"}}

// Click by data-testid
{"name": "element_click", "arguments": {"testid": "submit-btn"}}

Available Selectors

Selector	Description	Example
`role`	ARIA role	`button`, `textbox`, `link`, `checkbox`
`text`	Visible text content	`"Submit"`, `"Learn more"`
`label`	Associated label text	`"Email address"`, `"Password"`
`placeholder`	Input placeholder	`"Enter email"`
`testid`	`data-testid` attribute	`"login-btn"`
`alt`	Image alt text	`"Company logo"`
`title`	Element title attribute	`"Close dialog"`
`xpath`	XPath expression	`"//button[@type='submit']"`
`near`	CSS selector of nearby element	`"#username"`
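
When building tool calls from Go, a struct with `omitempty` tags keeps unused selectors out of the request. The sketch below encodes the `element_click` example from the MCP Tool Usage section; `Selector`, `ToolCall`, and `buildCall` are hypothetical types introduced for illustration and cover only some of the selectors in the table.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Selector holds a subset of the semantic selectors (hypothetical type).
// Empty fields are omitted from the encoded JSON.
type Selector struct {
	Role        string `json:"role,omitempty"`
	Text        string `json:"text,omitempty"`
	Label       string `json:"label,omitempty"`
	Placeholder string `json:"placeholder,omitempty"`
	TestID      string `json:"testid,omitempty"`
}

// ToolCall is the name/arguments pair used in the examples above.
type ToolCall struct {
	Name      string   `json:"name"`
	Arguments Selector `json:"arguments"`
}

// buildCall encodes a tool call for the given tool and selector.
func buildCall(tool string, sel Selector) ([]byte, error) {
	return json.Marshal(ToolCall{Name: tool, Arguments: sel})
}

func main() {
	out, err := buildCall("element_click", Selector{Role: "button", Text: "Sign In"})
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(string(out))
	// prints: {"name":"element_click","arguments":{"role":"button","text":"Sign In"}}
}
```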

Init Scripts

Inject JavaScript that runs before any page scripts on every navigation. Useful for mocking APIs, injecting test helpers, or setting up authentication.

SDK Usage

// Add init script to inject before page scripts
err := pilot.AddInitScript(ctx, `window.testMode = true;`)

// Mock an API (keep a reference to the real fetch for pass-through)
err = pilot.AddInitScript(ctx, `
    const originalFetch = window.fetch;
    window.fetch = async (url, opts) => {
        if (typeof url === 'string' && url.includes('/api/user')) {
            return { json: async () => ({ id: 1, name: 'Test User' }) };
        }
        return originalFetch(url, opts);
    };
`)

CLI Usage

# Inject scripts when launching browser
w3pilot browser launch --init-script=./mock-api.js --init-script=./test-helpers.js

# Or with MCP server
w3pilot mcp --init-script=./mock-api.js --init-script=./test-helpers.js

MCP Tool Usage

{"name": "js_init_script", "arguments": {"script": "window.testMode = true;"}}

Storage State

Save and restore complete browser state including cookies, localStorage, and sessionStorage. Essential for maintaining login sessions across browser restarts.

SDK Usage

// Error checks are omitted for brevity; check err after each call.

// Get complete storage state
state, err := pilot.StorageState(ctx)

// Save to file
jsonBytes, err := json.Marshal(state)
err = os.WriteFile("auth-state.json", jsonBytes, 0600)

// Restore from file
jsonBytes, err = os.ReadFile("auth-state.json")
var savedState w3pilot.StorageState
err = json.Unmarshal(jsonBytes, &savedState)
err = pilot.SetStorageState(ctx, &savedState)

// Clear all storage
err = pilot.ClearStorage(ctx)

MCP Tool Usage

// Save session
{"name": "storage_get_state"}

// Restore session
{"name": "storage_set_state", "arguments": {"state": "<json from storage_get_state>"}}

// Clear all storage
{"name": "storage_clear_all"}

Tracing

Record browser actions with screenshots and DOM snapshots for debugging and test creation.

SDK Usage

// Start tracing
tracing := pilot.Tracing()
err := tracing.Start(ctx, &w3pilot.TracingStartOptions{
    Screenshots: true,
    Snapshots:   true,
    Title:       "Login Flow Test",
})

// Perform actions...
pilot.Go(ctx, "https://example.com")
elem, _ := pilot.Find(ctx, "button", nil)
elem.Click(ctx, nil)

// Stop and save trace
data, err := tracing.Stop(ctx, nil)
os.WriteFile("trace.zip", data, 0600)

MCP Tool Usage

// Start trace
{"name": "trace_start", "arguments": {"screenshots": true, "title": "My Test"}}

// Stop and get trace data
{"name": "trace_stop", "arguments": {"path": "/tmp/trace.zip"}}

CDP Features (Chrome DevTools Protocol)

W3Pilot provides direct CDP access for advanced profiling and emulation that isn't available through WebDriver BiDi.

Heap Snapshots

Capture V8 heap snapshots for memory profiling:

// Take heap snapshot
snapshot, err := pilot.TakeHeapSnapshot(ctx, "/tmp/snapshot.heapsnapshot")
fmt.Printf("Snapshot: %s (%d bytes)\n", snapshot.Path, snapshot.Size)

// Load in Chrome DevTools: Memory tab → Load

Network Emulation

Simulate various network conditions:

import "github.com/plexusone/w3pilot/cdp"

// Throttle to Slow 3G
err := pilot.EmulateNetwork(ctx, cdp.NetworkSlow3G)

// Other presets
err := pilot.EmulateNetwork(ctx, cdp.NetworkFast3G)
err := pilot.EmulateNetwork(ctx, cdp.Network4G)

// Custom conditions
err := pilot.EmulateNetwork(ctx, cdp.NetworkConditions{
    Latency:            100,  // ms
    DownloadThroughput: 500 * 1024,  // 500 KB/s
    UploadThroughput:   250 * 1024,  // 250 KB/s
})

// Clear emulation
err := pilot.ClearNetworkEmulation(ctx)

CPU Emulation

Simulate slower CPUs for performance testing:

import "github.com/plexusone/w3pilot/cdp"

// 4x CPU slowdown (mid-tier mobile)
err := pilot.EmulateCPU(ctx, cdp.CPU4xSlowdown)

// Other presets
err := pilot.EmulateCPU(ctx, cdp.CPU2xSlowdown)
err := pilot.EmulateCPU(ctx, cdp.CPU6xSlowdown)

// Clear emulation
err := pilot.ClearCPUEmulation(ctx)

Direct CDP Access

For advanced use cases, access the CDP client directly:

if pilot.HasCDP() {
    cdpClient := pilot.CDP()

    // Send any CDP command
    result, err := cdpClient.Send(ctx, "Performance.getMetrics", nil)
}
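
On the wire, a CDP command is a JSON message with an `id`, a `method`, and optional `params`, and the reply carries the same `id` with a `result`. The sketch below encodes a `Performance.getMetrics` request and decodes a sample reply. The return type of `cdpClient.Send` is not documented here, so this works on raw JSON; the sample metric values are made up for illustration, and `encodeRequest` and `decodeMetrics` are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cdpRequest is the JSON envelope of a CDP command.
type cdpRequest struct {
	ID     int                    `json:"id"`
	Method string                 `json:"method"`
	Params map[string]interface{} `json:"params,omitempty"`
}

// metricsResponse matches the reply shape of Performance.getMetrics.
type metricsResponse struct {
	ID     int `json:"id"`
	Result struct {
		Metrics []struct {
			Name  string  `json:"name"`
			Value float64 `json:"value"`
		} `json:"metrics"`
	} `json:"result"`
}

// encodeRequest builds a parameterless CDP command.
func encodeRequest(id int, method string) ([]byte, error) {
	return json.Marshal(cdpRequest{ID: id, Method: method})
}

// decodeMetrics turns a getMetrics reply into a name -> value map.
func decodeMetrics(data []byte) (map[string]float64, error) {
	var resp metricsResponse
	if err := json.Unmarshal(data, &resp); err != nil {
		return nil, err
	}
	out := make(map[string]float64, len(resp.Result.Metrics))
	for _, m := range resp.Result.Metrics {
		out[m.Name] = m.Value
	}
	return out, nil
}

// sampleResponse uses illustrative values, not real measurements.
const sampleResponse = `{"id":1,"result":{"metrics":[{"name":"Documents","value":3},{"name":"JSHeapUsedSize","value":1048576}]}}`

func main() {
	req, err := encodeRequest(1, "Performance.getMetrics")
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(string(req)) // prints: {"id":1,"method":"Performance.getMetrics"}

	metrics, err := decodeMetrics([]byte(sampleResponse))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(metrics["JSHeapUsedSize"]) // prints: 1.048576e+06
}
```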

Testing

# Unit tests
go test -v ./...

# Integration tests
go test -tags=integration -v ./integration/...

# Headless mode
W3PILOT_HEADLESS=1 go test -tags=integration -v ./integration/...

Debug Logging

W3PILOT_DEBUG=1 w3pilot mcp

Related Projects

WebDriver BiDi - Protocol specification

License

MIT
