Multi-agent browser automation that interviews you about the page, records your workflow, and replays it for any user — all from a Chrome extension. No headless browser, no Playwright server, no Selenium.
| | Browser-Use | Skyvern | Anthropic Computer Use | Agentic Browser Lab |
|---|---|---|---|---|
| Delivery | Python lib (headless) | Cloud SaaS (headless) | OS-level desktop control | Chrome MV3 extension |
| Agent design | Single LLM prompt | Single LLM prompt | Single Claude call | Multi-agent team (Perceiver + Planner + Interviewer) |
| Onboarding UX | Write prompts | Describe form | Just describe | Agent asks YOU multiple-choice questions |
| Recovery | Re-prompt | Re-run | Re-shoot | Episodic memory + selector-failure learning |
| Uses user's real Chrome (cookies, auth) | ❌ | ❌ | ❌ (system Chrome) | ✅ Their actual logged-in session |
┌─────────────────────────────────────────────────────────────────────┐
│ Browser side (Chrome MV3 extension) │
│ ──────────────────────────────────── │
│ content-portals.js background.js executor.js │
│ ┌──────────────────┐ ┌──────────────┐ ┌───────────────┐ │
│ │ Picker overlay │ │ MV3 service │ │ CDP-based │ │
│ │ + HITL cards │◀─────▶│ worker │◀──▶│ click / type/ │ │
│ │ + Interview modal │ │ + screenshot │ │ navigate │ │
│ │ + animated cursor │ │ + WS to API │ │ (chrome.dbg) │ │
│ └──────────────────┘ └──────────────┘ └───────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
│
│ POST /interview/start
│ POST /interview/answer
│ POST /events
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Backend (Python / Pydantic AI) │
│ ──────────────────────────────── │
│ │
│ PerceiverAgent ──observe─▶ PageState │
│ (vision + DOM) { intent, fields, buttons, │
│ blockers, progress, confidence } │
│ │
│ InterviewerAgent ──propose─▶ Questionnaire │
│ (asks the user) { questions: [single/multi/free] } │
│ │
│ AnswerProcessor ──resolve─▶ WorkflowProposal | ActionPlan | │
│ FollowUpQuestionnaire │
│ │
│ PlannerAgent ──plan────▶ ExecutorPlan │
│ (freeform mode) { actions, confidence, goal_met } │
│ │
│ AutomationMemory ──recall──▶ MemoryHit[] ◀── mem0 (Qdrant) │
│ (selector-failure memory) │
└─────────────────────────────────────────────────────────────────────┘
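The boxes in the diagram name the structured outputs the agents pass to each other. A minimal sketch of those shapes with plain dataclasses follows. The field names come from the diagram; the field types, the `Question` and `ExecutorAction` layouts, and the default values are assumptions, and the project's actual Pydantic AI models are not shown here.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional


@dataclass
class PageState:
    """What the perceiver reports about the current page."""
    intent: str
    fields: List[str] = field(default_factory=list)
    buttons: List[str] = field(default_factory=list)
    blockers: List[str] = field(default_factory=list)
    progress: float = 0.0      # assumed: fraction of the flow completed
    confidence: float = 0.0    # assumed: 0.0 to 1.0


@dataclass
class Question:
    """One interview question; `kind` mirrors the single/multi/free modes."""
    prompt: str
    kind: Literal["single", "multi", "free"]
    choices: List[str] = field(default_factory=list)


@dataclass
class Questionnaire:
    questions: List[Question]


@dataclass
class ExecutorAction:
    kind: str                  # assumed values such as "click", "type", "navigate"
    selector: Optional[str] = None
    text: Optional[str] = None


@dataclass
class ExecutorPlan:
    actions: List[ExecutorAction]
    confidence: float
    goal_met: bool = False


state = PageState(intent="lookup form", fields=["zip"], buttons=["Find"])
plan = ExecutorPlan(
    actions=[
        ExecutorAction(kind="type", selector="#zip", text="90210"),
        ExecutorAction(kind="click", selector="button.find"),
    ],
    confidence=0.95,
)
```

Keeping each hand-off as a typed object rather than free text is what lets one agent's output be validated before the next agent consumes it.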
| Agent | LLM call | Job |
|---|---|---|
| PerceiverAgent | qwen3-vl:235b → DOM-only fallback | Reads the screenshot + DOM, produces structured PageState (intent, fields, buttons, blockers) |
| PlannerAgent | deepseek-v4-pro | Takes instruction + PageState + action history + episodic memory → next 1-3 actions |
| InterviewerAgent | deepseek-v4-pro | Reads the page, asks the user multi-choice questions to disambiguate intent before acting |
| AnswerProcessor | deepseek-v4-pro | Takes user answers → produces a WorkflowProposal (save it), ActionPlan (run it), or FollowUpQuestionnaire |
| Mem0AutomationAdapter | (no LLM — vector store) | Remembers: selector failures, workflow usage, run summaries. Per user, persistent. |
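The PerceiverAgent row describes a vision call that degrades to a DOM-only path. One way that fallback could be structured is sketched below. `perceive_with_fallback`, `failing_vision`, `dom_heuristic`, and the 20-second timeout are all hypothetical names and values chosen for illustration, not the project's implementation.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional

Perceiver = Callable[[Dict[str, Any], Optional[str]], Awaitable[Dict[str, Any]]]


async def perceive_with_fallback(
    dom_summary: Dict[str, Any],
    screenshot_data_url: Optional[str],
    vision: Perceiver,
    dom_only: Perceiver,
    timeout_s: float = 20.0,
) -> Dict[str, Any]:
    """Try the vision path first; fall back to DOM-only on any failure."""
    if screenshot_data_url:
        try:
            return await asyncio.wait_for(
                vision(dom_summary, screenshot_data_url), timeout=timeout_s
            )
        except Exception:
            # Timeouts, provider errors, and malformed output all land here.
            pass
    return await dom_only(dom_summary, None)


# Hypothetical stand-ins so the sketch runs without a model provider.
async def failing_vision(dom: Dict[str, Any], shot: Optional[str]) -> Dict[str, Any]:
    raise RuntimeError("vision model unavailable")


async def dom_heuristic(dom: Dict[str, Any], shot: Optional[str]) -> Dict[str, Any]:
    return {"intent": dom.get("title", "unknown"), "source": "dom-only"}


result = asyncio.run(
    perceive_with_fallback(
        {"title": "lookup form"},
        "data:image/png;base64,AAAA",
        vision=failing_vision,
        dom_only=dom_heuristic,
    )
)
```

Passing the two perceivers in as callables keeps the fallback logic testable without a network call.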
User clicks "Pick" on overlay
│
▼
chrome.debugger.attach({tabId}, "1.3") ← REQUIRED, not optional
│
▼
Crosshair cursor + dashed outline on hover
│
▼
Click → CDP fetches:
• Accessibility.getPartialAXTree (role, name, value, state)
• CSS.getComputedStyleForNode (display, visibility, opacity)
• DOM.getBoxModel (precise quad)
• DOM.getOuterHTML (1200-char excerpt)
│
▼
Send to backend with picked element descriptor
│
▼
Planner uses the picked element as authoritative target
│
▼
Animated cursor flies to the target, pulses green on click
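The four CDP calls in the flow above each return a separate payload, which then has to be merged into one picked-element descriptor. A sketch of that merge follows. The input shapes mirror the CDP responses for those methods; the output keys, the function name `build_picked_descriptor`, and the choice of the first AX node as the picked element are assumptions (the first-node choice may not hold when relatives are fetched).

```python
from typing import Any, Dict

HTML_EXCERPT_LIMIT = 1200  # excerpt length named in the picker flow


def build_picked_descriptor(
    ax_tree: Dict[str, Any],
    computed_style: Dict[str, Any],
    box_model: Dict[str, Any],
    outer_html: Dict[str, Any],
) -> Dict[str, Any]:
    """Merge four CDP responses into one descriptor (output keys are assumed)."""
    nodes = ax_tree.get("nodes") or [{}]
    ax = nodes[0]  # assumption: the picked node is listed first
    wanted = {"display", "visibility", "opacity"}
    style = {
        prop["name"]: prop["value"]
        for prop in computed_style.get("computedStyle", [])
        if prop["name"] in wanted
    }
    return {
        "role": ax.get("role", {}).get("value"),
        "name": ax.get("name", {}).get("value"),
        "value": ax.get("value", {}).get("value"),
        "style": style,
        "quad": box_model.get("model", {}).get("content", []),
        "html_excerpt": outer_html.get("outerHTML", "")[:HTML_EXCERPT_LIMIT],
    }


descriptor = build_picked_descriptor(
    ax_tree={"nodes": [{
        "role": {"type": "role", "value": "button"},
        "name": {"type": "computedString", "value": "Find"},
    }]},
    computed_style={"computedStyle": [
        {"name": "display", "value": "block"},
        {"name": "color", "value": "rgb(0, 0, 0)"},
        {"name": "opacity", "value": "1"},
    ]},
    box_model={"model": {"content": [10, 20, 110, 20, 110, 60, 10, 60]}},
    outer_html={"outerHTML": "<button>" + "x" * 2000 + "</button>"},
)
```

Truncating the HTML and keeping only three style properties bounds the size of what is sent to the backend.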
When the agent is uncertain, a HITL card appears with clipped screenshots of each candidate element so the user can pick visually instead of reading selectors.
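Clipping a screenshot to one candidate needs a rectangle derived from that element's box-model quad. The sketch below computes a padded bounding box in the `x`, `y`, `width`, `height`, `scale` shape that CDP's `Page.captureScreenshot` accepts as `clip`. The helper name `clip_from_quad` and the 8-pixel padding are assumptions, and this is not necessarily how the extension produces its HITL thumbnails.

```python
from typing import Dict, Sequence


def clip_from_quad(quad: Sequence[float], padding: float = 8.0) -> Dict[str, float]:
    """Bounding box of an 8-number quad (x1, y1, ..., x4, y4) plus padding."""
    if len(quad) != 8:
        raise ValueError("expected 8 numbers: four (x, y) corner points")
    xs, ys = quad[0::2], quad[1::2]
    # Clamp the top-left corner so the clip never starts off-page.
    left = max(min(xs) - padding, 0.0)
    top = max(min(ys) - padding, 0.0)
    return {
        "x": left,
        "y": top,
        "width": max(xs) + padding - left,
        "height": max(ys) + padding - top,
        "scale": 1.0,
    }


clip = clip_from_quad([10, 20, 110, 20, 110, 60, 10, 60])
```

A small padding keeps enough surrounding context in the thumbnail for the user to recognise where the element sits on the page.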
from agentic_browser_lab.automation.perception_agent import observe_page
from agentic_browser_lab.automation.pai_wiring import plan_actions_from_instruction
from agentic_browser_lab.automation.interviewer_agent import interview_page

# Vision + DOM → structured page state
# (run inside an async function; top-level await only works in an async REPL or notebook)
state = await observe_page(
    dom_summary=my_dom_dict,
    screenshot_data_url="data:image/png;base64,...",
    page_url="https://example.com",
    goal="fill the lookup form",
)
# Returns PageState(intent="lookup form", fields=[...], buttons=[...], blockers=[...])

# Instruction + page state → concrete actions
plan = await plan_actions_from_instruction(
    instruction="Enter ZIP 90210 and click Find",
    # DOM summary enriched with what the perceiver found; add other perceived fields as needed
    dom_summary={**my_dom_dict, "perceived_intent": state.intent},
    picked_elements=[],
)
# Returns ExecutorPlan(actions=[ExecutorAction(...), ...], confidence=0.95, goal_met=False)

- `content-portals.js`: picker overlay, HITL cards, interview modal, animated cursor
- `executor.js`: CDP click/type/key/navigate/wait_network_idle
- `extension-logger.js`: structured logging to chrome.storage
- `runtime-config-shared.js`: runtime config bootstrap
from agentic_browser_lab.memory import get_automation_memory
mem = get_automation_memory()
# Write
await mem.remember_selector_failure(
org_id=org, user_id=user,
page_url="https://sunfire.example/lookup",
stale_selector="[data-testid=search]",
replacement="button.lookup-primary",
replacement_kind="selector",
)
# Read (every planner call does this)
hits = await mem.recall_for_planner(
org_id=org, user_id=user,
query="fill ZIP and click lookup",
page_url="https://sunfire.example/lookup",
page_intent="customer lookup form",
)
# Returns MemoryHit objects — the planner is instructed to AVOID
# selectors that have failed before for this user.

Backed by mem0 + Qdrant (the same stack OpenAI Operator uses internally). Falls back to in-memory storage when not configured. Never raises.
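The in-memory fallback and the never-raises contract can be illustrated with a small stand-in. The sketch below is not the project's `Mem0AutomationAdapter`: `InMemorySelectorMemory`, `SelectorFailure`, `recall_failed_selectors`, and `drop_known_bad` are hypothetical names. The source describes the planner being instructed to avoid failed selectors; the explicit filter here is a swapped-in, deterministic illustration of the same idea.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class SelectorFailure:
    page_url: str
    stale_selector: str
    replacement: Optional[str] = None


class InMemorySelectorMemory:
    """Hypothetical in-process stand-in; a real backend would be a vector store."""

    def __init__(self) -> None:
        self._failures: Dict[Tuple[str, str], List[SelectorFailure]] = {}

    async def remember_selector_failure(
        self, org_id: str, user_id: str, failure: SelectorFailure
    ) -> bool:
        try:
            self._failures.setdefault((org_id, user_id), []).append(failure)
            return True
        except Exception:
            return False  # memory is best-effort: report failure, never raise

    async def recall_failed_selectors(
        self, org_id: str, user_id: str, page_url: str
    ) -> Set[str]:
        try:
            hits = self._failures.get((org_id, user_id), [])
            return {f.stale_selector for f in hits if f.page_url == page_url}
        except Exception:
            return set()  # an empty recall degrades the plan but does not break it


def drop_known_bad(candidates: List[str], failed: Set[str]) -> List[str]:
    """Keep candidate selectors that have not failed before, preserving order."""
    return [c for c in candidates if c not in failed]


async def demo() -> List[str]:
    mem = InMemorySelectorMemory()
    url = "https://sunfire.example/lookup"
    await mem.remember_selector_failure(
        "org-1", "user-1",
        SelectorFailure(url, "[data-testid=search]", "button.lookup-primary"),
    )
    failed = await mem.recall_failed_selectors("org-1", "user-1", url)
    return drop_known_bad(["[data-testid=search]", "button.lookup-primary"], failed)


usable = asyncio.run(demo())
```

Keying the store by `(org_id, user_id)` mirrors the per-user scoping of the calls above: one user's stale selectors never affect another user's plans.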
| Layer | Tech |
|---|---|
| Agents | Pydantic AI (structured output) + Ollama Cloud (deepseek-v4-pro for reasoning, qwen3-vl for vision) |
| Memory | mem0 + Qdrant (per-user episodic) |
| Extension | Chrome MV3 + CDP (chrome.debugger) |
| Backend | FastAPI + httpx |
Designed to plug into your existing LLM gateway / vector store. Provider-neutral.
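Provider neutrality usually comes down to agents depending on a narrow interface rather than a vendor SDK. A sketch of such a seam using `typing.Protocol` follows. `ChatProvider`, `EchoProvider`, and `run_agent` are hypothetical names for illustration and do not appear in this repo.

```python
from typing import Protocol


class ChatProvider(Protocol):
    """Hypothetical seam: anything with this method can back an agent."""

    def complete(self, system: str, user: str) -> str: ...


class EchoProvider:
    """Toy provider for tests; a real one would call an LLM gateway."""

    def complete(self, system: str, user: str) -> str:
        return f"[{system}] {user}"


def run_agent(provider: ChatProvider, instruction: str) -> str:
    # The agent depends only on the protocol, not on a vendor SDK.
    return provider.complete(system="planner", user=instruction)


reply = run_agent(EchoProvider(), "Enter ZIP 90210 and click Find")
```

Because `Protocol` uses structural typing, an existing gateway client only needs a matching `complete` method; it does not have to inherit from anything in this package.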
Extracted from a production B2B SaaS in May 2026. Backend is running in prod with 3 migrations applied (027_learned_workflows, 028_learned_workflow_versions, 029_learned_workflow_marketplace — the marketplace + versioning live in the learned-workflows-marketplace sibling repo).
This repo focuses on the live observe / propose / act loop. The adjacent concerns are split into focused sibling repos so each one stays small enough to read in one sitting:
| Repo | What it owns |
|---|---|
| learned-workflows-marketplace | Storage triad (Postgres + Qdrant + Neo4j), versioning, cross-org marketplace, parameterization: the save / share / replay half |
| site-mapper-agents | Architect + Healer + Eavesdropper agents (endpoint classification, schema repair, replay-driven extraction): the API-discovery half |
| cdp-network-interceptor | The CDP wrapper used by site-mapper-agents to capture XHR/fetch traffic from a real Chrome tab |
| mv3-audio-replay-buffer | The MV3 offscreen-doc audio capture primitive used elsewhere in the parent platform |
If you need the full self-healing portal extraction stack (not just the live browser planner this repo provides), `pip install site-mapper-agents` alongside this package.
- Drop the FastAPI lock-in: package as `pip install agentic-browser-lab` with an optional FastAPI plugin
- Pluggable LLM provider (currently coupled to Ollama Cloud, easy to abstract)
- Headless mode for CI/CD (Playwright adapter)
- Cross-browser (Firefox MV3 port)
MIT — see LICENSE
@axumquant — built as part of Sales Coach (Medicare insurance B2B), open-sourced for the agent-tooling community.