Agentic Browser Lab

Multi-agent browser automation that interviews you about the page, records your workflow, and replays it for any user — all from a Chrome extension. No headless browser, no Playwright server, no Selenium.

What makes this different

	Browser-Use	Skyvern	Anthropic Computer Use	Agentic Browser Lab
Delivery	Python lib (headless)	Cloud SaaS (headless)	OS-level desktop control	Chrome MV3 extension
Agent design	Single LLM prompt	Single LLM prompt	Single Claude call	Multi-agent team (Perceiver + Planner + Interviewer)
Onboarding UX	Write prompts	Describe form	Just describe	Agent asks YOU multi-choice questions
Recovery	Re-prompt	Re-run	Re-shoot	Episodic memory + selector-failure learning
Uses user's real Chrome (cookies, auth)	❌	❌	❌ (system Chrome)	✅ Their actual logged-in session

Architecture (Bounded Context per agentic-DDD)

┌─────────────────────────────────────────────────────────────────────┐
│  Browser side (Chrome MV3 extension)                                │
│  ────────────────────────────────────                                │
│  content-portals.js          background.js          executor.js     │
│  ┌──────────────────┐        ┌──────────────┐    ┌───────────────┐ │
│  │ Picker overlay    │        │ MV3 service  │    │ CDP-based     │ │
│  │ + HITL cards      │◀─────▶│ worker       │◀──▶│ click / type/ │ │
│  │ + Interview modal │        │ + screenshot │    │ navigate      │ │
│  │ + animated cursor │        │ + WS to API  │    │ (chrome.dbg)  │ │
│  └──────────────────┘        └──────────────┘    └───────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
                                  │
                                  │  POST /interview/start
                                  │  POST /interview/answer
                                  │  POST /events
                                  ▼
┌─────────────────────────────────────────────────────────────────────┐
│  Backend (Python / Pydantic AI)                                     │
│  ────────────────────────────────                                    │
│                                                                      │
│   PerceiverAgent    ──observe─▶  PageState                          │
│   (vision + DOM)                  { intent, fields, buttons,        │
│                                     blockers, progress, confidence } │
│                                                                      │
│   InterviewerAgent  ──propose─▶  Questionnaire                      │
│   (asks the user)                 { questions: [single/multi/free] } │
│                                                                      │
│   AnswerProcessor   ──resolve─▶  WorkflowProposal | ActionPlan |    │
│                                  FollowUpQuestionnaire               │
│                                                                      │
│   PlannerAgent      ──plan────▶  ExecutorPlan                       │
│   (freeform mode)                 { actions, confidence, goal_met } │
│                                                                      │
│   AutomationMemory  ──recall──▶  MemoryHit[]   ◀── mem0 (Qdrant)    │
│   (selector-failure memory)                                          │
└─────────────────────────────────────────────────────────────────────┘

The 5 agents

Agent	LLM call	Job
PerceiverAgent	`qwen3-vl:235b` → DOM-only fallback	Reads the screenshot + DOM, produces structured `PageState` (intent, fields, buttons, blockers)
PlannerAgent	`deepseek-v4-pro`	Takes instruction + PageState + action history + episodic memory → next 1-3 actions
InterviewerAgent	`deepseek-v4-pro`	Reads the page, asks the user multi-choice questions to disambiguate intent before acting
AnswerProcessor	`deepseek-v4-pro`	Takes user answers → produces a `WorkflowProposal` (save it), `ActionPlan` (run it), or `FollowUpQuestionnaire`
Mem0AutomationAdapter	(no LLM — vector store)	Remembers: selector failures, workflow usage, run summaries. Per user, persistent.

The Chrome extension UX

User clicks "Pick" on overlay
   │
   ▼
chrome.debugger.attach({tabId}, "1.3")    ← REQUIRED, not optional
   │
   ▼
Crosshair cursor + dashed outline on hover
   │
   ▼
Click → CDP fetches:
   • Accessibility.getPartialAXTree    (role, name, value, state)
   • CSS.getComputedStyleForNode       (display, visibility, opacity)
   • DOM.getBoxModel                   (precise quad)
   • DOM.getOuterHTML                  (1200-char excerpt)
   │
   ▼
Send to backend with picked element descriptor
   │
   ▼
Planner uses the picked element as authoritative target
   │
   ▼
Animated cursor flies to the target, pulses green on click

When the agent is uncertain, a HITL card appears with clipped screenshots of each candidate element so the user can pick visually instead of reading selectors.

What you'd extract from this for your own project

Drop-in agents (Python / Pydantic AI)

from agentic_browser_lab.automation.perception_agent import observe_page
from agentic_browser_lab.automation.pai_wiring import plan_actions_from_instruction
from agentic_browser_lab.automation.interviewer_agent import interview_page

# Vision + DOM → structured page state
state = await observe_page(
    dom_summary=my_dom_dict,
    screenshot_data_url="data:image/png;base64,...",
    page_url="https://example.com",
    goal="fill the lookup form",
)
# Returns PageState(intent="lookup form", fields=[...], buttons=[...], blockers=[...])

# Instruction + page state → concrete actions
plan = await plan_actions_from_instruction(
    instruction="Enter ZIP 90210 and click Find",
    dom_summary={**page_state, "perceived_intent": state.intent, ...},
    picked_elements=[],
)
# Returns ExecutorPlan(actions=[ExecutorAction(...), ...], confidence=0.95, goal_met=False)

Drop-in Chrome extension primitives

content-portals.js — picker overlay, HITL cards, interview modal, animated cursor
executor.js — CDP click/type/key/navigate/wait_network_idle
extension-logger.js — structured logging to chrome.storage
runtime-config-shared.js — runtime config bootstrap

Episodic memory (the part nobody else has)

from agentic_browser_lab.memory import get_automation_memory

mem = get_automation_memory()

# Write
await mem.remember_selector_failure(
    org_id=org, user_id=user,
    page_url="https://sunfire.example/lookup",
    stale_selector="[data-testid=search]",
    replacement="button.lookup-primary",
    replacement_kind="selector",
)

# Read (every planner call does this)
hits = await mem.recall_for_planner(
    org_id=org, user_id=user,
    query="fill ZIP and click lookup",
    page_url="https://sunfire.example/lookup",
    page_intent="customer lookup form",
)
# Returns MemoryHit objects — the planner is instructed to AVOID
# selectors that have failed before for this user.

Backed by mem0 + Qdrant (the same stack OpenAI Operator uses internally). Falls back to in-memory storage when not configured. Never raises.

Stack

Layer	Tech
Agents	Pydantic AI (structured output) + Ollama Cloud (deepseek-v4-pro for reasoning, qwen3-vl for vision)
Memory	mem0 + Qdrant (per-user episodic)
Extension	Chrome MV3 + CDP (chrome.debugger)
Backend	FastAPI + httpx

Designed to plug into your existing LLM gateway / vector store. Provider-neutral.

Status

Extracted from a production B2B SaaS in May 2026. Backend is running in prod with 3 migrations applied (027_learned_workflows, 028_learned_workflow_versions, 029_learned_workflow_marketplace — the marketplace + versioning live in the learned-workflows-marketplace sibling repo).

Sibling repos

This repo focuses on the live observe / propose / act loop. The adjacent concerns are split into focused sibling repos so each one stays small enough to read in one sitting:

Repo	What it owns
`learned-workflows-marketplace`	Storage triad (Postgres + Qdrant + Neo4j), versioning, cross-org marketplace, parameterization — the save / share / replay half
`site-mapper-agents`	Architect + Healer + Eavesdropper agents — endpoint classification, schema repair, replay-driven extraction — the API-discovery half
`cdp-network-interceptor`	The CDP wrapper used by site-mapper-agents to capture XHR/fetch traffic from a real Chrome tab
`mv3-audio-replay-buffer`	The MV3 offscreen-doc audio capture primitive used elsewhere in the parent platform

If you need the full self-healing portal extraction stack (not just the live browser planner this repo provides), pip install site-mapper-agents alongside this package.

Roadmap

Drop the FastAPI lock-in — package as pip install agentic-browser-lab with optional FastAPI plugin
Pluggable LLM provider (currently coupled to Ollama Cloud, easy to abstract)
Headless mode for CI/CD (Playwright adapter)
Cross-browser (Firefox MV3 port)

License

MIT — see LICENSE

Maintainer

@axumquant — built as part of Sales Coach (Medicare insurance B2B), open-sourced for the agent-tooling community.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
agentic_browser_lab		agentic_browser_lab
chrome-extension		chrome-extension
docs		docs
examples		examples
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Agentic Browser Lab

What makes this different

Architecture (Bounded Context per agentic-DDD)

The 5 agents

The Chrome extension UX

What you'd extract from this for your own project

Drop-in agents (Python / Pydantic AI)

Drop-in Chrome extension primitives

Episodic memory (the part nobody else has)

Stack

Status

Sibling repos

Roadmap

License

Maintainer

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Agentic Browser Lab

What makes this different

Architecture (Bounded Context per agentic-DDD)

The 5 agents

The Chrome extension UX

What you'd extract from this for your own project

Drop-in agents (Python / Pydantic AI)

Drop-in Chrome extension primitives

Episodic memory (the part nobody else has)

Stack

Status

Sibling repos

Roadmap

License

Maintainer

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages