Teach Claude a web application once. Use it forever.
This manual covers the learning skill — the process of exploring a web application, inferring its semantic operations, and generating a reusable MCP server.
- Overview
- Prerequisites
- Quick Start
- How It Works
- Re-Learning Modes
- Generated Server Architecture
- Template Utilities Reference
- exec() Interceptors
- Browser Mode Switches
- Troubleshooting
The /learn-webapp skill is the core of AutoWebMCP. It takes a URL, explores the
web application in a real Chrome browser, identifies what operations the app supports,
and generates a complete MCP (Model Context Protocol) server that exposes those
operations as callable tools.
What it replaces: Every time Claude interacts with a web app via computer use, it screenshots the screen, analyzes the image, calculates click coordinates, clicks, and screenshots again. This is slow (~12 seconds for a single operation) and fragile.
What it produces: A permanent MCP server where each operation is a single function call (~0.5 seconds, reliable).
| Requirement | Details |
|---|---|
| Claude Code | With MCP support |
| Node.js | 18+ (required by the MCP SDK) |
| Google Chrome | Any recent version — the skill auto-detects the installation path |
| Claude in Chrome extension | Required during the learning phase for browser interaction |
No manual Chrome setup needed. The skill automatically detects Chrome, launches it with remote debugging enabled, and manages the CDP connection.
/learn-webapp https://your-app.com
That's it. The skill will:
- Launch Chrome with CDP if needed
- Navigate to the app
- Explore the UI systematically
- Present you with a list of discovered tools for approval
- Generate a complete MCP server
- Validate each tool against the live app
- Register the server in `.mcp.json`
You stay in control throughout — the skill asks before clicking anything risky and you approve the final tool list before code generation begins.
The skill communicates with Chrome through the Chrome DevTools Protocol (CDP) on port 9222. On startup, it tests connectivity:
curl -s http://127.0.0.1:9222/json/version
If Chrome isn't running with CDP, the skill offers two browser modes:
| Mode | Description | Best for |
|---|---|---|
| User profile | Dedicated CDP profile directory with your credentials | Apps requiring authentication |
| Sandbox | Clean temporary profile with no stored data | Public apps, testing |
Important: Chrome requires --user-data-dir set to a non-default directory for
CDP to work. The skill uses a dedicated CDP-Profile directory that persists your
login sessions across learning sessions.
Connection health probing: Before every major phase, the skill verifies CDP connectivity. If Chrome crashes or the tab closes mid-exploration, it detects the failure and offers to relaunch.
- Parses the URL and derives an app name (e.g., `sites.google.com` → `google-sites`)
- Checks `catalogue.json` for an existing MCP; if found, offers re-learning options
- Creates the output directory structure:
MCPs/<app-name>/
├── exploration/ # Exploration logs and snapshots
│ └── log.json # Structured exploration log
├── server/ # Generated MCP server
│ ├── index.mjs # MCP server entry point
│ ├── commands.mjs # Command library
│ ├── package.json # Dependencies
│ └── manifest.json # Capabilities metadata
├── README.md # Generated documentation
└── test-task.md # Integration test + results
- Opens the app in Chrome and captures a baseline screenshot
Goal: Build a complete map of the application's UI without clicking anything.
The skill reads the accessibility tree (not screenshots) to identify:
- Regions: Navigation bars, content areas, toolbars, sidebars, panels
- Interactive elements: Buttons, inputs, links, dropdowns, toggles
- Element metadata: ARIA labels, roles, data attributes, text content
Key detection capabilities:
| Detection | What it catches |
|---|---|
| Obfuscated DOM | Apps like Facebook/Gmail with hashed CSS classes (x9f619, _a9--) — CSS selectors are deprioritized, ARIA/text matching used instead |
| Duplicate elements | Card-based UIs where the same button exists per card (e.g., "Delete" on each question in Google Forms) — visibility filtering applied |
| ARIA classification | Each element scored as semantic (aria-label), structural (role+parent), or text-only — affects confidence scoring |
Overlay dismissal (Phase 2.5): Before exploration begins, the skill clears any blocking overlays — AI assistant dialogs (Gemini, Copilot), cookie banners, onboarding tours, promotional modals. Priority is given to privacy-preserving choices (reject cookies, dismiss tours).
Goal: Systematically probe interactive elements to understand what they do.
The skill explores elements by region, prioritizing:
- Primary actions (compose, create, submit)
- Form inputs (text fields, dropdowns)
- Navigation (menus, tabs)
- Secondary actions (toolbar buttons, toggles)
For each element, the exploration procedure is:
- Pre-state capture: screenshot + accessibility snapshot
- Danger-zone check: classify risk before clicking:
  - SKIP: Publish, Send, Delete All, Pay (never clicked)
  - SAFE: Tabs, menus, expand/collapse, preview (clicked freely)
  - CAUTIOUS: Single delete, toggle, create (clicked and undone)
- Execute interaction: click, type, hover, or select
- Trusted click detection: if JS `.click()` fails (no DOM change), retry with a CDP browser-level click. Elements requiring trusted clicks are flagged with `requiresTrustedClick: true`
- Post-state capture: screenshot + snapshot, diff against pre-state
- Record observation: what changed, what selectors work, what pattern it follows
- Reset state: close dialogs, navigate back, undo changes
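The danger-zone check in the procedure above could be approximated with a keyword classifier. This is a minimal sketch, not the skill's actual logic: the function name `classifyRisk`, the regex lists, and the choice to treat any unrecognized label as `CAUTIOUS` are all assumptions made for illustration.

```javascript
// Hypothetical keyword lists, taken from the SKIP and SAFE examples in this manual.
const SKIP_PATTERNS = [/\bpublish\b/i, /\bsend\b/i, /\bdelete all\b/i, /\bpay\b/i];
const SAFE_PATTERNS = [/\btabs?\b/i, /\bmenus?\b/i, /\bexpand\b/i, /\bcollapse\b/i, /\bpreview\b/i];

// Classify an element label before clicking it.
// SKIP is checked first so "Delete All" never falls through to a weaker class.
function classifyRisk(label) {
  if (SKIP_PATTERNS.some((pattern) => pattern.test(label))) return "SKIP";
  if (SAFE_PATTERNS.some((pattern) => pattern.test(label))) return "SAFE";
  // Assumption: anything unrecognized is clicked cautiously and then undone.
  return "CAUTIOUS";
}
```

Checking the irreversible class first means a label that matches both lists is never clicked.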
Widget interaction pattern classification:
| Pattern | Description | Code Pattern |
|---|---|---|
| `direct-click` | Single JS click produces the effect | `el.click()` |
| `trusted-click` | Requires browser-level click (`isTrusted`) | `return { __clickCoords: {x, y} }` |
| `dropdown-option` | Click opens listbox, then click option | `el.click(); sleep(300); option.click()` |
| `focus-type` | Focus element, then type via `setInputValue` | `el.focus(); setInputValue(el, value)` |
| `contenteditable` | ContentEditable div, selectAll + insertText | `setContentEditableValue(el, value)` |
| `multi-step-cascade` | Trigger → panel → fill → confirm | `clickAndWait(trigger, panel)` |
| `toggle` | Click toggles state, read back `aria-checked` | `el.click(); return el.getAttribute('aria-checked')` |
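For the `trusted-click` pattern, a command has to hand coordinates back instead of calling `el.click()`. The sketch below shows one plausible way to compute them. It assumes the coordinates are the viewport-relative center of the element; the helper name `trustedClickTarget` is hypothetical and not part of the template.

```javascript
// Build the special return value for an element that rejects synthetic clicks.
// Assumption: the click should land on the center of the element's bounding box.
function trustedClickTarget(el) {
  const rect = el.getBoundingClientRect();
  return {
    __clickCoords: {
      x: rect.left + rect.width / 2,
      y: rect.top + rect.height / 2,
    },
  };
}
```

A command would return this object as its result rather than clicking the element itself.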
Multi-view discovery: When the app has distinct views (e.g., a form list page vs. a form editor), each view is explored separately and operations are tagged by view.
Authentication handling: If the app redirects to a login page, the skill pauses and asks you to log in manually. It monitors the browser URL and resumes once authentication completes.
Goal: Derive semantic operations from the exploration data.
The skill analyzes all explorations and identifies four types of operations:
- Single-action — one click/input produces a result (toggle dark mode, refresh)
- Multi-step — sequences forming a workflow (compose → fill → send)
- Parameterized — actions taking user data (search for X, set field to Y)
- Query — reading/extracting visible state (get unread count, list items)
Each operation gets:
- A `snake_case` name and human-readable description
- Parameter schema with types and descriptions
- Multi-fallback selector arrays ordered by resilience score
- Confidence score (0.0–1.0) based on selector quality and validation
Selector priority scoring:
| Score | Selector Type | Example |
|---|---|---|
| 5 | `aria-label` | `[aria-label="Add question"]` |
| 5 | `data-testid` | `[data-testid="compose-btn"]` |
| 4 | `data-tooltip` | `[data-tooltip="Add question"]` |
| 3 | role + text | `findElementByText("button", "Add")` |
| 2 | CSS class | .compose-button |
| 1 | Text-only | findButtonByText("Insert") |
| 0 | Positional | querySelectorAll('input')[2] |
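The scores in this table can be used to order a fallback array from most to least resilient. The following is a sketch under stated assumptions: the candidate shape `{ type, value }`, the type keys, and the function name `rankSelectors` are hypothetical; only the numeric scores come from the table.

```javascript
// Scores copied from the selector priority table; the keys are illustrative names.
const SELECTOR_SCORES = {
  "aria-label": 5,
  "data-testid": 5,
  "data-tooltip": 4,
  "role+text": 3,
  "css-class": 2,
  "text-only": 1,
  "positional": 0,
};

// Return a new array sorted by descending score.
// Unknown types are treated as score 0 (an assumption).
function rankSelectors(candidates) {
  const scoreOf = (candidate) => SELECTOR_SCORES[candidate.type] ?? 0;
  return [...candidates].sort((a, b) => scoreOf(b) - scoreOf(a));
}
```

Because `Array.prototype.sort` is stable in current Node.js versions, candidates with equal scores keep their discovery order.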
App-specific helper extraction: Repeated patterns across operations (same selector in 3+ operations, same multi-step sequence) are extracted as named helper functions.
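The "same selector in 3+ operations" rule could be detected with a simple count. This is an illustrative sketch only: the operation shape `{ name, selectors }` and the function name `findSharedSelectors` are assumptions, and detecting repeated multi-step sequences is not covered.

```javascript
// Find selectors that appear in at least `minOperations` distinct operations.
function findSharedSelectors(operations, minOperations = 3) {
  const counts = new Map();
  for (const operation of operations) {
    // A Set ensures a selector is counted once per operation.
    for (const selector of new Set(operation.selectors)) {
      counts.set(selector, (counts.get(selector) || 0) + 1);
    }
  }
  return [...counts]
    .filter(([, count]) => count >= minOperations)
    .map(([selector]) => selector);
}
```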
This is mandatory — code generation cannot proceed without your approval.
The skill presents a numbered table of all discovered tools:
## Proposed tools for google-forms MCP server
| # | Tool Name | Description |
|---|----------------------|--------------------------------------|
| 1 | set_form_title | Set the form title text |
| 2 | add_question | Add a new question to the form |
| 3 | set_question_type | Change the type of a question |
| ... |
You can:
- Approve — proceed to code generation
- Remove tools — "Remove 3 and 7"
- Add a tool — "Add a tool for duplicating a page" (triggers Phase 7 exploration)
The gate loops until you explicitly approve.
Goal: Generate the complete MCP server package from approved operations.
The skill reads template files and generates:
- `commands.mjs`: All operations as exported async functions, plus framework utility functions and app-specific helpers
- `index.mjs`: MCP server entry point with CDP connection, helper injection, tool registrations, and the `exec()` execution engine
- `package.json`: Dependencies (`@modelcontextprotocol/sdk`, `puppeteer-core`)
- `manifest.json`: Server metadata with operation list and confidence scores
- `README.md`: Human-readable documentation for the generated server
Mandatory code quality rules:
- `waitForElement()` over `sleep()`: every click that triggers a DOM change must wait for the expected result, not use blind delays
- Semantic selectors first: `aria-label` and `data-testid` before CSS classes
- Readback verification: every set/update operation reads back the DOM to confirm the change actually took effect
- Multi-view state guards: operations check they're on the correct view before executing
- Everything exported: all functions (utilities and operations) must be `export`ed so the dynamic injection system can collect and inject them
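The readback rule above can be illustrated with a small wrapper. This is a sketch, not generated output: the name `setAndVerify` and the result shape `{ success, expected, actual }` are hypothetical, and the setter is passed in so that a template utility such as `setInputValue` could be supplied in a real command.

```javascript
// Apply a value with the given setter, then read the element back
// and report whether the change actually took effect.
function setAndVerify(el, value, setter) {
  setter(el, value);
  const actual = el.value; // readback from the element, not from the argument
  return { success: actual === value, expected: value, actual };
}
```

Returning both `expected` and `actual` makes a mismatch visible to the caller.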
6.1 Per-tool live validation: Every generated tool is executed against the live app in the browser. For mutation tools, the change is verified via DOM readback and then undone. Tools must achieve a 70% pass rate minimum.
6.1.2 End-to-end test task: A realistic multi-step scenario that chains 5-8 tools
together, written to test-task.md. This catches integration issues that per-tool
validation misses (focus management, state transitions, timing).
6.2 Catalogue update: The new MCP is registered in catalogue.json.
6.3 Auto-register: The MCP server is added to .mcp.json so Claude Code loads
it on next restart.
6.4 Results report: A table of all learned tools with confidence scores, plus failed operations, low-confidence warnings, and unexplored areas.
After initial learning, you can add individual tools on demand:
add tool: archive conversation — moves the current conversation to archive
The skill explores the specific operation, validates it, and appends it to the existing MCP server files without regenerating everything.
When you run /learn-webapp on an app that already has an MCP server, you get
four options:
| Mode | What it does | Version impact |
|---|---|---|
| Re-learn from scratch | Full exploration, new server | Major bump (1.0 → 2.0) |
| Update/extend | Keep existing tools, add new ones | Minor bump (1.0 → 1.1) |
| Validate & fix | Test each tool, fix broken selectors | Patch (updates confidence) |
| Learn separate MCP | Create independent tool set | New MCP entry |
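The version impact column can be read as a semantic-version bump. The sketch below assumes three-part versions like the `"1.0.0"` used in the manifest. The mode keys and the function name `bumpVersion` are hypothetical, and incrementing the patch number for "Validate & fix" is an assumption, since the table only says that mode updates confidence.

```javascript
// Compute the next version string for a re-learning mode.
function bumpVersion(version, mode) {
  const [major, minor, patch] = version.split(".").map(Number);
  switch (mode) {
    case "relearn": // Re-learn from scratch
      return `${major + 1}.0.0`;
    case "extend": // Update/extend
      return `${major}.${minor + 1}.0`;
    case "validate": // Validate & fix (patch increment is assumed)
      return `${major}.${minor}.${patch + 1}`;
    default:
      throw new Error(`Unknown re-learning mode: ${mode}`);
  }
}
```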
The command library contains three sections:
// --- Framework Utilities (from template) ---
// querySelector, waitForElement, setInputValue, etc.
// 20 exported utility functions
// --- App-Specific Helpers ---
// Extracted from repeated patterns across operations
// e.g., getActiveQuestionCard(), isEditor(), clickSidebarTab()
// --- Learned Operations ---
// One exported async function per approved tool
// e.g., set_form_title({ title }), add_question({ text, type })

Every function must be exported: the server's injection system uses `Object.entries(commands).filter(...)` to collect all exports and inject them into the browser context via CDP. Non-exported helpers will be undefined at runtime.
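To make the export requirement concrete, here is a sketch of how collected exports might be turned into an injectable script. The filter predicate is not shown in this manual, so the `typeof value === "function"` check is a guess. Attaching each function to `window` and the name `buildInjectionSource` are also assumptions.

```javascript
// Serialize every exported function into one script string.
// Assumption: only functions are injected; other exports are skipped.
function buildInjectionSource(commands) {
  return Object.entries(commands)
    .filter(([, value]) => typeof value === "function")
    .map(([name, fn]) => `window[${JSON.stringify(name)}] = ${fn.toString()};`)
    .join("\n");
}
```

A helper that is declared but not exported never appears in `commands`, so it is missing from the injected script and is undefined when an operation calls it in the browser.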
The server entry point handles:
- CDP connection via `puppeteer-core` with 4 mode combinations
- Helper injection: all `commands.mjs` exports injected into browser context once per page via `evaluateOnNewDocument` (survives navigations, bypasses CSP)
- Tool registration: each operation registered as an MCP tool with JSON schema
- `exec()` engine: executes operations with interceptors, retry, and URL re-check
- Built-in tools: `health_check`, `get_page_state`, `show_scripts`, `run_script`
{
"name": "google-forms-mcp",
"version": "1.0.0",
"targetApp": {
"url": "https://docs.google.com/forms/",
"urlPattern": "docs\\.google\\.com/forms"
},
"operations": [
{ "name": "set_form_title", "confidence": 0.90, "category": "data_entry" }
]
}

These functions are available in every generated `commands.mjs`:

| Function | Purpose |
|---|---|
| `querySelector(selectors)` | Try multiple CSS selectors, return first match |
| `querySelectorAll(selectors)` | Try multiple selectors, return all matches |
| `queryVisibleSelector(selectors)` | Like `querySelector` but only visible matches |
| `querySelectorWithin(root, selectors)` | Scoped query within a root element |
| `waitForElement(selectors, timeout)` | Wait for element to appear (MutationObserver) |
| `waitForRemoval(selector, timeout)` | Wait for element to disappear |
| `sleep(ms)` | Pause (animation timing only, never as sole wait) |
| `setInputValue(el, value)` | Set value on `<input>`/`<textarea>` |
| `setContentEditableValue(el, value)` | Set value on contentEditable elements |
| `clickByAriaLabel(label)` | Click element by `aria-label` |
| `findButtonByText(text)` | Find button or `[role="button"]` by text |
| `findElementByText(role, text, opts)` | Find element by ARIA role + text |
| `clickMenuItem(itemText)` | Click menu item by text |
| `clickAndWait(clickSel, waitSel, timeout)` | Click then wait for result |
| `multiStep(steps)` | Sequential click-and-wait steps |
| `getPageState()` | Current URL, title, active element, dialogs |
| `navigateTo(url)` | Navigate via `__navigate` signal |
| `getRepeatingContainers(anchor, levels, verify)` | Find feed/card containers |
| `menuCascade(itemTexts, delay)` | Navigate cascading menus |
| `selectRadioByIndex(group, index)` | Select radio via trusted click |
| `togglePanel(trigger, panel, action)` | State-aware panel toggle |
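The multi-selector utilities in this table share one idea: try each selector in order and stop at the first hit. A minimal sketch of that idea follows. It is not the template's implementation; the name `querySelectorFallback` and the optional `root` parameter are assumptions added so the function can run outside a browser.

```javascript
// Try each selector in priority order and return the first match, or null.
// `root` defaults to the global document when running in a page.
function querySelectorFallback(selectors, root = document) {
  for (const selector of selectors) {
    const match = root.querySelector(selector);
    if (match) return match;
  }
  return null;
}
```

Ordering the array by resilience score means the weakest selector is only used when every stronger one has failed.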
The exec() function in index.mjs processes special return values from command
functions, enabling operations that can't be done purely in JavaScript:
| Interceptor | Trigger | What it does |
|---|---|---|
| `__clickCoords` | `{ __clickCoords: {x, y} }` | Performs trusted browser-level click via Puppeteer (for widgets checking `event.isTrusted`) |
| `__hoverCoords` | `{ __hoverCoords: {x, y} }` | Moves mouse to coordinates, waits for hover UI to appear |
| `__keyPress` | `{ __keyPress: "Enter" }` | Sends keyboard events via Puppeteer (search submit, Escape) |
| `__navigate` | `{ __navigate: url }` | Navigates page at Puppeteer level (avoids race conditions) |
| `__followUp` | `{ __followUp: "funcName()" }` | Evaluates JavaScript after the intercepted action completes |
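One way to picture the interceptors is as a mapping from a command's return value to a Puppeteer call. The sketch below only plans the action and does not execute it. The function name `planInterceptedAction` and the plan shape are hypothetical, and the order in which keys are checked is an assumption. The Puppeteer methods named in the strings (`page.mouse.click`, `page.mouse.move`, `page.keyboard.press`, `page.goto`) are real APIs.

```javascript
// Map a command result to the Puppeteer call an exec()-style engine might make.
// Returns null for ordinary results that need no interception.
function planInterceptedAction(result) {
  if (result === null || typeof result !== "object") return null;
  const followUp = result.__followUp ?? null; // evaluated after the action completes
  if (result.__clickCoords) {
    const { x, y } = result.__clickCoords;
    return { call: "page.mouse.click", args: [x, y], followUp };
  }
  if (result.__hoverCoords) {
    const { x, y } = result.__hoverCoords;
    return { call: "page.mouse.move", args: [x, y], followUp };
  }
  if (result.__keyPress) {
    return { call: "page.keyboard.press", args: [result.__keyPress], followUp };
  }
  if (result.__navigate) {
    return { call: "page.goto", args: [result.__navigate], followUp };
  }
  return null;
}
```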
Retry logic: On first failure, exec() re-injects helpers and retries once.
This catches cases where a navigation cleared the injected functions.
URL re-check: After execution, if the URL changed but still matches the app's URL pattern, helpers are re-injected for the new page state.
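The URL re-check condition can be written as a small predicate. This is a sketch under the assumption that the pattern is the `urlPattern` regular-expression string from `manifest.json`; the function name `needsReinject` is hypothetical.

```javascript
// Helpers need re-injection when the page moved to a different URL
// that still belongs to the target app.
function needsReinject(previousUrl, currentUrl, urlPattern) {
  const changed = currentUrl !== previousUrl;
  const stillInApp = new RegExp(urlPattern).test(currentUrl);
  return changed && stillInApp;
}
```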
Every generated MCP server supports two independent environment variables:
| Variable | Values | Default |
|---|---|---|
| `BROWSER_MODE` | `visible`, `headless` | `visible` |
| `DATA_MODE` | `user`, `sandbox` | `user` |
Mode combinations:
| BROWSER_MODE | DATA_MODE | Use Case |
|---|---|---|
| `visible` | `user` | Default. Works in your running Chrome session |
| `visible` | `sandbox` | Testing: visible browser, clean profile |
| `headless` | `user` | Background automation with saved credentials |
| `headless` | `sandbox` | CI/testing: fully isolated |
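A server could resolve these two switches along the following lines. This is a sketch, not the generated code: the function name `resolveModes`, the returned shape, and throwing on an unrecognized value are assumptions. The variable names, allowed values, and defaults come from the tables above.

```javascript
const BROWSER_MODES = ["visible", "headless"];
const DATA_MODES = ["user", "sandbox"];

// Read both switches from an environment object, applying the documented defaults.
function resolveModes(env) {
  const browserMode = env.BROWSER_MODE || "visible";
  const dataMode = env.DATA_MODE || "user";
  if (!BROWSER_MODES.includes(browserMode)) {
    throw new Error(`Invalid BROWSER_MODE: ${browserMode}`);
  }
  if (!DATA_MODES.includes(dataMode)) {
    throw new Error(`Invalid DATA_MODE: ${dataMode}`);
  }
  return { browserMode, dataMode, headless: browserMode === "headless" };
}
```

In a Node.js process the call would be `resolveModes(process.env)`.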
Set in .mcp.json:
{
"env": {
"BROWSER_MODE": "visible",
"DATA_MODE": "user"
}
}

| Problem | Cause | Fix |
|---|---|---|
| Target closed during exploration | Chrome tab was closed | Skill auto-detects and offers to re-navigate |
| Session closed or Protocol error | CDP connection lost | Skill runs health check and relaunches Chrome |
| Tool returns `selector_not_found` | App UI changed since learning | Re-run `/learn-webapp` with "Validate & fix" option |
| `isTrusted` click failures | Widget rejects JS `.click()` | Skill auto-detects and uses `__clickCoords` pattern |
| Helpers undefined in browser | Function not exported | Ensure all functions in commands.mjs use export |
| CSP blocks script injection | App has strict Content Security Policy | Template uses CDP protocol methods which bypass CSP |
| Readback mismatch | Input method doesn't match element type | Switch between setInputValue and setContentEditableValue |
Generated by AutoWebMCP.