Learning Manual — `/learn-webapp`

Teach Claude a web application once. Use it forever.

This manual covers the learning skill — the process of exploring a web application, inferring its semantic operations, and generating a reusable MCP server.

Table of Contents

Overview
Prerequisites
Quick Start
How It Works
Re-Learning Modes
Generated Server Architecture
Template Utilities Reference
exec() Interceptors
Browser Mode Switches
Troubleshooting

Overview

The /learn-webapp skill is the core of AutoWebMCP. It takes a URL, explores the web application in a real Chrome browser, identifies what operations the app supports, and generates a complete MCP (Model Context Protocol) server that exposes those operations as callable tools.

What it replaces: Every time Claude interacts with a web app via computer use, it screenshots the screen, analyzes the image, calculates click coordinates, clicks, and screenshots again. This is slow (~12 seconds for a single operation) and fragile.

What it produces: A permanent MCP server where each operation is a single function call (~0.5 seconds, reliable).

Prerequisites

Requirement	Details
Claude Code	With MCP support
Node.js	18+ (required by the MCP SDK)
Google Chrome	Any recent version — the skill auto-detects the installation path
Claude in Chrome extension	Required during the learning phase for browser interaction

No manual Chrome setup needed. The skill automatically detects Chrome, launches it with remote debugging enabled, and manages the CDP connection.

Quick Start

/learn-webapp https://your-app.com

That's it. The skill will:

1. Launch Chrome with CDP if needed
2. Navigate to the app
3. Explore the UI systematically
4. Present you with a list of discovered tools for approval
5. Generate a complete MCP server
6. Validate each tool against the live app
7. Register the server in .mcp.json

You stay in control throughout — the skill asks before clicking anything risky and you approve the final tool list before code generation begins.

How It Works

Phase 0: Chrome CDP Setup

The skill communicates with Chrome through the Chrome DevTools Protocol (CDP) on port 9222. On startup, it tests connectivity:

curl -s http://127.0.0.1:9222/json/version
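
The same connectivity test can be sketched in Node.js. This is an illustration only, not the skill's actual code: `parseVersionResponse` and `probeCdp` are hypothetical names, and the sketch assumes Node 18+ for the global `fetch`. It relies on the `/json/version` endpoint returning a JSON object with `Browser` and `webSocketDebuggerUrl` fields when remote debugging is enabled.

```javascript
// Hypothetical helper: interpret the body returned by /json/version.
function parseVersionResponse(body) {
  let info;
  try {
    info = JSON.parse(body);
  } catch {
    return { ok: false, reason: "invalid JSON" };
  }
  // A usable CDP endpoint advertises a browser-level WebSocket URL.
  if (!info || typeof info.webSocketDebuggerUrl !== "string") {
    return { ok: false, reason: "no webSocketDebuggerUrl" };
  }
  return { ok: true, browser: info.Browser, wsUrl: info.webSocketDebuggerUrl };
}

// Hypothetical helper: probe the CDP port and report whether Chrome answers.
// fetchFn is injectable so the probe can be exercised without a browser.
async function probeCdp(port = 9222, fetchFn = globalThis.fetch) {
  try {
    const res = await fetchFn(`http://127.0.0.1:${port}/json/version`);
    if (!res.ok) return { ok: false, reason: `HTTP ${res.status}` };
    return parseVersionResponse(await res.text());
  } catch (err) {
    // Connection refused usually means Chrome is not running with CDP.
    return { ok: false, reason: err.message };
  }
}
```

A caller would treat `{ ok: false }` as the cue to offer one of the browser modes.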

If Chrome isn't running with CDP, the skill offers two browser modes:

Mode	Description	Best for
User profile	Dedicated CDP profile directory with your credentials	Apps requiring authentication
Sandbox	Clean temporary profile with no stored data	Public apps, testing

Important: Chrome requires --user-data-dir set to a non-default directory for CDP to work. The skill uses a dedicated CDP-Profile directory that persists your login sessions across learning sessions.

Connection health probing: Before every major phase, the skill verifies CDP connectivity. If Chrome crashes or the tab closes mid-exploration, it detects the failure and offers to relaunch.

Phase 1: Initialization

Parses the URL and derives an app name (e.g., sites.google.com → google-sites)
Checks catalogue.json for an existing MCP — if found, offers re-learning options
Creates the output directory structure:

MCPs/<app-name>/
├── exploration/          # Exploration logs and snapshots
│   └── log.json          # Structured exploration log
├── server/               # Generated MCP server
│   ├── index.mjs         # MCP server entry point
│   ├── commands.mjs      # Command library
│   ├── package.json      # Dependencies
│   └── manifest.json     # Capabilities metadata
├── README.md             # Generated documentation
└── test-task.md          # Integration test + results

Opens the app in Chrome and captures a baseline screenshot
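
As a rough illustration of the name derivation in the first step, one heuristic that reproduces the `sites.google.com → google-sites` example is to drop the top-level domain and reverse the remaining host labels. The function name `deriveAppName` is hypothetical, and the skill's real rule is evidently richer, since this heuristic would give `google-docs` for `docs.google.com/forms` while the manual uses `google-forms`.

```javascript
// Hypothetical heuristic: hostname labels, minus "www" and the TLD, reversed.
function deriveAppName(rawUrl) {
  const host = new URL(rawUrl).hostname.toLowerCase();
  const labels = host.split(".").filter((label) => label && label !== "www");
  // Drop the last label (the TLD). Multi-part suffixes such as "co.uk"
  // are not handled by this sketch.
  const withoutTld = labels.length > 1 ? labels.slice(0, -1) : labels;
  return withoutTld.reverse().join("-");
}
```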

Phase 2: Page Reconnaissance

Goal: Build a complete map of the application's UI without clicking anything.

The skill reads the accessibility tree (not screenshots) to identify:

Regions: Navigation bars, content areas, toolbars, sidebars, panels
Interactive elements: Buttons, inputs, links, dropdowns, toggles
Element metadata: ARIA labels, roles, data attributes, text content

Key detection capabilities:

Detection	What it catches
Obfuscated DOM	Apps like Facebook/Gmail with hashed CSS classes (`x9f619`, `_a9--`) — CSS selectors are deprioritized, ARIA/text matching used instead
Duplicate elements	Card-based UIs where the same button exists per card (e.g., "Delete" on each question in Google Forms) — visibility filtering applied
ARIA classification	Each element scored as semantic (aria-label), structural (role+parent), or text-only — affects confidence scoring
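
The ARIA classification row can be pictured with a small sketch. The input shape (a plain object with `ariaLabel`, `role`, and `parentRole`) and the name `classifyAria` are assumptions made for illustration; the skill's own data model is not documented here.

```javascript
// Hypothetical classifier following the three tiers in the table above.
function classifyAria(el) {
  // Semantic: the element carries its own non-empty aria-label.
  if (typeof el.ariaLabel === "string" && el.ariaLabel.trim() !== "") {
    return "semantic";
  }
  // Structural: no label, but a role plus a parent role gives context.
  if (el.role && el.parentRole) {
    return "structural";
  }
  // Text-only: only the visible text identifies the element.
  return "text-only";
}
```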

Overlay dismissal (Phase 2.5): Before exploration begins, the skill clears any blocking overlays — AI assistant dialogs (Gemini, Copilot), cookie banners, onboarding tours, promotional modals. Priority is given to privacy-preserving choices (reject cookies, dismiss tours).

Phase 3: Interaction Exploration

Goal: Systematically probe interactive elements to understand what they do.

The skill explores elements by region, prioritizing:

Primary actions (compose, create, submit)
Form inputs (text fields, dropdowns)
Navigation (menus, tabs)
Secondary actions (toolbar buttons, toggles)

For each element, the exploration procedure is:

1. Pre-state capture: screenshot + accessibility snapshot
2. Danger-zone check (classify risk before clicking):
   - SKIP: Publish, Send, Delete All, Pay (never clicked)
   - SAFE: Tabs, menus, expand/collapse, preview (clicked freely)
   - CAUTIOUS: Single delete, toggle, create (clicked and undone)
3. Execute interaction: click, type, hover, or select
4. Trusted click detection: if JS .click() fails (no DOM change), retry with a CDP browser-level click. Elements requiring trusted clicks are flagged with requiresTrustedClick: true
5. Post-state capture: screenshot + snapshot, diff against pre-state
6. Record observation: what changed, what selectors work, what pattern it follows
7. Reset state: close dialogs, navigate back, undo changes
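
The danger-zone check could be approximated by keyword matching on the element's label. This is a sketch under stated assumptions: the keyword lists are taken from the examples above, the name `classifyRisk` is hypothetical, and treating unrecognized labels as CAUTIOUS is a conservative choice of this sketch rather than documented behavior.

```javascript
// Keyword lists drawn from the examples in the procedure above.
const SKIP_WORDS = ["publish", "send", "delete all", "pay"];
const SAFE_WORDS = ["tab", "menu", "expand", "collapse", "preview"];

// Hypothetical classifier: returns "SKIP", "SAFE", or "CAUTIOUS".
function classifyRisk(label) {
  const text = label.trim().toLowerCase();
  // Whole-word match so that "payment" does not count as "pay".
  const hasWord = (word) => new RegExp(`\\b${word}\\b`).test(text);
  if (SKIP_WORDS.some(hasWord)) return "SKIP";
  if (SAFE_WORDS.some(hasWord)) return "SAFE";
  // Assumption: anything unrecognized is clicked only if it can be undone.
  return "CAUTIOUS";
}
```

Checking the SKIP list first matters: "Delete All" must be caught before a single "Delete" would be treated as CAUTIOUS.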

Widget interaction pattern classification:

Pattern	Description	Code Pattern
`direct-click`	Single JS click produces the effect	`el.click()`
`trusted-click`	Requires browser-level click (isTrusted)	`return { __clickCoords: {x, y} }`
`dropdown-option`	Click opens listbox, then click option	`el.click(); sleep(300); option.click()`
`focus-type`	Focus element, then type via setInputValue	`el.focus(); setInputValue(el, value)`
`contenteditable`	ContentEditable div, selectAll + insertText	`setContentEditableValue(el, value)`
`multi-step-cascade`	Trigger → panel → fill → confirm	`clickAndWait(trigger, panel)`
`toggle`	Click toggles state, read back aria-checked	`el.click(); return el.getAttribute('aria-checked')`
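
For the `trusted-click` pattern, a command cannot click the element itself, so it returns coordinates for the server to click. A minimal sketch, assuming the click should land at the center of the element's bounding box (the helper name `trustedClickSignal` is hypothetical):

```javascript
// Hypothetical helper: compute the { __clickCoords } signal for an element.
function trustedClickSignal(el) {
  // getBoundingClientRect() is the standard DOM call for viewport coordinates.
  const rect = el.getBoundingClientRect();
  return {
    __clickCoords: {
      x: Math.round(rect.left + rect.width / 2),
      y: Math.round(rect.top + rect.height / 2),
    },
  };
}
```

Returning the signal instead of calling `el.click()` lets the browser-level click produce an event with `isTrusted` set.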

Multi-view discovery: When the app has distinct views (e.g., a form list page vs. a form editor), each view is explored separately and operations are tagged by view.

Authentication handling: If the app redirects to a login page, the skill pauses and asks you to log in manually. It monitors the browser URL and resumes once authentication completes.

Phase 4: Operation Inference

Goal: Derive semantic operations from the exploration data.

The skill analyzes all explorations and identifies four types of operations:

Single-action — one click/input produces a result (toggle dark mode, refresh)
Multi-step — sequences forming a workflow (compose → fill → send)
Parameterized — actions taking user data (search for X, set field to Y)
Query — reading/extracting visible state (get unread count, list items)

Each operation gets:

A snake_case name and human-readable description
Parameter schema with types and descriptions
Multi-fallback selector arrays ordered by resilience score
Confidence score (0.0–1.0) based on selector quality and validation

Selector priority scoring:

Score	Selector Type	Example
5	`aria-label`	`[aria-label="Add question"]`
5	`data-testid`	`[data-testid="compose-btn"]`
4	`data-tooltip`	`[data-tooltip="Add question"]`
3	`role` + text	`findElementByText("button", "Add")`
2	CSS class	`.compose-button`
1	Text-only	`findButtonByText("Insert")`
0	Positional	`querySelectorAll('input')[2]`
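
The scoring table translates directly into an ordering of fallback selectors. The sketch below assumes each candidate is tagged with a `type` string; the tag names and the function `rankSelectors` are illustrative, not the skill's internal representation.

```javascript
// Scores copied from the table above; the type keys are assumed labels.
const SELECTOR_SCORES = {
  "aria-label": 5,
  "data-testid": 5,
  "data-tooltip": 4,
  "role-text": 3,
  "css-class": 2,
  "text-only": 1,
  "positional": 0,
};

// Hypothetical helper: most resilient candidates first, input left untouched.
function rankSelectors(candidates) {
  const score = (c) => SELECTOR_SCORES[c.type] ?? 0;
  // Array.prototype.sort is stable, so ties keep their original order.
  return [...candidates].sort((a, b) => score(b) - score(a));
}
```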

App-specific helper extraction: Repeated patterns across operations (same selector in 3+ operations, same multi-step sequence) are extracted as named helper functions.
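
One way to picture the "same selector in 3+ operations" rule is a simple usage count. The operation shape (`{ name, selectors }`) and the name `findSharedSelectors` are assumptions of this sketch.

```javascript
// Hypothetical helper: selectors used by at least minUses distinct operations.
function findSharedSelectors(operations, minUses = 3) {
  const uses = new Map();
  for (const op of operations) {
    // A Set ensures a selector repeated within one operation counts once.
    for (const selector of new Set(op.selectors)) {
      uses.set(selector, (uses.get(selector) ?? 0) + 1);
    }
  }
  return [...uses]
    .filter(([, count]) => count >= minUses)
    .map(([selector]) => selector);
}
```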

Phase 4.5: User Approval Gate

This is mandatory — code generation cannot proceed without your approval.

The skill presents a numbered table of all discovered tools:

## Proposed tools for google-forms MCP server

| # | Tool Name            | Description                          |
|---|----------------------|--------------------------------------|
| 1 | set_form_title       | Set the form title text              |
| 2 | add_question         | Add a new question to the form       |
| 3 | set_question_type    | Change the type of a question        |
| ...                                                            |

You can:

Approve — proceed to code generation
Remove tools — "Remove 3 and 7"
Add a tool — "Add a tool for duplicating a page" (triggers Phase 7 exploration)

The gate loops until you explicitly approve.

Phase 5: Code Generation

Goal: Generate the complete MCP server package from approved operations.

The skill reads template files and generates:

commands.mjs — All operations as exported async functions, plus framework utility functions and app-specific helpers
index.mjs — MCP server entry point with CDP connection, helper injection, tool registrations, and the exec() execution engine
package.json — Dependencies (@modelcontextprotocol/sdk, puppeteer-core)
manifest.json — Server metadata with operation list and confidence scores
README.md — Human-readable documentation for the generated server

Mandatory code quality rules:

waitForElement() over sleep() — every click that triggers DOM change must wait for the expected result, not use blind delays
Semantic selectors first — aria-label and data-testid before CSS classes
Readback verification — every set/update operation reads back the DOM to confirm the change actually took effect
Multi-view state guards — operations check they're on the correct view before executing
Everything exported — all functions (utilities and operations) must be exported so the dynamic injection system can collect and inject them
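
The readback rule can be sketched as a small wrapper around a setter. Here `setter` stands in for a template utility such as `setInputValue(el, value)`; the wrapper name `setWithReadback` and the result shape are hypothetical.

```javascript
// Hypothetical wrapper: apply a setter, then read the DOM back to confirm.
function setWithReadback(el, value, setter) {
  setter(el, value);
  // Inputs expose .value; contentEditable elements expose .textContent.
  const actual = "value" in el ? el.value : el.textContent;
  return { success: actual === value, expected: value, actual };
}
```

A generated operation would return this result so the caller can tell a silent failure from a real update.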

Phase 6: Validation & Publication

6.1 Per-tool live validation: Every generated tool is executed against the live app in the browser. For mutation tools, the change is verified via DOM readback and then undone. Tools must achieve a pass rate of at least 70%.

6.1.2 End-to-end test task: A realistic multi-step scenario that chains 5-8 tools together, written to test-task.md. This catches integration issues that per-tool validation misses (focus management, state transitions, timing).

6.2 Catalogue update: The new MCP is registered in catalogue.json.

6.3 Auto-register: The MCP server is added to .mcp.json so Claude Code loads it on next restart.

6.4 Results report: A table of all learned tools with confidence scores, plus failed operations, low-confidence warnings, and unexplored areas.

Phase 7: Manual Tool Addition

After initial learning, you can add individual tools on demand:

add tool: archive conversation — moves the current conversation to archive

The skill explores the specific operation, validates it, and appends it to the existing MCP server files without regenerating everything.

Re-Learning Modes

When you run /learn-webapp on an app that already has an MCP server, you get four options:

Mode	What it does	Version impact
Re-learn from scratch	Full exploration, new server	Major bump (1.0 → 2.0)
Update/extend	Keep existing tools, add new ones	Minor bump (1.0 → 1.1)
Validate & fix	Test each tool, fix broken selectors	Patch (updates confidence)
Learn separate MCP	Create independent tool set	New MCP entry
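
The version impact column maps onto a conventional major/minor/patch bump. The sketch below assumes three-part versions like the `1.0.0` shown in manifest.json; the mode keys (`relearn`, `extend`, `validate`) and the name `bumpVersion` are made up for illustration.

```javascript
// Hypothetical helper: compute the next version for a re-learning mode.
function bumpVersion(version, mode) {
  // Missing parts (for example "1.0") default to 0.
  const [major = 0, minor = 0, patch = 0] = version.split(".").map(Number);
  switch (mode) {
    case "relearn": // Re-learn from scratch: major bump
      return `${major + 1}.0.0`;
    case "extend": // Update/extend: minor bump
      return `${major}.${minor + 1}.0`;
    case "validate": // Validate & fix: patch bump
      return `${major}.${minor}.${patch + 1}`;
    default:
      // "Learn separate MCP" creates a new entry, so nothing is bumped.
      throw new Error(`No version bump defined for mode: ${mode}`);
  }
}
```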

Generated Server Architecture

commands.mjs — Command Library

The command library contains three sections:

// --- Framework Utilities (from template) ---
// querySelector, waitForElement, setInputValue, etc.
// 21 exported utility functions (listed in Template Utilities Reference)

// --- App-Specific Helpers ---
// Extracted from repeated patterns across operations
// e.g., getActiveQuestionCard(), isEditor(), clickSidebarTab()

// --- Learned Operations ---
// One exported async function per approved tool
// e.g., set_form_title({ title }), add_question({ text, type })

Every function must be exported — the server's injection system uses Object.entries(commands).filter(...) to collect all exports and inject them into the browser context via CDP. Non-exported helpers will be undefined at runtime.
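
To make the export requirement concrete, here is a minimal sketch of how exports might be turned into an injectable script. The filter predicate (keep only functions) and the name `buildInjectionScript` are assumptions; the manual elides the actual predicate. Puppeteer's `page.evaluateOnNewDocument` accepts a script string, so output like this could be passed to it.

```javascript
// Hypothetical helper: serialize every exported function onto window.
function buildInjectionScript(commands) {
  return Object.entries(commands)
    // Assumed predicate: only functions are injected, constants are skipped.
    .filter(([, value]) => typeof value === "function")
    // fn.toString() yields the function's source text.
    .map(([name, fn]) => `window[${JSON.stringify(name)}] = ${fn.toString()};`)
    .join("\n");
}
```

A helper that is not exported never appears in `Object.entries(commands)`, so it is never serialized, which is why it shows up as undefined in the browser.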

index.mjs — MCP Server

The server entry point handles:

CDP connection via puppeteer-core with 4 mode combinations
Helper injection — all commands.mjs exports injected into browser context once per page via evaluateOnNewDocument (survives navigations, bypasses CSP)
Tool registration — each operation registered as an MCP tool with JSON schema
exec() engine — executes operations with interceptors, retry, and URL re-check
Built-in tools — health_check, get_page_state, show_scripts, run_script

manifest.json — Capabilities Metadata

{
  "name": "google-forms-mcp",
  "version": "1.0.0",
  "targetApp": {
    "url": "https://docs.google.com/forms/",
    "urlPattern": "docs\\.google\\.com/forms"
  },
  "operations": [
    { "name": "set_form_title", "confidence": 0.90, "category": "data_entry" }
  ]
}
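
The `urlPattern` field is a regular expression stored as a string. A sketch of how a server might use it to decide whether a page belongs to the app (the function name `matchesApp` is hypothetical):

```javascript
// Hypothetical helper: test a URL against the manifest's urlPattern.
function matchesApp(manifest, url) {
  // The JSON string "docs\\.google\\.com/forms" becomes the
  // regular expression /docs\.google\.com\/forms/.
  return new RegExp(manifest.targetApp.urlPattern).test(url);
}

const exampleManifest = {
  targetApp: { urlPattern: "docs\\.google\\.com/forms" },
};
```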

Template Utilities Reference

These functions are available in every generated commands.mjs:

Function	Purpose
`querySelector(selectors)`	Try multiple CSS selectors, return first match
`querySelectorAll(selectors)`	Try multiple selectors, return all matches
`queryVisibleSelector(selectors)`	Like querySelector but only visible matches
`querySelectorWithin(root, selectors)`	Scoped query within a root element
`waitForElement(selectors, timeout)`	Wait for element to appear (MutationObserver)
`waitForRemoval(selector, timeout)`	Wait for element to disappear
`sleep(ms)`	Pause (animation timing only, never as sole wait)
`setInputValue(el, value)`	Set value on `<input>`/`<textarea>`
`setContentEditableValue(el, value)`	Set value on contentEditable elements
`clickByAriaLabel(label)`	Click element by aria-label
`findButtonByText(text)`	Find button or [role="button"] by text
`findElementByText(role, text, opts)`	Find element by ARIA role + text
`clickMenuItem(itemText)`	Click menu item by text
`clickAndWait(clickSel, waitSel, timeout)`	Click then wait for result
`multiStep(steps)`	Sequential click-and-wait steps
`getPageState()`	Current URL, title, active element, dialogs
`navigateTo(url)`	Navigate via `__navigate` signal
`getRepeatingContainers(anchor, levels, verify)`	Find feed/card containers
`menuCascade(itemTexts, delay)`	Navigate cascading menus
`selectRadioByIndex(group, index)`	Select radio via trusted click
`togglePanel(trigger, panel, action)`	State-aware panel toggle
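
The multi-selector fallback behind `querySelector(selectors)` might look roughly like the following. This is a sketch, not the template's source: it is named `queryFirst` to avoid implying it is the real implementation, and the extra `root` parameter is an assumption added so the logic can run outside a browser.

```javascript
// Hypothetical sketch: return the first element matched by any selector.
function queryFirst(selectors, root = document) {
  const list = Array.isArray(selectors) ? selectors : [selectors];
  for (const selector of list) {
    try {
      const el = root.querySelector(selector);
      if (el) return el;
    } catch {
      // An invalid selector throws in the DOM; skip it and try the next one.
    }
  }
  return null;
}
```

Ordering the array from most to least resilient selector means the weaker fallbacks are consulted only when the stronger ones stop matching.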

exec() Interceptors

The exec() function in index.mjs processes special return values from command functions, enabling operations that can't be done purely in JavaScript:

Interceptor	Trigger	What it does
`__clickCoords`	`{ __clickCoords: {x, y} }`	Performs trusted browser-level click via Puppeteer (for widgets checking `event.isTrusted`)
`__hoverCoords`	`{ __hoverCoords: {x, y} }`	Moves mouse to coordinates, waits for hover UI to appear
`__keyPress`	`{ __keyPress: "Enter" }`	Sends keyboard events via Puppeteer (search submit, Escape)
`__navigate`	`{ __navigate: url }`	Navigates page at Puppeteer level (avoids race conditions)
`__followUp`	`{ __followUp: "funcName()" }`	Evaluates JavaScript after the intercepted action completes

Retry logic: On first failure, exec() re-injects helpers and retries once. This catches cases where a navigation cleared the injected functions.

URL re-check: After execution, if the URL changed but still matches the app's URL pattern, helpers are re-injected for the new page state.
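
The interceptor table and retry rule can be sketched as follows. This is an illustration under assumptions, not the generated `exec()`: the function names, the processing order, and the action shape are hypothetical. The Puppeteer calls used (`page.mouse.click`, `page.mouse.move`, `page.keyboard.press`, `page.goto`, `page.evaluate`) are real `Page` APIs.

```javascript
// Hypothetical: translate a command's return value into ordered actions.
function planInterceptors(result) {
  const actions = [];
  if (!result || typeof result !== "object") return actions;
  if (result.__clickCoords) actions.push({ kind: "click", ...result.__clickCoords });
  if (result.__hoverCoords) actions.push({ kind: "hover", ...result.__hoverCoords });
  if (result.__keyPress) actions.push({ kind: "key", key: result.__keyPress });
  if (result.__navigate) actions.push({ kind: "navigate", url: result.__navigate });
  // The follow-up runs last, after the intercepted action completes.
  if (result.__followUp) actions.push({ kind: "evaluate", expression: result.__followUp });
  return actions;
}

// Hypothetical: carry out the planned actions on a Puppeteer page.
async function runInterceptors(page, result) {
  let last;
  for (const action of planInterceptors(result)) {
    if (action.kind === "click") await page.mouse.click(action.x, action.y);
    else if (action.kind === "hover") await page.mouse.move(action.x, action.y);
    else if (action.kind === "key") await page.keyboard.press(action.key);
    else if (action.kind === "navigate") await page.goto(action.url);
    else if (action.kind === "evaluate") last = await page.evaluate(action.expression);
  }
  return last;
}

// Hypothetical: retry once after re-injecting helpers, as described above.
async function execWithRetry(run, reinjectHelpers) {
  try {
    return await run();
  } catch {
    await reinjectHelpers();
    return run();
  }
}
```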

Browser Mode Switches

Every generated MCP server supports two independent environment variables:

Variable	Values	Default
`BROWSER_MODE`	`visible`, `headless`	`visible`
`DATA_MODE`	`user`, `sandbox`	`user`

Mode combinations:

BROWSER_MODE	DATA_MODE	Use Case
`visible`	`user`	Default. Works in your running Chrome session
`visible`	`sandbox`	Testing — visible browser, clean profile
`headless`	`user`	Background automation with saved credentials
`headless`	`sandbox`	CI/testing — fully isolated

Set in .mcp.json:

{
  "env": {
    "BROWSER_MODE": "visible",
    "DATA_MODE": "user"
  }
}
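
A server might read these two variables along the following lines. The function name `resolveModes` and the returned field names are hypothetical; only the variable names, allowed values, and defaults come from the tables above.

```javascript
// Hypothetical helper: read and validate the two mode switches.
function resolveModes(env = process.env) {
  const browserMode = env.BROWSER_MODE ?? "visible";
  const dataMode = env.DATA_MODE ?? "user";
  if (!["visible", "headless"].includes(browserMode)) {
    throw new Error(`Invalid BROWSER_MODE: ${browserMode}`);
  }
  if (!["user", "sandbox"].includes(dataMode)) {
    throw new Error(`Invalid DATA_MODE: ${dataMode}`);
  }
  return {
    headless: browserMode === "headless",
    // "user" reuses the persistent CDP profile; "sandbox" starts clean.
    usePersistentProfile: dataMode === "user",
  };
}
```

Because the two switches are independent, validating them separately covers all four combinations.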

Troubleshooting

Problem	Cause	Fix
`Target closed` during exploration	Chrome tab was closed	Skill auto-detects and offers to re-navigate
`Session closed` or `Protocol error`	CDP connection lost	Skill runs health check and relaunches Chrome
Tool returns `selector_not_found`	App UI changed since learning	Re-run `/learn-webapp` with "Validate & fix" option
`isTrusted` click failures	Widget rejects JS `.click()`	Skill auto-detects and uses `__clickCoords` pattern
Helpers undefined in browser	Function not exported	Ensure all functions in `commands.mjs` use `export`
CSP blocks script injection	App has strict Content Security Policy	The template injects helpers through CDP methods, which bypass CSP
Readback mismatch	Input method doesn't match element type	Switch between `setInputValue` and `setContentEditableValue`

Generated by AutoWebMCP.
