
Commit c55d041

apartsin and claude committed
Apply 20 robustness improvements from Facebook MCP session
Templates:
- Fix offsetParent→offsetHeight visibility check for position:fixed elements
- Broaden clickMenuItem to search both menuitem and button roles
- Add findMenuItemByText, extractTextContent, retryWithFallback utilities
- Add Lexical editor readback note to setContentEditableValue
- Add session expiry detection to exec() (auth redirect detection)
- Add --smoke-test CLI flag for CI integration
- Document __hoverCoords/__followUp signal pattern in exec() JSDoc

Skill (learn-webapp):
- Add DOM portal detection (React.createPortal awareness)
- Add click-to-activate element detection
- Add editor framework detection (Lexical, ProseMirror, Quill, TinyMCE)
- Add hover interaction exploration guidance
- Add feed/card-based content discovery with getRepeatingContainers
- Add hover-reveal and click-to-activate to interaction pattern table
- Add element count heuristic for dynamic exploration budgets
- Add parallel exploration guidance for complex apps
- Add selector validation cache for faster validation
- Enforce obfuscated DOM confidence penalty in manifest generation
- Update _validate_all to handle __clickCoords/__hoverCoords tools
- Update template utility reference table with new functions
- Document session expiry detection and smoke-test in exec() enhancements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 615c0cd commit c55d041

3 files changed

Lines changed: 419 additions & 28 deletions


skills/learn-webapp/SKILL.md

Lines changed: 214 additions & 3 deletions
@@ -482,13 +482,84 @@ have class names that match obfuscation patterns:
- Structural selectors like `[role="main"] > div > div:first-child`
4. Lower the default confidence for all tools by 0.05, since obfuscated apps
   are inherently harder to automate reliably
5. **Enforce in manifest generation**: When writing `manifest.json`, automatically
   apply the -0.05 penalty to every operation's confidence score. Add
   `"domCharacteristics": { "obfuscatedClasses": true }` to the manifest.
6. Inform the user:
   ```
   This app uses obfuscated CSS classes (e.g., compiled React/Angular output).
   CSS-based selectors will not be used. Tools will rely on ARIA attributes
   and text content matching, which may be less precise.
   ```
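
The manifest-side penalty in step 5 can be sketched as a pure transform. This is an illustration under assumptions: the manifest shape `{ operations: [{ name, confidence }] }` and the name `applyObfuscationPenalty` are hypothetical, not the skill's actual schema or code.

```javascript
// Hypothetical sketch: the manifest shape ({ operations: [{ name, confidence }] })
// is assumed for illustration; adapt it to the real manifest.json schema.
const OBFUSCATION_PENALTY = 0.05;

function applyObfuscationPenalty(manifest) {
  const operations = manifest.operations.map((op) => ({
    ...op,
    // Subtract the penalty, clamp to [0, 1], and round to two decimals
    // so values such as 0.9 - 0.05 stay readable in manifest.json
    confidence:
      Math.round(Math.min(1, Math.max(0, op.confidence - OBFUSCATION_PENALTY)) * 100) / 100,
  }));
  return {
    ...manifest,
    operations,
    // Keep any existing characteristics and record the obfuscation flag
    domCharacteristics: { ...manifest.domCharacteristics, obfuscatedClasses: true },
  };
}
```

Returning a new object rather than mutating the input makes it safe to call on a manifest that has already been written once, since the original scores stay available for comparison.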

### 2.3.4 DOM Portal Detection

Some React/Angular apps render interactive elements *outside* their logical parent
containers using "portals" (React.createPortal). For example, Facebook renders the
post composer's `[role="textbox"][contenteditable="true"]` element OUTSIDE the
`[role="dialog"]` container, even though it visually appears inside the dialog.

**Detection**: During exploration, after opening a dialog/panel/overlay:
1. Search for interactive elements WITHIN the dialog using `querySelectorWithin`
2. If expected elements (text inputs, buttons) are NOT found within the dialog,
   search GLOBALLY via `querySelector` or `read_page`
3. If elements are found globally but not within the dialog, flag the app as
   using DOM portals for that component:
   ```json
   {
     "portalDetected": true,
     "component": "post composer",
     "dialogRef": "ref_X",
     "portalledElements": ["[role='textbox'][contenteditable='true']"],
     "note": "Textbox rendered outside dialog via React portal"
   }
   ```

**Impact on code generation**: Operations for portal-using components MUST search
globally (`document.querySelector(...)`) rather than scoping to the dialog
(`dialog.querySelector(...)`). The exploration log entry guides code generation
to use the correct scoping strategy.
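
The detection steps above can be sketched with native DOM calls only (`Element.querySelector`, `Node.contains`). The helper name `findPossiblyPortalled` and its `{ element, portalDetected }` return value are hypothetical, not part of the template library; the `doc` parameter exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: look for `selector` inside `dialog` first, then fall back
// to a document-wide search and report whether the match lives outside the dialog.
function findPossiblyPortalled(dialog, selector, doc = globalThis.document) {
  const scoped = dialog.querySelector(selector);
  if (scoped) return { element: scoped, portalDetected: false };

  // Not inside the dialog: a document-wide match suggests a portal
  const globalMatch = doc.querySelector(selector);
  if (globalMatch && !dialog.contains(globalMatch)) {
    return { element: globalMatch, portalDetected: true };
  }
  return { element: null, portalDetected: false };
}

// Browser usage (assumes a live DOM):
// findPossiblyPortalled(document.querySelector('[role="dialog"]'),
//   '[role="textbox"][contenteditable="true"]');
```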

### 2.3.5 Click-to-Activate Element Detection

Some interactive elements don't exist in the DOM until their container or
placeholder is clicked. For example, Facebook's post composer textbox only
appears after clicking the "What's on your mind?" placeholder area.

**Detection**: During element cataloging, note elements that:
1. Have placeholder text suggesting interactivity (e.g., "What's on your mind?",
   "Search", "Type a message")
2. Don't have an editable element (input/textarea/contenteditable) visible
   in the accessibility tree
3. Show a new editable element after being clicked

**Procedure**:
1. Identify placeholder/trigger elements by `aria-placeholder` or placeholder-like text
2. Click the trigger element
3. Re-read the page to find newly appeared elements
4. Record the activation pattern:
   ```json
   {
     "clickToActivate": true,
     "trigger": "[aria-placeholder*='on your mind']",
     "activatedElement": "[role='textbox'][contenteditable='true']",
     "delay": 500,
     "note": "Textbox appears in DOM only after clicking placeholder"
   }
   ```

**Impact on code generation**: Operations that interact with click-to-activate
elements must include the activation step before attempting to find or interact
with the activated element. Pattern:
```javascript
// Click placeholder to activate the textbox
const placeholder = querySelector(['[aria-placeholder*="on your mind"]']);
if (placeholder) placeholder.click();
await sleep(500); // matches the recorded "delay"
// Now find the activated element, and fail clearly if activation did not work
const textbox = querySelector(['[role="textbox"][contenteditable="true"]']);
if (!textbox) return { error: 'Textbox did not appear after clicking placeholder' };
```
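
The three detection criteria can be expressed as two small checks over element-catalog entries. This is a sketch under assumptions: the entry shape `{ role, name, placeholder, editable }` and both function names are hypothetical, and the hint patterns only cover the example phrases listed above.

```javascript
// Hypothetical sketch: catalog entries are assumed to look like
// { role, name, placeholder, editable }; neither helper exists in the templates.
const ACTIVATION_HINTS = [/on your mind/i, /\bsearch\b/i, /type a message/i];

// Criteria 1 and 2: placeholder-like text, and no editable element visible yet
function isActivationCandidate(entry, catalog) {
  const text = entry.placeholder || entry.name || '';
  const looksInteractive = ACTIVATION_HINTS.some((re) => re.test(text));
  const editableVisible = catalog.some((e) => e.editable);
  return looksInteractive && !editableVisible;
}

// Criterion 3: editable elements present after the click but not before.
// Entries are keyed by role + name because ref IDs may change between reads.
function newlyActivated(before, after) {
  const key = (e) => `${e.role}|${e.name}`;
  const seen = new Set(before.map(key));
  return after.filter((e) => e.editable && !seen.has(key(e)));
}
```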

### 2.4 Save Reconnaissance Data

Write the region map and element catalog to `exploration/log.json` as the initial exploration state:
@@ -714,6 +785,8 @@ specific code template.
| **dropdown-option** | Click opens listbox/menu, then click option | `el.click(); await sleep(300); optionEl.click()` |
| **focus-type** | Focus element, then type via setInputValue | `el.focus(); setInputValue(el, value)` |
| **contenteditable** | Focus contentEditable div, selectAll, insertText | `setContentEditableValue(el, value)` |
| **hover-reveal** | Hover to reveal hidden UI, then interact | `return { __hoverCoords: {x,y}, __followUp: "..." }` |
| **click-to-activate** | Click placeholder to make element appear, then interact | `placeholder.click(); await sleep(500); textbox = querySelector(...)` |
| **multi-step-cascade** | Click triggers panel/dialog, interact within, confirm | `clickAndWait(trigger, panel); fillFields(); confirmBtn.click()` |
| **toggle** | Click toggles state; read back aria-checked | `el.click(); return { state: el.getAttribute('aria-checked') }` |

@@ -730,6 +803,71 @@ During Phase 5 code generation, use the appropriate code pattern for each elemen
based on its classification. The `trusted-click` pattern is especially important:
these elements MUST use `__clickCoords` or the generated tool will silently fail.

### Editor Framework Detection

During exploration, detect which rich text editor framework the app uses.
Different editors require different input strategies and readback approaches.

**Detection procedure**: For each contenteditable element `el` found, run:
```javascript
// Check for editor framework markers on or around the contenteditable `el`.
// Element.closest() also tests `el` itself, so it covers the editor root too.
const markers = {
  lexical: !!el.closest('[data-lexical-editor]') || !!el.querySelector('[data-lexical-editor]'),
  proseMirror: !!el.closest('.ProseMirror') || !!el.querySelector('.ProseMirror'),
  quill: !!el.closest('.ql-editor') || !!el.querySelector('.ql-editor'),
  // TinyMCE marks its editable body with .mce-content-body; .tox-tinymce is its UI shell
  tinyMCE: !!el.closest('.mce-content-body') || !!document.querySelector('.tox-tinymce'),
  // Draft.js: the contenteditable wraps a [data-contents="true"] block container
  draft: !!el.querySelector('[data-contents="true"]') || !!el.closest('.DraftEditor-root'),
};
// First detected framework, or "unknown"; this is the editorFramework log value
const editorFramework = Object.keys(markers).find((name) => markers[name]) ?? 'unknown';
```

Record the detected framework in the exploration log:
```json
{
  "editorFramework": "lexical|proseMirror|quill|tinyMCE|draft|unknown",
  "inputStrategy": "setContentEditableValue with 300ms readback delay",
  "readbackMethod": "extractTextContent() (handles Lexical data-lexical-text spans)"
}
```

**Framework-specific notes**:
- **Lexical** (Facebook): Text is stored in `<span data-lexical-text="true">` elements.
  `insertText` works, but readback via `textContent` may be delayed. Use
  `extractTextContent()` with the element for reliable readback.
- **ProseMirror** (e.g., Tiptap-based editors, Atlassian Confluence): Uses transaction-based
  updates. `insertText` works for simple text but may not trigger ProseMirror's update
  cycle for complex operations.
- **Quill**: Standard contentEditable; `setContentEditableValue()` works reliably.
- **Unknown**: Default to `setContentEditableValue()` with `sleep(300)` + readback.
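
Turning a detected framework into the log entry shown above is a small lookup. In this sketch the strategy strings paraphrase the framework notes, frameworks without a note (`tinyMCE`, `draft`) fall back to the `unknown` default, and `EDITOR_STRATEGIES` / `editorLogEntry` are hypothetical names.

```javascript
// Hypothetical sketch: map a detected framework to an exploration-log entry.
// Strategy text paraphrases the framework notes; unlisted frameworks use "unknown".
const EDITOR_STRATEGIES = {
  lexical: {
    inputStrategy: 'setContentEditableValue with 300ms readback delay',
    readbackMethod: 'extractTextContent() (handles data-lexical-text spans)',
  },
  proseMirror: {
    inputStrategy: 'insertText for simple text; verify complex edits',
    readbackMethod: 'extractTextContent()',
  },
  quill: {
    inputStrategy: 'setContentEditableValue',
    readbackMethod: 'extractTextContent()',
  },
  unknown: {
    inputStrategy: 'setContentEditableValue with sleep(300) + readback',
    readbackMethod: 'extractTextContent()',
  },
};

function editorLogEntry(framework) {
  // Object.hasOwn avoids matching inherited keys such as "toString"
  const known = Object.hasOwn(EDITOR_STRATEGIES, framework);
  const strategy = known ? EDITOR_STRATEGIES[framework] : EDITOR_STRATEGIES.unknown;
  return { editorFramework: framework ?? 'unknown', ...strategy };
}
```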

### Hover Interaction Exploration

Some interactive elements reveal additional UI only when hovered for a duration.
Common examples: Facebook's reaction bar (hover Like for 2s), tooltip menus,
hover-activated dropdowns, and action buttons that appear on card hover.

**During Phase 3 exploration**:
1. For elements with engagement-related labels (Like, React, Vote, Rate, Star),
   try hovering for 2 seconds using the `computer` tool with the `hover` action
2. After hovering, take a screenshot and call `read_page` to detect new elements
3. If new elements appeared (reaction picker, tooltip menu, action bar):
   - Record the hover trigger, delay, and revealed elements
   - Classify the interaction as the `hover-reveal` pattern
   - Test whether the revealed elements need trusted clicks or can use JS `.click()`
4. Record in the exploration log:
   ```json
   {
     "elementRef": "ref_X",
     "interactionPattern": "hover-reveal",
     "hoverDelay": 2000,
     "revealedElements": ["Love", "Care", "Haha", "Wow", "Sad", "Angry"],
     "codePattern": "__hoverCoords + __followUp"
   }
   ```
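
Step 1's label filter could look like the following. The `{ ref, name }` catalog shape and the `hoverCandidates` name are assumptions; apps with other engagement vocabulary would need a longer pattern.

```javascript
// Hypothetical sketch: select catalog entries worth a 2-second hover probe,
// using the engagement labels from step 1 as whole words (case-insensitive).
const ENGAGEMENT_LABEL = /\b(like|react|vote|rate|star)\b/i;

function hoverCandidates(catalog) {
  return catalog.filter((entry) => ENGAGEMENT_LABEL.test(entry.name || ''));
}
```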

**Impact on code generation**: Hover-reveal operations MUST use the
`__hoverCoords` + `__followUp` exec() signal pattern. The command returns the
hover target's coordinates plus a `__followUp` function string that, after the
hover delay, inspects the revealed UI and performs the desired action.
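
A hypothetical helper for building that signal object. It assumes exec() hovers at `__hoverCoords` in viewport pixels and accepts `__followUp` as the source text of a function; check the exec() JSDoc in the templates for the exact contract.

```javascript
// Hypothetical helper: builds the hover-reveal signal described above.
// Assumes exec() hovers at __hoverCoords (viewport pixels) and later
// evaluates __followUp, passed as the source text of a function.
function buildHoverSignal(rect, followUp) {
  return {
    // Hover the center of the trigger's bounding box
    __hoverCoords: {
      x: Math.round(rect.left + rect.width / 2),
      y: Math.round(rect.top + rect.height / 2),
    },
    // String(fn) yields the function's source text
    __followUp: String(followUp),
  };
}
```

In a generated command this might be called as `buildHoverSignal(likeButton.getBoundingClientRect(), () => document.querySelector('[aria-label="Love"]')?.click())`. Because only the source text travels, the follow-up function must be self-contained: it cannot rely on variables captured from the surrounding command.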

### Multi-step Workflow Discovery

After exploring individual elements, look for multi-step workflows:
@@ -781,6 +919,46 @@ a dashboard vs settings page, an inbox vs a compose view).
belongs to. This helps generated `commands.mjs` include proper precondition
checks (e.g., `if (!isEditor()) return { error: "Not in editor" }`).

### Feed & Card-Based Content Discovery

Many apps (Facebook, Twitter/X, Reddit, LinkedIn, news readers) display content
in feed/card patterns without semantic `[role="feed"]` markup. These require
anchor-based discovery using `getRepeatingContainers()`.

**Detection**: If the app's main content area contains repeating similar structures
(posts, cards, items) but no `[role="feed"]` or `[role="list"]` element:
1. Identify a common "anchor" element that exists in every card, typically an
   action button like `[aria-label="Like"]`, `[aria-label="Actions for this post"]`,
   or `[aria-label="Share"]`
2. Use `getRepeatingContainers(anchorSelector, levels, verifySelector)` to walk up
   the DOM from each anchor and find the card container
3. Test with different `levels` values (8-15) until you find the right container depth
4. Use a `verifySelector` to confirm the container: another element that should exist
   in every card (e.g., a Like button + an Actions button in the same container)

**Recommended exploration approach**:
```javascript
// Find post containers by walking up from action buttons
const posts = getRepeatingContainers(
  '[aria-label="Actions for this post"]', // anchor
  15,                                     // max levels
  '[aria-label="Like"]'                   // verify: must also contain Like button
);
```

Record the discovery parameters in `exploration/log.json`:
```json
{
  "feedPattern": {
    "anchorSelector": "[aria-label=\"Actions for this post\"]",
    "maxLevels": 15,
    "verifySelector": "[aria-label=\"Like\"]",
    "containerCount": 4,
    "note": "No [role='feed'] present. Posts found via anchor-based walk-up."
  }
}
```
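
To make the walk-up concrete, here is a minimal sketch of the idea behind `getRepeatingContainers()`; it is not the template's source, and `findCardContainers` is a hypothetical name. It takes an `isVerified` predicate instead of a selector so it can also run against plain objects with a `parentElement` field.

```javascript
// Hypothetical sketch of the anchor walk-up (not the template's actual code).
// From each anchor, climb at most `maxLevels` ancestors and keep the first one
// that satisfies `isVerified`: the smallest container holding both elements.
function findCardContainers(anchors, maxLevels, isVerified) {
  const containers = [];
  for (const anchor of anchors) {
    let node = anchor.parentElement;
    for (let level = 0; node && level < maxLevels; level++) {
      if (isVerified(node)) {
        // Several anchors can resolve to the same card; keep it only once
        if (!containers.includes(node)) containers.push(node);
        break;
      }
      node = node.parentElement;
    }
  }
  return containers;
}

// Browser usage (assumes a live DOM):
// findCardContainers(document.querySelectorAll('[aria-label="Actions for this post"]'), 15,
//   (node) => node.querySelector('[aria-label="Like"]') !== null);
```

Stopping at the first verified ancestor matters: climbing further would eventually reach the feed wrapper, which also contains a Like button and would collapse every post into one container.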

### Element Reference Stability

Element references (`ref_X`) from `read_page` become stale when:
@@ -853,10 +1031,20 @@ Before starting exploration, check if the app requires authentication:

### Exploration Budget

- **Element count heuristic**: Scale the exploration budget to the app's complexity:
  - Fewer than 15 interactive elements: Explore all. Budget: 20-30 tool calls
  - 15-30 elements: Standard budget. Explore 15-20 elements, 30-60 tool calls
  - 31-50 elements: Extended budget. Explore 20-30 elements, 60-90 tool calls
  - More than 50 elements: Focus on primary regions first. Use parallel exploration agents
- Explore at minimum 2-3 complete multi-step workflows
- Stop when you've covered all major regions and primary actions
- Skip purely decorative or repetitive elements (e.g., 50 identical list items: explore 1-2)
- **Parallel exploration**: For apps with 5+ independent regions (e.g., Facebook has a
  header bar, sidebar, feed, composer, and chat), use the Agent tool to explore 2-3
  independent regions concurrently. Each agent explores its assigned region and
  returns the exploration results. Merge results into `exploration/log.json`.
  Only parallelize regions that don't affect each other's state (e.g., sidebar
  navigation changes the content area, so those are NOT independent).
- **Time budget**: For a standard-complexity app, aim to complete exploration within
  30-60 tool calls (larger apps follow the element count heuristic above). If the app
  is very complex, focus on the most important regions first and note unexplored
  areas in the report.
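
The element count heuristic reduces to a small lookup. The thresholds and ranges below are the ones in the list; the tier names, the `explorationBudget` name, and the return shape are illustrative assumptions.

```javascript
// Hypothetical sketch of the element count heuristic. Ranges are [min, max].
function explorationBudget(interactiveCount) {
  if (interactiveCount < 15) {
    // Small app: explore every interactive element
    return { tier: 'small', explore: [interactiveCount, interactiveCount], toolCalls: [20, 30], parallel: false };
  }
  if (interactiveCount <= 30) {
    return { tier: 'standard', explore: [15, 20], toolCalls: [30, 60], parallel: false };
  }
  if (interactiveCount <= 50) {
    return { tier: 'extended', explore: [20, 30], toolCalls: [60, 90], parallel: false };
  }
  // More than 50: the heuristic gives no fixed numbers; start with primary
  // regions and split independent regions across parallel agents
  return { tier: 'large', explore: null, toolCalls: null, parallel: true };
}
```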
@@ -1142,7 +1330,10 @@ Read the following template files:
| `clickByAriaLabel(label)` | Click element by aria-label | Quick click when aria-label is known |
| `findButtonByText(text)` | Find `<button>` or `[role="button"]` by text | Buttons without aria-labels |
| `findElementByText(role, text, opts)` | Find element by role + text content | Elements without aria-labels |
| `clickMenuItem(itemText)` | Click menu item by text (searches menuitem + button roles) | Menu interactions |
| `findMenuItemByText(text)` | Find menu item element without clicking | When you need `__clickCoords` for a menu item |
| `extractTextContent(el, opts)` | Extract text from any editor (Lexical, ProseMirror, plain) | Reading back values from rich text editors |
| `retryWithFallback(primary, fallback)` | Try primary action, fall back on failure | Resilient tool implementations |
| `clickAndWait(clickSel, waitSel, timeout)` | Click then wait for result element | Toolbar button → dialog pattern |
| `getPageState()` | Return diagnostic page info | Precondition checks, debugging |
| `navigateTo(url)` | Navigate via `__navigate` signal | In-app page transitions |
@@ -1170,6 +1361,11 @@ Read the following template files:
- `__followUp`: optional function string to evaluate after a trusted click/hover/key
- Single retry with helper re-injection on failure
- Post-execution URL re-check for in-app navigation
- Session expiry detection: if the post-execution URL matches auth patterns
  (`/login`, `/signin`, `/auth`, `/checkpoint`), returns `state_error` with a
  `session_expired` hint instead of a cryptic selector failure
- `--smoke-test` CLI flag: run `node index.mjs --smoke-test` to verify
  connectivity and helper injection without starting the MCP server
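
A sketch of the auth-redirect check, assuming exec() has the post-execution URL as a string. Matching whole path segments (so `/authors` is not mistaken for `/auth`) and the returned object's shape are assumptions; the real template may report `state_error` differently.

```javascript
// Hypothetical sketch of session expiry detection. The path patterns come from
// the list above; segment-level matching and the return shape are assumptions.
const AUTH_PATH_PATTERNS = ['/login', '/signin', '/auth', '/checkpoint'];

function detectSessionExpiry(postExecUrl) {
  const { pathname } = new URL(postExecUrl);
  const expired = AUTH_PATH_PATTERNS.some(
    (p) => pathname === p || pathname.startsWith(`${p}/`) || pathname.startsWith(`${p}.`)
  );
  return expired ? { error: 'state_error', hint: 'session_expired', url: postExecUrl } : null;
}
```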

### 5.2 Generate Command Library

@@ -1775,6 +1971,12 @@ probe (Phase 0.5). If the browser is unreachable, relaunch before proceeding.
- SKIP (inferred but untested) → confidence 0.65-0.75
- FAIL then fixed → confidence 0.75-0.85

**Selector validation cache**: Track which selectors have been confirmed working
during validation. If multiple tools share the same selector (e.g., several tools
use `[aria-label="Add question"]`), validate that selector only once: when a
selector is confirmed, add it to a set and skip re-validation in subsequent tools.
This can reduce validation time by 20-30% for tools with overlapping selectors.
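
The cache amounts to a `Set` of confirmed selectors in front of the real check. In this sketch `checkSelector` is a hypothetical async function standing in for whatever confirms a selector against the live page, and `makeCachedValidator` is likewise an illustrative name.

```javascript
// Hypothetical sketch of a selector validation cache. Only confirmed selectors
// are cached, so a selector that failed once is checked again later.
function makeCachedValidator(checkSelector) {
  const confirmed = new Set();
  return async function validateSelector(selector) {
    if (confirmed.has(selector)) return true; // already confirmed: skip the browser call
    const ok = await checkSelector(selector);
    if (ok) confirmed.add(selector);
    return ok;
  };
}
```

Caching only successes is deliberate: a failure may come from timing (an element not rendered yet), so it is worth re-checking when the next tool uses the same selector.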

**Time budget**: Aim to validate each tool in 1-2 browser interactions. For a
30-tool server, this should take ~30-60 tool calls total.

@@ -1816,13 +2018,22 @@ export async function _validate_all() {
Rules for generated validation:
- **Query tools**: Call with no args or safe defaults, expect `success: true`
- **Mutation tools**: Capture current state → mutate → verify → restore original
- **`__clickCoords` tools**: Tools that return `__clickCoords` (trusted click pattern)
  cannot be validated via `page.evaluate()` alone; they need the full `exec()` pipeline.
  The `_validate_all()` function should call these tools through the MCP server's own
  `exec()` function, or mark them as requiring manual validation
- **`__hoverCoords` tools**: Similarly, hover-reveal tools need the full `exec()` pipeline.
  Mark these for manual validation or call them through `exec()`
- **Skip list**: Tools that create permanent side effects (e.g., `publish_site`,
  `delete_page`) are skipped and listed in the report
- **Output**: Return pass/fail summary; export for use via `run_script`
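
One way `_validate_all()` could sort outcomes according to these rules, sketched under assumptions: `results` maps each executed tool name to its return value, tools on the skip list are never executed, and `summarizeValidation` is a hypothetical name.

```javascript
// Hypothetical sketch: classify tool results per the validation rules.
// Signal-returning tools are marked for manual (or exec()-based) validation.
function summarizeValidation(results, skipList) {
  const summary = { pass: [], fail: [], manual: [], skipped: [...skipList] };
  for (const [name, result] of Object.entries(results)) {
    if (skipList.has(name)) continue; // permanent side effects: never counted as run
    if (result && (result.__clickCoords || result.__hoverCoords)) {
      summary.manual.push(name); // needs the full exec() pipeline
    } else if (result && result.success === true) {
      summary.pass.push(name);
    } else {
      summary.fail.push(name);
    }
  }
  return summary;
}
```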

This function is NOT registered as an MCP tool; it's an internal diagnostic
callable through `run_script` for regression testing after app updates.

The generated server also supports `--smoke-test` mode: run `node index.mjs --smoke-test`
to verify basic connectivity and helper injection without starting the MCP server.

### 6.1.2 End-to-End Test Task (Mandatory)

After per-tool validation, create a **test task** that exercises the MCP server as a
