@@ -482,13 +482,84 @@ have class names that match obfuscation patterns:
482482 - Structural selectors like ` [role="main"] > div > div:first-child `
483483 4. Lower the default confidence for all tools by 0.05 since obfuscated apps
484484 are inherently harder to automate reliably
485- 5. Inform the user:
485+ 5. **Enforce in manifest generation**: When writing `manifest.json`, automatically
486+ apply the -0.05 penalty to every operation's confidence score. Add
487+ `"domCharacteristics": { "obfuscatedClasses": true }` to the manifest
488+ 6. Inform the user:
486489 ```
487490 This app uses obfuscated CSS classes (e.g., compiled React/Angular output).
488491 CSS-based selectors will not be used. Tools will rely on ARIA attributes
489492 and text content matching, which may be less precise.
490493 ```
491494
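The enforcement step for obfuscated apps can be sketched as below. This is a minimal sketch, assuming the manifest holds an `operations` array whose entries carry a numeric `confidence`; that shape and the `applyObfuscationPenalty` name are illustrative assumptions, not the actual manifest schema.

```javascript
// Hypothetical helper (name and manifest shape are assumptions).
// Applies the 0.05 obfuscation penalty to every operation and records
// the DOM characteristic flag on the manifest.
const OBFUSCATION_PENALTY = 0.05;

function applyObfuscationPenalty(manifest) {
  const operations = (manifest.operations || []).map((op) => ({
    ...op,
    // Round to two decimals to avoid floating-point noise; never go below 0.
    confidence: Math.max(
      0,
      Math.round((op.confidence - OBFUSCATION_PENALTY) * 100) / 100
    ),
  }));
  return {
    ...manifest,
    operations,
    domCharacteristics: {
      ...(manifest.domCharacteristics || {}),
      obfuscatedClasses: true,
    },
  };
}
```

The helper returns a new object rather than mutating its input, so the unpenalized scores stay available for comparison.
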
495+ ### 2.3.4 DOM Portal Detection
496+
497+ Some React/Angular apps render interactive elements *outside* their logical parent
498+ containers using "portals" (`React.createPortal`). For example, Facebook renders the
499+ post composer's `[role="textbox"][contenteditable="true"]` element OUTSIDE the
500+ `[role="dialog"]` container, even though it visually appears inside the dialog.
501+
502+ **Detection**: During exploration, after opening a dialog/panel/overlay:
503+ 1. Search for interactive elements WITHIN the dialog using `querySelectorWithin`
504+ 2. If expected elements (text inputs, buttons) are NOT found within the dialog,
505+ search GLOBALLY via `querySelector` or `read_page`
506+ 3. If elements are found globally but not within the dialog, flag the app as
507+ using DOM portals for that component:
508+ ```json
509+ {
510+   "portalDetected": true,
511+   "component": "post composer",
512+   "dialogRef": "ref_X",
513+   "portalledElements": ["[role='textbox'][contenteditable='true']"],
514+   "note": "Textbox rendered outside dialog via React portal"
515+ }
516+ ```
517+
518+ **Impact on code generation**: Operations for portal-using components MUST search
519+ globally (`document.querySelector(...)`) rather than scoping to the dialog
520+ (`dialog.querySelector(...)`). The exploration log entry guides code generation
521+ to use the correct scoping strategy.
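
The scoped-then-global lookup described here can be sketched as follows. The `findWithPortalFallback` name and its return shape are hypothetical; the sketch only assumes that `dialog` and `root` expose a standard `querySelector` method (for example, an `Element` and `document`).

```javascript
// Hypothetical helper: look inside the dialog first, then fall back to a
// global search. Both arguments only need a querySelector method.
function findWithPortalFallback(dialog, root, selector) {
  const scoped = dialog ? dialog.querySelector(selector) : null;
  if (scoped) {
    return { element: scoped, portalDetected: false };
  }
  const found = root.querySelector(selector);
  // Found globally but not inside the dialog: the component likely
  // renders this element through a portal.
  return { element: found, portalDetected: Boolean(found && dialog) };
}
```

During exploration, a `portalDetected: true` result would be the trigger for writing the portal entry to the exploration log.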
522+
523+ ### 2.3.5 Click-to-Activate Element Detection
524+
525+ Some interactive elements don't exist in the DOM until their container or
526+ placeholder is clicked. For example, Facebook's post composer textbox only
527+ appears after clicking the "What's on your mind?" placeholder area.
528+
529+ **Detection**: During element cataloging, note elements that:
530+ 1. Have placeholder text suggesting interactivity (e.g., "What's on your mind?",
531+ "Search", "Type a message")
532+ 2. Don't have an editable element (input/textarea/contenteditable) visible
533+ in the accessibility tree
534+ 3. Show a new editable element after being clicked
535+
536+ **Procedure**:
537+ 1. Identify placeholder/trigger elements by aria-placeholder or placeholder-like text
538+ 2. Click the trigger element
539+ 3. Re-read the page to find newly appeared elements
540+ 4. Record the activation pattern:
541+ ```json
542+ {
543+   "clickToActivate": true,
544+   "trigger": "[aria-placeholder*='on your mind']",
545+   "activatedElement": "[role='textbox'][contenteditable='true']",
546+   "delay": 500,
547+   "note": "Textbox appears in DOM only after clicking placeholder"
548+ }
549+ ```
550+
551+ **Impact on code generation**: Operations that interact with click-to-activate
552+ elements must include the activation step before attempting to find/interact
553+ with the activated element. Pattern:
554+ ```javascript
555+ // Click placeholder to activate the textbox
556+ const placeholder = querySelector(['[aria-placeholder*="on your mind"]']);
557+ if (placeholder) placeholder.click();
558+ await sleep(500);
559+ // Now find the activated element
560+ const textbox = querySelector(['[role="textbox"][contenteditable="true"]']);
561+ ```
562+
492563### 2.4 Save Reconnaissance Data
493564
494565Write the region map and element catalog to ` exploration/log.json ` as the initial exploration state:
@@ -714,6 +785,8 @@ specific code template.
714785| **dropdown-option** | Click opens listbox/menu, then click option | `el.click(); await sleep(300); optionEl.click()` |
715786| **focus-type** | Focus element, then type via setInputValue | `el.focus(); setInputValue(el, value)` |
716787| **contenteditable** | Focus contentEditable div, selectAll, insertText | `setContentEditableValue(el, value)` |
788+ | **hover-reveal** | Hover to reveal hidden UI, then interact | `return { __hoverCoords: {x,y}, __followUp: "..." }` |
789+ | **click-to-activate** | Click placeholder to make element appear, then interact | `placeholder.click(); await sleep(500); textbox = querySelector(...)` |
717790| **multi-step-cascade** | Click triggers panel/dialog, interact within, confirm | `clickAndWait(trigger, panel); fillFields(); confirmBtn.click()` |
718791| **toggle** | Click toggles state; read back aria-checked | `el.click(); return { state: el.getAttribute('aria-checked') }` |
719792
@@ -730,6 +803,71 @@ During Phase 5 code generation, use the appropriate code pattern for each elemen
730803based on its classification. The ` trusted-click ` pattern is especially important —
731804these elements MUST use ` __clickCoords ` or the generated tool will silently fail.
732805
806+ ### Editor Framework Detection
807+
808+ During exploration, detect which rich text editor framework the app uses.
809+ Different editors require different input strategies and readback approaches.
810+
811+ **Detection procedure**: For each contenteditable element found, run:
812+ ```javascript
813+ // Check for editor framework markers
814+ const markers = {
815+   lexical: !!el.querySelector('[data-lexical-editor]') || !!el.closest('[data-lexical-editor]'),
816+   proseMirror: !!el.querySelector('.ProseMirror') || el.classList?.contains('ProseMirror'),
817+   quill: !!el.querySelector('.ql-editor') || el.classList?.contains('ql-editor'),
818+   tinyMCE: !!el.querySelector('.tox-edit-area') || !!document.querySelector('.tox-tinymce'),
819+   draft: !!el.querySelector('[data-editor]') || !!el.closest('[data-contents="true"]'),
820+ };
821+ ```
822+
823+ Record the detected framework in the exploration log:
824+ ```json
825+ {
826+   "editorFramework": "lexical|proseMirror|quill|tinyMCE|draft|unknown",
827+   "inputStrategy": "setContentEditableValue with 300ms readback delay",
828+   "readbackMethod": "extractTextContent() (handles Lexical data-lexical-text spans)"
829+ }
830+ ```
831+
832+ **Framework-specific notes**:
833+ - **Lexical** (Facebook): Text stored in `<span data-lexical-text="true">` elements.
834+ `insertText` works but readback via `textContent` may be delayed. Use
835+ `extractTextContent()` with the element for reliable readback.
836+ - **ProseMirror** (the basis of Tiptap, among other editors): Uses transaction-based
837+ updates. `insertText` works for simple text but may not trigger ProseMirror's update
838+ cycle for complex ops.
838+ - **Quill**: Standard contentEditable; `setContentEditableValue()` works reliably.
839+ - **Unknown**: Default to `setContentEditableValue()` with `sleep(300)` + readback.
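
To turn the boolean markers into the single `editorFramework` value recorded in the log, something like the following could be used. This is a sketch only: the `pickEditorFramework` name and the priority order are assumptions (the TinyMCE check is placed late because its marker is document-wide rather than element-scoped).

```javascript
// Hypothetical helper: reduce the markers object to one framework name.
// The order below is an assumed priority, not something the source specifies.
const FRAMEWORK_ORDER = ["lexical", "proseMirror", "quill", "draft", "tinyMCE"];

function pickEditorFramework(markers) {
  const match = FRAMEWORK_ORDER.find((name) => Boolean(markers[name]));
  return match || "unknown";
}
```

`Boolean(...)` is applied because an optional-chained `classList?.contains(...)` can yield `undefined` rather than `false`.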
840+
841+ ### Hover Interaction Exploration
842+
843+ Some interactive elements reveal additional UI only when hovered for a duration.
844+ Common examples: Facebook's reaction bar (hover Like for 2s), tooltip menus,
845+ hover-activated dropdowns, and action buttons that appear on card hover.
846+
847+ **During Phase 3 exploration**:
848+ 1. For elements with engagement-related labels (Like, React, Vote, Rate, Star),
849+ try hovering for 2 seconds using the `computer` tool with `hover` action
850+ 2. After hovering, take a screenshot and `read_page` to detect new elements
851+ 3. If new elements appeared (reaction picker, tooltip menu, action bar):
852+ - Record the hover trigger, delay, and revealed elements
853+ - Classify the interaction as `hover-reveal` pattern
854+ - Test if the revealed elements need trusted clicks or can use JS `.click()`
855+ 4. Record in the exploration log:
856+ ```json
857+ {
858+   "elementRef": "ref_X",
859+   "interactionPattern": "hover-reveal",
860+   "hoverDelay": 2000,
861+   "revealedElements": ["Love", "Care", "Haha", "Wow", "Sad", "Angry"],
862+   "codePattern": "__hoverCoords + __followUp"
863+ }
864+ ```
865+
866+ **Impact on code generation**: Hover-reveal operations MUST use the `__hoverCoords`
867+ + `__followUp` exec() signal pattern. The command returns coordinates for the hover
868+ target, and a `__followUp` function string that inspects the revealed UI and performs
869+ the desired action after the hover delay.
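
A command that emits this signal might look like the sketch below. The `buildHoverSignal` name is hypothetical, and using the centre of the element's bounding box as the hover point is an assumption; only the `__hoverCoords` and `__followUp` keys come from the pattern described above.

```javascript
// Hypothetical helper: build the hover-reveal signal for an element.
// Assumes the exec() pipeline hovers at viewport coordinates and then
// evaluates the follow-up function string after the hover delay.
function buildHoverSignal(el, followUpSource) {
  const rect = el.getBoundingClientRect();
  return {
    __hoverCoords: {
      x: Math.round(rect.left + rect.width / 2),
      y: Math.round(rect.top + rect.height / 2),
    },
    __followUp: followUpSource,
  };
}
```

The follow-up is passed as a string because it has to be evaluated in the page later, once the revealed UI exists.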
870+
733871### Multi-step Workflow Discovery
734872
735873After exploring individual elements, look for multi-step workflows:
@@ -781,6 +919,46 @@ a dashboard vs settings page, an inbox vs a compose view).
781919 belongs to. This helps generated ` commands.mjs ` include proper precondition
782920 checks (e.g., ` if (!isEditor()) return { error: "Not in editor" } ` ).
783921
922+ ### Feed & Card-Based Content Discovery
923+
924+ Many apps (Facebook, Twitter/X, Reddit, LinkedIn, news readers) display content
925+ in feed/card patterns without semantic ` [role="feed"] ` markup. These require
926+ anchor-based discovery using ` getRepeatingContainers() ` .
927+
928+ **Detection**: If the app's main content area contains repeating similar structures
929+ (posts, cards, items) but no `[role="feed"]` or `[role="list"]` element:
930+ 1. Identify a common "anchor" element that exists in every card, typically an
931+ action button like `[aria-label="Like"]`, `[aria-label="Actions for this post"]`,
932+ or `[aria-label="Share"]`
933+ 2. Use `getRepeatingContainers(anchorSelector, levels, verifySelector)` to walk up
934+ the DOM from each anchor and find the card container
935+ 3. Test with different `levels` values (8-15) until you find the right container depth
936+ 4. Use a `verifySelector` to confirm the container: another element that should exist
937+ in every card (e.g., a Like button + an Actions button in the same container)
938+
939+ **Recommended exploration approach**:
940+ ```javascript
941+ // Find post containers by walking up from action buttons
942+ const posts = getRepeatingContainers(
943+   '[aria-label="Actions for this post"]', // anchor
944+   15,                                      // max levels
945+   '[aria-label="Like"]'                    // verify: must also contain Like button
946+ );
947+ ```
948+
949+ Record the discovery parameters in `exploration/log.json`:
950+ ```json
951+ {
952+   "feedPattern": {
953+     "anchorSelector": "[aria-label=\"Actions for this post\"]",
954+     "maxLevels": 15,
955+     "verifySelector": "[aria-label=\"Like\"]",
956+     "containerCount": 4,
957+     "note": "No [role='feed'] present. Posts found via anchor-based walk-up."
958+   }
959+ }
960+ ```
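
For intuition, the anchor-based walk-up can be illustrated with a small standalone function. This is not the project's `getRepeatingContainers` implementation; `walkUpToContainers` is a hypothetical name, and stopping at the nearest ancestor that contains the verify element is an assumed interpretation of the `levels` parameter.

```javascript
// Illustrative walk-up only; the real helper may behave differently.
// `root` needs querySelectorAll; ancestors need parentElement and querySelector.
function walkUpToContainers(root, anchorSelector, maxLevels, verifySelector) {
  const containers = new Set(); // de-duplicates cards with several anchors
  for (const anchor of root.querySelectorAll(anchorSelector)) {
    let node = anchor.parentElement;
    for (let level = 0; node && level < maxLevels; level++) {
      // Accept the nearest ancestor that also contains the verify element.
      if (node.querySelector(verifySelector)) {
        containers.add(node);
        break;
      }
      node = node.parentElement;
    }
  }
  return [...containers];
}
```

If `maxLevels` is too small, the walk ends before reaching the card and the anchor contributes nothing, which is why several depth values are worth trying.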
961+
784962### Element Reference Stability
785963
786964Element references (` ref_X ` ) from ` read_page ` become stale when:
@@ -853,10 +1031,20 @@ Before starting exploration, check if the app requires authentication:
8531031
8541032### Exploration Budget
8551033
856- - Explore at minimum 15-20 interactive elements
1034+ - **Element count heuristic**: Scale the exploration budget to the app's complexity:
1035+ - < 15 interactive elements: Explore all. Budget: 20-30 tool calls
1036+ - 15-30 elements: Standard budget. Explore 15-20 elements, 30-60 tool calls
1037+ - 30-50 elements: Extended budget. Explore 20-30 elements, 60-90 tool calls
1038+ - 50+ elements: Focus on primary regions first. Use parallel exploration agents
8571039- Explore at minimum 2-3 complete multi-step workflows
8581040- Stop when you've covered all major regions and primary actions
8591041- Skip purely decorative or repetitive elements (e.g., 50 identical list items — explore 1-2)
1042+ - **Parallel exploration**: For apps with 5+ independent regions (e.g., Facebook has
1043+ header bar, sidebar, feed, composer, chat), use the Agent tool to explore 2-3
1044+ independent regions concurrently. Each agent explores its assigned region and
1045+ returns the exploration results. Merge results into `exploration/log.json`.
1046+ Only parallelize regions that don't affect each other's state (e.g., sidebar
1047+ navigation changes the content area, so those are NOT independent).
8601048- **Time budget**: Aim to complete exploration within 30-60 tool calls. If the app
8611049 is very complex, focus on the most important regions first and note unexplored
8621050 areas in the report.
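
The element-count tiers above can be written as a small lookup. This is a sketch under stated assumptions: the `explorationBudget` name, the tier labels, and the treatment of the overlapping boundaries (30 and 50 fall into the higher tier) are choices made here for illustration.

```javascript
// Hypothetical helper mapping an interactive-element count to a budget tier.
// Ranges are [min, max]; boundary handling is an assumption.
function explorationBudget(interactiveCount) {
  if (interactiveCount < 15) {
    return { tier: "small", exploreElements: "all", toolCalls: [20, 30] };
  }
  if (interactiveCount < 30) {
    return { tier: "standard", exploreElements: [15, 20], toolCalls: [30, 60] };
  }
  if (interactiveCount < 50) {
    return { tier: "extended", exploreElements: [20, 30], toolCalls: [60, 90] };
  }
  // 50+ elements: no fixed numbers are given; focus on primary regions
  // and consider parallel exploration agents.
  return { tier: "large", exploreElements: "primary regions first", toolCalls: null, parallel: true };
}
```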
@@ -1142,7 +1330,10 @@ Read the following template files:
11421330| ` clickByAriaLabel(label) ` | Click element by aria-label | Quick click when aria-label is known |
11431331| ` findButtonByText(text) ` | Find ` <button> ` or ` [role="button"] ` by text | Buttons without aria-labels |
11441332| ` findElementByText(role, text, opts) ` | Find element by role + text content | Elements without aria-labels |
1145- | ` clickMenuItem(itemText) ` | Click ` [role="menuitem"] ` by text | Menu interactions |
1333+ | ` clickMenuItem(itemText) ` | Click menu item by text (searches menuitem + button roles) | Menu interactions |
1334+ | `findMenuItemByText(text)` | Find menu item element without clicking | When you need `__clickCoords` for a menu item |
1335+ | ` extractTextContent(el, opts) ` | Extract text from any editor (Lexical, ProseMirror, plain) | Reading back values from rich text editors |
1336+ | ` retryWithFallback(primary, fallback) ` | Try primary action, fallback on failure | Resilient tool implementations |
11461337| ` clickAndWait(clickSel, waitSel, timeout) ` | Click then wait for result element | Toolbar button → dialog pattern |
11471338| ` getPageState() ` | Return diagnostic page info | Precondition checks, debugging |
11481339| ` navigateTo(url) ` | Navigate via ` __navigate ` signal | In-app page transitions |
@@ -1170,6 +1361,11 @@ Read the following template files:
11701361- ` __followUp ` — optional function string to evaluate after a trusted click/hover/key
11711362- Single retry with helper re-injection on failure
11721363- Post-execution URL re-check for in-app navigation
1364+ - Session expiry detection: if the post-execution URL matches auth patterns
1365+ (`/login`, `/signin`, `/auth`, `/checkpoint`), returns `state_error` with a
1366+ `session_expired` hint instead of a cryptic selector failure
1367+ - `--smoke-test` CLI flag: run `node index.mjs --smoke-test` to verify
1368+ connectivity and helper injection without starting the MCP server
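
The session expiry check could be approximated as follows. This is a minimal sketch: the `detectSessionExpiry` name, the returned object shape, and matching only the first path segment are assumptions; the template's actual matching rule is not shown here.

```javascript
// Hypothetical check: does the post-execution URL look like an auth page?
// Only the first path segment is compared, so "/authors" does not match "/auth".
const AUTH_SEGMENTS = ["login", "signin", "auth", "checkpoint"];

function detectSessionExpiry(postExecUrl) {
  const { pathname } = new URL(postExecUrl);
  const firstSegment = pathname.split("/")[1] || "";
  if (AUTH_SEGMENTS.includes(firstSegment)) {
    return { error: "state_error", hint: "session_expired" };
  }
  return null; // not an auth page; continue with normal result handling
}
```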
11731369
11741370### 5.2 Generate Command Library
11751371
@@ -1775,6 +1971,12 @@ probe (Phase 0.5). If the browser is unreachable, relaunch before proceeding.
17751971 - SKIP (inferred but untested) → confidence 0.65-0.75
17761972 - FAIL then fixed → confidence 0.75-0.85
17771973
1974+ **Selector validation cache**: Track which selectors have been confirmed working
1975+ during validation. If multiple tools share the same selector (e.g., several tools
1976+ use `[aria-label="Add question"]`), only validate that selector once. When a
1977+ selector is confirmed, mark it in a set and skip re-validation in subsequent tools.
1978+ This can reduce validation time by 20-30% for tools with overlapping selectors.
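
One way to picture the cache is a `Set` wrapped around whatever check confirms a selector. The sketch below is hypothetical: `createSelectorCache` and the synchronous `validate` callback are illustrative names, and real validation would likely be asynchronous browser calls.

```javascript
// Hypothetical selector validation cache. `validate` is any function that
// returns a truthy value when the selector is confirmed working.
function createSelectorCache(validate) {
  const confirmed = new Set();
  return {
    check(selector) {
      if (confirmed.has(selector)) return true; // already validated, skip
      const ok = Boolean(validate(selector));
      if (ok) confirmed.add(selector); // only cache confirmed selectors
      return ok;
    },
    size() {
      return confirmed.size;
    },
  };
}
```

Failed selectors are deliberately not cached, so a later tool can re-test them after a fix.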
1979+
17781980**Time budget**: Aim to validate each tool in 1-2 browser interactions. For a
1779198130-tool server, this should take ~30-60 tool calls total.
17801982
@@ -1816,13 +2018,22 @@ export async function _validate_all() {
18162018Rules for generated validation:
18172019- **Query tools**: Call with no args or safe defaults, expect `success: true`
18182020- **Mutation tools**: Capture current state → mutate → verify → restore original
2021+ - **`__clickCoords` tools**: Tools that return `__clickCoords` (trusted click pattern)
2022+ cannot be validated via `page.evaluate()` alone; they need the full `exec()` pipeline.
2023+ The `_validate_all()` function should call these tools through the MCP server's own
2024+ `exec()` function, or mark them as requiring manual validation
2025+ - **`__hoverCoords` tools**: Similarly, hover-reveal tools need the full `exec()` pipeline.
2026+ Mark these for manual validation or call through `exec()`
18192027- **Skip list**: Tools that create permanent side effects (e.g., `publish_site`,
18202028 `delete_page`) are skipped and listed in the report
18212029- **Output**: Return pass/fail summary; export for use via `run_script`
18222030
18232031This function is NOT registered as an MCP tool; it is an internal diagnostic
18242032callable through `run_script` for regression testing after app updates.
18252033
2034+ The generated server also supports `--smoke-test` mode: run `node index.mjs --smoke-test`
2035+ to verify basic connectivity and helper injection without starting the MCP server.
2036+
18262037### 6.1.2 End-to-End Test Task (Mandatory)
18272038
18282039After per-tool validation, create a **test task** that exercises the MCP server as a