feat(web): add web plugin with browser automation and page analysis skills by catalan-adobe · Pull Request #164 · adobe/skills

catalan-adobe · 2026-05-30T10:26:08Z

Summary

Adds a new top-level plugins/web plugin containing 10 browser automation and page analysis skills that work with any website (not AEM-specific). These started as personal skills and are contributed here as a reusable, generic web tooling layer.

Skills added:

browser-universal - dispatch layer: detects playwright-cli / Playwright MCP / cmux-browser / CDP and loads the right commands
cdp-connect - zero-dependency Chrome CDP control via Node 22 built-in WebSocket; no Playwright required
cdp-ext-pilot - launch Chrome with an unpacked extension loaded, test sidepanel/popup/options via CDP
browser-probe - escalation ladder to detect CDN bot protection (Akamai, Cloudflare, DataDome, AWS WAF) and emit a browser-recipe.json
page-prep - detect and dismiss overlays (cookie banners, GDPR consent, modals, paywalls) using a 300+ CMP DB
page-tree - capture the spatial DOM hierarchy with positions, z-ordering, overlay detection, and a nodeMap of CSS selectors
page-reduce - two-phase page-to-skeleton tokenizer producing skeleton.html + manifest.json
page-collect - extract icons/SVGs, metadata, text, forms, videos, social links from any webpage
domain-mask - HTTPS reverse proxy behind a custom domain (mkcert + /etc/hosts) for clean demo recordings
page-langs - detect all languages used on a webpage: combines declared structural signals (html@lang, hreflang alternates, nested lang=, meta content-language) with Google CLD3 content detection (cld3-asm WASM); reconciles declared vs detected to surface mismatches. First skill in the plugin with a runtime npm dependency (cld3-asm@4.0.0).
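The cdp-connect entry above relies on the WebSocket client built into Node 22. A minimal sketch of that idea follows, assuming Chrome was started with --remote-debugging-port=9222 and that at least one page target is open. The helper names (buildCdpCommand, evaluateInFirstPage) and the port are hypothetical and are not taken from the skill's scripts.

```javascript
// Hypothetical helper: serialize one CDP command. CDP messages are JSON
// objects with a caller-chosen numeric id, a method name, and params.
function buildCdpCommand(id, method, params = {}) {
  return JSON.stringify({ id, method, params });
}

// Hypothetical helper: evaluate an expression in the first page target.
// Assumption: Chrome exposes its debugging endpoint on the given port.
async function evaluateInFirstPage(expression, port = 9222) {
  const response = await fetch(`http://127.0.0.1:${port}/json/list`);
  const targets = await response.json();
  const page = targets.find((t) => t.type === "page");
  if (!page) throw new Error("no page target found");

  const ws = new WebSocket(page.webSocketDebuggerUrl); // global in Node 22
  return new Promise((resolve, reject) => {
    ws.addEventListener("open", () => {
      ws.send(buildCdpCommand(1, "Runtime.evaluate", { expression, returnByValue: true }));
    });
    ws.addEventListener("message", (event) => {
      const msg = JSON.parse(event.data);
      if (msg.id !== 1) return; // ignore events and unrelated replies
      ws.close();
      if (msg.error) reject(new Error(msg.error.message));
      else resolve(msg.result.result.value);
    });
    ws.addEventListener("error", () => reject(new Error("CDP WebSocket error")));
  });
}
```

With a debuggable Chrome running, `evaluateInFirstPage("document.title")` would resolve to the title of the first open tab.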

What is not in this PR (deferred): migrate-header and brand-setup will follow in a separate PR into plugins/aem/edge-delivery-services/.

Quality gates (all passing locally)

npm run validate - all 10 web skills pass skills-ref validation ✅
tessl skill lint plugins/web - tile adobe/web@1.0.0 is valid ✅
tessl skill review - all 10 skills score >=90%: browser-universal 100%, cdp-connect 100%, page-collect 100%, page-langs 95%, cdp-ext-pilot 97%, browser-probe 94%, page-tree 94%, page-reduce 94%, page-prep 90%, domain-mask 90% ✅
license: Apache-2.0 present in all 10 SKILL.md frontmatter ✅
CODEOWNERS entry added for /plugins/web ✅

Notes for reviewers

category: web in marketplace.json is a new category value. If a different value is preferred, easy to adjust.
CODEOWNERS handle is @catalan-adobe - please confirm this is the correct GitHub handle or update before merge.
playwright-cli is an external binary - browser-universal gracefully falls back to Playwright MCP, cmux-browser, or CDP if not on PATH.
page-collect requires a one-time npm install in its scripts/ directory to pin Playwright 1.59.1.
page-langs requires a one-time npm install --prefix plugins/web/skills/page-langs for cld3-asm@4.0.0 (WASM, model bundled, no native build). See references/output-schema.md for the vendoring fallback.
Evals are intentionally minimal for the first PR. Maintainers can trigger with eval: commit prefix.
The page-reduce (~100KB) and page-tree (~21KB) bundles are pre-built source artifacts - no build step required.

Test plan

npm run validate in CI - should pass for all new skills
tessl skill review in CI - all new skills should score >=80%
tessl skill lint in CI - tile adobe/web@1.0.0 should be valid
Install locally and confirm browser-universal activates on browser-related prompts
For page-langs: npm install --prefix plugins/web/skills/page-langs, then playwright-cli open <url> && playwright-cli run-code --filename=scripts/collect.js | node scripts/detect.mjs

Generated with Claude Code

…kills

Introduces a new top-level plugins/web plugin containing 9 skills for interacting with arbitrary webpages: browser layer detection, Chrome CDP control, CDN bot probing, overlay dismissal, spatial DOM capture, page skeleton reduction, and structured resource extraction.

Skills included:
- browser-universal: dispatch layer detecting playwright-cli / MCP / cmux / CDP
- cdp-connect: zero-dep Chrome CDP control via Node 22 built-in WebSocket
- cdp-ext-pilot: launch Chrome with unpacked extensions, test via CDP
- browser-probe: escalation ladder to detect CDN bot protection, emit recipe
- page-prep: dismiss cookie/GDPR/modal overlays via CMP DB + heuristics
- visual-tree: capture spatial DOM hierarchy as structured tree + nodeMap
- reduce-page: two-phase page-to-skeleton tokenizer (inject + LLM reasoning)
- page-collect: extract icons, metadata, text, forms, videos, social links
- domain-mask: HTTPS reverse proxy for masking URLs during demos

All skills: license Apache-2.0, pass skills-ref validate, score ≥90% on tessl skill review, tessl lint shows tile as valid. Full release plumbing included (tile.json, per-skill package.json + .releaserc.json + evals).

Deferred: migrate-header and brand-setup (heavy AEM/EDS coupling) will follow in a separate PR targeting plugins/aem/edge-delivery-services.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-30T10:26:20Z

Tessl Skill Lint

✅ web — clean

✅ All 1 tile(s) clean.

Updated by tessl-lint for commit 1edc485.

trieloff · 2026-05-30T10:30:14Z

+
+## Always verify with Context7
+
+When using any playwright-cli command — especially unfamiliar flags, subcommands, or argument syntax — look up the current docs via Context7 before writing code. Search Context7 for "Playwright CLI". Do not guess from codebase patterns or memory; the CLI evolves and has non-obvious conventions (e.g., `screenshot` takes a ref not a CSS selector, `eval` is expression-only, `--raw` strips envelope formatting).


Maybe using --help would eliminate one dependency

@trieloff, my bad, I should have opened the PR as a draft from the very start. You caught that too early :)
I did a second pass on this new plugin, streamlining the overall browser interaction stack. All skills now default to playwright-cli, the browser-universal skill is gone (it was weak overall), and I generalized the --help trick.

Drop browser-universal (multi-layer dispatcher with Context7 dependency, solving a problem no other plugin in the repo has). Remove the Playwright npm dependency from page-collect by refactoring it to playwright-cli.

Changes:
- Delete browser-universal skill and its references/ (LAYERS.md, playwright-cli.md)
- page-prep: replace 3 browser-tool sections with a single playwright-cli eval pattern; bundle is a verified IIFE expression
- reduce-page: replace browser-universal detection with direct playwright-cli open + initScript/bootstrap pattern to handle async detectSections()
- visual-tree: fix compatibility field (drop browser-universal reference)
- browser-probe: add playwright-cli --help pointer
- page-collect: replace programmatic Playwright (chromium.launch + 7 files with npm dep) with playwright-cli-based orchestrator; extract all in-page logic to new page-collect-bundle.js IIFE; keep Node-side classify/deriveName/optimizeSvg unchanged; delete 6 collect-*.js files and scripts/package.json

Result: entire plugin's browser story is playwright-cli on PATH. CDP skills (cdp-connect, cdp-ext-pilot) remain as-is; they are the CDP layer, add no install burden. All 8 skills pass skills-ref validate; tessl lint reports 0 errors; all reviewed skills score ≥94%.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Eval assertions across all 8 skills used node -e "require(...)" or import() with backwards catch logic to check syntax. Browser bundle scripts (visual-tree, reduce-page, page-collect) are browser-only IIFEs — require() crashes with ReferenceError: window is not defined. ESM scripts (browser-probe, cdp-ext-pilot) fail with SyntaxError in CJS require(). Replace all with node --check which parses for syntax validity without executing, works for any JS file regardless of module system or browser/Node target. Also: fix page-collect eval to use a real URL with SVG icons (adobe.com) and the correct output path (page-collect-output/icons). Fix stripEnvelope() in page-collect.js to use lastIndexOf instead of indexOf for the trailing marker — indexOf silently truncates the JSON payload if page content contains the "### Ran Playwright code" string. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
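The lastIndexOf fix described above can be sketched as follows. This is an illustration rather than the code in page-collect.js: only the trailing marker string comes from the commit message, while the leading "### Result" marker and the exact envelope layout are assumptions made for the example.

```javascript
// Assumed envelope markers. Only the trailing one is named in the commit
// message; the leading "### Result" marker is an assumption for this sketch.
const LEADING_MARKER = "### Result";
const TRAILING_MARKER = "### Ran Playwright code";

function stripEnvelope(raw) {
  const start = raw.indexOf(LEADING_MARKER);
  // lastIndexOf: the real trailer is the final occurrence, even when the
  // page content itself contains the marker string.
  const end = raw.lastIndexOf(TRAILING_MARKER);
  if (start === -1 || end === -1 || end < start) {
    throw new Error("envelope markers not found");
  }
  return raw.slice(start + LEADING_MARKER.length, end).trim();
}
```

With indexOf for the trailing marker, a payload whose text happens to include the marker string would be cut at that first occurrence and would no longer parse as JSON.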

- tile.json: align version to 1.0.0 (was 0.1.0, mismatched plugin.json and marketplace.json which both said 1.0.0)
- page-prep/references/known-patterns.md: replace stale page.evaluate() API reference with playwright-cli eval
- reduce-page/SKILL.md: remove duplicate "open URL" from Step 2 (Step 3 already opens via playwright-cli with --config; running open twice would start a session without the injected bundle)
- page-collect.js: add timeout (60s open, 30s run-code, 10s screenshot) to all spawnSync calls; add ETIMEDOUT and ENOBUFS error messages with actionable context; surface screenshot failures as warnings instead of silently discarding them

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
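The spawnSync timeout handling mentioned above might look roughly like this. The wrapper name runStep, its return shape, and the message wording are hypothetical; the sketch only relies on the documented `timeout` option of child_process.spawnSync and on the ETIMEDOUT and ENOBUFS codes Node sets on `result.error`.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical wrapper: run a command synchronously with a timeout and
// translate the two failure codes named above into actionable messages.
function runStep(command, args, timeoutMs) {
  const result = spawnSync(command, args, { encoding: "utf8", timeout: timeoutMs });
  if (result.error) {
    const code = result.error.code;
    if (code === "ETIMEDOUT") {
      return { ok: false, code, message: `${command} exceeded ${timeoutMs} ms` };
    }
    if (code === "ENOBUFS") {
      return { ok: false, code, message: `${command} output exceeded maxBuffer` };
    }
    return { ok: false, code, message: result.error.message };
  }
  return { ok: result.status === 0, code: null, stdout: result.stdout, stderr: result.stderr };
}
```

A caller could then apply a different budget per step, for example 60000 ms for open and 10000 ms for screenshot, and downgrade a failed screenshot to a warning instead of aborting.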

Code fixes:
- page-collect-bundle.js: add missing await on res.text() so mid-body network failures are caught by the surrounding try/catch rather than propagating as unhandled rejections through Promise.all
- page-collect.js: remove unused mkdirSync import; remove duplicate icons.json write in the icons subcommand branch (writeIcons already writes it)

Documentation fixes:
- browser-probe, page-prep, reduce-page SKILL.md: add compatibility frontmatter declaring playwright-cli on PATH requirement (visual-tree and page-collect already had it)
- visual-tree, reduce-page, page-collect SKILL.md: add external content warning (matches cdp-connect, cdp-ext-pilot, page-prep)
- visual-tree SKILL.md: add URL="<target URL>" declaration to code block; fix "playwright-cli help" → "playwright-cli --help"
- reduce-page SKILL.md: add URL="<target URL>" declaration to code block
- page-collect SKILL.md: add screenshot.jpg to the "all" subcommand output table

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
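The missing-await fix above comes down to a general property of async functions: a promise returned without await settles after the try block has already exited. The two functions below are a hypothetical illustration of that difference, not code from page-collect-bundle.js.

```javascript
// Without `await`, a rejection from res.text() escapes this try/catch and
// surfaces later in whoever awaits the returned promise.
async function readBodyUnsafe(res) {
  try {
    return res.text();
  } catch {
    return null;
  }
}

// With `await`, the rejection is raised inside the try block and is caught.
async function readBodySafe(res) {
  try {
    return await res.text();
  } catch {
    return null;
  }
}
```

When several such reads are combined with Promise.all, the unsafe variant lets one failed body reject the whole batch, while the safe variant yields null for that entry.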

Tessl review flagged the 'What the proxy does' section as implementation details the agent doesn't need to run the skill. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Tessl review flagged verbose prose in interpretation rules and repetition in the consumer usage section. Replaces 9-bullet prose with a signal table; collapses 4-step consumer walkthrough to 3 lines. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…able

Tessl content feedback: signal table duplicates stealth-config.md Provider Signature Table; config mapping table was verbose; When to Use was wordy.
- Drop signal table from Step 3, point to stealth-config.md reference
- Collapse config mapping to 4 columns (no repeated long strings)
- Trim When to Use from 4 bullets to one line

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Tessl progressive disclosure feedback: Tips section too long for inline. Keep critical non-obvious tips inline (React inputs, port conflicts). Move context/target/content-script/load-error tips to new references/troubleshooting.md. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Tessl conciseness feedback: Watch Mode JS lengthy, format sections overlap, Injecting the Bundle repeats Step 4.
- Remove Injecting the Bundle section (duplicate of Step 4)
- Trim Recipe Manifest Format JSON (drop verbose comments/js field)
- Move Watch Mode JS to references/watch-mode.md, replace with 4-line summary

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

… rationale

Tessl conciseness feedback: source field re-explained in Steps 5, 6, and 8; Step 9b closing rationale paragraph is self-evident.
- Steps 6/8: drop parenthetical source re-definitions (Step 5 defines them once)
- Step 9b: remove 4-line closing rationale paragraph

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Tessl conciseness feedback: Detection Report and Recipe Manifest format blocks add bulk that belongs in a reference file. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Tessl conciseness feedback: bundle description, Step 9a rationale, and Step 9b DOM-vs-screenshot rationale explain things Claude can infer. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Tessl conciseness feedback: database description, IIFE explanation, quick-mode re-narration after Step 8, Step 9 mode-restatement intro, and two mode tips duplicate the mode table defined once at the top. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…ove token table Tessl conciseness feedback: 'Why this pattern' is restated by the bash comment; Phase 1 JSON example too detailed; token table duplicates PHASE2-RULES.md Token Vocabulary section. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Pipeline lists 6 internal function names — implementation detail not needed to run the skill; output behavior is described in Step 3 already. Tip about text vs nodeMap format repeats Step 3 output descriptions. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Aligns with the page-* naming convention used by page-prep, page-collect, and browser-* / domain-* groupings. Both skills operate on pages, the new names make that clear at a glance. Renames: directories, bundle files, name frontmatter, tile.json keys, package.json names, evals.json skill_name, and all path references in SKILL.md. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…ssing import.meta.url `path.resolve` does not follow symlinks, so invoking the script through a `.claude/skills/` directory symlink caused `process.argv[1]` and `import.meta.url` to differ, silently skipping `main()`. Switch to `realpathSync` to canonicalise both sides of the comparison. Wrap in try/catch so non-standard runtimes where `import.meta.url` is unavailable fall back to `isMain = true` instead of silently doing nothing. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
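The realpathSync comparison described above can be sketched as a small helper. The function name isMainModule is hypothetical, and the module URL is passed in as a parameter so the sketch also runs outside an ES module; in the actual script the second argument would be import.meta.url.

```javascript
const { realpathSync } = require("node:fs");
const { fileURLToPath } = require("node:url");

// Hypothetical helper. In an ES module it would be called as
// isMainModule(process.argv[1], import.meta.url).
function isMainModule(argvPath, moduleUrl) {
  try {
    // realpathSync resolves symlinks, so both sides are canonical paths.
    return realpathSync(argvPath) === realpathSync(fileURLToPath(moduleUrl));
  } catch {
    // Fallback described above: run main() rather than silently skip it.
    return true;
  }
}
```

The trade-off of the fallback is that an unusual runtime may run main() on import, which the commit message treats as preferable to silently doing nothing.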

…ext found Extensions without content_scripts have no injection point for the open_side_panel message workaround. chrome.sidePanel.open() requires a user gesture enforced at the browser process level — there is no CDP command to bypass this. When the content script context times out, open the sidepanel URL as a chrome-extension:// tab instead (same pattern used by popup/options), emit a warning explaining the context difference, and include context:"tab" in the JSON output so callers can detect it. Also document the behaviour and the underlying Chrome API constraint in references/troubleshooting.md. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

The previous instruction used two wrong patterns: - Passed the file path as a positional argument, which playwright-cli interprets as a CSS selector (causing "Unexpected token while parsing css selector" error) - Used /tmp/ as the output path, which is outside playwright-cli's allowed write roots (project root and .playwright-cli/) Fix: use --filename flag with a .playwright-cli/-relative path, add the -s <session> flag, and document both constraints explicitly. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

The check used offsetWidth/offsetHeight to filter trivial elements but didn't verify the element was actually within the viewport. Off-screen fixed elements (e.g. slide-in panels in their closed state) with high z-index were flagged as false positives. Add getBoundingClientRect() bounds guard to exclude elements fully outside the visible viewport. Also wrap result in JSON.stringify() so the output is parseable rather than a raw object repr. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
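The bounds guard described above reduces to a rectangle-overlap test against the viewport. The sketch below isolates that test as a pure function so it can be reasoned about outside a browser; the function name and the commented in-page usage are hypothetical and do not reproduce the skill's actual detection code.

```javascript
// Pure geometry check: true if the rect overlaps the viewport at all.
// `rect` has the shape returned by element.getBoundingClientRect().
function intersectsViewport(rect, viewportWidth, viewportHeight) {
  return rect.right > 0 && rect.bottom > 0 &&
         rect.left < viewportWidth && rect.top < viewportHeight;
}

// Hypothetical in-page usage (browser only), returning a JSON string so the
// caller receives parseable output instead of a raw object representation:
// JSON.stringify([...document.querySelectorAll("*")]
//   .filter((el) => getComputedStyle(el).position === "fixed")
//   .filter((el) => el.offsetWidth > 0 && el.offsetHeight > 0)
//   .filter((el) => intersectsViewport(el.getBoundingClientRect(), innerWidth, innerHeight))
//   .map((el) => el.tagName));
```

A closed slide-in panel parked at left = viewport width has a non-zero size but fails this test, which is the false positive the commit removes.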

playwright-cli restricts run-code --filename and screenshot --filename to paths within the project root and .playwright-cli/. Writing config and extraction script to os.tmpdir() causes a file access denied error. Switch tmpPrefix to use the output directory instead, which is always within the project root. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…cal testing Two focused files in plugins/web/docs/ aimed at agents and humans working on the web plugin itself, not at skill consumers: - playwright-cli-constraints.md: documents the /tmp/ path restriction affecting screenshot and run-code, the eval single-expression constraint, initScript path rules, and session cleanup. All discovered during PR testing. - testing-locally.md: documents how to set up project-scope .claude/skills/ copies for local testing, the global-vs-project precedence limitation for existing skills, the symlink isMain bug, and how to sync edits back to the plugin source before committing. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Detects all languages used on a webpage by combining declared structural signals (html@lang, hreflang alternates, nested lang= attributes, meta content-language) with content-based detection via Google CLD3 (cld3-asm WASM). Reconciles the two signal sets to surface mismatches such as undeclared languages in the body or stale hreflang declarations. Two-script design following the direct playwright-cli pattern used by page-tree and page-reduce: collect.js is a run-code script that extracts signals from the live page, detect.mjs strips the envelope, runs CLD3, and writes langs.json with detected languages, declared signals, and a reconciliation report. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
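The reconciliation step described above can be illustrated with a small set comparison. This is a sketch under stated assumptions: the function name, the result field names (matched, undeclared, unobserved), and the choice to compare on the primary language subtag are all hypothetical and do not reflect the actual langs.json schema.

```javascript
// Hypothetical reconciliation: compare declared language tags with detected
// ones on the primary subtag only ("en-US" and "en" both count as "en").
function reconcileLanguages(declared, detected) {
  const primary = (tag) => tag.toLowerCase().split("-")[0];
  const declaredSet = new Set(declared.map(primary));
  const detectedSet = new Set(detected.map(primary));
  return {
    matched: [...declaredSet].filter((l) => detectedSet.has(l)),
    // Found in the content but never declared in markup.
    undeclared: [...detectedSet].filter((l) => !declaredSet.has(l)),
    // Declared in markup but not found in the content.
    unobserved: [...declaredSet].filter((l) => !detectedSet.has(l)),
  };
}
```

In this sketch, an entry under undeclared corresponds to an undeclared language in the body, and an entry under unobserved corresponds to a possibly stale declaration such as an outdated hreflang.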

Addresses tessl skill review feedback (score 95/100): add a Step 4 validation checkpoint after the collect/detect pipeline with guidance on the three common failure modes (empty detected array, lost session, all-null declared fields). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

catalan-adobe requested review from shsteimer and trieloff as code owners May 30, 2026 10:26

catalan-adobe added the ai-generated Pull request contains code generated or co-authored by AI tools label May 30, 2026

catalan-adobe temporarily deployed to eval May 30, 2026 10:26 — with GitHub Actions Inactive

trieloff approved these changes May 30, 2026


catalan-adobe marked this pull request as draft May 30, 2026 11:55

catalan-adobe temporarily deployed to eval June 2, 2026 10:48 — with GitHub Actions Inactive

catalan-adobe marked this pull request as ready for review June 2, 2026 11:05

catalan-adobe requested a review from trieloff June 2, 2026 11:05

catalan-adobe temporarily deployed to eval June 2, 2026 12:06 — with GitHub Actions Inactive

catalan-adobe temporarily deployed to eval June 2, 2026 12:12 — with GitHub Actions Inactive

catalan-adobe temporarily deployed to eval June 2, 2026 12:22 — with GitHub Actions Inactive

catalan-adobe and others added 11 commits June 2, 2026 14:36

docs(domain-mask): remove internal proxy mechanics section

daec9af

Tessl review flagged the 'What the proxy does' section as implementation details the agent doesn't need to run the skill. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

docs(page-prep): move format schemas to references/formats.md

8a83e96

Tessl conciseness feedback: Detection Report and Recipe Manifest format blocks add bulk that belongs in a reference file. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

docs(page-prep): trim explanatory rationale passages

1721e4b

Tessl conciseness feedback: bundle description, Step 9a rationale, and Step 9b DOM-vs-screenshot rationale explain things Claude can infer. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

catalan-adobe temporarily deployed to eval June 2, 2026 13:20 — with GitHub Actions Inactive

catalan-adobe temporarily deployed to eval June 2, 2026 13:29 — with GitHub Actions Inactive

catalan-adobe temporarily deployed to eval June 4, 2026 09:39 — with GitHub Actions Inactive

catalan-adobe temporarily deployed to eval June 4, 2026 12:21 — with GitHub Actions Inactive

catalan-adobe temporarily deployed to eval June 4, 2026 12:57 — with GitHub Actions Inactive

catalan-adobe temporarily deployed to eval June 4, 2026 12:59 — with GitHub Actions Inactive

catalan-adobe temporarily deployed to eval June 4, 2026 13:16 — with GitHub Actions Inactive

catalan-adobe temporarily deployed to eval June 4, 2026 13:49 — with GitHub Actions Inactive

catalan-adobe temporarily deployed to eval June 8, 2026 10:34 — with GitHub Actions Inactive

catalan-adobe temporarily deployed to eval June 8, 2026 10:41 — with GitHub Actions Inactive

trieloff approved these changes Jun 10, 2026


catalan-adobe wants to merge 25 commits into main from worktree-feat-add-web-plugin-skills
