awin: document advertiser dashboard scraping #349

Open
CitizenZM wants to merge 1 commit into browser-use:main from CitizenZM:add-awin-domain-skill

@CitizenZM CitizenZM commented May 13, 2026

Summary

Adds agent-workspace/domain-skills/awin/scraping.md documenting Awin's advertiser dashboard (app.awin.com).

These gotchas were discovered while building a scraper against the dashboard. The non-obvious traps took a few iterations to figure out, so this skill captures them so the next agent doesn't pay the same tax again.

Key gotchas documented

  • Login URL stays /login after success. id.awin.com/u/login/password?... is part of the redirect chain. Detect login by visible text (Your Accounts, Manage Accounts) — not by URL.
  • Cookie banner blocks lazy load. Even after domcontentloaded + a fixed sleep, the dashboard renders only skeleton placeholders until Accept all is clicked. Symptom: screenshot shows three loading dots and a sidebar full of gray rectangles. Dismiss on every page visit, not just login.
  • Skeleton polling pattern — poll [class*=skeleton] count + document.body.innerText length, up to 45s.
  • Top-5 partner chart values appear twice in the text, with a U+200B zero-width space between the two copies. The regex must account for it.
  • Partnership list rows have an empty Primary promotional type line for some publishers — capture with [^\n]* not [^\n]+.
  • Publisher Performance page is canvas-rendered (Looker embed). DOM scraping returns only chrome text. Use CSV export or screenshot.
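
The skeleton polling gotcha above could be sketched roughly as follows. This is a minimal Python sketch, not the code from scraping.md: `wait_for_dashboard`, the `probe` callable, and the `min_text_len` threshold of 500 are hypothetical names and values chosen for illustration. The `probe` is assumed to return the current `[class*=skeleton]` count and the length of `document.body.innerText`, however the caller evaluates them in the page.

```python
import time
from typing import Callable, Tuple


def wait_for_dashboard(
    probe: Callable[[], Tuple[int, int]],
    timeout_s: float = 45.0,      # upper bound mentioned in the PR description
    interval_s: float = 0.5,      # assumption: poll twice per second
    min_text_len: int = 500,      # assumption: "enough text" threshold
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until no skeleton placeholders remain and the page has real text.

    Returns True once the page looks loaded, False if the timeout expires.
    """
    deadline = clock() + timeout_s
    while True:
        skeleton_count, text_len = probe()
        if skeleton_count == 0 and text_len >= min_text_len:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Injecting `clock` and `sleep` keeps the loop testable without a browser. A timeout here most likely means the cookie banner was not dismissed first.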

Also includes

  • URL patterns for home / publisher performance / partnerships / commissions / campaigns
  • Regex extractors for yesterday + 7-day KPIs and partnership row blocks
  • Isolated-profile pattern for running concurrent with the MCP browser (separate user_data_dir)
  • Hints toward unexplored private APIs (/api/advertiser/{mid}/dashboard/kpi) for the next agent
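
For the isolated-profile item, one way to get a separate `user_data_dir` is to create a fresh directory per run. This is only a sketch under that assumption; `make_isolated_profile` and its prefix are hypothetical, and the skill file may set the profile up differently.

```python
import tempfile
from pathlib import Path


def make_isolated_profile(prefix: str = "awin-scraper-") -> Path:
    """Create a fresh, empty directory to pass as the browser's user_data_dir.

    A directory that no other browser instance uses avoids profile-lock
    conflicts when the scraper runs alongside the MCP browser.
    """
    return Path(tempfile.mkdtemp(prefix=prefix))
```

The returned path would then be handed to whichever launcher is in use, for example as the `user_data_dir` argument of Playwright's `launch_persistent_context`.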

Test plan

  • Another agent reads the skill and successfully scrapes an Awin advertiser home without re-discovering the cookie-banner trap
  • Regex extractors work against fresh document.body.innerText captures

Summary by cubic

Add agent-workspace/domain-skills/awin/scraping.md documenting how to scrape Awin’s advertiser dashboard (app.awin.com). Covers login detection by text (not URL), dismissing the cookie banner to unblock lazy-loaded KPIs, a skeleton-load polling pattern, working regex extractors for KPIs/Top-5/partnerships, URL patterns, the canvas-based Publisher Performance limitation (use CSV or screenshot), isolated-profile setup, and API endpoints worth exploring.

Written for commit 4bfe82d. Summary will update on new commits.

Cookie banner blocks lazy-loaded KPIs even after domcontentloaded —
dismiss before waiting for content. Login success must be detected
by visible text, not URL, because id.awin.com keeps "/login" in URL
through password handoff.
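
A text-based login check along these lines could look like the sketch below. The marker strings come from the PR description; the function name `is_logged_in` is hypothetical, and the caller is assumed to pass in `document.body.innerText`.

```python
# Visible strings the PR description says appear only after a successful login.
LOGIN_MARKERS = ("Your Accounts", "Manage Accounts")


def is_logged_in(body_text: str) -> bool:
    """Decide login state from page text; the URL is deliberately ignored."""
    return any(marker in body_text for marker in LOGIN_MARKERS)
```

For example, `is_logged_in("Welcome\nYour Accounts\n...")` is true, while the text of a page that only shows the email and password form is not.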

Includes regex extractors for yesterday + 7-day KPIs, top-5 partner
bar chart values (zero-width space U+200B separator), and partnership
list rows. Also notes the Publisher Performance page is canvas-rendered
and not DOM-scrapable — use CSV export or screenshot.
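
To illustrate the U+200B separator, here is a small sketch that collapses a duplicated value into one. It is not the extractor from scraping.md; the sample text, the currency format, and the name `dedupe_values` are assumptions made for the example.

```python
import re

ZWSP = "\u200b"  # zero-width space that separates the duplicated value

# A run of characters, the zero-width space, then the same run again.
DUPLICATED_VALUE = re.compile(rf"(?P<value>[^{ZWSP}\n]+){ZWSP}(?P=value)")


def dedupe_values(text: str) -> list[str]:
    """Return each duplicated chart value once, in order of appearance."""
    return [m.group("value") for m in DUPLICATED_VALUE.finditer(text)]


# Hypothetical innerText fragment; real captures may be laid out differently.
SAMPLE = "Acme Deals\n£1,200.50\u200b£1,200.50\nBeta Media\n£950.00\u200b£950.00\n"
```

A pattern that omits the separator would either fail to match or capture the value twice, which is the trap the commit message describes.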

@cubic-dev-ai cubic-dev-ai Bot left a comment

2 issues found across 1 file

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="agent-workspace/domain-skills/awin/scraping.md">

<violation number="1" location="agent-workspace/domain-skills/awin/scraping.md:37">
P2: Login recipe documents waiting ~3s for password page transition but implements no wait, creating a race condition where password entry may be silently skipped.</violation>

<violation number="2" location="agent-workspace/domain-skills/awin/scraping.md:162">
P2: Partnership row regex is inconsistent with documented status/date variants, causing it to miss non-Partners rows (Pending, Left your program) and rows using Left on instead of Partners since.</violation>
</file>



```python
re.compile(
  r"([A-Za-z][\w\s\.,&\-\(\)']{1,60})\n(\d{4,7})\nStatus\nPartners\n"
```

@cubic-dev-ai cubic-dev-ai Bot May 13, 2026

P2: Partnership row regex is inconsistent with documented status/date variants, causing it to miss non-Partners rows (Pending, Left your program) and rows using Left on instead of Partners since.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At agent-workspace/domain-skills/awin/scraping.md, line 162:

<comment>Partnership row regex is inconsistent with documented status/date variants, causing it to miss non-Partners rows (Pending, Left your program) and rows using Left on instead of Partners since.</comment>

<file context>
@@ -0,0 +1,202 @@
+
+```python
+re.compile(
+  r"([A-Za-z][\w\s\.,&\-\(\)']{1,60})\n(\d{4,7})\nStatus\nPartners\n"
+  r"Website\n([^\n]+)\n"
+  r"Primary promotional type\n([^\n]*)\n"
</file context>
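
One possible shape for a fix is sketched below: the status line becomes an alternation over the values named in the comment, and the date label accepts both variants. This is an illustration, not the PR's regex. The name pattern is restricted to a single line, the position of the date label after the promotional type is an assumption, and `parse_rows`, the field names, and the sample text are hypothetical.

```python
import re

ROW_RE = re.compile(
    "".join([
        r"^([A-Za-z][\w .,&\-()']{1,60})\n",                # publisher name, one line
        r"(\d{4,7})\n",                                     # publisher id
        r"Status\n(Partners|Pending|Left your program)\n",  # all documented statuses
        r"Website\n([^\n]+)\n",
        r"Primary promotional type\n([^\n]*)\n",            # may be empty
        # Assumption: the date label and its value follow on the next two lines.
        r"(?:(Partners since|Left on)\n([^\n]*)\n?)?",
    ]),
    re.MULTILINE,
)

FIELDS = ("name", "publisher_id", "status", "website", "promo_type",
          "date_label", "date")


def parse_rows(text: str) -> list[dict]:
    """Return one dict per partnership row found in the captured page text."""
    return [dict(zip(FIELDS, m.groups())) for m in ROW_RE.finditer(text)]


# Hypothetical capture covering all three statuses and both date labels.
SAMPLE = (
    "Acme Deals\n123456\nStatus\nPartners\nWebsite\nhttps://acme.example\n"
    "Primary promotional type\nCashback\nPartners since\n01 Jan 2024\n"
    "Beta Media\n7654321\nStatus\nLeft your program\nWebsite\nhttps://beta.example\n"
    "Primary promotional type\n\nLeft on\n02 Feb 2025\n"
    "Gamma Co.\n2222\nStatus\nPending\nWebsite\nhttps://gamma.example\n"
    "Primary promotional type\nContent\n"
)
```

Capturing the status and date label as groups, rather than hard-coding `Partners`, lets the caller filter rows afterwards instead of silently dropping them.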

const cont = [...document.querySelectorAll('button')].find(b => /continue/i.test(b.textContent));
if (cont) cont.click();
// wait ~3s for password page transition
const pw = document.querySelector('input[type="password"]');

@cubic-dev-ai cubic-dev-ai Bot May 13, 2026

P2: Login recipe documents waiting ~3s for password page transition but implements no wait, creating a race condition where password entry may be silently skipped.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At agent-workspace/domain-skills/awin/scraping.md, line 37:

<comment>Login recipe documents waiting ~3s for password page transition but implements no wait, creating a race condition where password entry may be silently skipped.</comment>

<file context>
@@ -0,0 +1,202 @@
+  const cont = [...document.querySelectorAll('button')].find(b => /continue/i.test(b.textContent));
+  if (cont) cont.click();
+  // wait ~3s for password page transition
+  const pw = document.querySelector('input[type="password"]');
+  if (pw) { pw.focus(); pw.value = PASSWORD;
+    pw.dispatchEvent(new Event('input', {bubbles:true}));
</file context>
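
The race could be removed by waiting explicitly for the password field rather than relying on a comment. The sketch below assumes a Playwright-style sync `page` object exposing `click`, `wait_for_selector`, and `fill`; whether the skill drives the browser that way is an assumption, and `submit_password` is a hypothetical helper, not code from the PR.

```python
PASSWORD_SELECTOR = 'input[type="password"]'


def submit_password(page, password: str, timeout_ms: int = 10_000) -> None:
    """Click Continue, wait for the password field to appear, then fill it."""
    page.click("button:has-text('Continue')")
    # Block until the field exists instead of hoping ~3s is enough.
    page.wait_for_selector(PASSWORD_SELECTOR, timeout=timeout_ms)
    # fill() sets the value and fires the input event for the page's framework.
    page.fill(PASSWORD_SELECTOR, password)
```

If the field never appears, `wait_for_selector` raises on timeout, so a skipped password entry becomes a visible failure rather than a silent one.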
