From 4bfe82d56c1961b9e5c5f19c911724ce2dcc74ec Mon Sep 17 00:00:00 2001
From: CitizenZM <59761612+CitizenZM@users.noreply.github.com>
Date: Tue, 12 May 2026 23:16:10 -0700
Subject: [PATCH] awin: document advertiser dashboard scraping
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Cookie banner blocks lazy-loaded KPIs even after domcontentloaded; dismiss
before waiting for content. Login success must be detected by visible text,
not URL, because id.awin.com keeps "/login" in URL through password handoff.
Includes regex extractors for yesterday + 7-day KPIs, top-5 partner bar
chart values (zero-width space U+200B separator), and partnership list rows.
Also notes the Publisher Performance page is canvas-rendered and not
DOM-scrapable; use CSV export or screenshot.
---
 .../domain-skills/awin/scraping.md | 202 ++++++++++++++++++
 1 file changed, 202 insertions(+)
 create mode 100644 agent-workspace/domain-skills/awin/scraping.md

diff --git a/agent-workspace/domain-skills/awin/scraping.md b/agent-workspace/domain-skills/awin/scraping.md
new file mode 100644
index 00000000..4f9b4661
--- /dev/null
+++ b/agent-workspace/domain-skills/awin/scraping.md
@@ -0,0 +1,202 @@
+# Awin (app.awin.com): Advertiser dashboard scraping
+
+Awin's advertiser dashboard is a Vue/React SPA hosted on `app.awin.com`, with auth on `id.awin.com`. KPI tiles, charts, and tables render asynchronously after the SPA boots, behind a cookie banner that blocks lazy load until dismissed.
+
+## URL patterns
+
+| Page | URL |
+|---|---|
+| Login | `https://app.awin.com/login` (redirects to `id.awin.com/u/login/identifier?...`) |
+| User home (account picker) | `https://ui.awin.com/user` |
+| Advertiser home | `https://app.awin.com/en/awin/advertiser/{merchant_id}/home` |
+| Publisher Performance report | `https://app.awin.com/en/awin/advertiser/{merchant_id}/reports/publisher-performance` |
+| All partnerships | `https://app.awin.com/en/awin/advertiser/{merchant_id}/partnerships/all` |
+| Commissions | `https://app.awin.com/en/awin/advertiser/{merchant_id}/commissions` |
+| Campaigns (new UI) | `https://app.awin.com/en/awin/advertiser/{merchant_id}/campaigns` |
+
+Merchant IDs are stable integers (5–7 digits); read them off the URL after picking an account on `ui.awin.com/user`. The same advertiser brand may have separate IDs per region (US / EU / APAC).
+
+## Login flow
+
+Two-step: email → Continue → password → Sign in. After a successful login the URL still contains `/login` for a moment (`id.awin.com/u/login/password?...` → `ui.awin.com/user`), so **detect success by visible text ("Your Accounts", "Manage Accounts", "Advertiser Reports"), not by URL.**
+
+```js
+async () => {
+  // dismiss cookie banner first: it blocks lazy-loaded KPIs
+  const ck = [...document.querySelectorAll('button')].find(b => /accept all/i.test(b.textContent||''));
+  if (ck) ck.click();
+
+  const email = document.querySelector('input[type="email"], input[name="username"]');
+  if (email) { email.focus(); email.value = EMAIL;
+    email.dispatchEvent(new Event('input', {bubbles:true}));
+    email.dispatchEvent(new Event('change', {bubbles:true}));
+  }
+  const cont = [...document.querySelectorAll('button')].find(b => /continue/i.test(b.textContent||''));
+  if (cont) cont.click();
+  await new Promise(r => setTimeout(r, 3000)); // wait ~3s for password page transition
+  const pw = document.querySelector('input[type="password"]');
+  if (pw) { pw.focus(); pw.value = PASSWORD;
+    pw.dispatchEvent(new Event('input', {bubbles:true}));
+    pw.dispatchEvent(new Event('change', {bubbles:true}));
+  }
+  const submit = [...document.querySelectorAll('button')].find(b => /sign in|log in|submit/i.test(b.textContent||''));
+  if (submit) submit.click();
+}
+```
+
+## The cookie-banner trap
+
+If the cookie banner ("Cookies and privacy") is still visible, the advertiser home renders **only skeleton placeholders**: gray bars where KPI cards should be. `wait_for_load()` returns immediately because the SPA is "ready," but the actual data fetches are deferred until the banner is dismissed. Symptom: the screenshot shows three loading dots and a sidebar full of gray rectangles.
+
+**Always dismiss the banner before waiting for content.** Run the dismiss step on every page visit, not just at login; Awin re-shows the banner on some routes.
+
+## Skeleton-load polling pattern
+
+`domcontentloaded` + a fixed `sleep(6)` is not enough. KPI tiles on the home page can take 8–15s to render. Poll for either:
+
+1. Skeleton placeholder count to drop below ~5: `[class*=skeleton],[class*=Skeleton],[class*=placeholder]`
+2. Specific KPI text to appear: `Revenue`, `Transactions`, `Clicks`, `Performance`
+
+```js
+async () => {
+  const skel = document.querySelectorAll('[class*=skeleton],[class*=Skeleton],[class*=placeholder]').length;
+  const txt = document.body.innerText || '';
+  return { skel, ready: skel < 5 && txt.length > 800 };
+}
+```
+
+Poll every 1s, max 45s. Also scroll slowly to the bottom and back to the top; this triggers IntersectionObserver-driven lazy mounts for sections below the fold.
+
+## Where the real data lives
+
+Awin renders KPIs in styled `