awin: document advertiser dashboard scraping #349
Open
CitizenZM wants to merge 1 commit into
Cookie banner blocks lazy-loaded KPIs even after domcontentloaded — dismiss before waiting for content. Login success must be detected by visible text, not URL, because id.awin.com keeps "/login" in URL through password handoff. Includes regex extractors for yesterday + 7-day KPIs, top-5 partner bar chart values (zero-width space U+200B separator), and partnership list rows. Also notes the Publisher Performance page is canvas-rendered and not DOM-scrapable — use CSV export or screenshot.
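Two of the points above are easy to get subtly wrong, so here is a minimal Python sketch of both: login detection from visible text, and splitting captured chart text on the U+200B separator. The marker strings are the ones named in the PR description; the function names and the sample input in the usage note are hypothetical, not taken from `scraping.md`.

```python
# Sketch only: helper names are illustrative, not from scraping.md.
ZERO_WIDTH_SPACE = "\u200b"  # U+200B, invisible in most editors

# Visible-text markers named in the PR description.
LOGIN_MARKERS = ("Your Accounts", "Manage Accounts")


def is_logged_in(body_text: str) -> bool:
    """Decide login success from page text, ignoring the URL entirely."""
    return any(marker in body_text for marker in LOGIN_MARKERS)


def split_zero_width(raw: str) -> list[str]:
    """Split captured text on U+200B and drop empty fragments."""
    parts = (part.strip() for part in raw.split(ZERO_WIDTH_SPACE))
    return [part for part in parts if part]
```

For example, `split_zero_width("1,234\u200b987")` yields the two values as separate strings. Writing the separator as an escape rather than pasting the literal character keeps it visible in review.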
Contributor
2 issues found across 1 file
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="agent-workspace/domain-skills/awin/scraping.md">
<violation number="1" location="agent-workspace/domain-skills/awin/scraping.md:37">
P2: Login recipe documents waiting ~3s for password page transition but implements no wait, creating a race condition where password entry may be silently skipped.</violation>
<violation number="2" location="agent-workspace/domain-skills/awin/scraping.md:162">
P2: Partnership row regex is inconsistent with documented status/date variants, causing it to miss non-Partners rows (Pending, Left your program) and rows using Left on instead of Partners since.</violation>
</file>
```python
re.compile(
    r"([A-Za-z][\w\s\.,&\-\(\)']{1,60})\n(\d{4,7})\nStatus\nPartners\n"
```
Contributor
P2: Partnership row regex is inconsistent with documented status/date variants, causing it to miss non-Partners rows (Pending, Left your program) and rows using Left on instead of Partners since.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At agent-workspace/domain-skills/awin/scraping.md, line 162:
<comment>Partnership row regex is inconsistent with documented status/date variants, causing it to miss non-Partners rows (Pending, Left your program) and rows using Left on instead of Partners since.</comment>
<file context>
@@ -0,0 +1,202 @@
+
+```python
+re.compile(
+ r"([A-Za-z][\w\s\.,&\-\(\)']{1,60})\n(\d{4,7})\nStatus\nPartners\n"
+ r"Website\n([^\n]+)\n"
+ r"Primary promotional type\n([^\n]*)\n"
</file context>
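One possible way to address this, sketched under assumptions: replace the hard-coded `Partners` with an alternation over the statuses named in the comment, and capture an optional date label. The status and label lists come from the review comment only. Whether they are complete, and whether the date label directly follows the promotional type line, are assumptions that need checking against a real `document.body.innerText` capture. `ROW_RE` and `parse_partnership_rows` are hypothetical names.

```python
import re

# Values named in the review comment; treating them as the full set is an assumption.
STATUSES = ("Partners", "Pending", "Left your program")
DATE_LABELS = ("Partners since", "Left on")

ROW_RE = re.compile(
    r"([A-Za-z][\w\s\.,&\-\(\)']{1,60})\n(\d{4,7})\nStatus\n"
    r"(" + "|".join(re.escape(s) for s in STATUSES) + r")\n"
    r"Website\n([^\n]+)\n"
    r"Primary promotional type\n([^\n]*)\n"
    # Assumption: when present, the date label follows the promo type line.
    r"(?:(" + "|".join(re.escape(s) for s in DATE_LABELS) + r")\n([^\n]+))?"
)


def parse_partnership_rows(text: str) -> list[dict]:
    """Return one dict per partnership row found in the captured page text."""
    rows = []
    for m in ROW_RE.finditer(text):
        name, pid, status, website, promo, label, date = m.groups()
        rows.append({
            "name": name.strip(),
            "id": pid,
            "status": status,
            "website": website,
            "promo_type": promo,   # may be empty for some publishers
            "date_label": label,   # None when no date line was captured
            "date": date,
        })
    return rows
```

Keeping the status as a capture group rather than a literal also lets callers filter rows by status afterwards instead of needing one pattern per variant.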
const cont = [...document.querySelectorAll('button')].find(b => /continue/i.test(b.textContent));
if (cont) cont.click();
// wait ~3s for password page transition
const pw = document.querySelector('input[type="password"]');
Contributor
P2: Login recipe documents waiting ~3s for password page transition but implements no wait, creating a race condition where password entry may be silently skipped.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At agent-workspace/domain-skills/awin/scraping.md, line 37:
<comment>Login recipe documents waiting ~3s for password page transition but implements no wait, creating a race condition where password entry may be silently skipped.</comment>
<file context>
@@ -0,0 +1,202 @@
+ const cont = [...document.querySelectorAll('button')].find(b => /continue/i.test(b.textContent));
+ if (cont) cont.click();
+ // wait ~3s for password page transition
+ const pw = document.querySelector('input[type="password"]');
+ if (pw) { pw.focus(); pw.value = PASSWORD;
+ pw.dispatchEvent(new Event('input', {bubbles:true}));
</file context>
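The race described here is usually closed by polling for the password field rather than sleeping a fixed time. Below is a minimal, driver-agnostic Python sketch of that idea. `wait_for` and its `probe` callable are hypothetical names, and how the probe actually queries the page (for example, evaluating the `input[type="password"]` selector through whatever browser driver is in use) is left to the caller.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(
    probe: Callable[[], Optional[T]],
    timeout_s: float = 5.0,
    interval_s: float = 0.25,
) -> Optional[T]:
    """Call `probe` until it returns something truthy or the timeout elapses.

    Returns the probe's result, or None on timeout so the caller can fail
    loudly instead of silently skipping the password step.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)
```

A caller would treat a `None` return as a login failure and stop, which turns the silent skip flagged in the comment into an explicit error.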
Summary
Adds `agent-workspace/domain-skills/awin/scraping.md`, documenting Awin's advertiser dashboard (app.awin.com).
Discovered while building a scraper against the dashboard. The non-obvious traps took a few iterations to figure out, so this captures them so the next agent doesn't re-pay the same tax.
Key gotchas documented
- The URL keeps `/login` after success. `id.awin.com/u/login/password?...` is part of the redirect chain. Detect login by visible text (`Your Accounts`, `Manage Accounts`), not by URL.
- The cookie banner blocks lazy-loaded content. Even after `domcontentloaded` + a fixed sleep, the dashboard renders only skeleton placeholders until `Accept all` is clicked. Symptom: the screenshot shows three loading dots and a sidebar full of gray rectangles. Dismiss on every page visit, not just at login.
- Poll the `[class*=skeleton]` count + `document.body.innerText` length, up to 45s.
- The `Primary promotional type` line is empty for some publishers: capture with `[^\n]*`, not `[^\n]+`.
Also includes
- Isolated profile setup (`user_data_dir`)
- API endpoints worth exploring (`/api/advertiser/{mid}/dashboard/kpi`) for the next agent
Test plan
- `document.body.innerText` captures
Summary by cubic
Add `agent-workspace/domain-skills/awin/scraping.md` documenting how to scrape Awin’s advertiser dashboard (app.awin.com). Covers login detection by text (not URL), dismissing the cookie banner to unblock lazy-loaded KPIs, a skeleton-load polling pattern, working regex extractors for KPIs/Top-5/partnerships, URL patterns, the canvas-based Publisher Performance limitation (use CSV or screenshot), isolated-profile setup, and API endpoints worth exploring.
Written for commit 4bfe82d. Summary will update on new commits.
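The skeleton-load polling pattern mentioned above can be sketched roughly as follows. This is an illustration under assumptions, not the code from `scraping.md`: `skeleton_count` and `text_length` are hypothetical callables the caller wires to the browser (the `[class*=skeleton]` element count and `document.body.innerText.length`), and the `min_chars` threshold is an invented default. Only the 45s ceiling comes from the PR description.

```python
import time
from typing import Callable


def wait_until_rendered(
    skeleton_count: Callable[[], int],
    text_length: Callable[[], int],
    timeout_s: float = 45.0,   # ceiling from the PR description
    interval_s: float = 1.0,
    min_chars: int = 500,      # assumed threshold, tune per page
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until no skeleton placeholders remain and the text length is stable.

    "Stable" means two consecutive polls saw the same text length.
    Returns False if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout_s
    last_len = -1
    while time.monotonic() < deadline:
        remaining = skeleton_count()
        length = text_length()
        if remaining == 0 and length >= min_chars and length == last_len:
            return True
        last_len = length
        sleep(interval_s)
    return False
```

Requiring a stable text length in addition to a zero skeleton count guards against the moment where the placeholders are gone but the KPI values are still streaming in.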