Add per-PR visual diff view (Playwright + Chromatic)#785
Add per-PR visual diff view (Playwright + Chromatic)#785RisingOrange wants to merge 17 commits intoPauseAI:mainfrom
Conversation
👷 Deploy request for pauseai pending review.Visit the deploys page to approve it
|
|
I like the idea but setting up a project requires running the Chromatic CLI and that doesn't work on my machine because of a space in the directory path. @anthonybailey Could you set this up maybe? |
|
Probably. I'll try. Nag me. My understanding is that pnpm build actions are executed in a new place (a machine owned by GitHub) without any secrets in environment settings. So LLM access to localize new source content will not occur (a build without keys is en-only.) This is fine and sensible as a default. We will continue to move as much static content generation from remote sources as possible into build. But again, it should function sensibly without keys. The basic model here seems OK. Any "but in production we do this instead" qualms are mostly answered by "at the moment we have zero diff support so this is simply new information". We do need reviewers to not be misled re what has been exercised though. Any thoughts there @RisingOrange ? What prevents misunderstanding this? Day to day CR needs some clues future users see so they know to read some subset of this PR. The only other thing I dislike is specifying routes separately in the test. We will rename things, accidentally changing test coverage. Any way to have the non-default sample/excluded coverage choices specified in the source code for the individual cases? |
Captures HTML+asset archives of 15 key routes × 2 viewports on each PR and uploads to Chromatic for server-side rendering and diffing against baseline. Posts a non-blocking "UI Review" check on the PR. - Routes auto-discovered from src/routes/, plus 5 markdown-post representatives; form/admin/content-churn routes explicitly excluded with WHY comments - Deterministic snapshots: /api/news mocked with a fixture, Tally iframes on /statement and /join aborted to avoid async-height churn - Chromium-only, non-blocking by design (exitZeroOnChanges) - Fork PRs supported via plaintext project token in the workflow (Chromatic's recommended pattern) - Playwright report (per-route screenshots) uploaded as GHA artifact
Without production secrets, Notion / Airtable / Substack calls during `pnpm build` (prerender, remote prerender fns) and `pnpm preview` (on-demand SSR, client-side fetches to /api/*) either fail or fall back to a couple of placeholder records — leaving snapshots too sparse to catch layout regressions in populated list/card components. Boot MSW in the Node process via `NPM_CONFIG_NODE_OPTIONS=--import` (the only way to propagate `--import` through pnpm, which otherwise overwrites NODE_OPTIONS with `--experimental-global-webcrypto`; see pnpm/pnpm#6210). The setup file is a no-op unless `VISUAL_TEST=1`. Handlers cover the two Notion databases (press, funding), the three Airtable tables (people, signatories, national-groups), the Substack RSS feed hit by /api/news, plus catch-all empty-result responses for any un-fixtured endpoint. Fake API keys are set in the workflow so the SDKs actually make HTTP requests (they short-circuit to a "no key" error otherwise, bypassing MSW entirely). All fixture data lives under a single `tests/visual/fixtures/` directory — the earlier `tests/visual/fixtures/news.json`, which was intercepted at the browser boundary via `page.route().fulfill()`, is replaced by `substack-feed.xml` served by MSW upstream of the /api/news server code. `page.route()` in smoke.spec.ts is now reserved for aborting third-party widgets (Tally, Mapbox, Luma) whose requests never reach our Node process. Clean split: MSW for fixture data, page.route for cross-origin aborts. Fixtures use realistic-shaped but unambiguously-fake data: first names paired with Example / Sample / Fixture surnames, Lorem-ipsum bios, example.org URLs, Example Foundation / Example News. Chromatic snapshots persist in its UI, so the content leaves no room to mistake a fixture attribution for a real endorsement if an image escapes context. Adds `msw` as a dev dependency and entry-points tests/visual/msw- setup.mjs in knip.config.ts so the indirect import is recognized.
Previously excluded because its chronological listing churns on any new post, same mechanism as the home page's LatestNews section — but the home page is included (its news churn is accepted as one re-approval per news-flagged post). Include /posts on the same trade-off: the cost is a re-approval in Chromatic per post add, which is trivial. Removes an inconsistency in which pages opt out vs which accept content-driven churn.
Four mobile snapshots blew past Chromatic's 25M-pixel snapshot cap with the default Pixel 7 DPR of 2.625: /dear-sir-demis-2025 (52M, 2.1× over), /posts (47M, 1.9× over), /learn (25.8M), /write (25.2M). Override deviceScaleFactor to 1 on the mobile project. The viewport is still 412 CSS px (the layout-relevant dimension); we just capture at 1× instead of 2.6× physical pixels. Mobile layout diffs don't need the hi-DPI oversampling — a layout regression is visible at either DPR. After the change, the largest mobile snapshot is 412 × 18468 ≈ 7.6M, well under the cap.
Addresses the reviewer concern that a green Chromatic check could silently imply coverage we don't actually have. On every PR, post a sticky comment summarizing what the run snapshotted: covered/excluded ratios for pages and posts, what "covered" doesn't mean (fixtures, widget aborts, en-only), and a conditional warning when un-fixtured requests fall through to MSW catch-alls / bypass. Plumbing: - tests/visual/scope-comment.ts renders the comment body from repo state (routes, posts, annotations) + the MSW warning log. - msw-setup.mjs listens for request:unhandled and msw-handlers.ts's catch-alls log to MSW_WARN_LOG when set, so the warning section has data to render. - visual-diff.yml generates the artifact (body.md + pr-number.txt) on every PR run. - visual-diff-comment.yml (new) triggers on workflow_run, runs in the base-repo context with pull-requests: write, downloads the artifact, posts or updates a sticky comment via marocchino/sticky-pull-request- comment. Works for fork PRs without granting forks write permissions. - README updated with a disclaimer about browser-side third-party embeds being out of scope for the MSW-based warning.
The previous page.route setup aborted three specific hosts (tally.so, lu.ma, api.mapbox.com). A new third-party widget added in a future PR would have loaded normally in CI, potentially leaking live state into snapshots with no signal. Replace with a generic rule: abort any cross-origin request whose resourceType is `document` (iframe), `xhr`, or `fetch`. Pass through cross-origin resources — images (Cloudinary), fonts, scripts (inlang CDN), stylesheets — since those are embedded assets whose content is part of the rendered design we want to capture. Server-side external calls remain handled by MSW-node. All three previous specific aborts fall under the generic rule: - tally.so iframes → `document` → aborted - lu.ma iframes → `document` → aborted - api.mapbox.com tiles (via mapbox-gl.js) → `xhr` → aborted Also: - Derive the "same origin" from Playwright's `baseURL` fixture rather than hardcoding `http://localhost:4173`, so a port change in the config doesn't silently invert the rule. - Update the "Not exercised" caveat to reflect the new behavior: containers render empty and reviewers should check the Netlify preview deploy for real rendering.
Codex review (gpt-5.5 high) flagged that the PR-number resolver filters on head_branch and head_repository.id but not base_ref, so a fork branch opened as two PRs against different base branches simultaneously could route the comment to the wrong PR. Fix options were all ugly (iterate + post to each, fail on >1 match); the case is rare enough that documenting the limit is the pragmatic call.
3b24d8c to
bd3b964
Compare
d8c68cb to
3218312
Compare
3218312 to
e3e04e7
Compare
|
Good points! 1. "Reviewers should not be misled about what has been exercised" Addressed by three things working together:
Residual caveats are honest: CI is en-only, fixture data ≠ live data, browser-side third-party widgets are not covered. These are called out in the comment's "Not exercised" section and documented in 2. "Specify routes in the source instead of separately in the test" Done via a colocated What do you think? |
`dotenv.config({ override: true })` was clobbering the fake
AIRTABLE_API_KEY / NOTION_API_KEY that the visual-diff workflow sets,
with the empty values from template.env. The SDKs then short-circuited
on empty key before ever making a request, so MSW had nothing to
intercept and /about, /statement, /communities snapshots rendered their
hardcoded "[FALLBACK DATA]" fallback paths instead of fixture data.
Previously the three `PauseAI Example North/South/East` records didn't match any country name in the static `communities` list, so `nationalGroups.find()` returned undefined for every card and the fixture had no visible effect — snapshots were identical whether the fixture returned records or an empty list. Now one record matches 'France' and carries an inline SVG tricolor as its image, so France's card shows a fixture-provided flag while the rest keep the default icon. The populated-card render path is now covered.
Adds @visualDiffEnabled: true to three posts whose embedded components or layout aren't represented by the existing covered set: Donate (donate.md), Doomers (pdoom.md), and the auto-generated SimpleToc (faq.md).
# Conflicts: # knip.config.ts # pnpm-lock.yaml
When a commit lives on the repo's default branch, GitHub's
listPullRequestsAssociatedWithCommit endpoint only returns merged PRs;
open PRs with that commit at their head are filtered out. This is
unlikely in production (the feature branch is never main), but it
silently breaks the scope comment on any fork where the feature branch
happens to be default.
Switch to pulls?state=open&head={owner}:{branch}, which has no such
quirk. Trusted-inputs safety property is unchanged: head owner/branch
come from workflow_run, and we still verify head.repo.id and head.sha
against the trusted payload before posting.
Adds a per-PR visual diff view: every PR gets side-by-side before/after renders of the site's pages, so reviewers can see the visual impact of a change without clicking through Netlify previews page-by-page. Under the hood this is visual regression testing via Playwright + Chromatic.
Why
/outcomeslayout bug shipped unnoticed —/outcomesis in the covered set, so this setup would have surfaced it)./if-anyone-builds-it-campaignat a glance, rather than spot-checking via Netlify.How it works
pnpm build && pnpm previewand navigates Playwright through the covered routes.@chromatic-com/playwrightcaptures HTML + asset archives of each page.chromaui/actionuploads archives; Chromatic re-renders them server-side in a controlled browser and diffs against the baseline.⚠️warning section when any un-fixtured server-side request falls through.mainmoves.No baselines in git — Chromatic stores them per-branch and promotes on merge. No regeneration workflow, no Linux-vs-macOS font-parity problems, no PR churn from binary PNGs.
Coverage
22 routes × 2 viewports = 44 snapshots per run. The covered set is discovered by walking
src/routes/; each page controls its own inclusion via a@visualDiffEnabledannotation right in its source file.Routes and posts opt in or out via a
@visualDiffEnabledannotation in the source file itself:<!-- @visualDiffEnabled: false — admin-only tool -->src/routes/default to included. Opt out with@visualDiffEnabled: falsein+page.svelteor+page.ts.src/posts/(rendered via[slug]) default to excluded, since they share a layout; only representatives of distinct variants (banner image, link-heavy, long-form, embedded Svelte components) opt in.Renames carry the annotation with the file, so the test set can't silently drift from reality.
tests/visual/routes.tsandtests/visual/scope-comment.tsshare one scanner (tests/visual/annotations.ts) — what the test actually runs and what the PR comment reports can't diverge.Current opt-outs (grep
@visualDiffEnabled: false):/submitted— redirects on mount/verify— token-dependent/quotes— ~28M-pixel desktop snapshot, over Chromatic's 25M capDeterministic rendering — how snapshots are kept stable
A visual-regression check is only useful if the same code produces the same screenshot every run. Several measures together keep snapshots stable:
External data (the biggest source of noise) is intercepted at two boundaries:
tests/visual/msw-setup.ts) intercepts outbound HTTP from the Node process duringpnpm build(prerender, remote prerender functions) andpnpm preview(on-demand SSR). Notion, Airtable, and Substack RSS are served from pinned fixtures intests/visual/fixtures/. Catch-all handlers return empty results for un-fixtured Airtable/Notion endpoints; when one is hit (or a wholly uncovered host is bypassed), the scope comment's⚠️section surfaces it.page.route()default-denies any cross-origin iframe, XHR, or fetch request from the browser. Tally, Luma, Mapbox, segment.io, and any new third-party widget someone adds in a future PR render as an empty container — no live third-party state leaks in. Cross-origin resources (Cloudinary images, fonts, inlang CDN scripts) pass through unchanged.Non-HTTP sources of drift are neutralized in
playwright.config.tsand the test harness:reducedMotion: 'reduce'— animations and CSS transitions complete instantly (the site already honorsprefers-reduced-motioninsrc/styles/reset.css).locale: 'en-US'andtimezoneId: 'UTC'pinned at the browser context level, so anyIntl/ date formatting is deterministic.img[loading="lazy"]is flipped toeagerbefore each screenshot, and the run waits fordocument.fonts.readyplusnetworkidle, so fonts and below-the-fold images are fully resolved at capture time.Sizing — the mobile project uses
deviceScaleFactor: 1(overriding Pixel 7's default DPR of 2.625) so long-page snapshots stay under Chromatic's 25M-pixel limit without losing layout fidelity.A green diff on
/aboutmeans "no layout regression given the fixture people list," not "the Airtable integration works." The PR comment spells out this distinction.Scope comment
Posted on each PR by a companion workflow (
.github/workflows/visual-diff-comment.yml), updated in-place on re-runs. Baseline state and warning state on fork PRs:⚠️section surfaces it.Sample comment body (baseline)
Sample warning section (when an un-fixtured request is hit)
Safety of the comment pipeline
The posting workflow (
visual-diff-comment.yml) triggers onworkflow_runand runs in the base-repo context withpull-requests: write. Fork code never executes in the trusted context — the trusted workflow only consumes the artifact produced by the fork's run and posts its Markdown as a sticky comment. The PR number is derived fromworkflow_run.head_sha+head_branch+head_repository.idvia the trusted GitHub API, not from anything the fork artifact could tamper with. Annotation-reason text is escaped before rendering so a malicious reason can't close the wrapping<details>, ping users, or inject links. The residual surface — fork-authored Markdown posted verbatim — is explicitly an accepted tradeoff given the contributor model (known volunteers, worst case = visible-and-recoverable comment spam).CI behavior + fork PRs
main.exitZeroOnChanges: true— workflow exits 0 even when Chromatic detects diffs. Chromatic's check is informational; branch protection is not configured to require it.exitOnceUploaded: true— CI exits after archive upload. Warm-cache runs: ~2:45–3:00 (scope comment + MSW overhead add ~30s to the original ~2:15 baseline). Cold-cache first run: ~5:30.concurrencygroup cancels stale runs on new pushes.playwright-report/uploaded as a GHA artifact on every run (14d retention) for inspection without needing the Chromatic UI.Fork PRs work end-to-end, including the scope comment. GitHub Actions can't pass repo secrets to fork-PR runs, so the Chromatic project token is inlined as plaintext in the workflow — this is Chromatic's own recommended pattern for fork-PR support. The scope comment posts via the
workflow_run-based companion workflow described above.Setup required for upstream adoption
.github/workflows/visual-diff.ymlwith the new project's token.mainuploads initial baselines; accept them once in Chromatic and the pipeline is live.Any repo collaborator can review and accept baselines via GitHub OAuth login — no separate Chromatic invites needed.
Local screenshot workflow — using the Playwright suite for manual before/after
The Playwright suite doubles as a local before/after screenshot tool:
Runs the suite against a fresh preview build and opens an HTML report with per-route screenshots. For before/after: run on the base branch, save
playwright-report/somewhere, switch branches, re-run, eyeball both reports. No Chromatic quota, no PR check side effects.Known limitations
enonly — locale-dependent layout regressions are invisible.Possible follow-ups — out of scope for this PR
Natural extensions once the base setup is adopted: