Skip to content

Add per-PR visual diff view (Playwright + Chromatic)#785

Open
RisingOrange wants to merge 17 commits intoPauseAI:mainfrom
RisingOrange:visual-diff
Open

Add per-PR visual diff view (Playwright + Chromatic)#785
RisingOrange wants to merge 17 commits intoPauseAI:mainfrom
RisingOrange:visual-diff

Conversation

@RisingOrange
Copy link
Copy Markdown
Contributor

@RisingOrange RisingOrange commented Apr 20, 2026

Adds a per-PR visual diff view: every PR gets side-by-side before/after renders of the site's pages, so reviewers can see the visual impact of a change without clicking through Netlify previews page-by-page. Under the hood this is visual regression testing via Playwright + Chromatic.

Non-blocking / informational. The diff view is a signal reviewers can glance at; it never blocks a merge. The workflow exits 0 regardless of diffs, and nothing in branch protection gates on it. Adopting this shouldn't change the merge flow, just surface information.

Why

  • Visual regressions slip past manual QA (e.g. a past /outcomes layout bug shipped unnoticed — /outcomes is in the covered set, so this setup would have surfaced it).
  • Netlify previews only enable manual inspection; the reviewer has to open each affected page themselves and remember what it looked like before. A diff view flips that: reviewers see exactly what changed, automatically.
  • Works for catching regressions and reviewing intentional changes. Example: #702 changed the post banner aspect ratio across all posts — with this setup, a reviewer would have seen the change applied to /if-anyone-builds-it-campaign at a glance, rather than spot-checking via Netlify.

How it works

  1. On each PR, GitHub Actions runs pnpm build && pnpm preview and navigates Playwright through the covered routes.
  2. @chromatic-com/playwright captures HTML + asset archives of each page.
  3. chromaui/action uploads archives; Chromatic re-renders them server-side in a controlled browser and diffs against the baseline.
  4. A sticky PR comment (live example) summarizes what the run covered — counts, per-category ratios, exclusions with reasons, a link to the Chromatic review UI, and a ⚠️ warning section when any un-fixtured server-side request falls through.
  5. Chromatic posts a separate "UI Review" check showing the live diff status (N changes to accept, accepted baselines, etc.).
  6. Reviewers click the Chromatic link to accept/deny per snapshot. Accepted snapshots become the new baseline once main moves.

No baselines in git — Chromatic stores them per-branch and promotes on merge. No regeneration workflow, no Linux-vs-macOS font-parity problems, no PR churn from binary PNGs.

Coverage

22 routes × 2 viewports = 44 snapshots per run. The covered set is discovered by walking src/routes/; each page controls its own inclusion via a @visualDiffEnabled annotation right in its source file.

Routes and posts opt in or out via a @visualDiffEnabled annotation in the source file itself:

<!-- @visualDiffEnabled: false — admin-only tool -->
---
title: My Post
---

<!-- @visualDiffEnabled: true — post with a banner image -->
  • Pages under src/routes/ default to included. Opt out with @visualDiffEnabled: false in +page.svelte or +page.ts.
  • Posts under src/posts/ (rendered via [slug]) default to excluded, since they share a layout; only representatives of distinct variants (banner image, link-heavy, long-form, embedded Svelte components) opt in.

Renames carry the annotation with the file, so the test set can't silently drift from reality. tests/visual/routes.ts and tests/visual/scope-comment.ts share one scanner (tests/visual/annotations.ts) — what the test actually runs and what the PR comment reports can't diverge.

Current opt-outs (grep @visualDiffEnabled: false):

  • /submitted — redirects on mount
  • /verify — token-dependent
  • /quotes — ~28M-pixel desktop snapshot, over Chromatic's 25M cap
Deterministic rendering — how snapshots are kept stable

A visual-regression check is only useful if the same code produces the same screenshot every run. Several measures together keep snapshots stable:

External data (the biggest source of noise) is intercepted at two boundaries:

  • MSW-node (tests/visual/msw-setup.ts) intercepts outbound HTTP from the Node process during pnpm build (prerender, remote prerender functions) and pnpm preview (on-demand SSR). Notion, Airtable, and Substack RSS are served from pinned fixtures in tests/visual/fixtures/. Catch-all handlers return empty results for un-fixtured Airtable/Notion endpoints; when one is hit (or a wholly uncovered host is bypassed), the scope comment's ⚠️ section surfaces it.
  • Playwright page.route() default-denies any cross-origin iframe, XHR, or fetch request from the browser. Tally, Luma, Mapbox, segment.io, and any new third-party widget someone adds in a future PR render as an empty container — no live third-party state leaks in. Cross-origin resources (Cloudinary images, fonts, inlang CDN scripts) pass through unchanged.

Non-HTTP sources of drift are neutralized in playwright.config.ts and the test harness:

  • reducedMotion: 'reduce' — animations and CSS transitions complete instantly (the site already honors prefers-reduced-motion in src/styles/reset.css).
  • locale: 'en-US' and timezoneId: 'UTC' pinned at the browser context level, so any Intl / date formatting is deterministic.
  • img[loading="lazy"] is flipped to eager before each screenshot, and the run waits for document.fonts.ready plus networkidle, so fonts and below-the-fold images are fully resolved at capture time.

Sizing — the mobile project uses deviceScaleFactor: 1 (overriding Pixel 7's default DPR of 2.625) so long-page snapshots stay under Chromatic's 25M-pixel limit without losing layout fidelity.

A green diff on /about means "no layout regression given the fixture people list," not "the Airtable integration works." The PR comment spells out this distinction.

Scope comment

Posted on each PR by a companion workflow (.github/workflows/visual-diff-comment.yml), updated in-place on re-runs. Baseline state and warning state on fork PRs:

Sample comment body (baseline)
**Visual diff coverage**
_commit [`abc1234`](https://…/commit/abc1234def) · [run #12345](https://…/actions/runs/12345)_

[Review snapshots in Chromatic](https://www.chromatic.com/build?appId=…&number=66)

- **Pages:** 16/19 covered. _Included_ by default.
- **Posts:** 6/101 covered. _Excluded_ by default (posts share a layout).

<details><summary>Route breakdown</summary>

- **Pages excluded:**
  - `/quotes`: ~28M-pixel desktop snapshot is over Chromatic's 25M cap
  - `/submitted`: post-form state
  - `/verify`: token-dependent
- **Posts covered:**
  - `/action`
  - `/funding`
  - `/if-anyone-builds-it-campaign`
  - `/join`
  - `/learn`
  - `/values`

</details>

<details><summary>Not exercised</summary>

- Pages that fetch from external APIs render against pinned fixtures, not live data.
- Cross-origin iframes and API calls (Tally, Luma, Mapbox, analytics, newly added widgets) are aborted at the browser boundary. Containers render as empty. Check the Netlify preview deploy to verify their real rendering.
- Only `en` locale is built.

</details>

[`tests/visual/README.md`]() explains the annotation syntax and the coverage model.
Sample warning section (when an un-fixtured request is hit)
<details open><summary>⚠️ 1 external request not covered by fixtures</summary>

- **Endpoints:**
  -`GET https://api.example.com/v1/data` (no handler; allowed through)
- **Fix:** add an explicit handler + fixture in `tests/visual/msw-setup.ts` (so the page renders its populated layout against stable data), or exclude the page with `@visualDiffEnabled: false`.
- **Why it matters:** without a fixture, CI gets either a 401 (endpoint needs auth) or drifting live data, so the snapshot won't match production.

</details>

Safety of the comment pipeline

The posting workflow (visual-diff-comment.yml) triggers on workflow_run and runs in the base-repo context with pull-requests: write. Fork code never executes in the trusted context — the trusted workflow only consumes the artifact produced by the fork's run and posts its Markdown as a sticky comment. The PR number is derived from workflow_run.head_sha + head_branch + head_repository.id via the trusted GitHub API, not from anything the fork artifact could tamper with. Annotation-reason text is escaped before rendering so a malicious reason can't close the wrapping <details>, ping users, or inject links. The residual surface — fork-authored Markdown posted verbatim — is explicitly an accepted tradeoff given the contributor model (known volunteers, worst case = visible-and-recoverable comment spam).

CI behavior + fork PRs
  • Runs on PRs and pushes to main.
  • Non-blocking. exitZeroOnChanges: true — workflow exits 0 even when Chromatic detects diffs. Chromatic's check is informational; branch protection is not configured to require it.
  • exitOnceUploaded: true — CI exits after archive upload. Warm-cache runs: ~2:45–3:00 (scope comment + MSW overhead add ~30s to the original ~2:15 baseline). Cold-cache first run: ~5:30.
  • concurrency group cancels stale runs on new pushes.
  • Playwright browsers and imagetools transforms are cached.
  • playwright-report/ uploaded as a GHA artifact on every run (14d retention) for inspection without needing the Chromatic UI.

Fork PRs work end-to-end, including the scope comment. GitHub Actions can't pass repo secrets to fork-PR runs, so the Chromatic project token is inlined as plaintext in the workflow — this is Chromatic's own recommended pattern for fork-PR support. The scope comment posts via the workflow_run-based companion workflow described above.

Setup required for upstream adoption

  1. Create a Chromatic project linked to this repo (free tier: 5k snapshots/month — expected usage ~1–2k/mo based on current PR volume).
  2. Replace the inlined project token in .github/workflows/visual-diff.yml with the new project's token.
  3. Merge this PR — the first push to main uploads initial baselines; accept them once in Chromatic and the pipeline is live.

Any repo collaborator can review and accept baselines via GitHub OAuth login — no separate Chromatic invites needed.

Local screenshot workflow — using the Playwright suite for manual before/after

The Playwright suite doubles as a local before/after screenshot tool:

pnpm exec playwright test
pnpm exec playwright show-report

Runs the suite against a fresh preview build and opens an HTML report with per-route screenshots. For before/after: run on the base branch, save playwright-report/ somewhere, switch branches, re-run, eyeball both reports. No Chromatic quota, no PR check side effects.

Known limitations

  • Transient mid-hydration flashes aren't caught. Static snapshots capture the settled state.
  • Chromium-only. Firefox- or Safari-specific rendering bugs won't surface. Adding more browsers is possible but multiplies the snapshot count.
  • en only — locale-dependent layout regressions are invisible.
  • Fixture data, not live. The diff exercises the layout path given the fixture shape, not the external integration.
  • Free-tier ceiling is 5,000 snapshots/month; we're well under, but worth watching as the route count grows.
Possible follow-ups — out of scope for this PR

Natural extensions once the base setup is adopted:

  • More browsers. Firefox and/or WebKit to catch vendor-specific rendering bugs (e.g. #664 was a Firefox-only hero color mismatch).
  • More viewports. A portrait desktop viewport would catch bugs like #667 (large portrait desktop incorrectly getting the mobile layout).
  • Banner-visible snapshots. The campaign banner is disabled in layout, so banner-styling regressions (e.g. #677) aren't surfaced. Optionally force a banner on in visual-test mode.

@netlify
Copy link
Copy Markdown

netlify Bot commented Apr 20, 2026

👷 Deploy request for pauseai pending review.

Visit the deploys page to approve it

Name Link
🔨 Latest commit 6133410

@Wituareard
Copy link
Copy Markdown
Collaborator

I like the idea but setting up a project requires running the Chromatic CLI and that doesn't work on my machine because of a space in the directory path. @anthonybailey Could you set this up maybe?

@anthonybailey
Copy link
Copy Markdown
Collaborator

Probably. I'll try. Nag me.

My understanding is that pnpm build actions are executed in a new place (a machine owned by GitHub) without any secrets in environment settings.

So LLM access to localize new source content will not occur (a build without keys is en-only.) This is fine and sensible as a default.

We will continue to move as much static content generation from remote sources as possible into build. But again, it should function sensibly without keys. The basic model here seems OK. Any "but in production we do this instead" qualms are mostly answered by "at the moment we have zero diff support so this is simply new information". We do need reviewers to not be misled re what has been exercised though.

Any thoughts there @RisingOrange ? What prevents misunderstanding this? Day to day CR needs some clues future users see so they know to read some subset of this PR.

The only other thing I dislike is specifying routes separately in the test. We will rename things, accidentally changing test coverage.

Any way to have the non-default sample/excluded coverage choices specified in the source code for the individual cases?

Captures HTML+asset archives of 15 key routes × 2 viewports on each PR
and uploads to Chromatic for server-side rendering and diffing against
baseline. Posts a non-blocking "UI Review" check on the PR.

- Routes auto-discovered from src/routes/, plus 5 markdown-post
  representatives; form/admin/content-churn routes explicitly excluded
  with WHY comments
- Deterministic snapshots: /api/news mocked with a fixture, Tally
  iframes on /statement and /join aborted to avoid async-height churn
- Chromium-only, non-blocking by design (exitZeroOnChanges)
- Fork PRs supported via plaintext project token in the workflow
  (Chromatic's recommended pattern)
- Playwright report (per-route screenshots) uploaded as GHA artifact
Without production secrets, Notion / Airtable / Substack calls during
`pnpm build` (prerender, remote prerender fns) and `pnpm preview`
(on-demand SSR, client-side fetches to /api/*) either fail or fall
back to a couple of placeholder records — leaving snapshots too sparse
to catch layout regressions in populated list/card components.

Boot MSW in the Node process via `NPM_CONFIG_NODE_OPTIONS=--import`
(the only way to propagate `--import` through pnpm, which otherwise
overwrites NODE_OPTIONS with `--experimental-global-webcrypto`; see
pnpm/pnpm#6210). The setup file is a no-op unless `VISUAL_TEST=1`.

Handlers cover the two Notion databases (press, funding), the three
Airtable tables (people, signatories, national-groups), the Substack
RSS feed hit by /api/news, plus catch-all empty-result responses for
any un-fixtured endpoint. Fake API keys are set in the workflow so
the SDKs actually make HTTP requests (they short-circuit to a "no
key" error otherwise, bypassing MSW entirely).

All fixture data lives under a single `tests/visual/fixtures/`
directory — the earlier `tests/visual/fixtures/news.json`, which was
intercepted at the browser boundary via `page.route().fulfill()`, is
replaced by `substack-feed.xml` served by MSW upstream of the
/api/news server code. `page.route()` in smoke.spec.ts is now
reserved for aborting third-party widgets (Tally, Mapbox, Luma)
whose requests never reach our Node process. Clean split: MSW for
fixture data, page.route for cross-origin aborts.

Fixtures use realistic-shaped but unambiguously-fake data: first
names paired with Example / Sample / Fixture surnames, Lorem-ipsum
bios, example.org URLs, Example Foundation / Example News. Chromatic
snapshots persist in its UI, so the content leaves no room to
mistake a fixture attribution for a real endorsement if an image
escapes context.

Adds `msw` as a dev dependency and entry-points tests/visual/msw-
setup.mjs in knip.config.ts so the indirect import is recognized.
Previously excluded because its chronological listing churns on any new
post, same mechanism as the home page's LatestNews section — but the
home page is included (its news churn is accepted as one re-approval
per news-flagged post).

Include /posts on the same trade-off: the cost is a re-approval in
Chromatic per post add, which is trivial. Removes an inconsistency
in which pages opt out vs which accept content-driven churn.
Four mobile snapshots blew past Chromatic's 25M-pixel snapshot cap with
the default Pixel 7 DPR of 2.625: /dear-sir-demis-2025 (52M, 2.1× over),
/posts (47M, 1.9× over), /learn (25.8M), /write (25.2M).

Override deviceScaleFactor to 1 on the mobile project. The viewport is
still 412 CSS px (the layout-relevant dimension); we just capture at
1× instead of 2.6× physical pixels. Mobile layout diffs don't need the
hi-DPI oversampling — a layout regression is visible at either DPR.

After the change, the largest mobile snapshot is 412 × 18468 ≈ 7.6M,
well under the cap.
Addresses the reviewer concern that a green Chromatic check could
silently imply coverage we don't actually have. On every PR, post a
sticky comment summarizing what the run snapshotted: covered/excluded
ratios for pages and posts, what "covered" doesn't mean (fixtures,
widget aborts, en-only), and a conditional warning when un-fixtured
requests fall through to MSW catch-alls / bypass.

Plumbing:
- tests/visual/scope-comment.ts renders the comment body from repo
  state (routes, posts, annotations) + the MSW warning log.
- msw-setup.mjs listens for request:unhandled and msw-handlers.ts's
  catch-alls log to MSW_WARN_LOG when set, so the warning section
  has data to render.
- visual-diff.yml generates the artifact (body.md + pr-number.txt)
  on every PR run.
- visual-diff-comment.yml (new) triggers on workflow_run, runs in the
  base-repo context with pull-requests: write, downloads the artifact,
  posts or updates a sticky comment via marocchino/sticky-pull-request-
  comment. Works for fork PRs without granting forks write permissions.
- README updated with a disclaimer about browser-side third-party
  embeds being out of scope for the MSW-based warning.
The previous page.route setup aborted three specific hosts (tally.so,
lu.ma, api.mapbox.com). A new third-party widget added in a future PR
would have loaded normally in CI, potentially leaking live state into
snapshots with no signal.

Replace with a generic rule: abort any cross-origin request whose
resourceType is `document` (iframe), `xhr`, or `fetch`. Pass through
cross-origin resources — images (Cloudinary), fonts, scripts (inlang
CDN), stylesheets — since those are embedded assets whose content is
part of the rendered design we want to capture. Server-side external
calls remain handled by MSW-node.

All three previous specific aborts fall under the generic rule:
- tally.so iframes → `document` → aborted
- lu.ma iframes → `document` → aborted
- api.mapbox.com tiles (via mapbox-gl.js) → `xhr` → aborted

Also:
- Derive the "same origin" from Playwright's `baseURL` fixture rather
  than hardcoding `http://localhost:4173`, so a port change in the
  config doesn't silently invert the rule.
- Update the "Not exercised" caveat to reflect the new behavior:
  containers render empty and reviewers should check the Netlify
  preview deploy for real rendering.
Codex review (gpt-5.5 high) flagged that the PR-number resolver
filters on head_branch and head_repository.id but not base_ref, so a
fork branch opened as two PRs against different base branches
simultaneously could route the comment to the wrong PR. Fix options
were all ugly (iterate + post to each, fail on >1 match); the case is
rare enough that documenting the limit is the pragmatic call.
@RisingOrange
Copy link
Copy Markdown
Contributor Author

RisingOrange commented Apr 24, 2026

Good points!

1. "Reviewers should not be misled about what has been exercised"

Addressed by three things working together:

  • Intercept external requests and serve fixtures. Prevents live-data drift between runs and lets snapshots actually exercise populated-layout paths. Details: Deterministic rendering.
  • Scope comment on each PR. A sticky comment tells reviewers exactly what was covered — counts, exclusions with reasons, caveats in the "Not exercised" section, a link into Chromatic, and a ⚠️ flag for any un-fixtured request so a new integration can't silently pass. Details: Scope comment. Live samples on fork PRs: #3 (baseline), #4 (post opt-in), #5 (warning).
  • Coverage is broad. 14/17 non-post pages and 9 representative posts; the 3 non-post opt-outs are Chromatic size cap (/quotes), post-form redirect (/submitted), and token-gated (/verify).

Residual caveats are honest: CI is en-only, fixture data ≠ live data, browser-side third-party widgets are not covered. These are called out in the comment's "Not exercised" section and documented in tests/visual/README.md.

2. "Specify routes in the source instead of separately in the test"

Done via a colocated @visualDiffEnabled comment on each page — pages default to included and opt out, posts default to excluded and opt in. Renaming a file carries its opt-in/out with it; there's no central route list to keep in sync. The scope comment uses the same scanner as the test, so what runs and what the comment reports can't drift. Details: Coverage.

What do you think?

`dotenv.config({ override: true })` was clobbering the fake
AIRTABLE_API_KEY / NOTION_API_KEY that the visual-diff workflow sets,
with the empty values from template.env. The SDKs then short-circuited
on empty key before ever making a request, so MSW had nothing to
intercept and /about, /statement, /communities snapshots rendered their
hardcoded "[FALLBACK DATA]" fallback paths instead of fixture data.
Previously the three `PauseAI Example North/South/East` records didn't
match any country name in the static `communities` list, so
`nationalGroups.find()` returned undefined for every card and the
fixture had no visible effect — snapshots were identical whether the
fixture returned records or an empty list. Now one record matches
'France' and carries an inline SVG tricolor as its image, so France's
card shows a fixture-provided flag while the rest keep the default
icon. The populated-card render path is now covered.
Adds @visualDiffEnabled: true to three posts whose embedded components
or layout aren't represented by the existing covered set: Donate
(donate.md), Doomers (pdoom.md), and the auto-generated SimpleToc
(faq.md).
# Conflicts:
#	knip.config.ts
#	pnpm-lock.yaml
When a commit lives on the repo's default branch, GitHub's
listPullRequestsAssociatedWithCommit endpoint only returns merged PRs;
open PRs with that commit at their head are filtered out. This is
unlikely in production (the feature branch is never main), but it
silently breaks the scope comment on any fork where the feature branch
happens to be default.

Switch to pulls?state=open&head={owner}:{branch}, which has no such
quirk. Trusted-inputs safety property is unchanged: head owner/branch
come from workflow_run, and we still verify head.repo.id and head.sha
against the trusted payload before posting.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants