Add per-PR visual diff view (Playwright + Chromatic) by RisingOrange · Pull Request #785 · PauseAI/pauseai-website

RisingOrange · 2026-04-20T11:14:47Z

Adds a per-PR visual diff view: every PR gets side-by-side before/after renders of the site's pages, so reviewers can see the visual impact of a change without clicking through Netlify previews page-by-page. Under the hood this is visual regression testing via Playwright + Chromatic.

Non-blocking / informational. The diff view is a signal reviewers can glance at; it never blocks a merge. The workflow exits 0 regardless of diffs, and nothing in branch protection gates on it. Adopting this shouldn't change the merge flow, just surface information.

Why

Visual regressions slip past manual QA (e.g. a past /outcomes layout bug shipped unnoticed — /outcomes is in the covered set, so this setup would have surfaced it).
Netlify previews only enable manual inspection; the reviewer has to open each affected page themselves and remember what it looked like before. A diff view flips that: reviewers see exactly what changed, automatically.
Works for catching regressions and reviewing intentional changes. Example: #702 changed the post banner aspect ratio across all posts — with this setup, a reviewer would have seen the change applied to /if-anyone-builds-it-campaign at a glance, rather than spot-checking via Netlify.

How it works

On each PR, GitHub Actions runs pnpm build && pnpm preview and navigates Playwright through the covered routes.
@chromatic-com/playwright captures HTML + asset archives of each page.
chromaui/action uploads archives; Chromatic re-renders them server-side in a controlled browser and diffs against the baseline.
A sticky PR comment (live example) summarizes what the run covered — counts, per-category ratios, exclusions with reasons, a link to the Chromatic review UI, and a ⚠️ warning section when any un-fixtured server-side request falls through.
Chromatic posts a separate "UI Review" check showing the live diff status (N changes to accept, accepted baselines, etc.).
Reviewers click the Chromatic link to accept or deny each snapshot. Accepted snapshots become the new baseline once the PR is merged into main.

No baselines in git — Chromatic stores them per-branch and promotes on merge. No regeneration workflow, no Linux-vs-macOS font-parity problems, no PR churn from binary PNGs.

Coverage

22 routes × 2 viewports = 44 snapshots per run. The covered set is discovered by walking src/routes/; each page controls its own inclusion via a @visualDiffEnabled annotation right in its source file.

For example, a route that opts out and a post that opts in:

<!-- @visualDiffEnabled: false — admin-only tool -->

---
title: My Post
---

<!-- @visualDiffEnabled: true — post with a banner image -->

Pages under src/routes/ default to included. Opt out with @visualDiffEnabled: false in +page.svelte or +page.ts.
Posts under src/posts/ (rendered via [slug]) default to excluded, since they share a layout; only representatives of distinct variants (banner image, link-heavy, long-form, embedded Svelte components) opt in.

Renames carry the annotation with the file, so the test set can't silently drift from reality. tests/visual/routes.ts and tests/visual/scope-comment.ts share one scanner (tests/visual/annotations.ts) — what the test actually runs and what the PR comment reports can't diverge.
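The opt-in/opt-out rule above can be sketched as a small pure function. This is a minimal illustration, not the contents of tests/visual/annotations.ts: the names `parseAnnotation`, `Kind`, and `Decision` are hypothetical, and the sketch assumes the annotation and its optional reason sit on a single line.

```typescript
// Hypothetical sketch of an annotation scanner; not the PR's actual code.
type Kind = "page" | "post";

interface Decision {
  enabled: boolean;
  explicit: boolean; // false when the default for the kind was applied
  reason?: string; // free text after the separator, if present
}

// Captures the flag and the rest of the line it appears on.
const ANNOTATION = /@visualDiffEnabled:\s*(true|false)\b([^\n]*)/;

function parseAnnotation(source: string, kind: Kind): Decision {
  const match = ANNOTATION.exec(source);
  if (!match) {
    // Defaults from the PR description: pages included, posts excluded.
    return { enabled: kind === "page", explicit: false };
  }
  const reason = match[2]
    .replace(/(-->|\*\/)\s*$/, "") // drop a trailing comment closer
    .replace(/^\s*[\u2014\u2013-]+\s*/, "") // drop the dash separator
    .trim();
  return {
    enabled: match[1] === "true",
    explicit: true,
    reason: reason.length > 0 ? reason : undefined,
  };
}
```

Because the decision is derived only from the file's own contents, a rename cannot change it, which is the property the paragraph above relies on.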

Current opt-outs (grep @visualDiffEnabled: false):

/submitted — redirects on mount
/verify — token-dependent
/quotes — ~28M-pixel desktop snapshot, over Chromatic's 25M cap

Deterministic rendering — how snapshots are kept stable

A visual-regression check is only useful if the same code produces the same screenshot every run. Several measures together keep snapshots stable:

External data (the biggest source of noise) is intercepted at two boundaries:

MSW-node (tests/visual/msw-setup.ts) intercepts outbound HTTP from the Node process during pnpm build (prerender, remote prerender functions) and pnpm preview (on-demand SSR). Notion, Airtable, and Substack RSS are served from pinned fixtures in tests/visual/fixtures/. Catch-all handlers return empty results for un-fixtured Airtable/Notion endpoints; when one is hit (or a wholly uncovered host is bypassed), the scope comment's ⚠️ section surfaces it.
Playwright page.route() default-denies any cross-origin iframe, XHR, or fetch request from the browser. Tally, Luma, Mapbox, segment.io, and any new third-party widget someone adds in a future PR render as an empty container — no live third-party state leaks in. Cross-origin resources (Cloudinary images, fonts, inlang CDN scripts) pass through unchanged.
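The browser-side default-deny rule can be expressed as a pure predicate. The sketch below is an assumption about its shape rather than the code in smoke.spec.ts; `shouldAbort` and `BLOCKED_TYPES` are hypothetical names, and the resource-type strings are the ones Playwright reports from `request.resourceType()`.

```typescript
// Hypothetical sketch of the default-deny decision; not the PR's actual code.
// Cross-origin iframes ("document"), XHR, and fetch are aborted;
// embedded assets such as images, fonts, scripts, and stylesheets pass through.
const BLOCKED_TYPES: readonly string[] = ["document", "xhr", "fetch"];

function shouldAbort(requestUrl: string, resourceType: string, baseURL: string): boolean {
  const sameOrigin = new URL(requestUrl).origin === new URL(baseURL).origin;
  if (sameOrigin) return false; // the site's own pages and /api/* calls are never aborted
  return BLOCKED_TYPES.includes(resourceType);
}

// Possible wiring inside a Playwright test (shown as a comment so the
// sketch stays dependency-free):
//
//   await page.route("**/*", (route) => {
//     const request = route.request();
//     return shouldAbort(request.url(), request.resourceType(), baseURL)
//       ? route.abort()
//       : route.continue();
//   });
```

Comparing origins against `baseURL` instead of a hardcoded host means a port change in the Playwright config does not silently flip which requests count as same-origin.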

Non-HTTP sources of drift are neutralized in playwright.config.ts and the test harness:

reducedMotion: 'reduce' — animations and CSS transitions complete instantly (the site already honors prefers-reduced-motion in src/styles/reset.css).
locale: 'en-US' and timezoneId: 'UTC' pinned at the browser context level, so any Intl / date formatting is deterministic.
img[loading="lazy"] is flipped to eager before each screenshot, and the run waits for document.fonts.ready plus networkidle, so fonts and below-the-fold images are fully resolved at capture time.
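Why the timezone pin matters can be shown with plain `Intl`, independent of Playwright. The sketch below runs in Node, not in the test browser, and `formatDay` is a hypothetical helper: the same instant formats to different calendar days depending on the timezone, so an unpinned runner could render a different date string than the baseline.

```typescript
// Illustration only: the same instant rendered under two timezones.
function formatDay(date: Date, locale: string, timeZone: string): string {
  return new Intl.DateTimeFormat(locale, {
    timeZone,
    year: "numeric",
    month: "short",
    day: "numeric",
  }).format(date);
}

const instant = new Date("2026-04-20T23:30:00Z");

const pinned = formatDay(instant, "en-US", "UTC"); // "Apr 20, 2026"
const unpinned = formatDay(instant, "en-US", "Asia/Tokyo"); // "Apr 21, 2026"

console.log(pinned, "vs", unpinned);
```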

Sizing — the mobile project uses deviceScaleFactor: 1 (overriding Pixel 7's default DPR of 2.625) so long-page snapshots stay under Chromatic's 25M-pixel limit without losing layout fidelity.

A green diff on /about means "no layout regression given the fixture people list," not "the Airtable integration works." The PR comment spells out this distinction.

Scope comment

Posted on each PR by a companion workflow (.github/workflows/visual-diff-comment.yml), updated in-place on re-runs. Live examples on fork PRs, covering the baseline, opt-in, and warning states:

RisingOrange/pauseai-website#3 — baseline comment, no warnings.
RisingOrange/pauseai-website#4 — PR opts a new post in via frontmatter annotation; comment reflects the updated count.
RisingOrange/pauseai-website#5 — PR introduces a deliberately un-fixtured outbound call; comment's ⚠️ section surfaces it.

Sample comment body (baseline)

**Visual diff coverage**
_commit [`abc1234`](https://…/commit/abc1234def) · [run #12345](https://…/actions/runs/12345)_

[Review snapshots in Chromatic](https://www.chromatic.com/build?appId=…&number=66)

- **Pages:** 16/19 covered. _Included_ by default.
- **Posts:** 6/101 covered. _Excluded_ by default (posts share a layout).

<details><summary>Route breakdown</summary>

- **Pages excluded:**
  - `/quotes`: ~28M-pixel desktop snapshot is over Chromatic's 25M cap
  - `/submitted`: post-form state
  - `/verify`: token-dependent
- **Posts covered:**
  - `/action`
  - `/funding`
  - `/if-anyone-builds-it-campaign`
  - `/join`
  - `/learn`
  - `/values`

</details>

<details><summary>Not exercised</summary>

- Pages that fetch from external APIs render against pinned fixtures, not live data.
- Cross-origin iframes and API calls (Tally, Luma, Mapbox, analytics, newly added widgets) are aborted at the browser boundary. Containers render as empty. Check the Netlify preview deploy to verify their real rendering.
- Only `en` locale is built.

</details>

[`tests/visual/README.md`](…) explains the annotation syntax and the coverage model.

Sample warning section (when an un-fixtured request is hit)

<details open><summary>⚠️ 1 external request not covered by fixtures</summary>

- **Endpoints:**
  - 1× `GET https://api.example.com/v1/data` (no handler; allowed through)
- **Fix:** add an explicit handler + fixture in `tests/visual/msw-setup.ts` (so the page renders its populated layout against stable data), or exclude the page with `@visualDiffEnabled: false`.
- **Why it matters:** without a fixture, CI gets either a 401 (endpoint needs auth) or drifting live data, so the snapshot won't match production.

</details>

Safety of the comment pipeline

The posting workflow (visual-diff-comment.yml) triggers on workflow_run and runs in the base-repo context with pull-requests: write. Fork code never executes in the trusted context — the trusted workflow only consumes the artifact produced by the fork's run and posts its Markdown as a sticky comment. The PR number is derived from workflow_run.head_sha + head_branch + head_repository.id via the trusted GitHub API, not from anything the fork artifact could tamper with. Annotation-reason text is escaped before rendering so a malicious reason can't close the wrapping <details>, ping users, or inject links. The residual surface — fork-authored Markdown posted verbatim — is explicitly an accepted tradeoff given the contributor model (known volunteers, worst case = visible-and-recoverable comment spam).
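The reason-escaping step could look roughly like the sketch below. It is one possible approach under stated assumptions, not the PR's implementation: `escapeReason` is a hypothetical name, the zero-width space after `@` is intended to stop the text from forming a contiguous `@username` token, and bare URLs are not handled here.

```typescript
// Hypothetical sketch of annotation-reason escaping; not the PR's actual code.
function escapeReason(reason: string): string {
  return reason
    .replace(/\s+/g, " ") // keep the reason on one line of the list item
    .trim()
    .replace(/&/g, "&amp;") // HTML-escape first so tags such as </details> become inert text
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/[\\`*_\[\]()]/g, "\\$&") // backslash-escape Markdown link and emphasis syntax
    .replace(/@/g, "@\u200B"); // break up @mentions with a zero-width space
}

console.log(escapeReason("x </details> @admin [a](b)"));
```

HTML escaping runs before the Markdown escaping so that the backslashes added in the second step are not themselves altered.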

CI behavior + fork PRs

Runs on PRs and pushes to main.
Non-blocking. exitZeroOnChanges: true — workflow exits 0 even when Chromatic detects diffs. Chromatic's check is informational; branch protection is not configured to require it.
exitOnceUploaded: true — CI exits after archive upload. Warm-cache runs: ~2:45–3:00 (scope comment + MSW overhead add ~30s to the original ~2:15 baseline). Cold-cache first run: ~5:30.
concurrency group cancels stale runs on new pushes.
Playwright browsers and imagetools transforms are cached.
playwright-report/ uploaded as a GHA artifact on every run (14d retention) for inspection without needing the Chromatic UI.

Fork PRs work end-to-end, including the scope comment. GitHub Actions can't pass repo secrets to fork-PR runs, so the Chromatic project token is inlined as plaintext in the workflow — this is Chromatic's own recommended pattern for fork-PR support. The scope comment posts via the workflow_run-based companion workflow described above.

Setup required for upstream adoption

Create a Chromatic project linked to this repo (free tier: 5k snapshots/month — expected usage ~1–2k/mo based on current PR volume).
Replace the inlined project token in .github/workflows/visual-diff.yml with the new project's token.
Merge this PR — the first push to main uploads initial baselines; accept them once in Chromatic and the pipeline is live.

Any repo collaborator can review and accept baselines via GitHub OAuth login — no separate Chromatic invites needed.

Local screenshot workflow — using the Playwright suite for manual before/after

The Playwright suite doubles as a local before/after screenshot tool:

pnpm exec playwright test
pnpm exec playwright show-report

Runs the suite against a fresh preview build and opens an HTML report with per-route screenshots. For before/after: run on the base branch, save playwright-report/ somewhere, switch branches, re-run, eyeball both reports. No Chromatic quota, no PR check side effects.

Known limitations

Transient mid-hydration flashes aren't caught. Static snapshots capture the settled state.
Chromium-only. Firefox- or Safari-specific rendering bugs won't surface. Adding more browsers is possible but multiplies the snapshot count.
en only — locale-dependent layout regressions are invisible.
Fixture data, not live. The diff exercises the layout path given the fixture shape, not the external integration.
Free-tier ceiling is 5,000 snapshots/month; we're well under, but worth watching as the route count grows.

Possible follow-ups — out of scope for this PR

Natural extensions once the base setup is adopted:

More browsers. Firefox and/or WebKit to catch vendor-specific rendering bugs (e.g. #664 was a Firefox-only hero color mismatch).
More viewports. A portrait desktop viewport would catch bugs like #667 (large portrait desktop incorrectly getting the mobile layout).
Banner-visible snapshots. The campaign banner is disabled in layout, so banner-styling regressions (e.g. #677) aren't surfaced. Optionally force a banner on in visual-test mode.

netlify · 2026-04-20T11:14:51Z

👷 Deploy request for pauseai pending review.

Visit the deploys page to approve it

Latest commit: `6133410`

Wituareard · 2026-04-22T20:33:42Z

I like the idea but setting up a project requires running the Chromatic CLI and that doesn't work on my machine because of a space in the directory path. @anthonybailey Could you set this up maybe?

anthonybailey · 2026-04-23T08:07:46Z

Probably. I'll try. Nag me.

My understanding is that pnpm build actions are executed in a new place (a machine owned by GitHub) without any secrets in environment settings.

So LLM access to localize new source content will not occur (a build without keys is en-only). This is fine and sensible as a default.

We will continue to move as much static content generation from remote sources as possible into build. But again, it should function sensibly without keys. The basic model here seems OK. Any "but in production we do this instead" qualms are mostly answered by "at the moment we have zero diff support so this is simply new information". We do need reviewers to not be misled re what has been exercised though.

Any thoughts there, @RisingOrange? What prevents misunderstanding this? Day-to-day code review needs some cues that future users will see, so they know to read some subset of this PR.

The only other thing I dislike is specifying routes separately in the test. We will rename things, accidentally changing test coverage.

Any way to have the non-default sample/excluded coverage choices specified in the source code for the individual cases?

Captures HTML+asset archives of 15 key routes × 2 viewports on each PR and uploads to Chromatic for server-side rendering and diffing against baseline. Posts a non-blocking "UI Review" check on the PR.

- Routes auto-discovered from src/routes/, plus 5 markdown-post representatives; form/admin/content-churn routes explicitly excluded with WHY comments
- Deterministic snapshots: /api/news mocked with a fixture, Tally iframes on /statement and /join aborted to avoid async-height churn
- Chromium-only, non-blocking by design (exitZeroOnChanges)
- Fork PRs supported via plaintext project token in the workflow (Chromatic's recommended pattern)
- Playwright report (per-route screenshots) uploaded as GHA artifact

Without production secrets, Notion / Airtable / Substack calls during `pnpm build` (prerender, remote prerender fns) and `pnpm preview` (on-demand SSR, client-side fetches to /api/*) either fail or fall back to a couple of placeholder records, leaving snapshots too sparse to catch layout regressions in populated list/card components.

Boot MSW in the Node process via `NPM_CONFIG_NODE_OPTIONS=--import` (the only way to propagate `--import` through pnpm, which otherwise overwrites NODE_OPTIONS with `--experimental-global-webcrypto`; see pnpm/pnpm#6210). The setup file is a no-op unless `VISUAL_TEST=1`.

Handlers cover the two Notion databases (press, funding), the three Airtable tables (people, signatories, national-groups), the Substack RSS feed hit by /api/news, plus catch-all empty-result responses for any un-fixtured endpoint. Fake API keys are set in the workflow so the SDKs actually make HTTP requests (they short-circuit to a "no key" error otherwise, bypassing MSW entirely).

All fixture data lives under a single `tests/visual/fixtures/` directory. The earlier `tests/visual/fixtures/news.json`, which was intercepted at the browser boundary via `page.route().fulfill()`, is replaced by `substack-feed.xml` served by MSW upstream of the /api/news server code. `page.route()` in smoke.spec.ts is now reserved for aborting third-party widgets (Tally, Mapbox, Luma) whose requests never reach our Node process. Clean split: MSW for fixture data, page.route for cross-origin aborts.

Fixtures use realistic-shaped but unambiguously-fake data: first names paired with Example / Sample / Fixture surnames, Lorem-ipsum bios, example.org URLs, Example Foundation / Example News. Chromatic snapshots persist in its UI, so the content leaves no room to mistake a fixture attribution for a real endorsement if an image escapes context.

Adds `msw` as a dev dependency and entry-points tests/visual/msw-setup.mjs in knip.config.ts so the indirect import is recognized.

Previously, /posts was excluded because its chronological listing churns on any new post, the same mechanism as the home page's LatestNews section. But the home page is included (its news churn is accepted as one re-approval per news-flagged post). Include /posts on the same trade-off: the cost is one re-approval in Chromatic per added post, which is trivial. This removes an inconsistency between pages that opt out and pages that accept content-driven churn.

Four mobile snapshots blew past Chromatic's 25M-pixel snapshot cap with the default Pixel 7 DPR of 2.625: /dear-sir-demis-2025 (52M, 2.1× over), /posts (47M, 1.9× over), /learn (25.8M), /write (25.2M). Override deviceScaleFactor to 1 on the mobile project. The viewport is still 412 CSS px (the layout-relevant dimension); we just capture at 1× instead of 2.6× physical pixels. Mobile layout diffs don't need the hi-DPI oversampling — a layout regression is visible at either DPR. After the change, the largest mobile snapshot is 412 × 18468 ≈ 7.6M, well under the cap.
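The arithmetic behind the DPR override can be checked with a short sketch. It assumes that a snapshot's pixel count is the CSS width and height each multiplied by the device scale factor; `snapshotPixels` and `exceedsCap` are hypothetical helpers, and the 25M cap and the 412 × 18468 dimensions come from the commit message above.

```typescript
// Sketch of the snapshot-size arithmetic, assuming pixels scale with DPR squared.
const CHROMATIC_PIXEL_CAP = 25_000_000;

function snapshotPixels(cssWidth: number, cssHeight: number, dpr: number): number {
  // Physical pixels = (CSS width x DPR) x (CSS height x DPR)
  return cssWidth * dpr * (cssHeight * dpr);
}

function exceedsCap(cssWidth: number, cssHeight: number, dpr: number): boolean {
  return snapshotPixels(cssWidth, cssHeight, dpr) > CHROMATIC_PIXEL_CAP;
}

// Largest mobile page after the change: 412 x 18468 CSS px.
console.log(snapshotPixels(412, 18468, 1)); // 7608816 (about 7.6M, under the cap)
console.log(snapshotPixels(412, 18468, 2.625)); // 52429497.75 (about 52.4M, over the cap)
```

Under this assumption, dropping from DPR 2.625 to 1 shrinks every snapshot by a factor of about 6.9 while the 412 CSS px layout width stays the same.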

Addresses the reviewer concern that a green Chromatic check could silently imply coverage we don't actually have. On every PR, post a sticky comment summarizing what the run snapshotted: covered/excluded ratios for pages and posts, what "covered" doesn't mean (fixtures, widget aborts, en-only), and a conditional warning when un-fixtured requests fall through to MSW catch-alls / bypass.

Plumbing:

- tests/visual/scope-comment.ts renders the comment body from repo state (routes, posts, annotations) + the MSW warning log.
- msw-setup.mjs listens for request:unhandled and msw-handlers.ts's catch-alls log to MSW_WARN_LOG when set, so the warning section has data to render.
- visual-diff.yml generates the artifact (body.md + pr-number.txt) on every PR run.
- visual-diff-comment.yml (new) triggers on workflow_run, runs in the base-repo context with pull-requests: write, downloads the artifact, posts or updates a sticky comment via marocchino/sticky-pull-request-comment. Works for fork PRs without granting forks write permissions.
- README updated with a disclaimer about browser-side third-party embeds being out of scope for the MSW-based warning.
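How logged requests might turn into the warning section's endpoint lines can be sketched as below. This is an illustration, not scope-comment.ts: the `UnhandledRequest` shape, the `summarizeWarnings` name, and the label used for catch-all hits are assumptions; only the "no handler; allowed through" wording is taken from the sample warning in the PR description.

```typescript
// Hypothetical sketch of warning aggregation; not the PR's actual code.
interface UnhandledRequest {
  method: string;
  url: string;
  outcome: "bypassed" | "catch-all"; // bypassed = no handler at all
}

const OUTCOME_LABELS: Record<UnhandledRequest["outcome"], string> = {
  bypassed: "no handler; allowed through",
  "catch-all": "catch-all handler; empty result returned", // assumed wording
};

function summarizeWarnings(entries: UnhandledRequest[]): string[] {
  const counts = new Map<string, { entry: UnhandledRequest; count: number }>();
  for (const entry of entries) {
    const key = `${entry.method} ${entry.url} ${entry.outcome}`;
    const seen = counts.get(key);
    if (seen) {
      seen.count += 1;
    } else {
      counts.set(key, { entry, count: 1 });
    }
  }
  // One Markdown bullet per distinct request, prefixed with its hit count.
  return Array.from(counts.values()).map(
    ({ entry, count }) =>
      `  - ${count}× \`${entry.method} ${entry.url}\` (${OUTCOME_LABELS[entry.outcome]})`
  );
}
```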

The previous page.route setup aborted three specific hosts (tally.so, lu.ma, api.mapbox.com). A new third-party widget added in a future PR would have loaded normally in CI, potentially leaking live state into snapshots with no signal.

Replace with a generic rule: abort any cross-origin request whose resourceType is `document` (iframe), `xhr`, or `fetch`. Pass through cross-origin resources (images from Cloudinary, fonts, scripts from the inlang CDN, stylesheets), since those are embedded assets whose content is part of the rendered design we want to capture. Server-side external calls remain handled by MSW-node.

All three previous specific aborts fall under the generic rule:

- tally.so iframes → `document` → aborted
- lu.ma iframes → `document` → aborted
- api.mapbox.com tiles (via mapbox-gl.js) → `xhr` → aborted

Also:

- Derive the "same origin" from Playwright's `baseURL` fixture rather than hardcoding `http://localhost:4173`, so a port change in the config doesn't silently invert the rule.
- Update the "Not exercised" caveat to reflect the new behavior: containers render empty and reviewers should check the Netlify preview deploy for real rendering.

Codex review (gpt-5.5 high) flagged that the PR-number resolver filters on head_branch and head_repository.id but not base_ref, so a fork branch opened as two PRs against different base branches simultaneously could route the comment to the wrong PR. Fix options were all ugly (iterate + post to each, fail on >1 match); the case is rare enough that documenting the limit is the pragmatic call.

RisingOrange · 2026-04-24T16:11:03Z

Good points!

1. "Reviewers should not be misled about what has been exercised"

Addressed by three things working together:

Intercept external requests and serve fixtures. Prevents live-data drift between runs and lets snapshots actually exercise populated-layout paths. Details: Deterministic rendering.
Scope comment on each PR. A sticky comment tells reviewers exactly what was covered — counts, exclusions with reasons, caveats in the "Not exercised" section, a link into Chromatic, and a ⚠️ flag for any un-fixtured request so a new integration can't silently pass. Details: Scope comment. Live samples on fork PRs: #3 (baseline), #4 (post opt-in), #5 (warning).
Coverage is broad. 14/17 non-post pages and 9 representative posts; the 3 non-post opt-outs are Chromatic size cap (/quotes), post-form redirect (/submitted), and token-gated (/verify).

Residual caveats are honest: CI is en-only, fixture data ≠ live data, browser-side third-party widgets are not covered. These are called out in the comment's "Not exercised" section and documented in tests/visual/README.md.

2. "Specify routes in the source instead of separately in the test"

Done via a colocated @visualDiffEnabled comment on each page — pages default to included and opt out, posts default to excluded and opt in. Renaming a file carries its opt-in/out with it; there's no central route list to keep in sync. The scope comment uses the same scanner as the test, so what runs and what the comment reports can't drift. Details: Coverage.

What do you think?

`dotenv.config({ override: true })` was clobbering the fake AIRTABLE_API_KEY / NOTION_API_KEY that the visual-diff workflow sets, with the empty values from template.env. The SDKs then short-circuited on empty key before ever making a request, so MSW had nothing to intercept and /about, /statement, /communities snapshots rendered their hardcoded "[FALLBACK DATA]" fallback paths instead of fixture data.

Previously the three `PauseAI Example North/South/East` records didn't match any country name in the static `communities` list, so `nationalGroups.find()` returned undefined for every card and the fixture had no visible effect — snapshots were identical whether the fixture returned records or an empty list. Now one record matches 'France' and carries an inline SVG tricolor as its image, so France's card shows a fixture-provided flag while the rest keep the default icon. The populated-card render path is now covered.

Adds @visualDiffEnabled: true to three posts whose embedded components or layout aren't represented by the existing covered set: Donate (donate.md), Doomers (pdoom.md), and the auto-generated SimpleToc (faq.md).

# Conflicts: # knip.config.ts # pnpm-lock.yaml

When a commit lives on the repo's default branch, GitHub's listPullRequestsAssociatedWithCommit endpoint only returns merged PRs; open PRs with that commit at their head are filtered out. This is unlikely in production (the feature branch is never main), but it silently breaks the scope comment on any fork where the feature branch happens to be default. Switch to pulls?state=open&head={owner}:{branch}, which has no such quirk. Trusted-inputs safety property is unchanged: head owner/branch come from workflow_run, and we still verify head.repo.id and head.sha against the trusted payload before posting.
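The verification step described above can be sketched as a filter over the open-PR list. The field names mirror GitHub's REST response for pull requests (`head.sha`, `head.repo.id`) and the `workflow_run` payload, but `matchingPullRequests`, `TrustedRun`, and `PullCandidate` are hypothetical names, and the sketch deliberately returns every match and leaves the more-than-one-match policy to the caller.

```typescript
// Hypothetical sketch of the trusted-input check; not the PR's actual code.
interface TrustedRun {
  headSha: string; // from workflow_run.head_sha
  headRepoId: number; // from workflow_run.head_repository.id
}

interface PullCandidate {
  number: number;
  head: { sha: string; repo: { id: number } | null }; // repo is null if the fork was deleted
}

function matchingPullRequests(candidates: PullCandidate[], run: TrustedRun): number[] {
  return candidates
    .filter(
      (pr) =>
        pr.head.repo !== null &&
        pr.head.repo.id === run.headRepoId &&
        pr.head.sha === run.headSha
    )
    .map((pr) => pr.number);
}
```

Both comparison values come from the trusted `workflow_run` payload, so nothing in the fork's artifact influences which PR receives the comment.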

Wituareard force-pushed the visual-diff branch from 6c31c23 to 602c389 · April 22, 2026 21:13

RisingOrange added 7 commits April 24, 2026 17:22

RisingOrange force-pushed the visual-diff branch from 3b24d8c to bd3b964 · April 24, 2026 15:23

RisingOrange added 3 commits April 24, 2026 17:26

Update visual-diff README for default-deny + scope comment

7778906

Note bypass-vs-catch-all policy split in msw-handlers

7dd119e

Merge MSW setup and handlers into one msw-setup.ts

f74adce

RisingOrange force-pushed the visual-diff branch from d8c68cb to 3218312 · April 24, 2026 15:48

Add file docstrings to routes.ts and smoke.spec.ts

e3e04e7

RisingOrange force-pushed the visual-diff branch from 3218312 to e3e04e7 · April 24, 2026 15:49

RisingOrange added 4 commits April 24, 2026 19:40

Opt donate/pdoom/faq into visual-diff coverage

127fc33

Merge remote-tracking branch 'origin/main' into visual-diff

13c5c27

RisingOrange mentioned this pull request Apr 26, 2026

Test: visual-diff GHAs after merge + new post coverage RisingOrange/pauseai-website#6

Open

RisingOrange added 2 commits April 26, 2026 18:10

Merge remote-tracking branch 'origin/main' into visual-diff

6133410

RisingOrange wants to merge 17 commits into PauseAI:main from RisingOrange:visual-diff
