Run differential tests on GitHub runner #15

Merged
jserv merged 1 commit into main from ci-refine
Jun 15, 2026

@jserv jserv commented Jun 15, 2026

Contributor

.github/workflows/differential.yml: new workflow with five parallel jobs (motif / violawww / mosaic / osiris / xfig) that each build the downstream target twice on the same ubuntu-24.04 runner - once against the apt-installed system libX11 stack and once against libx11-compat - then screenshot-compare the two under Xvfb. This replaces the SSH-to-node11 path that the differential blocks in ci.yml used to gate on, so the differential comparison is reproducible on stock GitHub-hosted infrastructure without an external host. Each job pulls its target-specific apt set on top of the shared COMMON_BUILD_PKGS / SYSTEM_X11_DEV_PKGS / DIFFERENTIAL_PKGS lists, shares the upstream-src cache key with ci.yml so a warm CI run primes this workflow (and vice versa), and adds a 12-minute hard timeout so a stuck Xvfb cannot stall the queue. Motif and osiris additionally cache the system-side build tree across runs, keyed on the configure inputs, which lets a cold PR restore the post-configure object set from the most recent main run instead of rerunning autoreconf and meson from scratch.


Summary by cubic

Run screenshot differential tests on GitHub-hosted runners in a new parallel workflow for Motif, ViolaWWW, Mosaic, Osiris, and Xfig. Drops the SSH host, runs double-builds and captures in parallel, and stabilizes deps, logs, thresholds, and timings for reproducible diffs.

  • New Features

    • Added .github/workflows/differential.yml with 5 parallel jobs on ubuntu-24.04; each double-builds (system vs libx11-compat) and compares under Xvfb with a 12-minute timeout. Runs as separate checks; ci.yml drops inline differential gates. Shares caches with ci.yml; Motif/Osiris also cache the system-side build.
    • Scripts support local mode (--local / *_DIFF_LOCAL=1) with safe-path checks and local compare by default. Builds and captures run concurrently on two Xvfb displays; upstream sources are pre-extracted; failures surface with clear log tailing; ccache is wrapped via PATH with CC=gcc. Osiris caps ninja parallelism for runner stability.
    • Makefiles add XXX_DIFF_LOCAL knobs that propagate to scripts and toggle the default compare location.
    • Xfig differential compares the startup screen only.
  • Bug Fixes

    • Install required deps: libxpm-dev, xbitmaps, and legacy fonts (xfonts-base, xfonts-100dpi, xfonts-75dpi, xfonts-scalable) to avoid configure crashes and font-driven diffs.
    • Improve isolation and diagnostics: per-side $DISPLAY (separate Xvfb), always stage both sides’ replay-* and xvfb-*.log, and guard local runs against unsafe --remote-root.
    • Xfig: run configure/make from the source tree.
    • Calibrate timing/thresholds for parallel load: Motif MAE=0.15, changed=0.32; ViolaWWW wait-window 15s + 2s settle + 3s post-wheel; Mosaic wait-window 15s.

Written for commit 10d4301. Summary will update on new commits.


cubic-dev-ai[bot]

This comment was marked as resolved.

@jserv jserv force-pushed the ci-refine branch 3 times, most recently from d4bf16c to 2a230e8 on June 15, 2026 21:27
.github/workflows/differential.yml: new workflow with five parallel
jobs (motif / violawww / mosaic / osiris / xfig) that each build the
downstream target twice on the same ubuntu-24.04 runner - once against
the apt-installed system libX11 stack and once against libx11-compat -
then screenshot-compare under Xvfb. Replaces the SSH-to-node11 path
the differential blocks in ci.yml used to gate on, so the differential
diff is reproducible on stock GH infrastructure without an external
host. Each job pulls its target-specific apt set on top of the shared
COMMON_BUILD_PKGS, SYSTEM_X11_DEV_PKGS, MOTIF_BUILD_PKGS,
OSIRIS_BUILD_PKGS, and DIFFERENTIAL_PKGS lists, shares the
upstream-src cache key with ci.yml so a warm CI run primes this
workflow, and adds a 12-minute hard timeout. The motif and osiris
jobs additionally cache the system-side build tree across runs, keyed
on the configure inputs. SYSTEM_X11_DEV_PKGS includes libxpm-dev
(Motif configure requires xpm.pc via pkg-config) and xbitmaps (Motif
lib/Xm/I18List.c includes X11/bitmaps/gray directly).
DIFFERENTIAL_PKGS includes xfonts-base / 100dpi / 75dpi / scalable
because Motif demos and ViolaWWW request bitmap font families via
XLoadQueryFont and without these the system-side fall-through to 9x15
either crashes ViolaWWW's bundled xloadimage or blows up the diff
threshold against libx11-compat's SDL_ttf path.

.github/workflows/ci.yml: drops the steps.*-differential blocks and
their matching artifact uploads. The remaining ci.yml jobs (build,
sanitize, lint, debug-build, plus per-target compat-side smoke runs)
stay unchanged so the fast PR feedback loop is not delayed by the
heavier double-build pipeline.

scripts/run-{motif,mosaic,violawww,osiris,xfig}-differential-tests.py:
adds --local mode that short-circuits the rsync + ssh round trips and
runs the build / capture / compare pipeline on the current host via
"sh -s". The compat-side and system-side builds run in parallel
background subshells with explicit "status=0; wait \$pid || status=\$?"
propagation so a "set -eu" payload surfaces background failures
instead of swallowing diagnostics; the upstream source is pre-extracted
serially via a make stamp target so the two parallel sides do not
race on autoreconf / tarball extraction. Screenshot capture is split
the same way: two Xvfb instances on display N and N+1 let system-side
and compat-side captures run concurrently rather than serially, and
each capture_* shell helper derives display_num from the subshell's
\$DISPLAY env so the two sides target their own X server even though
the helper itself is shared. xfig's system-side configure does an
explicit cd into \$system_build/source first because autotools
generates Makefiles in the current working directory, not at the
configure script's path. ccache wrapping goes through
PATH=/usr/lib/ccache:\$PATH and CC=gcc so recursive Makefiles that
pass \$(CC) unquoted to sub-make (Mosaic's makefiles/Makefile.linux
is one) do not tokenize the value into "CC=ccache" + "gcc-target".
Osiris caps ninja parallelism via OSIRIS_NINJA="ninja -j\$JOBS" on
the compat side and "ninja -j\$JOBS -C" on the system side so the
default vCPU+2 heuristic does not launch 12 compiler processes across
both sides on a 4-vCPU runner.
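
The background-status propagation described above can be sketched as follows. This is a minimal illustration rather than the scripts' actual payload: the `run_two_sides` helper, the variable names, and the stand-in `exit` commands are assumptions made for the example. Only the `status=0; wait $pid || status=$?` idiom and the `sh -s` invocation come from the commit message.

```python
import subprocess

# Hypothetical payload: two background subshells stand in for the
# compat-side and system-side builds. Under "set -eu", a bare "wait $pid"
# that returns non-zero would abort the script before any diagnostics run;
# the "|| status=$?" form records the status and lets the script continue.
PAYLOAD = """\
set -eu
( exit {compat_rc} ) &
compat_pid=$!
( exit {system_rc} ) &
system_pid=$!
compat_status=0; wait "$compat_pid" || compat_status=$?
system_status=0; wait "$system_pid" || system_status=$?
echo "compat=$compat_status system=$system_status"
if [ "$compat_status" -ne 0 ]; then exit "$compat_status"; fi
exit "$system_status"
"""


def run_two_sides(compat_rc: int, system_rc: int) -> subprocess.CompletedProcess:
    """Feed the payload to `sh -s` and return the completed process."""
    script = PAYLOAD.format(compat_rc=compat_rc, system_rc=system_rc)
    return subprocess.run(
        ["sh", "-s"], input=script, text=True, capture_output=True
    )
```

Both statuses are collected before either is acted on, so a diagnostic step placed where the `echo` sits can report on both sides before the script exits with the first non-zero status.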

The post-wait diagnostic blocks surface both build sides' failure
messages and tail both sides' logs when either side exits non-zero, then
exit with the first non-zero status. The post-capture diagnostic block
follows the same pattern. After the wait, Xvfb logs and any partial replay-*
trees are staged into \$remote_root/logs so the artifact upload picks
them up regardless of capture success. A CLI > env > --local-default
> SSH-default precedence chain resolves --compare-location and
--remote-root so XXX_DIFF_LOCAL=1 alone (no env override) switches
the comparison location to local without surprising the SSH-mode
developer flow.
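
The precedence chain above can be sketched as a small resolver. This is an illustrative assumption, not the scripts' code: the `resolve_setting` function, its parameters, and the use of `MOTIF_DIFF_COMPARE_LOCATION` as an environment variable name are hypothetical. Only the ordering (CLI, then env, then the local-mode default, then the SSH-mode default) comes from the commit message.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    cli_value: Optional[str],
    env_name: str,
    local_mode: bool,
    local_default: str,
    ssh_default: str,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Resolve one option using CLI > env > --local default > SSH default."""
    env = os.environ if env is None else env
    if cli_value is not None:      # 1. an explicit command-line flag wins
        return cli_value
    env_value = env.get(env_name)
    if env_value:                  # 2. then a non-empty environment override
        return env_value
    # 3./4. otherwise the default depends on whether --local is active
    return local_default if local_mode else ssh_default


# Local mode with no overrides resolves to "local"; SSH mode keeps "remote".
print(resolve_setting(None, "MOTIF_DIFF_COMPARE_LOCATION", True, "local", "remote", env={}))
print(resolve_setting(None, "MOTIF_DIFF_COMPARE_LOCATION", False, "local", "remote", env={}))
```

Keeping the mode-dependent default as the last step is what lets local mode change the outcome without touching the SSH-mode defaults.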

The xfig differential captures the startup screen only. The
xfig-draw-line replay clicks (32, 96) targeting the line tool icon,
but xfig 3.2.9a's drawing toolbox now has a "Drawing" section label
that pushes the tool grid down so (32, 96) hits empty label space
and selects no tool. The compat-side internal input backend masks the
miss because libx11-compat's event-injection path handles the click
via widget translation, but xdotool on the system side sends a literal
X click at the wrong coord and no drawing happens. The smoke job
(mk/xfig.mk:check-smoke-xfig) keeps the compat-side draw-line
coverage; the differential compares the startup screen, which is the
meaningful library-parity check this workflow is supposed to gate.

mk/{motif,mosaic,violawww,osiris,xfig}.mk: adds XXX_DIFF_LOCAL knob
(default 0) that threads --local through to the script, and changes
XXX_DIFF_COMPARE_LOCATION default from unconditional "remote" to
"local" when XXX_DIFF_LOCAL=1 so the make wrapper does not silently
override the script's local-mode resolution. mk/motif.mk additionally
sets MOTIF_DIFF_MAE_THRESHOLD to 0.15 and MOTIF_DIFF_CHANGED_THRESHOLD
to 0.32 to admit the GH runner baseline; the worst observed deltas
(ColorSel/colordemo at MAE 0.133, changed 0.293) are real
widget-geometry parity gaps between system Motif Xm and
libx11-compat-built Motif Xm (ColorSel left-pane width, Ext18List row
metrics, TabStack header padding, workspace/wsm geometry) that need
follow-up work inside libx11-compat / our Motif Xm patches rather
than CI-side regressions.
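
To make the threshold gate concrete, here is a minimal sketch of how a two-metric check could work. The metric definitions are assumptions: MAE is taken as the mean absolute difference of pixel values normalized to 0..1, and "changed" as the fraction of pixels that differ at all. The comparison tool the scripts actually use, and whether its limits are inclusive, are not stated in the commit message. Only the two threshold values and the observed worst case are taken from it.

```python
from typing import Sequence, Tuple

# Threshold values from mk/motif.mk as described in the commit message.
MAE_THRESHOLD = 0.15
CHANGED_THRESHOLD = 0.32


def diff_metrics(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (mae, changed) for two equally sized, normalized pixel lists.

    Assumed definitions: mae is the mean absolute difference, changed is
    the fraction of positions whose values differ.
    """
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    mae = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    changed = sum(1 for x, y in zip(a, b) if x != y) / len(a)
    return mae, changed


def within_thresholds(
    mae: float,
    changed: float,
    mae_limit: float = MAE_THRESHOLD,
    changed_limit: float = CHANGED_THRESHOLD,
) -> bool:
    """Pass only when both metrics are at or below their limits (assumed inclusive)."""
    return mae <= mae_limit and changed <= changed_limit


# The worst observed case (ColorSel/colordemo) fits under both limits.
print(within_thresholds(0.133, 0.293))
```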

tests/ui/replays/violawww-scroll-system.replay: wait-window budget
bumped from 3 seconds to 15 seconds with a 2 second post-wait-window
settle delay before the first interaction, and post-wheel delay from
1 second to 3 seconds. The smoke-job budgets were calibrated for a
single dedicated Xvfb; the differential workflow drives two Xvfb
instances and two captures concurrently in the same job, so vw
startup contends for CPU and disk against the system-side Motif and
vw builds, the window name can match before vw finishes parsing the
fixture HTML (so a wheel-down arriving immediately afterward gets
queued against a non-scrollable layout), and libx11-compat's
internal-backend event loop needs more wallclock to drain the
wheel-down sequence before the post-wheel screenshot is taken.
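
The wait-then-settle behaviour described above can be sketched as a polling loop. This is a hypothetical illustration of the timing logic only: the replay file format and its interpreter are not shown in this PR text, so the `wait_for_window` function, the `window_exists` probe, and the polling interval are assumptions. The 15-second budget and 2-second settle delay are the values from the commit message.

```python
import time
from typing import Callable


def wait_for_window(
    window_exists: Callable[[], bool],
    timeout_s: float = 15.0,   # wait-window budget
    settle_s: float = 2.0,     # extra delay after the window name matches
    poll_s: float = 0.1,       # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the window appears, then settle before the first interaction.

    Returns False if the window does not appear within timeout_s.
    """
    deadline = clock() + timeout_s
    while True:
        if window_exists():
            # The name can match before the client has finished laying out
            # its content, so wait a little longer before sending input.
            sleep(settle_s)
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

The settle delay is separate from the timeout because a matching window name only shows that the window is mapped, not that it is ready to scroll.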
@jserv jserv merged commit c950eb6 into main Jun 15, 2026
12 checks passed
@jserv jserv deleted the ci-refine branch June 15, 2026 22:04