
Replace :latest diff CI with testthat output tests #28

Merged
annakrystalli merged 3 commits into main from ak/predevals-output-tests/23
May 21, 2026



@annakrystalli (Member) commented May 19, 2026

Summary

  • Replaces the `:latest` diff test step (which broke on any schema change, because the prior `:latest` image cannot produce output for a schema it doesn't know) with a fixture-based testthat suite that asserts the shape of the Docker pipeline's predevals output: file presence per (target, eval_set, disaggregation), the predevals-options.json contract (target list, per-target metrics after the `_scaled_relative_skill` rewrite), and non-empty scores.csv files.
  • Test fixtures: hubverse-org/dashboard-test-hub + hubverse-org/dashboard-test-hub-dashboard (both pinned to main). 35 testthat assertions pass on the current three-target config (wk inc flu hosp, wk inc flu death, wk flu hosp rate category).
  • Adds testthat as a renv dependency. The lockfile also includes minor / patch bumps to seven transitive dependencies required by testthat; fs is the only major bump (1.6.6 to 2.1.0), and the fs API used in create-predevals-data.R (`fs::path`, `fs::dir_ls`) is stable across it.
  • Intentionally does not re-test what hubPredEvalsData / hubEvals / scoringutils already unit-test (column types, value bounds, all-NA guards). This CI's job is packaging and the structural output contract, not numerical correctness.
  • Tactical pin: the `rocker/r-ver:4` base image is also pinned to 4.5.2. The floating `4` tag moved to R 4.6.0 on 2026-04-24, and the pinned source packages in renv.lock no longer compile against R 4.6 (rlang / vctrs / arrow private-API removals). This is purely an unblock so CI on this PR can run; the proper fix (configure PPM binaries + refresh the lockfile) is tracked in #24 (Production Docker build failing on arrow source compile after rocker R 4.6 base bump).
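
The file-presence part of the contract summarized above can be sketched in base R: build the expected scores.csv path for every (target, eval_set, disaggregation) combination and report any that are missing or have no data rows. The `scores/<target>/<eval_set>/<disaggregation>/scores.csv` layout and both helper names are hypothetical, chosen only for illustration; the actual suite uses testthat expectations and also checks predevals-options.json, which this sketch leaves out.

```r
# Sketch only: the directory layout below is an assumption, not the
# documented predevals output layout.
# Build one expected scores.csv path per (target, eval_set, disaggregation).
expected_scores_paths <- function(out_dir, targets, eval_sets, disaggregations) {
  grid <- expand.grid(
    target = targets,
    eval_set = eval_sets,
    disaggregation = disaggregations,
    stringsAsFactors = FALSE
  )
  file.path(out_dir, "scores", grid$target, grid$eval_set,
            grid$disaggregation, "scores.csv")
}

# Return the expected paths that are missing, zero-byte, or header-only.
problem_scores_files <- function(paths) {
  is_bad <- vapply(paths, function(p) {
    if (!file.exists(p) || file.size(p) == 0) return(TRUE)
    nrow(utils::read.csv(p)) == 0
  }, logical(1))
  paths[is_bad]
}
```

Inside a testthat file, wrapping this as `expect_equal(problem_scores_files(paths), character())` would give a single assertion whose failure output should list the offending paths, rather than one assertion per file.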

Closes #23.

Test plan

  • Build job completes with the pinned R 4.5.2 base (verifies new renv.lock, including the fs major bump, restores cleanly inside the image)
  • Test job runs the pipeline against dashboard-test-hub + dashboard-test-hub-dashboard main, then test.R runs 35 testthat assertions and they all pass
  • Confirm publish job does NOT run on this PR
  • (After merge) Confirm publish job does NOT run on push to main either, only on tag push

Follow-ups

The previous test job compared a PR's image output against `:latest`'s
output of the same hub, which broke on any schema change (the `:latest`
image cannot emit output conformant to a schema it doesn't yet know).
That circular dependency is exactly what the dashboard release-deployment
RFC (2026-03-10) calls out: the output schema is the contract, not a
prior image's output.

Closes #23.

Changes:
- `scripts/test.R`: thin wrapper that calls
  `testthat::test_file(stop_on_failure = TRUE)`, same shape as r-lib's
  check-r-package step.
- `tests/testthat/test-predevals-output.R`: assertions against the
  pipeline output. Pinned to dashboard-test-hub +
  dashboard-test-hub-dashboard as versioned fixtures. Tests the
  predevals-options.json contract (presence, target list, per-target
  metrics after the `_scaled_relative_skill` rewrite) and that the
  expected scores.csv files exist and are non-empty. Intentionally does
  not re-test what hubPredEvalsData / hubEvals / scoringutils unit-test
  (column types, value bounds, all-NA guards).
- `.github/workflows/build-container.yaml`: test job now clones
  dashboard-test-hub, fetches the live predevals-config from
  dashboard-test-hub-dashboard main, runs the pipeline against the PR
  image, then runs `test.R`. No third-party actions.
- `Dockerfile`: copies `tests/testthat/test-predevals-output.R` into the
  image alongside `scripts/test.R`.
- `DESCRIPTION` + `renv.lock`: adds testthat (test-time dependency).
  Lockfile also has minor / patch bumps to seven transitives required by
  testthat. fs is the only major bump (1.6.6 to 2.1.0). The fs API used
  in scripts/create-predevals-data.R (`fs::path`, `fs::dir_ls`) is
  stable across this bump.
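
The `scripts/test.R` wrapper described above can be sketched in base R as follows. This is illustrative rather than the repository's script: the `-o` parsing, the helper names, and the `PREDEVALS_OUTPUT_DIR` environment variable are assumptions. It resolves the output directory to an absolute path before exporting it, because `testthat::test_file()` changes the working directory before running the tests.

```r
# Hypothetical test.R-style wrapper; names and the env var are illustrative.

# Return the value that follows `-o` in a vector of command-line arguments.
parse_output_dir <- function(args) {
  i <- match("-o", args)
  if (is.na(i) || i == length(args)) {
    stop("usage: Rscript test.R -o <output-dir>", call. = FALSE)
  }
  args[[i + 1]]
}

# Export the output directory for the tests, then run a single test file.
run_output_tests <- function(test_path, output_dir) {
  # Resolve to an absolute path first: test_file() switches the working
  # directory to the test file's directory, where a relative path would
  # point somewhere else. mustWork = TRUE errors early if it is missing.
  Sys.setenv(PREDEVALS_OUTPUT_DIR = normalizePath(output_dir, mustWork = TRUE))
  # stop_on_failure = TRUE turns any test failure into an error, so the
  # Rscript process exits non-zero and the CI step fails.
  testthat::test_file(test_path, stop_on_failure = TRUE)
}

# Invocation from the script body would be along the lines of:
# run_output_tests("test-predevals-output.R",
#                  parse_output_dir(commandArgs(trailingOnly = TRUE)))
```

Keeping argument parsing separate from the testthat call lets the parsing be checked without testthat installed; only `run_output_tests()` needs it.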

Follow-ups tracked in:
- #27 (update expected_* for rps + transform-suffixed columns once
  hubPredEvalsData v1.1.0 lands on :latest)

`rocker/r-ver:4` floated to R 4.6.0 on 2026-04-24, and the pinned
source packages in renv.lock don't compile against R 4.6 (rlang /
vctrs / arrow all hit private-API removals). The production Dockerfile
uses CRAN as its only repo, so renv has to compile from source rather
than pull binaries.

Tactical pin so this PR's build job can succeed. The proper fix
(configure p3m.dev as a binary repo + refresh renv.lock) is tracked in
#24; this pin should be revisited then.

Refs #24.

testthat::test_file() changes the working directory to the test file's
directory before running. When test.R was invoked with a relative `-o`
(as the CI workflow does with `-o out`), the exported environment variable
still held that relative path, so test-predevals-output.R resolved `out/...`
against /usr/local/bin/ instead of /project/. Tests failed with
"out/predevals-options.json: No such file or directory" even though
create-predevals-data.R had just written it.

Resolve to absolute via normalizePath() before exporting. Smoke-tested
against the workflow's relative-path invocation pattern; 35/35 pass.
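
The failure mode can be reproduced with base R alone. The sketch below mimics test_file()'s directory switch with setwd(); the throwaway directory layout and the helper name are hypothetical. A relative `out` stops resolving once the working directory moves, while the normalizePath() result keeps working.

```r
# Throwaway "project" root with an output file and a separate tests/ dir.
project <- tempfile("project-")
dir.create(file.path(project, "out"), recursive = TRUE)
dir.create(file.path(project, "tests"))
invisible(file.create(file.path(project, "out", "predevals-options.json")))
tests_dir <- file.path(project, "tests")

# Can a test running with `test_dir` as its working directory see the file?
options_visible_from <- function(output_dir, test_dir) {
  old <- setwd(test_dir)          # mimic test_file()'s directory switch
  on.exit(setwd(old), add = TRUE) # restore the caller's working directory
  file.exists(file.path(output_dir, "predevals-options.json"))
}

old_wd <- setwd(project)                      # CLI invoked from project root
relative_dir <- "out"                         # as passed via `-o out`
absolute_dir <- normalizePath(relative_dir)   # the fix: resolve before export
setwd(old_wd)

options_visible_from(relative_dir, tests_dir) # FALSE: resolves to tests/out
options_visible_from(absolute_dir, tests_dir) # TRUE
```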

@matthewcornell (Contributor) left a comment


I think this looks great, Anna. The strategy change is smart. I'm not super well-versed in the repo, but I don't see anything worrisome that stands out.

@annakrystalli merged commit 2d3bb69 into main on May 21, 2026
5 checks passed
@annakrystalli deleted the ak/predevals-output-tests/23 branch on May 21, 2026 08:19


Development

Linked issue: Replace comparison-based CI with independent output-contract validation
