Replace :latest diff CI with testthat output tests by annakrystalli · Pull Request #28 · hubverse-org/hubPredEvalsData-docker

annakrystalli · 2026-05-19T13:50:16Z

Summary

Replaces the :latest diff test step (which broke on any schema change since the prior image cannot produce output for a schema it doesn't know) with a fixture-based testthat suite that asserts the docker pipeline's predevals output shape: file presence per (target, eval_set, disaggregation), predevals-options.json contract (target list, per-target metrics after the _scaled_relative_skill rewrite), and non-empty scores.csv.
Test fixtures: hubverse-org/dashboard-test-hub + hubverse-org/dashboard-test-hub-dashboard (both pinned to main). 35 testthat assertions pass on the current three-target config (wk inc flu hosp, wk inc flu death, wk flu hosp rate category).
Adds testthat as a renv dependency. Lockfile also has minor / patch bumps to seven transitives required by testthat; fs is the only major bump (1.6.6 to 2.1.0), and the fs API used in create-predevals-data.R (fs::path, fs::dir_ls) is stable across it.
Intentionally does not re-test what hubPredEvalsData / hubEvals / scoringutils unit-test (column types, value bounds, all-NA guards). This CI's job is packaging + structural contract, not numerical correctness.
Tactical pin: also pins the rocker/r-ver base image from the floating :4 tag to 4.5.2. The base tag floated to R 4.6.0 on 2026-04-24 and the pinned source packages in renv.lock no longer compile against R 4.6 (rlang / vctrs / arrow private-API removals). This is purely an unblock so CI on this PR can run; the proper fix (configure PPM binaries + refresh lockfile) is tracked in #24 (Production Docker build failing on arrow source compile after rocker R 4.6 base bump).

Closes #23.

Test plan

Build job completes with the pinned R 4.5.2 base (verifies new renv.lock, including the fs major bump, restores cleanly inside the image)
Test job runs the pipeline against dashboard-test-hub + dashboard-test-hub-dashboard main, then test.R runs 35 testthat assertions and they all pass
Confirm publish job does NOT run on this PR
(After merge) Confirm publish job does NOT run on push to main either; only on tag push

Follow-ups

#24, Production Docker build failing on arrow source compile after rocker R 4.6 base bump (the proper rocker / renv.lock fix; this PR's R 4.5.2 pin is tactical pending that work)
#27, Update expected_* lists once hubPredEvalsData v1.1.0 lands on :latest (covers rps + transform-suffixed columns)

The previous test job compared a PR's image output against `:latest`'s output of the same hub, which broke on any schema change (the `:latest` image cannot emit output conformant to a schema it doesn't yet know). That circular dependency is exactly what the dashboard release-deployment RFC (2026-03-10) calls out: the output schema is the contract, not a prior image's output.

Closes #23.

Changes:

- `scripts/test.R`: thin wrapper that calls `testthat::test_file(stop_on_failure = TRUE)`, same shape as r-lib's check-r-package step.
- `tests/testthat/test-predevals-output.R`: assertions against the pipeline output. Pinned to dashboard-test-hub + dashboard-test-hub-dashboard as versioned fixtures. Tests the predevals-options.json contract (presence, target list, per-target metrics after the `_scaled_relative_skill` rewrite) and that the expected scores.csv files exist and are non-empty. Intentionally does not re-test what hubPredEvalsData / hubEvals / scoringutils unit-test (column types, value bounds, all-NA guards).
- `.github/workflows/build-container.yaml`: test job now clones dashboard-test-hub, fetches the live predevals-config from dashboard-test-hub-dashboard main, runs the pipeline against the PR image, then runs `test.R`. No third-party actions.
- `Dockerfile`: copies `tests/testthat/test-predevals-output.R` into the image alongside `scripts/test.R`.
- `DESCRIPTION` + `renv.lock`: adds testthat (test-time dependency). Lockfile also has minor / patch bumps to seven transitives required by testthat. fs is the only major bump (1.6.6 to 2.1.0). The fs API used in scripts/create-predevals-data.R (`fs::path`, `fs::dir_ls`) is stable across this bump.

Follow-ups tracked in:

- #27 (update expected_* for rps + transform-suffixed columns once hubPredEvalsData v1.1.0 lands on :latest)
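A minimal sketch of the kind of file-presence and non-empty check described above; it is not the actual contents of `tests/testthat/test-predevals-output.R`. The directory layout (`scores/<target>/<eval_set>/[<disaggregation>/]scores.csv`), the helper name `expected_score_files()`, and the eval-set and disaggregation values are assumptions made for illustration. It builds a throwaway fixture instead of reading real pipeline output, and it leaves out the predevals-options.json checks.

```r
library(testthat)

# Hypothetical helper: expected scores.csv paths for every
# (target, eval_set, disaggregation) combination. An empty string
# stands for the non-disaggregated scores file.
expected_score_files <- function(out_dir, targets, eval_sets, disaggregations) {
  combos <- expand.grid(
    target = targets,
    eval_set = eval_sets,
    disagg = c("", disaggregations),
    stringsAsFactors = FALSE
  )
  dirs <- file.path(out_dir, "scores", combos$target, combos$eval_set)
  has_disagg <- nzchar(combos$disagg)
  dirs[has_disagg] <- file.path(dirs[has_disagg], combos$disagg[has_disagg])
  file.path(dirs, "scores.csv")
}

# Throwaway fixture that mimics the assumed output layout.
out_dir <- tempfile("predevals-out-")
paths <- expected_score_files(
  out_dir,
  targets = c("wk inc flu hosp", "wk inc flu death"),
  eval_sets = "Full season",      # assumed eval set name
  disaggregations = "location"    # assumed disaggregation
)
for (p in paths) {
  dir.create(dirname(p), recursive = TRUE, showWarnings = FALSE)
  write.csv(data.frame(model_id = "baseline", wis = 1.5), p, row.names = FALSE)
}

# Structural checks only: presence and at least one row per file.
test_that("every expected scores.csv exists and is non-empty", {
  for (p in paths) {
    expect_true(file.exists(p), info = p)
    expect_gt(nrow(read.csv(p)), 0)
  }
})
```

Deriving the expected paths from the target, eval-set, and disaggregation lists keeps the test tied to the output contract rather than to a prior image's output.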

`rocker/r-ver:4` floated to R 4.6.0 on 2026-04-24, and the pinned source packages in renv.lock don't compile against R 4.6 (rlang / vctrs / arrow all hit private-API removals). The production Dockerfile uses CRAN as its only repo, so renv has to compile from source rather than pull binaries. Tactical pin so this PR's build job can succeed. The proper fix (configure p3m.dev as a binary repo + refresh renv.lock) is tracked in #24; this pin should be revisited then. Refs #24.

testthat::test_file() changes the working directory to the test file's directory before running. When test.R was invoked with a relative `-o` (as the CI workflow does with `-o out`), the env var still held the relative path, so test-predevals-output.R resolved `out/...` against /usr/local/bin/ instead of /project/. Tests failed with "out/predevals-options.json: No such file or directory" even though create-predevals-data.R had just written it. Resolve to absolute via normalizePath() before exporting. Smoke-tested against the workflow's relative-path invocation pattern; 35/35 pass.
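The failure mode can be reproduced in base R without testthat. In this sketch `setwd()` stands in for the directory change that `testthat::test_file()` performs, two temp directories stand in for /project/ and /usr/local/bin/, and the env var name `PREDEVALS_OUT_DIR` is hypothetical (the commit message does not name the variable).

```r
# Stand-ins for /project/ (where the pipeline writes) and /usr/local/bin/
# (where the test file lives); both are throwaway temp directories.
project_dir <- tempfile("project-")
test_dir <- tempfile("bin-")
dir.create(file.path(project_dir, "out"), recursive = TRUE)
dir.create(test_dir)
writeLines("{}", file.path(project_dir, "out", "predevals-options.json"))

old_wd <- setwd(project_dir)

# Exporting the relative path as-is: it only resolves from project_dir.
Sys.setenv(PREDEVALS_OUT_DIR = "out")
setwd(test_dir)  # mimics the directory change made before the tests run
found_relative <- file.exists(
  file.path(Sys.getenv("PREDEVALS_OUT_DIR"), "predevals-options.json")
)

# Resolving to an absolute path before exporting survives the change.
setwd(project_dir)
Sys.setenv(PREDEVALS_OUT_DIR = normalizePath("out", mustWork = TRUE))
setwd(test_dir)
found_absolute <- file.exists(
  file.path(Sys.getenv("PREDEVALS_OUT_DIR"), "predevals-options.json")
)

setwd(old_wd)
print(c(relative = found_relative, absolute = found_absolute))
```

With the relative path the lookup fails after the directory change, while the normalized path still finds the file, which matches the behaviour described in the commit message.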

matthewcornell

I think this looks great, Anna. The strategy change is smart. I'm not super well-versed in the repo, but I don't see anything worrisome that stands out.

annakrystalli added 3 commits May 19, 2026 16:49

matthewcornell approved these changes May 20, 2026

annakrystalli merged commit 2d3bb69 into main May 21, 2026
5 checks passed

annakrystalli deleted the ak/predevals-output-tests/23 branch May 21, 2026 08:19
