Replace :latest diff CI with testthat output tests #28
Merged
The previous test job compared a PR's image output against `:latest`'s output of the same hub, which broke on any schema change (the `:latest` image cannot emit output conformant to a schema it doesn't yet know). That circular dependency is exactly what the dashboard release-deployment RFC (2026-03-10) calls out: the output schema is the contract, not a prior image's output. Closes #23.

Changes:
- `scripts/test.R`: thin wrapper that calls `testthat::test_file(stop_on_failure = TRUE)`, same shape as r-lib's check-r-package step.
- `tests/testthat/test-predevals-output.R`: assertions against the pipeline output. Pinned to dashboard-test-hub + dashboard-test-hub-dashboard as versioned fixtures. Tests the `predevals-options.json` contract (presence, target list, per-target metrics after the `_scaled_relative_skill` rewrite) and that the expected `scores.csv` files exist and are non-empty. Intentionally does not re-test what hubPredEvalsData / hubEvals / scoringutils unit-test (column types, value bounds, all-NA guards).
- `.github/workflows/build-container.yaml`: test job now clones dashboard-test-hub, fetches the live predevals-config from dashboard-test-hub-dashboard main, runs the pipeline against the PR image, then runs `test.R`. No third-party actions.
- `Dockerfile`: copies `tests/testthat/test-predevals-output.R` into the image alongside `scripts/test.R`.
- `DESCRIPTION` + `renv.lock`: adds testthat (test-time dependency). Lockfile also has minor / patch bumps to seven transitives required by testthat. fs is the only major bump (1.6.6 to 2.1.0). The fs API used in `scripts/create-predevals-data.R` (`fs::path`, `fs::dir_ls`) is stable across this bump.

Follow-ups tracked in:
- #27 (update `expected_*` for rps + transform-suffixed columns once hubPredEvalsData v1.1.0 lands on `:latest`)
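The kind of structural assertion described above might look roughly like the following sketch. It is not the contents of `test-predevals-output.R`: the directory layout (`scores/<target>/scores.csv`), the JSON keys (`targets`, `target_id`, `metrics`), and the throwaway fixture are all assumptions made so the example runs on its own. It assumes `testthat` and `jsonlite` are installed.

```r
library(testthat)

# Hypothetical fixture: layout and JSON keys are illustrative assumptions,
# not the real pipeline output.
out_dir <- file.path(tempdir(), "predevals-sketch")
targets <- c("wk inc flu hosp", "wk inc flu death")
scores_dir <- file.path(out_dir, "scores", targets)

for (d in scores_dir) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
  writeLines(c("model_id,wis", "m1,0.5"), file.path(d, "scores.csv"))
}

jsonlite::write_json(
  list(targets = lapply(targets, function(t) {
    list(target_id = t, metrics = list("wis"))
  })),
  file.path(out_dir, "predevals-options.json"),
  auto_unbox = TRUE
)

# Contract check: the options file exists and names exactly the expected targets.
test_that("predevals-options.json lists the expected targets", {
  path <- file.path(out_dir, "predevals-options.json")
  expect_true(file.exists(path))
  opts <- jsonlite::read_json(path)
  ids <- vapply(opts$targets, function(x) x$target_id, character(1))
  expect_setequal(ids, targets)
})

# Structural check only: presence and non-emptiness, no numerical validation.
test_that("each target has a non-empty scores.csv", {
  for (d in scores_dir) {
    f <- file.path(d, "scores.csv")
    expect_true(file.exists(f))
    expect_gt(file.size(f), 0)
  }
})
```

Checks like these stay valid across schema changes because they compare the output with a declared expectation rather than with another image's output.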
`rocker/r-ver:4` floated to R 4.6.0 on 2026-04-24, and the pinned source packages in renv.lock don't compile against R 4.6 (rlang / vctrs / arrow all hit private-API removals). The production Dockerfile uses CRAN as its only repo, so renv has to compile from source rather than pull binaries. Tactical pin so this PR's build job can succeed. The proper fix (configure p3m.dev as a binary repo + refresh renv.lock) is tracked in #24; this pin should be revisited then. Refs #24.
`testthat::test_file()` changes the working directory to the test file's directory before running. When `test.R` was invoked with a relative `-o` (as the CI workflow does with `-o out`), the env var still held the relative path, so `test-predevals-output.R` resolved `out/...` against /usr/local/bin/ instead of /project/. Tests failed with "out/predevals-options.json: No such file or directory" even though `create-predevals-data.R` had just written it. Resolve to absolute via `normalizePath()` before exporting. Smoke-tested against the workflow's relative-path invocation pattern; 35/35 pass.
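The failure mode and the fix can be reproduced with base R alone. In this sketch, `setwd()` stands in for the directory change that `test_file()` performs, the two temporary directories stand in for /project/ and /usr/local/bin/, and the env var name `PREDEVALS_OUT_DIR` is a hypothetical placeholder (the commit message does not name the variable).

```r
# Stand-ins for /project/ (where the pipeline writes) and /usr/local/bin/
# (where the test file lives). Both are temporary, illustrative paths.
project_dir <- file.path(tempdir(), "project")
test_file_dir <- file.path(tempdir(), "bin")
dir.create(file.path(project_dir, "out"), recursive = TRUE, showWarnings = FALSE)
dir.create(test_file_dir, recursive = TRUE, showWarnings = FALSE)
writeLines("{}", file.path(project_dir, "out", "predevals-options.json"))

old_wd <- setwd(project_dir)

rel_dir <- "out"
# The fix: resolve to an absolute path while the working directory is still
# the one the relative path was given against.
abs_dir <- normalizePath(rel_dir, mustWork = TRUE)
Sys.setenv(PREDEVALS_OUT_DIR = abs_dir)  # hypothetical env var name

# Mimic test_file() moving into the test file's directory.
setwd(test_file_dir)
rel_found <- file.exists(file.path(rel_dir, "predevals-options.json"))
abs_found <- file.exists(
  file.path(Sys.getenv("PREDEVALS_OUT_DIR"), "predevals-options.json")
)

setwd(old_wd)
```

After the directory change, the relative path no longer points at the output (`rel_found` is `FALSE`), while the path resolved beforehand still does (`abs_found` is `TRUE`).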
This was referenced May 19, 2026
matthewcornell
approved these changes
May 20, 2026
Contributor
matthewcornell
left a comment
I think this looks great, Anna. The strategy change is smart. I'm not super well-versed in the repo, but I don't see anything worrisome that stands out.
Summary
- Replaces the `:latest` diff test step (which broke on any schema change since the prior image cannot produce output for a schema it doesn't know) with a fixture-based testthat suite that asserts the docker pipeline's predevals output shape: file presence per (target, eval_set, disaggregation), `predevals-options.json` contract (target list, per-target metrics after the `_scaled_relative_skill` rewrite), and non-empty `scores.csv`.
- Fixtures: `hubverse-org/dashboard-test-hub` + `hubverse-org/dashboard-test-hub-dashboard` (both pinned to `main`). 35 testthat assertions pass on the current three-target config (`wk inc flu hosp`, `wk inc flu death`, `wk flu hosp rate category`).
- Adds `testthat` as a renv dependency. Lockfile also has minor / patch bumps to seven transitives required by testthat; `fs` is the only major bump (1.6.6 to 2.1.0), and the API used in `create-predevals-data.R` (`fs::path`, `fs::dir_ls`) is stable.
- Intentionally does not re-test what `hubPredEvalsData` / `hubEvals` / `scoringutils` unit-test (column types, value bounds, all-NA guards). This CI's job is packaging + structural contract, not numerical correctness.
- Pins `rocker/r-ver:4` to `4.5.2`. The base tag floated to R 4.6.0 on 2026-04-24 and the pinned source packages in renv.lock no longer compile against R 4.6 (rlang / vctrs / arrow private-API removals). Pure unblock so CI on this PR can run; proper fix (configure PPM binaries + refresh lockfile) tracked in #24 (Production Docker build failing on `arrow` source compile after rocker R 4.6 base bump).

Closes #23.
Test plan
- Build job succeeds (`renv.lock`, including the `fs` major bump, restores cleanly inside the image)
- Test job runs the pipeline against `dashboard-test-hub` + `dashboard-test-hub-dashboard` main, then `test.R` runs 35 testthat assertions and they all pass

Follow-ups
- Production Docker build failing on `arrow` source compile after rocker R 4.6 base bump #24 (the proper rocker / renv.lock fix; this PR's R 4.5.2 pin is tactical pending that work)
- #27 (update `expected_*` lists for rps + transform-suffixed columns once `hubPredEvalsData` v1.1.0 lands on `:latest`)