diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index 4f377582a..dac59c7a9 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -2,12 +2,17 @@ ## Repository Overview -**serodynamics** is an R package for modeling longitudinal antibody responses to infection. It implements Bayesian MCMC methods using JAGS (Just Another Gibbs Sampler) to estimate antibody dynamic curve parameters including baseline concentration, peak concentration, time to peak, shape parameter, and decay rate. +**serodynamics** is an R package for modeling longitudinal antibody responses to infection. It implements Bayesian MCMC methods to estimate antibody dynamic curve parameters including baseline concentration, peak concentration, time to peak, shape parameter, and decay rate. + +The package supports two Bayesian modeling backends: +- **JAGS** (Just Another Gibbs Sampler) via `runjags` - the original implementation +- **Stan** via `cmdstanr` - a modern, efficient alternative (optional) - **Type**: R package (statistical modeling) - **Size**: ~121MB, ~209 files, ~76 R source files, ~1,664 lines of R code - **Language**: R (>= 4.1.0) - **Key Dependencies**: runjags, rjags, JAGS 4.3.1, serocalculator, ggmcmc, dplyr, ggplot2 +- **Optional Dependencies**: cmdstanr (for Stan support) - **Lifecycle**: Experimental (under active development) ## Critical Setup Requirements @@ -213,9 +218,9 @@ devtools::check_man() - **Windows**: Install Rtools from https://cran.r-project.org/bin/windows/Rtools/ (choose version matching your R version) -### JAGS Installation (REQUIRED) +### JAGS Installation (REQUIRED for JAGS models) -**ALWAYS install JAGS before attempting to build, test, or run this package.** The package will fail without it. +**Install JAGS if you plan to use `run_mod()` with JAGS models.** JAGS is required for the original JAGS-based modeling functions but is not needed if you only use Stan models via `run_mod_stan()`. #### Installing JAGS in Docker (if using rocker/verse) @@ -250,6 +255,25 @@ runjags::findJAGS() runjags::testjags() ``` +### Stan Installation (OPTIONAL for Stan models) + +**Install cmdstanr and CmdStan if you plan to use `run_mod_stan()`.** Stan support is optional and provides a modern alternative to JAGS. + +```r +# Install cmdstanr from r-universe +install.packages("cmdstanr", + repos = c("https://stan-dev.r-universe.dev", + getOption("repos"))) + +# Then install CmdStan +cmdstanr::install_cmdstan() + +# Verify installation +cmdstanr::cmdstan_version() +``` + +See the [cmdstanr documentation](https://mc-stan.org/cmdstanr/) for more details and troubleshooting. + ## Build and Development Workflow ### Initial Setup @@ -374,10 +398,14 @@ Team members can trigger actions by commenting on PRs: ### Key Directories -- **R/**: Package source code (30 R files) +- **R/**: Package source code (30+ R files) - `Run_Mod.R`: Main function to run JAGS Bayesian models + - `run_mod_stan.R`: Main function to run Stan Bayesian models (new) - `as_case_data.R`: Convert data to case_data class - - `prep_data.r`, `prep_priors.R`: Data preparation for JAGS + - `prep_data.r`: Data preparation for JAGS + - `prep_data_stan.R`: Data preparation for Stan (new) + - `prep_priors.R`: Prior preparation for JAGS + - `prep_priors_stan.R`: Prior preparation for Stan (new) - `sim_case_data.R`: Simulate case data for testing - `post_summ.R`, `postprocess_jags_output.R`: Post-processing JAGS results - `plot_*.R`: Diagnostic plotting functions (trace, density, Rhat, effective sample size) @@ -397,8 +425,13 @@ Team members can trigger actions by commenting on PRs: - **data-raw/**: Raw data processing scripts (not included in package build) - **inst/**: Installed files - - `inst/extdata/`: JAGS model files (`model.jags`, `model.dobson.jags`), example CSV data + - `inst/extdata/`: Model files + - `model.jags`, `model.dobson.jags`: JAGS model files + - `model.stan`, `model.dobson.stan`: Stan model files (new) + - Example CSV data files - `inst/examples/`: Example R scripts for documentation + - JAGS examples: `run_mod-examples.R`, `examples-prep_priors.R` + - Stan examples: `run_mod_stan-examples.R`, `examples-prep_priors_stan.R`, `examples-prep_data_stan.R` (new) - `inst/WORDLIST`: Custom spelling dictionary - **vignettes/**: Package vignettes @@ -527,6 +560,7 @@ expect_false(has_missing_values(complete_data)) - **Messaging**: Use `cli::cli_*()` functions for all user-facing messages - **No `library()` in package code**: Use `::` or DESCRIPTION Imports - **Document all exports**: Use roxygen2 (@title, @description, @param, @returns, @examples) +- **Use markdown syntax in roxygen2**: Use markdown syntax (numbered lists: `1.`, `2.`, etc.; bullet lists: `-` or `*`) instead of Rd syntax (`\enumerate{}`, `\itemize{}`, `\item`) in roxygen2 documentation comments - **Test snapshot changes**: Use `testthat::announce_snapshot_file()` for CSV snapshots - **Seed tests**: Use `withr::local_seed()` for reproducible tests - **Prefer data-first pipelines**: Design and call functions so the primary data object flows through `|>` naturally @@ -581,10 +615,39 @@ These instructions have been validated against the actual repository structure, 9. **ALWAYS** run tests before committing (`devtools::test()`) 10. **ALWAYS** check and fix lintr issues in changed files in PRs before committing 11. **ALWAYS** run `devtools::document()` before requesting PR review -12. **ALWAYS** make sure `devtools::check()` passes before requesting PR review -13. **ALWAYS** make sure `devtools::spell_check()` passes before requesting PR review -14. **ALWAYS** run `pkgdown::build_site()` before requesting PR review to ensure the pkgdown site builds successfully -15. **ALWAYS** verify Quarto documents render successfully locally - don't rely on CI workflows. For vignettes and articles, test rendering with `quarto render path/to/file.qmd` or by building the full site with `pkgdown::build_site()` -16. When `pkgdown::build_site()` has errors related to Quarto, use `quarto::quarto_render(input = "path/to/file.qmd", quiet = FALSE)` to debug and see detailed error messages +12. **ALWAYS** run `lintr::lint_package()` before requesting PR review and fix all linting issues +13. **ALWAYS** run `devtools::check()` before requesting PR review and ensure it passes with 0 errors, 0 warnings, 0 notes +14. **ALWAYS** make sure `spelling::spell_check_package()` passes before requesting PR review +15. **ALWAYS** run `pkgdown::build_site()` before requesting PR review to ensure the pkgdown site builds successfully +16. **ALWAYS** verify Quarto documents render successfully locally - don't rely on CI workflows. For vignettes and articles, test rendering with `quarto render path/to/file.qmd` or by building the full site with `pkgdown::build_site()` +17. When `pkgdown::build_site()` has errors related to Quarto, use `quarto::quarto_render(input = "path/to/file.qmd", quiet = FALSE)` to debug and see detailed error messages + +**CRITICAL PRE-REVIEW VALIDATION**: Before requesting PR review, you MUST run the following validation commands locally and ensure they all pass: +```r +# 1. Lint the package - must have 0 linting issues +lintr::lint_package() + +# 2. Run R CMD check - must pass with 0 errors, 0 warnings, 0 notes +devtools::check() + +# 3. Check spelling - must have no spelling errors +spelling::spell_check_package() + +# 4. Build pkgdown site - must build successfully +pkgdown::build_site() +``` + +Do NOT request review if any of these checks fail. Fix all issues first, then re-run the checks to verify. + +**AUTOMATED CODE REVIEW ITERATION**: After completing all changes and passing the validation checks above, you MUST iterate with the automated code review (parallel_validation tool) until it provides no further valid feedback: + +1. Run `parallel_validation` with appropriate PR title, description, and triviality assessment +2. Carefully review all feedback from both Code Review and CodeQL Security Scan +3. Address all valid issues identified by the automated review +4. Re-run `parallel_validation` after making changes +5. Repeat steps 2-4 until the automated review provides no further valid feedback +6. Only then request human review + +This iterative process catches issues early and ensures high code quality before human review. Only search for additional information if these instructions are incomplete or incorrect for your specific task. diff --git a/.github/workflows/copilot-setup-steps.yml b/.github/workflows/copilot-setup-steps.yml index 251368ad7..2aec2bce1 100644 --- a/.github/workflows/copilot-setup-steps.yml +++ b/.github/workflows/copilot-setup-steps.yml @@ -106,6 +106,19 @@ jobs: runjags::testjags() shell: Rscript {0} + # Install cmdstanr for Stan modeling support (optional) + # See .github/copilot-instructions.md "Stan Installation" section + - name: Install cmdstanr + run: | + install.packages("cmdstanr", + repos = c("https://stan-dev.r-universe.dev", + "https://cloud.r-project.org"), + verbose = TRUE) + cmdstanr::install_cmdstan(cores = 2) + cat("cmdstanr version:", as.character(packageVersion("cmdstanr")), "\n") + cat("CmdStan version:", cmdstanr::cmdstan_version(), "\n") + shell: Rscript {0} + # Verify JAGS installation # See .github/copilot-instructions.md "JAGS Installation" section for details - name: Verify JAGS installation @@ -144,7 +157,7 @@ jobs: # Display key installed packages cat("Key installed packages:\n") - key_packages <- c("devtools", "rjags", "runjags", "rcmdcheck", "lintr", "spelling", "testthat") + key_packages <- c("devtools", "rjags", "runjags", "cmdstanr", "rcmdcheck", "lintr", "spelling", "testthat") for (pkg in key_packages) { if (requireNamespace(pkg, quietly = TRUE)) { cat(" -", pkg, ":", as.character(packageVersion(pkg)), "\n") diff --git a/.gitignore b/.gitignore index e21e54259..db0c63994 100644 --- a/.gitignore +++ b/.gitignore @@ -28,3 +28,5 @@ docs/ **/*.quarto_ipynb README.html +*.knit.* +..Rcheck/ diff --git a/DESCRIPTION b/DESCRIPTION index 708b44803..15b07ffb6 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: serodynamics Title: What the Package Does (One Line, Title Case) -Version: 0.0.0.9050 +Version: 0.0.0.9051 Authors@R: c( person("Peter", "Teunis", , "p.teunis@emory.edu", role = c("aut", "cph"), comment = "Author of the method and original code."), @@ -13,8 +13,8 @@ Description: Modeling longitudinal immune seroresponses to pathogens. License: MIT + file LICENSE Encoding: UTF-8 Roxygen: list(markdown = TRUE) -URL: https://github.com/UCD-SERG/serodynamics, https://ucd-serg.github.io/serodynamics/ -BugReports: https://github.com/UCD-SERG/serodynamics/issues +URL: https://github.com/ucdavis/serodynamics, https://ucdavis.github.io/serodynamics/ +BugReports: https://github.com/ucdavis/serodynamics/issues Imports: cli, coda, @@ -32,6 +32,8 @@ Imports: tidyselect, utils Suggests: + badger, + cmdstanr, Hmisc, knitr, readr, @@ -62,6 +64,7 @@ Config/needs/test: rlist Config/needs/check: commonmark, xml2 Remotes: + stan-dev/cmdstanr, ucd-serg/serocalculator Depends: R (>= 4.1.0) diff --git a/NAMESPACE b/NAMESPACE index 2db4d027e..c81ca7c3f 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -16,8 +16,12 @@ export(plot_predicted_curve) export(post_summ) export(postprocess_jags_output) export(prep_data) +export(prep_data_stan) export(prep_priors) +export(prep_priors_stan) export(run_mod) +export(run_mod_stan) +export(sample_predictive_stan) export(serodynamics_example) export(sim_case_data) export(sim_n_obs) @@ -45,6 +49,8 @@ importFrom(serocalculator,get_values) importFrom(serocalculator,ids) importFrom(serocalculator,ids_varname) importFrom(stats,complete.cases) +importFrom(stats,median) importFrom(stats,quantile) +importFrom(stats,rnorm) importFrom(tidyr,pivot_wider) importFrom(utils,read.csv) diff --git a/NEWS.md b/NEWS.md index 3b03949c2..2540fbe3a 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,5 +1,14 @@ # serodynamics (development version) +## New features + +* Added Stan support as an alternative to JAGS for Bayesian modeling + * New `run_mod_stan()` function for fitting models with Stan/cmdstanr + * New `prep_data_stan()` function to prepare data in Stan format + * New `prep_priors_stan()` function to prepare priors for Stan models + * New Stan model files: `inst/extdata/model.stan` and `inst/extdata/model.dobson.stan` + * Stan support is optional (cmdstanr in Suggests) and can be used alongside JAGS + * Expanded `.github/copilot-instructions.md` with additional guidance on evidence-based claims, Quarto markdown/cross-reference conventions, R style practices, and phrase-level line-break formatting for source text. * Fixed `dplyr::as_tibble()` references to `tibble::as_tibble()` in `post_summ()` and `run_mod()`, since `as_tibble()` is exported from the `tibble` package, not `dplyr`. * Added R 4.5+ snapshot variants to handle the changed attribute ordering in diff --git a/R/prep_data_stan.R b/R/prep_data_stan.R new file mode 100644 index 000000000..a2c6c2b5c --- /dev/null +++ b/R/prep_data_stan.R @@ -0,0 +1,143 @@ +#' Prepare data for Stan +#' +#' @param dataframe a [data.frame] containing case data +#' @param biomarker_column [character] string indicating +#' which column contains antigen-isotype names +#' @param verbose whether to produce verbose messaging +#' +#' @returns a `prepped_stan_data` object (a [list] with Stan-formatted data) +#' @export +#' +#' @examples +#' set.seed(1) +#' raw_data <- +#' serocalculator::typhoid_curves_nostrat_100 |> +#' sim_case_data(n = 5) +#' prepped_data <- prep_data_stan(raw_data) +#' +#' @seealso [sample_predictive_stan()] for posterior predictive +#' sampling with Stan models +prep_data_stan <- function( + dataframe, + biomarker_column = get_biomarker_names_var(dataframe), + verbose = FALSE) { + + # First use existing prep_data function to get the base structure + jags_data <- prep_data( + dataframe = dataframe, + biomarker_column = biomarker_column, + verbose = verbose, + add_newperson = FALSE # Force FALSE for Stan + ) + + # Check for NA values in the original input data (Stan cannot handle NA) + # Note: jags_data arrays are padded with NA, so check original dataframe + value_var <- serocalculator::get_values_var(dataframe) + timeindays_var <- get_timeindays_var(dataframe) + + if (any(is.na(dataframe[[value_var]])) || + any(is.na(dataframe[[timeindays_var]]))) { + cli::cli_abort( + c( + "Stan data cannot contain NA values.", + "i" = paste( + "The input data contains missing antibody measurements", + "or time points." + ), + "i" = "Stan requires complete data for all observations.", + "i" = paste( + "Consider removing subjects/visits with missing data", + "or imputing values." + ) + ) + ) + } + + # Convert to Stan format + # Stan requires explicit max dimensions + # Validate that we have at least one subject with observations + if (length(jags_data$nsmpl) == 0 || all(jags_data$nsmpl == 0)) { + cli::cli_abort( + c( + "No observations found in input data.", + "i" = "Stan models require at least one subject with observations.", + "i" = "Check that your input data is not empty." + ) + ) + } + + max_nsmpl <- max(jags_data$nsmpl) + + # Create padded arrays (Stan doesn't handle ragged arrays like JAGS) + # We need to pad smpl.t and logy to max_nsmpl + nsubj <- jags_data$nsubj + n_antigen_isos <- jags_data$n_antigen_isos + + # Initialize with zeros (will be ignored in model for obs > nsmpl[subj]) + smpl_t_padded <- array(0, dim = c(nsubj, max_nsmpl)) + logy_padded <- array(0, dim = c(nsubj, max_nsmpl, n_antigen_isos)) + + # Fill in actual data and validate no NA values in the arrays + for (subj in 1:nsubj) { + n_obs <- jags_data$nsmpl[subj] + if (n_obs > 0) { + # Validate smpl.t has no NA values for this subject's observations + subj_times <- jags_data$smpl.t[subj, 1:n_obs] + if (any(is.na(subj_times))) { + cli::cli_abort( + c( + "Stan data cannot contain NA values in time points.", + "i" = "Subject {subj} has NA values in observation times.", + "i" = "Stan requires complete data for all observations." + ) + ) + } + smpl_t_padded[subj, 1:n_obs] <- subj_times + + # Validate and copy logy values for each antigen + for (k in 1:n_antigen_isos) { + subj_logy <- jags_data$logy[subj, 1:n_obs, k] + if (any(is.na(subj_logy))) { + cli::cli_abort( + c( + "Stan data cannot contain NA values in antibody measurements.", + "i" = paste( + "Subject {subj}, antigen {k} has NA values in", + "log(antibody)." + ), + "i" = paste( + "This can occur when a subject/visit exists but a particular", + "antigen-isotype measurement is missing." + ), + "i" = "Stan requires complete data for all observations." + ) + ) + } + logy_padded[subj, 1:n_obs, k] <- subj_logy + } + } + } + + stan_data <- list( + nsubj = nsubj, + n_antigen_isos = n_antigen_isos, + n_params = 5, # y0, y1, t1, alpha, shape + nsmpl = as.integer(jags_data$nsmpl), + max_nsmpl = as.integer(max_nsmpl), + smpl_t = smpl_t_padded, + logy = logy_padded + ) + + # Add attributes from JAGS data + # Store antigens in a consistent order for use in predictions + antigens_attr <- attributes(jags_data)$antigens + stan_data <- stan_data |> + structure( + class = c("prepped_stan_data", "list"), + antigens = antigens_attr, + n_antigens = attributes(jags_data)$n_antigens, + ids = attributes(jags_data)$ids + ) + + return(stan_data) +} diff --git a/R/prep_priors.R b/R/prep_priors.R index 49930277e..5f7097744 100644 --- a/R/prep_priors.R +++ b/R/prep_priors.R @@ -7,24 +7,24 @@ #' antigen-isotypes (biomarkers) will be modeled. #' @param mu_hyp_param A [numeric] [vector] of 5 values representing the prior #' mean for the population level parameters -#' parameters (y0, y1, t1, r, alpha) for each biomarker. +#' parameters (y0, y1, t1, alpha, shape) for each biomarker. #' If specified, must be 5 values long, representing the following parameters: #' - y0 = baseline antibody concentration (default = 1.0) #' - y1 = peak antibody concentration (default = 7.0) #' - t1 = time to peak (default = 1.0) -#' - r = shape parameter (default = -4.0) -#' - alpha = decay rate (default = -1.0) +#' - alpha = decay rate (default = -4.0) +#' - shape = shape parameter (default = -1.0) #' @param prec_hyp_param A [numeric] [vector] of 5 values corresponding to #' hyperprior diagonal entries for the precision matrix (i.e. inverse variance) #' representing prior covariance of uncertainty around `mu_hyp_param`. #' If specified, must be 5 values long: -#' - defaults: y0 = 1.0, y1 = 0.00001, t1 = 1.0, r = 0.001, alpha = 1.0 +#' - defaults: y0 = 1.0, y1 = 0.00001, t1 = 1.0, alpha = 0.001, shape = 1.0 #' @param omega_param A [numeric] [vector] of 5 values corresponding to the #' diagonal entries representing the Wishart hyperprior #' distributions of `prec_hyp_param`, describing how much we expect parameters #' to vary between individuals. #' If specified, must be 5 values long: -#' - defaults: y0 = 1.0, y1 = 50.0, t1 = 1.0, r = 10.0, alpha = 1.0 +#' - defaults: y0 = 1.0, y1 = 50.0, t1 = 1.0, alpha = 10.0, shape = 1.0 #' @param wishdf_param An [integer] [vector] of 1 value specifying the degrees #' of freedom for the Wishart hyperprior distribution of `prec_hyp_param`. #' If specified, must be 1 value long. @@ -46,8 +46,8 @@ #' - "n_params": Corresponds to the 5 parameters being estimated. #' - "mu.hyp": A [matrix] of hyperpriors with dimensions #' `max_antigens` x 5 (# of parameters), representing the mean of the -#' hyperprior distribution for the five seroresponse parameters: y0, y1, t1, r, -#' and alpha). +#' hyperprior distribution for the five seroresponse parameters: y0, y1, t1, +#' alpha, and shape). #' - "prec.hyp": A three-dimensional [numeric] [array] #' with dimensions `max_antigens` x 5 (# of parameters), #' containing the precision matrices of the hyperprior distributions of @@ -65,61 +65,40 @@ #' @example inst/examples/examples-prep_priors.R prep_priors <- function(max_antigens, + # (y0, y1, t1, alpha, shape) mu_hyp_param = c(1.0, 7.0, 1.0, -4.0, -1.0), prec_hyp_param = c(1.0, 0.00001, 1.0, 0.001, 1.0), omega_param = c(1.0, 50.0, 1.0, 10.0, 1.0), wishdf_param = 20, prec_logy_hyp_param = c(4.0, 1.0)) { - # Ensuring the length of specified priors is correct. - # mu_hyp_param - if (length(mu_hyp_param) != 5) { - cli::cli_abort("Need to specify 5 priors for {.arg mu_hyp_param}") - } - # prec_hyp_param - if (length(mu_hyp_param) != 5) { - cli::cli_abort("Need to specify 5 priors for {.arg prec_hyp_param}") - } - # omega_hyp_param - if (length(omega_param) != 5) { - cli::cli_abort("Need to specify 5 priors for {.arg omega_param}") - } - # wishdf_param - if (length(wishdf_param) != 1) { - cli::cli_abort("Need to specify 1 prior for {.arg wishdf_param}") - } - # prec_logy_hyp_param - if (length(prec_logy_hyp_param) != 2) { - cli::cli_abort("Need to specify 2 priors for {.arg prec_logy_hyp_param}") - } + # Validate input parameters + validate_prior_params( + mu_hyp_param = mu_hyp_param, + prec_hyp_param = prec_hyp_param, + omega_param = omega_param, + wishdf_param = wishdf_param, + prec_logy_hyp_param = prec_logy_hyp_param + ) + # Initialize and fill prior arrays + prior_arrays <- initialize_prior_arrays( + max_antigens = max_antigens, + mu_hyp_param = mu_hyp_param, + prec_hyp_param = prec_hyp_param, + omega_param = omega_param, + wishdf_param = wishdf_param, + prec_logy_hyp_param = prec_logy_hyp_param + ) - # Model parameters - n_params <- 5 # Assuming 5 model parameters [ y0, y1, t1, alpha, shape] - mu_hyp <- array(NA, dim = c(max_antigens, n_params)) - prec_hyp <- array(NA, dim = c(max_antigens, n_params, n_params)) - omega <- array(NA, dim = c(max_antigens, n_params, n_params)) - wishdf <- rep(NA, max_antigens) - prec_logy_hyp <- array(NA, dim = c(max_antigens, 2)) - - # Fill parameter arrays - # the parameters are log(c(y0, y1, t1, alpha, shape-1)) - for (k.test in 1:max_antigens) { - mu_hyp[k.test, ] <- mu_hyp_param - prec_hyp[k.test, , ] <- diag(prec_hyp_param) - omega[k.test, , ] <- diag(omega_param) - wishdf[k.test] <- wishdf_param - prec_logy_hyp[k.test, ] <- prec_logy_hyp_param - } - - # Return results as a list + # Return results as a list with JAGS-specific naming prepped_priors <- list( - "n_params" = n_params, - "mu.hyp" = mu_hyp, - "prec.hyp" = prec_hyp, - "omega" = omega, - "wishdf" = wishdf, - "prec.logy.hyp" = prec_logy_hyp + "n_params" = prior_arrays$n_params, + "mu.hyp" = prior_arrays$mu_hyp, + "prec.hyp" = prior_arrays$prec_hyp, + "omega" = prior_arrays$omega, + "wishdf" = prior_arrays$wishdf, + "prec.logy.hyp" = prior_arrays$prec_logy_hyp ) |> structure( class = c("curve_params_priors", "list")) diff --git a/R/prep_priors_helpers.R b/R/prep_priors_helpers.R new file mode 100644 index 000000000..8fbfa6f49 --- /dev/null +++ b/R/prep_priors_helpers.R @@ -0,0 +1,126 @@ +#' Validate prior parameters +#' +#' @param mu_hyp_param Prior mean parameters +#' @param prec_hyp_param Prior precision parameters +#' @param omega_param Wishart hyperprior parameters +#' @param wishdf_param Wishart degrees of freedom +#' @param prec_logy_hyp_param Log-scale precision parameters +#' +#' @returns NULL (throws error if validation fails) +#' @keywords internal +#' @noRd +validate_prior_params <- function(mu_hyp_param, + prec_hyp_param, + omega_param, + wishdf_param, + prec_logy_hyp_param) { + if (length(mu_hyp_param) != 5) { + cli::cli_abort("Need to specify 5 priors for {.arg mu_hyp_param}") + } + if (length(prec_hyp_param) != 5) { + cli::cli_abort("Need to specify 5 priors for {.arg prec_hyp_param}") + } + if (length(omega_param) != 5) { + cli::cli_abort("Need to specify 5 priors for {.arg omega_param}") + } + if (length(wishdf_param) != 1) { + cli::cli_abort("Need to specify 1 prior for {.arg wishdf_param}") + } + if (length(prec_logy_hyp_param) != 2) { + cli::cli_abort("Need to specify 2 priors for {.arg prec_logy_hyp_param}") + } + + # Validate that precision/covariance parameters are positive and finite + # (required for matrix inversions in both JAGS and Stan models) + if (any(!is.finite(prec_hyp_param)) || any(prec_hyp_param <= 0)) { + cli::cli_abort( + paste( + "{.arg prec_hyp_param} must contain finite positive values", + "(required for matrix inversions)" + ) + ) + } + if (any(!is.finite(omega_param)) || any(omega_param <= 0)) { + cli::cli_abort( + paste( + "{.arg omega_param} must contain finite positive values", + "(required for matrix inversions)" + ) + ) + } + if (any(!is.finite(prec_logy_hyp_param)) || any(prec_logy_hyp_param <= 0)) { + cli::cli_abort( + paste( + "{.arg prec_logy_hyp_param} must contain finite positive values", + "(required for matrix inversions)" + ) + ) + } + + # Wishart degrees of freedom must be >= number of parameters (5) + if (!is.finite(wishdf_param) || wishdf_param < 5) { + cli::cli_abort( + paste( + "{.arg wishdf_param} must be >= 5", + "(number of parameters in the model)" + ) + ) + } + + invisible(NULL) +} + +#' Initialize and fill prior arrays +#' +#' @param max_antigens Number of antigen-isotype combinations +#' @param mu_hyp_param Prior mean parameters +#' @param prec_hyp_param Prior precision parameters +#' @param omega_param Wishart hyperprior parameters +#' @param wishdf_param Wishart degrees of freedom +#' @param prec_logy_hyp_param Log-scale precision parameters +#' +#' @returns A list with initialized prior arrays +#' @keywords internal +#' @noRd +initialize_prior_arrays <- function(max_antigens, + mu_hyp_param, + prec_hyp_param, + omega_param, + wishdf_param, + prec_logy_hyp_param) { + # Validate max_antigens is positive + if (!is.numeric(max_antigens) || length(max_antigens) != 1 || + max_antigens < 1 || max_antigens != as.integer(max_antigens)) { + cli::cli_abort( + c( + "{.arg max_antigens} must be a positive integer,", + "not {.val {max_antigens}}." + ) + ) + } + + n_params <- 5 + mu_hyp <- array(NA, dim = c(max_antigens, n_params)) + prec_hyp <- array(NA, dim = c(max_antigens, n_params, n_params)) + omega <- array(NA, dim = c(max_antigens, n_params, n_params)) + wishdf <- rep(NA, max_antigens) + prec_logy_hyp <- array(NA, dim = c(max_antigens, 2)) + + # Use seq_len() to avoid 1:0 issue when max_antigens == 0 + for (k in seq_len(max_antigens)) { + mu_hyp[k, ] <- mu_hyp_param + prec_hyp[k, , ] <- diag(prec_hyp_param) + omega[k, , ] <- diag(omega_param) + wishdf[k] <- wishdf_param + prec_logy_hyp[k, ] <- prec_logy_hyp_param + } + + list( + n_params = n_params, + mu_hyp = mu_hyp, + prec_hyp = prec_hyp, + omega = omega, + wishdf = wishdf, + prec_logy_hyp = prec_logy_hyp + ) +} diff --git a/R/prep_priors_stan.R b/R/prep_priors_stan.R new file mode 100644 index 000000000..5ef506c91 --- /dev/null +++ b/R/prep_priors_stan.R @@ -0,0 +1,65 @@ +#' @title Prepare priors for Stan +#' @description +#' Takes multiple [vector] inputs to allow for modifiable priors for Stan +#' models. Returns Stan-compatible prior specifications (matrix inversions +#' are handled in the Stan model itself). +#' +#' @inheritParams prep_priors +#' +#' @returns A "curve_params_priors_stan" object +#' (a subclass of [list] with the inputs to `prep_priors_stan()` attached +#' as [attributes] entry named `"used_priors"`), containing Stan-formatted +#' priors. +#' @export +#' @example inst/examples/examples-prep_priors_stan.R + +prep_priors_stan <- function( + max_antigens, + mu_hyp_param = c(1.0, 7.0, 1.0, -4.0, -1.0), # (y0, y1, t1, alpha, shape) + prec_hyp_param = c(1.0, 0.00001, 1.0, 0.001, 1.0), + omega_param = c(1.0, 50.0, 1.0, 10.0, 1.0), + wishdf_param = 20, + prec_logy_hyp_param = c(4.0, 1.0)) { + + # Validate input parameters + validate_prior_params( + mu_hyp_param = mu_hyp_param, + prec_hyp_param = prec_hyp_param, + omega_param = omega_param, + wishdf_param = wishdf_param, + prec_logy_hyp_param = prec_logy_hyp_param + ) + + # Initialize and fill prior arrays + prior_arrays <- initialize_prior_arrays( + max_antigens = max_antigens, + mu_hyp_param = mu_hyp_param, + prec_hyp_param = prec_hyp_param, + omega_param = omega_param, + wishdf_param = wishdf_param, + prec_logy_hyp_param = prec_logy_hyp_param + ) + + # Return results as a list with Stan-specific naming + prepped_priors <- list( + "n_params" = prior_arrays$n_params, + "mu_hyp" = prior_arrays$mu_hyp, + "prec_hyp" = prior_arrays$prec_hyp, + "omega" = prior_arrays$omega, + "wishdf" = prior_arrays$wishdf, + "prec_logy_hyp" = prior_arrays$prec_logy_hyp + ) |> + structure(class = c("curve_params_priors_stan", "list")) + + # Add used priors as attributes + prepped_priors <- prepped_priors |> + structure("used_priors" = list( + mu_hyp_param = mu_hyp_param, + prec_hyp_param = prec_hyp_param, + omega_param = omega_param, + wishdf_param = wishdf_param, + prec_logy_hyp_param = prec_logy_hyp_param + )) + + return(prepped_priors) +} diff --git a/R/run_mod_helpers.R b/R/run_mod_helpers.R new file mode 100644 index 000000000..4a6674d2b --- /dev/null +++ b/R/run_mod_helpers.R @@ -0,0 +1,180 @@ +#' Setup stratification list +#' +#' @param data Input data frame +#' @param strat Stratification variable name (or NA) +#' +#' @returns A vector of stratification levels (type preserved from input column) +#' @keywords internal +#' @noRd +setup_stratification <- function(data, strat) { + if (is.na(strat)) { + "None" + } else { + # Validate that strat is a column name + if (!strat %in% names(data)) { + cli::cli_abort( + c( + "Stratification variable {.var {strat}} not found in data.", + "i" = "Available columns: {.val {names(data)}}" + ) + ) + } + # Get unique levels + levels <- unique(data[[strat]]) + + # Check for NA values in stratification column + if (any(is.na(levels))) { + cli::cli_abort( + c( + "Stratification variable {.var {strat}} contains NA values.", + "i" = "Please remove or impute NA values before stratifying.", + "i" = "Rows with NA in stratification column: {sum(is.na(data[[strat]]))}" + ) + ) + } + + # Return as-is to preserve type for filtering + levels + } +} + +#' Create output shell for MCMC results +#' +#' @returns A data frame with the standard output structure +#' @keywords internal +#' @noRd +create_output_shell <- function() { + data.frame( + "Iteration" = NA, + "Chain" = NA, + "Parameter" = NA, + "value" = NA, + "Parameter_sub" = NA, + "Subject" = NA, + "Iso_type" = NA, + "Stratification" = NA + ) +} + +#' Filter data by stratification level +#' +#' @param data Input data frame +#' @param strat Stratification variable name (or NA) +#' @param strat_level Current stratification level +#' +#' @returns Filtered data frame +#' @keywords internal +#' @noRd +filter_by_stratification <- function(data, strat, strat_level) { + if (is.na(strat)) { + data + } else { + # Filter to this stratification level + # Note: NA levels are excluded by setup_stratification(), + # so no need to handle them here + data |> + dplyr::filter(.data[[strat]] == strat_level) + } +} + +#' Process MCMC output to add antigen-iso and subject information +#' +#' @param mcmc_unpack Unpacked MCMC output from ggmcmc::ggs() +#' @param longdata Prepared data with attributes +#' @param strat_level Current stratification level +#' +#' @returns Processed data frame with antigen-iso and subject info +#' @keywords internal +#' @noRd +process_mcmc_output <- function(mcmc_unpack, longdata, strat_level) { + # Extract antigen-iso combinations with explicit column names + # Subnum is 1-based index matching the antigen position + iso_dat <- tibble::tibble( + Iso_type = attributes(longdata)$antigens, + Subnum = seq_along(attributes(longdata)$antigens) + ) + + # Parse parameter names to extract subject and parameter info + mcmc_unpack <- mcmc_unpack |> + dplyr::mutate( + Subnum = sub(".*,", "", .data$Parameter), + Parameter_sub = sub("\\[.*", "", .data$Parameter), + Subject = sub("\\,.*", "", .data$Parameter) + ) |> + dplyr::mutate( + Subnum = as.numeric(sub("\\].*", "", .data$Subnum)), + Subject = sub(".*\\[", "", .data$Subject) + ) + + # Merge antigen-iso information + mcmc_unpack <- dplyr::left_join(mcmc_unpack, iso_dat, by = "Subnum") + + # Merge subject IDs with explicit column names + ids <- tibble::tibble( + Subject_ID = attr(longdata, "ids"), + Subject = as.character(seq_along(attr(longdata, "ids"))) + ) + mcmc_unpack <- dplyr::left_join(mcmc_unpack, ids, by = "Subject") + + # Clean up and rename columns + mcmc_final <- mcmc_unpack |> + dplyr::select(!c("Subnum", "Subject")) |> + dplyr::rename("Subject" = "Subject_ID") + + # Add stratification label + mcmc_final$Stratification <- strat_level + + mcmc_final +} + +#' Format final model output +#' +#' @param model_out Raw model output data frame +#' @param mod_atts Model attributes from ggmcmc +#' @param priorspec Prior specifications +#' @param fit_res Fitted values and residuals +#' @param post_fit Optional raw posterior fit object +#' @param with_post Whether to include raw posterior +#' @param post_attr_name Name for posterior attribute +#' ("jags.post" or "stan.fit") +#' +#' @returns Formatted sr_model object +#' @keywords internal +#' @noRd +format_model_output <- function(model_out, + mod_atts, + priorspec, + fit_res, + post_fit = NULL, + with_post = FALSE, + post_attr_name = "jags.post") { + # Convert to tibble and reorder columns + model_out <- tibble::as_tibble(model_out) + + model_out <- model_out[, c("Iteration", "Chain", "Parameter", "Iso_type", + "Stratification", "Subject", "value")] + + # Add attributes + current_atts <- attributes(model_out) + current_atts <- c(current_atts, mod_atts) + attributes(model_out) <- current_atts + + # Add priors + model_out <- model_out |> + structure("priors" = attributes(priorspec)$used_priors) + + # Add fitted and residuals + model_out <- model_out |> + structure(fitted_residuals = fit_res) + + # Conditionally add raw posterior + if (with_post && !is.null(post_fit)) { + attr(model_out, post_attr_name) <- post_fit + } + + # Add sr_model class + model_out <- model_out |> + structure(class = union("sr_model", class(model_out))) + + model_out +} diff --git a/R/run_mod_stan.R b/R/run_mod_stan.R new file mode 100644 index 000000000..433872906 --- /dev/null +++ b/R/run_mod_stan.R @@ -0,0 +1,247 @@ +#' @title Run Stan Model +#' @description +#' `run_mod_stan()` takes a data frame and adjustable MCMC inputs to fit a +#' Bayesian model using Stan (via cmdstanr) to estimate antibody dynamic +#' curve parameters. The model estimates seroresponse dynamics to an +#' infection. The antibody dynamic curve includes the following parameters: +#' - y0 = baseline antibody concentration +#' - y1 = peak antibody concentration +#' - t1 = time to peak +#' - alpha = decay rate +#' - shape = shape parameter +#' @param data A [base::data.frame()] with the required columns (see details). +#' @param file_mod The name of the file that contains model structure +#' (a .stan file). +#' @param nchain A positive [integer] specifying the number of MCMC chains +#' to be run per Stan model. +#' @param nadapt An [integer] specifying the number of warmup/adaptation +#' iterations per chain (Stan equivalent of JAGS adapt + burnin). +#' @param niter An [integer] specifying the number of post-warmup iterations. +#' @param strat A [character] string specifying the stratification variable, +#' entered in quotes. +#' @param with_post A [logical] value specifying whether the raw CmdStanR fit +#' object(s) should be stored as an attribute (`stan.fit`) of the returned +#' `sr_model` object. When `TRUE`, the posterior draws are accessible for +#' functions like [sample_predictive_stan()]. Note: These objects can be large. +#' @param ... Additional arguments passed to `prep_priors_stan()`. +#' @returns An `sr_model` class object: a subclass of [dplyr::tbl_df] that +#' contains MCMC samples from the joint posterior distribution of the model +#' parameters, conditional on the provided input `data`, +#' including the same structure as `run_mod()`. When `with_post = TRUE`, +#' the raw CmdStanR fit object(s) are stored as an attribute named `stan.fit`. +#' @inheritDotParams prep_priors_stan +#' @export +#' @example inst/examples/run_mod_stan-examples.R +run_mod_stan <- function(data, + file_mod = serodynamics_example("model.stan"), + nchain = 4, + nadapt = 1000, + niter = 1000, + strat = NA, + with_post = FALSE, + ...) { + + # Check if cmdstanr is available + if (!requireNamespace("cmdstanr", quietly = TRUE)) { + cli::cli_abort( + c( + "Package {.pkg cmdstanr} is required but not installed.", + "i" = paste0( + "Install it with: ", + "{.code install.packages('cmdstanr', ", + "repos = c('https://stan-dev.r-universe.dev', ", + "getOption('repos')))}" + ) + ) + ) + } + + # Validate nchain + if (!is.numeric(nchain) || length(nchain) != 1 || nchain < 1 || + nchain != as.integer(nchain)) { + cli::cli_abort( + c( + "{.arg nchain} must be a positive integer.", + "x" = "You provided: {.val {nchain}}" + ) + ) + } + + ## Setup stratification + strat_list <- setup_stratification(data, strat) + + # Handle case where strat_list is empty (e.g., empty input data) + if (length(strat_list) == 0) { + cli::cli_abort( + c( + "Input data is empty or has no valid stratification levels.", + "i" = "Please provide data with at least one row." + ) + ) + } + + ## Create output shell + stan_out <- create_output_shell() + + ## Creating output list for stan fit objects + stan_fit_final <- list() + + ## Create shell for fitted/residuals + fit_res_combined <- data.frame() + + # Check if CmdStan is installed + cmdstan_installed <- tryCatch( + { + cmdstanr::cmdstan_version() + TRUE + }, + error = function(e) FALSE + ) + + if (!cmdstan_installed) { + cli::cli_abort( + c( + "CmdStan is not installed or not configured properly.", + "i" = "Install CmdStan with: {.code cmdstanr::install_cmdstan()}", + "i" = "See {.url https://mc-stan.org/cmdstanr/} for more information." + ) + ) + } + + # Compile Stan model once (outside loop to avoid recompilation) + mod <- cmdstanr::cmdstan_model(file_mod) + + # For loop for running stratifications + # Track first stratum's dimensions for validation + first_n_antigen_isos <- NULL + first_antigens <- NULL + + for (i in strat_list) { + # Filter data by stratification + dl_sub <- filter_by_stratification(data, strat, i) + + # prepare data for modeling + longdata <- prep_data_stan(dl_sub) + priorspec <- prep_priors_stan(max_antigens = longdata$n_antigen_isos, ...) + + # For stratified runs, validate that dimensions are consistent across strata + # This is critical because we store only one set of priors/attributes + if (is.null(first_n_antigen_isos)) { + first_n_antigen_isos <- longdata$n_antigen_isos + first_antigens <- attr(longdata, "antigens") + } else { + current_antigens <- attr(longdata, "antigens") + if (longdata$n_antigen_isos != first_n_antigen_isos || + !identical(current_antigens, first_antigens)) { + cli::cli_abort( + c( + "Inconsistent antigen dimensions across strata.", + "i" = paste( + "First stratum has {first_n_antigen_isos} antigens:", + "{paste(first_antigens, collapse = ', ')}" + ), + "i" = paste( + "Current stratum '{i}' has {longdata$n_antigen_isos} antigens:", + "{paste(current_antigens, collapse = ', ')}" + ), + "i" = paste( + "All strata must have the same antigens in the same order", + "for stratified Stan models." + ) + ) + ) + } + } + + # Combine data and priors for Stan + # Remove n_params from priorspec to avoid duplicate (already in longdata) + priorspec_clean <- priorspec[names(priorspec) != "n_params"] + stan_data <- c(longdata, priorspec_clean) + + # Fit the Stan model (model already compiled) + stan_fit <- mod$sample( + data = stan_data, + chains = nchain, + parallel_chains = nchain, + iter_warmup = nadapt, + iter_sampling = niter, + refresh = 0, # Suppress iteration messages + show_messages = FALSE + ) + + # Store raw Stan fit if requested + if (with_post) { + stan_fit_final[[as.character(i)]] <- stan_fit + } + + # Extract samples and convert to ggmcmc format + draws <- stan_fit$draws( + variables = c("y0", "y1", "t1", "alpha", "shape"), + format = "draws_array" + ) + + # Convert to mcmc.list format compatible with ggmcmc + # draws_array has dimensions [iteration, chain, variable] + mcmc_list <- list() + for (ch in 1:nchain) { + mcmc_list[[ch]] <- coda::as.mcmc(draws[, ch, ]) + } + mcmc_list <- coda::as.mcmc.list(mcmc_list) + + # Use ggmcmc to process + stan_unpack <- ggmcmc::ggs(mcmc_list) + + # Adding attributes - select by name for robustness + mod_atts <- attributes(stan_unpack) + # Keep only the attributes needed for downstream processing + needed_atts <- c( + "nChains", "nParameters", "nIterations", "nBurnin", "nThin" + ) + mod_atts <- mod_atts[names(mod_atts) %in% needed_atts] + + # Process MCMC output to add antigen-iso and subject information + stan_final <- process_mcmc_output(stan_unpack, longdata, i) + + # Calculate fitted and residuals for this stratum + # Rename Parameter_sub to Parameter before calc_fit_mod + stan_final_for_fit <- stan_final |> + dplyr::select(!c("Parameter")) |> + dplyr::rename("Parameter" = "Parameter_sub") + + fit_res_stratum <- calc_fit_mod( + modeled_dat = stan_final_for_fit, + original_data = dl_sub + ) + + # Combine fitted/residuals across strata + fit_res_combined <- dplyr::bind_rows(fit_res_combined, fit_res_stratum) + + ## Creating output + stan_out <- data.frame(rbind(stan_out, stan_final)) + } + + # Remove NAs before final processing + stan_out <- stan_out[complete.cases(stan_out), ] + + # Rename Parameter_sub to Parameter for final output + stan_out <- stan_out |> + dplyr::select(!c("Parameter")) |> + dplyr::rename("Parameter" = "Parameter_sub") + + # Format final output with combined fitted/residuals + stan_out <- format_model_output( + model_out = stan_out, + mod_atts = mod_atts, + priorspec = priorspec, + fit_res = fit_res_combined, + post_fit = stan_fit_final, + with_post = with_post, + post_attr_name = "stan.fit" + ) + + # Attach antigens and ids attributes for sample_predictive_stan() + attr(stan_out, "antigens") <- attr(longdata, "antigens") + attr(stan_out, "ids") <- attr(longdata, "ids") + + stan_out +} diff --git a/R/sample_predictive_stan.R b/R/sample_predictive_stan.R new file mode 100644 index 000000000..33df333b9 --- /dev/null +++ b/R/sample_predictive_stan.R @@ -0,0 +1,264 @@ +#' Sample from posterior predictive distribution (Stan models) +#' +#' Generate posterior predictive samples for new observations using a fitted +#' Stan model. This function samples from the population-level parameter +#' distribution and includes measurement error to generate true posterior +#' predictive samples (not just mean curve draws). Predictions are made on +#' the original antibody concentration scale. +#' +#' @param stan_model_output Output from [run_mod_stan()], an object of class +#' `sr_model` containing the fitted Stan model. **Important**: The model must +#' have been fit with `with_post = TRUE` to include the posterior draws +#' required for prediction. Note that storing posterior draws increases memory +#' usage. +#' @param time_points Numeric vector of time points (in days) at which to +#' generate predictions. Default is `c(5, 30, 90)`. +#' @param n_samples Number of posterior samples to draw. If `NULL` (default), +#' uses all available posterior samples from the model. +#' +#' @returns A list of class `posterior_predictive_stan` containing: +#' - `samples`: Array of posterior predictive samples with dimensions +#' `[n_samples, n_timepoints, n_antigens]`. These include measurement +#' error and represent plausible new observations. +#' - `time_points`: The time points used for prediction +#' - `summary`: Summary statistics (mean, median, 95% credible intervals) +#' for each antigen at each time point +#' +#' @details +#' This function generates true posterior predictive samples by: +#' +#' 1. Extracting population-level parameter draws (mu_par, prec_logy) +#' from the fitted model +#' 2. Computing the mean log-antibody concentration (mu_logy) directly +#' using the Stan model formula for each time point +#' 3. Adding measurement error sampled from Normal(0, sigma_logy) where +#' sigma_logy = 1/sqrt(prec_logy) +#' 4. Transforming back to the original antibody concentration scale +#' +#' The resulting samples represent plausible new observations, not just the +#' mean curve. For stratified models, draws from all strata are combined. +#' +#' @importFrom stats median quantile rnorm +#' @export +#' +#' @examples +#' \dontrun{ +#' # Fit a Stan model with posterior samples +#' model_output <- run_mod_stan( +#' data = my_data, +#' file_mod = "model.stan", +#' nchain = 4, +#' with_post = TRUE +#' ) +#' +#' # Generate posterior predictive samples +#' predictions <- sample_predictive_stan( +#' model_output, +#' time_points = c(5, 30, 90) +#' ) +#' +#' # Access summary statistics +#' print(predictions$summary) +#' } +sample_predictive_stan <- function( + stan_model_output, + time_points = c(5, 30, 90), + n_samples = NULL) { + + # Validate input + if (!inherits(stan_model_output, "sr_model")) { + cli::cli_abort( + c( + "{.arg stan_model_output} must be output from {.fn run_mod_stan}.", + "x" = "Received object of class {.cls {class(stan_model_output)}}." + ) + ) + } + + # Check if posterior samples are available (stored as "stan.fit" attribute) + if (!"stan.fit" %in% names(attributes(stan_model_output))) { + cli::cli_abort( + c( + "Posterior samples not found in model output.", + "i" = "Run {.fn run_mod_stan} with {.code with_post = TRUE}." + ) + ) + } + + # Extract CmdStan fit object(s) + stan_fit_list <- attr(stan_model_output, "stan.fit") + + # Get antigen names from the stored antigens attribute (preserves order) + # This ensures we use the same antigen ordering as was used in the Stan model + antigens_attr <- attr(stan_model_output, "antigens") + if (is.null(antigens_attr)) { + cli::cli_abort( + c( + "Antigens attribute not found in model output.", + "i" = "This may indicate an issue with the model fitting process." + ) + ) + } + + # Extract antigen names in the correct order + # antigens_attr is a plain vector of antigen-isotype names + antigen_names <- antigens_attr + n_antigens <- length(antigen_names) + n_timepoints <- length(time_points) + + # Initialize list to collect draws from all strata + all_draws <- list() + + # For stratified models, inform user that draws are being combined + # Note: Antigen consistency is validated in run_mod_stan() during fitting + if (length(stan_fit_list) > 1) { + cli::cli_inform( + c( + "i" = paste( + "Combining posterior draws from {length(stan_fit_list)} strata.", + "Antigen ordering is consistent across strata." + ) + ) + ) + } + + # Extract draws from each stratification level + # We extract population-level parameters (mu_par, prec_logy) + # which are consistent across strata and can be combined + for (strat_name in names(stan_fit_list)) { + stan_fit <- stan_fit_list[[strat_name]] + + # Extract population-level parameter draws (not subject-specific) + # mu_par: population mean for each parameter and antigen + # prec_logy: measurement error precision + draws <- stan_fit$draws( + variables = c("mu_par", "prec_logy"), + format = "draws_matrix" + ) + + all_draws[[strat_name]] <- draws + } + + # Combine draws from all strata (these have same dimensions) + combined_draws <- do.call(rbind, all_draws) + + # Determine number of samples to use + n_total_samples <- nrow(combined_draws) + if (is.null(n_samples)) { + n_samples <- n_total_samples + } else if (n_samples > n_total_samples) { + cli::cli_warn( + c( + "Requested {n_samples} samples but only {n_total_samples} available.", + "i" = "Using all {n_total_samples} samples." + ) + ) + n_samples <- n_total_samples + } + + # Sample indices + if (n_samples < n_total_samples) { + sample_idx <- sample(seq_len(n_total_samples), n_samples, replace = FALSE) + combined_draws <- combined_draws[sample_idx, , drop = FALSE] + } + + # Initialize array for predictions + predictions <- array( + NA_real_, + dim = c(n_samples, n_timepoints, n_antigens), + dimnames = list( + sample = seq_len(n_samples), + timepoint = paste0("t", time_points), + antigen = antigen_names + ) + ) + + # Generate predictions for each antigen + for (k in seq_len(n_antigens)) { + # Extract parameter columns for this antigen + # Extract population-level parameters for this antigen + # mu_par has dimensions [antigen, param] where param = 1:5 + # mu_par is on TRANSFORMED scale: + # (log(y0), log(y1-y0), log(t1), log(alpha), log(shape-1)) + # Need to transform to natural scale like Stan model does + log_y0 <- combined_draws[, paste0("mu_par[", k, ",1]")] + log_y1_minus_y0 <- combined_draws[, paste0("mu_par[", k, ",2]")] + log_t1 <- combined_draws[, paste0("mu_par[", k, ",3]")] + log_alpha <- combined_draws[, paste0("mu_par[", k, ",4]")] + log_shape_minus1 <- combined_draws[, paste0("mu_par[", k, ",5]")] + + # Transform to natural scale + y0_pop <- exp(log_y0) + y1_pop <- y0_pop + exp(log_y1_minus_y0) + t1_pop <- exp(log_t1) + alpha_pop <- exp(log_alpha) + shape_pop <- exp(log_shape_minus1) + 1 + + # Compute beta (growth rate) + # Note: t1_pop > 0 by construction (exp of real number) + # Guard against y0_pop = 0 or y1_pop <= y0_pop + beta_pop <- log(pmax(y1_pop / y0_pop, .Machine$double.eps)) / t1_pop + + # Extract measurement error precision for this antigen + prec_logy_k <- combined_draws[, paste0("prec_logy[", k, "]")] + sigma_logy_k <- 1 / sqrt(prec_logy_k) # Convert precision to SD + + # Generate posterior predictive samples for each time point + for (t_idx in seq_along(time_points)) { + t <- time_points[t_idx] + + # Compute mu_logy directly (mean on log scale) matching Stan model + # This is NOT log(ab(...)), but actual log-scale mean from model + mu_logy <- ifelse( + t <= t1_pop, + # Active infection period: log(y0) + beta * t + log(y0_pop) + beta_pop * t, + # Recovery period: power-law decay formula on log scale + { + one_minus_shape <- 1 - shape_pop + # Compute the argument to log, ensuring it stays positive + log_arg <- y1_pop^one_minus_shape - + one_minus_shape * alpha_pop * (t - t1_pop) + # Use pmax to ensure log_arg > 0 (avoid NaN from log of negative) + (1 / one_minus_shape) * log(pmax(log_arg, .Machine$double.eps)) + } + ) + + # Add measurement error to get posterior predictive samples on log scale + logy_pred <- stats::rnorm( + n = length(mu_logy), + mean = mu_logy, + sd = sigma_logy_k + ) + + # Transform back to original scale + y_pred <- exp(logy_pred) + + predictions[, t_idx, k] <- y_pred + } + } + + # Compute summary statistics + summary_list <- list() + for (k in seq_len(n_antigens)) { + antigen_name <- dimnames(predictions)$antigen[k] + summary_list[[antigen_name]] <- data.frame( + time_point = time_points, + mean = apply(predictions[, , k], 2, mean), + median = apply(predictions[, , k], 2, median), + lower_95 = apply(predictions[, , k], 2, quantile, probs = 0.025), + upper_95 = apply(predictions[, , k], 2, quantile, probs = 0.975), + row.names = NULL + ) + } + + # Return results + result <- list( + samples = predictions, + time_points = time_points, + summary = summary_list + ) + + class(result) <- c("posterior_predictive_stan", "list") + return(result) +} diff --git a/README.Rmd b/README.Rmd index 58d2a8de7..dc4788866 100644 --- a/README.Rmd +++ b/README.Rmd @@ -16,20 +16,50 @@ knitr::opts_chunk$set( # `{serodynamics}` -[![Codecov test coverage](https://codecov.io/gh/UCD-SERG/serodynamics/graph/badge.svg)](https://app.codecov.io/gh/UCD-SERG/serodynamics) -[![CodeFactor](https://www.codefactor.io/repository/github/ucd-serg/serodynamics/badge)](https://www.codefactor.io/repository/github/ucd-serg/serodynamics) -[![R-CMD-check](https://github.com/UCD-SERG/dcm/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/UCD-SERG/dcm/actions/workflows/R-CMD-check.yaml) -[![CRAN status](https://www.r-pkg.org/badges/version/serodynamics)](https://CRAN.R-project.org/package=serodynamics) -[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental) + +```{r badges, echo=FALSE, results='asis'} +# Use badger package to generate badges +cat( + badger::badge_codecov("ucdavis/serodynamics", branch = "main"), "\n", + "[![CodeFactor](https://www.codefactor.io/repository/github/ucdavis/serodynamics/badge)](https://www.codefactor.io/repository/github/ucdavis/serodynamics)\n", + badger::badge_github_actions("ucdavis/serodynamics", "R-CMD-check.yaml"), "\n", + badger::badge_cran_release("serodynamics", "orange"), "\n", + badger::badge_lifecycle("experimental"), "\n" +) +``` + The goal of `{serodynamics}` is to implement methods for modeling longitudinal antibody responses to infection. +The package provides Bayesian MCMC modeling capabilities using either: + +- **JAGS** (Just Another Gibbs Sampler) via `runjags` - the original implementation +- **Stan** via `cmdstanr` - a modern, efficient alternative (optional) + +Both interfaces use the same data preparation and analysis workflow, allowing users to choose their preferred Bayesian modeling framework. + ## Installation You can install the development version of `{serodynamics}` from [GitHub](https://github.com/) with: ``` r # install.packages("pak") -pak::pak("UCD-SERG/serodynamics") +pak::pak("ucdavis/serodynamics") ``` + +### Stan Support (Optional) + +To use Stan models (via `run_mod_stan()`), you'll also need to install `cmdstanr`: + +``` r +# Install cmdstanr from r-universe +install.packages("cmdstanr", + repos = c("https://stan-dev.r-universe.dev", + getOption("repos"))) + +# Then install CmdStan +cmdstanr::install_cmdstan() +``` + +See the [cmdstanr documentation](https://mc-stan.org/cmdstanr/) for more details. diff --git a/README.md b/README.md index 5182a51fe..9e5dd725c 100644 --- a/README.md +++ b/README.md @@ -5,19 +5,27 @@ -[![Codecov test -coverage](https://codecov.io/gh/UCD-SERG/serodynamics/graph/badge.svg)](https://app.codecov.io/gh/UCD-SERG/serodynamics) -[![CodeFactor](https://www.codefactor.io/repository/github/ucd-serg/serodynamics/badge)](https://www.codefactor.io/repository/github/ucd-serg/serodynamics) -[![R-CMD-check](https://github.com/UCD-SERG/dcm/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/UCD-SERG/dcm/actions/workflows/R-CMD-check.yaml) -[![CRAN -status](https://www.r-pkg.org/badges/version/serodynamics)](https://CRAN.R-project.org/package=serodynamics) -[![Lifecycle: -experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental) +[![](https://codecov.io/gh/ucdavis/serodynamics/branch/main/graph/badge.svg)](https://app.codecov.io/gh/ucdavis/serodynamics) +[![CodeFactor](https://www.codefactor.io/repository/github/ucdavis/serodynamics/badge)](https://www.codefactor.io/repository/github/ucdavis/serodynamics) +[![R build +status](https://github.com/ucdavis/serodynamics/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/ucdavis/serodynamics/actions) +[![](https://www.r-pkg.org/badges/version/serodynamics?color=orange)](https://cran.r-project.org/package=serodynamics) +[![](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental) + The goal of `{serodynamics}` is to implement methods for modeling longitudinal antibody responses to infection. +The package provides Bayesian MCMC modeling capabilities using either: + +- **JAGS** (Just Another Gibbs Sampler) via `runjags` - the original + implementation +- **Stan** via `cmdstanr` - a modern, efficient alternative (optional) + +Both interfaces use the same data preparation and analysis workflow, +allowing users to choose their preferred Bayesian modeling framework. + ## Installation You can install the development version of `{serodynamics}` from @@ -25,5 +33,23 @@ You can install the development version of `{serodynamics}` from ``` r # install.packages("pak") -pak::pak("UCD-SERG/serodynamics") +pak::pak("ucdavis/serodynamics") ``` + +### Stan Support (Optional) + +To use Stan models (via `run_mod_stan()`), you’ll also need to install +`cmdstanr`: + +``` r +# Install cmdstanr from r-universe +install.packages("cmdstanr", + repos = c("https://stan-dev.r-universe.dev", + getOption("repos"))) + +# Then install CmdStan +cmdstanr::install_cmdstan() +``` + +See the [cmdstanr documentation](https://mc-stan.org/cmdstanr/) for more +details. diff --git a/inst/WORDLIST b/inst/WORDLIST index 8c34216f5..5d188de72 100644 --- a/inst/WORDLIST +++ b/inst/WORDLIST @@ -18,6 +18,8 @@ Wasserstein behaviour biomarker biomarkers +burnin +cmdstanr colour darwin dev @@ -49,6 +51,8 @@ rhat seroconversion seroresponse seroresponses +stan +stanfit strat stratifications tbl diff --git a/inst/examples/examples-prep_data_stan.R b/inst/examples/examples-prep_data_stan.R new file mode 100644 index 000000000..9804ace8c --- /dev/null +++ b/inst/examples/examples-prep_data_stan.R @@ -0,0 +1,5 @@ +set.seed(1) +raw_data <- + serocalculator::typhoid_curves_nostrat_100 |> + sim_case_data(n = 5) +prepped_data <- prep_data_stan(raw_data) diff --git a/inst/examples/examples-prep_priors_stan.R b/inst/examples/examples-prep_priors_stan.R new file mode 100644 index 000000000..6cf9ea6a6 --- /dev/null +++ b/inst/examples/examples-prep_priors_stan.R @@ -0,0 +1,9 @@ + +prep_priors_stan(max_antigens = 2, + mu_hyp_param = c(1.0, 7.0, 1.0, -4.0, -1.0), + prec_hyp_param = c(1.0, 0.00001, 1.0, 0.001, 1.0), + omega_param = c(1.0, 50.0, 1.0, 10.0, 1.0), + wishdf_param = 20, + prec_logy_hyp_param = c(4.0, 1.0)) + +prep_priors_stan(max_antigens = 2) diff --git a/inst/examples/run_mod_stan-examples.R b/inst/examples/run_mod_stan-examples.R new file mode 100644 index 000000000..d8231afe9 --- /dev/null +++ b/inst/examples/run_mod_stan-examples.R @@ -0,0 +1,35 @@ +# This example requires cmdstanr and CmdStan to be installed +# See ?run_mod_stan for installation instructions + +if (requireNamespace("cmdstanr", quietly = TRUE)) { + # Check if CmdStan is installed + cmdstan_available <- tryCatch( + { + cmdstanr::cmdstan_version() + TRUE + }, + error = function(e) FALSE + ) + + if (cmdstan_available) { + library(dplyr) + set.seed(1) + strat1 <- serocalculator::typhoid_curves_nostrat_100 |> + sim_case_data(n = 100) |> + mutate(strat = "stratum 2") + strat2 <- serocalculator::typhoid_curves_nostrat_100 |> + sim_case_data(n = 100) |> + mutate(strat = "stratum 1") + + dataset <- bind_rows(strat1, strat2) + + fitted_model <- run_mod_stan( + data = dataset, # The data set input + file_mod = serodynamics_example("model.stan"), + nchain = 4, # Number of mcmc chains to run + nadapt = 500, # Number of warmup iterations + niter = 1000, # Number of sampling iterations + strat = "strat" # Variable to be stratified + ) + } +} diff --git a/inst/extdata/model.dobson.stan b/inst/extdata/model.dobson.stan new file mode 100644 index 000000000..b6d8e922e --- /dev/null +++ b/inst/extdata/model.dobson.stan @@ -0,0 +1,19 @@ +// Simple Bernoulli model (Dobson example) +// Translated from model.dobson.jags + +data { + int N; // number of observations + array[N] int r; // binary outcomes +} + +parameters { + real p; // probability parameter +} + +model { + // Prior: Beta(1, 1) = Uniform(0, 1) + p ~ beta(1, 1); + + // Likelihood: Bernoulli trials + r ~ bernoulli(p); +} diff --git a/inst/extdata/model.stan b/inst/extdata/model.stan new file mode 100644 index 000000000..58cd339fe --- /dev/null +++ b/inst/extdata/model.stan @@ -0,0 +1,118 @@ +// Stan model for antibody dynamics +// Translated from JAGS model.jags +// Models longitudinal antibody responses with hierarchical priors + +data { + int nsubj; // number of subjects + int n_antigen_isos; // number of biomarkers + int n_params; // number of parameters (5) + array[nsubj] int nsmpl; // number of samples per subject (can be 0) + int max_nsmpl; // maximum number of samples + array[nsubj, max_nsmpl] real smpl_t; // sample times + array[nsubj, max_nsmpl, n_antigen_isos] real logy; // log antibody measurements + + // Hyperpriors + array[n_antigen_isos] vector[n_params] mu_hyp; // hyperprior means + array[n_antigen_isos] matrix[n_params, n_params] prec_hyp; // hyperprior precision + array[n_antigen_isos] matrix[n_params, n_params] omega; // Wishart scale matrix + array[n_antigen_isos] real wishdf; // Wishart degrees of freedom + array[n_antigen_isos, 2] real prec_logy_hyp; // gamma hyperpriors for precision +} + +transformed data { + // Convert omega precision matrix to covariance for inverse Wishart + array[n_antigen_isos] matrix[n_params, n_params] omega_inv; + + for (k in 1:n_antigen_isos) { + omega_inv[k] = inverse(omega[k]); + } +} + +parameters { + // Subject-level parameters (transformed scale) + array[nsubj, n_antigen_isos] vector[n_params] par; + + // Population-level parameters + array[n_antigen_isos] vector[n_params] mu_par; + + // Covariance matrices (Stan uses covariance, not precision) + array[n_antigen_isos] cov_matrix[n_params] Sigma_par; + + // Observation precision (measurement error) + array[n_antigen_isos] real prec_logy; +} + +transformed parameters { + // Subject-level parameters on natural scale + array[nsubj, n_antigen_isos] real y0; + array[nsubj, n_antigen_isos] real y1; + array[nsubj, n_antigen_isos] real t1; + array[nsubj, n_antigen_isos] real alpha; + array[nsubj, n_antigen_isos] real shape; + + // Growth rate parameter + array[nsubj, n_antigen_isos] real beta; + + // Transform parameters + for (subj in 1:nsubj) { + for (k in 1:n_antigen_isos) { + y0[subj, k] = exp(par[subj, k][1]); + y1[subj, k] = y0[subj, k] + exp(par[subj, k][2]); // par[,,2] is log(y1-y0) + t1[subj, k] = exp(par[subj, k][3]); + alpha[subj, k] = exp(par[subj, k][4]); + shape[subj, k] = exp(par[subj, k][5]) + 1; + + // Compute growth rate + beta[subj, k] = log(y1[subj, k] / y0[subj, k]) / t1[subj, k]; + } + } +} + +model { + // Hyperpriors + for (k in 1:n_antigen_isos) { + mu_par[k] ~ multi_normal_prec(mu_hyp[k], prec_hyp[k]); + Sigma_par[k] ~ inv_wishart(wishdf[k], omega_inv[k]); + prec_logy[k] ~ gamma(prec_logy_hyp[k, 1], prec_logy_hyp[k, 2]); + } + + // Subject-level priors + for (subj in 1:nsubj) { + for (k in 1:n_antigen_isos) { + par[subj, k] ~ multi_normal(mu_par[k], Sigma_par[k]); + } + } + + // Likelihood + for (subj in 1:nsubj) { + // Only process subjects with observations + if (nsmpl[subj] > 0) { + for (obs in 1:nsmpl[subj]) { + for (k in 1:n_antigen_isos) { + real mu_logy; + real sigma_logy; + + sigma_logy = 1.0 / sqrt(prec_logy[k]); + + // Determine phase: active infection or recovery + if (smpl_t[subj, obs] <= t1[subj, k]) { + // Active infection period (t <= t1) + mu_logy = log(y0[subj, k]) + beta[subj, k] * smpl_t[subj, obs]; + } else { + // Recovery period (t > t1) + real shape_term = shape[subj, k]; + real y1_val = y1[subj, k]; + real t_diff = smpl_t[subj, obs] - t1[subj, k]; + real alpha_val = alpha[subj, k]; + + mu_logy = (1.0 / (1.0 - shape_term)) * + log(y1_val^(1.0 - shape_term) - + (1.0 - shape_term) * alpha_val * t_diff); + } + + logy[subj, obs, k] ~ normal(mu_logy, sigma_logy); + } + } + } + } +} diff --git a/man/prep_data_stan.Rd b/man/prep_data_stan.Rd new file mode 100644 index 000000000..03a54c08b --- /dev/null +++ b/man/prep_data_stan.Rd @@ -0,0 +1,38 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/prep_data_stan.R +\name{prep_data_stan} +\alias{prep_data_stan} +\title{Prepare data for Stan} +\usage{ +prep_data_stan( + dataframe, + biomarker_column = get_biomarker_names_var(dataframe), + verbose = FALSE +) +} +\arguments{ +\item{dataframe}{a \link{data.frame} containing case data} + +\item{biomarker_column}{\link{character} string indicating +which column contains antigen-isotype names} + +\item{verbose}{whether to produce verbose messaging} +} +\value{ +a \code{prepped_stan_data} object (a \link{list} with Stan-formatted data) +} +\description{ +Prepare data for Stan +} +\examples{ +set.seed(1) +raw_data <- + serocalculator::typhoid_curves_nostrat_100 |> + sim_case_data(n = 5) +prepped_data <- prep_data_stan(raw_data) + +} +\seealso{ +\code{\link[=sample_predictive_stan]{sample_predictive_stan()}} for posterior predictive +sampling with Stan models +} diff --git a/man/prep_priors.Rd b/man/prep_priors.Rd index 48fa1dea2..7e4988674 100644 --- a/man/prep_priors.Rd +++ b/man/prep_priors.Rd @@ -19,14 +19,14 @@ antigen-isotypes (biomarkers) will be modeled.} \item{mu_hyp_param}{A \link{numeric} \link{vector} of 5 values representing the prior mean for the population level parameters -parameters (y0, y1, t1, r, alpha) for each biomarker. +parameters (y0, y1, t1, alpha, shape) for each biomarker. If specified, must be 5 values long, representing the following parameters: \itemize{ \item y0 = baseline antibody concentration (default = 1.0) \item y1 = peak antibody concentration (default = 7.0) \item t1 = time to peak (default = 1.0) -\item r = shape parameter (default = -4.0) -\item alpha = decay rate (default = -1.0) +\item alpha = decay rate (default = -4.0) +\item shape = shape parameter (default = -1.0) }} \item{prec_hyp_param}{A \link{numeric} \link{vector} of 5 values corresponding to @@ -34,7 +34,7 @@ hyperprior diagonal entries for the precision matrix (i.e. inverse variance) representing prior covariance of uncertainty around \code{mu_hyp_param}. If specified, must be 5 values long: \itemize{ -\item defaults: y0 = 1.0, y1 = 0.00001, t1 = 1.0, r = 0.001, alpha = 1.0 +\item defaults: y0 = 1.0, y1 = 0.00001, t1 = 1.0, alpha = 0.001, shape = 1.0 }} \item{omega_param}{A \link{numeric} \link{vector} of 5 values corresponding to the @@ -43,7 +43,7 @@ distributions of \code{prec_hyp_param}, describing how much we expect parameters to vary between individuals. If specified, must be 5 values long: \itemize{ -\item defaults: y0 = 1.0, y1 = 50.0, t1 = 1.0, r = 10.0, alpha = 1.0 +\item defaults: y0 = 1.0, y1 = 50.0, t1 = 1.0, alpha = 10.0, shape = 1.0 }} \item{wishdf_param}{An \link{integer} \link{vector} of 1 value specifying the degrees @@ -74,8 +74,8 @@ elements: \item "n_params": Corresponds to the 5 parameters being estimated. \item "mu.hyp": A \link{matrix} of hyperpriors with dimensions \code{max_antigens} x 5 (# of parameters), representing the mean of the -hyperprior distribution for the five seroresponse parameters: y0, y1, t1, r, -and alpha). +hyperprior distribution for the five seroresponse parameters: y0, y1, t1, +alpha, and shape). \item "prec.hyp": A three-dimensional \link{numeric} \link{array} with dimensions \code{max_antigens} x 5 (# of parameters), containing the precision matrices of the hyperprior distributions of diff --git a/man/prep_priors_stan.Rd b/man/prep_priors_stan.Rd new file mode 100644 index 000000000..d1f29a5aa --- /dev/null +++ b/man/prep_priors_stan.Rd @@ -0,0 +1,89 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/prep_priors_stan.R +\name{prep_priors_stan} +\alias{prep_priors_stan} +\title{Prepare priors for Stan} +\usage{ +prep_priors_stan( + max_antigens, + mu_hyp_param = c(1, 7, 1, -4, -1), + prec_hyp_param = c(1, 1e-05, 1, 0.001, 1), + omega_param = c(1, 50, 1, 10, 1), + wishdf_param = 20, + prec_logy_hyp_param = c(4, 1) +) +} +\arguments{ +\item{max_antigens}{An \link{integer} specifying how many +antigen-isotypes (biomarkers) will be modeled.} + +\item{mu_hyp_param}{A \link{numeric} \link{vector} of 5 values representing the prior +mean for the population level parameters +parameters (y0, y1, t1, alpha, shape) for each biomarker. +If specified, must be 5 values long, representing the following parameters: +\itemize{ +\item y0 = baseline antibody concentration (default = 1.0) +\item y1 = peak antibody concentration (default = 7.0) +\item t1 = time to peak (default = 1.0) +\item alpha = decay rate (default = -4.0) +\item shape = shape parameter (default = -1.0) +}} + +\item{prec_hyp_param}{A \link{numeric} \link{vector} of 5 values corresponding to +hyperprior diagonal entries for the precision matrix (i.e. inverse variance) +representing prior covariance of uncertainty around \code{mu_hyp_param}. +If specified, must be 5 values long: +\itemize{ +\item defaults: y0 = 1.0, y1 = 0.00001, t1 = 1.0, alpha = 0.001, shape = 1.0 +}} + +\item{omega_param}{A \link{numeric} \link{vector} of 5 values corresponding to the +diagonal entries representing the Wishart hyperprior +distributions of \code{prec_hyp_param}, describing how much we expect parameters +to vary between individuals. +If specified, must be 5 values long: +\itemize{ +\item defaults: y0 = 1.0, y1 = 50.0, t1 = 1.0, alpha = 10.0, shape = 1.0 +}} + +\item{wishdf_param}{An \link{integer} \link{vector} of 1 value specifying the degrees +of freedom for the Wishart hyperprior distribution of \code{prec_hyp_param}. +If specified, must be 1 value long. +\itemize{ +\item default = 20.0 +\item The value of \code{wishdf_param} controls how informative the Wishart prior +is. Higher values lead to tighter priors on individual variation. +Lower values (e.g., 5–10) make the prior more weakly informative, +which can help improve convergence if the model is over-regularized. +}} + +\item{prec_logy_hyp_param}{A \link{numeric} \link{vector} of 2 values corresponding to +hyperprior diagonal entries on the log-scale for the precision matrix +(i.e. inverse variance) representing prior beliefs of individual variation. +If specified, must be 2 values long: +\itemize{ +\item defaults = 4.0, 1.0 +}} +} +\value{ +A "curve_params_priors_stan" object +(a subclass of \link{list} with the inputs to \code{prep_priors_stan()} attached +as \link{attributes} entry named \code{"used_priors"}), containing Stan-formatted +priors. +} +\description{ +Takes multiple \link{vector} inputs to allow for modifiable priors for Stan +models. Returns Stan-compatible prior specifications (matrix inversions +are handled in the Stan model itself). +} +\examples{ + +prep_priors_stan(max_antigens = 2, + mu_hyp_param = c(1.0, 7.0, 1.0, -4.0, -1.0), + prec_hyp_param = c(1.0, 0.00001, 1.0, 0.001, 1.0), + omega_param = c(1.0, 50.0, 1.0, 10.0, 1.0), + wishdf_param = 20, + prec_logy_hyp_param = c(4.0, 1.0)) + +prep_priors_stan(max_antigens = 2) +} diff --git a/man/run_mod.Rd b/man/run_mod.Rd index b35d4977b..4666420c6 100644 --- a/man/run_mod.Rd +++ b/man/run_mod.Rd @@ -49,21 +49,21 @@ Note: These objects can be large.} antigen-isotypes (biomarkers) will be modeled.} \item{\code{mu_hyp_param}}{A \link{numeric} \link{vector} of 5 values representing the prior mean for the population level parameters -parameters (y0, y1, t1, r, alpha) for each biomarker. +parameters (y0, y1, t1, alpha, shape) for each biomarker. If specified, must be 5 values long, representing the following parameters: \itemize{ \item y0 = baseline antibody concentration (default = 1.0) \item y1 = peak antibody concentration (default = 7.0) \item t1 = time to peak (default = 1.0) -\item r = shape parameter (default = -4.0) -\item alpha = decay rate (default = -1.0) +\item alpha = decay rate (default = -4.0) +\item shape = shape parameter (default = -1.0) }} \item{\code{prec_hyp_param}}{A \link{numeric} \link{vector} of 5 values corresponding to hyperprior diagonal entries for the precision matrix (i.e. inverse variance) representing prior covariance of uncertainty around \code{mu_hyp_param}. If specified, must be 5 values long: \itemize{ -\item defaults: y0 = 1.0, y1 = 0.00001, t1 = 1.0, r = 0.001, alpha = 1.0 +\item defaults: y0 = 1.0, y1 = 0.00001, t1 = 1.0, alpha = 0.001, shape = 1.0 }} \item{\code{omega_param}}{A \link{numeric} \link{vector} of 5 values corresponding to the diagonal entries representing the Wishart hyperprior @@ -71,7 +71,7 @@ distributions of \code{prec_hyp_param}, describing how much we expect parameters to vary between individuals. If specified, must be 5 values long: \itemize{ -\item defaults: y0 = 1.0, y1 = 50.0, t1 = 1.0, r = 10.0, alpha = 1.0 +\item defaults: y0 = 1.0, y1 = 50.0, t1 = 1.0, alpha = 10.0, shape = 1.0 }} \item{\code{wishdf_param}}{An \link{integer} \link{vector} of 1 value specifying the degrees of freedom for the Wishart hyperprior distribution of \code{prec_hyp_param}. diff --git a/man/run_mod_stan.Rd b/man/run_mod_stan.Rd new file mode 100644 index 000000000..ca87acef1 --- /dev/null +++ b/man/run_mod_stan.Rd @@ -0,0 +1,146 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/run_mod_stan.R +\name{run_mod_stan} +\alias{run_mod_stan} +\title{Run Stan Model} +\usage{ +run_mod_stan( + data, + file_mod = serodynamics_example("model.stan"), + nchain = 4, + nadapt = 1000, + niter = 1000, + strat = NA, + with_post = FALSE, + ... +) +} +\arguments{ +\item{data}{A \code{\link[base:data.frame]{base::data.frame()}} with the required columns (see details).} + +\item{file_mod}{The name of the file that contains model structure +(a .stan file).} + +\item{nchain}{A positive \link{integer} specifying the number of MCMC chains +to be run per Stan model.} + +\item{nadapt}{An \link{integer} specifying the number of warmup/adaptation +iterations per chain (Stan equivalent of JAGS adapt + burnin).} + +\item{niter}{An \link{integer} specifying the number of post-warmup iterations.} + +\item{strat}{A \link{character} string specifying the stratification variable, +entered in quotes.} + +\item{with_post}{A \link{logical} value specifying whether the raw CmdStanR fit +object(s) should be stored as an attribute (\code{stan.fit}) of the returned +\code{sr_model} object. When \code{TRUE}, the posterior draws are accessible for +functions like \code{\link[=sample_predictive_stan]{sample_predictive_stan()}}. Note: These objects can be large.} + +\item{...}{ + Arguments passed on to \code{\link{prep_priors_stan}} + \describe{ + \item{\code{max_antigens}}{An \link{integer} specifying how many +antigen-isotypes (biomarkers) will be modeled.} + \item{\code{mu_hyp_param}}{A \link{numeric} \link{vector} of 5 values representing the prior +mean for the population level parameters +parameters (y0, y1, t1, alpha, shape) for each biomarker. +If specified, must be 5 values long, representing the following parameters: +\itemize{ +\item y0 = baseline antibody concentration (default = 1.0) +\item y1 = peak antibody concentration (default = 7.0) +\item t1 = time to peak (default = 1.0) +\item alpha = decay rate (default = -4.0) +\item shape = shape parameter (default = -1.0) +}} + \item{\code{prec_hyp_param}}{A \link{numeric} \link{vector} of 5 values corresponding to +hyperprior diagonal entries for the precision matrix (i.e. inverse variance) +representing prior covariance of uncertainty around \code{mu_hyp_param}. +If specified, must be 5 values long: +\itemize{ +\item defaults: y0 = 1.0, y1 = 0.00001, t1 = 1.0, alpha = 0.001, shape = 1.0 +}} + \item{\code{omega_param}}{A \link{numeric} \link{vector} of 5 values corresponding to the +diagonal entries representing the Wishart hyperprior +distributions of \code{prec_hyp_param}, describing how much we expect parameters +to vary between individuals. +If specified, must be 5 values long: +\itemize{ +\item defaults: y0 = 1.0, y1 = 50.0, t1 = 1.0, alpha = 10.0, shape = 1.0 +}} + \item{\code{wishdf_param}}{An \link{integer} \link{vector} of 1 value specifying the degrees +of freedom for the Wishart hyperprior distribution of \code{prec_hyp_param}. +If specified, must be 1 value long. +\itemize{ +\item default = 20.0 +\item The value of \code{wishdf_param} controls how informative the Wishart prior +is. Higher values lead to tighter priors on individual variation. +Lower values (e.g., 5–10) make the prior more weakly informative, +which can help improve convergence if the model is over-regularized. +}} + \item{\code{prec_logy_hyp_param}}{A \link{numeric} \link{vector} of 2 values corresponding to +hyperprior diagonal entries on the log-scale for the precision matrix +(i.e. inverse variance) representing prior beliefs of individual variation. +If specified, must be 2 values long: +\itemize{ +\item defaults = 4.0, 1.0 +}} + }} +} +\value{ +An \code{sr_model} class object: a subclass of \link[dplyr:tbl_df]{dplyr::tbl_df} that +contains MCMC samples from the joint posterior distribution of the model +parameters, conditional on the provided input \code{data}, +including the same structure as \code{run_mod()}. When \code{with_post = TRUE}, +the raw CmdStanR fit object(s) are stored as an attribute named \code{stan.fit}. +} +\description{ +\code{run_mod_stan()} takes a data frame and adjustable MCMC inputs to fit a +Bayesian model using Stan (via cmdstanr) to estimate antibody dynamic +curve parameters. The model estimates seroresponse dynamics to an +infection. The antibody dynamic curve includes the following parameters: +\itemize{ +\item y0 = baseline antibody concentration +\item y1 = peak antibody concentration +\item t1 = time to peak +\item alpha = decay rate +\item shape = shape parameter +} +} +\examples{ +# This example requires cmdstanr and CmdStan to be installed +# See ?run_mod_stan for installation instructions + +if (requireNamespace("cmdstanr", quietly = TRUE)) { + # Check if CmdStan is installed + cmdstan_available <- tryCatch( + { + cmdstanr::cmdstan_version() + TRUE + }, + error = function(e) FALSE + ) + + if (cmdstan_available) { + library(dplyr) + set.seed(1) + strat1 <- serocalculator::typhoid_curves_nostrat_100 |> + sim_case_data(n = 100) |> + mutate(strat = "stratum 2") + strat2 <- serocalculator::typhoid_curves_nostrat_100 |> + sim_case_data(n = 100) |> + mutate(strat = "stratum 1") + + dataset <- bind_rows(strat1, strat2) + + fitted_model <- run_mod_stan( + data = dataset, # The data set input + file_mod = serodynamics_example("model.stan"), + nchain = 4, # Number of mcmc chains to run + nadapt = 500, # Number of warmup iterations + niter = 1000, # Number of sampling iterations + strat = "strat" # Variable to be stratified + ) + } +} +} diff --git a/man/sample_predictive_stan.Rd b/man/sample_predictive_stan.Rd new file mode 100644 index 000000000..abe60cd5e --- /dev/null +++ b/man/sample_predictive_stan.Rd @@ -0,0 +1,78 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/sample_predictive_stan.R +\name{sample_predictive_stan} +\alias{sample_predictive_stan} +\title{Sample from posterior predictive distribution (Stan models)} +\usage{ +sample_predictive_stan( + stan_model_output, + time_points = c(5, 30, 90), + n_samples = NULL +) +} +\arguments{ +\item{stan_model_output}{Output from \code{\link[=run_mod_stan]{run_mod_stan()}}, an object of class +\code{sr_model} containing the fitted Stan model. \strong{Important}: The model must +have been fit with \code{with_post = TRUE} to include the posterior draws +required for prediction. Note that storing posterior draws increases memory +usage.} + +\item{time_points}{Numeric vector of time points (in days) at which to +generate predictions. Default is \code{c(5, 30, 90)}.} + +\item{n_samples}{Number of posterior samples to draw. If \code{NULL} (default), +uses all available posterior samples from the model.} +} +\value{ +A list of class \code{posterior_predictive_stan} containing: +\itemize{ +\item \code{samples}: Array of posterior predictive samples with dimensions +\verb{[n_samples, n_timepoints, n_antigens]}. These include measurement +error and represent plausible new observations. +\item \code{time_points}: The time points used for prediction +\item \code{summary}: Summary statistics (mean, median, 95\% credible intervals) +for each antigen at each time point +} +} +\description{ +Generate posterior predictive samples for new observations using a fitted +Stan model. This function samples from the population-level parameter +distribution and includes measurement error to generate true posterior +predictive samples (not just mean curve draws). Predictions are made on +the original antibody concentration scale. +} +\details{ +This function generates true posterior predictive samples by: +\enumerate{ +\item Extracting population-level parameter draws (mu_par, prec_logy) +from the fitted model +\item Computing the mean log-antibody concentration (mu_logy) directly +using the Stan model formula for each time point +\item Adding measurement error sampled from Normal(0, sigma_logy) where +sigma_logy = 1/sqrt(prec_logy) +\item Transforming back to the original antibody concentration scale +} + +The resulting samples represent plausible new observations, not just the +mean curve. For stratified models, draws from all strata are combined. +} +\examples{ +\dontrun{ +# Fit a Stan model with posterior samples +model_output <- run_mod_stan( + data = my_data, + file_mod = "model.stan", + nchain = 4, + with_post = TRUE +) + +# Generate posterior predictive samples +predictions <- sample_predictive_stan( + model_output, + time_points = c(5, 30, 90) +) + +# Access summary statistics +print(predictions$summary) +} +} diff --git a/man/serodynamics-package.Rd b/man/serodynamics-package.Rd index 0eb607209..e836de3f9 100644 --- a/man/serodynamics-package.Rd +++ b/man/serodynamics-package.Rd @@ -11,9 +11,9 @@ Modeling longitudinal immune seroresponses to pathogens. \seealso{ Useful links: \itemize{ - \item \url{https://github.com/UCD-SERG/serodynamics} - \item \url{https://ucd-serg.github.io/serodynamics/} - \item Report bugs at \url{https://github.com/UCD-SERG/serodynamics/issues} + \item \url{https://github.com/ucdavis/serodynamics} + \item \url{https://ucdavis.github.io/serodynamics/} + \item Report bugs at \url{https://github.com/ucdavis/serodynamics/issues} } } diff --git a/pkgdown/_pkgdown.yml b/pkgdown/_pkgdown.yml index 61c982025..bdc72576b 100644 --- a/pkgdown/_pkgdown.yml +++ b/pkgdown/_pkgdown.yml @@ -26,9 +26,17 @@ reference: contents: - prep_priors - initsfunction +- title: Prepare auxiliary Stan inputs + contents: + - prep_priors_stan + - prep_data_stan - title: Model seroreponse contents: - run_mod + - run_mod_stan +- title: Posterior predictive sampling + contents: + - sample_predictive_stan - title: Model diagnostics contents: - plot_jags_dens diff --git a/tests/testthat/_snaps/as_case_data.md b/tests/testthat/_snaps/as_case_data.md deleted file mode 100644 index 2b43c83ae..000000000 --- a/tests/testthat/_snaps/as_case_data.md +++ /dev/null @@ -1,14 +0,0 @@ -# results are consistent with simulated data - - list(names = c("id", "visit_num", "timeindays", "iter", "antigen_iso", - "y0", "y1", "t1", "alpha", "r", "value"), class = c("case_data", - "tbl_df", "tbl", "data.frame"), id_var = "id", biomarker_var = "antigen_iso", - timeindays = "timeindays", value_var = "value") - -# results are consistent with SEES data - - list(names = c("Country", "id", "sample_id", "bldculres", "antigen_iso", - "studyvisit", "dayssincefeveronset", "result", "visit_num"), - class = c("case_data", "spec_tbl_df", "tbl_df", "tbl", "data.frame" - ), id_var = "id", biomarker_var = "antigen_iso", timeindays = "dayssincefeveronset", - value_var = "result") diff --git a/tests/testthat/_snaps/stan-functions.md b/tests/testthat/_snaps/stan-functions.md new file mode 100644 index 000000000..ac4d221a2 --- /dev/null +++ b/tests/testthat/_snaps/stan-functions.md @@ -0,0 +1,110 @@ +# prep_priors_stan results are consistent + + structure(list(n_params = 5, mu_hyp = structure(c(1, 1, 7, 7, + 1, 1, -4, -4, -1, -1), dim = c(2L, 5L)), prec_hyp = structure(c(1, + 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1e-05, 1e-05, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.001, 0.001, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1), dim = c(2L, 5L, 5L)), omega = structure(c(1, + 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 50, 50, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 10, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 1, 1), dim = c(2L, 5L, 5L)), wishdf = c(20, + 20), prec_logy_hyp = structure(c(4, 4, 1, 1), dim = c(2L, 2L))), class = c("curve_params_priors_stan", + "list"), used_priors = list(mu_hyp_param = c(1, 7, 1, -4, -1), + prec_hyp_param = c(1, 1e-05, 1, 0.001, 1), omega_param = c(1, + 50, 1, 10, 1), wishdf_param = 20, prec_logy_hyp_param = c(4, + 1))) + +# prep_priors_stan priors are modifiable + + structure(list(n_params = 5, mu_hyp = structure(c(1, 1, 5, 5, + 0, 0, -2, -2, -3, -3), dim = c(2L, 5L)), prec_hyp = structure(c(0.01, + 0.01, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.01, 0.01, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0.01, 0.01, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.01, + 0.01, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.01, 0.01), dim = c(2L, + 5L, 5L)), omega = structure(c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 50, 50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1), dim = c(2L, + 5L, 5L)), wishdf = c(15, 15), prec_logy_hyp = structure(c(4, + 4, 1, 1), dim = c(2L, 2L))), class = c("curve_params_priors_stan", + "list"), used_priors = list(mu_hyp_param = c(1, 5, 0, -2, -3), + prec_hyp_param = c(0.01, 0.01, 0.01, 0.01, 0.01), omega_param = c(1, + 50, 1, 5, 1), wishdf_param = 15, prec_logy_hyp_param = c(4, + 1))) + +# prep_data_stan results are consistent + + WAoAAAACAAQGAAACAwAAAAMTAAAABwAAAA0AAAABAAAAFAAAAA0AAAABAAAAAQAAAA4AAAAB + QBQAAAAAAAAAAAANAAAAFAAAAAQAAAAFAAAABwAAAAEAAAAEAAAACgAAAAEAAAAIAAAACAAA + AAIAAAAEAAAAAwAAAAgAAAAFAAAACQAAAAYAAAAJAAAAAQAAAAUAAAAJAAAADQAAAAEAAAAK + AAACDgAAAMgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA + AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA + AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA + AAAAAAAAQBwAAAAAAABAHAAAAAAAAEAYAAAAAAAAAAAAAAAAAABAIAAAAAAAAEAcAAAAAAAA + AAAAAAAAAABAIAAAAAAAAEAYAAAAAAAAQBgAAAAAAABAGAAAAAAAAEAYAAAAAAAAQBgAAAAA + AABAIAAAAAAAAEAgAAAAAAAAQBwAAAAAAABAIAAAAAAAAAAAAAAAAAAAQBwAAAAAAABAIAAA + AAAAAEAsAAAAAAAAQCoAAAAAAABAKAAAAAAAAAAAAAAAAAAAQC4AAAAAAABALAAAAAAAAAAA + AAAAAAAAQC4AAAAAAABAKAAAAAAAAAAAAAAAAAAAQCoAAAAAAABAKAAAAAAAAEAsAAAAAAAA + QC4AAAAAAABALgAAAAAAAEAqAAAAAAAAQC4AAAAAAAAAAAAAAAAAAEAqAAAAAAAAQDAAAAAA + AABANgAAAAAAAEA1AAAAAAAAQDQAAAAAAAAAAAAAAAAAAEA2AAAAAAAAQDQAAAAAAAAAAAAA + AAAAAEA3AAAAAAAAQDIAAAAAAAAAAAAAAAAAAEAzAAAAAAAAAAAAAAAAAABANQAAAAAAAEA2 + AAAAAAAAQDYAAAAAAABANQAAAAAAAEA1AAAAAAAAAAAAAAAAAABANQAAAAAAAEA2AAAAAAAA + AAAAAAAAAABAPAAAAAAAAEA7AAAAAAAAAAAAAAAAAAAAAAAAAAAAAEA7AAAAAAAAAAAAAAAA + AABAPwAAAAAAAEA4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQDsAAAAAAABAPQAA + AAAAAEA+AAAAAAAAQD0AAAAAAABAPAAAAAAAAAAAAAAAAAAAQDwAAAAAAABAPAAAAAAAAAAA + AAAAAAAAAAAAAAAAAABAQQAAAAAAAAAAAAAAAAAAAAAAAAAAAABAQQAAAAAAAAAAAAAAAAAA + QEMAAAAAAABAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEBAgAAAAAAAAAAAAAAA + AABAQwAAAAAAAEBCAAAAAAAAQEEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQEGAAAAAAAAAAAAA + AAAAAAAAAAAAAAAAQEUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQESAAAAAAAAAAAAAAAAAAEBH + AAAAAAAAQEOAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABARIAAAAAAAAAAAAAAAAAA + QEcAAAAAAAAAAAAAAAAAAEBFAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEBFAAAAAAAAAAAAAAAA + AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEBIAAAAAAAAAAAAAAAAAABASwAA + AAAAAEBHgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQEiAAAAAAAAAAAAAAAAAAEBK + AAAAAAAAAAAAAAAAAABASQAAAAAAAAAAAAAAAAAAAAAAAAAAAABASIAAAAAAAAAAAAAAAAAA + AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABASwAAAAAAAAAAAAAAAAAAAAAAAAAA + AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABATYAA + AAAAAAAAAAAAAAAAQEwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQEuAAAAAAAAAAAAAAAAAAAAA + AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQE8AAAAAAAAAAAAAAAAAAAAAAAAAAAAA + AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA + AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEAgAAAAEABAAJ + AAAAA2RpbQAAAA0AAAACAAAAFAAAAAoAAAD+AAACDgAAAMg/3ahL5AQg9j/12uPmxMWXv8t0 + YCQtBfM/6P8fms/6fz/zYGt6ol2NP/l+Zw4sEq4/8An+hoM8YD/uOzhP0qYrv84UahpQDWE/ + 2d6gM+eOGj/yzDn/1g6Vv/OH3UQTVUdABFMg2ZRbbD/vodU83dbhP/RWTX8O09g/2d6gM+eO + GkAIqqOtGNJfv/Nuy/sVtXQ/3ahL5AQg9j/dqEvkBCD2P/624Sq5S9JAE5C7NaDLDUAFTu0n + PfELAAAAAAAAAABAEzJSH08BU0ANI5bcTFjYAAAAAAAAAABADfjad7XAR0AMcJg+lqnbQAI8 + RRlvm0pACP1/Ps5ZVEAS4OQF2MWMQAnGm8F8ezxAFXJSP9wpgEAZrJHRUpujQAS8V/HtYjZA + DjRZIJiwKgAAAAAAAAAAP/624Sq5S9JAAQV/T8U9B0ALAdeuOMe0QBLeGqY3HSFABSFNqEcW + 7AAAAAAAAAAAQBjWtb5ix51ACtoSxb4t7AAAAAAAAAAAQAps0oQQf/NAD5ULorb4wwAAAAAA + AAAAQAah/HY37NtAEtlj/mzNNUADpNkWSYTCQBTlYyEp+cVAFNcPR5sT/kANN+HcJX5VP/QF + f+BgXNAAAAAAAAAAAEAJV8jz0DCXQAr0UKH211lACqoY3hFXj0ASEnPiafYaQATlKryYTCkA + AAAAAAAAAEAYdAF/b7dNQAlRbhLIlYsAAAAAAAAAAEAH8Z1nuOW7QA56NdFBpVcAAAAAAAAA + AEAFK1bgpQjyAAAAAAAAAABAAXdKr89eG0AUY2y3N9GGQBJK4Px2OR5ADLAaHlCfuz/vJLbV + MgvlAAAAAAAAAABACrZf9bNhbEAKqhjeEVePAAAAAAAAAABAEXhtFz+Ej0AEsS8NeOujAAAA + AAAAAAAAAAAAAAAAAEAH2FmEOUzDAAAAAAAAAABABjBvq/4J7kANdTpPoARMAAAAAAAAAAAA + AAAAAAAAAAAAAAAAAAAAQAA1pneYAudAE+rZHALMAUAQiZk7IIgsQAwsK4FuDAs/6aM2s4tP + tQAAAAAAAAAAQAphL2JeQ2ZACmEvYl5DZgAAAAAAAAAAAAAAAAAAAABABH3H1AVZfgAAAAAA + AAAAAAAAAAAAAABABpvQRsfGYgAAAAAAAAAAQAT8F0Lse4FADDXrEtFavAAAAAAAAAAAAAAA + AAAAAAAAAAAAAAAAAD/+ezOI29P7AAAAAAAAAABADpwjJngcj0ALu7gIZR7yP+Zumb9L7g8A + AAAAAAAAAAAAAAAAAAAAQAoNthgJCqMAAAAAAAAAAAAAAAAAAAAAQARDuk32j8EAAAAAAAAA + AAAAAAAAAAAAQAWLCvl61qUAAAAAAAAAAEAD2ZT6IpP2QAs1TuxntJ8AAAAAAAAAAAAAAAAA + AAAAAAAAAAAAAAA//G/DUVF9UAAAAAAAAAAAQAy1UjeybNkAAAAAAAAAAD/jO2SIDcaMAAAA + AAAAAAAAAAAAAAAAAEAJu+K9nob3AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA + AAAAAAAAAEAEm2Rrx/71AAAAAAAAAABAAuUML4J5BUAKJhPg9kTeAAAAAAAAAAAAAAAAAAAA + AAAAAAAAAAAAP/rJ+TeaAvUAAAAAAAAAAEALhVvg/JV9AAAAAAAAAAA/4MC0QUqP9AAAAAAA + AAAAAAAAAAAAAABACWuk/AcsDgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA + AAAAAABAA+LVey+0FQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA + AAAAAAAAAAAAAAAAAAAAAAAAAAAAAABAClHp59gi/wAAAAAAAAAAP95m4iPyC7kAAAAAAAAA + AAAAAAAAAAAAQAkoFOIHO0UAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA + AAAAQAME17eKonUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA + AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA + AAAAAAAAAAAAAAAAAAAAAAAEAgAAAf8AAAANAAAAAwAAABQAAAAKAAAAAQAAAP4AAAQCAAAA + AQAEAAkAAAAFbmFtZXMAAAAQAAAABwAEAAkAAAAFbnN1YmoABAAJAAAADm5fYW50aWdlbl9p + c29zAAQACQAAAAhuX3BhcmFtcwAEAAkAAAAFbnNtcGwABAAJAAAACW1heF9uc21wbAAEAAkA + AAAGc21wbF90AAQACQAAAARsb2d5AAAEAgAAAAEABAAJAAAABWNsYXNzAAAAEAAAAAIABAAJ + AAAAEXByZXBwZWRfc3Rhbl9kYXRhAAQACQAAAARsaXN0AAAEAgAAAAEABAAJAAAACGFudGln + ZW5zAAAAEAAAAAEABAAJAAAACEhseUVfSWdBAAAEAgAAAAEABAAJAAAACm5fYW50aWdlbnMA + AAANAAAAAQAAAAEAAAQCAAAAAQAEAAkAAAADaWRzAAAAEAAAABQABAAJAAAAATEABAAJAAAA + ATIABAAJAAAAATMABAAJAAAAATQABAAJAAAAATUABAAJAAAAATYABAAJAAAAATcABAAJAAAA + ATgABAAJAAAAATkABAAJAAAAAjEwAAQACQAAAAIxMQAEAAkAAAACMTIABAAJAAAAAjEzAAQA + CQAAAAIxNAAEAAkAAAACMTUABAAJAAAAAjE2AAQACQAAAAIxNwAEAAkAAAACMTgABAAJAAAA + AjE5AAQACQAAAAIyMAAAAP4= + diff --git a/tests/testthat/test-stan-functions.R b/tests/testthat/test-stan-functions.R new file mode 100644 index 000000000..2e01abf1d --- /dev/null +++ b/tests/testthat/test-stan-functions.R @@ -0,0 +1,306 @@ +# Tests for Stan backend functions +# These tests validate prep_data_stan(), prep_priors_stan(), and run_mod_stan() + +test_that("prep_priors_stan results are consistent", { + prep_priors_stan(max_antigens = 2) |> + expect_snapshot_value(style = "deparse") +}) + +test_that("prep_priors_stan priors are modifiable", { + prep_priors_stan( + max_antigens = 2, + mu_hyp_param = c(1.0, 5.0, 0.0, -2.0, -3.0), + prec_hyp_param = c(0.01, 0.01, 0.01, 0.01, 0.01), + omega_param = c(1.0, 50.0, 1.0, 5.0, 1.0), + wishdf_param = 15, + prec_logy_hyp_param = c(4.0, 1.0) + ) |> + expect_snapshot_value(style = "deparse") +}) + +test_that("prep_data_stan validates NA values in input data", { + # Create data with NA values + case_data_with_na <- serocalculator::typhoid_curves_nostrat_100 |> + sim_case_data(n = 10, antigen_isos = "HlyE_IgA") + + # Introduce NA in value column + case_data_with_na$value[1] <- NA + + expect_error( + suppressWarnings(prep_data_stan(case_data_with_na)), + "Stan data cannot contain NA values" + ) +}) + +test_that("prep_data_stan results are consistent", { + withr::local_seed(1) + case_data <- serocalculator::typhoid_curves_nostrat_100 |> + sim_case_data(n = 20, antigen_isos = "HlyE_IgA") + + prepped_data <- suppressWarnings(prep_data_stan(case_data)) + + # Check structure + expect_type(prepped_data, "list") + expect_true( + all(c("nsubj", "n_antigen_isos", "n_params", "nsmpl", + "max_nsmpl", "smpl_t", "logy") %in% names(prepped_data)) + ) + + # Check that data is padded (rectangular arrays) + expect_equal(length(dim(prepped_data$logy)), 3) + expect_equal(length(dim(prepped_data$smpl_t)), 2) + + # Snapshot the structure + prepped_data |> expect_snapshot_value(style = "serialize") +}) + +test_that("run_mod_stan errors when cmdstanr not available", { + skip_if(requireNamespace("cmdstanr", quietly = TRUE), + "cmdstanr is available, skipping unavailability test") + + case_data <- serocalculator::typhoid_curves_nostrat_100 |> + sim_case_data(n = 10, antigen_isos = "HlyE_IgA") + + expect_error( + run_mod_stan(case_data), + "cmdstanr" + ) +}) + +test_that("run_mod_stan works with cmdstanr installed", { + skip_if_not_installed("cmdstanr") + skip_if( + is.null(tryCatch(cmdstanr::cmdstan_version(), error = function(e) NULL)), + "CmdStan not installed" + ) + + withr::local_seed(1) + case_data <- serocalculator::typhoid_curves_nostrat_100 |> + sim_case_data(n = 20, antigen_isos = "HlyE_IgA") + + results <- run_mod_stan( + data = case_data, + file_mod = serodynamics_example("model.stan"), + nchain = 2, + nadapt = 100, + niter = 10 + ) |> + suppressWarnings() + + # Check output structure + expect_s3_class(results, "sr_model") + expect_s3_class(results, "tbl_df") + + # Check required columns + expect_true( + all(c("Parameter", "value", "Stratification") %in% names(results)) + ) + + # Check attributes + attrs <- attributes(results) + expect_true("priors" %in% names(attrs)) + expect_true("fitted_residuals" %in% names(attrs)) + + # Snapshot attributes (excluding row.names and fitted_residuals for stability) + attrs |> + rlist::list.remove(c("row.names", "fitted_residuals")) |> + expect_snapshot_value(style = "deparse") +}) + +test_that("run_mod_stan works with stratification", { + skip_if_not_installed("cmdstanr") + skip_if( + is.null(tryCatch(cmdstanr::cmdstan_version(), error = function(e) NULL)), + "CmdStan not installed" + ) + + withr::local_seed(1) + strat1 <- serocalculator::typhoid_curves_nostrat_100 |> + sim_case_data(n = 20, antigen_isos = "HlyE_IgA") |> + dplyr::mutate(strat = "stratum 1") + + withr::local_seed(2) + strat2 <- serocalculator::typhoid_curves_nostrat_100 |> + sim_case_data(n = 20, antigen_isos = "HlyE_IgA") |> + dplyr::mutate(strat = "stratum 2") + + dataset <- dplyr::bind_rows(strat1, strat2) + + results <- run_mod_stan( + data = dataset, + file_mod = serodynamics_example("model.stan"), + nchain = 2, + nadapt = 100, + niter = 10, + strat = "strat" + ) |> + suppressWarnings() + + # Check stratification column exists and has correct values + expect_true("Stratification" %in% names(results)) + expect_true(all(c("stratum 1", "stratum 2") %in% results$Stratification)) + + # Check that both strata have results + strat_counts <- table(results$Stratification) + expect_equal(length(strat_counts), 2) + expect_true(all(strat_counts > 0)) +}) + +test_that("sample_predictive_stan works with fitted Stan model", { + skip_if_not_installed("cmdstanr") + skip_if( + is.null(tryCatch(cmdstanr::cmdstan_version(), error = function(e) NULL)), + "CmdStan not installed" + ) + + withr::local_seed(1) + case_data <- serocalculator::typhoid_curves_nostrat_100 |> + sim_case_data(n = 20, antigen_isos = "HlyE_IgA") + + # Fit model with posterior samples + model_output <- run_mod_stan( + data = case_data, + file_mod = serodynamics_example("model.stan"), + nchain = 2, + nadapt = 100, + niter = 10, + with_post = TRUE + ) |> + suppressWarnings() + + # Generate predictions + predictions <- sample_predictive_stan( + model_output, + time_points = c(5, 30, 90), + n_samples = 10 + ) + + # Check structure + expect_s3_class(predictions, "posterior_predictive_stan") + expect_type(predictions, "list") + expect_true( + all(c("samples", "time_points", "summary") %in% names(predictions)) + ) + + # Check samples array dimensions + # 10 samples, 3 timepoints, 1 antigen + expect_equal(dim(predictions$samples), c(10, 3, 1)) + + # Check time points + expect_equal(predictions$time_points, c(5, 30, 90)) + + # Check summary structure + expect_type(predictions$summary, "list") + # 1 antigen + expect_equal(length(predictions$summary), 1) + expect_true("HlyE_IgA" %in% names(predictions$summary)) + + # Check summary data frame + summary_df <- predictions$summary$HlyE_IgA + expect_s3_class(summary_df, "data.frame") + # 3 timepoints + expect_equal(nrow(summary_df), 3) + expect_true( + all( + c("time_point", "mean", "median", "lower_95", "upper_95") %in% + names(summary_df) + ) + ) +}) + +test_that("sample_predictive_stan works with multiple antigens", { + skip_if_not_installed("cmdstanr") + skip_if( + is.null(tryCatch(cmdstanr::cmdstan_version(), error = function(e) NULL)), + "CmdStan not installed" + ) + + withr::local_seed(2) + # Use two antigens to test multi-antigen indexing + # n=15 provides sufficient data while keeping test runtime reasonable + # Two antigens (HlyE_IgA, HlyE_IgG) test parameter extraction for each antigen + case_data <- serocalculator::typhoid_curves_nostrat_100 |> + sim_case_data(n = 15, antigen_isos = c("HlyE_IgA", "HlyE_IgG")) + + # Fit model with posterior samples + model_output <- run_mod_stan( + data = case_data, + file_mod = serodynamics_example("model.stan"), + nchain = 2, + nadapt = 100, + niter = 10, + with_post = TRUE + ) |> + suppressWarnings() + + # Generate predictions + n_samples <- 8 + time_points <- c(10, 50) + antigen_isos <- c("HlyE_IgA", "HlyE_IgG") + + predictions <- sample_predictive_stan( + model_output, + time_points = time_points, + n_samples = n_samples + ) + + # Check structure + expect_s3_class(predictions, "posterior_predictive_stan") + + # Check samples array dimensions + # Dimensions: [n_samples, n_timepoints, n_antigens] + expect_equal( + dim(predictions$samples), + c(n_samples, length(time_points), length(antigen_isos)) + ) + + # Check time points + expect_equal(predictions$time_points, time_points) + + # Check summary structure - should have 2 antigens + expect_type(predictions$summary, "list") + expect_equal(length(predictions$summary), length(antigen_isos)) + expect_true(all(antigen_isos %in% names(predictions$summary))) + + # Check that each antigen has summary stats for each time point + for (antigen in antigen_isos) { + summary_df <- predictions$summary[[antigen]] + expect_s3_class(summary_df, "data.frame") + expect_equal(nrow(summary_df), length(time_points)) + expect_true( + all( + c("time_point", "mean", "median", "lower_95", "upper_95") %in% + names(summary_df) + ) + ) + } +}) + +test_that("sample_predictive_stan errors without posterior samples", { + skip_if_not_installed("cmdstanr") + skip_if( + is.null(tryCatch(cmdstanr::cmdstan_version(), error = function(e) NULL)), + "CmdStan not installed" + ) + + withr::local_seed(1) + case_data <- serocalculator::typhoid_curves_nostrat_100 |> + sim_case_data(n = 20, antigen_isos = "HlyE_IgA") + + # Fit model WITHOUT posterior samples + model_output <- run_mod_stan( + data = case_data, + file_mod = serodynamics_example("model.stan"), + nchain = 2, + nadapt = 100, + niter = 10, + with_post = FALSE + ) |> + suppressWarnings() + + # Should error when trying to generate predictions + expect_error( + sample_predictive_stan(model_output), + "Posterior samples not found" + ) +}) diff --git a/vignettes/.gitignore b/vignettes/.gitignore index 5bb381a15..bc923a3b7 100644 --- a/vignettes/.gitignore +++ b/vignettes/.gitignore @@ -4,3 +4,5 @@ *_files /.quarto/ + +**/*.quarto_ipynb