Pull request overview
Implements the variance decomposition step of the uncertainty analysis pipeline by combining Sobol (global SA) indices with ensemble variance and local OAT sensitivity results, and adds an accompanying Quarto report for interpretation.
Changes:
- Adds an R pipeline script to compute ensemble variance, partition it by uncertainty source, and write summary CSVs and plots.
- Adds core R functions for variance partitioning, local parameter attribution, and plotting.
- Adds a Quarto report to visualize and interpret the decomposition outputs.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| scripts/031_partition_variance.R | New CLI script to run the variance decomposition workflow and write outputs/plots. |
| R/variance_decomposition.R | New implementation of ensemble variance calculation, variance partitioning, local parameter attribution, and plotting helpers. |
| analysis/variance_decomposition.qmd | New report that reads the generated CSVs and produces summary figures and narrative. |
```r
# --- config ---
cfg <- yaml::read_yaml(args$config)

data_dir <- cfg$default$paths$data_dir %||% "data"
```
%||% is not defined in this script (and isn’t namespaced), so this line will error at runtime. Use a namespaced fallback operator (e.g., rlang::%||%) or replace with an explicit NULL/length check before assigning data_dir.
Suggested change:

```r
data_dir <- cfg$default$paths$data_dir
if (is.null(data_dir) || length(data_dir) == 0) {
  data_dir <- "data"
}
```
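For reference, a minimal sketch of the other option, defining the fallback operator locally (the toy cfg list is illustrative):

```r
# Minimal null-coalescing fallback, equivalent to rlang's `%||%` plus a
# length-0 guard; defining it locally avoids the undefined-operator error.
`%||%` <- function(x, y) {
  if (is.null(x) || length(x) == 0) y else x
}

# Illustrative config where paths$data_dir is absent:
cfg <- list(default = list(paths = list()))
data_dir <- cfg$default$paths$data_dir %||% "data"
```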
```r
if (!file.exists(path)) {
  PEcAn.logger::logger.severe("File not found: ", path)
}
readr::read_csv(path, show_col_types = FALSE, ...)
```
safe_read_csv() hard-codes show_col_types = FALSE while also forwarding `...`. If a caller supplies show_col_types in `...`, readr::read_csv() receives the argument twice and errors. Consider either removing the hard-coded show_col_types (documenting it instead), or only setting it when the caller has not supplied it.
Suggested change:

```r
dots <- list(...)
if (is.null(dots$show_col_types)) {
  readr::read_csv(path, show_col_types = FALSE, ...)
} else {
  readr::read_csv(path, ...)
}
```
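The failure mode can be reproduced without readr; inner() and its verbose argument below are illustrative stand-ins for read_csv() and show_col_types:

```r
inner <- function(x, verbose = TRUE) x  # stand-in for readr::read_csv()

# Buggy wrapper: hard-codes verbose AND forwards `...`, so a caller-supplied
# verbose reaches inner() twice ("matched by multiple actual arguments").
wrapper_bad <- function(x, ...) inner(x, verbose = FALSE, ...)

# Fixed wrapper: only set the default when the caller did not supply one.
wrapper_good <- function(x, ...) {
  dots <- list(...)
  if (is.null(dots$verbose)) {
    inner(x, verbose = FALSE, ...)
  } else {
    inner(x, ...)
  }
}
```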
```r
dplyr::filter(
  .data$variable == var,
  .data$category != "interaction"
) |>
```
This plot excludes the interaction fraction but uses stacked bars of frac_of_total (which are fractions of the full total variance). As a result, bars will not sum to 1 and visually under-represent total variance while still being labeled as Fraction of total variance. Either include interaction in the plotted data, or switch to a normalized display (e.g., position = "fill") and/or adjust labels to clearly indicate interactions are excluded.
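One base-R way to renormalize after dropping the interaction rows, so the displayed fractions sum to 1 per run (column names mirror the PR's, but the toy data are illustrative):

```r
plot_data <- data.frame(
  runid         = c("r1", "r1", "r1"),
  category      = c("parameter", "met", "interaction"),
  frac_of_total = c(0.5, 0.3, 0.2)
)

# Drop interaction rows, then rescale the remaining fractions within each run
keep <- plot_data[plot_data$category != "interaction", ]
keep$frac_shown <- ave(keep$frac_of_total, keep$runid,
                       FUN = function(f) f / sum(f))
```

With ggplot2, geom_col(position = "fill") achieves the same normalization directly; either way the axis label should note that interactions are excluded.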
```r
  plot_data,
  ggplot2::aes(x = .data$runid, y = .data$frac_of_total, fill = .data$category)
) +
  ggplot2::geom_col(position = "stack") +
```
Same issue as the previous comment: with the interaction fraction excluded, these stacked frac_of_total bars will not sum to 1.
```r
ymin = .data$mean_frac - .data$sd_frac,
ymax = .data$mean_frac + .data$sd_frac
```
Error bars can extend below 0 (and above 1) for fraction data, which can produce misleading plots. Consider clamping ymin/ymax to valid bounds (e.g., ymin = pmax(0, ...), and optionally ymax = pmin(1, ...)).
Suggested change:

```r
ymin = pmax(0, .data$mean_frac - .data$sd_frac),
ymax = pmin(1, .data$mean_frac + .data$sd_frac)
```
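A quick illustration of the clamping with values near the bounds (the mean_frac/sd_frac vectors are made up):

```r
mean_frac <- c(0.05, 0.95)
sd_frac   <- c(0.10, 0.10)

# pmax/pmin clamp element-wise, keeping error bars within [0, 1]
ymin <- pmax(0, mean_frac - sd_frac)  # -0.05 becomes 0
ymax <- pmin(1, mean_frac + sd_frac)  #  1.05 becomes 1
```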
Closes: #153
This PR implements the uncertainty partitioning / variance decomposition workflow, the final step in the uncertainty analysis pipeline. It integrates results from the global SA (Sobol) and local SA (OAT) to decompose predictive uncertainty into its constituent sources: parameters, initial conditions, meteorological drivers, and residual / process error (not yet implemented).
Explained in detail in the comments.
Ref: Dietze (2017), Ecological Forecasting
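A hedged sketch of the core partitioning idea (not the PR's actual implementation; the function and source names are illustrative): scale the total ensemble variance by each source's first-order Sobol index and attribute the unexplained remainder to interactions.

```r
# Partition total ensemble variance by first-order Sobol indices; the
# remainder not explained by first-order effects is attributed to
# interactions (non-additivity of the sources).
partition_variance <- function(total_var, first_order) {
  by_source   <- total_var * first_order
  interaction <- total_var - sum(by_source)
  c(by_source, interaction = interaction)
}

# Illustrative indices for three uncertainty sources:
parts <- partition_variance(
  total_var   = 4.0,
  first_order = c(parameter = 0.55, initial_condition = 0.25, met = 0.10)
)
```

By construction the partitions sum back to the total variance, which is a useful sanity check on the written CSVs.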