Implement variance decomposition#3

Open
divine7022 wants to merge 7 commits into ccmmf:main from divine7022:variance-decomposition

Conversation

@divine7022
Collaborator

@divine7022 divine7022 commented Dec 8, 2025

Closes: #153

This PR implements the uncertainty partitioning / variance decomposition workflow, the final step in the uncertainty analysis pipeline. It combines the results of the global sensitivity analysis (Sobol) and the local sensitivity analysis (one-at-a-time, OAT) to decompose predictive uncertainty into its constituent sources: parameters, initial conditions, meteorological drivers, and residual / process error (the last of these is not yet implemented).

Explained in detail in the comments.

Ref: Dietze (2017), Ecological Forecasting
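For readers unfamiliar with the idea, the core bookkeeping can be sketched in a few lines of R. This is a toy illustration under made-up numbers, not the PR's actual implementation or API: total ensemble variance is split across sources using first-order Sobol fractions, and the unattributed remainder is assigned to interactions.

```r
total_var  <- 4.0
sobol_frac <- c(parameter = 0.45, initial_condition = 0.25, met = 0.20)

partition   <- total_var * sobol_frac             # variance attributed to each source
interaction <- total_var * (1 - sum(sobol_frac))  # unattributed remainder

# The pieces reconstruct the total variance by construction:
sum(partition) + interaction  # 4
```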

@divine7022 divine7022 changed the title Implement variance decomposition [WIP] Implement variance decomposition Dec 8, 2025
@divine7022 divine7022 changed the title [WIP] Implement variance decomposition Implement variance decomposition Dec 11, 2025
@divine7022 divine7022 requested a review from dlebauer February 27, 2026 23:04
@dlebauer dlebauer requested a review from Copilot March 5, 2026 16:38

Copilot AI left a comment


Pull request overview

Implements the variance decomposition step of the uncertainty analysis pipeline by combining Sobol (global SA) indices with ensemble variance and local OAT sensitivity results, and adds an accompanying Quarto report for interpretation.

Changes:

  • Adds an R pipeline script to compute ensemble variance, partition it by uncertainty source, and write summary CSVs and plots.
  • Adds core R functions for variance partitioning, local parameter attribution, and plotting.
  • Adds a Quarto report to visualize and interpret the decomposition outputs.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
scripts/031_partition_variance.R New CLI script to run the variance decomposition workflow and write outputs/plots.
R/variance_decomposition.R New implementation of ensemble variance calculation, variance partitioning, local parameter attribution, and plotting helpers.
analysis/variance_decomposition.qmd New report that reads the generated CSVs and produces summary figures and narrative.


# --- config ---
cfg <- yaml::read_yaml(args$config)

data_dir <- cfg$default$paths$data_dir %||% "data"

Copilot AI Mar 5, 2026


%||% is not defined in this script (and isn’t namespaced), so this line will error at runtime. Use a namespaced fallback operator (e.g., rlang::%||%) or replace with an explicit NULL/length check before assigning data_dir.

Suggested change
data_dir <- cfg$default$paths$data_dir %||% "data"
data_dir <- cfg$default$paths$data_dir
if (is.null(data_dir) || length(data_dir) == 0) {
data_dir <- "data"
}
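For context, a minimal standalone version of the fallback operator the comment mentions might look like this. This is a sketch, not the PR's code; the script may instead prefer importing rlang's `%||%`, which checks only `is.null()`, whereas this variant also treats length-0 values as missing, matching the explicit check above:

```r
# Null-coalescing fallback: return y when x is NULL or empty.
`%||%` <- function(x, y) if (is.null(x) || length(x) == 0) y else x

cfg <- list(default = list(paths = list()))  # simulated config with no data_dir set
data_dir <- cfg$default$paths$data_dir %||% "data"
data_dir  # "data"
```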

if (!file.exists(path)) {
PEcAn.logger::logger.severe("File not found: ", path)
}
readr::read_csv(path, show_col_types = FALSE, ...)

Copilot AI Mar 5, 2026


safe_read_csv() hard-codes show_col_types = FALSE and also forwards .... If a caller passes show_col_types in ..., readr::read_csv() will receive the argument twice and error. Consider either removing the hard-coded show_col_types (documenting it instead), or only setting it when the caller did not supply it.

Suggested change
readr::read_csv(path, show_col_types = FALSE, ...)
dots <- list(...)
if (is.null(dots$show_col_types)) {
readr::read_csv(path, show_col_types = FALSE, ...)
} else {
readr::read_csv(path, ...)
}

Comment on lines +353 to +356
dplyr::filter(
.data$variable == var,
.data$category != "interaction"
) |>

Copilot AI Mar 5, 2026


This plot excludes the interaction fraction but uses stacked bars of frac_of_total (which are fractions of the full total variance). As a result, bars will not sum to 1 and visually under-represent total variance while still being labeled as Fraction of total variance. Either include interaction in the plotted data, or switch to a normalized display (e.g., position = "fill") and/or adjust labels to clearly indicate interactions are excluded.

plot_data,
ggplot2::aes(x = .data$runid, y = .data$frac_of_total, fill = .data$category)
) +
ggplot2::geom_col(position = "stack") +

Copilot AI Mar 5, 2026


This plot excludes the interaction fraction but uses stacked bars of frac_of_total (which are fractions of the full total variance). As a result, bars will not sum to 1 and visually under-represent total variance while still being labeled as Fraction of total variance. Either include interaction in the plotted data, or switch to a normalized display (e.g., position = "fill") and/or adjust labels to clearly indicate interactions are excluded.
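Numerically, position = "fill" amounts to renormalizing each bar's fractions so the displayed stack sums to 1. A toy illustration of that renormalization with made-up fractions, using base R only (not the PR's plotting code):

```r
plot_data <- data.frame(
  runid         = rep("run1", 3),
  category      = c("parameter", "initial_condition", "met"),
  frac_of_total = c(0.40, 0.20, 0.10)  # the remaining 0.30 (interaction) is excluded
)

# Renormalize within each run, as position = "fill" would do:
plot_data$frac_norm <- ave(
  plot_data$frac_of_total, plot_data$runid,
  FUN = function(x) x / sum(x)
)
sum(plot_data$frac_norm)  # 1
```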

Comment on lines +406 to +407
ymin = .data$mean_frac - .data$sd_frac,
ymax = .data$mean_frac + .data$sd_frac

Copilot AI Mar 5, 2026


Error bars can extend below 0 (and above 1) for fraction data, which can produce misleading plots. Consider clamping ymin/ymax to valid bounds (e.g., ymin = pmax(0, ...), and optionally ymax = pmin(1, ...)).

Suggested change
ymin = .data$mean_frac - .data$sd_frac,
ymax = .data$mean_frac + .data$sd_frac
ymin = pmax(0, .data$mean_frac - .data$sd_frac),
ymax = pmin(1, .data$mean_frac + .data$sd_frac)
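The effect of the suggested clamping on out-of-range bounds, shown with toy values:

```r
mean_frac <- c(0.05, 0.95)
sd_frac   <- c(0.10, 0.10)

ymin <- pmax(0, mean_frac - sd_frac)  # 0.00 0.85  (lower bound clamped to 0)
ymax <- pmin(1, mean_frac + sd_frac)  # 0.15 1.00  (upper bound clamped to 1)

# All error bars now stay within the valid [0, 1] range for fractions.
all(ymin >= 0 & ymax <= 1)  # TRUE
```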
