feat: make global burden of disease data sources optional #19
Merged
koen-vg merged 8 commits on Jul 1, 2026
Default health.enabled to false so the workflow runs end to end without the manually downloaded IHME GBD data. Add diet.anchor_groups_to_gbd (the sentinel "match_health" follows health.enabled) to control GBD anchoring of the baseline diet independently of the health burden; anchoring was previously unconditional.

Gate the health data-prep rules, the health stores, the solve/analysis/plotting inputs, the cluster manifest, and the cluster-dependent plots on whether health is enabled (in the base config or any scenario) or anchoring is on. Wrap the health.smk include accordingly, guard the anchoring-off baseline-diet path, and raise a clear startup error when the GBD data is needed but absent.

Pin health.enabled / anchoring on the configs that rely on it (validation, gsa, tutorials, doc figures, calibration) to preserve their behaviour. Calibration artefacts stay the GBD-anchored ones; regenerating them for the anchoring-off default is deferred (the dual-based cost/stability steps require Gurobi). Document the option, the quantitative baseline-diet differences, the refined-grain caveat, and the calibration coupling.
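The gating rule above (GBD data is needed if health is enabled in the base config or any scenario, or if anchoring is on) can be sketched as a small predicate plus a startup check. This is a minimal sketch, not the workflow's code: the recursive-merge semantics for scenarios, the helper names `deep_merge`, `needs_gbd` and `check_gbd_inputs`, and the list of GBD paths are all hypothetical; only `health_required()`, `health.enabled`, `diet.anchor_groups_to_gbd` and the `"match_health"` sentinel come from this PR.

```python
from pathlib import Path

MATCH_HEALTH = "match_health"  # sentinel: anchoring follows health.enabled


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override merged in recursively (assumed scenario semantics)."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def needs_gbd(cfg: dict) -> bool:
    """True if this fully resolved config enables the health burden or GBD anchoring."""
    health_on = bool(cfg.get("health", {}).get("enabled", False))
    anchor = cfg.get("diet", {}).get("anchor_groups_to_gbd", MATCH_HEALTH)
    anchoring_on = health_on if anchor == MATCH_HEALTH else bool(anchor)
    return health_on or anchoring_on


def health_required(base: dict, scenarios: dict) -> bool:
    """GBD inputs are needed if the base config or any scenario needs them."""
    configs = [base] + [deep_merge(base, override) for override in scenarios.values()]
    return any(needs_gbd(cfg) for cfg in configs)


def check_gbd_inputs(base: dict, scenarios: dict, gbd_paths: list) -> None:
    """Fail at startup, before any rule runs, if required GBD files are missing."""
    if not health_required(base, scenarios):
        return
    missing = [str(p) for p in gbd_paths if not Path(p).exists()]
    if missing:
        raise FileNotFoundError(
            "health or GBD anchoring is enabled, but these manually downloaded "
            "IHME GBD files are missing: " + ", ".join(missing)
        )
```

Checking every scenario up front means a single health-enabled scenario is enough to pull the GBD rules into the DAG, and a missing download is reported once at startup rather than as a failed job deep into a run.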
…it across calibration steps

The provenance snapshot stored diet.anchor_groups_to_gbd verbatim, so the sentinel "match_health" -- which resolves through the solve-time-exempt health.enabled -- let two configs with different resolved anchoring (and thus different baseline diets) stamp identically. Snapshot the resolved boolean instead, via a shared resolve_gbd_anchoring() helper.

The calibration step configs previously pinned anchoring on to reproduce the committed anchored artefacts, while the stamp is computed from the base config alone; the step configs also override health.enabled for solver performance, which would re-resolve the sentinel per step and mix baseline diets within one chain. Drop the pins and have tools/calibrate resolve anchoring from the base config once, pinning it in every step via a config overlay.
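A sketch of the two fixes: stamping the resolved boolean, and pinning anchoring once from the base config in each calibration step. Only the name `resolve_gbd_anchoring()` comes from this PR; its body here is a guess. The assumption that health is simply dropped from the snapshot (standing in for "solve-time-exempt"), the SHA-256-over-JSON stamp, and the helpers `provenance_stamp` and `step_config` are all hypothetical.

```python
import copy
import hashlib
import json

MATCH_HEALTH = "match_health"


def resolve_gbd_anchoring(cfg: dict) -> bool:
    """Resolve diet.anchor_groups_to_gbd to a boolean, following health.enabled for the sentinel."""
    anchor = cfg.get("diet", {}).get("anchor_groups_to_gbd", MATCH_HEALTH)
    if anchor == MATCH_HEALTH:
        return bool(cfg.get("health", {}).get("enabled", False))
    return bool(anchor)


def provenance_stamp(cfg: dict) -> str:
    """Hash a structural snapshot that records the *resolved* anchoring, not the sentinel."""
    snapshot = copy.deepcopy(cfg)
    snapshot.pop("health", None)  # assumption: health is exempt from the snapshot
    snapshot.setdefault("diet", {})["anchor_groups_to_gbd"] = resolve_gbd_anchoring(cfg)
    blob = json.dumps(snapshot, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


def step_config(base: dict, step_overrides: dict) -> dict:
    """Build one calibration step config with anchoring pinned from the base config."""
    cfg = copy.deepcopy(base)
    for section, values in step_overrides.items():
        cfg.setdefault(section, {}).update(values)
    # Overlay: a step may flip health.enabled for speed, but anchoring stays as resolved from base.
    cfg.setdefault("diet", {})["anchor_groups_to_gbd"] = resolve_gbd_anchoring(base)
    return cfg
```

If the sentinel were stored verbatim, two base configs that differ only in health.enabled would hash identically once health is dropped. Storing the resolved boolean makes their stamps differ. Pinning in `step_config` keeps every step of one chain on the same baseline diet.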
…erate

The warm-start param pointed at the calibrated_yaml output itself, which Snakemake deletes before the job runs, so calibration always cold-started from the seeds. Point it at a side copy written next to the trace instead.

When the iteration hits max_iter without converging, return the iterate with the smallest residual rather than the last one: with a discontinuous deviation response (LP basis switches near the target), the final Broyden step can land worse than an earlier iterate.
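The best-iterate fallback can be illustrated with a plain Broyden root finder that records the lowest-residual point it has seen. This is a sketch, not tools/calibrate: the "good" Broyden rank-one update, the identity starting Jacobian, the residual norm and the name `broyden_best_iterate` are assumptions. In the real tool `x0` would come from the warm-start side copy; here it is just an argument.

```python
import numpy as np


def broyden_best_iterate(f, x0, max_iter=20, tol=1e-8):
    """Solve f(x) = 0 with Broyden's ('good') method.

    Returns (x, converged). If max_iter is reached without convergence, the
    iterate with the smallest residual norm is returned instead of the last one.
    """
    x = np.asarray(x0, dtype=float)
    fx = np.asarray(f(x), dtype=float)
    J = np.eye(x.size)  # assumed initial Jacobian approximation
    best_x, best_res = x.copy(), np.linalg.norm(fx)
    for _ in range(max_iter):
        if np.linalg.norm(fx) < tol:
            return x, True
        dx = np.linalg.solve(J, -fx)
        x_new = x + dx
        f_new = np.asarray(f(x_new), dtype=float)
        df = f_new - fx
        # Rank-one update so that J_new @ dx == df (secant condition)
        J += np.outer(df - J @ dx, dx) / (dx @ dx)
        x, fx = x_new, f_new
        res = np.linalg.norm(fx)
        if res < best_res:
            best_x, best_res = x.copy(), res
    if np.linalg.norm(fx) < tol:
        return x, True
    return best_x, False  # non-converged: fall back to the best point seen
```

For example, `broyden_best_iterate(lambda x: x**3 - 8, [1.0], max_iter=1)` overshoots to `x = 8` on its only step and therefore returns the starting point with `converged=False`.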
apply_kcal_normalisation unconditionally excluded the grain group from rescaling, justified by the cereal residual fix -- which only runs when whole_grains is GBD-anchored. With anchoring off, raw GDD-IA refined grain (the largest kcal group) stayed pinned while the entire kcal correction was forced onto the rest of the diet. In validation solves this produced a 246 Mt global dairy excess (demand squeezed to 585 Mt vs FAOSTAT supply ~919 Mt) and a 30 Mt rice-white shortage (demand pinned 38% above supply), and the distorted diet made the production-stability L1 calibration oscillate around a discontinuity without converging.

Pin grain only when whole_grains is anchored, mirroring the residual-fix gating (and matching what docs/current_diets.rst already claimed). The anchored diet is unaffected. With the fix, the anchor-off grain composition lands within 1% of the anchored one, the dairy gap shrinks to the genuine GDD-vs-FBS disagreement (~176 Mt, absorbed by food_demand multipliers 1.23/1.31), rice-white needs no multiplier at all, and the stability calibration converges in 3 iterations.
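A simplified illustration of the gating, not the real apply_kcal_normalisation: it assumes the correction is a single uniform scale factor applied to every non-pinned group so that total kcal hits a target. The function name `normalise_kcal_sketch`, the flat `{group: kcal}` input and the example numbers are hypothetical.

```python
def normalise_kcal_sketch(kcal_by_group: dict, target_kcal: float, whole_grains_anchored: bool) -> dict:
    """Rescale food-group kcal to a target total, pinning grain only when whole_grains is anchored."""
    pinned = {"grain"} if whole_grains_anchored else set()
    fixed = sum(v for g, v in kcal_by_group.items() if g in pinned)
    free = sum(v for g, v in kcal_by_group.items() if g not in pinned)
    if free <= 0:
        raise ValueError("no unpinned kcal left to rescale")
    scale = (target_kcal - fixed) / free  # the whole correction lands on unpinned groups
    return {g: v if g in pinned else v * scale for g, v in kcal_by_group.items()}
```

With `{"grain": 1000, "dairy": 500, "other": 500}` and a 2400 kcal target, pinning grain forces a 1.4x scale onto the rest of the diet. Without the pin, the same correction is a 1.2x scale spread over all three groups. This is the distortion the commit describes for the anchoring-off case.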
…nchored set

The committed calibration artefacts were fit against the GBD-anchored baseline diet, while the default config now runs with anchoring off -- the regeneration deferred when health was made optional. Resolve the tangle by splitting the artefacts into two git-tracked sets:

- default: regenerated with tools/calibrate against the anchoring-off baseline diet (with the kcal-normalisation grain fix in place). The production-stability calibration converges in 3 iterations and the L1 centre stays close to the anchored one (cropland 1.47, grassland 0.163, feed 0.079), confirming no hidden supply/demand mismatch.
- gbd-anchored: the previous artefacts, preserved bit-for-bit and stamped against the anchored structural config. The health-enabled configs (validation, gsa, gsa_fixed_diet, doc figures/validation) consume it via calibration.source, so the paper/GSA pipeline keeps the exact artefacts it was run with.

Update the calibration/current-diets docs and CHANGELOG from the "regeneration deferred" story to the two-set layout.
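A hypothetical sketch of how `calibration.source` might select between the two artefact sets. The set names `default` and `gbd-anchored` and the `calibration.source` key come from this PR. The `calibration/<source>` directory layout, the name `calibration_dir`, and the consistency check against the resolved anchoring are illustrative assumptions, not the workflow's actual behaviour.

```python
from pathlib import Path

# Artefact set -> whether it was fit against the GBD-anchored baseline diet
CALIBRATION_SETS = {"default": False, "gbd-anchored": True}


def calibration_dir(cfg: dict, anchoring: bool, root: Path = Path("calibration")) -> Path:
    """Pick the artefact directory named by calibration.source (assumed layout: root/<source>)."""
    source = cfg.get("calibration", {}).get("source", "default")
    if source not in CALIBRATION_SETS:
        raise ValueError(f"unknown calibration.source {source!r}")
    # Hypothetical guard: refuse artefacts fit against a different baseline diet
    if CALIBRATION_SETS[source] != anchoring:
        raise ValueError(
            f"calibration.source {source!r} was fit with anchoring={CALIBRATION_SETS[source]}, "
            f"but this config resolves anchoring={anchoring}"
        )
    return root / source
```

Keeping both sets side by side lets the default config use the regenerated artefacts, while the health-enabled configs keep pointing at the preserved anchored ones.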
Config and doc comments in this branch justified changes or contrasted
with previous behaviour ("default flipped to false", "anchoring used to
be unconditional"); rewrite them to state what the setting does now, or
drop them where the setting is self-explanatory. Record the principle in
AGENTS.md.
d301a1e to f127e91
GDD-IA is developed by Marco Springmann; the manual-download checklist pointed requests at the Global Dietary Database team. Note that a public GDD-IA release is upcoming and will become the default input, with a temporary GDD-IA-free mode under development to bridge the gap.
Replace the per-input 'path if health_required() else []' conditionals in build_model, calibrate_deviation_penalty, and analyze_model with unpack() input functions, matching the existing build_model_*_input idiom. The seven health data-prep files are listed once, in solve_namespace.HEALTH_INPUT_FILES, shared by the Snakemake rules, the cluster manifest exporter, and solve_model_inputs. analyze_model's health inputs are renamed to the canonical health_-prefixed names, aligning with solve_and_analyze_model.
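The unpack() idiom can be sketched as an input function that returns a dict of named inputs, or an empty dict when health is off. Snakemake's `unpack()` accepts such a function and adds each key as a named input. The two file entries below are placeholders: the real seven entries live in `solve_namespace.HEALTH_INPUT_FILES` and are not listed in this PR. The factory name `make_health_input` is also hypothetical.

```python
# Placeholder entries: the real mapping is solve_namespace.HEALTH_INPUT_FILES (seven files).
HEALTH_INPUT_FILES = {
    "health_example_a": "processing/health/example_a.csv",
    "health_example_b": "processing/health/example_b.csv",
}


def make_health_input(required: bool):
    """Build a Snakemake input function: all health inputs when required, none otherwise."""

    def health_input(wildcards):
        # Return a fresh dict so callers cannot mutate the shared mapping.
        return dict(HEALTH_INPUT_FILES) if required else {}

    return health_input


# Usage in a Snakefile (sketch):
#     build_model_health_input = make_health_input(health_required())
#
#     rule build_model:
#         input:
#             unpack(build_model_health_input),
```

Because the file list is defined once, the rules, the cluster manifest exporter and solve_model_inputs cannot drift apart. A disabled health config simply contributes no named inputs.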
This PR makes global burden of disease (GBD) data optional in the GLADE workflow: it is only required when the health burden is actually enabled in the model, or when the baseline diet is anchored to GBD.
This entails changes to baseline diet construction, since GBD data can be used to calibrate health-relevant food group intake so that the dietary health burden is reproduced accurately. Anchoring the baseline diet to GBD is now optional.