feat: make global burden of disease data sources optional by koen-vg · Pull Request #19 · Sustainable-Solutions-Lab/GLADE

koen-vg · 2026-07-01T22:46:19Z

This PR makes global burden of disease (GBD) data optional in the GLADE workflow: it is only required when the health burden is actually enabled in the model.

This entails changes to baseline diet construction, since GBD can be used to calibrate health-relevant food group intake so that the dietary health burden is reproduced accurately. Anchoring the baseline diet to GBD is now optional.

Default health.enabled to false so the workflow runs end to end without the manually-downloaded IHME GBD data. Add diet.anchor_groups_to_gbd (sentinel "match_health" follows health.enabled) to control GBD anchoring of the baseline diet independently of the health burden; anchoring was previously unconditional.

Gate the health data-prep rules, health stores, and solve/analysis/plotting inputs, the cluster manifest, and cluster-dependent plots on whether health is enabled (base config or any scenario) or anchoring is on. Wrap the health.smk include accordingly, guard the anchoring-off baseline-diet path, and raise a clear startup error when the GBD data is needed but absent.

Pin health.enabled / anchoring on the configs that rely on it (validation, gsa, tutorials, doc figures, calibration) to preserve their behaviour. Calibration artefacts stay the GBD-anchored ones; regenerating them for the anchoring-off default is deferred (the dual-based cost/stability steps require Gurobi).

Document the option, the quantitative baseline-diet differences, the refined-grain caveat, and the calibration coupling.
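The gating rule described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: it assumes the config is a plain nested dict and that scenarios are a mapping of override dicts. `health_required` is named elsewhere in this PR, but its signature here is a guess, and `check_gbd_data` and its arguments are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Optional


def health_required(config: dict, scenarios: Optional[dict] = None) -> bool:
    """True if health is enabled (base config or any scenario) or anchoring is on."""
    base_health = bool(config.get("health", {}).get("enabled", False))
    if base_health:
        return True
    # Any scenario that switches health on also needs the GBD-derived inputs.
    for overrides in (scenarios or {}).values():
        if overrides.get("health", {}).get("enabled", False):
            return True
    # The "match_health" sentinel follows health.enabled; an explicit bool wins.
    anchoring = config.get("diet", {}).get("anchor_groups_to_gbd", "match_health")
    if anchoring == "match_health":
        return base_health
    return bool(anchoring)


def check_gbd_data(config: dict, gbd_paths: Iterable[str],
                   scenarios: Optional[dict] = None) -> None:
    """Hypothetical startup check: fail early if GBD data is needed but absent."""
    if not health_required(config, scenarios):
        return
    missing = [str(p) for p in gbd_paths if not Path(p).exists()]
    if missing:
        raise FileNotFoundError(
            "GBD data is required (health enabled or GBD anchoring on) "
            "but these files are missing: " + ", ".join(missing)
        )
```

With an empty config (health off, sentinel anchoring), `check_gbd_data` returns without touching the file system, which is the behaviour that lets the default workflow run without the IHME download.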

…it across calibration steps

The provenance snapshot stored diet.anchor_groups_to_gbd verbatim, so the sentinel "match_health" -- which resolves through the solve-time-exempt health.enabled -- let two configs with different resolved anchoring (and thus different baseline diets) stamp identically. Snapshot the resolved boolean instead, via a shared resolve_gbd_anchoring() helper.

The calibration step configs previously pinned anchoring on to reproduce the committed anchored artefacts, while the stamp is computed from the base config alone; the step configs also override health.enabled for solver performance, which would re-resolve the sentinel per step and mix baseline diets within one chain. Drop the pins and have tools/calibrate resolve anchoring from the base config once, pinning it in every step via a config overlay.
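A minimal sketch of why the sentinel has to be resolved once, assuming dict-based configs. `resolve_gbd_anchoring` is the helper named above, but its signature is assumed here; `deep_merge` and `step_overlay` are hypothetical names introduced for illustration.

```python
import copy


def resolve_gbd_anchoring(config: dict) -> bool:
    """Resolve diet.anchor_groups_to_gbd to a boolean (signature assumed)."""
    value = config.get("diet", {}).get("anchor_groups_to_gbd", "match_health")
    if value == "match_health":
        return bool(config.get("health", {}).get("enabled", False))
    return bool(value)


def deep_merge(base: dict, override: dict) -> dict:
    """Hypothetical helper: recursively overlay `override` on a copy of `base`."""
    out = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = copy.deepcopy(value)
    return out


def step_overlay(base_config: dict) -> dict:
    """Overlay that pins the anchoring resolved once from the base config."""
    return {"diet": {"anchor_groups_to_gbd": resolve_gbd_anchoring(base_config)}}


base = {"health": {"enabled": True}, "diet": {"anchor_groups_to_gbd": "match_health"}}
step = {"health": {"enabled": False}}  # a step turning health off for solver speed

# Without the overlay the sentinel re-resolves per step and flips to False.
unpinned = resolve_gbd_anchoring(deep_merge(base, step))
# With the overlay the step keeps the base config's resolved anchoring.
pinned = resolve_gbd_anchoring(deep_merge(deep_merge(base, step), step_overlay(base)))
```

The same reasoning applies to the provenance stamp: storing `resolve_gbd_anchoring(config)` rather than the raw sentinel means two configs stamp identically only when their resolved anchoring actually matches.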

…erate

The warm-start param pointed at the calibrated_yaml output itself, which Snakemake deletes before the job runs, so calibration always cold-started from the seeds. Point it at a side copy written next to the trace instead.

When the iteration hits max_iter without converging, return the iterate with the smallest residual rather than the last one: with a discontinuous deviation response (LP basis switches near the target), the final Broyden step can land worse than an earlier iterate.
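The best-iterate fallback can be illustrated with a scalar secant iteration, which is what Broyden's method reduces to in one dimension. This is a sketch under that simplification, not the calibration code itself; `broyden_scalar` and its parameters are hypothetical.

```python
def broyden_scalar(residual, x0, x1, tol=1e-6, max_iter=10):
    """Scalar Broyden (secant) iteration.

    Returns (x, r, converged). If the iteration does not converge, the
    evaluated iterate with the smallest |residual| is returned, not the last.
    """
    xs = [x0, x1]
    rs = [residual(x0), residual(x1)]
    for x, r in zip(xs, rs):
        if abs(r) <= tol:
            return x, r, True
    for _ in range(max_iter):
        denom = rs[-1] - rs[-2]
        if denom == 0.0:
            break  # flat secant: no usable slope estimate
        x_next = xs[-1] - rs[-1] * (xs[-1] - xs[-2]) / denom
        xs.append(x_next)
        rs.append(residual(x_next))
        if abs(rs[-1]) <= tol:
            return xs[-1], rs[-1], True
    # Not converged: fall back to the best iterate seen so far.
    best = min(range(len(xs)), key=lambda i: abs(rs[i]))
    return xs[best], rs[best], False
```

On a smooth residual this behaves like an ordinary secant solve. On a step-like residual, where later iterates are not guaranteed to improve, the caller still receives the closest point that was actually evaluated.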

apply_kcal_normalisation unconditionally excluded the grain group from rescaling, justified by the cereal residual fix -- which only runs when whole_grains is GBD-anchored. With anchoring off, raw GDD-IA refined grain (the largest kcal group) stayed pinned while the entire kcal correction was forced onto the rest of the diet. In validation solves this produced a 246 Mt global dairy excess (demand squeezed to 585 Mt vs FAOSTAT supply ~919 Mt) and a 30 Mt rice-white shortage (demand pinned 38% above supply), and the distorted diet made the production-stability L1 calibration oscillate around a discontinuity without converging. Pin grain only when whole_grains is anchored, mirroring the residual-fix gating (and matching what docs/current_diets.rst already claimed). The anchored diet is unaffected. With the fix, the anchor-off grain composition lands within 1% of the anchored one, the dairy gap shrinks to the genuine GDD-vs-FBS disagreement (~176 Mt, absorbed by food_demand multipliers 1.23/1.31), rice-white needs no multiplier at all, and the stability calibration converges in 3 iterations.
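The effect of the conditional pin can be seen in a small sketch. The function name matches the one above, but the signature, the flat dict of per-group kcal, and the numbers are assumptions made for illustration; the real function presumably operates on richer data.

```python
def apply_kcal_normalisation(kcal_by_group, target_kcal, whole_grains_anchored):
    """Rescale group kcal so the total hits target_kcal.

    The grain group is held fixed only when whole_grains is GBD-anchored;
    otherwise every group shares the correction proportionally.
    """
    pinned = {"grain"} if whole_grains_anchored else set()
    fixed = sum(v for g, v in kcal_by_group.items() if g in pinned)
    free = sum(v for g, v in kcal_by_group.items() if g not in pinned)
    if free <= 0:
        raise ValueError("no rescalable kcal left after pinning")
    scale = (target_kcal - fixed) / free
    return {g: (v if g in pinned else v * scale) for g, v in kcal_by_group.items()}


# Illustrative numbers only (kcal per person per day), not taken from the data.
diet = {"grain": 1200.0, "dairy": 300.0, "other": 900.0}
anchored = apply_kcal_normalisation(diet, 2000.0, whole_grains_anchored=True)
unanchored = apply_kcal_normalisation(diet, 2000.0, whole_grains_anchored=False)
```

With grain pinned, the whole 400 kcal correction falls on dairy and the other groups, which is the squeeze described above; without the pin, each group is scaled by the same factor.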

…nchored set

The committed calibration artefacts were fit against the GBD-anchored baseline diet, while the default config now runs with anchoring off -- the regeneration deferred when health was made optional. Resolve the tangle by splitting the artefacts into two git-tracked sets:

- default: regenerated with tools/calibrate against the anchoring-off baseline diet (with the kcal-normalisation grain fix in place). The production-stability calibration converges in 3 iterations and the L1 centre stays close to the anchored one (cropland 1.47, grassland 0.163, feed 0.079), confirming no hidden supply/demand mismatch.
- gbd-anchored: the previous artefacts, preserved bit-for-bit and stamped against the anchored structural config. The health-enabled configs (validation, gsa, gsa_fixed_diet, doc figures/validation) consume it via calibration.source, so the paper/GSA pipeline keeps the exact artefacts it was run with.

Update the calibration/current-diets docs and CHANGELOG from the "regeneration deferred" story to the two-set layout.

Config and doc comments in this branch justified changes or contrasted with previous behaviour ("default flipped to false", "anchoring used to be unconditional"); rewrite them to state what the setting does now, or drop them where the setting is self-explanatory. Record the principle in AGENTS.md.

GDD-IA is developed by Marco Springmann; the manual-download checklist pointed requests at the Global Dietary Database team. Note that a public GDD-IA release is upcoming and will become the default input, with a temporary GDD-IA-free mode under development to bridge the gap.

Replace the per-input 'path if health_required() else []' conditionals in build_model, calibrate_deviation_penalty, and analyze_model with unpack() input functions, matching the existing build_model_*_input idiom. The seven health data-prep files are listed once, in solve_namespace.HEALTH_INPUT_FILES, shared by the Snakemake rules, the cluster manifest exporter, and solve_model_inputs. analyze_model's health inputs are renamed to the canonical health_-prefixed names, aligning with solve_and_analyze_model.
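As a rough sketch of the idiom, an input function can return either the full mapping of health files or an empty dict, and Snakemake's `unpack()` then expands it into named inputs. Everything below is assumed for illustration: the entries in `HEALTH_INPUT_FILES` are placeholders (the real list has seven files whose names are not shown here), and `make_health_input_function` is a hypothetical factory.

```python
# Placeholder stand-in for solve_namespace.HEALTH_INPUT_FILES; the real
# entries and paths are not reproduced here.
HEALTH_INPUT_FILES = {
    "health_example_a": "processed/health/example_a.csv",
    "health_example_b": "processed/health/example_b.csv",
}


def make_health_input_function(config, required):
    """Build a Snakemake-style input function for the health files.

    `required` is a callable such as health_required(config) -> bool.
    """
    def health_inputs(wildcards):
        # An empty dict means the rule declares no health inputs at all.
        if not required(config):
            return {}
        return dict(HEALTH_INPUT_FILES)
    return health_inputs


# In a Snakefile this would be used roughly as:
#   rule build_model:
#       input:
#           unpack(health_inputs),
health_on = make_health_input_function({"health": {"enabled": True}},
                                       lambda cfg: cfg["health"]["enabled"])
health_off = make_health_input_function({"health": {"enabled": False}},
                                        lambda cfg: cfg["health"]["enabled"])
```

Keeping the file list in one place means the rules, the manifest exporter, and the solve inputs cannot drift apart when a health file is added or renamed.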

koen-vg added 6 commits July 1, 2026 13:54

koen-vg force-pushed the feat/optional-health-data branch from d301a1e to f127e91 Compare July 1, 2026 22:53

koen-vg added 2 commits July 1, 2026 16:03

koen-vg merged commit 4779577 into Sustainable-Solutions-Lab:main Jul 1, 2026
4 checks passed

koen-vg deleted the feat/optional-health-data branch July 1, 2026 23:15

koen-vg merged 8 commits into Sustainable-Solutions-Lab:main from koen-vg:feat/optional-health-data
