
feat: make global burden of disease data sources optional #19

Merged
koen-vg merged 8 commits into Sustainable-Solutions-Lab:main from koen-vg:feat/optional-health-data on Jul 1, 2026

koen-vg (Member) commented Jul 1, 2026

This PR makes global burden of disease (GBD) data optional in the GLADE workflow: it is only required when the health burden is enabled in the model.

This entails changes to baseline diet construction, because GBD data can be used to calibrate intake of health-relevant food groups so that the model accurately reproduces the dietary health burden. Anchoring the baseline diet to GBD is now optional.

koen-vg added 6 commits July 1, 2026 13:54
Default health.enabled to false so the workflow runs end to end without
the manually downloaded IHME GBD data. Add diet.anchor_groups_to_gbd
(sentinel "match_health" follows health.enabled) to control GBD anchoring
of the baseline diet independently of the health burden; anchoring was
previously unconditional.
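A minimal sketch of how the "match_health" sentinel might resolve, assuming the config is a nested dict loaded from YAML and that an omitted key behaves like "match_health". The function name mirrors the resolve_gbd_anchoring() helper mentioned later in this PR, but the body and signature here are illustrative, not the project's code.

```python
from typing import Any, Mapping

MATCH_HEALTH = "match_health"


def resolve_gbd_anchoring(config: Mapping[str, Any]) -> bool:
    """Return whether the baseline diet is anchored to GBD (illustrative sketch).

    Assumption: a missing diet.anchor_groups_to_gbd key behaves like the
    "match_health" sentinel, and a missing health.enabled means False.
    """
    setting = config.get("diet", {}).get("anchor_groups_to_gbd", MATCH_HEALTH)
    if setting == MATCH_HEALTH:
        # The sentinel follows the health switch
        return bool(config.get("health", {}).get("enabled", False))
    if isinstance(setting, bool):
        # An explicit true/false decouples anchoring from the health burden
        return setting
    raise ValueError(
        f"diet.anchor_groups_to_gbd must be true, false or {MATCH_HEALTH!r}, got {setting!r}"
    )
```

With an explicit boolean, a run can, for example, keep the GBD-anchored diet while the health burden itself stays off.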

Gate the health data-prep rules, the health stores, the solve/analysis/
plotting inputs, the cluster manifest, and the cluster-dependent plots
on whether health is enabled (in the base config or any scenario) or
anchoring is on. Wrap the health.smk include accordingly, guard the
anchoring-off baseline-diet path, and raise a clear startup error when
the GBD data is needed but absent.
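The gating condition and the startup check could look roughly like this. This is a hypothetical sketch: the argument list given to health_required() (which the workflow calls with no arguments), the shape of the scenarios mapping, and the check_gbd_inputs() helper are assumptions made for illustration.

```python
from pathlib import Path
from typing import Any, Iterable, Mapping


def health_required(
    config: Mapping[str, Any],
    scenarios: Mapping[str, Mapping[str, Any]],
    anchoring: bool,
) -> bool:
    """Hypothetical signature: GBD inputs are needed if anchoring is on or any run enables health."""
    if anchoring:
        return True
    if config.get("health", {}).get("enabled", False):
        return True
    # Scenarios are assumed to be partial config overrides keyed by scenario name
    return any(s.get("health", {}).get("enabled", False) for s in scenarios.values())


def check_gbd_inputs(required: bool, gbd_files: Iterable[Path]) -> None:
    """Fail at startup with a readable message instead of a missing-input error mid-DAG."""
    if not required:
        return
    missing = [str(p) for p in gbd_files if not Path(p).exists()]
    if missing:
        raise FileNotFoundError(
            "GBD data is needed (health enabled or diet anchoring on) but missing: "
            + ", ".join(missing)
        )
```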

Pin health.enabled / anchoring on the configs that rely on it (validation,
gsa, tutorials, doc figures, calibration) to preserve their behaviour.
Calibration artefacts stay the GBD-anchored ones; regenerating them for
the anchoring-off default is deferred (the dual-based cost/stability steps
require Gurobi). Document the option, the quantitative baseline-diet
differences, the refined-grain caveat, and the calibration coupling.
…it across calibration steps

The provenance snapshot stored diet.anchor_groups_to_gbd verbatim, so the
sentinel "match_health" -- which resolves through the solve-time-exempt
health.enabled -- let two configs with different resolved anchoring (and
thus different baseline diets) stamp identically. Snapshot the resolved
boolean instead, via a shared resolve_gbd_anchoring() helper.
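To illustrate why the stamp must hold the resolved boolean, here is a sketch of a provenance stamp computed from a JSON dump of part of the config. The snapshot contents, the SHA-256 choice, and the structural_stamp() name are assumptions; the point is that two configs sharing the literal sentinel but differing in health.enabled now hash differently.

```python
import hashlib
import json
from typing import Any, Mapping


def structural_stamp(config: Mapping[str, Any]) -> str:
    """Hash a (hypothetical) structural snapshot with anchoring already resolved."""
    diet = dict(config.get("diet", {}))
    setting = diet.get("anchor_groups_to_gbd", "match_health")
    if setting == "match_health":
        # health.enabled is assumed to be left out of the snapshot (solve-time-exempt),
        # so resolve the sentinel here; otherwise different baseline diets stamp identically
        setting = config.get("health", {}).get("enabled", False)
    diet["anchor_groups_to_gbd"] = bool(setting)
    snapshot = {"diet": diet}  # assumption: the real snapshot covers more keys
    payload = json.dumps(snapshot, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

A config using the sentinel with health on and a config pinning anchoring to true resolve to the same baseline diet, so in this sketch they share a stamp.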

The calibration step configs previously pinned anchoring on to reproduce
the committed anchored artefacts, while the stamp is computed from the
base config alone; the step configs also override health.enabled for
solver performance, which would re-resolve the sentinel per step and mix
baseline diets within one chain. Drop the pins and have tools/calibrate
resolve anchoring from the base config once, pinning it in every step via
a config overlay.
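One way to picture the overlay: resolve anchoring once from the base config, then merge each step's overrides and, last, the pinned value, so a step that switches health.enabled off cannot re-resolve the sentinel. deep_merge() and step_configs() are hypothetical names, and the step configs are assumed to be partial nested dicts.

```python
import copy
from typing import Any, Mapping


def deep_merge(base: Mapping[str, Any], overlay: Mapping[str, Any]) -> dict[str, Any]:
    """Recursively merge overlay into a copy of base; overlay values win."""
    merged = copy.deepcopy(dict(base))
    for key, value in overlay.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def step_configs(
    base: Mapping[str, Any], steps: list[Mapping[str, Any]]
) -> list[dict[str, Any]]:
    """Build per-step configs with anchoring resolved once from the base config."""
    setting = base.get("diet", {}).get("anchor_groups_to_gbd", "match_health")
    if setting == "match_health":
        setting = base.get("health", {}).get("enabled", False)
    overlay = {"diet": {"anchor_groups_to_gbd": bool(setting)}}
    # The overlay goes last, so a step that disables health for solver
    # speed still uses the same baseline diet as the rest of the chain
    return [deep_merge(deep_merge(base, step), overlay) for step in steps]
```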
…erate

The warm-start param pointed at the calibrated_yaml output itself, which
Snakemake deletes before the job runs, so calibration always cold-started
from the seeds. Point it at a side copy written next to the trace instead.
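A sketch of the side-copy pattern, assuming a hypothetical file name warm_start.yaml placed in the trace directory. Because Snakemake removes a job's outputs before running it, the warm start has to read from a file that is not itself an output of the rule.

```python
import shutil
from pathlib import Path

WARM_START_NAME = "warm_start.yaml"  # hypothetical side-copy name


def save_warm_start(calibrated_yaml: Path, trace_dir: Path) -> Path:
    """Copy the calibrated result next to the trace, outside the rule's outputs."""
    trace_dir.mkdir(parents=True, exist_ok=True)
    side_copy = trace_dir / WARM_START_NAME
    shutil.copyfile(calibrated_yaml, side_copy)
    return side_copy


def initial_guess_path(trace_dir: Path, seeds_yaml: Path) -> Path:
    """Warm-start from the side copy if a previous run left one, else use the seeds."""
    side_copy = trace_dir / WARM_START_NAME
    return side_copy if side_copy.exists() else seeds_yaml
```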

When the iteration hits max_iter without converging, return the iterate
with the smallest residual rather than the last one: with a discontinuous
deviation response (LP basis switches near the target), the final Broyden
step can land worse than an earlier iterate.
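The best-iterate fallback can be sketched with a plain "good" Broyden iteration. This is a generic implementation under assumed inputs (a residual function f, an initial Jacobian guess jac0, a tolerance on the residual norm), not the calibration tool's actual solver.

```python
import numpy as np


def broyden_best(f, x0, jac0, tol=1e-8, max_iter=50):
    """Solve f(x) = 0 with Broyden's ("good") method.

    Returns (x, converged). If max_iter is reached without convergence, the
    iterate with the smallest residual norm is returned, not the last one.
    """
    x = np.asarray(x0, dtype=float).copy()
    fx = np.asarray(f(x), dtype=float)
    J = np.array(jac0, dtype=float)  # initial Jacobian approximation (copied)
    best_x, best_res = x.copy(), float(np.linalg.norm(fx))
    for _ in range(max_iter):
        if best_res < tol:
            break
        dx = np.linalg.solve(J, -fx)
        x_new = x + dx
        f_new = np.asarray(f(x_new), dtype=float)
        # Rank-one update so that J @ dx matches the observed change in f
        J += np.outer(f_new - fx - J @ dx, dx) / (dx @ dx)
        x, fx = x_new, f_new
        res = float(np.linalg.norm(fx))
        if res < best_res:
            best_x, best_res = x.copy(), res
    return best_x, best_res < tol
```

With np.arctan from 1.5 and a deliberately poor initial slope of 0.2, the single step allowed by max_iter=1 overshoots to a larger residual, so the sketch returns the starting point with converged=False instead of the worse final iterate.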
apply_kcal_normalisation unconditionally excluded the grain group from
rescaling, justified by the cereal residual fix -- which only runs when
whole_grains is GBD-anchored. With anchoring off, raw GDD-IA refined
grain (the largest kcal group) stayed pinned while the entire kcal
correction was forced onto the rest of the diet. In validation solves
this produced a 246 Mt global dairy excess (demand squeezed to 585 Mt vs
FAOSTAT supply ~919 Mt) and a 30 Mt rice-white shortage (demand pinned 38%
above supply), and the distorted diet made the production-stability L1
calibration oscillate around a discontinuity without converging.

Pin grain only when whole_grains is anchored, mirroring the residual-fix
gating (and matching what docs/current_diets.rst already claimed). The
anchored diet is unaffected. With the fix, the anchor-off grain
composition lands within 1% of the anchored one, the dairy gap shrinks to
the genuine GDD-vs-FBS disagreement (~176 Mt, absorbed by food_demand
multipliers 1.23/1.31), rice-white needs no multiplier at all, and the
stability calibration converges in 3 iterations.
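A toy version of the gated normalisation, with hypothetical names (pinned_groups, normalise_kcal) and made-up kcal numbers rather than model data. It shows the mechanism described above: when the largest group is pinned, the whole kcal correction lands on the remaining groups.

```python
from typing import Mapping


def pinned_groups(anchored_groups: set[str]) -> set[str]:
    """Hypothetical gating: hold grain fixed only when whole_grains is GBD-anchored."""
    return {"grain"} if "whole_grains" in anchored_groups else set()


def normalise_kcal(
    kcal: Mapping[str, float], target_kcal: float, pinned: set[str]
) -> dict[str, float]:
    """Rescale unpinned food groups so total energy intake matches target_kcal."""
    fixed = sum(v for g, v in kcal.items() if g in pinned)
    free = sum(v for g, v in kcal.items() if g not in pinned)
    if free <= 0:
        raise ValueError("no unpinned kcal left to rescale")
    if target_kcal < fixed:
        raise ValueError("pinned groups alone exceed the kcal target")
    scale = (target_kcal - fixed) / free
    return {g: v if g in pinned else v * scale for g, v in kcal.items()}
```

Rescaling {"grain": 1200, "dairy": 300, "other": 900} from 2400 to 2000 kcal scales every group by 5/6 when nothing is pinned, but scales dairy and the rest by 2/3 when grain is pinned.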
…nchored set

The committed calibration artefacts were fit against the GBD-anchored
baseline diet, while the default config now runs with anchoring off --
the regeneration deferred when health was made optional. Resolve the
tangle by splitting the artefacts into two git-tracked sets:

- default: regenerated with tools/calibrate against the anchoring-off
  baseline diet (with the kcal-normalisation grain fix in place). The
  production-stability calibration converges in 3 iterations and the L1
  centre stays close to the anchored one (cropland 1.47, grassland 0.163,
  feed 0.079), confirming no hidden supply/demand mismatch.
- gbd-anchored: the previous artefacts, preserved bit-for-bit and stamped
  against the anchored structural config. The health-enabled configs
  (validation, gsa, gsa_fixed_diet, doc figures/validation) consume it
  via calibration.source, so the paper/GSA pipeline keeps the exact
  artefacts it was run with.
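A sketch of how calibration.source might select between the two git-tracked sets. The directory layout (calibration/default, calibration/gbd-anchored), the fallback to "default", and the calibration_dir() helper are assumptions for illustration; only the set names come from this PR.

```python
from pathlib import Path
from typing import Any, Mapping

CALIBRATION_SETS = ("default", "gbd-anchored")


def calibration_dir(config: Mapping[str, Any], root: Path = Path("calibration")) -> Path:
    """Pick the artefact set named by calibration.source (assumed to default to "default")."""
    source = config.get("calibration", {}).get("source", "default")
    if source not in CALIBRATION_SETS:
        raise ValueError(f"unknown calibration.source {source!r}; expected one of {CALIBRATION_SETS}")
    return root / source
```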

Update the calibration/current-diets docs and CHANGELOG from the
"regeneration deferred" story to the two-set layout.
Config and doc comments in this branch justified changes or contrasted
with previous behaviour ("default flipped to false", "anchoring used to
be unconditional"); rewrite them to state what the setting does now, or
drop them where the setting is self-explanatory. Record the principle in
AGENTS.md.
koen-vg force-pushed the feat/optional-health-data branch from d301a1e to f127e91 on July 1, 2026 22:53
koen-vg added 2 commits July 1, 2026 16:03
GDD-IA is developed by Marco Springmann; the manual-download checklist
pointed requests at the Global Dietary Database team. Note that a public
GDD-IA release is upcoming and will become the default input, with a
temporary GDD-IA-free mode under development to bridge the gap.
Replace the per-input 'path if health_required() else []' conditionals in
build_model, calibrate_deviation_penalty, and analyze_model with unpack()
input functions, matching the existing build_model_*_input idiom. The
seven health data-prep files are listed once, in
solve_namespace.HEALTH_INPUT_FILES, shared by the Snakemake rules, the
cluster manifest exporter, and solve_model_inputs.

analyze_model's health inputs are renamed to the canonical
health_-prefixed names, aligning with solve_and_analyze_model.
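A sketch of the unpack() idiom in plain Python. The two file entries below are placeholders standing in for the seven files in solve_namespace.HEALTH_INPUT_FILES, and make_health_input() is a hypothetical factory; in a Snakefile the returned function would be passed to unpack() inside a rule's input section.

```python
from typing import Callable

# Placeholders: the real mapping in solve_namespace.HEALTH_INPUT_FILES lists seven files
HEALTH_INPUT_FILES = {
    "health_example_a": "processing/health/example_a.csv",
    "health_example_b": "processing/health/example_b.csv",
}


def make_health_input(health_required: Callable[[], bool]) -> Callable[[object], dict[str, str]]:
    """Build a Snakemake-style input function for use with unpack()."""

    def health_input(wildcards: object) -> dict[str, str]:
        # An empty dict adds no inputs, so rules stay valid when GBD data is absent
        return dict(HEALTH_INPUT_FILES) if health_required() else {}

    return health_input


# In a Snakefile (not executed here), a rule would use it roughly as:
#     input:
#         unpack(make_health_input(health_required)),
```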
koen-vg merged commit 4779577 into Sustainable-Solutions-Lab:main Jul 1, 2026
4 checks passed
koen-vg deleted the feat/optional-health-data branch July 1, 2026 23:15