Sustainable-Solutions-Lab · koen-vg · Jul 1, 2026 · Jul 1, 2026
diff --git a/.claude/skills/model-calibration/SKILL.md b/.claude/skills/model-calibration/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: model-calibration
-description: Run, refresh, or diagnose the model's calibration pipeline (feed -> food_waste -> food_demand -> cost -> stability) that produces the git-tracked artefacts under `data/curated/calibration/`. Covers the dependency order, the `tools/calibrate` wrapper, realistic runtime expectations, when each kind of upstream change forces a re-run, and how to diagnose the most common failure mode: a hidden supply/demand mismatch that inflates the production-stability L1 cost. Use whenever calibration is relevant -- the user touches inputs/build logic that feed the calibration solves, calibration artefacts look off, or a refresh of the artefacts is needed after a model/data change.
+description: Run, refresh, or diagnose the model's calibration pipeline (feed -> food_waste -> food_demand -> cost -> stability) that produces the per-config artefact sets under `data/curated/calibration/<source>/` (the default set is git-tracked). Covers the dependency order, the `tools/calibrate` wrapper, realistic runtime expectations, when each kind of upstream change forces a re-run, and how to diagnose the most common failure mode: a hidden supply/demand mismatch that inflates the production-stability L1 cost. Use whenever calibration is relevant -- the user touches inputs/build logic that feed the calibration solves, calibration artefacts look off, or a refresh of the artefacts is needed after a model/data change.
 ---
 
 <!--
@@ -11,12 +11,19 @@ SPDX-License-Identifier: CC-BY-4.0
 
 # Model Calibration
 
-The default workflow consumes five git-tracked calibration artefacts under
-`data/curated/calibration/`. Each is produced by a dedicated validation-mode
-solve and absorbs a specific class of residual mismatch so that ordinary
-solves don't have to. Without these files in place, production-stability,
+The default workflow consumes five calibration artefact groups organized
+in per-config *sets* under `data/curated/calibration/<source>/`, selected
+by the `calibration.source` config key (the `default` set is
+git-tracked). Each group is produced by a dedicated validation-mode solve and
+absorbs a specific class of residual mismatch so that ordinary solves
+don't have to. Without these files in place, production-stability,
 costs, and food/feed accounting drift from observed 2020 reality.
 
+Every set carries a `provenance.yaml` stamp of the structural config it
+was fit against; workflow runs error at DAG time when their config
+differs structurally from the consumed set's stamp (see "Artefact sets
+and provenance" below).
+
 Authoritative reference: `docs/calibration.rst`. This skill is the operational
 companion: when to run, how to run, what to expect, what to watch out for.
 
@@ -58,12 +65,20 @@ tools/calibrate food_waste
 tools/calibrate food_demand
 tools/calibrate cost
 tools/calibrate stability
-tools/calibrate --check      # per-step staleness probe (dry-run, no execution)
+tools/calibrate --check      # per-step staleness + provenance probe (no execution)
+tools/calibrate --base config/<name>.yaml [all|<step>|--check]
+                             # calibrate a dedicated set for another config
 ```
 
 The wrapper defaults to `pixi -e gurobi` -- all calibration configs use
 Gurobi. HiGHS is too slow here. Override with `CALIBRATE_PIXI_ENV=<env>`.
 
+With `--base`, the base config must declare its own `calibration.source`
+(the wrapper refuses to overwrite the shared `default` set). A fresh set
+is seeded from `default` and regenerated in order, and the `all` chain
+uses `name: calibration-<source>` so processing trees don't thrash. After
+any successful run the set is (re)stamped with `provenance.yaml`.
+
 Pass extra flags through positionally:
 
 ```bash
@@ -153,13 +168,21 @@ sequential (each Broyden iteration depends on the previous solve).
 
 ## Output landing zones
 
-- `data/curated/calibration/*` -- the five artefacts, **git-tracked**. Commit them together as a refresh; mixed-vintage artefacts are the most common cause of confusing downstream solves.
-- `processing/calibration/*` -- shared upstream prep, NOT committed.
+- `data/curated/calibration/<source>/*` -- one artefact set per base config, plus its `provenance.yaml` stamp; the `default` set is **git-tracked**. Commit a set together as a refresh; mixed-vintage artefacts are the most common cause of confusing downstream solves.
+- `processing/calibration/*` (or `processing/calibration-<source>/*` for non-default bases) -- shared upstream prep, NOT committed.
 - `results/calibration/*` -- per-iteration solve logs, NOT committed.
 - `results/calibration/calibration/deviation_penalty_trace.csv` -- per-iter Broyden trace (per-component lambda, achieved deviations, residual norm). Inspect when stability behaves oddly.
 
+## Artefact sets and provenance
+
+- A config selects its set with `calibration.source` (default: `default`); all artefact paths resolve through the `{calibration_source}` placeholder at config-load time.
+- Structurally divergent configs must either calibrate their own set (`calibration.source: <name>` + `tools/calibrate --base config/<name>.yaml`), point at a compatible set, or set `calibration.accept_provenance_mismatch: true` (test/tutorial-grade escape hatch: warning instead of error).
+- The provenance check covers config drift only; code/data staleness remains the job of the per-step staleness probe. Both run from `tools/calibrate --check`.
+- The check compares all non-solve-time config leaves against the stamp, minus exempt machinery keys (see `PROVENANCE_EXEMPT_PREFIXES` in `workflow/validation/calibration_provenance.py`). Solve-time knobs (GHG price, value_per_yll, deviation_penalty, scenario overrides) never trip it.
+- `tests/test_calibration_provenance.py::TestDefaultStampConsistency` fails when `config/default.yaml` changes structurally without a recalibration/restamp -- that is the intended forcing function.
+
 The currently calibrated L1 centre lives in
-`data/curated/calibration/deviation_penalty.yaml` under
+`data/curated/calibration/default/deviation_penalty.yaml` under
 `l1_costs.<component>`. Solves that set
 `deviation_penalty.{land,feed,diet}.l1_cost: "calibrated"` resolve the
 sentinel from this file at solve time. Per-component
@@ -289,7 +312,7 @@ tools/smk --configfile config/validation.yaml -- \
     results/validation/solved/model_scen-default.nc
 
 # Current calibrated L1 centre
-cat data/curated/calibration/deviation_penalty.yaml
+cat data/curated/calibration/default/deviation_penalty.yaml
 
 # Per-iter Broyden trace (after a stability run)
 cat results/calibration/calibration/deviation_penalty_trace.csv
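
The structural comparison described under "Artefact sets and provenance" can be pictured with a minimal sketch. It is not the code in `workflow/validation/calibration_provenance.py`: the function names, the prefix values, and the dotted-key flattening are assumptions made for illustration, and the real exempt list is whatever `PROVENANCE_EXEMPT_PREFIXES` contains.

```python
from typing import Any

# Hypothetical prefix lists, invented for this sketch. The real exempt list
# is PROVENANCE_EXEMPT_PREFIXES and may look quite different.
EXEMPT_PREFIXES = ("paths.",)
SOLVE_TIME_PREFIXES = ("deviation_penalty.",)

_MISSING = object()  # distinguishes "key absent" from "value is None"


def flatten(config: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten a nested mapping into {dotted.key: leaf_value}."""
    leaves: dict[str, Any] = {}
    for key, value in config.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            leaves.update(flatten(value, f"{dotted}."))
        else:
            leaves[dotted] = value
    return leaves


def structural_leaves(config: dict[str, Any]) -> dict[str, Any]:
    """Keep only leaves that are neither exempt nor solve-time knobs."""
    skip = EXEMPT_PREFIXES + SOLVE_TIME_PREFIXES
    return {k: v for k, v in flatten(config).items() if not k.startswith(skip)}


def provenance_diff(config: dict[str, Any], stamp: dict[str, Any]) -> list[str]:
    """Return the sorted dotted keys whose structural values differ."""
    current = structural_leaves(config)
    stamped = structural_leaves(stamp)
    return sorted(
        key
        for key in current.keys() | stamped.keys()
        if current.get(key, _MISSING) != stamped.get(key, _MISSING)
    )


stamp = {
    "aggregation": {"regions": {"target_count": 400}},
    "deviation_penalty": {"l1_cost": 1.0},
}
config = {
    "aggregation": {"regions": {"target_count": 200}},
    "deviation_penalty": {"l1_cost": 5.0},
}
print(provenance_diff(config, stamp))
```

Under these assumed prefixes, only the change in regional resolution is reported; the solve-time knob under `deviation_penalty` is ignored, which mirrors the behaviour the bullet list describes.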

diff --git a/AGENTS.md b/AGENTS.md
@@ -325,14 +325,15 @@ pixi run -e dev pytest -v         # verbose output
 
 ## Calibration
 
-Five calibrations feed the default workflow. Their outputs live under
-`data/curated/calibration/` and are git-tracked; builds depend on them.
-When upstream data or build logic changes materially, regenerate in
-this order:
+Five calibrations feed the default workflow. Their outputs are organized
+in per-config artefact *sets* under `data/curated/calibration/<source>/`
+(selected by the `calibration.source` config key; the `default` set is
+git-tracked) and builds depend on them. When upstream data or build
+logic changes materially, regenerate in this order:
 
 1. **feed** — `config/calibration/feed.yaml` → `grassland_yield.csv`,
    `fodder_conversion.csv`, `exogenous_forage.csv`,
-   `exogenous_protein.csv`.
+   `exogenous_feed.csv`.
 2. **food_waste** — `config/calibration/food_waste.yaml` →
    `food_waste.yaml` (per-food-group consumer-side waste multipliers).
 3. **food_demand** — `config/calibration/food_demand.yaml` →
@@ -354,7 +355,15 @@ Single entrypoint: `tools/calibrate` (`all` by default; `feed`,
 `food_waste`, `food_demand`, `cost`, `stability`, or `--check` for
 staleness). `tools/smk` prints a one-line reminder when
 `data/curated/` inputs are newer than the oldest calibration artefact.
-See `docs/calibration.rst` for the full story.
+
+Each artefact set carries a `provenance.yaml` stamp of the structural
+config it was fit against (written by `tools/calibrate`); every workflow
+run checks its config against the stamp of the set it consumes and
+errors on structural mismatch. Configs with different structural
+assumptions must declare their own `calibration.source` and run
+`tools/calibrate --base config/<name>.yaml`, or set
+`calibration.accept_provenance_mismatch: true` (test/tutorial configs
+only). See `docs/calibration.rst` for the full story.
 
 ## Configuration Validation
 

diff --git a/REUSE.toml b/REUSE.toml
@@ -38,6 +38,14 @@ precedence = "aggregate"
 SPDX-FileCopyrightText = "2026 Koen van Greevenbroek"
 SPDX-License-Identifier = "PDDL-1.0"
 
+# Machine-generated calibration provenance stamps (written by
+# workflow/scripts/write_calibration_provenance.py via tools/calibrate).
+[[annotations]]
+path = "data/curated/calibration/**/provenance.yaml"
+precedence = "aggregate"
+SPDX-FileCopyrightText = "2026 Koen van Greevenbroek"
+SPDX-License-Identifier = "CC-BY-4.0"
+
 # Root LICENSE copy so GitHub detects the project license (it only scans the
 # repository root, not LICENSES/). The canonical text lives in LICENSES/.
 [[annotations]]

diff --git a/config/default.yaml b/config/default.yaml
@@ -51,6 +51,23 @@ paths:
   logs_root: "logs"
   benchmarks_root: "benchmarks"
 
+# --- section: calibration ---
+# Which calibration artefact set to use. The five calibration artefact
+# groups (feed, food_waste, food_demand, cost, stability) live under
+# ``data/curated/calibration/<source>/``; every artefact path below that
+# contains the ``{calibration_source}`` placeholder resolves against this
+# key at config-load time. A config whose structural assumptions differ
+# from the set it consumes must either regenerate its own set
+# (``tools/calibrate --base <config>`` with its own ``source`` name) or
+# point ``source`` at a compatible existing set. Compatibility is checked
+# at workflow start against the set's ``provenance.yaml``.
+calibration:
+  source: "default"
+  # Downgrade a provenance mismatch from an error to a warning. Only for
+  # configs that knowingly reuse a set calibrated under different
+  # structural assumptions (e.g. coarse test/tutorial resolutions).
+  accept_provenance_mismatch: false
+
 # --- section: netcdf ---
 # NetCDF export settings for PyPSA network files (build and solve outputs)
 netcdf:
@@ -165,7 +182,7 @@ deviation_penalty:
     tolerance: 0.02
     max_iter: 8
     trust_region_log: 0.693  # log(2): caps |dx|_inf in log-coords per iteration
-    calibrated_yaml: "data/curated/calibration/deviation_penalty.yaml"
+    calibrated_yaml: "data/curated/calibration/{calibration_source}/deviation_penalty.yaml"
     trace_csv: "<results>/{name}/calibration/deviation_penalty_trace.csv"
     seeds:
       cropland: 1.0   # crop deviation reaches the 5% target near L1~1.1
@@ -656,9 +673,9 @@ grazing:
   grassland_forage_calibration:
     enabled: true
     generate: false
-    grassland_yield_correction: "data/curated/calibration/grassland_yield.csv"
-    fodder_conversion_correction: "data/curated/calibration/fodder_conversion.csv"
-    exogenous_forage: "data/curated/calibration/exogenous_forage.csv"
+    grassland_yield_correction: "data/curated/calibration/{calibration_source}/grassland_yield.csv"
+    fodder_conversion_correction: "data/curated/calibration/{calibration_source}/fodder_conversion.csv"
+    exogenous_forage: "data/curated/calibration/{calibration_source}/exogenous_forage.csv"
     scenario: "default"
 
 # Protein-feed calibration: per-country exogenous monogastric/ruminant
@@ -673,7 +690,7 @@ grazing:
 exogenous_feed_calibration:
   enabled: true
   generate: false
-  exogenous_feed: "data/curated/calibration/exogenous_feed.csv"
+  exogenous_feed: "data/curated/calibration/{calibration_source}/exogenous_feed.csv"
   scenario: "default"
 
 # Food waste calibration: a per-food-group multiplier on (1 - waste_fraction)
@@ -685,7 +702,7 @@ exogenous_feed_calibration:
 food_loss_waste_calibration:
   enabled: true
   generate: false
-  calibration_file: "data/curated/calibration/food_waste.yaml"
+  calibration_file: "data/curated/calibration/{calibration_source}/food_waste.yaml"
   food_groups:
   # Groups with documented FBS-vs-GDD gap that the SDG-based defaults
   # under- or over-state. The SDG global all-foods 10% waste rate fits
@@ -716,7 +733,7 @@ food_loss_waste_calibration:
 food_demand_calibration:
   enabled: true
   generate: false
-  calibration_file: "data/curated/calibration/food_demand.csv"
+  calibration_file: "data/curated/calibration/{calibration_source}/food_demand.csv"
   # Bounds on the per-food multiplier. Tight on purpose: anything that
   # would fall outside flags a structural data issue worth investigating
   # rather than being silently absorbed.
@@ -1687,9 +1704,9 @@ cost_calibration:
   enabled: true       # Apply calibration corrections to production costs
   generate: false     # Generate calibration from solved model (breaks DAG cycle when true)
   scenario: "calibration"  # Scenario name used for calibration solve
-  crop_correction_csv: "data/curated/calibration/crop_cost.csv"
-  grassland_correction_csv: "data/curated/calibration/grassland_cost.csv"
-  animal_correction_csv: "data/curated/calibration/animal_cost.csv"
+  crop_correction_csv: "data/curated/calibration/{calibration_source}/crop_cost.csv"
+  grassland_correction_csv: "data/curated/calibration/{calibration_source}/grassland_cost.csv"
+  animal_correction_csv: "data/curated/calibration/{calibration_source}/animal_cost.csv"
 
 # --- section: solving ---
 solving:
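
The `{calibration_source}` placeholder used throughout this file resolves against `calibration.source` at config-load time. The following sketch shows one way such a substitution could work; `resolve_calibration_source` is a hypothetical helper, not the workflow's actual loader, and the sample config is trimmed down with simplified nesting.

```python
from typing import Any

PLACEHOLDER = "{calibration_source}"


def resolve_calibration_source(node: Any, source: str) -> Any:
    """Recursively substitute the placeholder in every string leaf."""
    if isinstance(node, dict):
        return {k: resolve_calibration_source(v, source) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_calibration_source(v, source) for v in node]
    if isinstance(node, str):
        # str.replace (not str.format) leaves other placeholders such as
        # "{name}" untouched for whatever resolves them later.
        return node.replace(PLACEHOLDER, source)
    return node


config = {
    "calibration": {"source": "default", "accept_provenance_mismatch": False},
    "food_demand_calibration": {
        "calibration_file": "data/curated/calibration/{calibration_source}/food_demand.csv",
    },
    # Nesting simplified for the example.
    "deviation_penalty": {
        "trace_csv": "<results>/{name}/calibration/deviation_penalty_trace.csv",
    },
}

resolved = resolve_calibration_source(config, config["calibration"]["source"])
print(resolved["food_demand_calibration"]["calibration_file"])
```

Plain string replacement is chosen here so that unrelated placeholders survive the pass; a `str.format` call would raise `KeyError` on `{name}`.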

diff --git a/config/schemas/config.schema.yaml b/config/schemas/config.schema.yaml
@@ -14,6 +14,7 @@ required:
   - currency_base_year
   - downloads
   - paths
+  - calibration
   - netcdf
   - validation
   - consumer_values
@@ -122,6 +123,20 @@ properties:
         minLength: 1
         description: "Root directory for Snakemake benchmark TSV files"
 
+  calibration:
+    type: object
+    required: [source, accept_provenance_mismatch]
+    additionalProperties: false
+    description: "Selection of the calibration artefact set under data/curated/calibration/<source>/"
+    properties:
+      source:
+        type: string
+        pattern: "^[a-zA-Z0-9_-]+$"
+        description: "Name of the calibration artefact set to read (and write, for generation runs)"
+      accept_provenance_mismatch:
+        type: boolean
+        description: "Downgrade a calibration provenance mismatch from an error to a warning"
+
   netcdf:
     type: object
     description: "NetCDF export settings for PyPSA network files"
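
To make the constraints of the new `calibration` schema block concrete, here is a standard-library sketch that applies the same rules by hand. `check_calibration_section` is a hypothetical helper for illustration; the project presumably validates with a JSON Schema library against the merged config, which this sketch does not reproduce.

```python
import re

# Same pattern as the schema. re.fullmatch is used so that a trailing
# newline is rejected, which a bare "$" in Python's re.match would allow.
SOURCE_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")
REQUIRED_KEYS = {"source", "accept_provenance_mismatch"}


def check_calibration_section(section: dict) -> list[str]:
    """Return human-readable violations (an empty list means valid)."""
    errors: list[str] = []
    for key in sorted(REQUIRED_KEYS - section.keys()):
        errors.append(f"missing required key: {key}")
    for key in sorted(section.keys() - REQUIRED_KEYS):
        errors.append(f"unexpected key: {key}")  # additionalProperties: false
    if "source" in section:
        source = section["source"]
        if not isinstance(source, str) or not SOURCE_PATTERN.fullmatch(source):
            errors.append(f"invalid source name: {source!r}")
    if "accept_provenance_mismatch" in section:
        if not isinstance(section["accept_provenance_mismatch"], bool):
            errors.append("accept_provenance_mismatch must be a boolean")
    return errors


print(check_calibration_section({"source": "default", "accept_provenance_mismatch": False}))
print(check_calibration_section({"source": "my set/1", "accept_provenance_mismatch": False}))
```

The pattern restricts set names to letters, digits, underscores, and hyphens, so a `source` value cannot contain path separators or spaces when it is substituted into `data/curated/calibration/<source>/`.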

diff --git a/config/tutorial/01_ghg_prices.yaml b/config/tutorial/01_ghg_prices.yaml
@@ -13,6 +13,11 @@
 
 name: "tutorial_01"
 
+# Tutorials knowingly reuse the default calibration artefacts at a coarser
+# regional resolution.
+calibration:
+  accept_provenance_mismatch: true
+
 # Reduced spatial resolution so the tutorial completes in a few minutes on a
 # laptop after the one-off data download. 200 is the smallest value the
 # default country list admits without enabling cross-border clustering.

diff --git a/config/tutorial/02_consumer_values.yaml b/config/tutorial/02_consumer_values.yaml
@@ -24,6 +24,11 @@
 
 name: "tutorial_02"
 
+# Tutorials knowingly reuse the default calibration artefacts at a coarser
+# regional resolution.
+calibration:
+  accept_provenance_mismatch: true
+
 # See config/tutorial/01_ghg_prices.yaml for the rationale on target_count.
 aggregation:
   regions:

diff --git a/data/curated/calibration/animal_cost.csv → ...rated/calibration/default/animal_cost.csv b/data/curated/calibration/animal_cost.csv → ...rated/calibration/default/animal_cost.csv
diff --git a/data/curated/calibration/crop_cost.csv → ...curated/calibration/default/crop_cost.csv b/data/curated/calibration/crop_cost.csv → ...curated/calibration/default/crop_cost.csv
diff --git a/...urated/calibration/deviation_penalty.yaml → ...alibration/default/deviation_penalty.yaml b/...urated/calibration/deviation_penalty.yaml → ...alibration/default/deviation_penalty.yaml
diff --git a/data/curated/calibration/exogenous_feed.csv → ...ed/calibration/default/exogenous_feed.csv b/data/curated/calibration/exogenous_feed.csv → ...ed/calibration/default/exogenous_feed.csv
diff --git a/.../curated/calibration/exogenous_forage.csv → .../calibration/default/exogenous_forage.csv b/.../curated/calibration/exogenous_forage.csv → .../calibration/default/exogenous_forage.csv
diff --git a/...curated/calibration/fodder_conversion.csv → ...calibration/default/fodder_conversion.csv b/...curated/calibration/fodder_conversion.csv → ...calibration/default/fodder_conversion.csv
diff --git a/data/curated/calibration/food_demand.csv → ...rated/calibration/default/food_demand.csv b/data/curated/calibration/food_demand.csv → ...rated/calibration/default/food_demand.csv
diff --git a/data/curated/calibration/food_waste.yaml → ...rated/calibration/default/food_waste.yaml b/data/curated/calibration/food_waste.yaml → ...rated/calibration/default/food_waste.yaml
diff --git a/data/curated/calibration/grassland_cost.csv → ...ed/calibration/default/grassland_cost.csv b/data/curated/calibration/grassland_cost.csv → ...ed/calibration/default/grassland_cost.csv
diff --git a/data/curated/calibration/grassland_yield.csv → ...d/calibration/default/grassland_yield.csv b/data/curated/calibration/grassland_yield.csv → ...d/calibration/default/grassland_yield.csv