Merged
7 changes: 2 additions & 5 deletions .github/workflows/test.yml
Original file line number Diff line number Diff line change
@@ -61,10 +61,7 @@ jobs:
"$d/IHME_GBD_2019_RELATIVE_RISKS_Y2020M10D15.XLSX"

# Validates the full Snakemake DAG without executing any rule (see
# tests/test_integration.py::test_workflow_dryrun). The credential gate
# at workflow startup is presence-only, so dummy values suffice; no rule
# that would actually call these APIs is ever run.
# tests/test_integration.py::test_workflow_dryrun). No credentials are
# needed: the DAG resolves from the staged source files alone.
- name: Snakemake dryrun test
env:
USDA_API_KEY: dummy
run: pixi run --environment dev pytest -m integration -k dryrun -v
43 changes: 6 additions & 37 deletions AGENTS.md
@@ -304,8 +304,8 @@ pixi run -e dev pytest -v # verbose output

### Notes

- The **dryrun test** (`test_workflow_dryrun`) validates full DAG construction with `forceall=True` without executing any rule. It makes no API calls, but the startup credential gate (presence-only, so dummy values work) and the manually-downloaded source files must still be satisfied for the DAG to resolve. See `.github/workflows/test.yml` for how CI stubs both.
- The **execution test** (`test_build_solve_analyze`) runs the actual pipeline and requires USDA/ECMWF credentials for data downloads on first run.
- The **dryrun test** (`test_workflow_dryrun`) validates full DAG construction with `forceall=True` without executing any rule. It makes no API calls; the manually-downloaded source files must be present for the DAG to resolve. See `.github/workflows/test.yml` for how CI stages them.
- The **execution test** (`test_build_solve_analyze`) runs the actual pipeline and downloads public input data on first run (network access required).
- Tests never delete `results/test/` or `.snakemake/`; Snakemake detects up-to-date outputs and skips them automatically. Subsequent runs are near-instant when code hasn't changed.
- New unit tests go in `tests/test_*.py` alongside integration tests.
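
As a rough illustration of the staging step mentioned for the dryrun test, the sketch below creates empty placeholder files so that a dry run can resolve its inputs. It assumes empty files are sufficient because no rule executes; `stage_placeholders` is a hypothetical helper that is not part of the repository, and the authoritative list of files is the one in `.github/workflows/test.yml`.

```python
from pathlib import Path


def stage_placeholders(directory, names):
    """Create empty placeholder files; return the paths that were newly created.

    Hypothetical helper for illustration only. Existing files are left alone,
    so real downloads are never overwritten.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    created = []
    for name in names:
        path = directory / name
        if not path.exists():
            path.touch()  # zero-byte file: enough to exist, never read in a dry run
            created.append(path)
    return created
```

A call such as `stage_placeholders("data/manually_downloaded", ["IHME_GBD_2019_RELATIVE_RISKS_Y2020M10D15.XLSX"])` would stage one of the files named in the CI workflow. Placeholders like these are not suitable for the execution test, which reads the real data.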

@@ -370,43 +370,12 @@ The project uses automatic configuration validation via JSON Schema to ensure al

## Secrets Management

API credentials for external data sources (USDA, ECMWF) are managed separately from the main configuration to avoid committing secrets to version control.
API credentials are kept out of the main configuration and out of version control, and each is tied to one specific task.

### Setup Options
Credentials can be supplied either in `config/secrets.yaml` (copy `config/secrets.yaml.example`; the file is gitignored) or via environment variables, which take precedence. They are loaded in `workflow/validation/secrets.py` and merged into `config["credentials"]`.

**Option 1: Secrets File (Recommended for local development)**

1. Copy the template:
```bash
cp config/secrets.yaml.example config/secrets.yaml
```

2. Edit `config/secrets.yaml` and fill in your API credentials:
- **USDA API key**: Get from https://fdc.nal.usda.gov/api-guide.html
- **ECMWF credentials**: Get from https://cds.climate.copernicus.eu/api-how-to
- Register at https://cds.climate.copernicus.eu/user/register
- Accept dataset licenses at https://cds.climate.copernicus.eu/datasets/satellite-land-cover
- Get your UID and API key from your profile page

3. The file is excluded from git - never commit real credentials!

**Option 2: Environment Variables (Recommended for CI/CD)**

Set these environment variables before running the workflow:

```bash
export USDA_API_KEY="your-usda-api-key"
export ECMWF_DATASTORES_URL="https://cds.climate.copernicus.eu/api"
export ECMWF_DATASTORES_KEY="your-ecmwf-key"
```

### Precedence

Environment variables take precedence over the secrets file. This allows you to override file-based credentials in CI/CD or testing environments.

### Validation

The workflow validates that all required credentials are present at startup (before any rules execute). If credentials are missing, you'll see a clear error message with instructions on how to configure them.
- **USDA FoodData Central key** (`USDA_API_KEY`, or `credentials.usda.api_key`): the one build-time credential, read by the `retrieve_usda_nutrition` rule when `data.usda.retrieve_nutrition: true`. That rule raises a clear error if the key is absent. Get a free key at https://fdc.nal.usda.gov/api-guide.html.
- **Copernicus CDS credentials** (`ECMWF_DATASTORES_URL` / `ECMWF_DATASTORES_KEY`) and **Zenodo token** (`ZENODO_TOKEN`): used by `tools/mirror_land_cover.py` to refresh the land-cover data mirrored on Zenodo, which regular builds fetch from that mirror.
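
The precedence rule and the task-scoped check described in this section can be sketched as follows. This is a minimal illustration, not the code in `workflow/validation/secrets.py`: `resolve_usda_key` and `require_usda_key` are hypothetical names, and `file_credentials` stands for the already-parsed `credentials:` mapping from `config/secrets.yaml` (YAML parsing is omitted to keep the example standard-library only).

```python
import os
from typing import Mapping, Optional


def resolve_usda_key(
    file_credentials: Mapping, environ: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Return the USDA key, or None if it is not configured anywhere.

    Hypothetical helper: the USDA_API_KEY environment variable takes
    precedence over credentials.usda.api_key from the secrets file.
    """
    env_key = environ.get("USDA_API_KEY")
    if env_key:
        return env_key
    # The usda entry may be missing entirely, since the credential is optional.
    return (file_credentials.get("usda") or {}).get("api_key")


def require_usda_key(
    file_credentials: Mapping, environ: Mapping[str, str] = os.environ
) -> str:
    """Fail only at the point where the key is actually needed."""
    key = resolve_usda_key(file_credentials, environ)
    if not key:
        raise RuntimeError(
            "USDA API key missing: set USDA_API_KEY or credentials.usda.api_key "
            "in config/secrets.yaml (needed only when "
            "data.usda.retrieve_nutrition is true)."
        )
    return key
```

Checking for the key inside the task that uses it, rather than at workflow startup, is what lets dry runs and default builds proceed with no credentials configured.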

## When Implementing Changes

11 changes: 3 additions & 8 deletions README.md
@@ -41,19 +41,14 @@ pixi install

### Setup (required before first run)

1. **API credentials**: Copy and configure the secrets file:
```bash
cp config/secrets.yaml.example config/secrets.yaml
# Edit config/secrets.yaml with your ECMWF Climate Data Store credentials
# Get credentials at: https://cds.climate.copernicus.eu/user/register
```

2. **Manual downloads**: Three datasets require free registration and manual download:
**Manual downloads**: Three datasets require free registration and manual download:
- IHME GBD mortality rates and relative risks (https://vizhub.healthdata.org/)
- Global Dietary Database (https://globaldietarydatabase.org/)

See the [Data Sources documentation](https://sustainable-solutions-lab.github.io/GLADE/data_sources.html#manual-download-checklist) for detailed instructions. Place files in `data/manually_downloaded/`.

A free USDA FoodData Central key is needed only to refresh nutritional data (`data.usda.retrieve_nutrition: true`); see [`config/secrets.yaml.example`](config/secrets.yaml.example).

### Run the model

```bash
8 changes: 5 additions & 3 deletions config/default.yaml
@@ -1554,9 +1554,11 @@ data:
water_requirement_var: "RES05-WDC" # Water deficit/net irrigation requirement during crop cycle, current cropland
suitability_var: "RES05-SX1" # Share of grid cell assessed as VS or S (very suitable or suitable)
usda:
# API credentials: configure in config/secrets.yaml or via USDA_API_KEY environment variable
# See config/secrets.yaml.example for setup instructions
retrieve_nutrition: true # Set to true to fetch nutrition data from USDA instead of using the provided data
# When false, the build uses the bundled data/curated/nutrition.csv. When
# true, nutrition data is fetched from USDA FoodData Central (relevant after
# adding a food) and a USDA API key is required, via USDA_API_KEY or
# config/secrets.yaml (see config/secrets.yaml.example).
retrieve_nutrition: false
# Nutrient mapping: internal name -> USDA FoodData Central name
# USDA names must match nutrient names in FoodData Central exactly
nutrients:
5 changes: 1 addition & 4 deletions config/schemas/config.schema.yaml
@@ -59,7 +59,6 @@ required:
- solving
- remote_solve
- plotting
- credentials
additionalProperties: false

properties:
@@ -2166,13 +2165,11 @@ properties:

credentials:
type: object
required: [usda]
additionalProperties: false
description: "API credentials required by the build (configure via config/secrets.yaml or environment variables). Copernicus CDS credentials are NOT needed for builds; they are only used by tools/mirror_land_cover.py to refresh the mirrored land-cover data."
description: "Optional build-time API credentials, injected at runtime from config/secrets.yaml or environment variables. The one build-time credential is the USDA key, read by the retrieve_usda_nutrition rule (data.usda.retrieve_nutrition: true); the Copernicus CDS credentials are maintainer-only (tools/mirror_land_cover.py)."
properties:
usda:
type: object
required: [api_key]
additionalProperties: false
properties:
api_key:
21 changes: 9 additions & 12 deletions config/secrets.yaml.example
@@ -4,23 +4,20 @@

# API Credentials Template
#
# Copy this file to config/secrets.yaml and fill in your credentials.
# IMPORTANT: config/secrets.yaml is excluded from git - never commit real credentials!
#
# Alternatively, you can set environment variables instead:
# export USDA_API_KEY="your-key"
# Each entry below is for a specific maintenance task; configure one only if you
# run that task. Each can also be set via an environment variable instead of
# this file (the env var takes precedence).
#
# Only the `usda` credential is needed to build and solve the model. The
# `ecmwf` and `zenodo` credentials below are MAINTAINER-ONLY: they are used
# only by tools/mirror_land_cover.py to refresh the Copernicus land-cover data
# mirrored on Zenodo.
# Copy this file to config/secrets.yaml and fill in what you need.
# IMPORTANT: config/secrets.yaml is excluded from git - never commit real credentials!

credentials:
# Needed only to refresh nutrition data from USDA FoodData Central
# (data.usda.retrieve_nutrition: true), i.e. after adding a food.
# Or set the USDA_API_KEY environment variable instead.
usda:
# USDA FoodData Central API key
# Get your API key from: https://fdc.nal.usda.gov/api-guide.html
# For testing, you can use "DEMO_KEY" but it has very limited rate limits
api_key: "DEMO_KEY" # Replace with your actual key
api_key: "API_KEY" # Replace with your actual key

# MAINTAINER-ONLY (tools/mirror_land_cover.py). Safe to omit for builds.
ecmwf:
12 changes: 6 additions & 6 deletions docs/data_sources.rst
@@ -24,7 +24,7 @@ Several licensed datasets cannot be fetched automatically. While their use is fr
3. Download the IHME 2023 dietary risk exposure estimates (two archives, ``IHME_GBD_2023_RISK_EXPOSURE_DIET_1`` and ``_2``) (:ref:`ihme-diet-risk-exposure`).
4. Obtain the **GDD-IA** intake CSVs by personal request to the Global Dietary Database team and place them as ``data/manually_downloaded/GDD-IA-intake_grams_{year}.csv`` and ``data/manually_downloaded/GDD-IA-intake_kcals_{year}.csv`` (:ref:`gdd-ia-dietary-intake`).

No Copernicus/ECMWF API key is required: the land cover data is fetched from a Zenodo mirror (:ref:`copernicus-land-cover`). The only API credential needed for an automated build is the USDA FoodData Central key (see :doc:`introduction`).
The one build-time credential is a free USDA FoodData Central key, used only to refresh the nutritional data (see :doc:`introduction`). Everything else is fetched from public downloads, the Zenodo land-cover mirror (:ref:`copernicus-land-cover`), or bundled data.


.. _weight-bases:
@@ -564,15 +564,15 @@ Copernicus Satellite Land Cover
* Spatial: Global (Plate Carree projection), 300 m resolution
* Temporal: Annual (with approximately one-year publication delay)

**Access**: Original source: https://cds.climate.copernicus.eu/datasets/satellite-land-cover. For builds, GLADE downloads a mirror of the single year/version it needs from Zenodo (see *Retrieval* below), so no Copernicus account or API key is required.
**Access**: Original source: https://cds.climate.copernicus.eu/datasets/satellite-land-cover. For builds, GLADE downloads a mirror of the single year/version it needs from Zenodo (see *Retrieval* below).

**License**: CC-BY-4.0. The 2016-onwards C3S maps (which is what GLADE uses, since ``baseline_year`` is 2020) are released under the Creative Commons Attribution 4.0 International licence, as stated in the authoritative C3S/Copernicus metadata. This permits redistribution provided the Copernicus attribution and source DOI are retained; both are embedded in the Zenodo deposition. (The CDS download page also bundles the ESA CCI licence -- which governs the pre-2016 v2.0.7 maps that GLADE does not use -- and the VITO licence, which restricts only near-real-time PROBA-V products, not historical annual maps.)

**Required attribution**: "Generated using Copernicus Climate Change Service information 2020. Neither the European Commission nor ECMWF is responsible for any use that may be made of the Copernicus information or data it contains."

**Citation**: Copernicus Climate Change Service, Climate Data Store, (2019): Land cover classification gridded maps from 1992 to present derived from satellite observation. Copernicus Climate Change Service (C3S) Climate Data Store (CDS). https://doi.org/10.24381/cds.006f2c9a

**Retrieval**: Automatic via the ``download_land_cover`` Snakemake rule, which uses ``curl`` to fetch the pre-extracted land cover classification (``lccs_class`` only, ~320 MB NetCDF) from our Zenodo mirror -- no API key needed. The rule writes ``data/downloads/land_cover_lccs_class.nc``. The mirror itself is produced from the upstream CDS dataset by the maintainer tool ``tools/mirror_land_cover.py`` (see :ref:`redistributing-datasets`).
**Retrieval**: Automatic via the ``download_land_cover`` Snakemake rule, which uses ``curl`` to fetch the pre-extracted land cover classification (``lccs_class`` only, ~320 MB NetCDF) from our Zenodo mirror. The rule writes ``data/downloads/land_cover_lccs_class.nc``. The mirror itself is produced from the upstream CDS dataset by the maintainer tool ``tools/mirror_land_cover.py`` (see :ref:`redistributing-datasets`).

**Configuration**: The land cover year is derived from the top-level ``baseline_year`` parameter, and the version from ``config['data']['land_cover']['version']`` (default: v2_1_1). The mirror to download from is pinned by ``config['data']['land_cover']['zenodo_record']`` (the numeric Zenodo record id); the download URL and file name are derived from these three values.

@@ -749,7 +749,7 @@ IMF World Economic Outlook -- GDP per Capita
**Access**: https://www.imf.org/external/datamapper/NGDPDPC@WEO (`API documentation <https://www.imf.org/external/datamapper/api/help>`__) |
**License**: Free to use with attribution (`Terms of use <https://www.imf.org/en/about/copyright-and-terms#data>`__)

GDP per capita estimates (current prices, USD) from the World Economic Outlook database (indicator ``NGDPDPC``). Retrieved automatically via the IMF DataMapper API (no API key required). Output: ``processing/{name}/gdp_per_capita.csv``. Used by ``prepare_health_costs`` for multi-objective country clustering based on geography, GDP similarity, and population balance.
GDP per capita estimates (current prices, USD) from the World Economic Outlook database (indicator ``NGDPDPC``). Retrieved automatically via the IMF DataMapper API. Output: ``processing/{name}/gdp_per_capita.csv``. Used by ``prepare_health_costs`` for multi-objective country clustering based on geography, GDP similarity, and population balance.

Health and Epidemiology Data
-----------------------------
@@ -1027,9 +1027,9 @@ USDA FoodData Central

**Citation**: U.S. Department of Agriculture, Agricultural Research Service. FoodData Central. https://fdc.nal.usda.gov/

**Retrieval**: Optional via ``retrieve_usda_nutrition`` rule (using the API with included API key). Set ``data.usda.retrieve_nutrition: true`` in config to fetch fresh data. By default, the repository includes pre-fetched data in ``data/curated/nutrition.csv``.
**Retrieval**: The build uses the pre-fetched ``data/curated/nutrition.csv``. Set ``data.usda.retrieve_nutrition: true`` to instead fetch fresh data via the ``retrieve_usda_nutrition`` rule, which requires a USDA API key.

**API Key**: The repository includes a shared API key for convenience. Users can optionally obtain their own API key (free, instant signup) at https://fdc.nal.usda.gov/api-key-signup and update the ``data.usda.api_key`` value in the config.
**API Key**: Free, instant signup at https://fdc.nal.usda.gov/api-key-signup. Provide the key via the ``USDA_API_KEY`` environment variable or ``credentials.usda.api_key`` in ``config/secrets.yaml``; it is read only when ``retrieve_nutrition`` is enabled.

**Usage**: Nutritional composition of model foods (protein, carbohydrates, fat, energy). The mapping from model foods to USDA FoodData Central IDs is maintained in ``data/curated/usda_food_mapping.csv``.

4 changes: 2 additions & 2 deletions docs/development.rst
@@ -206,8 +206,8 @@ How It Works

Tests call a shared helper ``run_snakemake_target()`` in ``tests/conftest.py`` that invokes the Snakemake Python API directly (no subprocess). The helper layers ``tests/config/test.yaml`` on top of ``config/default.yaml`` and targets specific output files.

* **Dryrun test** (``test_workflow_dryrun``): Validates full DAG construction with ``forceall=True`` without executing any rule. Makes no API calls, but the startup credential gate (presence-only, so dummy values suffice) and the manually-downloaded source files must still be satisfied for the DAG to resolve. Catches missing inputs, broken rules, and invalid wildcard patterns.
* **Execution test** (``test_build_solve_analyze``): Runs the actual pipeline through analysis for the default scenario. Requires a USDA credential for data downloads on first run.
* **Dryrun test** (``test_workflow_dryrun``): Validates full DAG construction with ``forceall=True`` without executing any rule. Makes no API calls; the manually-downloaded source files must be present for the DAG to resolve. Catches missing inputs, broken rules, and invalid wildcard patterns.
* **Execution test** (``test_build_solve_analyze``): Runs the actual pipeline through analysis for the default scenario. On first run it downloads public input data (network access required).
* **Plot test** (``test_plots``): Generates representative plots from solved model outputs.

Tests never delete ``results/test/`` or ``.snakemake/``; Snakemake detects up-to-date outputs and skips them automatically, so subsequent runs are near-instant when code hasn't changed.
34 changes: 7 additions & 27 deletions docs/introduction.rst
@@ -119,17 +119,11 @@ manually:
publication (will be released under CC-BY-NC). See :doc:`current_diets`
and the :ref:`gdd-ia-dietary-intake` entry in :doc:`data_sources`.

Only one API credential matters for automatic downloads:

* **USDA FoodData Central** — a free key from
https://fdc.nal.usda.gov/api-key-signup. The repository ships pre-fetched
nutritional data, so this is only needed if you want to refresh it; for a
default build ``DEMO_KEY`` suffices.

No Copernicus/ECMWF account is required: the satellite land-cover data is
fetched from a Zenodo mirror (see :ref:`copernicus-land-cover`). A Copernicus
CDS token is only needed by maintainers refreshing that mirror with
``tools/mirror_land_cover.py`` (see :ref:`redistributing-datasets`).
The one build-time credential is an optional, free `USDA FoodData Central
<https://fdc.nal.usda.gov/api-key-signup>`_ key, used to refresh the nutritional
data (``data.usda.retrieve_nutrition: true``) after adding a food to the model.
Maintainers refreshing the Zenodo land-cover mirror additionally need a
Copernicus CDS token (see :ref:`redistributing-datasets`).

Installation
------------
@@ -166,25 +160,11 @@ Installation

Replace ``"2.17"`` with the version reported by ``ldd --version``.

3. **Set up API credentials**:

.. code-block:: bash

cp config/secrets.yaml.example config/secrets.yaml

Edit ``config/secrets.yaml`` and fill in your USDA key (or leave the
``DEMO_KEY`` default for a standard build). Alternatively, set the
equivalent environment variable:

.. code-block:: bash

export USDA_API_KEY="your-usda-api-key"

4. **Download the manually-licensed datasets**: follow the
3. **Download the manually-licensed datasets**: follow the
:ref:`manual-download-checklist` in :doc:`data_sources` to place the three
IHME/GDD files under ``data/manually_downloaded/``.

5. **Verify the setup** with a dry run:
4. **Verify the setup** with a dry run:

.. code-block:: bash
