Coffee Suitability and Deforestation Analysis

YAML-driven workflow for a rapid coffee-suitability and forest-loss analysis over a user-defined area of interest.

The pipeline can:

generate deterministic synthetic preprocessed inputs for an offline demo;
download supported real datasets with Python where possible;
preprocess raw data to a shared monthly time, lat, lon grid;
apply simple coffee-suitability thresholds for NDVI, rainfall, and soil moisture;
summarize overlap between suitable areas and forest-loss signals;
write time-series plots, spatial maps, CSV tables, NetCDF outputs, and summary text.
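The threshold and overlap steps above can be sketched in a few lines. This is a minimal, dependency-free illustration rather than the code in run_analysis.py: the threshold values, the function names, and the use of nested lists instead of arrays are all assumptions made for the example.

```python
# Hypothetical threshold ranges for illustration only; the real values
# are expected to come from the YAML config.
THRESHOLDS = {
    "ndvi": (0.4, 0.9),
    "rainfall": (100.0, 300.0),    # mm per month (assumed unit)
    "soil_moisture": (0.2, 0.45),  # m3/m3 (assumed unit)
}


def suitability_mask(ndvi, rainfall, soil_moisture, thresholds=THRESHOLDS):
    """Return a 2-D boolean grid: True where every variable is within its range."""
    layers = {"ndvi": ndvi, "rainfall": rainfall, "soil_moisture": soil_moisture}
    rows, cols = len(ndvi), len(ndvi[0])
    mask = []
    for i in range(rows):
        row = []
        for j in range(cols):
            ok = all(lo <= layers[name][i][j] <= hi
                     for name, (lo, hi) in thresholds.items())
            row.append(ok)
        mask.append(row)
    return mask


def overlap_fraction(suitable, forest_loss):
    """Share of suitable cells that also show forest loss (0.0 if none are suitable)."""
    suitable_cells = sum(bool(cell) for row in suitable for cell in row)
    both = sum(bool(s and l)
               for srow, lrow in zip(suitable, forest_loss)
               for s, l in zip(srow, lrow))
    return both / suitable_cells if suitable_cells else 0.0
```

A cell counts as suitable only when all three variables fall inside their ranges, so one out-of-range variable is enough to exclude it.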

Repository Layout

coffee_analysis/
├── create_synthetic_preprocessed_data.py   # Offline demo data generator
├── download_data.py                        # Download supported raw datasets
├── preprocess_data.py                      # Harmonize raw data to NetCDF
├── run_analysis.py                         # Run suitability + deforestation analysis
├── validate_data.py                        # Validate/quarantine corrupt HDF5 files
├── data_pipeline.py                        # Shared pipeline utilities
├── pipeline_config.yml                     # Default synthetic demo config
├── pipeline_config.synthetic_preprocessed.yml
├── pipeline_config.real_few_days_l4_download.yml
├── pipeline_config.real_few_days_l4_analysis.yml
├── pipeline_config.real_12mo_download.yml
├── pipeline_config.real_12mo_analysis.yml
├── configs/legacy/                         # Older exploratory/example configs
├── data/
│   ├── README.md
│   └── hansen_forest_loss_gee_export.js
└── results/
    └── README.md

Large downloaded data and most generated results are intentionally ignored by Git. The small synthetic preprocessed inputs and synthetic showcase results are kept so the framework can be demonstrated immediately.

Install

Use Python 3.9+.

python3 -m pip install -r requirements.txt

On macOS, pyhdf usually installs more reliably from conda-forge than from pip:

conda install -c conda-forge pyhdf
python3 -m pip install -r requirements.txt

Quick Start: Offline Synthetic Demo

This is the recommended first run because it does not require NASA Earthdata credentials or network downloads.

python3 create_synthetic_preprocessed_data.py --config pipeline_config.yml
python3 run_analysis.py --config pipeline_config.yml

This creates:

synthetic preprocessed NetCDFs in data/synthetic_preprocessed/;
plots, tables, masks, and summaries in results/synthetic_preprocessed/.

These synthetic showcase files are included in the repository. You can regenerate them at any time with the two commands above.

The synthetic files match the expected preprocessed format: one NetCDF per variable with time, lat, lon, spatial_ref, and one data variable named ndvi, rainfall, soil_moisture, or forest_loss.

Real Data Workflow

For a tiny real-data smoke test using SMAP L4 over a few days:

python3 download_data.py --config pipeline_config.real_few_days_l4_download.yml
python3 validate_data.py --config pipeline_config.real_few_days_l4_download.yml --dataset soil_moisture
python3 preprocess_data.py --config pipeline_config.real_few_days_l4_download.yml --force
python3 run_analysis.py --config pipeline_config.real_few_days_l4_analysis.yml --force-preprocess

For a full 2023 run:

python3 download_data.py --config pipeline_config.real_12mo_download.yml
python3 validate_data.py --config pipeline_config.real_12mo_download.yml --dataset soil_moisture
python3 preprocess_data.py --config pipeline_config.real_12mo_download.yml --force
python3 run_analysis.py --config pipeline_config.real_12mo_analysis.yml --force-preprocess

The full-year run downloads far more data because SMAP L4 (SPL4SMGP) is a 3-hourly product. Run the few-day test first to confirm that credentials and file parsing work.

Supported Datasets

The real-data configs currently use:

CHIRPS rainfall from a public monthly GeoTIFF URL pattern;
MODIS monthly NDVI from MOD13A3.061 via earthaccess;
SMAP L4 surface soil moisture from SPL4SMGP.008 via earthaccess;
Hansen forest loss from a manual Google Earth Engine export.

Hansen forest loss is not downloaded automatically. Paste data/hansen_forest_loss_gee_export.js into the Google Earth Engine Code Editor, export the lossyear GeoTIFF, then place it at the raw_path used by the YAML config.
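As a rough sketch of how the exported lossyear band can be turned into yearly totals: in the Hansen Global Forest Change product, a lossyear value of 0 means no loss and a value n means loss detected in year 2000 + n. The function below assumes the export keeps that encoding and that the band has already been read into a 2-D grid of integers; the function name is hypothetical and this is not necessarily how the pipeline builds annual_forest_loss.csv.

```python
from collections import Counter


def annual_loss_counts(lossyear_grid):
    """Count loss pixels per calendar year from a 2-D grid of lossyear codes.

    Assumes the standard encoding: 0 = no loss, n = loss in year 2000 + n.
    """
    counts = Counter()
    for row in lossyear_grid:
        for code in row:
            if code > 0:
                counts[2000 + int(code)] += 1
    # Sort by year so the result is ready to write as rows of a table.
    return dict(sorted(counts.items()))
```

Multiplying each count by the pixel area gives a loss area per year, if the grid's cell size is known.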

Earthdata Login

MODIS and SMAP downloads use earthaccess. Credentials are not stored in YAML.

The downloader tries:

a valid ~/.netrc;
EARTHDATA_USERNAME and EARTHDATA_PASSWORD;
an interactive terminal prompt.

Example environment-variable setup:

export EARTHDATA_USERNAME="your_username"
export EARTHDATA_PASSWORD="your_password"
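The lookup order described above can be sketched with the standard library alone. This is an illustration of the order, not the downloader's actual code: the function name and its parameters are hypothetical, and the prompt is passed in as a callable so the example stays testable. The host name urs.earthdata.nasa.gov is the Earthdata Login host that a ~/.netrc entry would normally reference.

```python
import netrc
import os
from pathlib import Path

EARTHDATA_HOST = "urs.earthdata.nasa.gov"


def resolve_earthdata_credentials(netrc_path=None, env=None, prompt=None):
    """Return (source, username, password), trying netrc, then env vars, then a prompt."""
    env = os.environ if env is None else env
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"

    # 1. A readable, parseable netrc file with an entry for the Earthdata host.
    try:
        auth = netrc.netrc(str(path)).authenticators(EARTHDATA_HOST)
    except (OSError, netrc.NetrcParseError):
        auth = None
    if auth and auth[0] and auth[2]:
        return "netrc", auth[0], auth[2]

    # 2. Environment variables.
    user = env.get("EARTHDATA_USERNAME")
    password = env.get("EARTHDATA_PASSWORD")
    if user and password:
        return "environment", user, password

    # 3. Interactive prompt, supplied by the caller (hypothetical hook).
    if prompt is None:
        raise RuntimeError("No Earthdata credentials found and no prompt available")
    user, password = prompt()
    return "prompt", user, password
```

In an interactive session the prompt could be something like `lambda: (input("Username: "), getpass.getpass())`. In practice earthaccess handles login itself; this sketch only makes the fallback order explicit.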

YAML Modes

Each dataset has an input mode:

preprocessed: use the configured NetCDF file and skip download/preprocess;
raw: read local raw files and run preprocessing;
download: download supported raw files, then run preprocessing.
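The three modes map directly onto which pipeline steps run for a dataset. A minimal sketch of that mapping follows; the names MODE_STEPS and steps_for_dataset, and the example dataset-to-mode assignment, are hypothetical and only illustrate the rule stated above.

```python
# Steps implied by each input mode, in execution order.
MODE_STEPS = {
    "preprocessed": [],                      # use the NetCDF as-is
    "raw": ["preprocess"],                   # local raw files only
    "download": ["download", "preprocess"],  # fetch, then harmonize
}


def steps_for_dataset(name, mode):
    """Return the ordered pipeline steps implied by a dataset's input mode."""
    if mode not in MODE_STEPS:
        raise ValueError(
            f"{name}: unknown mode {mode!r}; expected one of {sorted(MODE_STEPS)}"
        )
    return list(MODE_STEPS[mode])


# Hypothetical per-dataset modes, as they might be read from a YAML config.
datasets = {"rainfall": "download", "ndvi": "raw", "forest_loss": "preprocessed"}
plan = {name: steps_for_dataset(name, mode) for name, mode in datasets.items()}
```

Because the mode is set per dataset, a single run can mix downloaded, local raw, and already preprocessed inputs.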

The main preprocessing behavior is:

clip to the AOI and time window from the YAML;
reproject raw inputs to analysis.grid.target_crs;
choose the coarsest common grid when analysis.grid.strategy: "coarsest";
aggregate to monthly time steps;
save standardized NetCDF files for the analysis step.
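Two of these steps, choosing the coarsest grid and aggregating to monthly values, can be illustrated without any geospatial libraries. The sketch below is an assumption-laden simplification: the function names are hypothetical, and a plain mean is assumed as the monthly aggregate, which suits soil moisture but may not suit every variable (rainfall is often summed instead).

```python
from collections import defaultdict
from datetime import datetime


def coarsest_resolution(resolutions_deg):
    """Pick the largest cell size (in degrees) among the input datasets."""
    return max(resolutions_deg.values())


def monthly_mean(samples):
    """Average (timestamp, value) samples into one value per calendar month."""
    buckets = defaultdict(list)
    for stamp, value in samples:
        buckets[(stamp.year, stamp.month)].append(value)
    return {
        f"{year:04d}-{month:02d}": sum(values) / len(values)
        for (year, month), values in sorted(buckets.items())
    }


# Approximate native cell sizes in degrees, for illustration only.
target = coarsest_resolution({"rainfall": 0.05, "ndvi": 0.0083, "soil_moisture": 0.08})

# Sub-daily samples for one grid cell collapse to one value per month.
series = monthly_mean([
    (datetime(2023, 1, 1, 0), 0.25),
    (datetime(2023, 1, 1, 3), 0.75),
    (datetime(2023, 2, 1, 0), 0.5),
])
```

Resampling every dataset onto the coarsest grid avoids inventing detail that the lowest-resolution input does not contain.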

Preprocessed File Format

Each preprocessed NetCDF should contain:

dimensions: time, lat, lon;
CRS coordinate: spatial_ref;
one data variable matching the dataset key, such as ndvi;
monthly timestamps matching the analysis period.

Example:

Dimensions:  time, lat, lon
Coordinates: time, lat, lon, spatial_ref
Variables:   ndvi
Attributes:  input_source_mode, dataset_key, grid_strategy, target_crs
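A quick way to confirm that a file follows this layout is to compare its dimension, coordinate, and variable names against the list above. The checker below is a hypothetical helper, not part of the repository; it works on plain name lists so it can be fed from any NetCDF reader.

```python
REQUIRED_DIMS = ("time", "lat", "lon")


def check_preprocessed_layout(dims, coords, data_vars, dataset_key):
    """Return a list of human-readable layout problems (empty if the layout is fine)."""
    problems = []
    for dim in REQUIRED_DIMS:
        if dim not in dims:
            problems.append(f"missing dimension: {dim}")
    if "spatial_ref" not in coords:
        problems.append("missing CRS coordinate: spatial_ref")
    if list(data_vars) != [dataset_key]:
        problems.append(
            f"expected exactly one data variable {dataset_key!r}, "
            f"found {sorted(data_vars)}"
        )
    return problems
```

With xarray installed, the inputs could come from `ds = xarray.open_dataset(path)` followed by `check_preprocessed_layout(list(ds.dims), list(ds.coords), list(ds.data_vars), "ndvi")`. The check does not verify that the timestamps are monthly or match the analysis period.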

Outputs

Each analysis run writes a results folder containing:

01_timeseries_trends.png
02_spatial_maps.png
03_rainfall_soilmoisture_maps.png
timeseries_indicators.csv
annual_forest_loss.csv
processed_dataset.nc
coffee_suitability_mask.npy
cumulative_forest_loss_map.npy
overlap_mask.npy
ndvi_trend_map.npy
ANALYSIS_SUMMARY.txt
RESULTS_SUMMARY.txt
analysis_metadata.json
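To check that a run produced the complete set of files listed above, a small helper can compare a results folder against that list. The helper name is hypothetical and the folder path in the usage note is only an example.

```python
from pathlib import Path

# File names copied from the outputs list above.
EXPECTED_OUTPUTS = [
    "01_timeseries_trends.png",
    "02_spatial_maps.png",
    "03_rainfall_soilmoisture_maps.png",
    "timeseries_indicators.csv",
    "annual_forest_loss.csv",
    "processed_dataset.nc",
    "coffee_suitability_mask.npy",
    "cumulative_forest_loss_map.npy",
    "overlap_mask.npy",
    "ndvi_trend_map.npy",
    "ANALYSIS_SUMMARY.txt",
    "RESULTS_SUMMARY.txt",
    "analysis_metadata.json",
]


def missing_outputs(results_dir):
    """Return the expected output files that are absent from results_dir."""
    root = Path(results_dir)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).is_file()]
```

For example, `missing_outputs("results/synthetic_preprocessed")` should return an empty list after a successful synthetic demo run.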

Troubleshooting

If SMAP preprocessing reports truncated .h5 files:

python3 validate_data.py --config pipeline_config.real_few_days_l4_download.yml --dataset soil_moisture --quarantine-corrupt
python3 download_data.py --config pipeline_config.real_few_days_l4_download.yml

Then rerun preprocessing.
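For a rough idea of what a quarantine pass can look like, the sketch below moves files that do not start with the 8-byte HDF5 signature into a separate folder. It is not the logic of validate_data.py, and the function names are hypothetical. It is only a cheap first pass: it catches empty files and non-HDF5 content such as saved error pages, but a file cut off after its header still passes, so a full check needs to open the file with an HDF5 reader.

```python
import shutil
from pathlib import Path

# Every HDF5 file without a user block starts with these 8 bytes.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file begins with the HDF5 signature."""
    with open(path, "rb") as fh:
        return fh.read(8) == HDF5_SIGNATURE


def quarantine_suspect_files(raw_dir, quarantine_dir):
    """Move .h5 files that fail the signature check; return the moved file names."""
    raw_dir, quarantine_dir = Path(raw_dir), Path(quarantine_dir)
    moved = []
    for path in sorted(raw_dir.glob("*.h5")):
        if not looks_like_hdf5(path):
            quarantine_dir.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(quarantine_dir / path.name))
            moved.append(path.name)
    return moved
```

Moving suspect files out of the raw folder, rather than deleting them, lets a later download fetch them again while keeping the originals for inspection.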

If installing pyhdf with pip fails, install it from conda-forge instead:

conda install -c conda-forge pyhdf

If a dataset cannot be downloaded or parsed automatically, provide either:

a preprocessed NetCDF and set mode: preprocessed; or
local raw files and set mode: raw.

Notes

The maintained workflow is script-first and YAML-driven.
