mfarmani95/coffee_analysis

Coffee Suitability and Deforestation Analysis

YAML-driven workflow for a rapid coffee-suitability and forest-loss analysis over a user-defined area of interest.

The pipeline can:

  • generate deterministic synthetic preprocessed inputs for an offline demo;
  • download supported real datasets with Python where possible;
  • preprocess raw data to a shared monthly time, lat, lon grid;
  • apply simple coffee-suitability thresholds for NDVI, rainfall, and soil moisture;
  • summarize overlap between suitable areas and forest-loss signals;
  • write time-series plots, spatial maps, CSV tables, NetCDF outputs, and summary text.
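The thresholding and overlap steps listed above can be sketched per grid cell as follows. This is a minimal illustration, not the pipeline's actual code: the threshold values, the names `THRESHOLDS`, `is_suitable`, and `overlap_fraction`, and the use of simple minimum thresholds are all assumptions. The real thresholds come from the YAML config.

```python
# Hypothetical thresholds for illustration only; the real values live in the YAML config.
THRESHOLDS = {
    "ndvi_min": 0.5,           # unitless
    "rainfall_min_mm": 100.0,  # mm per month
    "soil_moisture_min": 0.2,  # m3/m3
}


def is_suitable(ndvi, rainfall_mm, soil_moisture, thresholds=THRESHOLDS):
    """Return True when a grid cell meets all three minimum thresholds."""
    return (
        ndvi >= thresholds["ndvi_min"]
        and rainfall_mm >= thresholds["rainfall_min_mm"]
        and soil_moisture >= thresholds["soil_moisture_min"]
    )


def overlap_fraction(suitable, forest_loss):
    """Share of suitable cells that also show forest loss (0.0 if none are suitable)."""
    n_suitable = sum(suitable)
    if n_suitable == 0:
        return 0.0
    n_both = sum(1 for s, f in zip(suitable, forest_loss) if s and f)
    return n_both / n_suitable


# Three example cells as (ndvi, rainfall_mm, soil_moisture).
cells = [(0.7, 150.0, 0.30), (0.3, 150.0, 0.30), (0.6, 120.0, 0.25)]
suitable = [is_suitable(*c) for c in cells]
loss = [True, True, False]
print(suitable, overlap_fraction(suitable, loss))
```

In this example the second cell fails the NDVI threshold, so two cells are suitable and one of them overlaps forest loss.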

Repository Layout

coffee_analysis/
├── create_synthetic_preprocessed_data.py   # Offline demo data generator
├── download_data.py                        # Download supported raw datasets
├── preprocess_data.py                      # Harmonize raw data to NetCDF
├── run_analysis.py                         # Run suitability + deforestation analysis
├── validate_data.py                        # Validate/quarantine corrupt HDF5 files
├── data_pipeline.py                        # Shared pipeline utilities
├── pipeline_config.yml                     # Default synthetic demo config
├── pipeline_config.synthetic_preprocessed.yml
├── pipeline_config.real_few_days_l4_download.yml
├── pipeline_config.real_few_days_l4_analysis.yml
├── pipeline_config.real_12mo_download.yml
├── pipeline_config.real_12mo_analysis.yml
├── configs/legacy/                         # Older exploratory/example configs
├── data/
│   ├── README.md
│   └── hansen_forest_loss_gee_export.js
└── results/
    └── README.md

Large downloaded data and most generated results are intentionally ignored by Git. The small synthetic preprocessed inputs and synthetic showcase results are kept so the framework can be demonstrated immediately.

Install

Use Python 3.9+.

python3 -m pip install -r requirements.txt

On macOS, pyhdf is usually easier to install from conda-forge than from pip. Install it first, then install the remaining requirements:

conda install -c conda-forge pyhdf
python3 -m pip install -r requirements.txt

Quick Start: Offline Synthetic Demo

This is the recommended first run because it does not require NASA Earthdata credentials or network downloads.

python3 create_synthetic_preprocessed_data.py --config pipeline_config.yml
python3 run_analysis.py --config pipeline_config.yml

This creates:

  • synthetic preprocessed NetCDFs in data/synthetic_preprocessed/;
  • plots, tables, masks, and summaries in results/synthetic_preprocessed/.

These synthetic showcase files are included in the repository. You can regenerate them at any time with the two commands above.

The synthetic files match the expected preprocessed format: one NetCDF per variable with time, lat, and lon dimensions, a spatial_ref CRS coordinate, and a single data variable named ndvi, rainfall, soil_moisture, or forest_loss.

Real Data Workflow

For a tiny real-data smoke test using SMAP L4 over a few days:

python3 download_data.py --config pipeline_config.real_few_days_l4_download.yml
python3 validate_data.py --config pipeline_config.real_few_days_l4_download.yml --dataset soil_moisture
python3 preprocess_data.py --config pipeline_config.real_few_days_l4_download.yml --force
python3 run_analysis.py --config pipeline_config.real_few_days_l4_analysis.yml --force-preprocess

For a full 2023 run:

python3 download_data.py --config pipeline_config.real_12mo_download.yml
python3 validate_data.py --config pipeline_config.real_12mo_download.yml --dataset soil_moisture
python3 preprocess_data.py --config pipeline_config.real_12mo_download.yml --force
python3 run_analysis.py --config pipeline_config.real_12mo_analysis.yml --force-preprocess

The full-year run is much larger because SMAP L4 (SPL4SMGP) is a 3-hourly product, so a full year spans roughly 2,920 granules. Use the few-day run first to confirm credentials and file parsing.

Supported Datasets

The real-data configs currently use:

  • CHIRPS rainfall from a public monthly GeoTIFF URL pattern;
  • MODIS monthly NDVI from MOD13A3.061 via earthaccess;
  • SMAP L4 surface soil moisture from SPL4SMGP.008 via earthaccess;
  • Hansen forest loss from a manual Google Earth Engine export.

Hansen forest loss is not downloaded automatically. Paste data/hansen_forest_loss_gee_export.js into the Google Earth Engine Code Editor, export the lossyear GeoTIFF, then place it at the raw_path used by the YAML config.
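For orientation, the Hansen lossyear band stores 0 for no loss and a small integer for the year of loss counted from 2000 (1 means 2001, and so on). The sketch below shows how such codes could be turned into per-year pixel counts, similar in spirit to annual_forest_loss.csv. The function name `annual_loss_counts` and the example values are hypothetical, and the pipeline's own aggregation may differ.

```python
from collections import Counter

BASE_YEAR = 2000  # Hansen lossyear code 1 means 2001, code 2 means 2002, and so on.


def annual_loss_counts(lossyear_codes):
    """Count loss pixels per calendar year, skipping code 0 (no loss)."""
    counts = Counter(BASE_YEAR + code for code in lossyear_codes if code > 0)
    return dict(sorted(counts.items()))


# Flattened example raster: 0 = no loss, 21 = 2021, 23 = 2023.
codes = [0, 0, 21, 23, 23, 0, 21, 23]
print(annual_loss_counts(codes))
```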

Earthdata Login

MODIS and SMAP downloads use earthaccess. Credentials are not stored in YAML.

The downloader tries:

  • a valid ~/.netrc;
  • EARTHDATA_USERNAME and EARTHDATA_PASSWORD;
  • an interactive terminal prompt.

Example environment-variable setup:

export EARTHDATA_USERNAME="your_username"
export EARTHDATA_PASSWORD="your_password"
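The lookup order above can be sketched with the standard library alone. This is an illustration of the order, not the downloader's actual code: the function name `resolve_credential_source` is hypothetical, and it only reports which source would be used. In earthaccess itself, `earthaccess.login` accepts a `strategy` argument with values such as "netrc", "environment", and "interactive".

```python
import netrc
import os
from pathlib import Path

EARTHDATA_HOST = "urs.earthdata.nasa.gov"


def resolve_credential_source(netrc_path=None, env=None):
    """Return 'netrc', 'environment', or 'interactive', tried in that order."""
    netrc_path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    env = os.environ if env is None else env

    # 1. A parsable netrc file with an entry for the Earthdata host.
    if netrc_path.is_file():
        try:
            if netrc.netrc(str(netrc_path)).authenticators(EARTHDATA_HOST):
                return "netrc"
        except (netrc.NetrcParseError, OSError):
            pass  # Treat an unparsable file as "not valid" and fall through.

    # 2. Both environment variables set and non-empty.
    if env.get("EARTHDATA_USERNAME") and env.get("EARTHDATA_PASSWORD"):
        return "environment"

    # 3. Nothing found: the caller would prompt in the terminal.
    return "interactive"


print(resolve_credential_source())
```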

YAML Modes

Each dataset has an input mode:

  • preprocessed: use the configured NetCDF file and skip download/preprocess;
  • raw: read local raw files and run preprocessing;
  • download: download supported raw files, then run preprocessing.
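As a rough sketch, the three modes map to pipeline steps as shown below. The function name `plan_steps` and the example per-dataset modes are hypothetical; the scripts in this repository may organize the dispatch differently.

```python
def plan_steps(mode):
    """Map a dataset's input mode to the pipeline steps it triggers."""
    steps_by_mode = {
        "preprocessed": [],                      # use the NetCDF as-is
        "raw": ["preprocess"],                   # local raw files
        "download": ["download", "preprocess"],  # fetch, then harmonize
    }
    try:
        return steps_by_mode[mode]
    except KeyError:
        expected = sorted(steps_by_mode)
        raise ValueError(f"unknown mode {mode!r}; expected one of {expected}") from None


# Hypothetical per-dataset modes, as they might be read from the YAML.
modes = {"ndvi": "download", "rainfall": "raw", "forest_loss": "preprocessed"}
for key, mode in modes.items():
    print(key, plan_steps(mode))
```

Because the mode is set per dataset, a single run can mix downloaded, local raw, and already preprocessed inputs.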

The main preprocessing behavior is:

  • clip to the AOI and time window from the YAML;
  • reproject raw inputs to analysis.grid.target_crs;
  • choose the coarsest common grid when analysis.grid.strategy: "coarsest";
  • aggregate to monthly time steps;
  • save standardized NetCDF files for the analysis step.
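To illustrate the "coarsest" strategy, the sketch below picks the dataset with the largest native cell size as the common grid. The resolutions are approximate nominal values converted to degrees for illustration, and the names `NATIVE_RESOLUTION_DEG` and `coarsest_grid` are hypothetical. The pipeline presumably derives cell sizes from the files themselves.

```python
# Approximate native cell sizes in degrees (nominal values, for illustration).
NATIVE_RESOLUTION_DEG = {
    "rainfall": 0.05,        # CHIRPS, 0.05 degree
    "ndvi": 0.0083,          # MOD13A3, about 1 km
    "soil_moisture": 0.08,   # SMAP L4, about 9 km
    "forest_loss": 0.00028,  # Hansen, about 30 m (1 arc-second)
}


def coarsest_grid(resolutions):
    """Return (dataset_key, cell_size) for the largest cell size."""
    key = max(resolutions, key=resolutions.get)
    return key, resolutions[key]


print(coarsest_grid(NATIVE_RESOLUTION_DEG))
```

With these values the SMAP soil-moisture grid is the coarsest, so the finer datasets would be resampled onto it rather than the other way round.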

Preprocessed File Format

Each preprocessed NetCDF should contain:

  • dimensions: time, lat, lon;
  • CRS coordinate: spatial_ref;
  • one data variable matching the dataset key, such as ndvi;
  • monthly timestamps matching the analysis period.

Example:

Dimensions:  time, lat, lon
Coordinates: time, lat, lon, spatial_ref
Variables:   ndvi
Attributes:  input_source_mode, dataset_key, grid_strategy, target_crs
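A quick structural check of a preprocessed file could look like the sketch below. It takes plain name collections so it runs without extra dependencies. The function name `check_layout` is hypothetical, and the check does not cover the monthly-timestamp requirement.

```python
REQUIRED_DIMS = {"time", "lat", "lon"}
KNOWN_KEYS = {"ndvi", "rainfall", "soil_moisture", "forest_loss"}


def check_layout(dims, coords, data_vars, dataset_key):
    """Return a list of problems; an empty list means the layout looks right."""
    problems = []
    missing_dims = REQUIRED_DIMS - set(dims)
    if missing_dims:
        problems.append(f"missing dimensions: {sorted(missing_dims)}")
    if "spatial_ref" not in coords:
        problems.append("missing CRS coordinate: spatial_ref")
    if dataset_key not in KNOWN_KEYS:
        problems.append(f"unknown dataset key: {dataset_key}")
    if list(data_vars) != [dataset_key]:
        found = sorted(data_vars)
        problems.append(f"expected one data variable named {dataset_key!r}, found {found}")
    return problems


# With xarray installed, the arguments could come from an opened file:
#   ds = xarray.open_dataset(path)
#   check_layout(ds.dims, ds.coords, ds.data_vars, "ndvi")
print(check_layout(["time", "lat", "lon"],
                   ["time", "lat", "lon", "spatial_ref"],
                   ["ndvi"], "ndvi"))
```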

Outputs

Each analysis run writes a results folder containing:

  • 01_timeseries_trends.png
  • 02_spatial_maps.png
  • 03_rainfall_soilmoisture_maps.png
  • timeseries_indicators.csv
  • annual_forest_loss.csv
  • processed_dataset.nc
  • coffee_suitability_mask.npy
  • cumulative_forest_loss_map.npy
  • overlap_mask.npy
  • ndvi_trend_map.npy
  • ANALYSIS_SUMMARY.txt
  • RESULTS_SUMMARY.txt
  • analysis_metadata.json
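To confirm that a run completed, the file list above can be checked against a results folder. This is a minimal sketch; the helper name `missing_outputs` is hypothetical and not part of the repository.

```python
from pathlib import Path

EXPECTED_OUTPUTS = [
    "01_timeseries_trends.png",
    "02_spatial_maps.png",
    "03_rainfall_soilmoisture_maps.png",
    "timeseries_indicators.csv",
    "annual_forest_loss.csv",
    "processed_dataset.nc",
    "coffee_suitability_mask.npy",
    "cumulative_forest_loss_map.npy",
    "overlap_mask.npy",
    "ndvi_trend_map.npy",
    "ANALYSIS_SUMMARY.txt",
    "RESULTS_SUMMARY.txt",
    "analysis_metadata.json",
]


def missing_outputs(results_dir):
    """Return the expected file names that are absent from results_dir."""
    results_dir = Path(results_dir)
    return [name for name in EXPECTED_OUTPUTS if not (results_dir / name).is_file()]


# Example: check the synthetic demo results folder.
print(missing_outputs("results/synthetic_preprocessed"))
```

An empty list means every expected file is present. The check does not validate file contents.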

Troubleshooting

If SMAP preprocessing reports truncated .h5 files:

python3 validate_data.py --config pipeline_config.real_few_days_l4_download.yml --dataset soil_moisture --quarantine-corrupt
python3 download_data.py --config pipeline_config.real_few_days_l4_download.yml

Then rerun preprocessing.

If installing pyhdf with pip fails, install it from conda-forge instead:

conda install -c conda-forge pyhdf

If a dataset cannot be downloaded or parsed automatically, provide either:

  • a preprocessed NetCDF and set mode: preprocessed; or
  • local raw files and set mode: raw.

Notes

The maintained workflow is script-first and YAML-driven.
