JPLMLIA/OCO3_swath_bias_correction


OCO-3 Swath Bias Correction

This repository contains code for detecting and correcting swath-dependent biases in OCO-3 Snapshot Area Map (SAM) observations using a Random Forest-based approach.

Installation

Recommended (conda): handles the GEOS / PROJ / HDF5 / NetCDF system libraries required by cartopy, pyproj, netcdf4, and h5py automatically.

git clone https://github.com/your-username/oco3-swath-bias-correction.git
cd oco3-swath-bias-correction
conda env create -f environment.yml
conda activate oco3_bias

Alternative (pip): only works if GEOS, PROJ, and HDF5 are already installed on your system; otherwise the cartopy / pyproj / netcdf4 builds will fail.

git clone https://github.com/your-username/oco3-swath-bias-correction.git
cd oco3-swath-bias-correction
conda create -n oco3_bias python=3.9 && conda activate oco3_bias
pip install -r requirements.txt

Or run bash setup.sh, which picks the conda path when conda is available and falls back to pip otherwise.

Configuration

Recommended: copy src/utils/config_local.example.py to src/utils/config_local.py and set USER_DATA_DIR and USER_OUTPUT_DIR. If left unset, defaults are data/input and data/output under the project root.
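A minimal `config_local.py` might look like the sketch below. The two variable names come from the template described above; the paths themselves are placeholders you should replace with your own storage locations (and the assumption that `config_paths.py` imports these overrides is based on the description above, not verified against the template file):

```python
# src/utils/config_local.py -- local path overrides (keep out of git)
from pathlib import Path

# Replace these placeholder paths with your own storage locations.
USER_DATA_DIR = Path.home() / "oco3" / "input"     # raw OCO-3 Lite files
USER_OUTPUT_DIR = Path.home() / "oco3" / "output"  # bias-corrected output
```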

πŸ“ Repository Structure

code/
├── src/                    # Source code
│   ├── analysis/          # Research, analysis, and visualization scripts
│   │   └── run_comprehensive_analysis.py # Central analysis runner
│   ├── data_preparation/  # Data loading and preprocessing
│   ├── evaluation_analysis/ # Model evaluation and metrics
│   ├── modeling/          # Core bias correction models
│   │   └── Swath_BC_v3.py # Main RF pipeline
│   ├── processing/        # Data processing utilities
│   │   └── apply_swath_bc_RF.py # Production processing
│   ├── tools/            # Supporting tools and utilities
│   └── utils/            # Core utility functions and configuration
│       ├── config_paths.py # Centralized path configuration
│       ├── config_local.example.py # Local override template (copy to config_local.py)
│       └── main_util.py   # Main utility functions
├── data/                  # Data storage (contents excluded from git)
│   ├── intermediate/     # Intermediate analysis data
│   ├── labels/          # Manual bias labels (1,723 SAMs) - included
│   ├── models/          # Trained model artifacts
│   └── processed/       # Cross-validation results
├── results/              # Analysis outputs (structure preserved)
│   └── figures/         # Generated plots and figures
├── docs/                # Documentation
│   └── DATA_REQUIREMENTS.md # Detailed data setup guide
├── tmp/                 # Temporary files (excluded from git)
# Configuration and setup files
├── setup.sh            # Automated environment setup script
├── requirements.txt    # Python dependencies
├── LICENSE            # MIT license
├── Paper.md          # Research paper draft
└── README.md         # This file

Workflow and Usage

The project follows a three-phase workflow:

1. Model Training & Optimization    →    2. Data Processing    →    3. Analysis & Visualization
   (Development/Research)                 (Production)               (Results & Insights)

🔬 Phase 1: Model Training & Optimization

Purpose: Develop and optimize the Random Forest model for swath bias detection.

Script: src/modeling/Swath_BC_v3.py

What it does:

  • Uses 1,279 labeled SAMs for training.
  • Performs hyperparameter optimization using Optuna.
  • Employs 4-fold cross-validation for robust evaluation.
  • Implements a reordered pipeline (RF decision first, then targeted corrections).
  • Saves the optimized model, CV results, and performance metrics.
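The cross-validation setup can be sketched as follows. This is illustrative only: the data is synthetic, the hyperparameters are fixed placeholders rather than Optuna-tuned values, and the real pipeline lives in `src/modeling/Swath_BC_v3.py`:

```python
# Sketch: 4-fold cross-validation of an RF bias detector on synthetic stand-in data
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))                            # stand-in for per-SAM features
y = (X[:, 0] + rng.normal(size=400) > 0.8).astype(int)   # stand-in bias labels

clf = RandomForestClassifier(n_estimators=200, random_state=0)
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="f1")
print(f"F1 per fold: {scores.round(2)}, mean: {scores.mean():.2f}")
```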

Usage:

python -m src.modeling.Swath_BC_v3

Output: Trained model in data/models/ with an F1-score of ~0.67.


🏭 Phase 2: Data Processing (Production)

Purpose: Apply the trained swath bias correction model to operational OCO-3 Lite files.

Script: src/processing/apply_swath_bc_RF.py

What it does:

  • Loads the final trained RF model from Phase 1.
  • Processes OCO-3 Lite files (*.nc4) in batch.
  • Efficiently applies corrections only to SAMs identified by the RF model (~15% of total).
  • Creates new bias-corrected NetCDF files with added variables:
    • xco2_swath_bc: The bias-corrected XCO₂ value.
    • swath_bias_corrected: A flag indicating if a correction was applied (0 for no, 1 for yes).

Usage:

python -m src.processing.apply_swath_bc_RF

Input: OCO-3 Lite files (oco3_LtCO2_*B11072Ar*.nc4) from your data directory
Output: Corrected files in your configured output directory
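The semantics of the two added variables can be illustrated with a small numpy sketch. The array values here are made up, and the assumption that the corrected value equals the original wherever the flag is 0 follows from the flag's description above rather than from the file specification:

```python
import numpy as np

# Per-sounding values as they might appear in a corrected Lite file (made-up numbers)
xco2 = np.array([412.1, 413.5, 411.8, 414.0])           # original XCO2 (ppm)
xco2_swath_bc = np.array([412.1, 412.9, 411.8, 413.2])  # bias-corrected XCO2 (ppm)
swath_bias_corrected = np.array([0, 1, 0, 1])           # 1 = correction applied

# Where no correction was applied, the corrected value matches the original
uncorrected = swath_bias_corrected == 0
assert np.allclose(xco2[uncorrected], xco2_swath_bc[uncorrected])
print(f"{swath_bias_corrected.sum()} of {len(xco2)} soundings corrected")
```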


📊 Phase 3: Analysis & Visualization

Purpose: Generate comprehensive analysis and visualizations of the model's performance and the correction results.

Script: src/analysis/run_comprehensive_analysis.py (unified analysis runner)

What it does:

  • Provides a single command-line interface for running all analysis and visualization scripts.
  • Offers pre-defined groups of scripts for common tasks (--core, --plots, --publication, --validation).
  • Allows for running individual scripts.
  • Tracks progress, estimates run times, and handles errors gracefully.

Usage:

# Run the core analysis suite (recommended for a standard check)
python -m src.analysis.run_comprehensive_analysis --core

# Generate all figures for the paper
python -m src.analysis.run_comprehensive_analysis --publication

# See all available analysis options
python -m src.analysis.run_comprehensive_analysis --list

Available Analysis Groups:

  • --core: Essential analysis (SHAP, bias plots, evaluation plots).
  • --plots: All visualization scripts.
  • --validation: In-depth model validation (RF test, core SHAP).
  • --publication: Scripts to generate publication-ready figures.
  • --all: The complete analysis suite (9 scripts, ~30-45 minutes).

🎯 Expected Performance

  • Detection Performance: F1-score of ~0.67 for bias identification.
  • Processing Efficiency: ~15% of SAMs receive corrections.
  • Physical Validation: Confirms AOD-bias correlation (r=0.33).
  • Bias Reduction: Significant reduction in swath-to-swath XCO₂ jumps.

βš™οΈ Centralized Configuration System

All paths and experiment settings are centralized in src/utils/config_paths.py for easy experiment management.

Quick Experiment Setup

To start a new experiment, edit only 3 lines in src/utils/config_paths.py:

# In src/utils/config_paths.py
MODEL_VERSION = "v4.0"                           # Your new version
EXPERIMENT_NAME = "Swath_BC_v4.0_NewFeatures"    # Your experiment name  
PROCESSING_VERSION = "v4.0"                      # Processing version

All scripts automatically use the new configuration:

python -m src.modeling.Swath_BC_v3                        # Train model
python -m src.processing.apply_swath_bc_RF                # Process data
python -m src.analysis.run_comprehensive_analysis --core  # Analyze results


Directory Structure Created:

your-project/
├── data/
│   ├── models/Swath_BC_v4.0/
│   ├── processed/Swath_BC_v4.0/
│   └── output/Lite_w_SwathBC_v4.0/
└── results/figures/

📊 Data Requirements

OCO-3 Level-2 Lite Files

This code requires OCO-3 Level-2 Lite files (Build B11 or later).

Download OCO-3 Data:

Minimum Dataset for Testing:

  • A few OCO-3 Lite files containing SAM observations
  • The included labels file (data/labels/Swath_Bias_labels.csv) for model training

Labeled Training Data

The repository includes manually labeled SAM data (data/labels/Swath_Bias_labels.csv) containing 1,723 expert-classified scenes for model training.
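Loading the labels file is straightforward with the standard library; the sketch below makes no assumption about the CSV's column names (check the header of data/labels/Swath_Bias_labels.csv for the actual schema):

```python
import csv
from pathlib import Path

def load_labels(path):
    """Read the SAM label CSV into a list of row dicts keyed by the file's header."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

# Usage (uncomment when the repository data is present):
# rows = load_labels(Path("data/labels/Swath_Bias_labels.csv"))
# print(len(rows))  # expected: 1723 labeled SAMs
```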

📖 Documentation

  • src/analysis/README.md: Detailed documentation for all analysis scripts.
  • src/utils/config_paths.py: Centralized path configuration (⭐ KEY FILE ⭐).
  • Paper.md: Research paper draft with comprehensive results.

πŸ“ Citation

If you use this code in your research, please cite:

@article{mauceri2025oco3swath,
  title={Machine Learning Detection and Correction of Swath-Dependent Biases in OCO-3 Snapshot Area Map Observations},
  author={Mauceri, S. and others},
  journal={Under Review},
  year={2025}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Disclaimer: This bias correction dataset does not replace the official OCO-3 product and should be used for research applications only.
