This repository contains code for detecting and correcting swath-dependent biases in OCO-3 Snapshot Area Map (SAM) observations using a Random Forest-based approach.
Recommended (conda): handles the GEOS / PROJ / HDF5 / NetCDF system libraries
required by cartopy, pyproj, netcdf4, and h5py automatically.
git clone https://github.com/your-username/oco3-swath-bias-correction.git
cd oco3-swath-bias-correction
conda env create -f environment.yml
conda activate oco3_bias

Alternative (pip): only works if GEOS, PROJ, and HDF5 are already installed
on your system; otherwise the cartopy / pyproj / netcdf4 builds will fail.
git clone https://github.com/your-username/oco3-swath-bias-correction.git
cd oco3-swath-bias-correction
conda create -n oco3_bias python=3.9 && conda activate oco3_bias
pip install -r requirements.txt

Or run bash setup.sh, which picks the conda path when conda is available and
falls back to pip otherwise.
Recommended: copy src/utils/config_local.example.py to src/utils/config_local.py and set USER_DATA_DIR and USER_OUTPUT_DIR.
If left unset, defaults are data/input and data/output under the project root.
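The override pattern can be sketched as follows (a minimal illustration of the fallback logic; the actual code in src/utils/config_paths.py may differ):

```python
# Sketch of the local-override pattern: repo defaults are defined first,
# then anything set in the git-ignored config_local.py wins.
# (Illustrative only; see src/utils/config_paths.py for the real logic.)
from pathlib import Path

if "__file__" in globals():
    PROJECT_ROOT = Path(__file__).resolve().parent
else:  # e.g. interactive use
    PROJECT_ROOT = Path.cwd()

# Repository defaults, used when no local override is present
USER_DATA_DIR = PROJECT_ROOT / "data" / "input"
USER_OUTPUT_DIR = PROJECT_ROOT / "data" / "output"

try:
    # config_local.py is the user's copy of config_local.example.py
    from config_local import *  # noqa: F401,F403
except ImportError:
    pass  # no local override; keep the defaults above
```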
code/
├── src/                                   # Source code
│   ├── analysis/                          # Research, analysis, and visualization scripts
│   │   └── run_comprehensive_analysis.py  # Central analysis runner
│   ├── data_preparation/                  # Data loading and preprocessing
│   ├── evaluation_analysis/               # Model evaluation and metrics
│   ├── modeling/                          # Core bias correction models
│   │   └── Swath_BC_v3.py                 # Main RF pipeline
│   ├── processing/                        # Data processing utilities
│   │   └── apply_swath_bc_RF.py           # Production processing
│   ├── tools/                             # Supporting tools and utilities
│   └── utils/                             # Core utility functions and configuration
│       ├── config_paths.py                # Centralized path configuration
│       └── main_util.py                   # Main utility functions
├── data/                                  # Data storage (contents excluded from git)
│   ├── intermediate/                      # Intermediate analysis data
│   ├── labels/                            # Manual bias labels (1,723 SAMs) - included
│   ├── models/                            # Trained model artifacts
│   └── processed/                         # Cross-validation results
├── results/                               # Analysis outputs (structure preserved)
│   └── figures/                           # Generated plots and figures
├── docs/                                  # Documentation
│   └── DATA_REQUIREMENTS.md               # Detailed data setup guide
├── tmp/                                   # Temporary files (excluded from git)
│
│   # Configuration and setup files
├── setup.sh                               # Automated environment setup script
├── src/utils/config_paths.py              # Centralized path configuration
├── src/utils/config_local.example.py      # Local override template (copy to config_local.py)
├── requirements.txt                       # Python dependencies
├── LICENSE                                # MIT license
├── Paper.md                               # Research paper draft
└── README.md                              # This file
The project follows a three-phase workflow:
1. Model Training & Optimization → 2. Data Processing → 3. Analysis & Visualization
   (Development/Research)            (Production)         (Results & Insights)
Purpose: Develop and optimize the Random Forest model for swath bias detection.
Script: src/modeling/Swath_BC_v3.py
What it does:
- Uses 1,279 labeled SAMs for training.
- Performs hyperparameter optimization using Optuna.
- Employs 4-fold cross-validation for robust evaluation.
- Implements a reordered pipeline (RF decision first, then targeted corrections).
- Saves the optimized model, CV results, and performance metrics.
Usage:
python -m src.modeling.Swath_BC_v3

Output: Trained model in data/models/ with an F1-score of ~0.67.
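The "reordered pipeline" idea (RF decision first, targeted correction second) can be illustrated with a minimal sketch; `rf_predicts_bias` and the per-swath mean-offset correction below are stand-ins for the actual trained model and correction step:

```python
import numpy as np

def correct_sam(xco2, swath_id, rf_predicts_bias):
    """Reordered-pipeline sketch: the RF classifier decides *first* whether
    a SAM is biased; the correction runs only on flagged SAMs.
    rf_predicts_bias is a stand-in for the trained Random Forest."""
    corrected = xco2.copy()
    flag = np.zeros_like(xco2, dtype=np.int8)
    if not rf_predicts_bias:      # RF decision first ...
        return corrected, flag    # ... unflagged SAMs pass through untouched
    # Targeted correction (illustrative only): remove each swath's offset
    # relative to the SAM-wide mean.
    sam_mean = xco2.mean()
    for s in np.unique(swath_id):
        m = swath_id == s
        corrected[m] -= xco2[m].mean() - sam_mean
    flag[:] = 1
    return corrected, flag
```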
Purpose: Apply the trained swath bias correction model to operational OCO-3 Lite files.
Script: src/processing/apply_swath_bc_RF.py
What it does:
- Loads the final trained RF model from Phase 1.
- Processes OCO-3 Lite files (*.nc4) in batch.
- Efficiently applies corrections only to SAMs identified by the RF model (~15% of total).
- Creates new bias-corrected NetCDF files with added variables:
  - xco2_swath_bc: The bias-corrected XCO₂ value.
  - swath_bias_corrected: A flag indicating whether a correction was applied (0 = no, 1 = yes).
Usage:
python -m src.processing.apply_swath_bc_RF

Input: OCO-3 Lite files (oco3_LtCO2_*B11072Ar*.nc4) from your data directory
Output: Corrected files in your configured output directory
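The batch file-selection step amounts to matching the Lite filename pattern; a minimal sketch (the filenames in the test are made up for illustration):

```python
import fnmatch

# Expected OCO-3 B11 Lite filename pattern from the data requirements
LITE_PATTERN = "oco3_LtCO2_*B11072Ar*.nc4"

def select_lite_files(filenames, pattern=LITE_PATTERN):
    """Return, in sorted order, only the filenames that look like
    OCO-3 B11 Lite products (sketch of the batch-selection step)."""
    return sorted(f for f in filenames if fnmatch.fnmatch(f, pattern))
```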
Purpose: Generate comprehensive analysis and visualizations of the model's performance and the correction results.
Script: src/analysis/run_comprehensive_analysis.py (a unified runner script)
What it does:
- Provides a single, powerful command-line interface to run all analysis and visualization scripts.
- Offers pre-defined groups of scripts for common tasks (--core, --plots, --publication, --validation).
- Allows running individual scripts.
- Tracks progress, estimates run times, and handles errors gracefully.
Usage:
# Run the core analysis suite (recommended for a standard check)
python -m src.analysis.run_comprehensive_analysis --core
# Generate all figures for the paper
python -m src.analysis.run_comprehensive_analysis --publication
# See all available analysis options
python -m src.analysis.run_comprehensive_analysis --list

Available Analysis Groups:
- --core: Essential analysis (SHAP, bias plots, evaluation plots).
- --plots: All visualization scripts.
- --validation: In-depth model validation (RF test, core SHAP).
- --publication: Scripts to generate publication-ready figures.
- --all: The complete analysis suite (9 scripts, ~30-45 minutes).
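A unified runner of this kind is typically a thin argparse wrapper that maps group flags to lists of scripts. The sketch below uses hypothetical group contents; the real mapping lives in src/analysis/run_comprehensive_analysis.py:

```python
import argparse

# Hypothetical group-to-script mapping, for illustration only
GROUPS = {
    "core": ["shap_analysis", "bias_plots", "evaluation_plots"],
    "plots": ["bias_plots", "evaluation_plots", "map_plots"],
}

def build_parser():
    """One boolean flag per group, plus --list."""
    p = argparse.ArgumentParser(description="Run analysis script groups")
    for name in GROUPS:
        p.add_argument(f"--{name}", action="store_true")
    p.add_argument("--list", action="store_true", help="show available scripts")
    return p

def scripts_to_run(args):
    """Union of all requested groups, preserving order and de-duplicating."""
    selected = []
    for name, scripts in GROUPS.items():
        if getattr(args, name):
            selected += [s for s in scripts if s not in selected]
    return selected
```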
- Detection Performance: F1-score of ~0.67 for bias identification.
- Processing Efficiency: ~15% of SAMs receive corrections.
- Physical Validation: Confirms AOD-bias correlation (r=0.33).
- Bias Reduction: Significant reduction in swath-to-swath XCO₂ jumps.
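For reference, F1 is the harmonic mean of precision and recall, not an accuracy percentage. A quick computation (the confusion counts below are made up and merely chosen to land near 0.67):

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision (tp/(tp+fp)) and recall (tp/(tp+fn))."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```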
All paths and experiment settings are now centralized in src/utils/config_paths.py for easy experiment management.
To start a new experiment, edit only 3 lines in src/utils/config_paths.py:
# In src/utils/config_paths.py
MODEL_VERSION = "v4.0" # Your new version
EXPERIMENT_NAME = "Swath_BC_v4.0_NewFeatures" # Your experiment name
PROCESSING_VERSION = "v4.0" # Processing version

All scripts automatically use the new configuration:
python -m src.modeling.Swath_BC_v3 # Train model
python -m src.processing.apply_swath_bc_RF # Process data
python -m src.analysis.run_comprehensive_analysis --core # Analyze results

# Run the complete pipeline
python -m src.modeling.Swath_BC_v3 # Train model
python -m src.processing.apply_swath_bc_RF # Process data
python -m src.analysis.run_comprehensive_analysis --core # Analyze results

your-project/
├── data/
│   ├── models/Swath_BC_v4.0/
│   ├── processed/Swath_BC_v4.0/
│   └── output/Lite_w_SwathBC_v4.0/
└── results/figures/
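The version strings drive the directory layout above. A minimal sketch of how the derived paths might be built from them (assumed for illustration; the real definitions are in src/utils/config_paths.py):

```python
from pathlib import Path

# Version strings, as edited at the top of config_paths.py
MODEL_VERSION = "v4.0"
EXPERIMENT_NAME = f"Swath_BC_{MODEL_VERSION}"
PROCESSING_VERSION = "v4.0"

# Derived directories (sketch; names assumed to mirror the layout above)
ROOT = Path("your-project")
MODEL_DIR = ROOT / "data" / "models" / EXPERIMENT_NAME
PROCESSED_DIR = ROOT / "data" / "processed" / EXPERIMENT_NAME
OUTPUT_DIR = ROOT / "data" / "output" / f"Lite_w_SwathBC_{PROCESSING_VERSION}"
```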
This code requires OCO-3 Level-2 Lite files (Build B11 or later).
Download OCO-3 Data:
- Official Source: NASA Goddard Earth Sciences Data and Information Services Center (GES DISC)
- Required Product: OCO-3 Level-2 geolocated XCO₂ retrievals (Lite files)
- File Pattern: oco3_LtCO2_*B11072Ar*.nc4
- Coverage: Snapshot Area Map (SAM) observations (operation_mode = 4)
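Within a granule, SAM soundings are selected by the operation-mode flag; a minimal sketch with numpy (the variable name is assumed to mirror the Lite-file field):

```python
import numpy as np

SAM_MODE = 4  # operation_mode value for Snapshot Area Map observations

def sam_mask(operation_mode):
    """Boolean mask selecting SAM soundings from a Lite-file granule."""
    return np.asarray(operation_mode) == SAM_MODE
```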
Minimum Dataset for Testing:
- A few OCO-3 Lite files containing SAM observations
- The included labels file (data/labels/Swath_Bias_labels.csv) for model training
The repository includes manually labeled SAM data (data/labels/Swath_Bias_labels.csv) containing 1,723 expert-classified scenes for model training.
- src/analysis/README.md: Detailed documentation for all analysis scripts.
- src/utils/config_paths.py: Centralized path configuration (key file).
- Paper.md: Research paper draft with comprehensive results.
If you use this code in your research, please cite:
@article{mauceri2025oco3swath,
title={Machine Learning Detection and Correction of Swath-Dependent Biases in OCO-3 Snapshot Area Map Observations},
author={Mauceri, S. and others},
journal={Under Review},
year={2025}
}

This project is licensed under the MIT License - see the LICENSE file for details.
Disclaimer: This bias correction dataset does not replace the official OCO-3 product and should be used for research applications only.