This repository contains code for detecting and correcting swath-dependent biases in OCO-3 Snapshot Area Map (SAM) observations using a Random Forest-based approach.
Recommended (conda): handles the GEOS / PROJ / HDF5 / NetCDF system libraries
required by cartopy, pyproj, netcdf4, and h5py automatically.
git clone https://github.com/your-username/oco3-swath-bias-correction.git
cd oco3-swath-bias-correction
conda env create -f environment.yml
conda activate oco3_bias

Alternative (pip): only works if GEOS, PROJ, and HDF5 are already installed
on your system; otherwise the cartopy / pyproj / netcdf4 builds will fail.
git clone https://github.com/your-username/oco3-swath-bias-correction.git
cd oco3-swath-bias-correction
conda create -n oco3_bias python=3.9 && conda activate oco3_bias
pip install -r requirements.txt

Or run bash setup.sh, which picks the conda path when conda is available and
falls back to pip otherwise.
Recommended: copy src/utils/config_local.example.py to src/utils/config_local.py and set USER_DATA_DIR and USER_OUTPUT_DIR.
If left unset, defaults are data/input and data/output under the project root.
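The override pattern can be sketched as follows (a minimal illustration of the fallback logic; the actual code in src/utils/config_paths.py may differ):

```python
# Sketch of the local-override pattern: repo defaults are defined first,
# then anything set in the git-ignored config_local.py wins.
# (Illustrative only; see src/utils/config_paths.py for the real logic.)
from pathlib import Path

if "__file__" in globals():
    PROJECT_ROOT = Path(__file__).resolve().parent
else:  # e.g. interactive use
    PROJECT_ROOT = Path.cwd()

# Repository defaults, used when no local override is present
USER_DATA_DIR = PROJECT_ROOT / "data" / "input"
USER_OUTPUT_DIR = PROJECT_ROOT / "data" / "output"

try:
    # config_local.py is the user's copy of config_local.example.py
    from config_local import *  # noqa: F401,F403
except ImportError:
    pass  # no local override; keep the defaults above
```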
code/
├── src/                                   # Source code
│   ├── analysis/                          # Research, analysis, and visualization scripts
│   │   └── run_comprehensive_analysis.py  # Central analysis runner
│   ├── data_preparation/                  # Data loading and preprocessing
│   ├── evaluation_analysis/               # Model evaluation and metrics
│   ├── modeling/                          # Core bias correction models
│   │   └── Swath_BC_v3.py                 # Main RF pipeline
│   ├── processing/                        # Data processing utilities
│   │   └── apply_swath_bc_RF.py           # Production processing
│   ├── tools/                             # Supporting tools and utilities
│   └── utils/                             # Core utility functions and configuration
│       ├── config_paths.py                # Centralized path configuration
│       └── main_util.py                   # Main utility functions
├── data/                                  # Data storage (contents excluded from git)
│   ├── intermediate/                      # Intermediate analysis data
│   ├── labels/                            # Manual bias labels (1,723 SAMs) - included
│   ├── models/                            # Trained model artifacts
│   └── processed/                         # Cross-validation results
├── results/                               # Analysis outputs (structure preserved)
│   └── figures/                           # Generated plots and figures
├── docs/                                  # Documentation
│   └── DATA_REQUIREMENTS.md               # Detailed data setup guide
├── tmp/                                   # Temporary files (excluded from git)
│
│   # Configuration and setup files
├── setup.sh                               # Automated environment setup script
├── src/utils/config_paths.py              # Centralized path configuration
├── src/utils/config_local.example.py      # Local override template (copy to config_local.py)
├── requirements.txt                       # Python dependencies
├── LICENSE                                # MIT license
├── Paper.md                               # Research paper draft
└── README.md                              # This file
The project follows a three-phase workflow:
1. Model Training & Optimization → 2. Data Processing → 3. Analysis & Visualization
   (Development/Research)            (Production)         (Results & Insights)
Purpose: Develop and optimize the Random Forest model for swath bias detection.
Script: src/modeling/Swath_BC_v3.py
What it does:
- Uses 1,279 labeled SAMs for training.
- Performs hyperparameter optimization using Optuna.
- Employs 4-fold cross-validation for robust evaluation.
- Implements a reordered pipeline (RF decision first, then targeted corrections).
- Saves the optimized model, CV results, and performance metrics.
Usage:
python -m src.modeling.Swath_BC_v3

Output: Trained model in data/models/ with an F1-score of ~0.67.
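The "reordered pipeline" idea (RF decision first, targeted correction second) can be illustrated with a minimal sketch; `rf_predicts_bias` and the per-swath mean-offset correction below are stand-ins for the actual trained model and correction step:

```python
import numpy as np

def correct_sam(xco2, swath_id, rf_predicts_bias):
    """Reordered-pipeline sketch: the RF classifier decides *first* whether
    a SAM is biased; the correction runs only on flagged SAMs.
    rf_predicts_bias is a stand-in for the trained Random Forest."""
    corrected = xco2.copy()
    flag = np.zeros_like(xco2, dtype=np.int8)
    if not rf_predicts_bias:      # RF decision first ...
        return corrected, flag    # ... unflagged SAMs pass through untouched
    # Targeted correction (illustrative only): remove each swath's offset
    # relative to the SAM-wide mean.
    sam_mean = xco2.mean()
    for s in np.unique(swath_id):
        m = swath_id == s
        corrected[m] -= xco2[m].mean() - sam_mean
    flag[:] = 1
    return corrected, flag
```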
Purpose: Apply the trained swath bias correction model to operational OCO-3 Lite files.
Script: src/processing/apply_swath_bc_RF.py
What it does:
- Loads the final trained RF model from Phase 1.
- Processes OCO-3 Lite files (*.nc4) in batch.
- Efficiently applies corrections only to SAMs identified by the RF model (~15% of total).
- Creates new bias-corrected NetCDF files with added variables:
  - xco2_swath_bc: The bias-corrected XCO₂ value.
  - swath_bias_corrected: A flag indicating whether a correction was applied (0 = no, 1 = yes).
Usage:
python -m src.processing.apply_swath_bc_RF

Input: OCO-3 Lite files (oco3_LtCO2_*B11072Ar*.nc4) from your data directory
Output: Corrected files in your configured output directory
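The batch file-selection step amounts to matching the Lite filename pattern; a minimal sketch (the filenames in the test are made up for illustration):

```python
import fnmatch

# Expected OCO-3 B11 Lite filename pattern from the data requirements
LITE_PATTERN = "oco3_LtCO2_*B11072Ar*.nc4"

def select_lite_files(filenames, pattern=LITE_PATTERN):
    """Return, in sorted order, only the filenames that look like
    OCO-3 B11 Lite products (sketch of the batch-selection step)."""
    return sorted(f for f in filenames if fnmatch.fnmatch(f, pattern))
```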
Purpose: Generate comprehensive analysis and visualizations of the model's performance and the correction results.
Script: src/analysis/run_comprehensive_analysis.py (a unified runner script)
What it does:
- Provides a single, powerful command-line interface to run all analysis and visualization scripts.
- Offers pre-defined groups of scripts for common tasks (--core, --plots, --publication, --validation).
- Allows running individual scripts.
- Tracks progress, estimates run times, and handles errors gracefully.
Usage:
# Run the core analysis suite (recommended for a standard check)
python -m src.analysis.run_comprehensive_analysis --core
# Generate all figures for the paper
python -m src.analysis.run_comprehensive_analysis --publication
# See all available analysis options
python -m src.analysis.run_comprehensive_analysis --list

Available Analysis Groups:
- --core: Essential analysis (SHAP, bias plots, evaluation plots).
- --plots: All visualization scripts.
- --validation: In-depth model validation (RF test, core SHAP).
- --publication: Scripts to generate publication-ready figures.
- --all: The complete analysis suite (9 scripts, ~30-45 minutes).
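A unified runner of this kind is typically a thin argparse wrapper that maps group flags to lists of scripts. The sketch below uses hypothetical group contents; the real mapping lives in src/analysis/run_comprehensive_analysis.py:

```python
import argparse

# Hypothetical group-to-script mapping, for illustration only
GROUPS = {
    "core": ["shap_analysis", "bias_plots", "evaluation_plots"],
    "plots": ["bias_plots", "evaluation_plots", "map_plots"],
}

def build_parser():
    """One boolean flag per group, plus --list."""
    p = argparse.ArgumentParser(description="Run analysis script groups")
    for name in GROUPS:
        p.add_argument(f"--{name}", action="store_true")
    p.add_argument("--list", action="store_true", help="show available scripts")
    return p

def scripts_to_run(args):
    """Union of all requested groups, preserving order and de-duplicating."""
    selected = []
    for name, scripts in GROUPS.items():
        if getattr(args, name):
            selected += [s for s in scripts if s not in selected]
    return selected
```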
- Detection Performance: F1-score of ~0.67 for bias identification.
- Processing Efficiency: ~15% of SAMs receive corrections.
- Physical Validation: Confirms AOD-bias correlation (r=0.33).
- Bias Reduction: Significant reduction in swath-to-swath XCO₂ jumps.
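For reference, F1 is the harmonic mean of precision and recall, not an accuracy percentage. A quick computation (the confusion counts below are made up and merely chosen to land near 0.67):

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision (tp/(tp+fp)) and recall (tp/(tp+fn))."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```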
All paths and experiment settings are now centralized in src/utils/config_paths.py for easy experiment management.
To start a new experiment, edit only 3 lines in src/utils/config_paths.py:
# In src/utils/config_paths.py
MODEL_VERSION = "v4.0" # Your new version
EXPERIMENT_NAME = "Swath_BC_v4.0_NewFeatures" # Your experiment name
PROCESSING_VERSION = "v4.0" # Processing version

All scripts automatically use the new configuration:
python -m src.modeling.Swath_BC_v3 # Train model
python -m src.processing.apply_swath_bc_RF # Process data
python -m src.analysis.run_comprehensive_analysis --core # Analyze results

# Run the complete pipeline
python -m src.modeling.Swath_BC_v3 # Train model
python -m src.processing.apply_swath_bc_RF # Process data
python -m src.analysis.run_comprehensive_analysis --core # Analyze results

your-project/
├── data/
│   ├── models/Swath_BC_v4.0/
│   ├── processed/Swath_BC_v4.0/
│   └── output/Lite_w_SwathBC_v4.0/
└── results/figures/
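The version strings drive the directory layout above. A minimal sketch of how the derived paths might be built from them (assumed for illustration; the real definitions are in src/utils/config_paths.py):

```python
from pathlib import Path

# Version strings, as edited at the top of config_paths.py
MODEL_VERSION = "v4.0"
EXPERIMENT_NAME = f"Swath_BC_{MODEL_VERSION}"
PROCESSING_VERSION = "v4.0"

# Derived directories (sketch; names assumed to mirror the layout above)
ROOT = Path("your-project")
MODEL_DIR = ROOT / "data" / "models" / EXPERIMENT_NAME
PROCESSED_DIR = ROOT / "data" / "processed" / EXPERIMENT_NAME
OUTPUT_DIR = ROOT / "data" / "output" / f"Lite_w_SwathBC_{PROCESSING_VERSION}"
```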
This code requires OCO-3 Level-2 Lite files (Build B11 or later).
Download OCO-3 Data:
- Official Source: NASA Goddard Earth Sciences Data and Information Services Center (GES DISC)
- Required Product: OCO-3 Level-2 geolocated XCO₂ retrievals (Lite files)
- File Pattern: oco3_LtCO2_*B11072Ar*.nc4
- Coverage: Snapshot Area Map (SAM) observations (operation_mode = 4)
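Within a granule, SAM soundings are selected by the operation-mode flag; a minimal sketch with numpy (the variable name is assumed to mirror the Lite-file field):

```python
import numpy as np

SAM_MODE = 4  # operation_mode value for Snapshot Area Map observations

def sam_mask(operation_mode):
    """Boolean mask selecting SAM soundings from a Lite-file granule."""
    return np.asarray(operation_mode) == SAM_MODE
```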
Minimum Dataset for Testing:
- A few OCO-3 Lite files containing SAM observations
- The included labels file (data/labels/Swath_Bias_labels.csv) for model training
The repository includes manually labeled SAM data (data/labels/Swath_Bias_labels.csv) containing 1,723 expert-classified scenes for model training.
- src/analysis/README.md: Detailed documentation for all analysis scripts.
- src/utils/config_paths.py: Centralized path configuration (key file).
- Paper.md: Research paper draft with comprehensive results.
If you use this code in your research, please cite:
@article{mauceri2025oco3swath,
title={Machine Learning Detection and Correction of Swath-Dependent Biases in OCO-3 Snapshot Area Map Observations},
author={Mauceri, S. and others},
journal={Under Review},
year={2025}
}

This project is licensed under the MIT License - see the LICENSE file for details.
Disclaimer: This bias correction dataset does not replace the official OCO-3 product and should be used for research applications only.