OCO-2/3 Bias Correction and Filtering Pipeline

Overview

This project implements a new bias correction and quality filtering approach for atmospheric CO2 measurements derived from the Orbiting Carbon Observatory-2 (OCO-2) satellite, and explores its effect on measurement accuracy. This is not an official OCO-2 data product. For details on the approach, please refer to:

https://doi.org/10.22541/essoar.174164198.80749970/v1 and https://doi.org/10.22541/essoar.174164203.37422284/v1

License

The source code is licensed under the terms found in the LICENSE file.

Data Citations

This work uses several datasets, which are cited below:

OCO-2 LiteFiles

https://doi.org/10.5067/8E4VLCK16O6Q

Total Carbon Column Observing Network

https://doi.org/10.14291/TCCON.GGG2020

Flux Inversions

CarbonTracker CT-NRT.v2022-1: Jacobson, A., Schuldt, K., Tans, P., Andrews, A., Miller, J., & Oda, T. (2023). CarbonTracker CT2022.

MACC v21r1 https://doi.org/10.1029/2010JD013887

LoFI m2ccv1bsim https://doi.org/10.5194/acp-21-9609-2021

UnivEd v5.2 https://doi.org/10.5194/acp-9-2619-2009

EDGAR

v6.0 http://data.europa.eu/89h/97a67d67-c62e-4826-b873-9d972c4f670b

MODIS Cloud Distance

https://doi.org/10.5281/zenodo.4008765

Setup

To set up the required Python environment, use the provided environment.yml file with Conda:

conda env create -f environment.yml
conda activate bias_filt

Data paths are managed via paths.py. For external data, you may need to set the OCO_DATA_BASE environment variable to point to your data directory. See paths.py for details.
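
As a rough illustration of how an environment variable such as OCO_DATA_BASE can drive path resolution, here is a minimal sketch. The function name resolve_data_base, the fallback directory, and the lite_files sub-directory are assumptions made for this example; the actual logic lives in paths.py and may differ.

```python
import os
from pathlib import Path


def resolve_data_base(default="~/oco_data"):
    """Return the data root: OCO_DATA_BASE if set, otherwise a fallback.

    The fallback is a hypothetical placeholder, not the value used in paths.py.
    """
    return Path(os.environ.get("OCO_DATA_BASE", default)).expanduser()


# Normally the variable is exported in the shell before running the pipeline;
# it is set here only so that the example is self-contained.
os.environ["OCO_DATA_BASE"] = "/data/oco"

data_base = resolve_data_base()
lite_dir = data_base / "lite_files"  # hypothetical sub-directory name
```

Reading the variable in one place keeps machine-specific paths out of the processing scripts.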

Running the Pipeline

The processing pipeline is normally executed with the run_bias_correction_pipeline.py script. This script runs the individual processing steps in the correct order and can resume from the last completed step.

To run the pipeline starting from the beginning or resuming from the last completed step:

python run_bias_correction_pipeline.py

To clean the pipeline status file (forcing the pipeline to start from scratch on the next run):

python run_bias_correction_pipeline.py --clean-status
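
The resume behaviour described above can be pictured with a small status-file runner. This is only a sketch under stated assumptions: the JSON status format, the file name pipeline_status.json, and the functions run_pipeline and clean_status are hypothetical and are not taken from run_bias_correction_pipeline.py.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(steps, status_path):
    """Run (name, func) steps in order, skipping steps already recorded as done."""
    status_path = Path(status_path)
    done = json.loads(status_path.read_text()) if status_path.exists() else []
    executed = []
    for name, func in steps:
        if name in done:
            continue  # completed in an earlier run
        func()
        done.append(name)
        # Persist after every step so that an interrupted run can resume here.
        status_path.write_text(json.dumps(done))
        executed.append(name)
    return executed


def clean_status(status_path):
    """Delete the status file so that the next run starts from scratch."""
    Path(status_path).unlink(missing_ok=True)


# Placeholder steps; the real pipeline would call the numbered scripts.
steps = [
    ("01_create_initial_parquet", lambda: None),
    ("02_create_small_areas", lambda: None),
]

with tempfile.TemporaryDirectory() as tmp:
    status = Path(tmp) / "pipeline_status.json"
    first_run = run_pipeline(steps, status)   # runs both steps
    second_run = run_pipeline(steps, status)  # nothing left to do
    clean_status(status)
    third_run = run_pipeline(steps, status)   # runs both steps again
```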

Processing Scripts Details

The following scripts constitute the processing pipeline and are called by run_bias_correction_pipeline.py. They are generally intended to be run in the order listed below if run manually.

Processing steps for bias correction and filtering of OCO-2/3 data

merge OCO-2 Lite files to Parquet

bias_correction/01_create_initial_parquet.py

Converts OCO-2/3 Lite files (netCDF format) to parquet files
Removes unnecessary variables and cleans up naming conventions
Optimizes for performance by removing redundant data
Handles different data dimensions and formats them into a pandas DataFrame

make SA and calculate SA bias

bias_correction/02_create_small_areas.py

Creates Small Area (SA) groupings of soundings
Calculates SA biases in XCO2 retrievals
Helps identify systematic biases in small geographic regions
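
To make the Small Area idea concrete, the sketch below groups soundings by a small-area ID and takes each sounding's deviation from the median XCO2 of its group. Treating the SA bias as the deviation from the group median is an assumption made for this example, as are the field names sa_id and xco2 and the sample values; see 02_create_small_areas.py for the actual definition.

```python
from collections import defaultdict
from statistics import median


def small_area_bias(soundings):
    """Return each sounding's XCO2 deviation from the median of its small area."""
    by_area = defaultdict(list)
    for s in soundings:
        by_area[s["sa_id"]].append(s["xco2"])
    area_median = {sa: median(values) for sa, values in by_area.items()}
    return [s["xco2"] - area_median[s["sa_id"]] for s in soundings]


# Made-up values in ppm, for illustration only.
soundings = [
    {"sa_id": 1, "xco2": 410.0},
    {"sa_id": 1, "xco2": 411.0},
    {"sa_id": 1, "xco2": 412.5},
    {"sa_id": 2, "xco2": 405.0},
    {"sa_id": 2, "xco2": 407.0},
]
biases = small_area_bias(soundings)
```

The underlying assumption is that true XCO2 varies little within a small area, so deviations inside a group mostly reflect retrieval error.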

flag SA on coast lines

bias_correction/03_flag_coastal_soundings.py

Flags small areas that cross from land to ocean
Identifies coastal regions where land-water transitions occur
Helps handle special cases in bias correction near coastlines
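
One simple way to detect a land-to-ocean crossing is to check whether a small area contains both surface types. The sketch below does exactly that; the field names sa_id and is_land are hypothetical, and the actual script may use a different criterion.

```python
def flag_coastal_small_areas(soundings):
    """Map each small-area ID to True if it holds both land and ocean soundings."""
    surfaces = {}
    for s in soundings:
        surfaces.setdefault(s["sa_id"], set()).add(s["is_land"])
    return {sa: len(kinds) > 1 for sa, kinds in surfaces.items()}


soundings = [
    {"sa_id": 1, "is_land": True},
    {"sa_id": 1, "is_land": True},
    {"sa_id": 2, "is_land": True},
    {"sa_id": 2, "is_land": False},  # area 2 crosses the coastline
    {"sa_id": 3, "is_land": False},
]
coastal = flag_coastal_small_areas(soundings)
```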

add TCCON data to dataset

bias_correction/04_integrate_tccon_data.py

Adds TCCON (Total Carbon Column Observing Network) data to the dataset
Matches OCO-2/3 soundings with nearby TCCON stations
Calculates distances to TCCON stations
Adds TCCON XCO2 values and station names to the dataset
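
The distance-based matching can be sketched with a great-circle (haversine) distance and a nearest-station lookup. The station names and coordinates below are placeholders, and the 500 km collocation radius is an arbitrary value chosen for the example; none of them are taken from 04_integrate_tccon_data.py.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_station(lat, lon, stations, max_km):
    """Return (name, distance_km) of the closest station within max_km, else (None, None)."""
    name, (slat, slon) = min(
        stations.items(), key=lambda kv: haversine_km(lat, lon, *kv[1])
    )
    dist = haversine_km(lat, lon, slat, slon)
    return (name, dist) if dist <= max_km else (None, None)


# Placeholder stations as name -> (latitude, longitude).
stations = {"station_a": (36.6, -97.5), "station_b": (45.9, -90.3)}
match = nearest_station(37.0, -97.0, stations, max_km=500.0)
no_match = nearest_station(0.0, 0.0, stations, max_km=500.0)
```

For a full dataset a vectorized or tree-based lookup would be faster; the loop-free form above is kept for readability.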

add clouds

bias_correction/05_integrate_cloud_data.py

Adds cloud information to the dataset
Includes cloud distance and cloud fraction data
Helps identify and filter out cloud-contaminated soundings

add model to dataset

bias_correction/06_integrate_flux_model_data.py

Adds model data (like GEOS-Chem) to the dataset
Matches model output with OCO-2/3 soundings in time and space
Provides additional context for bias correction

flag strong emitters

bias_correction/07_filter_strong_emission_sources.py

data cleaning

bias_correction/08_remove_outliers.py

Performs initial data cleaning and quality filtering
Applies various quality flags based on retrieval parameters
Removes problematic soundings (e.g., snow-covered areas, high aerosol loading)
Has different filtering criteria for land and ocean soundings
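
The idea of separate land and ocean criteria can be illustrated with a per-surface threshold table. All thresholds and field names below (max_aod, reject_snow, aod, snow_flag, is_land) are hypothetical values invented for this sketch; the real criteria are defined in 08_remove_outliers.py.

```python
# Hypothetical thresholds, for illustration only.
THRESHOLDS = {
    "land": {"max_aod": 0.5, "reject_snow": True},
    "ocean": {"max_aod": 0.3, "reject_snow": False},
}


def passes_filter(sounding):
    """Return True if the sounding meets the criteria for its surface type."""
    limits = THRESHOLDS["land" if sounding["is_land"] else "ocean"]
    if sounding["aod"] > limits["max_aod"]:
        return False  # aerosol loading too high
    if limits["reject_snow"] and sounding["snow_flag"]:
        return False  # snow-covered land scene
    return True


soundings = [
    {"is_land": True, "aod": 0.4, "snow_flag": False},   # kept
    {"is_land": True, "aod": 0.4, "snow_flag": True},    # removed: snow
    {"is_land": False, "aod": 0.4, "snow_flag": False},  # removed: ocean AOD limit
    {"is_land": False, "aod": 0.1, "snow_flag": False},  # kept
]
kept = [s for s in soundings if passes_filter(s)]
```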

allow for faster data loading

bias_correction/09_prepare_model_input_data.py

Creates preloaded data files for faster processing
Optimizes data loading for subsequent analysis
Reduces memory usage and processing time

(Optional) perform feature selection to optimize model inputs

bias_correction/10_feature_selection.py

Analyzes feature importance and selects an optimal set for model training
Helps improve model performance and reduce complexity

train model

bias_correction/11_train_bias_correction_model.py
bias_correction/12_train_bias_correction_model_spatially_weighted.py
bias_correction/13_train_bias_correction_model_kfold_validation.py
bias_correction/14_train_bias_correction_model_spatially_weighted_kfold_validation.py

Trains machine learning models for bias correction
Uses Random Forest and other ML algorithms
Corrects systematic biases in XCO2 retrievals
Handles both TCCON and Small Area biases

Quality Filtering

run_filter_pipeline.py

Optionally runs run_bias_correction_pipeline.py
Optionally runs optimize_filter.py
Trains and applies ML subfilter and uncertainty filter as described in Part 2, for a given config.

optimizing filter models (run this in the directory where you want to save the Optuna trials and plots)

optimize_filter.py

Optimizes quality filtering parameters
Uses Optuna for hyperparameter optimization
Balances data quality and throughput
Has separate optimization for land and ocean soundings
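
The trade-off between data quality and throughput can be pictured as a single objective that penalizes both the error of the retained soundings and the fraction discarded. The sketch below uses a plain grid search over candidate thresholds instead of Optuna, and the objective, the weight, and the field names predicted_error and bias are all assumptions made for this example rather than the objective used in optimize_filter.py.

```python
import math


def objective(threshold, soundings, throughput_weight):
    """RMSE of kept soundings plus a penalty for the fraction removed."""
    kept = [s for s in soundings if s["predicted_error"] <= threshold]
    if not kept:
        return math.inf  # a filter that removes everything is never acceptable
    rmse = math.sqrt(sum(s["bias"] ** 2 for s in kept) / len(kept))
    throughput = len(kept) / len(soundings)
    return rmse + throughput_weight * (1.0 - throughput)


def best_threshold(soundings, candidates, throughput_weight=1.0):
    """Return the candidate threshold with the lowest objective value."""
    return min(candidates, key=lambda t: objective(t, soundings, throughput_weight))


# Made-up soundings; in practice land and ocean would be optimized separately.
soundings = [
    {"predicted_error": 0.2, "bias": 0.1},
    {"predicted_error": 0.4, "bias": -0.2},
    {"predicted_error": 0.6, "bias": 0.3},
    {"predicted_error": 1.5, "bias": 2.0},
]
chosen = best_threshold(soundings, candidates=[0.3, 0.5, 1.0, 2.0])
```

With these values the search keeps the three low-error soundings and drops the outlier, since keeping it raises the RMSE more than the throughput gain is worth.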

Apply trained bias correction and filtering model

bias_correction/15_Export_Lite_Files.py

additional plots (optional)

visualization_scripts/vis_bias_corr.py

Creates visualization plots of bias correction results
Compares corrected data with TCCON measurements
Shows spatial patterns of biases
Analyzes performance at land-water crossings

viz_filter_notebook.ipynb

Creates the plots and figures for Part 2

The pipeline follows a logical flow:

Data preparation (bias_correction/01_create_initial_parquet.py, bias_correction/02_create_small_areas.py, bias_correction/03_flag_coastal_soundings.py)
Integration of external datasets (bias_correction/04_integrate_tccon_data.py, bias_correction/05_integrate_cloud_data.py, bias_correction/06_integrate_flux_model_data.py)
Removing outliers (bias_correction/07_filter_strong_emission_sources.py, bias_correction/08_remove_outliers.py)
Data Preloading for ML (bias_correction/09_prepare_model_input_data.py)
(Optional) Feature Selection (bias_correction/10_feature_selection.py)
Model Training (e.g., bias_correction/11_train_bias_correction_model.py and its variants)
Filter Optimization (optimize_filter.py)
Visualization (visualization_scripts/vis_bias_corr.py)

Citation

If you use this code or the resulting data, please cite the following preprints:

https://doi.org/10.22541/essoar.174164198.80749970/v1 and https://doi.org/10.22541/essoar.174164203.37422284/v1
