This project implements and evaluates a new bias correction and quality filtering approach for improving the accuracy of atmospheric CO2 (XCO2) measurements derived from the Orbiting Carbon Observatory-2 (OCO-2) satellite. This is not an official OCO-2 data product. For details on the approach, please refer to:
https://doi.org/10.22541/essoar.174164198.80749970/v1 and https://doi.org/10.22541/essoar.174164203.37422284/v1
The source code is licensed under the terms found in the LICENSE file.
This work uses several datasets, cited below:
https://doi.org/10.5067/8E4VLCK16O6Q
https://doi.org/10.14291/TCCON.GGG2020
CarbonTracker CT-NRT.v2022-1: Jacobson, A., Schuldt, K., Tans, P., Andrews, A., Miller, J., & Oda, T. (2023). CarbonTracker CT2022.
MACC v21r1 https://doi.org/10.1029/2010JD013887
LoFI m2ccv1bsim https://doi.org/10.5194/acp-21-9609-2021
UnivEd v5.2 https://doi.org/10.5194/acp-9-2619-2009
v6.0 http://data.europa.eu/89h/97a67d67-c62e-4826-b873-9d972c4f670b
https://doi.org/10.5281/zenodo.4008765
To set up the required Python environment, use the provided environment.yml file with Conda:
conda env create -f environment.yml
conda activate bias_filt

Data paths are managed via paths.py. For external data, you may need to set the OCO_DATA_BASE environment variable to point to your data directory. See paths.py for details.
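As an illustration, a paths.py lookup of this kind could resolve the data directory from the environment variable (the function name and the default path below are placeholders; see paths.py for the actual logic):

```python
import os
from pathlib import Path

def get_data_base(default="~/oco_data"):
    """Return the external data directory, preferring the
    OCO_DATA_BASE environment variable when it is set.
    (Function name and default path are illustrative only.)"""
    return Path(os.environ.get("OCO_DATA_BASE", default)).expanduser()
```

With this pattern, `export OCO_DATA_BASE=/data/oco2` before running the pipeline redirects all file access without editing any code.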
The main way to execute the processing pipeline is by using the run_bias_correction_pipeline.py script.
This script manages the execution of individual processing steps in the correct order and can resume from the last completed step.
To run the pipeline starting from the beginning or resuming from the last completed step:
python run_bias_correction_pipeline.py

To clean the pipeline status file (forcing the pipeline to start from scratch on the next run):

python run_bias_correction_pipeline.py --clean-status

The following scripts constitute the processing pipeline and are called by run_bias_correction_pipeline.py. If run manually, they should generally be executed in the order listed below.
bias_correction/01_create_initial_parquet.py
- Converts OCO-2/3 Lite files (netCDF format) to parquet files
- Removes unnecessary variables and cleans up naming conventions
- Optimizes for performance by removing redundant data
- Handles different data dimensions and formats them into a pandas DataFrame
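The naming cleanup in this step can be pictured as flattening hierarchical netCDF group paths into plain, lowercase column names; a minimal sketch (the exact renaming rules live in the script itself):

```python
def flatten_name(var_path: str) -> str:
    """Turn a netCDF group path such as 'Retrieval/xco2_raw' into a
    flat, lowercase column name like 'retrieval_xco2_raw'.
    (Illustrative only; the real mapping is defined in
    01_create_initial_parquet.py.)"""
    return var_path.strip("/").replace("/", "_").lower()
```

Flat names like these map directly onto pandas DataFrame columns, which is what makes the parquet conversion straightforward.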
bias_correction/02_create_small_areas.py
- Creates Small Area (SA) groupings of soundings
- Calculates SA biases in XCO2 retrievals
- Helps identify systematic biases in small geographic regions
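The small-area bias idea can be sketched as: group soundings into small areas, then take each sounding's deviation from its area's median XCO2, on the assumption that true XCO2 is nearly constant within an SA (the grouping key and median choice here are illustrative):

```python
from collections import defaultdict
from statistics import median

def small_area_biases(soundings):
    """soundings: iterable of (sa_id, xco2) pairs.
    Returns {sa_id: [per-sounding deviations from the SA median]}.
    Within a small area the true XCO2 is assumed nearly constant,
    so deviations from the SA median approximate retrieval bias."""
    groups = defaultdict(list)
    for sa_id, xco2 in soundings:
        groups[sa_id].append(xco2)
    return {sa: [x - median(vals) for x in vals]
            for sa, vals in groups.items()}
```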
bias_correction/03_flag_coastal_soundings.py
- Flags small areas that cross from land to ocean
- Identifies coastal regions where land-water transitions occur
- Helps handle special cases in bias correction near coastlines
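A small area can be flagged as coastal whenever it contains both land and ocean soundings; a minimal sketch (the surface-type labels are an assumed encoding, not the script's actual field values):

```python
def is_coastal(surface_types):
    """surface_types: iterable of per-sounding surface labels,
    e.g. 'land' or 'ocean' (labels assumed for illustration).
    A small area mixing both is flagged as a land-water crossing."""
    kinds = set(surface_types)
    return "land" in kinds and "ocean" in kinds
```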
bias_correction/04_integrate_tccon_data.py
- Adds TCCON (Total Carbon Column Observing Network) data to the dataset
- Matches OCO-2/3 soundings with nearby TCCON stations
- Calculates distances to TCCON stations
- Adds TCCON XCO2 values and station names to the dataset
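Matching a sounding to its nearest TCCON station comes down to a great-circle distance calculation; a sketch using the haversine formula (the station coordinates and the 500 km radius below are placeholders, not the script's actual values):

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def nearest_station(lat, lon, stations, max_km=500.0):
    """stations: {name: (lat, lon)}. Returns (name, distance_km) of the
    closest station within max_km, else (None, None)."""
    best = min(((haversine_km(lat, lon, slat, slon), name)
                for name, (slat, slon) in stations.items()), default=None)
    if best is None or best[0] > max_km:
        return None, None
    return best[1], best[0]
```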
bias_correction/05_integrate_cloud_data.py
- Adds cloud information to the dataset
- Includes cloud distance and cloud fraction data
- Helps identify and filter out cloud-contaminated soundings
bias_correction/06_integrate_flux_model_data.py
- Adds model data (like GEOS-Chem) to the dataset
- Matches model output with OCO-2/3 soundings in time and space
- Provides additional context for bias correction
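The time-matching part of this step can be done by bisecting a sorted list of model timestamps; a sketch (regular 3-hourly model output is an assumption here, and spatial matching is omitted):

```python
import bisect

def nearest_time_index(model_times, t):
    """model_times: sorted list of timestamps (e.g. epoch seconds).
    Returns the index of the model step closest in time to t."""
    i = bisect.bisect_left(model_times, t)
    if i == 0:
        return 0
    if i == len(model_times):
        return len(model_times) - 1
    # pick whichever neighbour is closer in time
    return i if model_times[i] - t < t - model_times[i - 1] else i - 1
```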
bias_correction/07_filter_strong_emission_sources.py
bias_correction/08_remove_outliers.py
- Performs initial data cleaning and quality filtering
- Applies various quality flags based on retrieval parameters
- Removes problematic soundings (e.g., snow-covered areas, high aerosol loading)
- Has different filtering criteria for land and ocean soundings
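The land/ocean split can be sketched as two separate threshold sets applied per sounding; the threshold names and values below are invented placeholders, not the script's actual criteria:

```python
# Placeholder thresholds; the real criteria live in 08_remove_outliers.py.
THRESHOLDS = {
    "land":  {"max_aod": 0.25, "max_snow_fraction": 0.05},
    "ocean": {"max_aod": 0.15, "max_snow_fraction": 1.00},
}

def passes_filter(sounding):
    """sounding: dict with 'surface', 'aod', 'snow_fraction' keys
    (field names assumed for illustration)."""
    t = THRESHOLDS[sounding["surface"]]
    return (sounding["aod"] <= t["max_aod"]
            and sounding["snow_fraction"] <= t["max_snow_fraction"])
```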
bias_correction/09_prepare_model_input_data.py
- Creates preloaded data files for faster processing
- Optimizes data loading for subsequent analysis
- Reduces memory usage and processing time
bias_correction/10_feature_selection.py
- Analyzes feature importance and selects an optimal set for model training
- Helps improve model performance and reduce complexity
bias_correction/11_train_bias_correction_model.py
bias_correction/12_train_bias_correction_model_spatially_weighted.py
bias_correction/13_train_bias_correction_model_kfold_validation.py
bias_correction/14_train_bias_correction_model_spatially_weighted_kfold_validation.py
- Trains machine learning models for bias correction
- Uses Random Forest and other ML algorithms
- Corrects systematic biases in XCO2 retrievals
- Handles both TCCON and Small Area biases
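The k-fold variants can be pictured as cross-validation with spatially grouped folds, so that nearby soundings never straddle a train/test split; a sketch with scikit-learn (the latitude-band grouping and hyperparameters are placeholders, not the scripts' actual setup):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold

def train_kfold(X, y, lat, n_splits=3):
    """Train a Random Forest bias model with spatially grouped k-fold CV.
    Soundings are grouped by integer latitude band (a stand-in for the
    real spatial grouping) so that each band stays entirely in either
    the train or the test fold."""
    groups = np.floor(lat).astype(int)
    models, scores = [], []
    for train_idx, test_idx in GroupKFold(n_splits=n_splits).split(X, y, groups):
        rf = RandomForestRegressor(n_estimators=50, random_state=0)
        rf.fit(X[train_idx], y[train_idx])
        scores.append(rf.score(X[test_idx], y[test_idx]))
        models.append(rf)
    return models, scores
```

Grouping folds spatially gives a more honest skill estimate than a random split, since XCO2 biases are strongly spatially correlated.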
run_filter_pipeline.py
- Optionally runs run_bias_correction_pipeline.py
- Optionally runs optimize_filter.py
- Trains and applies the ML subfilter and the uncertainty filter described in Part 2, for a given config
Filter model optimization (run from the directory where the Optuna trials and plots should be saved):
optimize_filter.py
- Optimizes quality filtering parameters
- Uses Optuna for hyperparameter optimization
- Balances data quality and throughput
- Has separate optimization for land and ocean soundings
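The quality/throughput trade-off can be expressed as a single scalar to minimise; a stdlib sketch of the kind of objective function that could be handed to Optuna's study.optimize (the one-parameter filter, weight alpha, and penalty form are all placeholders, not what optimize_filter.py actually does):

```python
def filter_objective(threshold, errors, alpha=1.0):
    """Toy objective for a one-parameter quality filter: keep soundings
    with |error| <= threshold, then penalise both the RMS error of what
    is kept and the fraction of data thrown away.
    In the real pipeline, Optuna would suggest the parameters
    (e.g. threshold = trial.suggest_float(...)) and minimise this value."""
    kept = [e for e in errors if abs(e) <= threshold]
    if not kept:
        return float("inf")  # a filter that discards everything is useless
    rms = (sum(e * e for e in kept) / len(kept)) ** 0.5
    discard_frac = 1 - len(kept) / len(errors)
    return rms + alpha * discard_frac
```

The alpha weight is what balances data quality against throughput: larger alpha favours keeping more soundings at the cost of higher residual error.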
bias_correction/15_Export_Lite_Files.py
visualization_scripts/vis_bias_corr.py
- Creates visualization plots of bias correction results
- Compares corrected data with TCCON measurements
- Shows spatial patterns of biases
- Analyzes performance at land-water crossings
viz_filter_notebook.ipynb
- Plots and figures for Part 2
The pipeline follows a logical flow:
- Data preparation (bias_correction/01_create_initial_parquet.py, bias_correction/02_create_small_areas.py, bias_correction/03_flag_coastal_soundings.py)
- Integration of external datasets (bias_correction/04_integrate_tccon_data.py, bias_correction/05_integrate_cloud_data.py, bias_correction/06_integrate_flux_model_data.py)
- Removing outliers (bias_correction/07_filter_strong_emission_sources.py, bias_correction/08_remove_outliers.py)
- Data Preloading for ML (bias_correction/09_prepare_model_input_data.py)
- (Optional) Feature Selection (bias_correction/10_feature_selection.py)
- Model Training (e.g., bias_correction/11_train_bias_correction_model.py and its variants)
- Filter Optimization (optimize_filter.py)
- Visualization (visualization_scripts/vis_bias_corr.py)
If you use this code or the resulting data, please cite the following preprints:
https://doi.org/10.22541/essoar.174164198.80749970/v1 and https://doi.org/10.22541/essoar.174164203.37422284/v1