A machine learning library for the estimation of greenhouse gas baseline timeseries from high-frequency observations.
To run the code, the required dataset must first be created using baseline_setup.py. This collects the relevant meteorology, concentration data and baseline flags. The meteorological data were taken from the ECMWF ERA5 reanalysis, and the concentration data from AGAGE.
This step can be run for all sites and compounds by running setup_all.py.
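As a rough illustration of what such a dataset might look like, the sketch below joins meteorology, concentration and baseline-flag records on their timestamps using only the Python standard library. The function name, the field names (`u10`, `v10`, `mole_fraction`, `baseline`) and the sample values are made up for illustration and are not taken from baseline_setup.py.

```python
from datetime import datetime


def join_on_time(met, conc, flags):
    """Keep only timestamps present in all three inputs (an inner join)."""
    common = sorted(set(met) & set(conc) & set(flags))
    return [
        {"time": t, **met[t], "mole_fraction": conc[t], "baseline": flags[t]}
        for t in common
    ]


# Hypothetical sample records keyed by observation time.
t0 = datetime(2015, 1, 1, 0)
t1 = datetime(2015, 1, 1, 1)
met = {t0: {"u10": 3.2, "v10": -1.1}, t1: {"u10": 2.9, "v10": -0.7}}
conc = {t0: 1.85, t1: 1.91}
flags = {t0: 1}  # assumed encoding: 1 = baseline, 0 = non-baseline; t1 has no flag

rows = join_on_time(met, conc, flags)
```

Only the first timestamp survives the join here, because the second has no baseline flag. The actual script may handle missing values differently.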
The models are trained using the dataset described above. One model is trained per site and per algorithm (a neural network MLP and a random forest). The final models are saved and can be found in the final models folder. Summary statistics are also available.
This step can be run for all models by running train_all.py.
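The "one model per site per algorithm" layout can be pictured as a simple grid. The sketch below enumerates that grid; the site codes, the algorithm labels and the naming scheme are assumptions chosen for illustration, not the names used by train_all.py.

```python
from itertools import product

# Hypothetical inputs: two example site codes and two algorithm labels.
SITES = ["MHD", "CGO"]
ALGORITHMS = ["nn", "rf"]  # neural network MLP and random forest


def model_grid(sites, algorithms):
    """Return one illustrative model name per (site, algorithm) pair."""
    return [f"{algo}_{site}" for site, algo in product(sites, algorithms)]


names = model_grid(SITES, ALGORITHMS)
```

With two sites and two algorithms this yields four model names, one for each combination.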
The models are tested through quantitative and qualitative evaluation. A chosen subsample of trace species is evaluated, as defined in the configuration file. The model outcomes for this subsample are saved for each site (e.g. neural network results at Mace Head, Ireland).
This step can be run for all sites and compounds by running eval_all.py.
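To give a flavour of what a quantitative evaluation could involve, the sketch below computes precision, recall and F1 for predicted baseline flags against reference flags. It assumes the flags are binary (1 = baseline, 0 = non-baseline). The choice of metrics, the function name and the sample values are illustrative assumptions rather than a description of eval_all.py.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary flags (1 = baseline)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical reference and predicted flags for six observations.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Precision penalises non-baseline points wrongly labelled as baseline, while recall penalises baseline points that were missed. Which of the two matters more depends on how the baseline timeseries is used downstream.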