cosmo-learn is a Python package for learning cosmology using a combination of statistical inference and machine learning methods applied to real and simulated cosmological observables. It supports mock data generation, model training, reconstruction, and comparison across five observational probes — all through a single unified class, CosmoLearn.
See the tutorial notebook `cosmo_tutorial.ipynb` or the script `minimal_example.py` for basic usage, and arXiv:2508.20971 [astro-ph.CO] for full details.
Please cite the paper below when using cosmo-learn.
Tested on Linux, macOS, and Windows (WSL).
Requires: Python 3.10
Recommended installation (conda): `conda env create -f cosmo_learn.yml`
Quick install: `pip install cosmo-learn`
cosmo-learn provides tools to:
- Generate mock cosmological datasets based on a flat $w$CDM input cosmology, with realistic noise drawn from real observational uncertainties.
- Train machine learning models (Gaussian Processes, Bayesian Ridge Regression, Artificial Neural Networks) to reconstruct cosmological observables as functions of redshift — without assuming a cosmological model.
- Run statistical inference (MCMC, Genetic Algorithm + Fisher matrix) to constrain flat $w$CDM parameters.
- Visualize mock data, reconstructions, residuals, posterior samples, and training diagnostics.
- Evaluate and compare methods using a suite of quantitative metrics.
| Key | Observable | Data source |
|---|---|---|
| `'CosmicChronometers'` | $H(z)$ | Hdz_2020_CConly |
| `'SuperNovae'` | distance modulus $\mu(z)$ | Pantheon+SH0ES |
| `'BaryonAcousticOscillations'` | BAO distance ratios | DESI Year 1 (arXiv:2404.03002) |
| `'RedshiftSpaceDistorsions'` | $f\sigma_8(z)$ | Growth_tableII |
| `'BrightSirens'` | GW luminosity distance $d_L(z)$ | LISA bright siren simulations |
| Method | Description |
|---|---|
| `MCMC` | Markov Chain Monte Carlo via `emcee` |
| `GAFisher` | Genetic Algorithm best fit + Fisher matrix covariance |
| `GP` | Gaussian Process Regression (scikit-learn) |
| `BRR` | Bayesian Ridge Regression with polynomial features |
| `ANN` | Artificial Neural Network via `refann` (PyTorch backend) |
Recommended (conda environment):

```bash
conda env create -f cosmo_learn.yml
conda activate cosmo-learn
```

Quick install via pip:

```bash
pip install cosmo-learn
```

Test the installation:

```bash
python minimal_example.py
```

```python
from cosmo_learn.cosmo_learn import CosmoLearn

# 1. Define input cosmology: [H0, Om0, w0, s8]
#    (DESI Year 1 flat wCDM best-fit + Planck s8)
H0, Om0, w0, s8 = 67.74, 0.3095, -0.997, 0.834
cl = CosmoLearn([H0, Om0, w0, s8], seed=14000605)

# 2. Generate mock data for all probes
mock_keys = ['CosmicChronometers', 'SuperNovae', 'BaryonAcousticOscillations',
             'BrightSirens', 'RedshiftSpaceDistorsions']
cl.make_mock(mock_keys=mock_keys)

# 3. Train ML models
cl.train_gp()
cl.train_brr()
cl.init_ann()
cl.train_ann()

# 4. Run MCMC
prior_dict = {'H0_min': 0, 'H0_max': 100, 'Om0_min': 0, 'Om0_max': 1,
              'w0_min': -10, 'w0_max': 10, 's8_min': 0.2, 's8_max': 1.5}
rd_fid_prior = {'mu': 147.46, 'sigma': 0.28}
llprob = lambda x: cl.llprob_wcdm(x, prior_dict=prior_dict, rd_fid_prior=rd_fid_prior)
cl.get_mcmc_samples(nwalkers=15, dres=[0.05, 0.005, 0.01, 0.01, 0.005],
                    llprob=llprob, p0=[70, 0.3, -1, 0.8, 147], nburn=100, nmcmc=2000)

# 5. Visualize
import matplotlib.pyplot as plt
fig, ax = cl.show_mocks(show_input=True)
cl.show_trained_ml(ax=ax, method='GP', label='GP')
cl.show_trained_ml(ax=ax, method='BRR', color='blue', alpha=0.15, hatch='|', label='BRR')
fig.tight_layout()
plt.show()
```

The `CosmoLearn` class is constructed as follows:

```python
from cosmo_learn.cosmo_learn import CosmoLearn

cl = CosmoLearn(params, de_model='no pert', rd_fid=147.46, Tcmb0=2.725, seed=None)
```

| Argument | Type | Description |
|---|---|---|
| `params` | list | Input cosmology `[H0, Om0, w0, s8]` |
| `de_model` | str | Dark energy perturbation model: `'no pert'` (default), `'static'`, or `'dynamic'` |
| `rd_fid` | float | Fiducial sound horizon (default: 147.46) |
| `Tcmb0` | float | CMB temperature in K (default: 2.725) |
| `seed` | int | Random seed for reproducibility |
All mock data generation methods draw Gaussian noise around the true cosmological curve evaluated at the real survey redshifts, using the real observational uncertainties.
Data is automatically split into training (90%) and test (10%) sets, accessible via `cl.mock_data[key]['train']` and `cl.mock_data[key]['test']`, each with sub-keys `'x'`, `'y'`, `'yerr'`.
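The noise-plus-split recipe above can be sketched in plain NumPy. This is illustrative only: the "survey" redshifts, uncertainties, and fiducial curve below are invented, and the code does not call the package itself.

```python
import numpy as np

rng = np.random.default_rng(14000605)

# Hypothetical "survey": redshifts and per-point observational uncertainties.
z = np.sort(rng.uniform(0.1, 2.0, 30))
yerr = rng.uniform(5.0, 15.0, 30)

# Fiducial curve (illustrative H(z) for a flat model with w0 = -1).
H0, Om0 = 67.74, 0.3095
H_true = H0 * np.sqrt(Om0 * (1 + z) ** 3 + (1 - Om0))

# Mock data: truth plus Gaussian scatter at the reported error level.
y = H_true + rng.normal(0.0, yerr)

# 90/10 train/test split, mirroring the mock_data[key]['train'] / ['test'] layout.
idx = rng.permutation(len(z))
n_train = int(0.9 * len(z))
train = {'x': z[idx[:n_train]], 'y': y[idx[:n_train]], 'yerr': yerr[idx[:n_train]]}
test = {'x': z[idx[n_train:]], 'y': y[idx[n_train:]], 'yerr': yerr[idx[n_train:]]}
```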
Generate mock data for multiple probes at once.
```python
cl.make_mock(mock_keys=['CosmicChronometers', 'SuperNovae', 'BaryonAcousticOscillations',
                        'BrightSirens', 'RedshiftSpaceDistorsions'])
```

| Argument | Description |
|---|---|
| `mock_keys` | List of probe keys (see the table in the Overview) |
| `pop_model` | LISA population model for bright sirens: `'Pop III'`, `'Delay'`, or `'No Delay'` |
| `years` | Duration of LISA observations in years (for bright sirens) |
Individual generation methods are also available: `make_cosmic_chronometers_like()`, `make_pantheon_plus_like()`, `make_desi1_like()`, `make_rsd_like()`, `make_bright_sirens_mock(years, pop_model)`.
```python
cl.train_gp(kernel_key='RBF', n_restarts_optimizer=10)
```

Trains one GP per probe. Available kernels (`kernel_key`): `'RBF'`, `'Matern'`, `'RationalQuadratic'`, `'ExpSineSquared'`, `'DotProduct'`. The default `'RBF'` uses a `ConstantKernel * RBF + WhiteKernel` combination.
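For reference, the default kernel combination can be reproduced directly in scikit-learn. This standalone sketch uses invented H(z)-like toy data rather than the package's mocks:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

# Default kernel choice described above: ConstantKernel * RBF + WhiteKernel.
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10, normalize_y=True)

# Invented H(z)-like toy data.
rng = np.random.default_rng(0)
z = np.linspace(0.1, 2.0, 25)[:, None]
y = 67.0 * np.sqrt(0.3 * (1 + z.ravel()) ** 3 + 0.7) + rng.normal(0.0, 5.0, 25)

gp.fit(z, y)
mean, std = gp.predict(z, return_std=True)  # reconstruction with a 1-sigma band
```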
```python
cl.train_brr(n_order=3)
```

Fits a polynomial of degree `n_order` with Bayesian regularization per probe.
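The underlying idea, polynomial features followed by Bayesian ridge regression (both available in scikit-learn), can be sketched standalone with invented toy data:

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.preprocessing import PolynomialFeatures

# Degree-3 polynomial features, then Bayesian ridge regression.
poly = PolynomialFeatures(degree=3)
brr = BayesianRidge()

# Invented H(z)-like toy data.
rng = np.random.default_rng(0)
z = np.linspace(0.1, 2.0, 25)[:, None]
y = 67.0 * np.sqrt(0.3 * (1 + z.ravel()) ** 3 + 0.7) + rng.normal(0.0, 5.0, 25)

Z = poly.fit_transform(z)
brr.fit(Z, y)
pred, std = brr.predict(Z, return_std=True)  # predictive mean and std
```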
```python
cl.init_ann(mid_node=4096, hidden_layer=1, hp_model='rec_1',
            loss_func='L1', iteration=30000)
cl.train_ann()
```

Uses the `refann` (PyTorch) backend. `init_ann` configures the architecture; `train_ann` runs training and prints the elapsed time per probe.
```python
llprob = lambda x: cl.llprob_wcdm(x, prior_dict=prior_dict, rd_fid_prior=rd_fid_prior)
cl.get_mcmc_samples(nwalkers, dres, llprob, p0, nburn=100, nmcmc=500)
```

Runs the `emcee` ensemble sampler. The log-posterior `llprob_wcdm` uses flat priors on [H0, Om0, w0, s8] and a Gaussian prior on the sound horizon. Samples are stored in `cl.mcmc_samples`.
| `prior_dict` key | Description |
|---|---|
| `H0_min`/`H0_max` | Flat prior bounds on $H_0$ |
| `Om0_min`/`Om0_max` | Flat prior bounds on $\Omega_{m0}$ |
| `w0_min`/`w0_max` | Flat prior bounds on $w_0$ |
| `s8_min`/`s8_max` | Flat prior bounds on $\sigma_8$ |
`rd_fid_prior`: `{'mu': 147.46, 'sigma': 0.28}` — Gaussian prior on the sound horizon.
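Put together, the prior structure amounts to something like the following sketch. This is illustrative only: `log_prior` is a hypothetical helper, not part of the package.

```python
import numpy as np

prior_dict = {'H0_min': 0, 'H0_max': 100, 'Om0_min': 0, 'Om0_max': 1,
              'w0_min': -10, 'w0_max': 10, 's8_min': 0.2, 's8_max': 1.5}
rd_fid_prior = {'mu': 147.46, 'sigma': 0.28}

def log_prior(theta):
    # Flat box priors on the wCDM parameters, Gaussian prior on rd.
    H0, Om0, w0, s8, rd = theta
    for name, value in [('H0', H0), ('Om0', Om0), ('w0', w0), ('s8', s8)]:
        if not (prior_dict[name + '_min'] <= value <= prior_dict[name + '_max']):
            return -np.inf  # outside the flat prior box
    # Gaussian prior on the sound horizon (constant terms dropped).
    return -0.5 * ((rd - rd_fid_prior['mu']) / rd_fid_prior['sigma']) ** 2
```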
```python
fitness_func = lambda x: -2 * llprob(x)
cl.get_gaFisher_samples(fitness_func, prior_ga, llprob=llprob, nsamples=10000)
```

Finds the best fit with a genetic algorithm, then approximates the posterior as a multivariate Gaussian using the Fisher (Hessian) information matrix. Samples are stored in `cl.gaFisher_samples`.
```python
fig, ax = cl.show_mocks(show_input=True)
fig, ax = cl.show_mocks_and_residuals(show_input=True)

cl.show_trained_ml(ax=ax, method='GP', label='GP')
cl.show_trained_ml(ax=ax, method='BRR', color='blue', alpha=0.15, hatch='|', label='BRR')
cl.show_trained_ml(ax=ax, method='ANN', color='darkgreen', alpha=0.15, hatch='x', label='ANN')

cl.show_bestfit_curve(ax=ax, method='MCMC', label='MCMC', color='pink')
cl.show_bestfit_curve(ax=ax, method='GAFisher', color='orange', alpha=0.15, label='GA-Fisher')

fig_corner = cl.show_param_posterior(method='MCMC')
cl.show_param_posterior(method='GAFisher', fig=fig_corner, color='blue', show_truth=True)

fig, ax = cl.show_ann_loss()
```

`metrics.py` provides functions to quantitatively compare reconstructed observables against test data:
| Function | Description |
|---|---|
| `D0(Qi, σi, Qj, σj)` | Normalized absolute deviation (target: ≈ 0.5) |
| `D1(Qi, σi, Qj, σj)` | Tension beyond quadrature (target: high) |
| `D2(Qi, σi, Qj, σj)` | Combined absolute + excess tension (target: ≈ 1) |
| `DWstat(residuals)` | Durbin-Watson statistic (target: ≈ 2, no serial correlation) |
| `Ch2_H0(H0, errH0, refH0)` | $\chi^2$ deviation of the inferred $H_0$ from a reference value |
| `get_metrics(bgData, ptData, ...)` | Combined D0/D1/D2/DW for CC + RSD jointly |
`rec_olympics.py` — the `OlympicsMaster` class scores methods head-to-head across multiple metrics by awarding points to the best-performing method on each metric.
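The scoring idea can be sketched with hypothetical metric values (the numbers below are invented; `OlympicsMaster` computes them from actual reconstructions):

```python
# Hypothetical per-method metric values (invented for illustration).
metric_values = {
    'D0': {'GP': 0.48, 'BRR': 0.55, 'ANN': 0.51},  # target ~ 0.5
    'DW': {'GP': 1.70, 'BRR': 2.10, 'ANN': 2.40},  # target ~ 2
}
targets = {'D0': 0.5, 'DW': 2.0}

# Head-to-head scoring: one point per metric to the method closest to the target.
scores = {'GP': 0, 'BRR': 0, 'ANN': 0}
for metric, values in metric_values.items():
    best = min(values, key=lambda m: abs(values[m] - targets[metric]))
    scores[best] += 1
```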
| Module | Contents |
|---|---|
| `cosmo_learn/cosmo_learn.py` | `CosmoLearn` class, mock data, likelihoods, training, visualization |
| `cosmo_learn/metrics.py` | `D0`, `D1`, `D2`, `DWstat`, `Ch2_H0`, `get_metrics` |
| `cosmo_learn/rec_olympics.py` | `OlympicsMaster` scoring class |
| `cosmo_learn/LISA_bright.py` | LISA bright siren mock data generator (`generate`) |
Contributions are welcome, for example:
- New data sets
- New methods
- New models
@article{Bernardo:2025pua,
author = "Bernardo, Reginald Christian and Grand{\'o}n, Daniela and Levi Said, Jackson and C{\'a}rdenas, V{\'\i}ctor H. and Belinario, Gene Carlo and Reyes, Reinabelle",
title = "{Cosmo-Learn: code for learning cosmology using different methods and mock data}",
eprint = "2508.20971",
archivePrefix = "arXiv",
primaryClass = "astro-ph.CO",
month = "8",
year = "2025"
}