reggiebernardo/cosmo_learn

cosmo_learn

cosmo-learn is a Python package for learning cosmology using a combination of statistical inference and machine learning methods applied to real and simulated cosmological observables. It supports mock data generation, model training, reconstruction, and comparison across five observational probes — all through a single unified class, CosmoLearn.

See the tutorial notebook cosmo_tutorial.ipynb or the script minimal_example.py for basic usage, and arXiv:2508.20971 [astro-ph.CO] for full details.

Please cite the paper below when using cosmo-learn.

  • Tested on Linux, macOS, and Windows (WSL)
  • Requires: Python 3.10
  • Recommended installation (conda): conda env create -f cosmo_learn.yml
  • Quick install: pip install cosmo-learn


Overview

cosmo-learn provides tools to:

  • Generate mock cosmological datasets based on a flat $w$CDM input cosmology, with realistic noise drawn from real observational uncertainties.
  • Train machine learning models (Gaussian Processes, Bayesian Ridge Regression, Artificial Neural Networks) to reconstruct cosmological observables as functions of redshift — without assuming a cosmological model.
  • Run statistical inference (MCMC, Genetic Algorithm + Fisher matrix) to constrain flat $w$CDM parameters.
  • Visualize mock data, reconstructions, residuals, posterior samples, and training diagnostics.
  • Evaluate and compare methods using a suite of quantitative metrics.

Supported observational probes

| Key | Observable | Data source |
|---|---|---|
| 'CosmicChronometers' | $H(z)$ | Hdz_2020_CConly |
| 'SuperNovae' | $\mu(z)$ (distance modulus) | Pantheon+SH0ES |
| 'BaryonAcousticOscillations' | $D_V/r_d(z)$ | DESI Year 1 (arXiv:2404.03002) |
| 'RedshiftSpaceDistorsions' | $f\sigma_8(z)$ | Growth_tableII |
| 'BrightSirens' | $d_L(z)$ | LISA bright siren simulations |

Supported methods

| Method | Description |
|---|---|
| MCMC | Markov Chain Monte Carlo via emcee |
| GAFisher | Genetic Algorithm best-fit + Fisher matrix covariance |
| GP | Gaussian Process Regression (scikit-learn) |
| BRR | Bayesian Ridge Regression with polynomial features |
| ANN | Artificial Neural Network via refann (PyTorch backend) |

Installation

Recommended (conda environment):

conda env create -f cosmo_learn.yml
conda activate cosmo-learn

Quick install via pip:

pip install cosmo-learn

Test the installation:

python minimal_example.py

Quick Start

from cosmo_learn.cosmo_learn import CosmoLearn

# 1. Define input cosmology: [H0, Om0, w0, s8]
# (DESI Year 1 flat wCDM best-fit + Planck s8)
H0, Om0, w0, s8 = 67.74, 0.3095, -0.997, 0.834
cl = CosmoLearn([H0, Om0, w0, s8], seed=14000605)

# 2. Generate mock data for all probes
mock_keys = ['CosmicChronometers', 'SuperNovae', 'BaryonAcousticOscillations',
             'BrightSirens', 'RedshiftSpaceDistorsions']
cl.make_mock(mock_keys=mock_keys)

# 3. Train ML models
cl.train_gp()
cl.train_brr()
cl.init_ann()
cl.train_ann()

# 4. Run MCMC
prior_dict = {'H0_min': 0, 'H0_max': 100, 'Om0_min': 0, 'Om0_max': 1,
              'w0_min': -10, 'w0_max': 10, 's8_min': 0.2, 's8_max': 1.5}
rd_fid_prior = {'mu': 147.46, 'sigma': 0.28}
llprob = lambda x: cl.llprob_wcdm(x, prior_dict=prior_dict, rd_fid_prior=rd_fid_prior)
cl.get_mcmc_samples(nwalkers=15, dres=[0.05, 0.005, 0.01, 0.01, 0.005],
                    llprob=llprob, p0=[70, 0.3, -1, 0.8, 147], nburn=100, nmcmc=2000)

# 5. Visualize
import matplotlib.pyplot as plt

fig, ax = cl.show_mocks(show_input=True)
cl.show_trained_ml(ax=ax, method='GP', label='GP')
cl.show_trained_ml(ax=ax, method='BRR', color='blue', alpha=0.15, hatch='|', label='BRR')
fig.tight_layout()
plt.show()

Core Class: CosmoLearn

from cosmo_learn.cosmo_learn import CosmoLearn

Initialization

cl = CosmoLearn(params, de_model='no pert', rd_fid=147.46, Tcmb0=2.725, seed=None)

| Argument | Type | Description |
|---|---|---|
| params | list | Input cosmology [H0, Om0, w0, s8] |
| de_model | str | Dark energy perturbation model: 'no pert' (default), 'static', or 'dynamic' |
| rd_fid | float | Fiducial sound horizon $r_d$ in Mpc (default: 147.46) |
| Tcmb0 | float | CMB temperature in K (default: 2.725) |
| seed | int | Random seed for reproducibility |

Mock Data Generation

All mock data generation methods draw Gaussian noise around the true cosmological curve evaluated at the real survey redshifts, using the real observational uncertainties.

Data is automatically split into training (90%) and test (10%) sets, accessible via cl.mock_data[key]['train'] and cl.mock_data[key]['test'], each with sub-keys 'x', 'y', 'yerr'.
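The split convention can be sketched in plain NumPy. This is an illustrative stand-in for what the package stores in cl.mock_data, using a toy $H(z)$ curve and invented uncertainties, not cosmo-learn's own routine:

```python
import numpy as np

# Illustrative sketch of the 90/10 train/test convention described above;
# the toy arrays and split are stand-ins, not cosmo-learn's internal code.
rng = np.random.default_rng(14000605)

z = np.sort(rng.uniform(0.0, 2.0, size=30))      # mock redshifts
y = 70.0 * np.sqrt(0.3 * (1 + z) ** 3 + 0.7)     # toy H(z) curve
yerr = rng.uniform(5.0, 10.0, size=z.size)       # toy uncertainties

idx = rng.permutation(z.size)
n_train = int(0.9 * z.size)                      # 90% train, 10% test
train_idx, test_idx = idx[:n_train], idx[n_train:]

mock_data = {
    'train': {'x': z[train_idx], 'y': y[train_idx], 'yerr': yerr[train_idx]},
    'test':  {'x': z[test_idx],  'y': y[test_idx],  'yerr': yerr[test_idx]},
}
print(len(mock_data['train']['x']), len(mock_data['test']['x']))  # 27 3
```

After cl.make_mock(...), the real dictionaries follow the same 'x' / 'y' / 'yerr' layout per probe key.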

make_mock(mock_keys, pop_model='Pop III', years=5)

Generate mock data for multiple probes at once.

cl.make_mock(mock_keys=['CosmicChronometers', 'SuperNovae', 'BaryonAcousticOscillations',
                        'BrightSirens', 'RedshiftSpaceDistorsions'])

| Argument | Description |
|---|---|
| mock_keys | List of probe keys (see table in Overview) |
| pop_model | LISA population model for bright sirens: 'Pop III', 'Delay', or 'No Delay' |
| years | Duration of LISA observations in years (for bright sirens) |

Individual generation methods are also available: make_cosmic_chronometers_like(), make_pantheon_plus_like(), make_desi1_like(), make_rsd_like(), make_bright_sirens_mock(years, pop_model).


Learning Methods

Gaussian Process (GP)

cl.train_gp(kernel_key='RBF', n_restarts_optimizer=10)

Trains one GP per probe. Available kernels (kernel_key): 'RBF', 'Matern', 'RationalQuadratic', 'ExpSineSquared', 'DotProduct'. The default 'RBF' uses a ConstantKernel * RBF + WhiteKernel combination.
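The default kernel combination quoted above can be reproduced with scikit-learn directly. This is a standalone sketch on toy data, not a call into cosmo-learn:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF, WhiteKernel

# Standalone sketch of the default ConstantKernel * RBF + WhiteKernel
# combination; the toy data stand in for a cosmo-learn mock set.
rng = np.random.default_rng(0)
z = np.linspace(0.05, 2.0, 25)[:, None]
y = 70.0 * np.sqrt(0.3 * (1 + z.ravel()) ** 3 + 0.7) + rng.normal(0, 3.0, 25)

kernel = ConstantKernel(1.0) * RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10,
                              normalize_y=True)
gp.fit(z, y)

# Model-independent reconstruction with pointwise uncertainty
z_new = np.linspace(0.0, 2.0, 50)[:, None]
mean, std = gp.predict(z_new, return_std=True)
```

The WhiteKernel term absorbs observational noise so the RBF part captures the smooth underlying trend.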

Bayesian Ridge Regression (BRR)

cl.train_brr(n_order=3)

Fits a polynomial of degree n_order with Bayesian regularization per probe.
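The same idea can be sketched as a scikit-learn pipeline of polynomial features feeding a Bayesian ridge regressor. This mirrors the description of train_brr(n_order=3) but is not the package's internal implementation:

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Illustrative polynomial-basis Bayesian ridge fit on toy H(z)-like data.
rng = np.random.default_rng(1)
z = np.linspace(0.05, 2.0, 25)[:, None]
y = 70.0 * np.sqrt(0.3 * (1 + z.ravel()) ** 3 + 0.7) + rng.normal(0, 3.0, 25)

n_order = 3
brr = make_pipeline(PolynomialFeatures(degree=n_order), BayesianRidge())
brr.fit(z, y)

# BayesianRidge also returns a predictive standard deviation
mean, std = brr.predict(np.linspace(0.0, 2.0, 50)[:, None], return_std=True)
```

Bayesian regularization shrinks the polynomial coefficients automatically, avoiding the overfitting a plain degree-3 least-squares fit could exhibit on noisy data.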

Artificial Neural Network (ANN)

cl.init_ann(mid_node=4096, hidden_layer=1, hp_model='rec_1',
            loss_func='L1', iteration=30000)
cl.train_ann()

Uses the refann (PyTorch) backend. init_ann configures the architecture; train_ann runs training and prints elapsed time per probe.

MCMC

llprob = lambda x: cl.llprob_wcdm(x, prior_dict=prior_dict, rd_fid_prior=rd_fid_prior)
cl.get_mcmc_samples(nwalkers, dres, llprob, p0, nburn=100, nmcmc=500)

Runs emcee ensemble sampler. The log-posterior llprob_wcdm uses flat priors on [H0, Om0, w0, s8] and a Gaussian prior on $r_d$. The initial position is refined with a Nelder-Mead optimizer before sampling. Samples are stored in cl.mcmc_samples.

| prior_dict key | Description |
|---|---|
| H0_min/max | Flat prior bounds on $H_0$ |
| Om0_min/max | Flat prior bounds on $\Omega_{m0}$ |
| w0_min/max | Flat prior bounds on $w_0$ |
| s8_min/max | Flat prior bounds on $S_8$ |

rd_fid_prior: {'mu': 147.46, 'sigma': 0.28} — Gaussian prior on the sound horizon.
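The prior structure described above (flat box on the cosmological parameters plus a Gaussian on $r_d$) can be sketched as follows. This mirrors the documented priors of llprob_wcdm but is illustrative, not the package's own code:

```python
import numpy as np

# Minimal sketch of the documented prior structure: flat priors on
# [H0, Om0, w0, s8] and a Gaussian prior on r_d.
prior_dict = {'H0_min': 0, 'H0_max': 100, 'Om0_min': 0, 'Om0_max': 1,
              'w0_min': -10, 'w0_max': 10, 's8_min': 0.2, 's8_max': 1.5}
rd_fid_prior = {'mu': 147.46, 'sigma': 0.28}

def log_prior(theta):
    H0, Om0, w0, s8, rd = theta
    bounds = [(prior_dict['H0_min'], prior_dict['H0_max'], H0),
              (prior_dict['Om0_min'], prior_dict['Om0_max'], Om0),
              (prior_dict['w0_min'], prior_dict['w0_max'], w0),
              (prior_dict['s8_min'], prior_dict['s8_max'], s8)]
    if any(not (lo < val < hi) for lo, hi, val in bounds):
        return -np.inf                       # outside the flat prior box
    mu, sigma = rd_fid_prior['mu'], rd_fid_prior['sigma']
    return -0.5 * ((rd - mu) / sigma) ** 2   # Gaussian prior on r_d

print(log_prior([70, 0.3, -1, 0.8, 147.46]))
print(log_prior([120, 0.3, -1, 0.8, 147.46]))
```

The full log-posterior adds the data log-likelihood to this prior before emcee samples it.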

Genetic Algorithm + Fisher (GA-Fisher)

fitness_func = lambda x: -2 * llprob(x)
cl.get_gaFisher_samples(fitness_func, prior_ga, llprob=llprob, nsamples=10000)

Finds the best-fit with a genetic algorithm, then approximates the posterior as a multivariate Gaussian using the Fisher (Hessian) information matrix. Samples are stored in cl.gaFisher_samples.
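The Fisher step can be sketched numerically: take the Hessian of $-2\ln\mathcal{L}$ at the best fit, halve it to get the Fisher matrix, invert for the Gaussian covariance, and draw samples. The quadratic toy likelihood below is illustrative only:

```python
import numpy as np

# Sketch of the Fisher-matrix approximation: posterior near the best fit
# treated as a Gaussian with covariance = inverse Fisher matrix.
true_cov = np.array([[1.0, 0.3], [0.3, 0.5]])
best_fit = np.array([70.0, 0.3])

def m2loglike(theta):
    d = theta - best_fit
    return d @ np.linalg.inv(true_cov) @ d   # toy -2 ln L (quadratic form)

def hessian(f, x, eps=1e-4):
    # Central finite differences for the Hessian of f at x.
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.zeros(n), np.zeros(n)
            ei[i], ej[j] = eps, eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps ** 2)
    return H

fisher = 0.5 * hessian(m2loglike, best_fit)  # Fisher = Hessian of -ln L
cov = np.linalg.inv(fisher)                  # Gaussian covariance
rng = np.random.default_rng(2)
samples = rng.multivariate_normal(best_fit, cov, size=10000)
```

For this toy quadratic likelihood the recovered covariance matches true_cov, which is exactly why the Fisher approximation is cheap but only as good as the Gaussianity of the true posterior.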


Visualization

Mock data plots

fig, ax = cl.show_mocks(show_input=True)
fig, ax = cl.show_mocks_and_residuals(show_input=True)

ML reconstruction overlay

cl.show_trained_ml(ax=ax, method='GP', label='GP')
cl.show_trained_ml(ax=ax, method='BRR', color='blue', alpha=0.15, hatch='|', label='BRR')
cl.show_trained_ml(ax=ax, method='ANN', color='darkgreen', alpha=0.15, hatch='x', label='ANN')

Parametric reconstruction overlay

cl.show_bestfit_curve(ax=ax, method='MCMC', label='MCMC', color='pink')
cl.show_bestfit_curve(ax=ax, method='GAFisher', color='orange', alpha=0.15, label='GA-Fisher')

Posterior corner plots

fig_corner = cl.show_param_posterior(method='MCMC')
cl.show_param_posterior(method='GAFisher', fig=fig_corner, color='blue', show_truth=True)

ANN training loss

fig, ax = cl.show_ann_loss()

Metrics and Scoring

metrics.py provides functions to quantitatively compare reconstructed observables against test data:

| Function | Description |
|---|---|
| D0(Qi, σi, Qj, σj) | Normalized absolute deviation (target: ≈ 0.5) |
| D1(Qi, σi, Qj, σj) | Tension beyond quadrature (target: high) |
| D2(Qi, σi, Qj, σj) | Combined absolute + excess tension (target: ≈ 1) |
| DWstat(residuals) | Durbin-Watson statistic (target: ≈ 2, no serial correlation) |
| Ch2_H0(H0, errH0, refH0) | $\chi^2$-like $H_0$ tension metric |
| get_metrics(bgData, ptData, ...) | Combined D0/D1/D2/DW for CC + RSD jointly |
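As an example of the target values above, the Durbin-Watson statistic in its standard form sits near 2 for uncorrelated residuals and near 0 for strongly correlated ones. This sketch uses the textbook formula; the exact implementation in metrics.py may differ in detail:

```python
import numpy as np

# Standard Durbin-Watson statistic: ratio of summed squared successive
# differences to summed squared residuals. ~2 means no serial correlation.
def durbin_watson(residuals):
    r = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(r) ** 2) / np.sum(r ** 2)

rng = np.random.default_rng(3)
white = rng.normal(size=2000)             # uncorrelated residuals -> DW near 2
trend = np.cumsum(rng.normal(size=2000))  # strongly correlated -> DW near 0
print(durbin_watson(white), durbin_watson(trend))
```

A reconstruction whose residuals against the test data give DW far from 2 is systematically over- or under-shooting the data.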

The OlympicsMaster class in rec_olympics.py scores methods head-to-head across multiple metrics, awarding points to the best-performing method on each metric.


Module Reference

| Module | Contents |
|---|---|
| cosmo_learn/cosmo_learn.py | CosmoLearn class, mock data, likelihoods, training, visualization |
| cosmo_learn/metrics.py | D0, D1, D2, DWstat, Ch2_H0, get_metrics |
| cosmo_learn/rec_olympics.py | OlympicsMaster scoring class |
| cosmo_learn/LISA_bright.py | LISA bright siren mock data generator (generate) |

Upcoming

  • New data sets
  • New methods
  • New models

How to cite

@article{Bernardo:2025pua,
    author = "Bernardo, Reginald Christian and Grand{\'o}n, Daniela and Levi Said, Jackson and C{\'a}rdenas, V{\'\i}ctor H. and Belinario, Gene Carlo and Reyes, Reinabelle",
    title = "{Cosmo-Learn: code for learning cosmology using different methods and mock data}",
    eprint = "2508.20971",
    archivePrefix = "arXiv",
    primaryClass = "astro-ph.CO",
    month = "8",
    year = "2025"
}