Open source by Santander AI Lab. This repository is the machine learning research code for causal perception — comparing competing structural causal models (SCMs) through their interventional and counterfactual distributions, applied to fair credit decisions on the German Credit dataset.
It implements a linear Additive Noise Model with a configurable causal DAG
(Chiappa, 2019), Pearl's do-operator and counterfactual (abduction →
intervention → prediction) inference, three 1-D distribution distances
(Wasserstein-2, KL via KDE, Total Variation), bootstrap confidence intervals,
and a fair-decisions experiment (demographic-parity / equal-opportunity gaps,
ROC/PR, decision disagreement).
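The abduction → intervention → prediction recipe can be illustrated with the minimal sketch below. It makes several assumptions. It uses a hypothetical three-node DAG (A → X, A → Y, X → Y), not Chiappa's German Credit DAG. Each structural equation is fitted by ordinary least squares, and data is stored as a plain dictionary of NumPy arrays. All function names (`fit_linear_anm`, `abduce`, `counterfactual`, `interventional`) are illustrative only and are not the API of `src.linear_anm.LinearANM`.

```python
import numpy as np

# Hypothetical toy DAG (NOT the repository's Chiappa DAG): A -> X, A -> Y, X -> Y.
# ORDER lists every node after its parents (a topological order).
DAG = {"A": [], "X": ["A"], "Y": ["A", "X"]}
ORDER = ["A", "X", "Y"]


def fit_linear_anm(data, dag):
    """OLS per non-root node: V = b0 + sum_p b_p * p + U_V (root nodes stay exogenous)."""
    coefs = {}
    for node, parents in dag.items():
        if parents:
            design = np.column_stack([np.ones(len(data[node]))] + [data[p] for p in parents])
            beta, *_ = np.linalg.lstsq(design, data[node], rcond=None)
            coefs[node] = beta
    return coefs


def structural_mean(coefs, dag, node, values):
    """Deterministic part f_V(pa(V)) of the structural equation for `node`."""
    beta = coefs[node]
    return beta[0] + sum(b * values[p] for b, p in zip(beta[1:], dag[node]))


def abduce(coefs, dag, data):
    """Abduction: recover each unit's additive noise U_V = V - f_V(pa(V))."""
    return {v: data[v] - structural_mean(coefs, dag, v, data) for v in dag if dag[v]}


def propagate(coefs, dag, order, data, noise, intervention):
    """Intervention + prediction: fix intervened nodes, recompute descendants top-down."""
    out = {v: np.asarray(data[v], dtype=float).copy() for v in order}
    for v in order:
        if v in intervention:
            out[v] = np.full(len(out[v]), float(intervention[v]))  # do(V = value)
        elif dag[v]:
            out[v] = structural_mean(coefs, dag, v, out) + noise[v]
        # non-intervened root nodes keep their observed values
    return out


def counterfactual(coefs, dag, order, data, intervention):
    """Unit-level counterfactual: reuse each unit's abducted noise."""
    return propagate(coefs, dag, order, data, abduce(coefs, dag, data), intervention)


def interventional(coefs, dag, order, data, intervention, rng):
    """Population-level do(): resample noise from the fitted residuals instead of reusing it."""
    noise = {v: rng.choice(u, size=len(u)) for v, u in abduce(coefs, dag, data).items()}
    return propagate(coefs, dag, order, data, noise, intervention)


# Usage on synthetic data with known coefficients.
rng = np.random.default_rng(0)
n = 5000
A = rng.integers(0, 2, size=n).astype(float)
X = 2.0 * A + rng.normal(size=n)
Y = 1.0 * A + 0.5 * X + rng.normal(size=n)
data = {"A": A, "X": X, "Y": Y}

coefs = fit_linear_anm(data, DAG)
cf = counterfactual(coefs, DAG, ORDER, data, {"A": 1.0})
do_samples = interventional(coefs, DAG, ORDER, data, {"A": 1.0}, rng)
print(coefs["X"], cf["Y"][:3], do_samples["Y"].mean())
```

The two queries differ only in how the noise is treated. Counterfactuals reuse each unit's abducted noise, while the interventional sample resamples it. As a result, the counterfactual for a unit that already has A = 1 reproduces that unit's observed outcome exactly.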
This is research code accompanying a (forthcoming) academic paper. Copyright is held by the author, José M. Álvarez; the open-source release is distributed and maintained by Santander AI Lab with the author's consent (see
NOTICE).
- Python 3.10, 3.11, or 3.12
- Core: `numpy`, `pandas`, `scipy`, `scikit-learn`
- Plotting experiments additionally require `matplotlib` (installed via the `viz` extra)
- Internet access on first run only, to fetch the German Credit dataset from OpenML
```bash
# Core engine only
pip install -e .

# With plotting support for the experiment scripts
pip install -e ".[viz]"

# Development (tests, linters, type checker)
pip install -e ".[dev,viz]"
```

The German Credit (Statlog) dataset is not redistributed in this
repository. It is fetched from OpenML
(`credit-g`) and cached under `data/` automatically on first use. Both
`src.data_prep.load_data()` and every experiment script trigger this step transparently.

To (re)generate the CSV splits explicitly:

```bash
python -m src.data_prep
```

See `data/README.md` for the variable mapping (Chiappa's
DAG), provenance, and citation.
Each experiment is runnable as a module from the repository root:
```bash
python -m src.run_fair_decisions        # accuracy + fairness (DP/EO, ROC/PR)
python -m src.run_parametrical          # parametrical perception (Δβ on A→Y)
python -m src.run_causal_dp_sweep       # DP gap vs decision threshold
python -m src.run_structural_age        # alternative disagreement on C→Y (age)
python -m src.run_structural_nonlinear  # nonlinear (GBM) robustness check
python -m src.run_bootstrap_cis_all     # 95% bootstrap CIs for all experiments
python -m src.plot_structural_combined  # combined structural-perception figure
```

Minimal programmatic example:
```python
from src.data_prep import load_data
from src.linear_anm import CHIAPPA_FULL, CHIAPPA_NO_AY
from src.perception import fit_scms, run_perception

# Train/test splits (downloaded from OpenML and cached under data/ on first use)
train, test = load_data()

# Fit two competing SCMs on the training split, one per DAG
scm1, scm2 = fit_scms(train, CHIAPPA_FULL, CHIAPPA_NO_AY)

# Compare the two SCMs on the test split when A is set to 0 and to 1
result = run_perception(scm1, scm2, test, variable="A", values=[0, 1])
print(result["aggregated"])  # {"W2": ..., "KL": ..., "TV": ...}
```

Repository layout:

```text
src/
  data_prep.py                 # OpenML fetch + Chiappa variable mapping + load_data()
  linear_anm.py                # Linear ANM: DAGs, fit, do-operator, counterfactuals
  distances.py                 # W2, KL (KDE), Total Variation between 1-D samples
  perception.py                # competing-SCM engine + bootstrap CIs
  run_*.py                     # experiment entrypoints (require the `viz` extra)
  plot_structural_combined.py
tests/                         # pytest suite (OpenML fetch mocked, no network)
data/                          # generated CSVs (git-ignored) + data/README.md
```
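The following minimal sketch gives some intuition for what `distances.py` and the bootstrap CIs in `perception.py` compute. It relies on several assumptions. W2 is computed from empirical quantile functions. KL and TV use Gaussian KDEs (`scipy.stats.gaussian_kde`) evaluated on a shared uniform grid. The percentile bootstrap resamples the two 1-D samples directly rather than re-running the SCMs. These estimator choices and all function names are hypothetical and may differ from the repository's implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde


def wasserstein2(x, y, n_levels=1000):
    """W2 between 1-D samples: the L2 distance between their quantile functions."""
    levels = (np.arange(n_levels) + 0.5) / n_levels
    diff = np.quantile(x, levels) - np.quantile(y, levels)
    return float(np.sqrt(np.mean(diff ** 2)))


def _kde_densities(x, y, grid_size=512):
    """Gaussian KDEs of x and y on a shared, padded uniform grid, renormalized to integrate to 1."""
    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    pad = 0.1 * (hi - lo)
    grid = np.linspace(lo - pad, hi + pad, grid_size)
    dx = grid[1] - grid[0]
    px, py = gaussian_kde(x)(grid), gaussian_kde(y)(grid)
    return px / (px.sum() * dx), py / (py.sum() * dx), dx


def kl_kde(x, y, eps=1e-12):
    """KL(P_x || P_y) from KDE densities, approximated by a Riemann sum on the grid."""
    px, py, dx = _kde_densities(x, y)
    px, py = px + eps, py + eps  # avoid log(0) and division by zero in the tails
    return float(np.sum(px * np.log(px / py)) * dx)


def total_variation(x, y):
    """TV = 0.5 * integral of |p_x - p_y|, using the same KDE densities."""
    px, py, dx = _kde_densities(x, y)
    return float(0.5 * np.sum(np.abs(px - py)) * dx)


def bootstrap_ci(x, y, distance, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample both samples with replacement, recompute the distance."""
    rng = np.random.default_rng(seed)
    stats = [distance(rng.choice(x, size=len(x)), rng.choice(y, size=len(y)))
             for _ in range(n_boot)]
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


# Usage: two unit-variance Gaussians whose means differ by 1.
rng = np.random.default_rng(42)
x = rng.normal(0.0, 1.0, size=5000)
y = rng.normal(1.0, 1.0, size=5000)
print(wasserstein2(x, y), kl_kde(x, y), total_variation(x, y))
print(bootstrap_ci(x, y, wasserstein2))
```

For these two Gaussians the population values are W2 = 1, KL = 0.5 and TV ≈ 0.383. The KDE-based KL and TV estimates come out slightly lower because kernel smoothing widens both densities.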
If you use this code, please cite the accompanying paper and the dataset (see
CITATION.cff and data/README.md). The
paper is forthcoming; the citation file will be updated with the venue and DOI
once published.
Contributions are welcome — see CONTRIBUTING.md. All
contributors must sign the CLA (handled automatically by the CLA Assistant bot)
and follow the Code of Conduct.
Please report vulnerabilities privately as described in
.github/SECURITY.md. Do not open public issues for
security reports.
Licensed under the Apache License 2.0. See NOTICE for
copyright and third-party data attribution.
Part of Santander AI Open Source — open source by Santander AI Lab.