OneEHR is a unified Python platform for longitudinal EHR experiments across ML, DL, and LLM agents. It provides shared infrastructure for preprocessing, modeling, testing, and analysis on one shared run contract — the first toolkit bridging classical machine learning, deep learning, and agentic AI for clinical prediction.
- 38 model architectures — tabular ML, recurrent/non-recurrent DL, irregular-time, KG-enhanced, and survival models
- Unified ML/DL/LLM comparison — all predictions in one
predictions.parquetwith bootstrap CI and statistical tests - Dataset converters — built-in support for MIMIC-III, MIMIC-IV, and eICU
- Medical code ontologies — ICD-9/10 mapping, CCS grouping, ATC drug hierarchy
- Survival analysis — DeepSurv, DeepHit, concordance index, Kaplan-Meier visualization
- Fairness & interpretability — demographic parity, equalized odds, SHAP, LIME, integrated gradients, attention visualization
- Publication-quality figures — ROC, PR, calibration, DCA, forest plots, KM curves with Nature/Lancet style presets
- Reproducibility by design — single TOML config = complete experiment specification
oneehr preprocess --config experiment.toml # Bin features, split patients
oneehr train --config experiment.toml # Train ML/DL models
oneehr test --config experiment.toml # Evaluate on test set
oneehr analyze --config experiment.toml # Cross-system comparison
oneehr plot --config experiment.toml # Publication figuresAll commands operate on the same run directory under {output.root}/{output.run_name}/.
OneEHR requires Python 3.12+.
pip install oneehr
# Or from source:
uv venv .venv --python 3.12
uv pip install -e .
oneehr --helpUse the bundled TJH COVID-19 ICU example:
# Convert source data (only needed once)
python examples/tjh/convert.py
# Run the full pipeline
oneehr preprocess --config examples/tjh/mortality_patient.toml
oneehr train --config examples/tjh/mortality_patient.toml
oneehr test --config examples/tjh/mortality_patient.toml
oneehr analyze --config examples/tjh/mortality_patient.tomlOr use the Python API:
import oneehr
config = oneehr.load_config("examples/tjh/mortality_patient.toml")
oneehr.preprocess(config)
oneehr.train(config)
oneehr.test(config)
oneehr.analyze(config)Convert standard clinical datasets into OneEHR's three-table format:
# MIMIC-III
oneehr convert --dataset mimic3 --raw-dir /path/to/mimic3 --output-dir data/mimic3/ --task mortality
# MIMIC-IV
oneehr convert --dataset mimic4 --raw-dir /path/to/mimic4 --output-dir data/mimic4/ --task mortality
# eICU
oneehr convert --dataset eicu --raw-dir /path/to/eicu --output-dir data/eicu/ --task mortalityEach converter produces labels for mortality, readmission, and length-of-stay tasks.
OneEHR ships 38 model architectures:
| Category | Models |
|---|---|
| Tabular ML | XGBoost, CatBoost, Random Forest, Decision Tree, GBDT, Logistic Regression |
| Recurrent | GRU, LSTM, RNN, GRU-D, Dipole, HiTANet, M3Care, PAI |
| Non-recurrent | CNN, TCN, Transformer, SAnD, MLP, Deepr, EHR-Mamba, Jamba, LSAN |
| Irregular-time | mTAND, Raindrop, ContiFormer, TECO |
| EHR-specialised | AdaCare, StageNet, RETAIN, ConCare, GRASP, MCGRU, DrAgent, PRISM, SAFARI |
| KG-enhanced | GraphCare, KerPrint, ProtoEHR |
| Survival | DeepSurv, DeepHit |
Models with static branches (ConCare, GRASP, MCGRU, DrAgent, PRISM, SAFARI, TECO) automatically use patient-level static features when static.csv is provided.
| Task | Config | Description |
|---|---|---|
| Binary classification | kind = "binary" |
Mortality, readmission, etc. |
| Multiclass | kind = "multiclass" |
Phenotyping, diagnosis groups |
| Regression | kind = "regression" |
Length of stay, lab value prediction |
| Survival | kind = "survival" |
Time-to-event with censoring |
| Multi-label | kind = "multilabel" |
ICD coding, multi-diagnosis |
from oneehr.medcode import ICD9, ICD10, CodeMapper, CCSGrouper, ATCHierarchy
# ICD code utilities
ICD9.chapter("401.9") # → "Circulatory system"
ICD10.category("I10.0") # → "I10"
# Aggregate codes by ontology for dimensionality reduction
mapper = CodeMapper()
mapper.add_icd_chapter_mapping(version=9)
mapped_events = mapper.apply(events_df)OneEHR uses TOML as the experiment contract:
[dataset]— input table paths (dynamic,static,label)[preprocess]— binning, feature engineering, preprocessing pipeline[task]— task kind and prediction mode (patientortime)[split]— patient-level train/val/test splitting[[models]]— model selection with per-modelparams[trainer]— DL training config (mixed precision, LR schedulers, early stopping)[[systems]]— LLM/agent system definitions[output]— run root and run name
| Tutorial | Description |
|---|---|
| 01 Quickstart | End-to-end TJH mortality prediction |
| 02 Custom Dataset | Bring your own data + medical code mapping |
| 03 Model Comparison | ML vs DL with bootstrap CI and statistical tests |
| 04 Fairness & Explainability | Bias detection + feature importance |
| 05 Survival Analysis | DeepSurv, C-index, Kaplan-Meier curves |
Full documentation: medxlab.github.io/OneEHR
- Installation
- Quickstart
- Data Model
- Core Workflows
- CLI Reference
- Configuration
- Models
- Artifacts
- Dataset Converters
- Medical Codes
Build docs locally:
uv pip install -e ".[docs]"
uv run mkdocs serveSee CONTRIBUTING.md for development setup and guidelines.
pytest tests/ -v # 114 tests
oneehr preprocess --config examples/tjh/mortality_patient.toml # End-to-end