PHRR: Predictive Hospital Readmission Risk

A comprehensive machine learning pipeline for predicting unplanned hospital readmission within 30 days of discharge using the MIMIC-III Clinical Database.

Overview

This project implements a complete end-to-end machine learning pipeline to predict 30-day hospital readmissions for adult patients using clinical data from the MIMIC-III database. The system includes automated data discovery, feature engineering, model training, evaluation, and interpretability analysis.

Features

  • 🔍 Automated Data Discovery: Scans and loads available MIMIC-III tables automatically
  • 🎯 Target Variable Creation: 30-day readmission with proper exclusion criteria
  • 🔧 Feature Engineering: Demographics, clinical complexity, medications, labs, and temporal features
  • 🤖 Multiple ML Models: XGBoost, LightGBM, Logistic Regression, and GLM (Generalized Linear Models)
  • 📈 Comprehensive Evaluation: AUROC, AUPRC, calibration, and clinical utility metrics
  • 🔍 Model Interpretability: SHAP-based feature importance and explanations
  • 📊 Rich Visualizations: ROC curves, calibration plots, and performance dashboards

Quick Start

Option 1: Jupyter Notebook (Recommended)

cd PHRR
# Install required packages first
pip install pandas numpy scikit-learn matplotlib statsmodels

# Then run the notebook
jupyter notebook main.ipynb

Then run all cells to execute the complete pipeline.

Option 2: Python Script

cd PHRR
# Install required packages first
pip install pandas numpy scikit-learn matplotlib xgboost statsmodels

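# Run the pipeline (Project Structure lists the script as phrr_simple.py;
# use whichever file name exists in your checkout)
python phrr.py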

Option 3: Interactive Python

# Install packages first
import subprocess
import sys
subprocess.check_call([sys.executable, "-m", "pip", "install", "pandas", "numpy", "scikit-learn", "matplotlib"])

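# Load the script into the current namespace (closes the file when done)
with open('phrr.py') as f:
    exec(f.read())

# Build the pipeline from the script's CONFIG and run it
pipeline = SimplePHRRPipeline(CONFIG)
results = pipeline.run_pipeline()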

What the Pipeline Does

The complete pipeline will:

  1. 📊 Data Discovery: Automatically find and load MIMIC-III tables
  2. 🎯 Target Creation: Create 30-day readmission target with exclusions
  3. 🔧 Feature Engineering: Extract 20+ clinical and demographic features
  4. 📊 Data Splitting: Split data temporally (train/val/test)
  5. 🤖 Model Training: Train XGBoost, LightGBM, and Logistic Regression
  6. 📈 Evaluation: Calculate AUROC, AUPRC, and clinical utility metrics
  7. 📊 Visualizations: Generate comprehensive evaluation dashboard
  8. 🔍 Interpretation: SHAP analysis for model explanations
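Step 4 (temporal splitting) can be illustrated with a short pandas sketch. This is not the repository's code: the function name `temporal_split`, the `admittime` column name, and the 70/15/15 proportions are assumptions chosen for illustration.

```python
import pandas as pd


def temporal_split(df, time_col="admittime", train_frac=0.70, val_frac=0.15):
    """Split rows chronologically: oldest rows train, newest rows test.

    The fractions are illustrative assumptions, not values from this project.
    """
    ordered = df.sort_values(time_col).reset_index(drop=True)
    n = len(ordered)
    # round() avoids off-by-one cut points caused by float arithmetic.
    train_end = round(n * train_frac)
    val_end = round(n * (train_frac + val_frac))
    train = ordered.iloc[:train_end]
    val = ordered.iloc[train_end:val_end]
    test = ordered.iloc[val_end:]
    return train, val, test
```

Sorting before slicing means every validation row is later than every training row, which keeps future information out of training.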

Expected Results

The models typically achieve:

  • AUROC: 0.65-0.75 (modest to acceptable discrimination)
  • AUPRC: 0.20-0.35 (above the ~0.15 baseline, the score expected from a random ranking)
  • Precision@10%: 0.25-0.40 (useful for targeting interventions)
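As a rough guide to how the three numbers above could be computed, here is a minimal sketch using scikit-learn. The helper names `precision_at_top_fraction` and `summarize` are hypothetical and not taken from the project scripts. `average_precision_score` is a step-wise estimate of the area under the precision-recall curve.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score


def precision_at_top_fraction(y_true, y_score, fraction=0.10):
    """Share of true readmissions among the highest-scored `fraction` of cases."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    k = max(1, int(np.ceil(fraction * len(y_score))))
    # Indices of the k largest scores (stable sort keeps ties in input order).
    top_k = np.argsort(-y_score, kind="stable")[:k]
    return float(y_true[top_k].mean())


def summarize(y_true, y_score):
    """Collect discrimination metrics plus the prevalence (the AUPRC baseline)."""
    return {
        "auroc": roc_auc_score(y_true, y_score),
        "auprc": average_precision_score(y_true, y_score),
        "precision_at_10pct": precision_at_top_fraction(y_true, y_score, 0.10),
        "prevalence": float(np.mean(y_true)),
    }
```

Reporting the prevalence next to AUPRC makes the comparison against the random-ranking baseline explicit.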

Project Structure

PHRR/
├── main.ipynb                          # Jupyter notebook (simplified pipeline)
├── phrr_simple.py                     # Simplified Python script (recommended)
├── phrr_complete.py                   # Full-featured Python script (advanced)
├── mimic-iii-clinical-database-demo-1.4/  # MIMIC-III demo dataset
└── README.md                           # This file

Files

  • main.ipynb: Interactive Jupyter notebook with the simplified pipeline
  • phrr_simple.py: Simplified Python script with core functionality (recommended)
  • phrr_complete.py: Full-featured script with advanced features (requires more packages)
  • mimic-iii-clinical-database-demo-1.4/: MIMIC-III demo dataset directory

Target Variable Definition

  • Prediction Unit: Hospital admission (HADM_ID)
  • Positive Label: Patient readmitted within 30 calendar days of discharge
  • Exclusions:
    • In-hospital deaths
    • Newborn admissions
    • Patients under 18 years old
    • Same-day transfers
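The label definition above can be sketched in pandas as follows. This is an illustration, not the project's implementation. It assumes lower-case ADMISSIONS column names (`subject_id`, `hadm_id`, `admittime`, `dischtime`, `hospital_expire_flag`, `admission_type`). It applies only the in-hospital-death and newborn exclusions. The age filter needs the PATIENTS table and the same-day-transfer rule is not specified here, so both are left out.

```python
import pandas as pd


def label_30day_readmission(admissions, window_days=30):
    """Return one row per eligible index admission with a 0/1 `readmit_30d` label."""
    adm = admissions.copy()
    adm["admittime"] = pd.to_datetime(adm["admittime"])
    adm["dischtime"] = pd.to_datetime(adm["dischtime"])
    adm = adm.sort_values(["subject_id", "admittime"])

    # Gap between this discharge and the same patient's next admission.
    next_admit = adm.groupby("subject_id")["admittime"].shift(-1)
    gap = next_admit - adm["dischtime"]

    # A missing next admission yields NaT, which compares as False (label 0).
    within_window = (gap >= pd.Timedelta(0)) & (gap <= pd.Timedelta(days=window_days))
    adm["readmit_30d"] = within_window.astype(int)

    # Exclusions applied to index admissions (assumed subset, see lead-in).
    keep = (adm["hospital_expire_flag"] == 0) & (adm["admission_type"] != "NEWBORN")
    return adm.loc[keep, ["subject_id", "hadm_id", "readmit_30d"]]
```

The label is computed before the exclusions are applied, so an excluded admission can still count as the readmission for an earlier stay.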

Evaluation Metrics

The pipeline reports several evaluation metrics relevant to clinical decision-making:

  • Discrimination: AUROC, AUPRC
  • Calibration: Brier score, calibration plots
  • Clinical Utility: Precision at top 10%/20% risk scores
  • Interpretability: SHAP feature importance and local explanations
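For the calibration entries in the list above, a minimal scikit-learn sketch might look like this. The function name `calibration_summary` and the default of 10 bins are assumptions for illustration.

```python
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss


def calibration_summary(y_true, y_prob, n_bins=10):
    """Brier score plus the points of a reliability (calibration) curve."""
    # Mean squared difference between predicted probability and outcome.
    brier = brier_score_loss(y_true, y_prob)
    # Observed event rate vs. mean predicted probability per equal-width bin;
    # empty bins are dropped by scikit-learn.
    observed, predicted = calibration_curve(y_true, y_prob, n_bins=n_bins)
    return {"brier": brier, "observed": observed, "predicted": predicted}
```

Plotting `predicted` against `observed` gives the calibration plot; points near the diagonal indicate well-calibrated risk estimates.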

Implementation Details

✅ Complete Implementation

  • Data Discovery: Automatically scans and loads available MIMIC-III tables
  • Target Variable: 30-day readmission with proper exclusion criteria
  • Feature Engineering: 20+ features including demographics, clinical complexity, and temporal patterns
  • Model Training: XGBoost, LightGBM, and Logistic Regression with proper validation
  • Evaluation: AUROC, AUPRC, calibration, and clinical utility metrics
  • Interpretability: SHAP-based feature importance and explanations
  • Visualizations: Comprehensive evaluation dashboard with multiple plots
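To give a flavor of the feature engineering step, the sketch below derives three admission-level features with pandas. The feature names (`los_days`, `n_prior_admissions`, `n_diagnoses`) and the function `basic_features` are hypothetical. The project's actual feature set is not spelled out in this README, and lower-case MIMIC-III column names are assumed.

```python
import pandas as pd


def basic_features(admissions, diagnoses):
    """Derive a few illustrative per-admission features."""
    adm = admissions.copy()
    adm["admittime"] = pd.to_datetime(adm["admittime"])
    adm["dischtime"] = pd.to_datetime(adm["dischtime"])
    adm = adm.sort_values(["subject_id", "admittime"])

    # Length of stay in days (temporal feature).
    adm["los_days"] = (adm["dischtime"] - adm["admittime"]).dt.total_seconds() / 86400.0

    # Earlier admissions for the same patient (0 for the first stay).
    adm["n_prior_admissions"] = adm.groupby("subject_id").cumcount()

    # Distinct diagnosis codes per admission (clinical complexity proxy).
    n_dx = diagnoses.groupby("hadm_id")["icd9_code"].nunique()
    adm["n_diagnoses"] = adm["hadm_id"].map(n_dx).fillna(0).astype(int)

    return adm[["hadm_id", "los_days", "n_prior_admissions", "n_diagnoses"]]
```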

Data Requirements

This package works with the MIMIC-III Clinical Database Demo v1.4. It uses the following tables:

  • ADMISSIONS (required)
  • PATIENTS (required)
  • DIAGNOSES_ICD (required)
  • PROCEDURES_ICD (recommended)
  • LABEVENTS (recommended)
  • PRESCRIPTIONS (recommended)
  • ICUSTAYS (optional)
  • CHARTEVENTS (optional)
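A check for the required tables could be sketched as below, using only the standard library. The helper names `discover_tables` and `missing_required` are hypothetical. The sketch assumes each table is stored as a `.csv` or `.csv.gz` file named after the table, in any letter case.

```python
from pathlib import Path

# Tables marked "required" in the list above.
REQUIRED = {"ADMISSIONS", "PATIENTS", "DIAGNOSES_ICD"}


def discover_tables(data_dir):
    """Map upper-cased table name -> file path for each CSV found in data_dir."""
    tables = {}
    for path in sorted(Path(data_dir).iterdir()):
        lower = path.name.lower()
        if lower.endswith(".csv") or lower.endswith(".csv.gz"):
            # "LABEVENTS.csv.gz" -> "LABEVENTS"
            tables[lower.split(".")[0].upper()] = path
    return tables


def missing_required(tables):
    """Return the required table names that were not found, sorted."""
    return sorted(REQUIRED - set(tables))
```

Calling `missing_required(discover_tables("mimic-iii-clinical-database-demo-1.4"))` before loading data gives an early, readable failure when a required file is absent.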

Contributing

Please read our contributing guidelines and code of conduct before submitting pull requests.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this software in your research, please cite:

@software{phrr2024,
  title={PHRR: Predictive Hospital Readmission Risk},
  author={PHRR Development Team},
  year={2024},
  url={https://github.com/your-org/phrr}
}

Acknowledgments

  • MIMIC-III Clinical Database (Johnson et al., 2016)
  • PhysioNet for providing access to clinical data
  • The open-source machine learning community
