This project implements a machine learning pipeline that automatically classifies electrocardiogram (ECG) beats into arrhythmia classes defined by the AAMI (Association for the Advancement of Medical Instrumentation) standard. It combines several feature extraction techniques with ensemble learning; the best configuration in the results table below reaches 94.22% accuracy.
- Advanced Feature Extraction: Wavelet transforms, Higher Order Statistics (HOS), RR intervals, and morphological features
- Multiple Classifiers: Support Vector Machines (SVM), Random Forest, LightGBM, and ensemble methods
- Class Imbalance Handling: SMOTE, ADASYN, and class weighting techniques
- Comprehensive Evaluation: AAMI-standard performance metrics and visualizations
- Web Interface: Streamlit-based web application for easy interaction
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Random Forest | 86.66% | 0.96 | 0.87 | 0.91 |
| LightGBM | 81.49% | 0.96 | 0.81 | 0.88 |
| Advanced Ensemble | 94.22% | 0.96 | 0.90 | 0.93 |
The system classifies beats into five superclasses according to AAMI recommendations:
| SuperClass | Included Beat Types | Description |
|---|---|---|
| N (Normal) | N, L, R | Normal beats and bundle branch blocks |
| SVEB (Supraventricular) | A, a, J, S, e, j | Supraventricular ectopic beats |
| VEB (Ventricular) | V, E | Ventricular ectopic beats |
| F (Fusion) | F | Fusion beats |
| Q (Unknown) | /, f, Q | Paced beats, fusion of paced and normal beats, and unclassifiable beats |
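The grouping in the table can be expressed as a lookup from MIT-BIH annotation symbols to superclass labels. This is a minimal sketch: the names `AAMI_SUPERCLASSES`, `SYMBOL_TO_SUPERCLASS`, and `to_superclass` are illustrative and are not taken from the project code.

```python
from typing import Optional

# Superclass -> MIT-BIH beat symbols, copied from the table above.
AAMI_SUPERCLASSES = {
    "N": ["N", "L", "R"],
    "SVEB": ["A", "a", "J", "S", "e", "j"],
    "VEB": ["V", "E"],
    "F": ["F"],
    "Q": ["/", "f", "Q"],
}

# Invert the table to get a symbol -> superclass lookup.
SYMBOL_TO_SUPERCLASS = {
    symbol: superclass
    for superclass, symbols in AAMI_SUPERCLASSES.items()
    for symbol in symbols
}


def to_superclass(symbol: str) -> Optional[str]:
    """Return the superclass for a beat symbol, or None if the symbol is not listed."""
    return SYMBOL_TO_SUPERCLASS.get(symbol)
```

Symbols that are not in the table (for example rhythm-change markers) map to `None`, so a caller can skip them explicitly.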
- Python 3.9+ (the pinned NumPy, SciPy, pandas, and scikit-learn versions do not support Python 3.8)
- pip package manager
pip install -r requirements.txt
The pinned dependencies in requirements.txt are:
streamlit==1.36.0
numpy==1.26.4
scipy==1.13.1
pywavelets==1.6.0
biosppy==0.7.2
scikit-learn==1.5.0
joblib==1.4.2
matplotlib==3.8.4
pandas==2.2.2
lightgbm==4.1.0
wfdb==4.3.0
imbalanced-learn==0.12.0
tqdm==4.66.0
The MIT-BIH Arrhythmia Database can be obtained from:
- Kaggle: MIT-BIH Arrhythmia Database
- PhysioNet: Mirror the database with rsync:
rsync -Cavz physionet.org::mitdb /path/to/save/mitdb
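# Assumed import paths, inferred from the project layout (utils/data_loader.py, utils/classifier.py)
from utils.data_loader import ECGDataLoader
from utils.classifier import ECGClassifier

# Initialize data loader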
data_loader = ECGDataLoader("/path/to/mit-bih-arrhythmia-database")
# Load specific patients
train_patients = ['101', '106', '108', '109', '112', '114', '115', '116', '118', '119']
test_patients = ['100', '103', '105', '111', '113']
train_signals, train_annotations, train_rpeaks = data_loader.load_mit_bih_data(train_patients)
test_signals, test_annotations, test_rpeaks = data_loader.load_mit_bih_data(test_patients)
# Initialize classifier and extract features
classifier = ECGClassifier(sampling_rate=360)
X_train, y_train = classifier.extract_features_from_dataset(train_signals, train_annotations, train_rpeaks)
X_test, y_test = classifier.extract_features_from_dataset(test_signals, test_annotations, test_rpeaks)
# Train and evaluate
model, feature_selector = classifier.train_lightgbm_model_improved(X_train, y_train, "improved_model")
predictions = classifier.predict(X_test, "improved_model")
streamlit run ecg_app.py
The application will be available at http://localhost:8501
ecg-classification/
├── ecg_app.py # Streamlit web application
├── requirements.txt # Python dependencies
├── models/ # Trained model files
│ └── ecg_classifier_advanced_pipeline.pkl
├── utils/
│ ├── data_loader.py # ECG data loading and preprocessing
│ ├── feature_extractor.py # Feature extraction methods
│ └── classifier.py # Machine learning models
└── notebooks/ # Jupyter notebooks for experimentation
The system extracts four types of features from each ECG beat:
- Morphological: Raw signal values from a window of [-90, 90] samples around the R-peak, downsampled to 90 features
- Wavelet: Daubechies 1 (db1) wavelet with 3 levels of decomposition, keeping the top 10 approximation coefficients
- Higher Order Statistics (HOS): Skewness and kurtosis calculated over 6 intervals, capturing non-linear signal characteristics
- RR intervals: Pre-RR, post-RR, local-RR, and global-RR intervals, in both raw and normalized versions
Total: 118 features per beat
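Three of the feature groups described above can be sketched with NumPy alone. This is an illustration under stated assumptions, not the project's `feature_extractor.py`: the function names are hypothetical, downsampling is done by plain decimation, local-RR averages the last 10 intervals, global-RR averages the whole record, and normalization divides by the column mean. The wavelet group is left out because it would need PyWavelets.

```python
import numpy as np


def beat_window(signal, r_peak, half_width=90):
    """Return the half-open window [r_peak - half_width, r_peak + half_width)."""
    start, stop = r_peak - half_width, r_peak + half_width
    if start < 0 or stop > len(signal):
        raise ValueError("beat window falls outside the signal")
    return np.asarray(signal[start:stop], dtype=float)


def morphology_features(beat, n_out=90):
    """Downsample by keeping every k-th sample (plain decimation, an assumption)."""
    step = max(1, len(beat) // n_out)
    return beat[::step][:n_out]


def hos_features(beat, n_intervals=6):
    """Skewness and excess kurtosis of each interval, 2 * n_intervals values."""
    feats = []
    for segment in np.array_split(beat, n_intervals):
        centered = segment - segment.mean()
        std = centered.std()
        if std == 0:
            feats.extend([0.0, 0.0])  # flat segment: both moments are undefined
            continue
        z = centered / std
        feats.append(np.mean(z ** 3))        # skewness
        feats.append(np.mean(z ** 4) - 3.0)  # excess kurtosis
    return np.array(feats)


def rr_features(r_peaks, local_n=10):
    """Pre, post, local, and global RR per beat, raw then mean-normalized."""
    r = np.asarray(r_peaks, dtype=float)
    if len(r) < 2:
        raise ValueError("need at least two R-peaks")
    rr = np.diff(r)
    pre = np.concatenate(([rr[0]], rr))    # first beat reuses its following interval
    post = np.concatenate((rr, [rr[-1]]))  # last beat reuses its preceding interval
    local = np.array(
        [pre[max(0, i - local_n + 1): i + 1].mean() for i in range(len(pre))]
    )
    global_rr = np.full(len(pre), pre.mean())
    raw = np.column_stack([pre, post, local, global_rr])
    return np.hstack([raw, raw / raw.mean(axis=0)])
```

With the default arguments a 180-sample window yields 90 morphological values and 12 HOS values, and each beat gets 8 RR values, expressed in samples rather than seconds.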
The preprocessing pipeline consists of:
- Baseline wander removal using median filters
- R-peak detection using BioSPPy
- Beat segmentation around R-peaks
- Z-score normalization of features
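The baseline removal and normalization steps above can be sketched as follows. The filter widths (about 200 ms followed by about 600 ms) are a common choice in the ECG literature and are an assumption here, since the text only says "median filters"; the function names are hypothetical. R-peak detection and beat segmentation are not shown.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def running_median(x, width):
    """Median filter with an odd window width and edge-value padding."""
    if width % 2 == 0:
        raise ValueError("width must be odd")
    padded = np.pad(np.asarray(x, dtype=float), width // 2, mode="edge")
    return np.median(sliding_window_view(padded, width), axis=1)


def remove_baseline(signal, fs=360):
    """Estimate the baseline with two cascaded median filters and subtract it."""
    w1 = int(round(0.2 * fs)) | 1  # about 200 ms, forced to an odd length
    w2 = int(round(0.6 * fs)) | 1  # about 600 ms, forced to an odd length
    baseline = running_median(running_median(signal, w1), w2)
    return np.asarray(signal, dtype=float) - baseline


def zscore(features, eps=1e-8):
    """Standardize each feature column to zero mean and unit variance."""
    features = np.asarray(features, dtype=float)
    return (features - features.mean(axis=0)) / (features.std(axis=0) + eps)
```

The windowed median materializes one window per sample, which is fine for short excerpts but memory-hungry for a full 30-minute record; `scipy.signal.medfilt` is a lighter alternative for that case.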
Class imbalance is handled with:
- Class weighting: Adjusting class weights in loss functions
- SMOTE: Synthetic Minority Over-sampling Technique
- ADASYN: Adaptive Synthetic Sampling
- Threshold optimization: Class-specific prediction thresholds
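Two of the techniques above are simple enough to write out directly. The sketch below assumes the "balanced" weighting heuristic, n_samples / (n_classes * class_count), which is the same formula scikit-learn uses for `class_weight="balanced"`, and one possible form of class-specific thresholding in which each probability is divided by its class threshold before taking the argmax. Both function names are hypothetical, and SMOTE and ADASYN are not shown.

```python
import numpy as np


def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


def predict_with_thresholds(proba, thresholds):
    """Pick the class whose probability is largest relative to its threshold."""
    scaled = np.asarray(proba, dtype=float) / np.asarray(thresholds, dtype=float)
    return scaled.argmax(axis=1)
```

Lowering the threshold of a minority class makes it win more often, trading precision for recall on that class.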
Models and feature selection:
- LightGBM: Gradient boosting with class weighting
- Random Forest: Ensemble of decision trees with balanced subsampling
- Voting Classifier: Ensemble of multiple models
- Feature Selection: SelectKBest with ANOVA F-value
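To show what the feature selection and voting steps compute, the sketch below writes out a one-way ANOVA F-score per feature and a probability-averaging ensemble by hand. It is an illustration only: the project is stated to use `SelectKBest`, the function names here are hypothetical, and soft voting (averaging class probabilities) is an assumption, since the text does not say which voting rule is used.

```python
import numpy as np


def anova_f_scores(X, y):
    """One-way ANOVA F statistic of each feature column against the class labels."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        ss_between += len(Xc) * (Xc.mean(axis=0) - grand_mean) ** 2
        ss_within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    df_between = len(classes) - 1
    df_within = len(X) - len(classes)
    return (ss_between / df_between) / (ss_within / df_within)


def select_k_best(X, y, k):
    """Return the sorted column indices of the k highest-scoring features."""
    scores = anova_f_scores(X, y)
    return np.sort(np.argsort(scores)[::-1][:k])


def soft_vote(probas, weights=None):
    """Average per-model class probabilities and return the argmax class index."""
    averaged = np.average(np.stack(probas), axis=0, weights=weights)
    return averaged.argmax(axis=1)
```

A feature whose class means differ strongly relative to its within-class spread gets a high F-score; passing `weights` to `soft_vote` lets a stronger model count for more.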
The system uses AAMI-standard evaluation metrics:
- Accuracy: Overall classification accuracy
- Precision: Positive predictive value for each class
- Recall: Sensitivity for each class
- F1-Score: Harmonic mean of precision and recall
- Confusion Matrix: Detailed class-wise performance
- jk index: Summary score that combines Cohen's kappa with the sensitivity and positive predictivity of the SVEB and VEB classes
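The metrics listed above can all be derived from a confusion matrix. The sketch below assumes rows are true classes and columns are predictions, and it assumes the jk index definition used in the cited Mondéjar-Guerra et al. paper: 0.5 times Cohen's kappa plus 0.125 times the sum of sensitivity and positive predictivity for SVEB and VEB. The function names and the default class positions (`sveb=1`, `veb=2`) are hypothetical.

```python
import numpy as np


def per_class_metrics(cm):
    """Precision, recall, and F1 per class; cm[i, j] counts true class i predicted as j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / np.maximum(cm.sum(axis=0), 1)
    recall = tp / np.maximum(cm.sum(axis=1), 1)
    f1 = np.zeros_like(tp)
    ok = (precision + recall) > 0
    f1[ok] = 2 * precision[ok] * recall[ok] / (precision[ok] + recall[ok])
    return precision, recall, f1


def cohen_kappa(cm):
    """Agreement between predictions and labels, corrected for chance agreement."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_observed = np.trace(cm) / n
    p_expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


def jk_index(cm, sveb=1, veb=2):
    """Assumed definition: 0.5 * kappa + 0.125 * (Se and P+ of SVEB and VEB)."""
    precision, recall, _ = per_class_metrics(cm)
    j = recall[sveb] + recall[veb] + precision[sveb] + precision[veb]
    return 0.5 * cohen_kappa(cm) + 0.125 * j
```

Under this definition a perfect classifier scores 1.0, because kappa is 1 and the four SVEB and VEB terms sum to 4.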
The Streamlit web app provides:
- ECG Signal Input: Upload CSV files or paste raw values
- Real-time Analysis: Instant classification results
- Visualizations: ECG signal plots and confidence scores
- Sample Data: Generate demo ECG signals for testing
- Detailed Reports: Class-wise performance metrics
If you use this code in your research, please cite:
@article{MONDEJARGUERRA201941,
title = {Heartbeat classification fusing temporal and morphological information of ECGs via ensemble of classifiers},
author = {Mondéjar-Guerra, V and Novo, J and Rouco, J and Penedo, M G and Ortega, M},
journal = {Biomedical Signal Processing and Control},
volume = {47},
pages = {41--48},
year = {2019},
doi = {10.1016/j.bspc.2018.08.007}
}
The MIT-BIH Arrhythmia Database has the following characteristics:
- Sampling Rate: 360 Hz
- Duration: 30 minutes per record
- Leads: 2 leads (MLII usually preferred)
- Records: 48 recordings from 47 subjects
- Annotations: Beat-level and rhythm-level annotations
The dataset is split using the inter-patient scheme, so that no record contributes beats to both training and testing. The four records that contain paced beats (102, 104, 107, 217) appear in neither set, which leaves 44 of the 48 records:
Training Set (22 records): 101, 106, 108, 109, 112, 114, 115, 116, 118, 119, 122, 124, 201, 203, 205, 207, 208, 209, 215, 220, 223, 230
Test Set (22 records): 100, 103, 105, 111, 113, 117, 121, 123, 200, 202, 210, 212, 213, 214, 219, 221, 222, 228, 231, 232, 233, 234
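The split above can be kept in code and checked for overlap before any training run. This is a minimal sketch: the record IDs are copied from the two lists, stored as strings to match the usage example, and the names `TRAIN_RECORDS`, `TEST_RECORDS`, and `split_of` are hypothetical.

```python
from typing import Optional

TRAIN_RECORDS = [
    "101", "106", "108", "109", "112", "114", "115", "116", "118", "119", "122",
    "124", "201", "203", "205", "207", "208", "209", "215", "220", "223", "230",
]
TEST_RECORDS = [
    "100", "103", "105", "111", "113", "117", "121", "123", "200", "202", "210",
    "212", "213", "214", "219", "221", "222", "228", "231", "232", "233", "234",
]

# Fail fast if a record ever ends up on both sides of the split.
overlap = set(TRAIN_RECORDS) & set(TEST_RECORDS)
if overlap:
    raise ValueError(f"records in both sets: {sorted(overlap)}")


def split_of(record_id: str) -> Optional[str]:
    """Return "train", "test", or None for a record that is in neither set."""
    if record_id in TRAIN_RECORDS:
        return "train"
    if record_id in TEST_RECORDS:
        return "test"
    return None
```

Running the overlap check at import time means a mistyped record ID that lands in both lists is caught before features are extracted.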
- This is a research implementation and should not be used for clinical decision-making without proper validation
- The model performance may vary with different ECG recording devices and conditions
- Always consult healthcare professionals for medical diagnosis
Contributions are welcome! Please feel free to submit a Pull Request.
- MIT-BIH Arrhythmia Database providers
- PhysioNet for maintaining the database
- Contributors to the open-source libraries used in this project
Disclaimer: This software is intended for research purposes only. It should not be used for medical diagnosis or treatment without consultation with qualified healthcare professionals.