A privacy-preserving machine learning simulation demonstrating how multiple hospitals can collaboratively train a model without sharing sensitive patient data.
This project is published on AI Advances (Medium): https://medium.com/ai-advances/federated-learning-simulation-ff71e68ab1b5
This project simulates a federated learning scenario where three hospitals with different dataset sizes train individual logistic regression models for Parkinson's disease prediction. The models are then aggregated into a single federated model while preserving data privacy.
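The aggregation step can be illustrated with a minimal sketch. It assumes the federated model is a sample-size-weighted average of each hospital's logistic regression parameters (the FedAvg approach). The text above does not say whether `federated_learning.py` weights by sample count or averages uniformly, so the helper name `federated_average` and the parameter values below are hypothetical.

```python
from typing import List, Sequence, Tuple


def federated_average(
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
    n_samples: Sequence[int],
) -> Tuple[List[float], float]:
    """Sample-size-weighted average of logistic regression parameters.

    Hypothetical helper: assumes FedAvg-style weighting, which may differ
    from the aggregation implemented in federated_learning.py.
    """
    if not weights or not (len(weights) == len(biases) == len(n_samples)):
        raise ValueError("need one weight vector, bias and sample count per client")
    n_features = len(weights[0])
    if any(len(w) != n_features for w in weights):
        raise ValueError("all weight vectors must have the same length")
    total = sum(n_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    # Each client's contribution is proportional to its number of samples.
    avg_weights = [
        sum(n * w[j] for w, n in zip(weights, n_samples)) / total
        for j in range(n_features)
    ]
    avg_bias = sum(n * b for b, n in zip(biases, n_samples)) / total
    return avg_weights, avg_bias


# Illustrative parameters for three clients with two features each.
global_w, global_b = federated_average(
    weights=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    biases=[0.0, 1.0, 2.0],
    n_samples=[1, 1, 2],
)
print(global_w, global_b)
```

Weighting by sample count gives larger hospitals more influence on the shared model; a uniform average would treat all three hospitals equally regardless of size.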
├── data/ # Original Parkinson's dataset
├── fake_hospitals_data/ # Simulated hospital datasets
├── shared_folder/ # Only folder shared between the notebooks
├── Hospital_1.ipynb
├── Hospital_2.ipynb
├── Hospital_3.ipynb
├── aggregation_and_testing.ipynb # Federated aggregation & evaluation
├── federated_learning.py # Core federated learning classes
├── hospital_model_trainer.py # Individual hospital training logic
└── Notebook 0 - Data Preparation.ipynb # Dataset splitting simulation
This is a simulation in which each hospital notebook represents a different medical institution. The notebooks do not communicate directly; they exchange only model parameters (weights and bias) through the shared_folder.
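The exchange through the shared folder could look roughly like the sketch below. The JSON layout, the `*_params.json` file names, and the helpers `publish_parameters` and `collect_parameters` are assumptions made for illustration; the notebooks may serialize parameters differently. The demo writes to a temporary directory rather than the real shared_folder, and the weight values are made up.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def publish_parameters(shared_dir: Path, hospital_id: str,
                       weights: List[float], bias: float, n_samples: int) -> Path:
    """Write one hospital's model parameters (never raw patient data)."""
    shared_dir.mkdir(parents=True, exist_ok=True)
    path = shared_dir / f"{hospital_id}_params.json"  # hypothetical file name
    payload = {
        "hospital": hospital_id,
        "weights": weights,
        "bias": bias,
        "n_samples": n_samples,
    }
    path.write_text(json.dumps(payload))
    return path


def collect_parameters(shared_dir: Path) -> Dict[str, dict]:
    """Read every published parameter file, keyed by hospital id."""
    payloads = (json.loads(f.read_text()) for f in sorted(shared_dir.glob("*_params.json")))
    return {p["hospital"]: p for p in payloads}


# Demo in a temporary directory standing in for shared_folder.
with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp) / "shared_folder"
    publish_parameters(shared, "hospital_1", [0.5, -0.25], 0.1, 40)
    publish_parameters(shared, "hospital_2", [0.3, 0.0], -0.2, 42)
    received = collect_parameters(shared)

print(sorted(received))
```

Including the sample count alongside the parameters lets the aggregator weight each hospital's contribution without ever seeing its records.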
- Data Preparation (Optional - datasets already provided)
    jupyter notebook "Notebook 0 - Data Preparation.ipynb"
- Train Individual Hospital Models
    jupyter notebook Hospital_1.ipynb
    jupyter notebook Hospital_2.ipynb
    jupyter notebook Hospital_3.ipynb
- Federated Aggregation & Testing
    jupyter notebook aggregation_and_testing.ipynb
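As an alternative to opening each notebook interactively, the steps above could be run headlessly in the same order. The sketch below assumes `jupyter nbconvert` is installed and that the notebooks run unattended from the repository root; `build_commands` and `run_pipeline` are hypothetical helpers, not part of the project. By default it only prints the commands.

```python
import shlex
import subprocess
from typing import List

# Notebook names come from the project layout; the order follows the steps above.
NOTEBOOKS = [
    "Notebook 0 - Data Preparation.ipynb",  # optional: datasets already provided
    "Hospital_1.ipynb",
    "Hospital_2.ipynb",
    "Hospital_3.ipynb",
    "aggregation_and_testing.ipynb",
]


def build_commands(include_data_prep: bool = False) -> List[List[str]]:
    """Return one nbconvert command per notebook, in execution order."""
    notebooks = NOTEBOOKS if include_data_prep else NOTEBOOKS[1:]
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", nb]
        for nb in notebooks
    ]


def run_pipeline(include_data_prep: bool = False, dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    commands = build_commands(include_data_prep)
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


commands = run_pipeline(dry_run=True)
```

Calling `run_pipeline(dry_run=False)` would execute the hospital notebooks before the aggregation notebook, so the shared parameters exist by the time aggregation starts.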
- Source: Parkinson's Disease Dataset by gargmanas
- License: GNU Free Documentation License 1.3
- Features: 22 voice measurement features (jitter, shimmer, fundamental frequency, etc.)
- Total Samples: 195 (split into 3 hospitals + test set)
- Task: Binary classification (Healthy vs. Parkinson's)
- Hospital 1: 40 patients (7 healthy, 33 Parkinson's)
- Hospital 2: 42 patients (14 healthy, 28 Parkinson's)
- Hospital 3: 35 patients (6 healthy, 29 Parkinson's)
- Test Set: 78 patients (21 healthy, 57 Parkinson's)
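The split above can be checked with a few lines of arithmetic. The sketch below verifies that the counts add up to 195, derives the relative weight each hospital would receive if aggregation were weighted by sample count (an assumption, since the weighting scheme is not stated here), and computes the accuracy of always predicting the majority class on the test set, which is a useful floor when comparing models.

```python
# Sample counts taken from the split listed above: (healthy, Parkinson's).
splits = {
    "Hospital 1": (7, 33),
    "Hospital 2": (14, 28),
    "Hospital 3": (6, 29),
    "Test Set": (21, 57),
}

totals = {name: healthy + parkinsons for name, (healthy, parkinsons) in splits.items()}
assert sum(totals.values()) == 195  # matches the stated total

# Relative weights under sample-size-weighted averaging (assumed scheme).
train_total = sum(n for name, n in totals.items() if name != "Test Set")
fedavg_weights = {
    name: n / train_total for name, n in totals.items() if name != "Test Set"
}

# Accuracy obtained by always predicting the majority class on the test set.
test_healthy, test_parkinsons = splits["Test Set"]
majority_baseline = max(test_healthy, test_parkinsons) / (test_healthy + test_parkinsons)

for name, weight in fedavg_weights.items():
    print(f"{name}: {totals[name]} patients, aggregation weight {weight:.3f}")
print(f"Majority-class baseline accuracy on the test set: {majority_baseline:.3f}")
```

Because 57 of the 78 test patients have Parkinson's, a model needs to exceed roughly 73% accuracy to beat the trivial "always Parkinson's" prediction.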
- ✅ No raw patient data sharing - Each hospital keeps its data locally
- ✅ Only model parameters exchanged - Weights and bias shared via shared_folder
- ✅ Simulates real regulatory compliance - Mimics GDPR, HIPAA restrictions
- ✅ Collaborative without centralization - Hospitals work together while maintaining independence
This project demonstrates:
- Federated learning concepts and implementation
- Healthcare data privacy preservation
- Model aggregation techniques
- Performance comparison methodologies
- Real-world medical ML applications
This project is open source. The original Parkinson's dataset is licensed under GNU Free Documentation License 1.3.
Feel free to fork this project and submit pull requests for improvements!
- Original Dataset
- Federated Learning: Collaborative Machine Learning without Centralized Training Data