Benkapner/federated-learning-simulation

Federated Learning Simulation

A privacy-preserving machine learning simulation demonstrating how multiple hospitals can collaboratively train a model without sharing sensitive patient data.

This project is published on AI Advances (Medium): https://medium.com/ai-advances/federated-learning-simulation-ff71e68ab1b5

🎯 Project Overview

This project simulates a federated learning scenario where three hospitals with different dataset sizes train individual logistic regression models for Parkinson's disease prediction. The models are then aggregated into a single federated model while preserving data privacy.

🏗️ Project Structure

├── data/                          # Original Parkinson's dataset
├── fake_hospitals_data/           # Simulated hospital datasets
├── shared_folder/                 # the only folder that is shared between the notebooks
├── Hospital_1.ipynb             
├── Hospital_2.ipynb              
├── Hospital_3.ipynb             
├── aggregation_and_testing.ipynb # Federated aggregation & evaluation 
├── federated_learning.py         # Core federated learning classes
├── hospital_model_trainer.py     # Individual hospital training logic
└── Notebook 0 - Data Preparation.ipynb # Dataset splitting simulation

🚀 How to Run

This is a simulation in which each hospital notebook represents a different medical institution. The notebooks never communicate directly; they exchange only model parameters through the shared_folder.
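The exchange through shared_folder can be pictured with a minimal sketch. The JSON layout, the file naming pattern, and the function names `export_parameters` and `load_all_parameters` are assumptions made for illustration; the repository's own notebooks may store parameters differently.

```python
import json
from pathlib import Path


def export_parameters(shared_dir, hospital_id, weights, bias, n_samples):
    """Write one hospital's model parameters (never raw patient rows) to the shared folder.

    Hypothetical helper: the file name and JSON keys are illustrative assumptions.
    """
    shared_dir = Path(shared_dir)
    shared_dir.mkdir(parents=True, exist_ok=True)
    payload = {
        "hospital_id": hospital_id,
        "weights": [float(w) for w in weights],  # one coefficient per feature
        "bias": float(bias),                     # intercept of the logistic model
        "n_samples": int(n_samples),             # size of the local training set
    }
    path = shared_dir / f"hospital_{hospital_id}_params.json"
    path.write_text(json.dumps(payload, indent=2))
    return path


def load_all_parameters(shared_dir):
    """Read every hospital's parameter file, sorted by file name."""
    files = sorted(Path(shared_dir).glob("hospital_*_params.json"))
    return [json.loads(p.read_text()) for p in files]
```

In this sketch each hospital notebook would call `export_parameters` once after training, and the aggregation notebook would call `load_all_parameters` on the same directory. Only coefficients, the intercept, and a sample count cross the boundary.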

Step-by-Step Execution

  1. Data Preparation (Optional - datasets already provided)

    jupyter notebook "Notebook 0 - Data Preparation.ipynb"
  2. Train Individual Hospital Models

    jupyter notebook Hospital_1.ipynb
    jupyter notebook Hospital_2.ipynb  
    jupyter notebook Hospital_3.ipynb
  3. Federated Aggregation & Testing

    jupyter notebook aggregation_and_testing.ipynb
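As a rough picture of what the aggregation step in step 3 could look like, the sketch below averages logistic regression parameters weighted by each hospital's sample count (the FedAvg rule). This is an assumption: the README does not say which aggregation rule federated_learning.py uses, and the names `federated_average` and `predict_proba` are hypothetical.

```python
import numpy as np


def federated_average(client_params):
    """Sample-size-weighted average of per-hospital weights and biases (FedAvg-style).

    Each entry is a dict with "weights", "bias", and "n_samples" keys
    (an assumed layout, not taken from the repository).
    """
    total = sum(p["n_samples"] for p in client_params)
    weights = sum(
        np.asarray(p["weights"], dtype=float) * p["n_samples"] for p in client_params
    ) / total
    bias = sum(p["bias"] * p["n_samples"] for p in client_params) / total
    return weights, float(bias)


def predict_proba(X, weights, bias):
    """Probability of the positive class under the aggregated logistic model."""
    z = np.asarray(X, dtype=float) @ weights + bias
    return 1.0 / (1.0 + np.exp(-z))
```

Weighting by sample count lets larger hospitals contribute more to the federated model; an unweighted mean would be a reasonable alternative when the local datasets are of similar size, as they are here.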

🔬 Dataset Information

  • Source: Parkinson's Disease Dataset by gargmanas
  • License: GNU Free Documentation License 1.3
  • Features: 22 voice measurement features (jitter, shimmer, fundamental frequency, etc.)
  • Total Samples: 195 (split into 3 hospitals + test set)
  • Task: Binary classification (Healthy vs. Parkinson's)

Hospital Data Distribution

  • Hospital 1: 40 patients (7 healthy, 33 Parkinson's)
  • Hospital 2: 42 patients (14 healthy, 28 Parkinson's)
  • Hospital 3: 35 patients (6 healthy, 29 Parkinson's)
  • Test Set: 78 patients (21 healthy, 57 Parkinson's)
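The counts above can be turned into two quantities worth keeping in mind: each hospital's share of the 117 training patients, and the class balance of each split. The sketch below computes both. Using the share as an aggregation weight is an assumption (it would apply under sample-size-weighted averaging), and the names `HOSPITALS`, `TEST_SET`, and `summarize` are illustrative.

```python
# Patient counts copied from the distribution listed above.
HOSPITALS = {
    "Hospital 1": {"healthy": 7, "parkinsons": 33},
    "Hospital 2": {"healthy": 14, "parkinsons": 28},
    "Hospital 3": {"healthy": 6, "parkinsons": 29},
}
TEST_SET = {"healthy": 21, "parkinsons": 57}


def summarize(counts):
    """Return the split size and the fraction of Parkinson's cases in it."""
    n = counts["healthy"] + counts["parkinsons"]
    return {"n": n, "parkinsons_rate": counts["parkinsons"] / n}


# Total number of training patients across the three hospitals (40 + 42 + 35).
train_total = sum(summarize(c)["n"] for c in HOSPITALS.values())

# Each hospital's share of the training data; this would be its weight
# under sample-size-weighted averaging (an assumed aggregation rule).
train_share = {
    name: summarize(c)["n"] / train_total for name, c in HOSPITALS.items()
}

if __name__ == "__main__":
    for name, c in HOSPITALS.items():
        s = summarize(c)
        print(f"{name}: n={s['n']}, share={train_share[name]:.3f}, "
              f"Parkinson's rate={s['parkinsons_rate']:.3f}")
    t = summarize(TEST_SET)
    print(f"Test Set: n={t['n']}, Parkinson's rate={t['parkinsons_rate']:.3f}")
```

Every split is skewed toward Parkinson's cases. On the test set, a classifier that always predicts Parkinson's would already score 57/78, about 73% accuracy, so accuracy is best read alongside per-class metrics.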

🛡️ Privacy-Preserving Features

✅ No raw patient data sharing - Each hospital keeps their data locally
✅ Only model parameters exchanged - Weights and bias shared via shared_folder
✅ Simulates real regulatory compliance - Mimics GDPR, HIPAA restrictions
✅ Collaborative without centralization - Hospitals work together while maintaining independence

🎓 Educational Value

This project demonstrates:

  • Federated learning concepts and implementation
  • Healthcare data privacy preservation
  • Model aggregation techniques
  • Performance comparison methodologies
  • Real-world medical ML applications

📄 License

This project is open source. The original Parkinson's dataset is licensed under GNU Free Documentation License 1.3.

🤝 Contributing

Feel free to fork this project and submit pull requests for improvements!

📚 References

  • Original Dataset
  • Federated Learning: Collaborative Machine Learning without Centralized Training Data

About

This repository simulates one round of horizontal federated learning, in which three different sources collaboratively train an ML model without sharing raw data.
