ADHD Prediction and Gender Classification Model
Overview
This repository contains a machine learning pipeline developed for the WiDS Datathon 2025, focusing on predicting ADHD and classifying gender based on neurophysiological data. The study explores feature engineering, model selection, and explainability techniques to enhance ADHD diagnosis, particularly in females.
Why This Matters
ADHD in females is understudied and often misdiagnosed because symptoms vary in how they present. This model uses machine learning to identify distinct brain activity patterns, with the goal of supporting personalized interventions and earlier diagnosis.
Dataset
The dataset used in this project is part of the WiDS Datathon 2025. Register and download it from the competition page.
- Folders: Train, Test, and a Data Dictionary (available on Kaggle)
- Data Type: Neurophysiological features linked to ADHD diagnosis in connectome matrices
- Features Include: EEG signals, response times (quantitative), and demographic attributes (categorical)
- Preprocessing Steps: Handling missing values, outliers, and feature transformations
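The preprocessing steps listed above can be sketched in plain Python. This is a minimal illustration, not the repository's actual code: the helper names `impute_median` and `clip_iqr`, the choice of median imputation, and the 1.5 x IQR clipping rule are all assumptions made for the example.

```python
import math
import statistics


def impute_median(values):
    """Replace missing entries (None or NaN) with the median of observed values."""
    observed = [v for v in values if v is not None and not math.isnan(v)]
    median = statistics.median(observed)
    return [median if v is None or math.isnan(v) else v for v in values]


def clip_iqr(values, k=1.5):
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


# Hypothetical column of response times with gaps and one extreme value.
raw = [1.0, None, 3.0, float("nan"), 5.0]
filled = impute_median(raw)
clipped = clip_iqr([1, 2, 3, 4, 5, 100])
```

In a real pipeline the imputation and clipping statistics would be fit on the training split only and then reused on the test split, so that no information leaks from test to train.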
Methodology & Approach
The model development follows a structured machine learning pipeline:
1. Exploratory Data Analysis (EDA): Understanding feature distributions, correlations, and missing-value patterns.
2. Feature Engineering (FE): Creating meaningful features and reducing dimensionality.
3. Data Preprocessing: Encoding categorical variables, scaling numerical data, and handling imbalanced classes.
4. Model Selection: Training and comparing multiple models:
   - Logistic Regression
   - Random Forest
   - Gradient Boosting (best performing)
5. Hyperparameter Tuning: Optimized models using GridSearchCV / Optuna.
6. Evaluation Metrics: Performance measured using:
   - Accuracy, Precision, Recall, F1-score (classification)
   - ROC-AUC score (for ADHD prediction)
   - SHAP and LIME for interpretability
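To make the evaluation metrics in step 6 concrete, here is a small dependency-free sketch of how they are computed for a binary task. The function names `classification_metrics` and `roc_auc` are hypothetical and written for this example; in practice a library such as scikit-learn would normally be used instead.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and scores, for illustration only.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]
metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

The pairwise form of ROC-AUC is quadratic in the number of samples, which is fine for illustration but slower than the sort-based implementations found in libraries.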
Results & Insights
Best Performing Model: Gradient Boosting Classifier
Key Findings:
- Feature 22 in the brain matrices had the most impact on ADHD prediction.
- Gender classification improved significantly with Feature Y.
- Addressing class imbalance significantly enhanced model performance.
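This README does not state which technique was used to address class imbalance. One common option, shown here purely as an assumed example, is inverse-frequency class weighting: each class gets the weight `n_samples / (n_classes * class_count)`, which is the same heuristic scikit-learn applies for `class_weight="balanced"`. The helper name `balanced_class_weights` is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} with weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in counts.items()}


# Made-up labels: three majority-class samples, one minority-class sample.
weights = balanced_class_weights([0, 0, 0, 1])
```

The rarer class receives the larger weight, so misclassifying it costs more during training. Resampling methods (oversampling the minority class or undersampling the majority class) are an alternative with the same aim.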
How to Run the Project
Clone the Repository
git clone https://github.com/LABOSO123/adhd-prediction-model.git
cd adhd-prediction-model
Install Dependencies
pip install -r requirements.txt
Run the Model
python train_model.py
Make Predictions
python predict.py --input data/sample_input.csv
Repository Structure
adhd-prediction-model/
├── data/             # Dataset (or instructions on where to get it)
├── notebooks/        # Jupyter Notebooks for EDA and experimentation
├── src/              # Source code for data preprocessing & modeling
├── models/           # Trained model files
├── train_model.py    # Script to train the model
├── predict.py        # Script to generate predictions
├── requirements.txt  # Dependencies
└── README.md         # Project documentation
Future Work
Incorporate deep learning techniques (CNNs, RNNs) for improved prediction.
Deploy the model using Flask / FastAPI for real-world application.
Implement feature importance analysis for better interpretability.
Acknowledgments
WiDS Datathon organizers for the dataset and challenge.
Let's Connect!
LinkedIn: Faith Chemutai Laboso
GitHub: LABOSO123
Feel free to fork this repository, contribute, or reach out with any questions!