This project analyzes and predicts depression among students from factors such as academic pressure, work pressure, CGPA, study satisfaction, job satisfaction, sleep duration, and dietary habits.
The project consists of the following steps:
Data Loading and Inspection
- Loaded the dataset and inspected its structure, shape, and summary statistics.
- Checked for missing values and handled them appropriately.
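The loading and inspection step could look roughly like the sketch below. It is only an illustration: the inline CSV stands in for the real dataset file, the column names are assumptions based on the factors listed above, and median/mode imputation is one possible way to handle missing values, not necessarily the one used in this project.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real dataset file; column names are assumptions.
RAW_CSV = """Age,Gender,CGPA,Sleep Duration,Depression
21,Male,8.1,5-6 hours,1
24,Female,,7-8 hours,0
19,Female,7.4,,1
28,Male,6.9,7-8 hours,0
"""

df = pd.read_csv(io.StringIO(RAW_CSV))

# Structure, shape, and summary statistics.
df.info()
print(df.shape)
print(df.describe())

# Count missing values per column.
missing = df.isna().sum()
print(missing)

# One possible strategy (an assumption): median for numeric columns,
# most frequent value for categorical columns.
clean = df.copy()
clean["CGPA"] = clean["CGPA"].fillna(clean["CGPA"].median())
clean["Sleep Duration"] = clean["Sleep Duration"].fillna(
    clean["Sleep Duration"].mode()[0]
)
print(clean.isna().sum())
```

With the real data, `pd.read_csv` would point at the dataset file in the repository instead of the in-memory string.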
Data Preprocessing
- Encoded categorical features such as Gender, City, and Profession using One-Hot Encoding.
- Discretized the Age feature into meaningful age groups.
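As a minimal sketch of these two preprocessing steps, one-hot encoding can be done with `pd.get_dummies` and age discretization with `pd.cut`. The sample rows, the bin edges, and the group labels below are illustrative assumptions; the project's actual age groups are not stated here.

```python
import pandas as pd

# Small illustrative frame; the values are made up for the example.
df = pd.DataFrame(
    {
        "Age": [18, 22, 25, 31],
        "Gender": ["Male", "Female", "Female", "Male"],
        "City": ["Pune", "Delhi", "Pune", "Mumbai"],
        "Profession": ["Student", "Student", "Student", "Student"],
    }
)

# One-hot encode the categorical columns; each category becomes a 0/1 column.
encoded = pd.get_dummies(df, columns=["Gender", "City", "Profession"], dtype=int)

# Discretize Age into groups. Bin edges and labels are assumptions.
# With the default right=True, the bins are (17, 22], (22, 27], (27, 35].
age_bins = [17, 22, 27, 35]
age_labels = ["18-22", "23-27", "28-35"]
encoded["Age Group"] = pd.cut(encoded["Age"], bins=age_bins, labels=age_labels)

print(encoded)
```

Ages outside the chosen bin range would become missing values, so the edges should cover the full age range of the dataset.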
Exploratory Data Analysis (EDA)
- Visualized the distribution of age groups and their relationship with depression status.
- Created a correlation heatmap to understand the relationships between numerical features.
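The numbers behind both plots can be computed with pandas before any drawing happens. The sketch below uses made-up rows and assumed column names; `plot_heatmap` is a hypothetical helper that shows one way to render the correlation matrix with matplotlib, and it is defined but not called here.

```python
import pandas as pd

# Illustrative data; values and column names are assumptions.
df = pd.DataFrame(
    {
        "Age Group": ["18-22", "18-22", "23-27", "23-27", "28-35", "28-35"],
        "Academic Pressure": [5, 4, 3, 4, 2, 1],
        "CGPA": [7.1, 8.3, 6.5, 9.0, 7.8, 8.8],
        "Depression": [1, 1, 0, 1, 0, 0],
    }
)

# Distribution of age groups and share of depressed students per group.
group_counts = df["Age Group"].value_counts().sort_index()
depression_rate = df.groupby("Age Group")["Depression"].mean()

# Pairwise Pearson correlations between the numerical columns.
corr = df.select_dtypes(include="number").corr()

print(group_counts)
print(depression_rate)
print(corr.round(3))


def plot_heatmap(corr_matrix, path):
    """Hypothetical helper: save the correlation matrix as a heatmap image."""
    import matplotlib

    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    labels = list(corr_matrix.columns)
    fig, ax = plt.subplots()
    image = ax.imshow(corr_matrix.values, cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels, rotation=45, ha="right")
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    fig.colorbar(image, ax=ax)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_heatmap(corr, "heatmap.png")` would write the image to disk, assuming matplotlib is installed.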
Feature Engineering
- Created new features such as Stress Level by combining Academic Pressure and Work Pressure.
- Identified important features using feature importance scores from the RandomForest model.
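A rough sketch of both steps follows. How Academic Pressure and Work Pressure are combined is not specified above, so a plain sum is assumed here; the data and the target are synthetic and exist only to make the example runnable.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in data; column names follow the features named above.
df = pd.DataFrame(
    {
        "Academic Pressure": rng.integers(0, 6, n),
        "Work Pressure": rng.integers(0, 6, n),
        "CGPA": rng.uniform(5, 10, n).round(2),
    }
)

# Assumption: Stress Level is the sum of the two pressure scores.
df["Stress Level"] = df["Academic Pressure"] + df["Work Pressure"]

# Synthetic target, for illustration only.
df["Depression"] = (df["Stress Level"] + rng.normal(0, 1.5, n) > 5).astype(int)

X = df.drop(columns="Depression")
y = df["Depression"]

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances; they are non-negative and sum to 1.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Because Stress Level is derived from two other columns, the importance is shared among correlated features, which is worth keeping in mind when reading the ranking.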
Model Building and Evaluation
- Built a RandomForest model to predict depression status.
- Evaluated the model using accuracy, classification report, and confusion matrix.
- Performed cross-validation to ensure consistent model performance.
- Conducted hyperparameter tuning using RandomizedSearchCV to optimize the model.
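The workflow above can be sketched with scikit-learn as follows. The data comes from `make_classification` rather than the real dataset, and the split ratio, fold counts, and search space are assumptions chosen to keep the example quick, not the project's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import (
    RandomizedSearchCV,
    cross_val_score,
    train_test_split,
)

# Synthetic binary classification data standing in for the student dataset.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Baseline RandomForest model.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Accuracy, classification report, and confusion matrix on the test set.
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print(classification_report(y_test, y_pred))
print(cm)

# 5-fold cross-validation on the training data.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
print(f"CV accuracy: {cv_scores.mean():.2f} +/- {cv_scores.std():.2f}")

# Randomized hyperparameter search; the search space is an assumption.
param_distributions = {
    "n_estimators": [50, 100, 150],
    "max_depth": [None, 3, 5, 10],
    "min_samples_split": [2, 5, 10],
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=5,
    cv=3,
    scoring="accuracy",
    random_state=42,
)
search.fit(X_train, y_train)
print(search.best_params_)
```

The test set is held out from both cross-validation and the search, so the reported test accuracy is not influenced by the tuning.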
Model Interpretation and Saving
- Visualized the feature importances to understand the impact of different features on the model's predictions.
- Saved the trained model for future use.
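How the model is saved is not stated above; one common option for scikit-learn models is `joblib`, sketched below. The file name is hypothetical and the model is trained on synthetic data only so that the example runs on its own.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Train a small model on synthetic data, for illustration only.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Hypothetical file name, written to a temporary directory.
model_path = os.path.join(tempfile.mkdtemp(), "depression_rf_model.joblib")
joblib.dump(model, model_path)

# Reload the model and check that it predicts the same labels.
restored = joblib.load(model_path)
same_predictions = np.array_equal(model.predict(X), restored.predict(X))
print(same_predictions)
```

A saved model should be loaded with the same scikit-learn version that produced it, since the serialized format is not guaranteed to be stable across versions.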
Results
- The RandomForest model achieved an accuracy of 0.82 on the test set.
- The most important features identified by the model include Academic Pressure, Work Pressure, CGPA, and Study Satisfaction.
- The XGBoost model achieved an accuracy of 0.82 on the test set after hyperparameter tuning.
- Best hyperparameters for XGBoost: {'subsample': 0.8, 'n_estimators': 200, 'max_depth': 3, 'learning_rate': 0.1, 'colsample_bytree': 0.8}
- Correlation coefficients:
  - CGPA vs Depression: 0.022
  - Age vs Depression: -0.226
  - Financial Stress vs Depression: 0.364
  - Work/Study Hours vs Depression: 0.209
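For reference, coefficients like these can be computed as Pearson correlations, which is what pandas' `corr()` returns by default; whether the project used exactly this measure is an assumption. The sketch below implements the formula in plain Python, with `pearson_r` as a hypothetical helper and made-up sample values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values: a 1-5 stress score and a binary depression label.
financial_stress = [1, 2, 3, 4, 5, 5, 2, 1]
depression = [0, 0, 1, 1, 1, 0, 0, 0]

r = pearson_r(financial_stress, depression)
print(round(r, 3))
```

The coefficient ranges from -1 to 1, so a value such as 0.022 indicates almost no linear relationship, while 0.364 indicates a moderate positive one.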
Future Work
- Further optimize the model using more advanced techniques.
- Explore additional features and their impact on depression prediction.
- Deploy the model for real-time predictions.
- Clone the repository:
git clone https://github.com/Zeesejo/Depressed-students/