This project aims to classify personality types based on responses to a psychological test. Using a dataset of responses to a 60-question test (with scores ranging from -3 to 3), the goal is to develop a machine learning model that can accurately predict one of the 16 distinct personality types (MBTI types). This is a challenging multi-class classification task with applications in psychology, human resources, and self-assessment tools.
- Dataset: Kaggle - 60k responses of 16 Personalities Test (MBTI)
- Size: 59,999 entries, 62 columns.
- Key Features:
- 60 numerical features representing responses to a personality test.
- Approach:
- Data Cleaning: The dataset was clean with no missing values or duplicates. The `Response Id` column was dropped as it is a unique identifier.
- Exploratory Data Analysis: The code checks basic statistics, null values, duplicates, and unique values for all columns. The target variable `Personality` is well-balanced across all 16 classes.
- Label Encoding: Applied to the target `Personality` column to convert it into a numerical format for multi-class classification.
- Multi-class Classification: The target variable `Personality` has 16 distinct categories.
- Models Used:
- Logistic Regression, Ridge Classifier, SVC, Random Forest, XGBoost, AdaBoost, Gradient Boosting, Bagging, Decision Tree.
- Best Accuracy:
- 97.7% with XGBoost Classifier.
- 97.4% with Random Forest Classifier.
- 94.5% with Gradient Boosting Classifier.
- The high accuracies of the ensemble models suggest that the test responses carry strong discriminative signal for personality classification.
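
The approach described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual notebook code. It uses a small synthetic DataFrame in place of the Kaggle CSV, and several things are assumptions made for the example: the question column names `Q1` to `Q60`, the helper `make_fake_responses`, the toy labelling rule, and the use of only four type labels instead of 16. A Random Forest stands in for the full model list, and the printed accuracy says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def make_fake_responses(n_rows=400, n_questions=60, seed=0):
    """Build a synthetic stand-in for the dataset (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    answers = rng.integers(-3, 4, size=(n_rows, n_questions))  # scores in [-3, 3]
    df = pd.DataFrame(answers, columns=[f"Q{i + 1}" for i in range(n_questions)])
    # Toy rule (assumption): the signs of the first two answers pick one of four types.
    types = np.array(["INTJ", "INTP", "ENTJ", "ENTP"])
    idx = (df["Q1"] > 0).to_numpy().astype(int) * 2 + (df["Q2"] > 0).to_numpy().astype(int)
    df["Personality"] = types[idx]
    df.insert(0, "Response Id", np.arange(n_rows))
    return df


df = make_fake_responses()

# Data cleaning: check for missing values and duplicates, then drop the identifier.
print("Missing values:", int(df.isnull().sum().sum()))
print("Duplicate rows:", int(df.duplicated().sum()))
df = df.drop(columns=["Response Id"])

# Label encoding: map the string labels to integers 0..n_classes-1.
encoder = LabelEncoder()
y = encoder.fit_transform(df["Personality"])
X = df.drop(columns=["Personality"])

# Stratified split keeps the class proportions the same in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Held-out accuracy on synthetic data: {accuracy:.3f}")
```

To recover the original type names from predictions, `encoder.inverse_transform(model.predict(X_test))` maps the integer classes back to their string labels.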
- Automated Personality Assessment: Enables a quick and accurate classification of personality types from test responses.
- Psychological Research: Supports research in personality psychology and behavior analysis.
- Human Resources: Assists in team building, career guidance, and job-role matching.
- Self-Improvement: Provides a tool for individuals to better understand their own personality traits.
Clone the repository and extract the data from the zip file.
Install the necessary libraries:
pip install pandas numpy seaborn matplotlib scikit-learn xgboost

We welcome contributions to improve the project. You can help by:
- Performing comprehensive hyperparameter tuning and cross-validation for the top-performing models to ensure robustness.
- Investigating the impact of different preprocessing techniques.
- Adding explainability (e.g., SHAP or LIME) to understand which questions or groups of questions are the most critical for classifying a specific personality type.
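
As a possible starting point for the tuning and cross-validation item, the sketch below runs a small grid search with stratified k-fold cross-validation using scikit-learn. It is only an illustration under stated assumptions: the data is synthetic (generated with `make_classification`, 60 features and 4 classes), the parameter grid values are arbitrary, and a Random Forest is used as the example model. None of this reflects the project's existing code or results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in data (assumption): 60 features, 4 classes.
X, y = make_classification(
    n_samples=300,
    n_features=60,
    n_informative=10,
    n_classes=4,
    random_state=0,
)

# Stratified folds preserve the class balance in every train/validation split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Illustrative grid only; real tuning would cover more values and parameters.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 10]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=cv,
    scoring="accuracy",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Mean cross-validated accuracy: {search.best_score_:.3f}")
```

Reporting the mean score across folds, rather than a single train/test split, gives a better sense of whether a high accuracy is stable or depends on one particular split.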