BhaveshBhakta/Personality-Classification-Using-ML

Personality Classification

Project Overview

This project classifies personality types from responses to a 60-question psychological test, where each answer is scored from -3 to 3. The goal is a machine learning model that accurately predicts one of the 16 MBTI personality types. This is a challenging multi-class classification task with applications in psychology, human resources, and self-assessment tools.


Technical Highlights

  • Dataset: Kaggle - 60k responses of 16 Personalities Test (MBTI)
  • Size: 59,999 entries, 62 columns.
  • Key Features:
    • 60 numerical features representing responses to a personality test.
  • Approach:
    • Data Cleaning: The dataset was clean with no missing values or duplicates. The Response Id column was dropped as it is a unique identifier.
    • Exploratory Data Analysis: Basic statistics, null values, duplicates, and unique values were checked for all columns. The target variable Personality is well-balanced across all 16 classes.
    • Label Encoding: Applied to the target Personality column to convert it into a numerical format for multi-class classification.
    • Multi-class Classification: The task is treated as a 16-class problem, one class per category of the target variable Personality.
    • Models Used:
      • Logistic Regression, Ridge Classifier, SVC, Random Forest, XGBoost, AdaBoost, Gradient Boosting, Bagging, Decision Tree.
  • Best Accuracy:
    • 97.7% with XGBoost Classifier.
    • 97.4% with Random Forest Classifier.
    • 94.5% with Gradient Boosting Classifier.
    • The very high accuracies of these ensemble models suggest that the test responses carry a strong discriminative signal for personality type.
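The approach listed above (drop the identifier, label-encode the target, fit a classifier) can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: it runs on synthetic data shaped like the dataset (62 columns), the question column names `Q1`..`Q60` and the rule used to generate labels are assumptions made only so the example is runnable, and Random Forest stands in for the full list of models.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def make_synthetic_responses(n_rows=2000, seed=0):
    """Build a stand-in frame with 62 columns (column names are hypothetical)."""
    rng = np.random.default_rng(seed)
    # 60 answers per row, each an integer score in [-3, 3].
    answers = rng.integers(-3, 4, size=(n_rows, 60))
    df = pd.DataFrame(answers, columns=[f"Q{i + 1}" for i in range(60)])
    # Assumed labelling rule, for illustration only: each of the four
    # letters comes from the sign of the sum over a block of 15 questions.
    letter_pairs = [("E", "I"), ("N", "S"), ("T", "F"), ("J", "P")]
    letters_per_axis = []
    for axis, (pos, neg) in enumerate(letter_pairs):
        axis_sum = answers[:, axis * 15:(axis + 1) * 15].sum(axis=1)
        letters_per_axis.append(np.where(axis_sum >= 0, pos, neg))
    df.insert(0, "Response Id", np.arange(n_rows))
    df["Personality"] = ["".join(parts) for parts in zip(*letters_per_axis)]
    return df


def train_and_score(df, seed=0):
    """Drop the identifier, encode the target, and fit one classifier."""
    X = df.drop(columns=["Response Id", "Personality"])
    encoder = LabelEncoder()
    y = encoder.fit_transform(df["Personality"])
    # Stratify so every personality type appears in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return encoder, model, accuracy


df = make_synthetic_responses()
encoder, model, accuracy = train_and_score(df)
print(f"classes: {len(encoder.classes_)}, held-out accuracy: {accuracy:.3f}")
```

To try this on the real data, replace `make_synthetic_responses()` with a `pandas.read_csv` call on the extracted dataset file. The accuracy printed here reflects the synthetic labels only and says nothing about the figures reported above.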

Purpose and Applications

  • Automated Personality Assessment: Enables quick, accurate classification of personality types from test responses.
  • Psychological Research: Supports research in personality psychology and behavior analysis.
  • Human Resources: Assists in team building, career guidance, and job-role matching.
  • Self-Improvement: Provides a tool for individuals to better understand their own personality traits.

Installation

Clone the repository and extract the data from the zip file.

Install the necessary libraries:

pip install pandas numpy seaborn matplotlib scikit-learn xgboost

Collaboration

We welcome contributions to improve the project. You can help by:

  • Performing comprehensive hyperparameter tuning and cross-validation for the top-performing models to ensure robustness.
  • Investigating the impact of different preprocessing techniques.
  • Adding explainability (e.g., SHAP or LIME) to understand which questions or groups of questions are the most critical for classifying a specific personality type.
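As a possible starting point for the first item, the sketch below combines stratified cross-validation with a small grid search using scikit-learn. It is only an illustration under stated assumptions: the data is synthetic, the target is a simplified 4-class stand-in rather than the 16 personality types, and the parameter grid is an arbitrary example, not a recommended search space.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

rng = np.random.default_rng(0)
# Synthetic stand-in: 800 rows of 60 answers scored from -3 to 3.
X = rng.integers(-3, 4, size=(800, 60))
# Hypothetical 4-class target built from two 30-question blocks,
# used only to keep the example small and fast.
first_half = (X[:, :30].sum(axis=1) >= 0).astype(int)
second_half = (X[:, 30:].sum(axis=1) >= 0).astype(int)
y = first_half * 2 + second_half

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Example grid only; a real search would cover more values and models.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 10]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=cv,
    scoring="accuracy",
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("mean cross-validated accuracy:", round(search.best_score_, 3))
```

Reporting the mean score across folds, rather than a single train/test split, gives a better sense of how stable a model's accuracy is.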
