Depression Detection using TF-IDF & Machine Learning

This repository contains a lightweight text classification system that detects signs of depression in text. It takes a classical Natural Language Processing (NLP) approach: Term Frequency-Inverse Document Frequency (TF-IDF) for feature extraction and a Logistic Regression model for classification.

Overview

Unlike computationally expensive deep learning models, this system is quick to train and run, which makes it a practical baseline for text classification tasks. The system is split into two main components:

Training Script (Training.py): Processes the dataset, vectorizes the text using TF-IDF, trains a Logistic Regression classifier, and saves the trained model and vectorizer for future use.
Prediction Script (predict.py): Loads the pre-trained model and vectorizer to evaluate new text inputs, providing both a classification (Depressed / Not Depressed) and a confidence probability.

Key Features

Efficient Text Vectorization: Utilizes TF-IDF with up to 5,000 features (unigrams and bigrams) to capture meaningful word combinations.
Logistic Regression Classifier: Employs a classical, easily interpretable machine learning algorithm.
Pre-processing Integration: Handles basic text cleaning (lowercasing, removal of non-alphabetic characters) during inference.
Interactive & CLI Modes: The prediction script supports both command-line arguments and an interactive prompt.

Prerequisites

Python 3.x is required. It is highly recommended to use a virtual environment. Install the necessary dependencies via pip:

pip install pandas scikit-learn joblib numpy

Getting Started

1. Dataset Requirements

To train the model from scratch, you must provide a dataset named depression_dataset.csv in the root directory. The dataset should contain at least the following two columns:

clean_text: The textual data to be analyzed.
is_depression: The target label (1 for depressed, 0 for not depressed).
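
Before training, it can help to check that the dataset matches this layout. The sketch below is one possible check using pandas; the helper name validate_dataset and the sample rows are hypothetical and are not part of the repository.

```python
import pandas as pd

REQUIRED_COLUMNS = ["clean_text", "is_depression"]


def validate_dataset(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: check required columns and label values."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    # Rows without text or without a label cannot be used for training.
    df = df.dropna(subset=REQUIRED_COLUMNS)
    labels = set(df["is_depression"].unique())
    if not labels <= {0, 1}:
        raise ValueError(f"unexpected label values: {sorted(labels)}")
    return df


# Sample rows invented for illustration only.
sample = pd.DataFrame(
    {
        "clean_text": ["i feel empty", "lovely walk today", None],
        "is_depression": [1, 0, 0],
    }
)
checked = validate_dataset(sample)
print(len(checked))  # rows remaining after dropping the incomplete one
```

With a real file, the same check would be applied as validate_dataset(pd.read_csv("depression_dataset.csv")).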

2. Training the Model

Run the training script to build the model based on your dataset:

python Training.py

This script splits the data, fits the TF-IDF vectorizer, trains the Logistic Regression model, prints the accuracy and a classification report, and saves two files relative to the script:

depression_model.pkl
vectorizer.pkl

(Note: If the scripts read or write these files in a models directory, make sure that directory exists, and check that the save paths in Training.py match the load paths in predict.py.)
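
The training steps listed above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the contents of Training.py: the toy texts, the 75/25 split, random_state, max_iter, and the temporary output directory are all invented for the example.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Toy data, invented for illustration; the real script reads depression_dataset.csv.
texts = [
    "i feel hopeless and empty", "nothing matters anymore",
    "i cannot stop crying", "so sad and lonely tonight",
    "had a great day at work", "looking forward to the weekend",
    "dinner with friends was fun", "i love this sunny weather",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Split size and seed are assumptions, not values taken from the repository.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# Fit the vectorizer on training text only, then reuse it for the test text.
vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

model = LogisticRegression(max_iter=1000)
model.fit(X_train_vec, y_train)

predictions = model.predict(X_test_vec)
print("Accuracy:", accuracy_score(y_test, predictions))
print(classification_report(y_test, predictions, zero_division=0))

# Save both artifacts; a temporary directory stands in for the real location.
out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "depression_model.pkl")
vectorizer_path = os.path.join(out_dir, "vectorizer.pkl")
joblib.dump(model, model_path)
joblib.dump(vectorizer, vectorizer_path)
```

Saving the vectorizer alongside the model matters because new text has to be transformed with the same fitted vocabulary before it is classified.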

3. Making Predictions

Once the model is trained and saved, you can use the prediction script to classify new sentences. You can run it interactively:

$ python predict.py
Enter text: I am feeling so sad and lonely
Prediction: Depressed
Confidence: 0.8421

Or you can pass the text directly as command-line arguments:

$ python predict.py I am feeling so sad and lonely
Prediction: Depressed
Confidence: 0.8421
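
One way the two modes shown above might be handled is to join any command-line words into a single string and fall back to a prompt when none are given. The function get_text below is a hypothetical sketch, not code taken from predict.py.

```python
import sys


def get_text(argv, prompt=input):
    """Hypothetical helper: use CLI words if present, otherwise ask the user."""
    words = argv[1:]  # argv[0] is the script name
    if words:
        text = " ".join(words)
    else:
        text = prompt("Enter text: ")
    return text.strip()


# Simulated command line: python predict.py I am feeling so sad and lonely
example_argv = ["predict.py", "I", "am", "feeling", "so", "sad", "and", "lonely"]
print(get_text(example_argv))
```

In a real script the call would be get_text(sys.argv). Joining the arguments is what allows the sentence to be passed without quotes.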

Workflow Details

Text Preprocessing: During inference, the system automatically converts text to lowercase and strips non-alphabetic characters.
Threshold Adjustment: The prediction script uses a custom decision threshold of 0.3 rather than the usual 0.5. This biases the classifier toward recall: it flags more texts as showing signs of depression, at the cost of more false positives.
Artifact Generation: Uses joblib to serialize the trained model and the fitted vectorizer, so predictions use the same vocabulary that was learned during training.
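
The preprocessing and threshold steps above can be sketched in a few lines. The function names are hypothetical, and two details are assumptions: that stripped characters are replaced by spaces, and that a probability exactly equal to the threshold counts as Depressed. The actual predict.py may differ on both.

```python
import re

THRESHOLD = 0.3  # custom threshold stated above; the usual default is 0.5


def clean_text(text: str) -> str:
    """Lowercase the text and strip non-alphabetic characters."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)      # assumption: replace with a space
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace


def label_from_probability(p_depressed: float, threshold: float = THRESHOLD) -> str:
    """Map the positive-class probability to a label (assumes >= comparison)."""
    return "Depressed" if p_depressed >= threshold else "Not Depressed"


print(clean_text("I'm feeling SO sad... 24/7!"))
print(label_from_probability(0.35))                 # above the 0.3 threshold
print(label_from_probability(0.35, threshold=0.5))  # below the usual 0.5
```

A probability of 0.35 shows the effect of the lower threshold: it is labeled Depressed at 0.3 but would be labeled Not Depressed at 0.5.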

License

Please refer to the repository's licensing information for usage rights and restrictions.
