🧪 Toxic Comment Classifier is a simple yet effective machine learning project that detects toxic comments in both English and Russian.
It uses classical NLP techniques (TF-IDF + Logistic Regression) for real-time text classification.
Whether you're building a moderation system or just exploring NLP, this project is a great starting point.
The script:
- Downloads and extracts English and Russian toxic comment datasets.
- Merges them into training and testing sets.
- Uses TfidfVectorizer to convert text into numerical features.
- Trains a logistic regression model.
- Saves the model and vectorizer to model.pkl.
- Allows the user to input a comment and checks if it is toxic.
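The training steps above can be sketched as follows. This is an illustrative outline, not the repo's actual `main.py`: the toy corpus and the `(vectorizer, model)` tuple stored in `model.pkl` are assumptions for demonstration.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy corpus standing in for the merged English/Russian datasets
# (the real data comes from the Kaggle downloads listed below).
texts = ["you are awful", "have a nice day", "ты ужасен", "хорошего дня"]
labels = [1, 0, 1, 0]  # 1 = toxic, 0 = non-toxic

# Convert text to TF-IDF features and fit a logistic regression classifier.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
model = LogisticRegression()
model.fit(X, labels)

# Persist both objects so later runs can skip retraining
# (mirrors the model.pkl step above; storage format is assumed).
with open("model.pkl", "wb") as f:
    pickle.dump((vectorizer, model), f)
```

Keeping the vectorizer alongside the model matters: predictions on new text must use the exact vocabulary learned during training.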
Jigsaw Toxic Comment Classification Challenge:
https://www.kaggle.com/datasets/julian3833/jigsaw-toxic-comment-classification-challenge
Russian Language Toxic Comments:
https://www.kaggle.com/datasets/blackmoon/russian-language-toxic-comments
.
└── Toxic Comment Classifier AI
├── dataset
├── .gitattributes
├── .gitignore
├── LICENSE
├── main.py
├── model.pkl
├── README.md
└── requirements.txt
Clone the repository:
git clone https://github.com/pashudzu/ToxicCommentClassificationAI.git
cd ToxicCommentClassificationAI
Install dependencies:
pip install -r requirements.txt
Run the script:
python main.py
| Comment | Classification |
|---|---|
| "You're stupid and nobody likes you!" | ❌ Toxic |
| "Have a great day!" | ✅ Not toxic |
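Classifying a single comment like the examples above amounts to transforming it with the fitted vectorizer and calling `predict`. A minimal sketch (the `classify` helper and the tiny stand-in model are hypothetical, not functions from this repo):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def classify(comment, vectorizer, model):
    """Label one comment using an already-fitted vectorizer and model."""
    return "Toxic" if model.predict(vectorizer.transform([comment]))[0] == 1 else "Not toxic"

# Tiny stand-in model for demonstration; the real script loads model.pkl instead.
train = ["you are stupid and nobody likes you", "have a great day"]
vec = TfidfVectorizer().fit(train)
clf = LogisticRegression().fit(vec.transform(train), [1, 0])

print(classify("Have a great day!", vec, clf))
```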
The model prints the accuracy score after training.
- Python 3
- scikit-learn
- NLTK
- pickle
- TF-IDF vectorization
- Logistic Regression
- ✅ Supports both English and Russian comments.
- 🧪 Uses only the `toxic` label (binary classification).
- 💾 Saves the trained model to avoid retraining on each run.
- 🚀 Skips training entirely if a saved model already exists.
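The load-or-train behavior can be implemented with a simple existence check on the pickle file. A sketch under the assumption that `model.pkl` holds a `(vectorizer, model)` pair; the repo's actual format may differ:

```python
import os
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def load_or_train(texts, labels, path="model.pkl"):
    """Reuse a previously saved model if present; otherwise train and save one."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)  # skip retraining on subsequent runs
    vectorizer = TfidfVectorizer()
    model = LogisticRegression().fit(vectorizer.fit_transform(texts), labels)
    with open(path, "wb") as f:
        pickle.dump((vectorizer, model), f)
    return vectorizer, model
```

Note that pickle files are only safe to load when you trust their origin, so ship the training script rather than the `.pkl` when in doubt.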
This project is licensed under the MIT License. Use it freely.

