Classification_of_Toxic_Comments

Introduction: This challenge was posted on Kaggle by Jigsaw in collaboration with Google (both part of Alphabet). I chose this dataset in order to classify rude comments online so that they can be removed or blocked. All the comments were drawn from Wikipedia talk pages, where people can post their opinions and experiences. By implementing such a classifier, we can reduce online harassment and make people freer to post publicly.

Aim: To classify comments into 6 categories based on the type and severity of toxicity.

Dataset Info: We have 6 classification output variables:

Toxic
Severe-toxic
Obscene
Threat
Insult
Identity-hate
In addition, there is an ID variable and one input variable, comment_text.
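
To make the layout described above concrete, here is a minimal sketch that counts how often each label occurs. It assumes the six outputs are stored as 0/1 columns, that a comment may carry more than one label, and that the column names are the lowercase, underscore-separated forms of the labels listed above; the function name `label_counts` and the sample rows are made up for illustration and are not taken from train.csv.

```python
import csv
import io
from collections import Counter

# Assumed column names, derived from the label list above; the real header may differ.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def label_counts(csv_file):
    """Count how many comments carry each label, plus how many carry none."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        active = [label for label in LABELS if row[label] == "1"]
        counts.update(active)
        if not active:
            counts["(no label)"] += 1
    return counts

# Tiny made-up sample standing in for train.csv.
sample = io.StringIO(
    "id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate\n"
    "a1,thanks for the edit,0,0,0,0,0,0\n"
    "a2,you are an idiot,1,0,0,0,1,0\n"
    "a3,shut up you fool,1,0,1,0,1,0\n"
)
counts = label_counts(sample)
print(counts)
```

To run it on the real file, `open("train.csv", newline="", encoding="utf-8")` can be passed in place of the in-memory sample, provided the header matches the assumed names.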

We used the pandas_profiling library to generate an overview report of the dataset.

Feature Extraction: Feature extraction is the process of taking a list of words from the text data and transforming it into a feature set that is used as predictors for classification. There are many ways to extract features using NLP techniques; in this project I used the following two approaches:

Count Vectorization
Term Frequency - Inverse Document Frequency (TF-IDF)
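
The two approaches can be sketched in plain Python as follows. This is an illustration of the ideas, not the code used in the notebook: the whitespace tokenizer, the function names and the unsmoothed idf formula ln(N / df) are all assumptions made for the example.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def count_vectorize(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = []
    for doc in tokenized:
        row = [0] * len(vocab)
        for tok, n in Counter(doc).items():
            row[index[tok]] = n
        matrix.append(row)
    return vocab, matrix

def tfidf(matrix):
    """Weight raw counts by idf = ln(N / df), the textbook unsmoothed form."""
    n_docs = len(matrix)
    n_terms = len(matrix[0])
    # df[j] = number of documents that contain term j at least once
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    return [
        [row[j] * math.log(n_docs / df[j]) for j in range(n_terms)]
        for row in matrix
    ]

docs = ["you are rude", "you are kind", "rude rude comment"]
vocab, counts = count_vectorize(docs)
weights = tfidf(counts)
print(vocab)
print(counts)
print(weights)
```

Count vectorization keeps the raw frequency of each word, while TF-IDF scales those frequencies down for words that appear in many documents. Library implementations often smooth the idf term and normalise each row, so their numbers will differ from this sketch.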

Classification Models:

Naïve Bayes
Logistic Regression
Random Forest
Gradient Boosting
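
As an illustration of how one of these models can handle six output variables, the sketch below trains one binary classifier per label, using Naïve Bayes because it is the simplest of the four to write out. This is an assumption about the setup, not the notebook's implementation: the class `BinaryNaiveBayes`, the helper `predict_labels`, the add-one smoothing and the toy training comments are all hypothetical.

```python
import math
from collections import Counter

class BinaryNaiveBayes:
    """Multinomial Naive Bayes for one yes/no label, with add-one smoothing.

    Assumes both classes (0 and 1) appear in the training labels.
    """

    def fit(self, docs, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(doc.lower().split())
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def _log_score(self, tokens, y):
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[y] / total_docs)  # log prior
        denom = sum(self.word_counts[y].values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # ignore words never seen in training
                score += math.log((self.word_counts[y][tok] + 1) / denom)
        return score

    def predict(self, doc):
        tokens = doc.lower().split()
        return int(self._log_score(tokens, 1) > self._log_score(tokens, 0))

# Made-up training comments with two of the six labels, for brevity.
docs = [
    "you are an idiot",
    "thanks for the helpful edit",
    "i will hurt you",
    "great article thanks",
    "you idiot i will hurt you",
]
labels = {
    "insult": [1, 0, 0, 0, 1],
    "threat": [0, 0, 1, 0, 1],
}

# One independent binary model per label.
models = {name: BinaryNaiveBayes().fit(docs, ys) for name, ys in labels.items()}

def predict_labels(doc):
    """Return every label whose model fires for this comment."""
    return [name for name, model in models.items() if model.predict(doc)]

print(predict_labels("you are an idiot"))
print(predict_labels("thanks for the edit"))
```

Training the labels independently lets a single comment receive several labels or none. The same per-label pattern would apply to the other listed models.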

Final Model:

Random Forest
