
Implementing the Logistic Regression and Naive Bayes algorithms from scratch, and using them for sentiment analysis on IMDB comments.


Christoforos00/IMDBClassification


IMDBClassification

In this project we implement the SGD Logistic Regression and Naive Bayes algorithms, and use them to classify IMDB comments.

Our dataset contains 50000 .txt files of IMDB comments, each labeled as positive or negative. The main project is executed in mainNotebook.ipynb. First, we preprocess the files so that only alphanumeric characters and spaces are kept. Then we build a dictionary from the n most frequent words in the dataset, skipping the first m. We settled on n=2000 and m=0, which means that the dictionary contains the 2000 most frequent words. We convert each comment to a 1x2000 vector that records the presence or absence of each dictionary word in that comment. After converting all the comments, we tune the hyperparameters (the parameters of our algorithms as well as the n and m values) using the CV data. Finally, for each algorithm we compute the learning curves and the accuracy (>80%).
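The preprocessing and vectorization steps described above can be sketched roughly as follows. This is a minimal illustration and not the code in dataUtils.py: the function names are hypothetical, lowercasing is an added assumption, words are ranked by total occurrence count, and "skipping the first m" is read as taking ranks m+1 through m+n.

```python
import re
from collections import Counter


def clean_text(text):
    """Keep only letters, digits and spaces (lowercasing is an assumption)."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower())


def build_vocabulary(texts, n=2000, m=0):
    """Return n frequent words after skipping the m most frequent ones."""
    counts = Counter()
    for text in texts:
        counts.update(clean_text(text).split())
    ranked = [word for word, _ in counts.most_common()]
    # Assumed reading of "the m first are skipped": drop ranks 1..m.
    return ranked[m:m + n]


def vectorize(text, vocabulary):
    """Binary vector: 1 if the vocabulary word occurs in the text, else 0."""
    present = set(clean_text(text).split())
    return [1 if word in present else 0 for word in vocabulary]


if __name__ == "__main__":
    comments = ["A great, great movie!", "great acting", "bad movie"]
    vocab = build_vocabulary(comments, n=2, m=0)
    print(vocab)
    print(vectorize("Movie was bad.", vocab))
```

With n=2000 and m=0, each comment becomes a list of 2000 zeros and ones, one entry per dictionary word.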

From Logistic Regression we get the following diagrams:

[Figure 1] [Figure 2]
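For readers who want a feel for the training loop, here is a minimal sketch of logistic regression trained with stochastic gradient descent on binary feature vectors. It is an illustration under our own assumptions, not the contents of logReg.py: the function names, the default learning rate, the epoch count, and the optional L2 term are all hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, x):
    """Estimated probability that x belongs to the positive class."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_sgd(X, y, learning_rate=0.1, epochs=50, l2=0.0, seed=0):
    """Update the weights one example at a time (hyperparameters are assumed)."""
    rng = random.Random(seed)
    weights = [0.0] * len(X[0])
    bias = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a fresh random order
        for i in order:
            # Gradient of the log-loss for one example is (p - y) * x.
            error = predict_proba(weights, bias, X[i]) - y[i]
            for j, xj in enumerate(X[i]):
                weights[j] -= learning_rate * (error * xj + l2 * weights[j])
            bias -= learning_rate * error
    return weights, bias


def predict(weights, bias, x):
    """Label 1 (positive) when the estimated probability is at least 0.5."""
    return 1 if predict_proba(weights, bias, x) >= 0.5 else 0


if __name__ == "__main__":
    X = [[1, 0], [1, 0], [0, 1], [0, 1]]
    y = [1, 1, 0, 0]
    w, b = train_sgd(X, y)
    print([predict(w, b, x) for x in X])
```

The learning rate, number of epochs, and regularization strength are the kind of hyperparameters that would be tuned on the CV data.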

From Naive Bayes we get the following diagrams:

[Figure 3] [Figure 4]
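Because each comment is encoded as a presence/absence vector, a Bernoulli-style Naive Bayes is a natural fit. The sketch below assumes that variant with Laplace smoothing; whether naiveBayes.py uses the same variant is not stated here, and the function names and the alpha parameter are hypothetical. It also assumes both classes appear in the training data.

```python
import math


def train_naive_bayes(X, y, alpha=1.0):
    """Estimate class priors and P(word present | class) with smoothing."""
    n_features = len(X[0])
    model = {}
    for label in (0, 1):
        rows = [x for x, target in zip(X, y) if target == label]
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        probs = [
            (sum(row[j] for row in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[label] = {
            "log_prior": math.log(len(rows) / len(X)),
            "probs": probs,
        }
    return model


def log_score(model, label, x):
    """Log of prior times likelihood, treating words as independent."""
    params = model[label]
    score = params["log_prior"]
    for xj, p in zip(x, params["probs"]):
        score += math.log(p) if xj else math.log(1.0 - p)
    return score


def predict(model, x):
    """Pick the class with the larger log score."""
    return max((0, 1), key=lambda label: log_score(model, label, x))


if __name__ == "__main__":
    X = [[1, 0], [1, 1], [0, 1], [0, 0]]
    y = [1, 1, 0, 0]
    model = train_naive_bayes(X, y)
    print(predict(model, [1, 0]), predict(model, [0, 1]))
```

Working in log space avoids numerical underflow when multiplying 2000 per-word probabilities.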

The code files are:

• logReg.py, where the functions of the Logistic Regression algorithm are implemented

• naiveBayes.py, where the functions of the Naive Bayes algorithm are implemented

• dataUtils.py, which contains functions for editing and converting text

• mainNotebook.ipynb, where the main part of the project is executed

The dataset is located at https://ai.stanford.edu/~amaas/data/sentiment/.
