Aritra-20/Twitter-Sentiment-Analysis

Twitter-Sentiment-Analysis

Abstract:

Sentiment analysis determines whether a given text expresses negative, positive, or neutral emotion. It is a form of text analytics that uses natural language processing (NLP) and machine learning, and it is also known as “opinion mining” or “emotion artificial intelligence”. Tweets are a rich source of sentiment data: analysing them helps in understanding public opinion on a wide variety of topics.

To build our dataset, we opened a Twitter Developer account and extracted raw tweets matching our requirements. We then compiled the collected data into a CSV file so that it could be loaded into a dataframe.

In the data preprocessing phase, we removed stop words, comments, URLs, and other unnecessary items from our dataset, using some popular built-in methods. We then calculated polarity scores to check that our dataset is balanced and not biased towards any polarity.
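The cleaning described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the function name `clean_tweet`, the short `STOP_WORDS` set, and the regular expressions are all assumptions. URLs are removed before punctuation here, because once punctuation is stripped a URL can no longer be recognised.

```python
import re
import string

# Illustrative stop-word list (assumption); a real run would use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for", "it", "this", "that"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # http(s) links and bare www links
DIGIT_RE = re.compile(r"\d+")                   # runs of digits
REPEAT_RE = re.compile(r"(.)\1{2,}")            # any character repeated 3+ times
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text):
    """Lower-case a tweet and strip URLs, digits, punctuation, repeats and stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)           # must run before punctuation is stripped
    text = DIGIT_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)     # removes ASCII punctuation only
    text = REPEAT_RE.sub(r"\1\1", text)    # "sooooo" becomes "soo"
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


cleaned = clean_tweet("This is sooooo GOOD!!! Visit https://t.co/abc123 in 2022")
print(cleaned)
```

Applied to a dataframe, the same function could be mapped over the “Tweets” column, for example with `df["Tweets"].apply(clean_tweet)`.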

For model building, we first split our dataset into training and testing parts. We then fitted a MultinomialNB model and obtained initial results. Since MultinomialNB tends to work best when the classes are sufficiently different from one another, we also trained an SGDClassifier to try to get a better result.
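A minimal sketch of this comparison with scikit-learn follows. The toy tweets, the labels, the helper name `build_pipeline`, and the `random_state` values are made up for illustration; the real project trains on its manually labelled dataset, and its exact settings are not shown in this README.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical, already-cleaned toy data standing in for the "Tweets" and "Support" columns.
tweets = [
    "stand with ukraine today", "ukraine deserves support", "support ukraine people",
    "ukraine flag everywhere", "help ukraine now",
    "russia has reasons", "support russia position", "russia side story",
    "russia flag everywhere", "back russia now",
]
support = ["Ukraine"] * 5 + ["Russia"] * 5

# 80/20 split; stratify keeps both classes present in the small test set.
X_train, X_test, y_train, y_test = train_test_split(
    tweets, support, test_size=0.2, random_state=42, stratify=support
)


def build_pipeline(classifier):
    """Bag-of-words counts, then TF-IDF weighting, then the given classifier."""
    return Pipeline([
        ("vect", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
        ("clf", classifier),
    ])


results = {}
for name, clf in [("MultinomialNB", MultinomialNB()),
                  ("SGDClassifier", SGDClassifier(random_state=42))]:
    model = build_pipeline(clf)
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    results[name] = accuracy_score(y_test, predicted)
    print(name, "accuracy:", results[name])
    print(classification_report(y_test, predicted, zero_division=0))
```

With so few samples the accuracies printed here mean nothing; the point is only the shape of the pipeline and how the two classifiers are swapped in and compared.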

To evaluate the models, we measured accuracy and produced a confusion matrix and a classification report.

Finally, we found that the overall accuracy of our model, and the accuracy for each class, can be improved by refining our dataset.

Steps of collecting the data and building the dataset:

Step 1: Get access to the Twitter API by creating a developer account

Step 2: Apply for a developer account with Twitter and get your Twitter API keys and Tokens

Step 3: Fetch data from Twitter API in Python

Step 4: Install tweepy, which provides a way to invoke certain HTTP endpoints without dealing with low-level details.

Step 5: Authenticate with your credentials, which become available once you have registered for a developer account. This step is required before any data can be fetched.

Step 6: Set up the search query describing the content for which we want to collect data.

Step 7: Collect the tweets and append them to a list

Step 8: Create a dataset as a pandas DataFrame

Step 9: Convert the dataset to a CSV file
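Steps 7 to 9 can be sketched as below. This is an illustration under assumptions, not the project's code: the helper `statuses_to_frame` is hypothetical, and the stand-in objects replace real API results so that the example runs without credentials. The commented lines show how the statuses might be obtained with tweepy 4.x, assuming API v1.1 search access.

```python
from types import SimpleNamespace

import pandas as pd

# With real credentials (assumption: tweepy 4.x, API v1.1 access), the statuses
# could come from something like:
#   auth = tweepy.OAuth1UserHandler(consumer_key, consumer_secret,
#                                   access_token, access_token_secret)
#   api = tweepy.API(auth)
#   statuses = tweepy.Cursor(api.search_tweets, q=query, lang="en",
#                            tweet_mode="extended").items(limit)


def statuses_to_frame(statuses):
    """Collect tweet texts into a list, then build a one-column DataFrame."""
    texts = []
    for status in statuses:
        # Extended-mode results expose full_text; otherwise fall back to text.
        text = getattr(status, "full_text", None) or getattr(status, "text", "")
        texts.append(text)
    return pd.DataFrame({"Tweets": texts})


# Stand-in objects that mimic the two attributes used above.
fake_statuses = [
    SimpleNamespace(full_text="first example tweet"),
    SimpleNamespace(text="second example tweet"),
]

frame = statuses_to_frame(fake_statuses)
csv_text = frame.to_csv(index=False)  # pass a file path instead to write a CSV file
print(csv_text)
```

The column is named “Tweets” to match the column the algorithm section refers to; labels such as “Support” and “War” were added manually afterwards.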

Dataset (first 12 lines) after manual labelling:

This is an image

Algorithm:

Step 1: START

Step 2: Import necessary libraries and packages

Step 3: Read the dataset into a pandas DataFrame

Step 4: Convert the contents of the column named “Tweets” into lower case

Step 5: Define a list of stop words

Step 6: Remove the stop words using the above-mentioned list

Step 7: Remove punctuation and special symbols

Step 8: Remove repeating characters

Step 9: Remove URLs/ Hyperlinks

Step 10: Remove numerical values

Step 11: Import nltk and download 'vader_lexicon'

Step 12: Import SentimentIntensityAnalyzer from nltk.sentiment.vader

Step 13: Create a new column named “polarity scores” containing the polarities of individual tweets from our dataset using SentimentIntensityAnalyzer

Step 14: Create a new column named “polarity” containing the overall compound polarities of the tweets

Step 15: Print the number of tweets in favor of Russia or Ukraine and in favor of War or No War

Step 16: Divide the dataset into training (80%) and testing (20%) sets

Step 17: Import CountVectorizer and TfidfTransformer from sklearn.feature_extraction.text, and MultinomialNB from sklearn.naive_bayes

Step 18: Import Pipeline from sklearn.pipeline

Step 19: Train the pipeline with the MultinomialNB model, using the “Tweets” column as input and the “Support” column as labels

Step 20: Predict the result and compare it with “Support” and find the accuracy

Step 21: Import SGDClassifier from sklearn.linear_model

Step 22: Train the pipeline with the SGDClassifier model, using the “Tweets” column as input and the “Support” column as labels

Step 23: Predict the result and compare it with “Support” and find the accuracy

Step 24: Repeat Step 19 to Step 23 for “Tweets” and “War”

Step 25: Import metrics from sklearn

Step 26: Print the classification report
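Steps 13 and 14 turn VADER output into an overall polarity per tweet. NLTK's `SentimentIntensityAnalyzer.polarity_scores` returns a dictionary with `neg`, `neu`, `pos`, and `compound` keys; the sketch below starts from dictionaries of that shape so that it runs without downloading the lexicon. The score values are invented, and the ±0.05 cut-off is the commonly used VADER convention, assumed here because the README does not state which threshold the project used.

```python
from collections import Counter


def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a polarity label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical values shaped like SentimentIntensityAnalyzer.polarity_scores(...) output.
polarity_scores = [
    {"neg": 0.0, "neu": 0.5, "pos": 0.5, "compound": 0.64},
    {"neg": 0.6, "neu": 0.4, "pos": 0.0, "compound": -0.57},
    {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0},
]

labels = [label_from_compound(scores["compound"]) for scores in polarity_scores]
counts = Counter(labels)  # a quick check of how balanced the polarities are
print(labels)
print(counts)
```

Counting the labels this way gives a simple view of whether the dataset leans towards one polarity.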

Output:

Polarity of Tweets -

This is an image

Number of tweets supporting Russia, Ukraine, War and No War -

This is an image
This is an image
This is an image
This is an image

Prediction Results - Accuracy measured:

Accuracy for support prediction using Multinomial NB -

This is an image

Accuracy for support prediction using SGD Classifier -

This is an image

Accuracy for War prediction using Multinomial NB -

This is an image

Accuracy for War prediction using SGD Classifier -

This is an image

Confusion Matrix -

This is an image

Classification Report -

This is an image

Conclusion:

● For Support for Russia/Ukraine, SGDClassifier gives better accuracy
● For Support for War (Yes/No), MultinomialNB gives better accuracy
● No. of tweets which want WAR: 499
● No. of tweets which do not want WAR: 506
● No. of tweets which support Russia: 490
● No. of tweets which support Ukraine: 468
