GitHub - konnatzeik/NLP-Sentiment-Analysis

NLP-Sentiment-Analysis: US-Airline Tweets

This repository contains a comparative study of various machine learning classifiers for natural language processing (NLP) sentiment analysis. The project evaluates model performance on a dataset of user opinions about US Airlines from X (formerly Twitter).

Project Overview

Objective: Classify the polarity of airline tweets using text classification techniques.

Dataset: Twitter_US_Airline_Sentiment.csv (attached to this repository)

Feature Extraction: TF-IDF vectorization, tested under three vocabulary constraints: a minimum document frequency of 5 (min_df=5), a maximum of 2,500 features (max_features=2500), and a maximum of 500 features (max_features=500).
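The three vocabulary constraints can be sketched with scikit-learn's `TfidfVectorizer`. This is a minimal illustration, not the notebook's code: the `VECTORIZER_SETTINGS` dictionary, the `build_vectorizers` helper, and the toy tweets are hypothetical, and all other vectorizer parameters are left at their defaults because the README does not state them.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical names for the three experiments; only min_df and
# max_features come from the README, everything else is a default.
VECTORIZER_SETTINGS = {
    "experiment_1": {"min_df": 5},           # keep terms found in >= 5 documents
    "experiment_2": {"max_features": 2500},  # keep the 2500 most frequent terms
    "experiment_3": {"max_features": 500},   # keep the 500 most frequent terms
}


def build_vectorizers():
    """Return one unfitted TfidfVectorizer per experiment."""
    return {
        name: TfidfVectorizer(**kwargs)
        for name, kwargs in VECTORIZER_SETTINGS.items()
    }


# Toy stand-in for the tweet text column (not taken from the dataset).
toy_tweets = [
    "flight delayed again",
    "great flight and friendly crew",
    "flight cancelled without notice",
    "love this flight",
    "worst flight ever",
    "flight was on time",
]

vocab_sizes = {}
for name, vectorizer in build_vectorizers().items():
    matrix = vectorizer.fit_transform(toy_tweets)
    vocab_sizes[name] = matrix.shape[1]  # number of columns = vocabulary size

print(vocab_sizes)
```

On a corpus this small only `min_df=5` actually prunes anything, since the toy vocabulary is far below both `max_features` caps. On the full dataset the caps would be the binding constraint.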

Models Evaluated:

Logistic Regression
Support Vector Machines (LinearSVC)
Random Forests
Feed-forward Neural Network

The classifiers are evaluated using 5-fold cross-validation. Performance is measured across three primary metrics:

Accuracy
F1-score
Fit time
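The evaluation protocol above could look roughly like the following sketch, which uses scikit-learn's `cross_validate` to collect accuracy, F1-score, and fit time over 5 folds. Several details are assumptions rather than facts from this repository: the `evaluate_models` helper is hypothetical, `MLPClassifier` stands in for the feed-forward neural network, the F1-score is averaged with `f1_weighted`, and all hyperparameters are illustrative.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def evaluate_models(texts, labels, vectorizer_kwargs, cv=5):
    """Cross-validate each classifier and return its mean metrics."""
    # Assumed hyperparameters; the README does not list them.
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "LinearSVC": LinearSVC(),
        "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
        "Neural Network": MLPClassifier(
            hidden_layer_sizes=(64,), max_iter=200, random_state=0
        ),
    }
    # Assumption: weighted F1, since the averaging mode is not stated.
    scoring = {"accuracy": "accuracy", "f1": "f1_weighted"}

    results = {}
    for name, model in models.items():
        # Fitting the vectorizer inside the pipeline keeps each test fold
        # out of the vocabulary and IDF statistics.
        pipeline = make_pipeline(TfidfVectorizer(**vectorizer_kwargs), model)
        scores = cross_validate(pipeline, texts, labels, cv=cv, scoring=scoring)
        results[name] = {
            "accuracy": scores["test_accuracy"].mean(),
            "f1": scores["test_f1"].mean(),
            "fit_time": scores["fit_time"].mean(),  # seconds per fold
        }
    return results


# Toy data standing in for the tweets and their sentiment labels.
toy_texts = (
    ["great flight thanks"] * 10
    + ["terrible delay lost bag"] * 10
    + ["what time is boarding"] * 10
)
toy_labels = ["positive"] * 10 + ["negative"] * 10 + ["neutral"] * 10

results = evaluate_models(toy_texts, toy_labels, {"min_df": 5})
for name, metrics in results.items():
    print(name, metrics)
```

Passing a different `vectorizer_kwargs` dictionary, such as `{"max_features": 2500}`, reruns the same comparison under another vocabulary constraint.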

Results

Logistic Regression consistently outperformed the other models. It achieved its highest performance in Experiment 1 (min_df=5), reaching 78.37% accuracy and a 76.95% F1-score, while remaining computationally efficient.
Support Vector Machines (LinearSVC) delivered competitive accuracy and F1-scores, and had the fastest fit times across all experiments.
Random Forest and the Feed-forward Neural Network yielded significantly lower accuracy and F1-scores, along with much slower training times. The Neural Network, due to its complexity, was the least effective and least efficient model for this specific setup.

The Impact of Vocabulary Size (TF-IDF)

The experiments revealed a clear trade-off between computational cost and classification performance based on vocabulary constraints:

Experiment 1 (min_df=5): Highest accuracy and F1-scores. Best choice for maximizing predictive performance.
Experiment 2 (max_features=2500): Offers the best balance of solid performance and faster training times.
Experiment 3 (max_features=500): Fastest to train, but suffers a significant drop in accuracy and F1-score due to the restricted vocabulary.

Note: See the attached Jupyter Notebook for full code, metrics, and exploratory data analysis (EDA).
