Bakame1/SJTU_Sentimental_Analysis

IMDb Sentiment Analysis: Contextual Word Classification (Shanghai Jiao Tong University / Mines Paris)

Authors: Wael Ben Slima & Marko Babic
Notebook: Wael_Marko_IMDB_Sentiment_Analysis.ipynb
Dataset: IMDb Dataset of 50K Movie Reviews


Project Overview

This notebook implements a contextual, word-level sentiment classifier trained on the IMDb movie review dataset.
The goal is to classify individual words as positive, negative, or neutral, using only sentence-level sentiment labels and the context of the surrounding words.

For example:

  • “beautiful” → Positive
  • “defeat” → Negative
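
To make the idea of deriving word labels from sentence-level labels concrete, here is a minimal count-based sketch. It is not the notebook's method (the notebook uses a contextual neural model, and this baseline ignores context entirely); the function name `word_polarity`, the `min_count` and `threshold` parameters, and the smoothed log-odds score are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def word_polarity(reviews, labels, min_count=2, threshold=0.5):
    """Label words as positive/negative/neutral from review-level labels.

    Hypothetical baseline: a smoothed log-odds score per word, not the
    contextual model described in this README.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for text, label in zip(reviews, labels):
        # Count each word at most once per review (document frequency).
        words = set(text.lower().split())
        if label == "positive":
            pos_counts.update(words)
        else:
            neg_counts.update(words)

    n_pos = sum(1 for label in labels if label == "positive")
    n_neg = len(labels) - n_pos

    result = {}
    for word in set(pos_counts) | set(neg_counts):
        p, n = pos_counts[word], neg_counts[word]
        if p + n < min_count:
            continue  # too rare to say anything about
        # Add-one smoothed log-odds of appearing in a positive vs negative review.
        score = math.log((p + 1) / (n_pos + 2)) - math.log((n + 1) / (n_neg + 2))
        if score > threshold:
            polarity = "positive"
        elif score < -threshold:
            polarity = "negative"
        else:
            polarity = "neutral"
        result[word] = (polarity, score)
    return result


reviews = [
    "a beautiful film",
    "beautiful and moving",
    "a crushing defeat",
    "defeat after defeat for a film",
]
labels = ["positive", "positive", "negative", "negative"]
polarity = word_polarity(reviews, labels)
```

Words that occur about equally often in positive and negative reviews end up with a score near zero, which is one simple way a "neutral" class can arise even though the dataset itself has only two labels.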

Dataset Description

The IMDb dataset contains 50,000 movie reviews, split into:

  • 25,000 for training
  • 25,000 for testing

Each review is labeled as either positive or negative.


Workflow

1. Data Loading & Preprocessing

  • Load data using Pandas.
  • Clean the text: remove HTML tags and punctuation, convert to lowercase, strip numbers, and remove stopwords.
  • Tokenize and pad sequences for model input.
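
The cleaning, tokenizing, and padding steps above can be sketched with the standard library alone. This is an illustrative version, not the notebook's code: the helper names (`clean_text`, `build_vocab`, `encode_and_pad`), the tiny `STOPWORDS` set, and the defaults `max_words=10000` and `max_len=200` are assumptions. The notebook may well rely on Keras or NLTK utilities instead.

```python
import html
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); a real run would use a fuller list.
STOPWORDS = {"a", "an", "the", "is", "was", "it", "this", "of", "and", "to", "in"}


def clean_text(text):
    """Return a list of cleaned tokens for one review."""
    text = html.unescape(text)             # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)   # drop HTML tags such as <br />
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # drop punctuation and digits
    return [w for w in text.split() if w not in STOPWORDS]


def build_vocab(token_lists, max_words=10000):
    """Map the most frequent words to integer ids; 0 = padding, 1 = unknown."""
    counts = Counter(w for tokens in token_lists for w in tokens)
    return {w: i + 2 for i, (w, _) in enumerate(counts.most_common(max_words))}


def encode_and_pad(tokens, vocab, max_len=200):
    """Convert tokens to ids, truncate to max_len, and right-pad with zeros."""
    ids = [vocab.get(w, 1) for w in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


tokens = clean_text("This movie was <br /> GREAT!!! 10/10")
vocab = build_vocab([["movie", "great"], ["movie", "bad"]])
padded = encode_and_pad(["movie", "unknown", "bad"], vocab, max_len=5)
```

Note that this sketch pads on the right, whereas Keras's `pad_sequences` pads on the left by default; either works as long as training and inference agree.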

2. Model Building

  • Build the model with TensorFlow / Keras.
  • Architecture includes:
    • Embedding layer
    • (Bi)LSTM layer to capture contextual dependencies
    • Dense output layers for classification
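
A minimal sketch of such an architecture in Keras might look as follows. The layer sizes, the single sigmoid output for the binary sentence label, and the function name `build_model` are assumptions; the README does not state the notebook's actual hyperparameters.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_model(vocab_size=10002, embed_dim=64, lstm_units=64):
    """Embedding -> BiLSTM -> Dense classifier (illustrative sizes only)."""
    model = tf.keras.Sequential([
        # Maps integer word ids to dense vectors.
        layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
        # Reads the sequence in both directions to capture context.
        layers.Bidirectional(layers.LSTM(lstm_units)),
        layers.Dense(32, activation="relu"),
        # One probability per review: close to 1 means positive.
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

The `vocab_size` must be at least one larger than the highest word id produced during tokenization, including any reserved padding and unknown-word ids.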

3. Training

  • Train on sentence-level labels.
  • Use callbacks (e.g. ModelCheckpoint) to save the best model.
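
As a hedged sketch of this step, the helper below trains any compiled Keras model and keeps only the best checkpoint. The function name `train`, the file name `best_model.keras`, the monitored metric `val_loss`, and the 20% validation split are assumptions, not values taken from the notebook.

```python
import tensorflow as tf


def train(model, x_train, y_train, checkpoint_path="best_model.keras",
          epochs=5, batch_size=64):
    """Fit a compiled Keras model and save the best epoch to disk."""
    callbacks = [
        tf.keras.callbacks.ModelCheckpoint(
            filepath=checkpoint_path,
            monitor="val_loss",    # compare epochs on validation loss
            save_best_only=True,   # overwrite only when the metric improves
        ),
    ]
    # Hold out part of the training data for validation (assumed 20%).
    return model.fit(
        x_train,
        y_train,
        validation_split=0.2,
        epochs=epochs,
        batch_size=batch_size,
        callbacks=callbacks,
    )
```

The returned `History` object holds the per-epoch loss and accuracy values, which is what the evaluation step plots.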

4. Evaluation

  • Plot accuracy and loss curves.
  • Compute confusion matrix and classification metrics.
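
The confusion matrix and the usual classification metrics can be computed in a few lines. The version below is a dependency-free sketch for the binary case; the notebook may use scikit-learn instead, and the helper names and the 0.5 decision threshold are assumptions.

```python
def to_labels(probabilities, threshold=0.5):
    """Turn predicted probabilities into 0/1 labels (threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for the positive class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = to_labels([0.9, 0.8, 0.3, 0.2, 0.1, 0.7, 0.6, 0.4])
metrics = classification_metrics(y_true, y_pred)
```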
