Multi-Label Story Classification 📚

👥 Authors

Andrea Tribotti
Sara Furlani
Luca Serfilippi

📖 Project Overview

This project focuses on Multi-Label Text Classification applied to the TinyStories dataset, a corpus of short children's stories written with simple vocabulary. The objective is to predict six predefined narrative tags based on the story content. Unlike standard classification, each story can belong to multiple categories simultaneously.

📊 Dataset Details

Source: TinyStories dataset.
Scale: The dataset consists of a train set (2,735,100 stories) and a test set (10,000 stories).
Methodology: We used a subset of 250,000 stories for training and validation to reduce computational cost.
Integrity: The test set was used exclusively for final evaluation to ensure unbiased performance metrics.
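The README does not say how the 250,000-story subset was drawn or how it was divided between training and validation. The following is a minimal sketch that assumes a seeded random sample and an 80/20 split. `sample_subset_and_split`, its default values, and the seed are hypothetical. The official test set is never passed in, so it stays reserved for final evaluation.

```python
import random

def sample_subset_and_split(stories, subset_size=250_000, val_fraction=0.2, seed=42):
    """Draw a random subset of the training stories and split it into train/validation.

    The 80/20 ratio and the seed are illustrative assumptions, not the project's values.
    """
    rng = random.Random(seed)
    # Sample without replacement; cap at the corpus size for small inputs.
    subset = rng.sample(stories, min(subset_size, len(stories)))
    n_val = int(len(subset) * val_fraction)
    return subset[n_val:], subset[:n_val]

# Toy usage with placeholder strings instead of the real corpus.
corpus = [f"story {i}" for i in range(1_000)]
train, val = sample_subset_and_split(corpus, subset_size=500)
print(len(train), len(val))  # 400 100
```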

🏷️ Target Tags

The models are designed to identify the following six narrative elements:

BadEnding
Conflict
Dialogue
Foreshadowing
MoralValue
Twist
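A story can carry any combination of these tags, so a natural way to represent its target is a six-dimensional multi-hot vector with one 0/1 entry per tag. This is a minimal sketch that assumes the tag order listed above. `encode_tags` is a hypothetical helper. scikit-learn's `MultiLabelBinarizer` does the same job.

```python
TAGS = ["BadEnding", "Conflict", "Dialogue", "Foreshadowing", "MoralValue", "Twist"]

def encode_tags(story_tags):
    """Map a story's list of tag names to a 6-dimensional multi-hot vector."""
    present = set(story_tags)
    return [1 if tag in present else 0 for tag in TAGS]

print(encode_tags(["Dialogue", "Twist"]))  # [0, 0, 1, 0, 0, 1]
```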

🤖 Models Implemented

We developed and compared three machine learning approaches, ranging from a classical linear baseline to a fine-tuned pre-trained Transformer.

1. Linear Model (Baseline)

Technique: TF-IDF Vectorizer (5,000 features, unigrams + bigrams) + Logistic Regression.
Strategy: One-vs-Rest Classifier with balanced class weights.
Characteristics: Very fast training (~2 min); high recall but lower precision.
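The baseline configuration above can be sketched in scikit-learn. This assumes scikit-learn is the library used, because the README names the techniques but not the code. `max_iter=1000`, the toy texts, and the two-tag target matrix are illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

def build_baseline():
    """TF-IDF (5,000 features, unigrams + bigrams) + one-vs-rest logistic regression."""
    return make_pipeline(
        TfidfVectorizer(max_features=5000, ngram_range=(1, 2)),
        # One binary classifier per tag; "balanced" weights each class by
        # n_samples / (2 * class_count), boosting the rarer positive examples.
        # max_iter=1000 is an assumption to avoid convergence warnings.
        OneVsRestClassifier(LogisticRegression(class_weight="balanced", max_iter=1000)),
    )

texts = [
    "Tom said hello to his friend.",
    "The dog was sad and the end was bad.",
    "Lily and Ben had a fight over the toy.",
    "Mom said, let us share the cake.",
]
# Multi-hot targets for two illustrative tags: [Dialogue, BadEnding].
y = [[1, 0], [0, 1], [0, 0], [1, 0]]
model = build_baseline().fit(texts, y)
print(model.predict(["Sam said goodbye."]).shape)  # (1, 2)
```

Balanced weights push each per-tag classifier toward predicting the positive class more often. This is consistent with the high-recall, lower-precision behaviour reported for this model.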

2. Non-Linear Model (Custom Transformer)

Architecture: A custom Transformer-based classifier built from scratch.
Components: Token Embedding, Learned Positional Embedding, 1 Transformer Encoder Layer, Max Pooling.
Characteristics: Computationally efficient (~4 min training), with higher precision than the baseline.
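The listed components map onto a few PyTorch modules. This is a minimal sketch that assumes PyTorch. The vocabulary size, maximum length, model width, number of heads, and padding id are illustrative guesses, not the project's settings. The class name `TinyStoryTransformer` is hypothetical.

```python
import torch
import torch.nn as nn

class TinyStoryTransformer(nn.Module):
    """Token + learned positional embeddings -> 1 encoder layer -> max pooling -> 6 logits.

    All sizes are illustrative assumptions, not the project's hyperparameters.
    """

    def __init__(self, vocab_size=10_000, max_len=256, d_model=128, n_heads=4, n_tags=6, pad_id=0):
        super().__init__()
        self.pad_id = pad_id
        self.tok_emb = nn.Embedding(vocab_size, d_model, padding_idx=pad_id)
        self.pos_emb = nn.Embedding(max_len, d_model)  # learned: one vector per position
        self.encoder = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.head = nn.Linear(d_model, n_tags)

    def forward(self, input_ids):
        # input_ids: (batch, seq_len) integer token ids; each row needs at least one real token.
        pad_mask = input_ids.eq(self.pad_id)  # True where the token is padding
        positions = torch.arange(input_ids.size(1), device=input_ids.device)
        x = self.tok_emb(input_ids) + self.pos_emb(positions)  # (batch, seq_len, d_model)
        x = self.encoder(x, src_key_padding_mask=pad_mask)
        # Max pooling over the sequence; padded positions are set to -inf so they never win.
        x = x.masked_fill(pad_mask.unsqueeze(-1), float("-inf"))
        pooled = x.max(dim=1).values  # (batch, d_model)
        return self.head(pooled)  # raw logits, one per tag

model = TinyStoryTransformer()
batch = torch.randint(1, 10_000, (2, 32))
batch[1, 20:] = 0  # pad the tail of the second sequence
print(model(batch).shape)  # torch.Size([2, 6])
```

Max pooling keeps, for each feature, the strongest activation anywhere in the story. That design can suit tags signalled by a single telling phrase rather than by the story as a whole.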

3. DistilBERT (Fine-Tuning)

Architecture: Pre-trained DistilBERT (uncased) from Hugging Face.
Training: Fine-tuned for 5 epochs with Binary Cross-Entropy Loss.
Characteristics: Best overall performance (Accuracy and F1-Score), though computationally more expensive (~200 min of training).
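Binary Cross-Entropy treats each of the six tags as an independent yes/no decision. The model emits one logit per tag, a sigmoid turns each logit into a probability, and the loss averages the per-tag BCE. The stdlib sketch below spells this out using a numerically stable formulation. The 0.5 decision threshold is an assumption, since the README does not state the threshold used. In Hugging Face Transformers, setting `problem_type="multi_label_classification"` on a sequence-classification model selects `BCEWithLogitsLoss` for this purpose.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over independent per-tag logits (numerically stable)."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Equivalent to -(y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))) without overflow.
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

def sigmoid(z):
    """Overflow-safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_tags(logits, threshold=0.5):
    """A tag is predicted whenever its sigmoid probability exceeds the threshold."""
    return [1 if sigmoid(z) > threshold else 0 for z in logits]

print(predict_tags([2.0, -1.0, 3.5, -4.0, 0.3, -0.2]))  # [1, 0, 1, 0, 1, 0]
print(round(bce_with_logits([0.0, 0.0], [1, 0]), 4))    # 0.6931
```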

📊 Results & Performance

We evaluated the models using Accuracy, Precision, Recall, and F1-Score.

Model	Accuracy	Average F1-Score	Training Time
Linear (LogReg)	89.11%	0.70	~2 min
Custom Transformer	93.99%	0.71	~4 min
DistilBERT	94.43%	0.78	~200 min
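The table does not define how Accuracy and the average F1-Score are computed over six tags. The sketch below assumes per-label (Hamming-style) accuracy and macro averaging, and also reports exact-match accuracy for comparison. These definitions are assumptions, not the project's confirmed metrics. The per-tag F1 list makes it possible to compare how hard individual tags are.

```python
def mean(values):
    return sum(values) / len(values)

def multilabel_metrics(y_true, y_pred):
    """Per-label accuracy, exact-match accuracy, and macro-averaged precision/recall/F1.

    y_true, y_pred: lists of equal-length 0/1 vectors, one entry per tag.
    """
    n_tags = len(y_true[0])
    # Per-label accuracy: fraction of individual tag decisions that are correct.
    correct = sum(t == p for row_t, row_p in zip(y_true, y_pred) for t, p in zip(row_t, row_p))
    accuracy = correct / (len(y_true) * n_tags)
    # Exact-match accuracy: a story counts only if every tag is right.
    exact_match = mean([list(t) == list(p) for t, p in zip(y_true, y_pred)])
    precisions, recalls, f1s = [], [], []
    for j in range(n_tags):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    return {
        "accuracy": accuracy,
        "exact_match": exact_match,
        "precision": mean(precisions),
        "recall": mean(recalls),
        "f1": mean(f1s),
        "per_tag_f1": f1s,
    }

m = multilabel_metrics([[1, 0], [0, 1], [1, 1]], [[1, 0], [1, 1], [0, 1]])
print(round(m["accuracy"], 4), m["f1"], m["per_tag_f1"])  # 0.6667 0.75 [0.5, 1.0]
```

Per-label accuracy is inflated by the many easy negative decisions when tags are sparse. This is one reason accuracy and average F1 can diverge as much as they do in the table.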

Key Findings

DistilBERT achieved the best balance between Precision and Recall.
Conflict and Foreshadowing were the hardest tags to predict across all models due to their dependency on subtle context rather than specific keywords.
BadEnding and Dialogue were the easiest to classify.

🚀 How to Run

You can run the full analysis directly in your browser using Google Colab; no installation is required.

⚠️ Important: The notebook does NOT train the models by default. To keep reproduction fast and avoid the ~200 minutes of DistilBERT training, it loads pre-trained weights from Google Drive.

Click the "Open in Colab" badge at the top of this README. Run the cells sequentially to load the weights and reproduce the reported results.

📒 Detailed Report

For a complete description of the models, experiments, and results, see the report: Report_Multi_Label_Classification
