Custom Transformer Text Classification

A complete implementation of a Transformer-based text classification model built from scratch using PyTorch. This project demonstrates how to implement the Transformer architecture without relying on pre-built libraries like Hugging Face Transformers.

Features

Custom Transformer Architecture: Complete implementation from scratch including:
- Multi-head self-attention mechanism
- Positional encoding using sine/cosine functions
- Feed-forward networks with residual connections
- Layer normalization
- Configurable number of layers, heads, and dimensions
Text Classification: Supports sentiment analysis and topic classification
Baseline Comparisons: Includes LSTM and Bag-of-Words baselines for performance comparison
Comprehensive Evaluation:
- Training/validation curves
- Confusion matrices
- Classification reports
- Attention weight visualization
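
The sine/cosine positional encoding listed above can be sketched in dependency-free Python. This follows the standard formula from the original Transformer paper and is an illustration only, not the project's PyTorch code. The function name sinusoidal_positional_encoding is hypothetical.

```python
import math


def sinusoidal_positional_encoding(max_len: int, d_model: int) -> list[list[float]]:
    """Return a max_len x d_model table of fixed sine/cosine encodings.

    PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even so sin/cos dimensions pair up")
    table = []
    for pos in range(max_len):
        row = [0.0] * d_model
        for i in range(0, d_model, 2):
            # i is already the even index 2k, so the exponent is i / d_model.
            angle = pos / (10000 ** (i / d_model))
            row[i] = math.sin(angle)
            row[i + 1] = math.cos(angle)
        table.append(row)
    return table


# Each row is added element-wise to the token embedding at that position.
pe = sinusoidal_positional_encoding(max_len=4, d_model=8)
```

Because the table is computed from a formula rather than learned, it adds no trainable parameters and can be extended to longer sequences than those seen in training.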

Sample Results

EVALUATION METRICS SUMMARY
==================================================
Overall Accuracy: 0.8756

Per-Class Metrics:
------------------------------
Negative:
  Precision: 0.8842
  Recall:    0.8667
  F1-Score:  0.8754

Positive:
  Precision: 0.8674
  Recall:    0.8845
  F1-Score:  0.8759

MODEL COMPARISON
==================================================
Transformer Accuracy: 0.8756
Bag-of-Words Accuracy: 0.8234
LSTM Accuracy: 0.8456
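
The per-class numbers in the report are related by a simple formula: F1 is the harmonic mean of precision and recall. The sketch below, with hypothetical helper names, shows how these metrics come from confusion-matrix counts and re-derives the reported F1 scores from the reported precision and recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_class_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 for one class from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Re-derive the F1 values in the sample report from its precision/recall.
reported = {
    "Negative": (0.8842, 0.8667, 0.8754),
    "Positive": (0.8674, 0.8845, 0.8759),
}
for label, (p, r, f1) in reported.items():
    print(label, round(f1_score(p, r), 4), "reported:", f1)

# Overall accuracy equals mean recall only when the classes are equally sized.
mean_recall = (0.8667 + 0.8845) / 2
```

Both recomputed F1 values match the report to four decimals. The mean recall also works out to 0.8756, the reported overall accuracy, which is consistent with (though not proof of) a balanced test set.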

Model Architecture

The Transformer model includes:

Token Embedding: Converts token IDs to dense vectors
Positional Encoding: Adds position information using sine/cosine functions
Multi-Head Attention: Multiple attention heads for capturing different relationships
Feed-Forward Networks: Position-wise fully connected layers
Residual Connections: Skip connections for better gradient flow
Layer Normalization: Normalization after each sub-layer
Classification Head: Global average pooling + linear layer
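
The core of each attention head and the pooling head can be sketched in plain Python. This is a simplified illustration, not the project's implementation: it computes softmax(QK^T / sqrt(d_k))V for a single head and omits the learned projections W_Q, W_K, W_V, W_O and the splitting of d_model across heads. All function names are hypothetical.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x m) and an (m x p) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: list[float]) -> list[float]:
    """Softmax over one row; subtracting the max keeps exp() from overflowing."""
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> tuple[Matrix, Matrix]:
    """One head: softmax(Q K^T / sqrt(d_k)) V. Returns (output, attention weights)."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def mean_pool(x: Matrix) -> list[float]:
    """Global average pooling over the sequence (row) dimension."""
    return [sum(col) / len(x) for col in zip(*x)]


# Toy self-attention over 3 tokens with d_k = 2, using Q = K = V = x.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = scaled_dot_product_attention(x, x, x)
sentence_vector = mean_pool(out)  # input to the final linear classification layer
```

The attn rows are the per-token weights that an attention visualization would plot; each row sums to 1.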

Model Parameters

Default Configuration: 128 dimensions, 8 heads, 6 layers
Trainable Parameters: ~1.2M (varies with configuration)
Memory Efficient: Small enough to train on a CPU if no GPU is available
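
The parameter count can be estimated by hand. The sketch below assumes details the README does not state: a feed-forward width of d_ff = 4 × d_model = 512, biases on every linear layer, two LayerNorms per encoder layer, fixed sinusoidal positional encodings, and a single linear output layer. The vocabulary size is a hypothetical placeholder.

```python
def encoder_layer_params(d_model: int, d_ff: int) -> int:
    """Trainable parameters in one encoder layer, biases included."""
    attention = 4 * (d_model * d_model + d_model)  # W_Q, W_K, W_V, W_O
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    layer_norms = 2 * (2 * d_model)                # gamma and beta, two LayerNorms
    return attention + feed_forward + layer_norms


def model_params(vocab_size: int, num_classes: int, d_model: int = 128,
                 n_layers: int = 6, d_ff: int = 512) -> dict[str, int]:
    """Breakdown for embedding + encoder stack + linear head.

    The head count does not appear: heads split d_model between them, so the
    projection sizes are the same for any n_heads that divides d_model.
    Sinusoidal positional encodings are fixed and add no parameters.
    """
    counts = {
        "embedding": vocab_size * d_model,
        "encoder": n_layers * encoder_layer_params(d_model, d_ff),
        "head": d_model * num_classes + num_classes,
    }
    counts["total"] = sum(counts.values())
    return counts


# vocab_size = 10_000 is hypothetical; the README does not state it.
print(model_params(vocab_size=10_000, num_classes=2))
```

Under these assumptions the encoder stack alone has 1,189,632 parameters, close to the ~1.2M quoted above. The embedding table adds vocab_size × 128 on top, so the full count depends on the tokenizer's vocabulary.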

Performance

The model achieves competitive performance on standard benchmarks:

Sentiment Analysis: ~87-90% accuracy
Topic Classification: ~85-88% accuracy

Performance varies based on:

- Model size (dimensions, layers, heads)
- Training epochs
- Learning rate
- Sequence length
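
The factors above can be gathered into one configuration object. This is a hypothetical sketch, not the project's config: d_model, n_heads and n_layers use the README defaults, while max_seq_len, learning_rate and epochs are placeholder assumptions. It enforces that the head count divides d_model and shows how the attention score matrices grow with sequence length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerConfig:
    """Hypothetical container for the hyperparameters listed above."""
    d_model: int = 128
    n_heads: int = 8
    n_layers: int = 6
    max_seq_len: int = 256        # assumed, not stated in the README
    learning_rate: float = 1e-4   # assumed
    epochs: int = 10              # assumed

    def __post_init__(self) -> None:
        # Multi-head attention splits d_model evenly across the heads.
        if self.d_model % self.n_heads != 0:
            raise ValueError(
                f"d_model={self.d_model} is not divisible by n_heads={self.n_heads}"
            )

    @property
    def head_dim(self) -> int:
        return self.d_model // self.n_heads

    @property
    def attention_scores_per_sequence(self) -> int:
        # Each head in each layer builds a max_seq_len x max_seq_len score
        # matrix, so attention memory grows quadratically with sequence length.
        return self.n_layers * self.n_heads * self.max_seq_len ** 2


cfg = TransformerConfig()
print(cfg.head_dim, cfg.attention_scores_per_sequence)
```

Doubling max_seq_len quadruples the attention score entries, which is one reason longer sequences make training slower and more memory-hungry.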

