
nomik32/transformer-text-classification


Custom Transformer Text Classification

A complete implementation of a Transformer-based text classification model built from scratch using PyTorch. This project demonstrates how to implement the Transformer architecture without relying on pre-built libraries like Hugging Face Transformers.

Features

  • Custom Transformer Architecture: Complete implementation from scratch including:

    • Multi-head self-attention mechanism
    • Positional encoding using sine/cosine functions
    • Feed-forward networks with residual connections
    • Layer normalization
    • Configurable number of layers, heads, and dimensions
  • Text Classification: Supports sentiment analysis and topic classification

  • Baseline Comparisons: Includes LSTM and Bag-of-Words baselines for performance comparison

  • Comprehensive Evaluation:

    • Training/validation curves
    • Confusion matrices
    • Classification reports
    • Attention weight visualization
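As a rough illustration of what the confusion matrix and classification report above summarize, here is a minimal pure-Python sketch. The function names `confusion_matrix` and `per_class_report` are hypothetical and are not taken from this repository; the project's own evaluation code may be organized differently.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in range(num_classes)]
            for t in range(num_classes)]


def per_class_report(matrix):
    """Precision, recall and F1 for each class, derived from the matrix."""
    report = {}
    for c in range(len(matrix)):
        tp = matrix[c][c]
        predicted = sum(row[c] for row in matrix)  # column sum: predicted as c
        actual = sum(matrix[c])                    # row sum: truly c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Toy labels for illustration only (0 = negative, 1 = positive).
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0]
matrix = confusion_matrix(y_true, y_pred, num_classes=2)
report = per_class_report(matrix)
```

The toy labels are made up for the example and are unrelated to the results reported for this project.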

Sample Results

EVALUATION METRICS SUMMARY
==================================================
Overall Accuracy: 0.8756

Per-Class Metrics:
------------------------------
Negative:
  Precision: 0.8842
  Recall:    0.8667
  F1-Score:  0.8754

Positive:
  Precision: 0.8674
  Recall:    0.8845
  F1-Score:  0.8759

MODEL COMPARISON
==================================================
Transformer Accuracy: 0.8756
Bag-of-Words Accuracy: 0.8234
LSTM Accuracy: 0.8456
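The F1 scores in the sample output follow from the listed precision and recall through the harmonic mean, F1 = 2PR / (P + R). A small sketch that checks this, using only the numbers printed above (the helper name `f1_score` is illustrative, not the project's API):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the sample output above.
negative_f1 = round(f1_score(0.8842, 0.8667), 4)
positive_f1 = round(f1_score(0.8674, 0.8845), 4)
print(negative_f1, positive_f1)
```

Both values round to the F1 scores shown in the per-class metrics.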

Model Architecture

The Transformer model includes:

  1. Token Embedding: Converts token IDs to dense vectors
  2. Positional Encoding: Adds position information using sine/cosine functions
  3. Multi-Head Attention: Multiple attention heads for capturing different relationships
  4. Feed-Forward Networks: Position-wise fully connected layers
  5. Residual Connections: Skip connections for better gradient flow
  6. Layer Normalization: Normalization after each sub-layer
  7. Classification Head: Global average pooling + linear layer
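The seven components above could fit together roughly as in the following PyTorch sketch. This is not the repository's code: the class names, the feed-forward width `d_ff=512`, the `max_len=512` limit, the padding ID `pad_id=0`, and the choice to ignore padding tokens when averaging are all assumptions made for illustration.

```python
import math
import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Adds fixed sine/cosine position signals to the embeddings."""
    def __init__(self, d_model, max_len=512):  # max_len is an assumption
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, x):
        return x + self.pe[:, : x.size(1)]


class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model, n_heads):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, pad_mask):
        b, t, _ = x.shape
        qkv = self.qkv(x).view(b, t, 3, self.n_heads, self.d_head)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (b, heads, t, d_head)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        # Padding positions must not receive attention.
        scores = scores.masked_fill(pad_mask[:, None, None, :], float("-inf"))
        weights = scores.softmax(dim=-1)
        context = (weights @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out(context), weights


class EncoderLayer(nn.Module):
    """Attention and feed-forward sub-layers, each followed by add + LayerNorm."""
    def __init__(self, d_model, n_heads, d_ff):
        super().__init__()
        self.attn = MultiHeadSelfAttention(d_model, n_heads)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, pad_mask):
        attn_out, _ = self.attn(x, pad_mask)
        x = self.norm1(x + attn_out)           # residual, then normalization
        return self.norm2(x + self.ff(x))


class TransformerClassifier(nn.Module):
    def __init__(self, vocab_size, num_classes, d_model=128, n_heads=8,
                 n_layers=6, d_ff=512, pad_id=0):  # d_ff and pad_id are assumptions
        super().__init__()
        self.pad_id = pad_id
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=pad_id)
        self.pos = PositionalEncoding(d_model)
        self.layers = nn.ModuleList(
            [EncoderLayer(d_model, n_heads, d_ff) for _ in range(n_layers)])
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, token_ids):
        pad_mask = token_ids == self.pad_id     # True where padding
        x = self.pos(self.embed(token_ids))
        for layer in self.layers:
            x = layer(x, pad_mask)
        # Average pooling over real tokens only (masking is an assumption).
        keep = (~pad_mask).unsqueeze(-1).float()
        pooled = (x * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)
        return self.head(pooled)


model = TransformerClassifier(vocab_size=100, num_classes=2)
token_ids = torch.tensor([[5, 6, 7, 0], [8, 9, 0, 0]])  # 0 = padding
logits = model(token_ids)                                # shape: (batch, num_classes)
```

The attention module also returns its weights, which is the kind of tensor an attention visualization would plot.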

Model Parameters

  • Default Configuration: 128-dimensional embeddings, 8 attention heads, 6 encoder layers
  • Trainable Parameters: ~1.2M (configurable)
  • Memory Efficient: Small enough to train on a CPU when no GPU is available
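The ~1.2M figure can be sanity-checked with a back-of-the-envelope count for the default configuration. The sketch below assumes a feed-forward width of 512 (four times the model dimension), bias terms on every linear layer, and two LayerNorms per layer. None of these details are stated here, and the embedding table and classification head are left out because the vocabulary size and class count are not given.

```python
def encoder_layer_params(d_model: int, d_ff: int) -> int:
    """Trainable parameters in one encoder layer (weights plus biases)."""
    attention = 4 * (d_model * d_model + d_model)             # Q, K, V, output projections
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    layer_norms = 2 * (2 * d_model)                           # scale and shift, twice
    return attention + feed_forward + layer_norms


def encoder_params(d_model: int, d_ff: int, n_layers: int) -> int:
    return n_layers * encoder_layer_params(d_model, d_ff)


# d_ff=512 is an assumed value; 128 dimensions and 6 layers are the defaults.
total = encoder_params(d_model=128, d_ff=512, n_layers=6)
print(f"{total:,} encoder parameters")
```

Under these assumptions the encoder stack alone comes to roughly 1.19M parameters. The head count does not change the total, because the heads split the same projection matrices.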

Performance

The model achieves competitive performance on standard benchmarks:

  • Sentiment Analysis: ~87-90% accuracy
  • Topic Classification: ~85-88% accuracy

Performance varies based on:

  • Model size (dimensions, layers, heads)
  • Training epochs
  • Learning rate
  • Sequence length
