Pavan-Bellam/gpt-2

Building LLM from Scratch

A PyTorch implementation of a GPT-style language model built from scratch.

Overview

This project implements a GPT (Generative Pre-trained Transformer) model from the ground up, featuring:

  • Multi-head self-attention mechanism
  • Transformer blocks with pre-layer normalization
  • Custom GELU activation and LayerNorm
  • Complete training pipeline
  • Text generation capabilities
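
The custom GELU activation and LayerNorm mentioned above are typically small modules. A minimal sketch of what they commonly look like in from-scratch GPT implementations (class names and the tanh-approximation formula are illustrative assumptions, not necessarily this repo's exact code):

```python
import torch
import torch.nn as nn


class GELU(nn.Module):
    """Tanh approximation of GELU, as used in the original GPT-2."""

    def forward(self, x):
        return 0.5 * x * (1 + torch.tanh(
            torch.sqrt(torch.tensor(2.0 / torch.pi)) *
            (x + 0.044715 * x ** 3)
        ))


class LayerNorm(nn.Module):
    """Layer normalization over the last (embedding) dimension."""

    def __init__(self, emb_dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.scale = nn.Parameter(torch.ones(emb_dim))
        self.shift = nn.Parameter(torch.zeros(emb_dim))

    def forward(self, x):
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, keepdim=True, unbiased=False)
        return self.scale * (x - mean) / torch.sqrt(var + self.eps) + self.shift
```

In a pre-layer-norm transformer block, LayerNorm is applied *before* the attention and feed-forward sublayers rather than after, which tends to stabilize training of deeper stacks.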

Project Structure

building_llm_from_scratch/
├── src/
│   ├── model/
│   │   ├── attention.py         # Multi-head self-attention
│   │   ├── layers.py            # GELU, LayerNorm, FeedForward
│   │   ├── transformer_block.py # Transformer block
│   │   └── gpt_model.py         # Complete GPT model
│   ├── data/
│   │   └── dataset.py           # Dataset and DataLoader
│   ├── config.py                # Model configurations
│   ├── utils.py                 # Text generation utilities
│   └── visualization.py         # Training plots
├── data/                        # Training data directory
├── main.py                      # Main training script
├── train.py                     # Training loop
├── requirements.txt             # Dependencies
└── README.md                    # Documentation
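
The `dataset.py` module presumably builds input/target pairs by sliding a window over the tokenized text, with targets shifted one position to the right. A sketch of that idea (class name and constructor signature are hypothetical, not the repo's actual API):

```python
import torch
from torch.utils.data import Dataset


class GPTDataset(Dataset):
    """Sliding-window next-token dataset over a flat list of token ids."""

    def __init__(self, token_ids, max_length, stride):
        self.inputs, self.targets = [], []
        # each window of max_length tokens predicts the window shifted by one
        for i in range(0, len(token_ids) - max_length, stride):
            self.inputs.append(torch.tensor(token_ids[i:i + max_length]))
            self.targets.append(torch.tensor(token_ids[i + 1:i + max_length + 1]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]
```

With `stride == max_length` the windows don't overlap; a smaller stride yields more (overlapping) training examples from the same text.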

Features

  • Clean Architecture: Production-ready code with comprehensive docstrings and type hints
  • GPT-124M Configuration: Implements a GPT model with ~124M parameters
  • Custom Implementation: Built from scratch including attention, layer norm, and GELU
  • Training Pipeline: Complete training loop with evaluation and sample generation
  • Text Generation: Autoregressive text generation with greedy decoding
  • Visualization: Training and validation loss plotting
  • Modular Design: Well-organized codebase for easy extension and modification

Installation

# Install dependencies using uv (recommended)
uv sync

# Or using pip
pip install -r requirements.txt

Usage

  1. Prepare your data: Place your training text file in the data/ directory

  2. Configure training: Edit hyperparameters in main.py if needed

  3. Run training:

python main.py
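
Under the hood, each training iteration optimizes the standard next-token cross-entropy objective. A sketch of a single step, assuming the shift-by-one input/target batches described above (function name and signature are illustrative, not the repo's actual code):

```python
import torch
import torch.nn as nn


def train_step(model, input_batch, target_batch, optimizer, device):
    """One optimization step of the next-token prediction objective."""
    input_batch = input_batch.to(device)
    target_batch = target_batch.to(device)
    optimizer.zero_grad()
    logits = model(input_batch)                  # (batch, seq, vocab)
    # flatten batch and sequence dims so every position is one prediction
    loss = nn.functional.cross_entropy(
        logits.flatten(0, 1), target_batch.flatten())
    loss.backward()
    optimizer.step()
    return loss.item()
```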

Model Configuration

Default configuration (GPT-124M):

  • Vocabulary size: 50,257 (GPT-2 tokenizer)
  • Context length: 128 tokens
  • Embedding dimension: 768
  • Number of attention heads: 12
  • Number of transformer layers: 12
  • Dropout: 0.1
  • QKV bias: True

You can modify these settings in src/config.py or create new configurations.
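
Such configurations are often kept as a plain dictionary. A sketch of what `GPT_CONFIG_124M` might look like given the values above (the key names are assumptions; check `src/config.py` for the actual ones):

```python
# Hypothetical shape of the config in src/config.py
GPT_CONFIG_124M = {
    "vocab_size": 50257,     # GPT-2 BPE vocabulary
    "context_length": 128,   # max tokens per sequence
    "emb_dim": 768,          # embedding dimension
    "n_heads": 12,           # attention heads (emb_dim must divide evenly)
    "n_layers": 12,          # transformer blocks
    "drop_rate": 0.1,        # dropout probability
    "qkv_bias": True,        # bias terms in the Q/K/V projections
}
```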

Example: Text Generation

from src.model.gpt_model import GPTModel
from src.config import GPT_CONFIG_124M
from src.utils import generate_and_print
import tiktoken
import torch

# Select device and load the trained weights
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = GPTModel(GPT_CONFIG_124M)
model.load_state_dict(torch.load('model.pth', map_location=device))
model.to(device)
model.eval()  # disable dropout for inference

# GPT-2 BPE tokenizer
tokenizer = tiktoken.get_encoding("r50k_base")

text = generate_and_print(
    model=model,
    tokenizer=tokenizer,
    device=device,
    prompt="Once upon a time",
    max_new_tokens=100
)
print(text)
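
The greedy decoding behind a helper like `generate_and_print` can be sketched as follows (function name and signature are illustrative; the repo's utility additionally handles tokenization and printing):

```python
import torch


@torch.no_grad()
def greedy_generate(model, idx, max_new_tokens, context_length):
    """Autoregressive greedy decoding: always pick the most likely next token."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -context_length:]          # crop to the context window
        logits = model(idx_cond)                     # (batch, seq, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        idx = torch.cat([idx, next_id], dim=1)       # append and repeat
    return idx
```

Greedy decoding is deterministic; swapping the `argmax` for temperature-scaled sampling or top-k filtering is a common extension.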

Requirements

  • Python 3.8+
  • PyTorch 2.0+
  • tiktoken
  • matplotlib
