BlaiseMarvin/Language-Modelling

Karpathy's Neural Language Models - Implementation Journey

This repository contains implementations of increasingly sophisticated neural language models, progressing from simple statistical models to advanced transformer architectures.

Overview

This is a learning-focused repository documenting the progression from basic character-level language models to more complex deep learning approaches. The work is largely inspired by Andrej Karpathy's educational content on neural networks and language models.

Project Structure

Main Notebooks (Progressive Difficulty)

  1. 1-bigram_language_model.ipynb

    • Simplest language model using bigrams (2-character sequences)
    • Both statistical approach (counting bigrams) and neural network approach
    • Demonstrates character encoding, probability distributions, and sampling
    • Foundation for understanding sequence modeling
  2. 2-trigram_language_model.ipynb

    • Extends bigram to trigrams (3-character sequences)
    • Takes 2 input characters to predict the next character
    • Increases context and model expressiveness
    • Shows how to build count matrices indexed by character pairs for higher-order n-grams
  3. 3-mlp_language_model.ipynb

    • Multi-Layer Perceptron approach to language modeling
    • Uses context windows (typically block_size=3)
    • Introduces embedding layers for character encoding
    • More sophisticated than n-gram counting methods
  4. 4-mlp_improvement.ipynb

    • Enhanced MLP with better architecture and training
    • Implements proper train/dev/test split
    • Uses Xavier uniform initialization for better convergence
    • Demonstrates hyperparameter tuning and learning rate scheduling
    • Generates coherent name sequences
  5. 5-activations_gradients_batchnorm.ipynb

    • Deep dive into activation functions and gradient flow
    • Explores batch normalization effects
    • Analyzes internal layer dynamics during training
    • Addresses vanishing/exploding gradient problems
  6. 6-backprop_ninja.ipynb

    • Manual backpropagation implementation
    • Detailed breakdown of gradient computations
    • Understanding automatic differentiation mechanics
    • Advanced debugging techniques for neural networks
  7. 7-wavenet.ipynb

    • WaveNet-style dilated convolutions for sequence modeling
    • Exponentially expanding receptive fields
    • Improved context understanding
    • More sophisticated architecture for better performance
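
The counting approach from the first notebook can be sketched in a few lines. This is a minimal illustration, not the notebook's code: the inline word list is a hypothetical stand-in for the names dataset, and '.' follows Karpathy's convention of marking both the start and end of a word.

```python
import random
from collections import defaultdict

# Tiny inline corpus standing in for the names dataset (hypothetical sample).
words = ["emma", "olivia", "ava"]

# Count bigrams; '.' marks both the start and end of a word.
counts = defaultdict(lambda: defaultdict(int))
for w in words:
    chars = ["."] + list(w) + ["."]
    for c1, c2 in zip(chars, chars[1:]):
        counts[c1][c2] += 1

# Normalize each row of counts into a probability distribution.
probs = {
    c1: {c2: n / sum(row.values()) for c2, n in row.items()}
    for c1, row in counts.items()
}

# Sample a new "name" one character at a time until the end marker.
random.seed(42)
ch, out = ".", []
while True:
    nxt = random.choices(list(probs[ch]), weights=list(probs[ch].values()))[0]
    if nxt == ".":
        break
    out.append(nxt)
    ch = nxt
print("".join(out))
```

The neural-network version of the same model replaces the count matrix with a trainable weight matrix and minimizes negative log-likelihood, but the sampling loop is essentially unchanged.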

Supporting Files

/gpt_dev/ - GPT-Style Transformer Development

  • gpt_dev.ipynb: Transformer model implementation with multi-head self-attention
  • v2.py: Full transformer decoder implementation
    • Multi-head self-attention
    • Feed-forward networks
    • Positional embeddings
    • Configurable layers and heads
    • Training loop with evaluation metrics
    • Text generation capabilities
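
A single causal self-attention head, the core of the multi-head mechanism listed above, can be sketched as follows. The dimensions here are toy values chosen for illustration, not the configurable sizes in v2.py.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(1337)

# Toy dimensions (hypothetical; the real sizes are configurable).
B, T, C = 2, 8, 32   # batch, time (context length), channels
head_size = 16

x = torch.randn(B, T, C)

# Linear projections producing keys, queries, and values.
key = torch.nn.Linear(C, head_size, bias=False)
query = torch.nn.Linear(C, head_size, bias=False)
value = torch.nn.Linear(C, head_size, bias=False)
k, q, v = key(x), query(x), value(x)

# Scaled dot-product affinities between every pair of positions.
wei = q @ k.transpose(-2, -1) * head_size**-0.5   # (B, T, T)

# Causal mask: each position attends only to itself and earlier positions.
tril = torch.tril(torch.ones(T, T))
wei = wei.masked_fill(tril == 0, float("-inf"))
wei = F.softmax(wei, dim=-1)

out = wei @ v   # (B, T, head_size)
print(out.shape)
```

Multi-head attention runs several such heads in parallel and concatenates their outputs before the feed-forward network.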

Key Concepts Covered

Fundamental Concepts

  • Character-level tokenization
  • One-hot encoding
  • Probability distributions and sampling
  • Loss functions (negative log-likelihood, cross-entropy)
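
The equivalence between negative log-likelihood and cross-entropy, which all the neural models here rely on, can be checked directly with a few hypothetical logits:

```python
import torch
import torch.nn.functional as F

# Hypothetical logits for a 4-character vocabulary over 3 examples.
logits = torch.tensor([[2.0, 0.5, 0.1, -1.0],
                       [0.2, 1.5, 0.3, 0.0],
                       [1.0, 1.0, 1.0, 1.0]])
targets = torch.tensor([0, 1, 2])

# Negative log-likelihood of each target under softmax(logits), averaged.
probs = F.softmax(logits, dim=1)
nll = -probs[torch.arange(3), targets].log().mean()

# PyTorch's built-in cross-entropy computes the same quantity from raw logits.
ce = F.cross_entropy(logits, targets)
print(nll.item(), ce.item())
```

Using `F.cross_entropy` on raw logits is both numerically safer and faster than taking an explicit softmax followed by a log.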

Model Architectures

  • N-gram Models: Statistical baseline using counting
  • Feedforward Networks: MLPs with embeddings and hidden layers
  • Convolutional Approaches: Dilated convolutions (WaveNet)
  • Transformers: Self-attention mechanisms and positional encoding
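
The forward pass of the feedforward (MLP-with-embeddings) architecture can be sketched as below. Sizes are illustrative: the notebooks use block_size=3 and a 27-character vocabulary (lowercase letters plus the '.' marker), but the embedding and hidden dimensions here are arbitrary.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(42)

# Illustrative sizes (the notebooks tune these as hyperparameters).
vocab_size, block_size, emb_dim, hidden = 27, 3, 10, 64

C = torch.randn(vocab_size, emb_dim)             # character embedding table
W1 = torch.randn(block_size * emb_dim, hidden)   # hidden layer weights
b1 = torch.randn(hidden)
W2 = torch.randn(hidden, vocab_size)             # output layer weights
b2 = torch.randn(vocab_size)

X = torch.randint(0, vocab_size, (4, block_size))  # batch of 3-char contexts
Y = torch.randint(0, vocab_size, (4,))             # next-character targets

emb = C[X]                                # (4, 3, 10): look up each context char
h = torch.tanh(emb.view(4, -1) @ W1 + b1)  # concatenate embeddings, hidden layer
logits = h @ W2 + b2                       # (4, 27): scores over next characters
loss = F.cross_entropy(logits, Y)
print(logits.shape, loss.item())
```

The later notebooks improve on exactly this skeleton: better initialization, batch normalization between the linear layer and the tanh, and ultimately convolutional and attention-based replacements for the flattened context window.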

Model Performance Progression

The notebooks demonstrate progression in model capability:

  1. Bigram: Simple character patterns, coherent but limited
  2. Trigram: More context, better structure
  3. MLP: Neural approach, learns non-linear patterns
  4. WaveNet: Dilated receptive fields, richer representations
  5. Transformer: Self-attention and long-range dependencies; the strongest results in this progression

Notes

  • All models use character-level tokenization
  • Device detection (MPS for Apple Silicon, CPU fallback)
  • Generator seeds for reproducibility
  • Evaluation metrics tracked during training
  • Sampling/generation functions included in each implementation
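
The device-detection and seeding pattern described in these notes can be sketched as below (variable names are illustrative):

```python
import torch

# Prefer MPS on Apple Silicon when available, otherwise fall back to CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"

# A seeded generator keeps sampling reproducible across runs.
g = torch.Generator(device="cpu").manual_seed(2147483647)
sample = torch.randint(0, 27, (5,), generator=g)  # five random character indices
print(device, sample.tolist())
```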

References

This work is educational and builds upon concepts from:

  • Andrej Karpathy's lecture series on neural networks
  • The Transformer paper and subsequent work
  • Classical language modeling techniques

Last Updated: February 2026
