This repository contains implementations of increasingly sophisticated neural language models, progressing from simple statistical models to transformer architectures. It is a learning-focused project that documents the journey from basic character-level language models to more complex deep learning approaches, largely inspired by Andrej Karpathy's educational content on neural networks and language models.
- 1-bigram_language_model.ipynb
  - Simplest language model, using bigrams (2-character sequences)
  - Covers both a statistical approach (counting bigrams) and a neural network approach
  - Demonstrates character encoding, probability distributions, and sampling
  - Foundation for understanding sequence modeling
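The counting variant can be sketched in a few lines of PyTorch (toy three-name corpus here; the notebook trains on a much larger names dataset):

```python
import torch

# Toy three-name corpus; the notebook uses a much larger names dataset.
words = ["emma", "olivia", "ava"]
chars = sorted(set("".join(words)))
stoi = {c: i + 1 for i, c in enumerate(chars)}
stoi["."] = 0                      # '.' marks both start and end of a word
itos = {i: c for c, i in stoi.items()}
V = len(stoi)

# Count bigram occurrences into a V x V matrix.
N = torch.zeros((V, V), dtype=torch.int32)
for w in words:
    seq = ["."] + list(w) + ["."]
    for c1, c2 in zip(seq, seq[1:]):
        N[stoi[c1], stoi[c2]] += 1

# Row-normalize (with +1 smoothing) into next-character probabilities.
P = (N + 1).float()
P /= P.sum(dim=1, keepdim=True)

# Sample one name, starting from the '.' token.
g = torch.Generator().manual_seed(42)
out, ix = [], 0
for _ in range(20):                # cap length for this sketch
    ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
    if ix == 0:
        break
    out.append(itos[ix])
print("".join(out))
```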
- 2-trigram_language_model.ipynb
  - Extends the bigram model to trigrams (3-character sequences)
  - Takes 2 input characters to predict the next character
  - Increases context and model expressiveness
  - Shows how to build 2D count matrices for higher-order n-grams
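The 2D trick can be sketched as follows: the two-character context (c1, c2) becomes a single row index c1*V + c2, so the counts fit in a (V*V, V) matrix instead of a 3D tensor (toy corpus; padding with two start tokens is one common choice):

```python
import torch

# Toy corpus; two '.' start tokens pad the context (one common choice).
words = ["emma", "olivia", "ava"]
chars = sorted(set("".join(words)))
stoi = {c: i + 1 for i, c in enumerate(chars)}
stoi["."] = 0
V = len(stoi)

# Flatten the (c1, c2) context into a single row index c1*V + c2,
# giving a 2D (V*V, V) count matrix instead of a 3D tensor.
N = torch.zeros((V * V, V), dtype=torch.int32)
for w in words:
    seq = [".", "."] + list(w) + ["."]
    for c1, c2, c3 in zip(seq, seq[1:], seq[2:]):
        N[stoi[c1] * V + stoi[c2], stoi[c3]] += 1

# Same row-normalization (with smoothing) as in the bigram model.
P = (N + 1).float()
P /= P.sum(dim=1, keepdim=True)
```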
- 3-mlp_language_model.ipynb
  - Multi-Layer Perceptron approach to language modeling
  - Uses context windows (typically block_size=3)
  - Introduces embedding layers for character encoding
  - More sophisticated than n-gram counting methods
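A minimal forward pass in this style, with illustrative sizes (embedding dim 10, hidden 64) and random dummy data:

```python
import torch
import torch.nn.functional as F

V, block_size, emb_dim, hidden = 27, 3, 10, 64   # illustrative sizes
g = torch.Generator().manual_seed(1)

# Parameters: an embedding table plus a two-layer MLP.
C  = torch.randn((V, emb_dim), generator=g)
W1 = torch.randn((block_size * emb_dim, hidden), generator=g)
b1 = torch.randn(hidden, generator=g)
W2 = torch.randn((hidden, V), generator=g)
b2 = torch.randn(V, generator=g)

# Dummy batch of 8 contexts and targets.
X = torch.randint(0, V, (8, block_size), generator=g)
Y = torch.randint(0, V, (8,), generator=g)

emb = C[X]                                   # (8, 3, 10): embed each character
h = torch.tanh(emb.view(8, -1) @ W1 + b1)    # concatenate context, hidden layer
logits = h @ W2 + b2                         # (8, 27): next-character scores
loss = F.cross_entropy(logits, Y)
```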
- 4-mlp_improvement.ipynb
  - Enhanced MLP with better architecture and training
  - Implements proper train/dev/test split
  - Uses Xavier uniform initialization for better convergence
  - Demonstrates hyperparameter tuning and learning rate scheduling
  - Generates coherent name sequences
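A sketch of the data split and a simple step-decay learning-rate schedule (the fractions, step counts, and rates here are illustrative):

```python
import torch

# Hypothetical dataset of N context/target pairs.
N = 1000
X = torch.randint(0, 27, (N, 3))
Y = torch.randint(0, 27, (N,))

# Shuffle, then take an 80/10/10 train/dev/test split.
g = torch.Generator().manual_seed(2)
perm = torch.randperm(N, generator=g)
n1, n2 = int(0.8 * N), int(0.9 * N)
Xtr,  Ytr  = X[perm[:n1]],   Y[perm[:n1]]
Xdev, Ydev = X[perm[n1:n2]], Y[perm[n1:n2]]
Xte,  Yte  = X[perm[n2:]],   Y[perm[n2:]]

# Step-decay schedule: 0.1 for the first half of training, then 0.01.
max_steps = 200
lrs = [0.1 if step < max_steps // 2 else 0.01 for step in range(max_steps)]
```

Tuning happens against the dev split; the test split is touched only once, at the end.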
- 5-activations_gradients_batchnorm.ipynb
  - Deep dive into activation functions and gradient flow
  - Explores batch normalization effects
  - Analyzes internal layer dynamics during training
  - Addresses vanishing/exploding gradient problems
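The core of batch normalization can be written by hand in a few lines (this mirrors what `nn.BatchNorm1d` does at train time, minus the running statistics):

```python
import torch

# Batch of hidden-layer pre-activations: (batch, hidden).
g = torch.Generator().manual_seed(3)
hpreact = torch.randn(32, 64, generator=g)

bngain = torch.ones(1, 64)    # learnable scale
bnbias = torch.zeros(1, 64)   # learnable shift

# Normalize each hidden unit to zero mean / unit variance over the batch.
mean = hpreact.mean(0, keepdim=True)
var = hpreact.var(0, keepdim=True)
hnorm = bngain * (hpreact - mean) / torch.sqrt(var + 1e-5) + bnbias
```

Keeping pre-activations in this range prevents tanh units from saturating, which is exactly the gradient-flow problem the notebook investigates.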
- 6-backprop_ninja.ipynb
  - Manual backpropagation implementation
  - Detailed breakdown of gradient computations
  - Understanding automatic differentiation mechanics
  - Advanced debugging techniques for neural networks
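The signature example from this kind of exercise is the cross-entropy gradient, which collapses to softmax(logits) minus the one-hot targets; a sketch checked against autograd:

```python
import torch
import torch.nn.functional as F

g = torch.Generator().manual_seed(4)
logits = torch.randn(8, 27, generator=g, requires_grad=True)
Y = torch.randint(0, 27, (8,), generator=g)

# Autograd reference.
loss = F.cross_entropy(logits, Y)
loss.backward()

# Manual gradient: softmax minus one-hot targets, averaged over the batch.
dlogits = F.softmax(logits.detach(), dim=1)
dlogits[range(8), Y] -= 1
dlogits /= 8
```

Comparing `dlogits` against `logits.grad` is the debugging pattern used throughout: derive the gradient by hand, then verify it matches autograd element-wise.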
- 7-wavenet.ipynb
  - WaveNet-style dilated convolutions for sequence modeling
  - Exponentially expanding receptive fields
  - Improved context understanding
  - More sophisticated architecture for better performance
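The WaveNet-flavoured idea can be sketched by fusing consecutive pairs of positions level by level, doubling the receptive field at each layer (sizes here are illustrative):

```python
import torch

g = torch.Generator().manual_seed(5)
B, T, C = 4, 8, 10            # batch, context length, embedding dim
x = torch.randn(B, T, C, generator=g)

# One weight matrix per level; each consumes a pair of adjacent positions.
W1 = torch.randn(2 * C, 16, generator=g) * 0.1
W2 = torch.randn(2 * 16, 16, generator=g) * 0.1
W3 = torch.randn(2 * 16, 16, generator=g) * 0.1

def fuse_pairs(x, W):
    """Merge each pair of consecutive time steps into one feature vector."""
    B, T, C = x.shape
    return torch.tanh(x.view(B, T // 2, 2 * C) @ W)

h = fuse_pairs(x, W1)         # (4, 4, 16): receptive field 2
h = fuse_pairs(h, W2)         # (4, 2, 16): receptive field 4
h = fuse_pairs(h, W3)         # (4, 1, 16): receptive field 8
```

Three levels cover 2^3 = 8 context characters, versus crushing all of them in a single layer as the flat MLP does.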
- gpt_dev.ipynb: Transformer model implementation with multi-head self-attention
- v2.py: Full transformer decoder implementation
  - Multi-head self-attention
  - Feed-forward networks
  - Positional embeddings
  - Configurable layers and heads
  - Training loop with evaluation metrics
  - Text generation capabilities
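A single causal self-attention head, the building block that the multi-head layer repeats, can be sketched like this (sizes illustrative):

```python
import torch
import torch.nn.functional as F

g = torch.Generator().manual_seed(6)
B, T, C, head_size = 2, 8, 32, 16     # illustrative sizes
x = torch.randn(B, T, C, generator=g)

key   = torch.randn(C, head_size, generator=g) * 0.1
query = torch.randn(C, head_size, generator=g) * 0.1
value = torch.randn(C, head_size, generator=g) * 0.1

k, q, v = x @ key, x @ query, x @ value            # (B, T, head_size)
wei = q @ k.transpose(-2, -1) * head_size ** -0.5  # scaled dot-product scores
tril = torch.tril(torch.ones(T, T))
wei = wei.masked_fill(tril == 0, float("-inf"))    # causal mask: no peeking ahead
wei = F.softmax(wei, dim=-1)                       # attention weights
out = wei @ v                                      # (B, T, head_size)
```

Multi-head attention runs several such heads in parallel and concatenates their outputs.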
- Character-level tokenization
- One-hot encoding
- Probability distributions and sampling
- Loss functions (negative log-likelihood, cross-entropy)
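The concepts above fit in a few lines: character-level tokenization, one-hot encoding, and the equivalence of mean negative log-likelihood with cross-entropy:

```python
import torch
import torch.nn.functional as F

# Character-level tokenization of a toy string.
text = "hello"
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
ids = torch.tensor([stoi[c] for c in text])

# One-hot encoding of the token ids.
onehot = F.one_hot(ids, num_classes=len(chars)).float()

# Mean negative log-likelihood equals cross-entropy.
g = torch.Generator().manual_seed(7)
logits = torch.randn(len(text), len(chars), generator=g)
logp = F.log_softmax(logits, dim=1)
nll = -logp[range(len(text)), ids].mean()
ce = F.cross_entropy(logits, ids)
```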
- N-gram Models: Statistical baseline using counting
- Feedforward Networks: MLPs with embeddings and hidden layers
- Convolutional Approaches: Dilated convolutions (WaveNet)
- Transformers: Self-attention mechanisms and positional encoding
The notebooks demonstrate progression in model capability:
- Bigram: Simple character patterns, coherent but limited
- Trigram: More context, better structure
- MLP: Neural approach, learns non-linear patterns
- WaveNet: Dilated receptive fields, richer representations
- Transformer: Self-attention, long-range dependencies, strongest results among these models
- All models use character-level tokenization
- Device detection (MPS for Apple Silicon, CPU fallback)
- Generator seeds for reproducibility
- Evaluation metrics tracked during training
- Sampling/generation functions included in each implementation
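A sketch of those shared conveniences (the seed value here is illustrative):

```python
import torch

# Device pick: MPS on Apple Silicon, CPU fallback.
device = "mps" if torch.backends.mps.is_available() else "cpu"

# Seeded generator so sampling is reproducible run to run.
g = torch.Generator().manual_seed(2147483647)    # seed value is illustrative
sample = torch.randint(0, 27, (4,), generator=g)
```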
This work is educational and builds upon concepts from:
- Andrej Karpathy's lecture series on neural networks
- The Transformer paper ("Attention Is All You Need", Vaswani et al., 2017) and subsequent work
- Classical language modeling techniques
Last Updated: February 2026