This repository contains an exploration of the fundamental components of Generative Pre-trained Transformers (GPTs), focusing on the core building blocks of large language models.
The project covers the key elements of GPTs, including:
- Self-attention: Mechanisms to determine token-to-token relationships.
- Positional encoding: Adding order information to input tokens.
- Masking: Enforcing autoregressive token prediction.
- Transformer blocks: Stacking multi-head self-attention layers with feed-forward networks.
- AdderGPT: A minimal GPT model trained to perform three-digit addition.
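The first three elements above can be combined in a few lines. The sketch below (a minimal single-head illustration, not the notebook's exact implementation; the function name and shapes are assumptions) shows scaled dot-product self-attention with a causal mask enforcing autoregressive prediction:

```python
import math
import torch

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention with a causal mask (minimal sketch).

    x:             (seq_len, d_model) token embeddings
    w_q, w_k, w_v: (d_model, d_head) learned projection matrices
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_head = q.size(-1)
    # Token-to-token affinity scores, scaled to keep the softmax well-behaved.
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(d_head)
    # Causal mask: position i may only attend to positions <= i.
    seq_len = x.size(0)
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    scores = scores.masked_fill(~mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # each row sums to 1
    return weights @ v, weights
```

Note that attention itself is permutation-invariant, which is why positional encodings must be added to the token embeddings before this step so the model can distinguish token order.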
- gpt_from_scratch.ipynb: A notebook that walks through:
  - Building and training a simplified GPT from scratch.
  - Visualizing self-attention and token interactions.
  - Training AdderGPT to solve addition problems.
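Training a model on addition starts with generating example strings. The helper below sketches one plausible data format (the function name and the exact tokenization are assumptions; the notebook's format may differ):

```python
import random

def make_addition_example(rng):
    """Generate one training string for an AdderGPT-style model.

    Hypothetical format: zero-padded operands and sum, so every example
    has a fixed length and digits stay aligned across examples, which
    makes the pattern easier for a small model to learn.
    """
    a = rng.randint(0, 999)
    b = rng.randint(0, 999)
    return f"{a:03d}+{b:03d}={a + b:04d}"

rng = random.Random(0)
examples = [make_addition_example(rng) for _ in range(1000)]
```

Each string is then tokenized (e.g. one token per character) and the model is trained to predict the next token, so it must learn to emit the correct sum digits after the `=`.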
- Visualization: Attention patterns are plotted to showcase how the model learns to focus on specific tokens.
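A generic way to plot such attention patterns is a heatmap of the attention-weight matrix, as sketched below (function name, styling, and output path are assumptions; the notebook's plots may look different):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt

def plot_attention(weights, tokens, path="attention.png"):
    """Render an attention matrix as a heatmap: rows are querying tokens,
    columns are the tokens being attended to."""
    fig, ax = plt.subplots()
    im = ax.imshow(weights, cmap="viridis")
    ax.set_xticks(range(len(tokens)), tokens)
    ax.set_yticks(range(len(tokens)), tokens)
    ax.set_xlabel("attended-to token")
    ax.set_ylabel("querying token")
    fig.colorbar(im, ax=ax)
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)

# Example: a random causal (lower-triangular) attention pattern.
w = np.tril(np.random.rand(4, 4))
w = w / w.sum(axis=1, keepdims=True)  # normalize rows like softmax output
plot_attention(w, ["1", "2", "+", "3"])
```

In a causal model the upper triangle of the heatmap is always zero, since no token may attend to positions after itself.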
To get started, ensure you have the following installed:
- Python 3.8 or higher
- PyTorch
- Matplotlib (for visualizations)
Install the required dependencies with `pip install torch matplotlib jupyter`.
To run the project:
- Open gpt_from_scratch.ipynb in Jupyter or Google Colab.
- Follow the step-by-step implementation and training of AdderGPT.