
LLM

Complete implementations of large language models, including all sub-components, plus training and fine-tuning implementations (e.g., LoRA and QLoRA).

Repository Structure

finetuning/
├── lora/              # LoRA implementation & intuition
└── qlora/             # QLoRA implementation & intuition

models/
├── gpt/               # GPT-1 style implementation
└── llama/             # LLaMA-1/2 implementation
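The core idea behind finetuning/lora/ can be sketched in a few lines: LoRA freezes the base weight matrix W and learns only a low-rank update (alpha/r)·B·A, with B initialized to zeros so training starts from the pretrained model exactly. The helper names below are illustrative, not the repo's actual API:

```python
def matmul(A, B):
    """Multiply two matrices represented as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def lora_weight(W, A, B, alpha=16, r=2):
    """Effective weight: W + (alpha / r) * B @ A.

    W stays frozen; only the small factors A (r x d_in) and
    B (d_out x r) would receive gradients during fine-tuning.
    """
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, BA)]

# With B zero-initialized, the adapted weight equals W at step 0.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.5]]             # r x d_in, with r = 1
B = [[0.0], [0.0]]           # d_out x r, zero-initialized
assert lora_weight(W, A, B, alpha=1, r=1) == W
```

Because r is much smaller than the weight dimensions, only the A and B factors need to be stored and updated, which is what makes LoRA (and its quantized variant QLoRA) cheap to train.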

What's Implemented

GPT (models/gpt/):

  • Multi-head self-attention with causal masking
  • Learned positional embeddings
  • LayerNorm, feedforward blocks
  • Training loop with loss estimation
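The causal masking in the first bullet can be sketched in plain Python (single head, no batching; function names are illustrative, not the repo's code):

```python
import math

def causal_attention(Q, K, V):
    """Scaled dot-product attention where position i only attends to
    positions j <= i (the causal mask). Q, K, V are lists of vectors,
    one per sequence position."""
    d = len(Q[0])
    out = []
    for i, q in enumerate(Q):
        # Only keys at positions 0..i are visible; later positions are masked.
        scores = [sum(a * b for a, b in zip(q, K[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        m = max(scores)                       # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]       # softmax over the visible positions
        out.append([sum(w * V[j][c] for j, w in enumerate(weights))
                    for c in range(len(V[0]))])
    return out
```

The first position can only attend to itself, so its output is always exactly V[0]; a full multi-head block would run this per head on projected slices and concatenate the results.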

LLaMA (models/llama/):

  • Multi-head attention with Rotary Position Embeddings (RoPE)
  • RMSNorm (instead of LayerNorm)
  • SwiGLU feedforward network
  • Top-p sampling for generation
  • SentencePiece tokenizer
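Top-p (nucleus) sampling from the list above can be sketched as follows: keep the smallest set of highest-probability tokens whose cumulative mass reaches p, then sample only within that set. Names are illustrative, not the repo's actual generate.py code:

```python
import random

def top_p_sample(probs, p=0.9, rng=None):
    """Sample a token index from `probs` (a probability distribution),
    restricted to the nucleus: the top tokens whose cumulative
    probability first reaches p."""
    rng = rng or random.Random()
    # Sort token indices by descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= p:      # nucleus is complete
            break
    # Sample from the kept set, implicitly renormalized by `total`.
    r = rng.random() * total
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if r <= acc:
            return i
    return kept[-1]
```

With a low p the tail of unlikely tokens is cut off entirely, which avoids the occasional nonsense tokens that plain temperature sampling can emit.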

Usage

GPT:

cd models/gpt
python train.py

LLaMA:

cd models/llama
python generate.py

Default Configurations

Parameter        GPT    LLaMA
Embedding dim    384    4096
Hidden dim       -      11008
Heads            6      32
Layers           6      32
Context length   256    2048
Dropout          0.2    0.0
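The defaults above can be collected into a small config object; the field names here are illustrative and may not match the repo's actual variables:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelConfig:
    n_embd: int               # embedding dimension
    n_head: int               # attention heads
    n_layer: int              # transformer blocks
    block_size: int           # context length
    dropout: float
    n_hidden: Optional[int] = None  # FFN hidden dim (LLaMA's SwiGLU size)

# Defaults from the table above (hypothetical names, values from the README).
GPT_CONFIG = ModelConfig(n_embd=384, n_head=6, n_layer=6,
                         block_size=256, dropout=0.2)
LLAMA_CONFIG = ModelConfig(n_embd=4096, n_head=32, n_layer=32,
                           block_size=2048, dropout=0.0, n_hidden=11008)
```

Note the scale gap: the GPT defaults describe a small trainable-from-scratch model, while the LLaMA defaults match the 7B-class architecture.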

References