This repository is my journal for hands-on exploration of building and training a large language model from scratch. Before moving on to fine-tuning and playing with pre-built LLMs, I wanted to understand the under-the-hood workings of LLMs. Here's what I learned along the way.
- Built a basic tokenizer to convert raw text into model-comprehensible token IDs.
- Converted token IDs to word embeddings that capture semantic relationships between tokens.
- Created a dataset for LLM pretraining from a small custom text corpus.
- Implemented the self-attention mechanism.
- Progressively built masked (causal) attention so each token attends only to itself and preceding tokens.
- Created multi-head attention to let the model learn a variety of relationships in parallel (nouns, verbs, subjects, objects, punctuation).
- Implemented layer normalization, shortcut (residual) connections, and a feed-forward neural network.
- Combined these into the core transformer building blocks.
- Architected a GPT-2-style language model from the transformer blocks.
- Pretrained the LLM on the prepared dataset to predict the next word given an input sequence.
- Implemented temperature scaling and top-k sampling for controlled text generation.
- Loaded and used OpenAI's pre-trained GPT-2 weights.
- Modified the LLM architecture to make it suitable for a classification task.
- Fine-tuned the model for practical spam classification.
- Implemented instruction fine-tuning to make the model follow specific commands.
- Explored parameter-efficient fine-tuning using LoRA (Low-Rank Adaptation).
- Experimented with different LoRA ranks (8, 16, 32, 128) to find an optimal balance between adapter size and performance.
- Compared performance with and without considering instruction tokens in training loss.
- Implemented preference alignment to train the LLM to align its responses with user preferences such as tone, structure, detail, and relevance.
- Used the Direct Preference Optimization (DPO) approach on top of the instruction-tuned LLM.
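To make the tokenization step concrete, here is a minimal sketch of a word-level tokenizer like the one described above. The regex, the `<|unk|>` token, and all names are illustrative, not the repo's actual implementation:

```python
import re

class SimpleTokenizer:
    """Toy word-level tokenizer: builds a vocabulary from a text,
    then maps text <-> token IDs. Unknown words map to <|unk|>."""

    def __init__(self, text):
        # Split on punctuation and whitespace, keeping punctuation as tokens.
        tokens = re.split(r'([,.:;?_!"()\']|\s)', text)
        tokens = [t.strip() for t in tokens if t.strip()]
        vocab = sorted(set(tokens)) + ["<|unk|>"]
        self.str_to_id = {tok: i for i, tok in enumerate(vocab)}
        self.id_to_str = {i: tok for tok, i in self.str_to_id.items()}

    def encode(self, text):
        tokens = re.split(r'([,.:;?_!"()\']|\s)', text)
        tokens = [t.strip() for t in tokens if t.strip()]
        unk = self.str_to_id["<|unk|>"]
        return [self.str_to_id.get(t, unk) for t in tokens]

    def decode(self, ids):
        return " ".join(self.id_to_str[i] for i in ids)
```

A real GPT-2-style model uses byte-pair encoding rather than whole words, but the ID-mapping idea is the same.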
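The masked (causal) attention step can be sketched in a few lines of NumPy. This is a single-head version under illustrative names; the mask sets attention scores for future positions to negative infinity so they receive zero weight after softmax:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(X, W_q, W_k, W_v):
    """Single-head masked self-attention: each position attends
    only to itself and earlier positions."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # scaled dot-product scores
    T = scores.shape[0]
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # future positions
    scores = np.where(mask, -np.inf, scores)
    weights = softmax(scores, axis=-1)       # rows sum to 1
    return weights @ V, weights
```

Multi-head attention simply runs several such heads with independent projection matrices and concatenates their outputs.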
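Temperature scaling and top-k sampling combine naturally into one decoding function. A minimal sketch, assuming the model has already produced a logit per vocabulary token (function and parameter names are my own):

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, rng=None):
    """Scale logits by temperature, optionally keep only the top-k,
    then sample from the resulting distribution."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    if top_k is not None:
        kth = np.sort(logits)[-top_k]                  # k-th largest logit
        logits = np.where(logits < kth, -np.inf, logits)
    if temperature == 0:                               # greedy decoding
        return int(np.argmax(logits))
    logits = logits / temperature                      # <1 sharpens, >1 flattens
    logits -= logits.max()
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```

Lower temperatures make generation more deterministic; top-k prevents the tail of unlikely tokens from ever being sampled.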
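The LoRA idea from the fine-tuning bullets can be sketched as a linear layer whose pre-trained weight stays frozen while a low-rank update `A @ B` is trained. This is a simplified NumPy illustration (initialization scheme and names are assumptions, not the repo's code):

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update, so the
    effective weight is W + (alpha / r) * A @ B."""

    def __init__(self, W, r=8, alpha=16, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        d_in, d_out = W.shape
        self.W = W                                     # frozen pre-trained weight
        self.A = rng.normal(0, 0.01, size=(d_in, r))   # trainable, small init
        self.B = np.zeros((r, d_out))                  # zero init: no change at start
        self.scale = alpha / r

    def __call__(self, x):
        return x @ self.W + self.scale * (x @ self.A @ self.B)
```

Because `B` starts at zero, the layer initially behaves exactly like the frozen original, and only `A` and `B` (far fewer parameters than `W` for small ranks) need gradients. Sweeping `r` over values like 8, 16, 32, and 128 trades adapter capacity against parameter count.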
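For the preference-alignment step, the DPO objective compares how much the policy prefers the chosen response over the rejected one, relative to a frozen reference model. A sketch of the per-example loss, given summed log-probabilities of each response (argument names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss: -log sigmoid(beta * (policy margin - reference margin)).
    Minimizing it pushes the policy to prefer the chosen response
    more strongly than the reference model does."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    return -np.log(sigmoid(beta * (policy_margin - ref_margin)))
```

When the policy's preference margin matches the reference's, the loss sits at `log 2`; it falls below that only as the policy learns to favor the chosen responses more.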
"What I cannot create,
I do not understand"
- Richard P. Feynman
Having just an abstract idea of a concept isn't enough to solve a problem fundamentally. With this aim in mind, and freely following my curiosity about the inner workings of LLMs, I started this journey. Along the way, I have tried to piece together every component - from basic tokenization to the intricate attention mechanisms that let a model autoregressively generate coherent text - by building each one from zero.