From cb89a1022fa9bacd88ab5f52b2ecf44e380c28bd Mon Sep 17 00:00:00 2001
From: DazzedUpDas <161445393+DazzedUpDas@users.noreply.github.com>
Date: Fri, 21 Feb 2025 12:46:28 -0500
Subject: [PATCH 1/7] Update README.md

Added links for tokenization, cross-entropy in LLMs, and LLM distillation.
---
 README.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index dd17f64..733b2a0 100644
--- a/README.md
+++ b/README.md
@@ -104,6 +104,7 @@ This section talks about the key aspects of LLM architecture.
 - [Umar Jamil: Attention](https://www.youtube.com/watch?v=bCz4OMemCcA&) ![Easy](https://img.shields.io/badge/difficulty-Easy-green)
 ##### Tokenization
+- [Tokenization in large language models, explained](https://seantrott.substack.com/p/tokenization-in-large-language-models) ![Easy](https://img.shields.io/badge/difficulty-Easy-green)
 ##### Positional Encoding
 ###### Rotational Positional Encoding
 ###### Rotary Positional Encoding
 
@@ -119,17 +120,17 @@ This section talks about the key aspects of LLM architecture.
 #### ◻️Loss
 ##### Cross-Entropy Loss
-
+- [Cross Entropy in Large Language Models (LLMs)](https://medium.com/ai-assimilating-intelligence/cross-entropy-in-large-language-models-llms-4f1c842b5fca) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 ---
 
 ### 🟩 Agentic LLMs
--[Agentic LLMs Deep Dive](https://www.aimon.ai/posts/deep-dive-into-agentic-llm-frameworks)
+- [Agentic LLMs Deep Dive](https://www.aimon.ai/posts/deep-dive-into-agentic-llm-frameworks) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 This section talks about various aspects of the Agentic LLMs
 ---
 
 ### 🟩 Methodology
 This section tries to cover various methodologies used in LLMs.
 #### ◻️Distillation
-
+- [LLM distillation demystified: a complete guide](https://snorkel.ai/blog/llm-distillation-demystified-a-complete-guide/) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 ---
 
 ### 🟩 Datasets

From 4e3a566b9a6401c68969a0de32732164f825410d Mon Sep 17 00:00:00 2001
From: DazzedUpDas <161445393+DazzedUpDas@users.noreply.github.com>
Date: Thu, 6 Mar 2025 15:06:05 -0500
Subject: [PATCH 2/7] Update README.md

Added the links from the Notion site.
---
 README.md | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 733b2a0..334b54c 100644
--- a/README.md
+++ b/README.md
@@ -57,7 +57,7 @@ Each area has multiple types of subtopics each of which will go more in depth. I
   - [◻️Torch Fundamentals](#️torch-fundamentals)
 - [🟩 Deployment](#-deployment)
 - [🟩 Engineering](#-engineering)
-  - [◻️Flash Attention 2](#️flash-attention-2)
+  - [◻️Flash Attention 2]
   - [◻️KV Cache](#️kv-cache)
   - [◻️Batched Inference](#️batched-inference)
   - [◻️Python Advanced](#️python-advanced)
@@ -117,7 +117,8 @@ This section talks about the key aspects of LLM architecture.
 - [Umar Jamil: Llama 2 from Scratch](https://www.youtube.com/watch?v=oM4VmoabDAI) ![Hard](https://img.shields.io/badge/difficulty-Hard-red)
 #### ◻️Attention
-
+- [Multi-Head Attention (MHA), Multi-Query Attention (MQA), Grouped Query Attention (GQA) Explained](https://www.youtube.com/watch?v=o68RRGxAtDo) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
+-
 #### ◻️Loss
 ##### Cross-Entropy Loss
 - [Cross Entropy in Large Language Models (LLMs)](https://medium.com/ai-assimilating-intelligence/cross-entropy-in-large-language-models-llms-4f1c842b5fca) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 ---
@@ -131,6 +132,7 @@ This section talks about various aspects of the Agentic LLMs
 This section tries to cover various methodologies used in LLMs.
 #### ◻️Distillation
 - [LLM distillation demystified: a complete guide](https://snorkel.ai/blog/llm-distillation-demystified-a-complete-guide/) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
+- [Distilling step-by-step: Outperforming larger language models with less training data and smaller model sizes](https://research.google/blog/distilling-step-by-step-outperforming-larger-language-models-with-less-training-data-and-smaller-model-sizes/) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 ---
 
 ### 🟩 Datasets
@@ -176,6 +178,7 @@ This section tries to cover various methodologies used in LLMs.
 - [[vLLM] LLM Inference Optimizations: Chunked Prefill and Decode-Maximal Batching](https://medium.com/byte-sized-ai/llm-inference-optimizations-2-chunked-prefill-764407b3a67a)![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 - [LLM Inference Series: 2. The two-phase process behind LLMs’ responses](https://medium.com/@plienhar/llm-inference-series-2-the-two-phase-process-behind-llms-responses-1ff1ff021cd5)![Hard](https://img.shields.io/badge/difficulty-Hard-red)
 - [LLM Inference Series: 4. KV caching, a deeper look](https://medium.com/@plienhar/llm-inference-series-4-kv-caching-a-deeper-look-4ba9a77746c8)![Hard](https://img.shields.io/badge/difficulty-Hard-red)
+- [How KV caches impact time to first token for LLMs](https://www.glean.com/blog/glean-kv-caches-llm-latency) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 
 
@@ -189,6 +192,7 @@ This section tries to cover various methodologies used in LLMs.
 - [PyTorch Engineers Meeting Talk](https://www.youtube.com/watch?v=MQwryfkydc0) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 - [Hugging Face Collab Blog](https://huggingface.co/blog/unsloth-trl) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️Flash Attention 2
+-[FLASHATTENTION: Fast and Memory-Efficient Exact Attention with IO-Awareness]((https://arxiv.org/pdf/2205.14135)![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️KV Cache
 #### ◻️Batched Inference
 #### ◻️Python Advanced
@@ -199,6 +203,8 @@ This section tries to cover various methodologies used in LLMs.
 - [CUDA / GPU Mode lecture Talk](https://www.youtube.com/watch?v=hfb_AIhDYnA) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️JAX / XLA JIT compilers
 #### ◻️Model Exporting (vLLM, Llama.cpp, QLoRA)
+- [QLoRA: Fine-Tuning Large Language Models (LLMs)](https://medium.com/@dillipprasad60/qlora-explained-a-deep-dive-into-parametric-efficient-fine-tuning-in-large-language-models-llms-c1a4794b1766) ![Hard](https://img.shields.io/badge/difficulty-Hard-red)
+-[]()
 #### ◻️ML Debugging
 
 ---

From 246a5651e96390cf37d90ae14e01888286519c79 Mon Sep 17 00:00:00 2001
From: DazzedUpDas <161445393+DazzedUpDas@users.noreply.github.com>
Date: Sat, 8 Mar 2025 12:11:24 -0500
Subject: [PATCH 3/7] Update README.md

Added all remaining links from the Notion page. Looking forward to the new list.
---
 README.md | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 334b54c..949c3d5 100644
--- a/README.md
+++ b/README.md
@@ -99,12 +99,18 @@ Each area has multiple types of subtopics each of which will go more in depth. I
 ### 🟩 Model Architecture
 This section talks about the key aspects of LLM architecture.
 > 📝 Try to cover basics of Transformers, then understand the GPT architecture before diving deeper into other concepts
+- [Numbers every LLM Developer should know](https://github.com/ray-project/llm-numbers#1-mb-gpu-memory-required-for-1-token-of-output-with-a-13b-parameter-model) ![Easy](https://img.shields.io/badge/difficulty-Easy-green)
 #### ◻️Transformer Architecture
 - [Jay Alamar - Illustrated Transformer](https://jalammar.github.io/illustrated-transformer/) ![Easy](https://img.shields.io/badge/difficulty-Easy-green)
 - [Umar Jamil: Attention](https://www.youtube.com/watch?v=bCz4OMemCcA&) ![Easy](https://img.shields.io/badge/difficulty-Easy-green)
+- [Large Scale Transformer model training with Tensor Parallel (TP)](https://pytorch.org/tutorials/intermediate/TP_tutorial.html) ![Easy](https://img.shields.io/badge/difficulty-Easy-green)
+- [RoPE (Rotary positional embeddings) explained: The positional workhorse of modern LLMs](https://www.youtube.com/watch?v=GQPOtyITy54&t=66s) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
+- [Rotary Embeddings: A Relative Revolution | EleutherAI Blog](https://blog.eleuther.ai/rotary-embeddings/) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 ##### Tokenization
 - [Tokenization in large language models, explained](https://seantrott.substack.com/p/tokenization-in-large-language-models) ![Easy](https://img.shields.io/badge/difficulty-Easy-green)
+- [LLM Tokenizers Explained: BPE Encoding, WordPiece and SentencePiece](https://www.youtube.com/watch?v=hL4ZnAWSyuU) ![Easy](https://img.shields.io/badge/difficulty-Easy-green)
+- [SentencePiece Tokenizer Demystified](https://towardsdatascience.com/sentencepiece-tokenizer-demystified-d0a3aac19b15/) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 ##### Positional Encoding
 ###### Rotational Positional Encoding
 ###### Rotary Positional Encoding
 
@@ -141,10 +147,12 @@ This section tries to cover various methodologies used in LLMs.
 #### ◻️Training
 #### ◻️Inference
 ##### RAG
+- [Introduction to Facebook AI Similarity Search (Faiss)](https://www.pinecone.io/learn/series/faiss/faiss-tutorial/) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️Prompting
 ---
 
 ### 🟩 FineTuning
+- [Deep Learning Tuning Playbook](https://github.com/google-research/tuning_playbook) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️Quantized FineTuning
 - [Umar Jamil: Quantization](https://www.youtube.com/watch?v=0VdNflU08yA) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️LoRA
@@ -154,9 +162,10 @@ This section tries to cover various methodologies used in LLMs.
 #### ◻️ORPO
 #### ◻️RLHF
 - [Umar Jamil: RLHF Explained](https://www.youtube.com/watch?v=qGyFrqc34yc) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
-
+- [Policy Gradients: The Foundation of RLHF](https://cameronrwolfe.substack.com/p/policy-gradients-the-foundation-of) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 ---
 ### 🟩 Quantization
+- [HuggingFace Quantization Overview](https://huggingface.co/docs/transformers/main/en/quantization/overview) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️Post Training Quantization
 ##### Static/Dynamic Quantization
 ##### GPTQ
@@ -179,7 +188,7 @@ This section tries to cover various methodologies used in LLMs.
 - [LLM Inference Series: 2. The two-phase process behind LLMs’ responses](https://medium.com/@plienhar/llm-inference-series-2-the-two-phase-process-behind-llms-responses-1ff1ff021cd5)![Hard](https://img.shields.io/badge/difficulty-Hard-red)
 - [LLM Inference Series: 4. KV caching, a deeper look](https://medium.com/@plienhar/llm-inference-series-4-kv-caching-a-deeper-look-4ba9a77746c8)![Hard](https://img.shields.io/badge/difficulty-Hard-red)
 - [How KV caches impact time to first token for LLMs](https://www.glean.com/blog/glean-kv-caches-llm-latency) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
-
+- [Generation with LLMs](https://charoori.notion.site/Generation-with-LLMs-17d311b8ed1e819b99a3e79112e00ca6) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 
 ---
 
@@ -191,7 +200,12 @@ This section tries to cover various methodologies used in LLMs.
 - [PyTorch Conference Mini Talk](https://www.youtube.com/watch?v=PdtKkc5jB4g) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 - [PyTorch Engineers Meeting Talk](https://www.youtube.com/watch?v=MQwryfkydc0) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 - [Hugging Face Collab Blog](https://huggingface.co/blog/unsloth-trl) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
+- [Summary of Designing Machine Learning Systems](https://github.com/serodriguez68/designing-ml-systems-summary) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
+- [System Design for Recommendations and Search](https://eugeneyan.com/writing/system-design-for-discovery/) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
+- [Recommender Systems, Not Just Recommender Models](https://medium.com/nvidia-merlin/recommender-systems-not-just-recommender-models-485c161c755e) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
+- [Blueprints for recommender system architectures: 10th anniversary edition](https://amatria.in/blog/RecsysArchitectures) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️Flash Attention 2
+- [Flash Attention Machine Learning](https://www.youtube.com/watch?v=N1EZpa7lZc8) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 -[FLASHATTENTION: Fast and Memory-Efficient Exact Attention with IO-Awareness]((https://arxiv.org/pdf/2205.14135)![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️KV Cache
 #### ◻️Batched Inference
 #### ◻️Python Advanced
@@ -249,6 +263,7 @@ This section tries to cover various methodologies used in LLMs.
 
 ### 🟩 Misc
 
 - [Tweet on what to learn in ML (RT by Karpathy)](https://x.com/youraimarketer/status/1778992208697258152) ![Hard](https://img.shields.io/badge/difficulty-Hard-red)
+- [Schedule - CS 685, Spring 2024, UMass Amherst](https://people.cs.umass.edu/~miyyer/cs685/schedule.html) ![Hard](https://img.shields.io/badge/difficulty-Hard-red)
 
 ---

From 987d6f35bac13d9457fd611d9ade28aef9be7a8f Mon Sep 17 00:00:00 2001
From: DazzedUpDas <161445393+DazzedUpDas@users.noreply.github.com>
Date: Sat, 8 Mar 2025 12:15:36 -0500
Subject: [PATCH 4/7] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 949c3d5..385c238 100644
--- a/README.md
+++ b/README.md
@@ -206,7 +206,7 @@ This section tries to cover various methodologies used in LLMs.
 - [Blueprints for recommender system architectures: 10th anniversary edition](https://amatria.in/blog/RecsysArchitectures) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️Flash Attention 2
 - [Flash Attention Machine Learning](https://www.youtube.com/watch?v=N1EZpa7lZc8) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
--[FLASHATTENTION: Fast and Memory-Efficient Exact Attention with IO-Awareness]((https://arxiv.org/pdf/2205.14135)![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
+- [FLASHATTENTION: Fast and Memory-Efficient Exact Attention with IO-Awareness]((https://arxiv.org/pdf/2205.14135)![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️KV Cache
 #### ◻️Batched Inference
 #### ◻️Python Advanced

From f4e77d17996f2b5412caed9e7ae4a5ea59bb1d94 Mon Sep 17 00:00:00 2001
From: DazzedUpDas <161445393+DazzedUpDas@users.noreply.github.com>
Date: Sat, 8 Mar 2025 12:16:42 -0500
Subject: [PATCH 5/7] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 385c238..20c54f2 100644
--- a/README.md
+++ b/README.md
@@ -206,7 +206,7 @@ This section tries to cover various methodologies used in LLMs.
 - [Blueprints for recommender system architectures: 10th anniversary edition](https://amatria.in/blog/RecsysArchitectures) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️Flash Attention 2
 - [Flash Attention Machine Learning](https://www.youtube.com/watch?v=N1EZpa7lZc8) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
-- [FLASHATTENTION: Fast and Memory-Efficient Exact Attention with IO-Awareness]((https://arxiv.org/pdf/2205.14135)![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
+- [FLASHATTENTION: Fast and Memory-Efficient Exact Attention with IO-Awareness](https://arxiv.org/pdf/2205.14135) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️KV Cache
 #### ◻️Batched Inference
 #### ◻️Python Advanced

From 5ec4516c92b885c5a54c64c432894cc640cb2995 Mon Sep 17 00:00:00 2001
From: DazzedUpDas <161445393+DazzedUpDas@users.noreply.github.com>
Date: Sat, 8 Mar 2025 12:21:04 -0500
Subject: [PATCH 6/7] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 20c54f2..62a6bb1 100644
--- a/README.md
+++ b/README.md
@@ -57,7 +57,7 @@ Each area has multiple types of subtopics each of which will go more in depth. I
   - [◻️Torch Fundamentals](#️torch-fundamentals)
 - [🟩 Deployment](#-deployment)
 - [🟩 Engineering](#-engineering)
-  - [◻️Flash Attention 2]
+  - [◻️Flash Attention 2](#Flash Attention 2)
   - [◻️KV Cache](#️kv-cache)
   - [◻️Batched Inference](#️batched-inference)
   - [◻️Python Advanced](#️python-advanced)

From 0b52087ba5dc1ec6223865e143a7e756e514b4c1 Mon Sep 17 00:00:00 2001
From: DazzedUpDas <161445393+DazzedUpDas@users.noreply.github.com>
Date: Sat, 8 Mar 2025 12:22:56 -0500
Subject: [PATCH 7/7] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 62a6bb1..171db03 100644
--- a/README.md
+++ b/README.md
@@ -57,7 +57,7 @@ Each area has multiple types of subtopics each of which will go more in depth. I
   - [◻️Torch Fundamentals](#️torch-fundamentals)
 - [🟩 Deployment](#-deployment)
 - [🟩 Engineering](#-engineering)
-  - [◻️Flash Attention 2](#Flash Attention 2)
+  - [◻️Flash Attention 2](#️flash-attention-2)
   - [◻️KV Cache](#️kv-cache)
   - [◻️Batched Inference](#️batched-inference)
   - [◻️Python Advanced](#️python-advanced)
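
The cross-entropy and KV-cache material these patches link to is easier to absorb with the core computation in front of you. Below is a minimal PyTorch sketch of the next-token cross-entropy objective the new links discuss; the vocabulary size, batch shape, and variable names are illustrative assumptions, not taken from any linked article:

```python
# Minimal sketch of next-token cross-entropy, assuming toy sizes and
# random logits in place of a real model's output.
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, seq_len, vocab = 2, 8, 101            # toy sizes (assumptions)
logits = torch.randn(batch, seq_len, vocab)  # stand-in for a model's output
tokens = torch.randint(vocab, (batch, seq_len))

# Shift by one: the logits at position t are scored against token t+1.
pred = logits[:, :-1, :].reshape(-1, vocab)  # (batch*(seq_len-1), vocab)
target = tokens[:, 1:].reshape(-1)           # (batch*(seq_len-1),)

loss = F.cross_entropy(pred, target)         # mean over all predicted positions

# The same number from first principles: average -log p(correct next token).
logp = F.log_softmax(pred, dim=-1)
nll = -logp[torch.arange(target.numel()), target].mean()
assert torch.allclose(loss, nll)

print(f"cross-entropy: {loss.item():.4f} (uniform baseline: ln({vocab}) = {math.log(vocab):.4f})")
```

Perplexity is the exponential of this mean negative log-likelihood, so the two metrics always move together; a model that predicts uniformly scores exactly ln(vocab), a handy baseline when wiring up a training loop.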