From cb89a1022fa9bacd88ab5f52b2ecf44e380c28bd Mon Sep 17 00:00:00 2001
From: DazzedUpDas <161445393+DazzedUpDas@users.noreply.github.com>
Date: Fri, 21 Feb 2025 12:46:28 -0500
Subject: [PATCH 1/7] Update README.md

Added links for tokenization, cross-entropy in LLMs, and LLM distillation.
---
 README.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index dd17f64..733b2a0 100644
--- a/README.md
+++ b/README.md
@@ -104,6 +104,7 @@ This section talks about the key aspects of LLM architecture.
 - [Umar Jamil: Attention](https://www.youtube.com/watch?v=bCz4OMemCcA&) ![Easy](https://img.shields.io/badge/difficulty-Easy-green)
 ##### Tokenization
+- [Tokenization in large language models, explained](https://seantrott.substack.com/p/tokenization-in-large-language-models) ![Easy](https://img.shields.io/badge/difficulty-Easy-green)
 ##### Positional Encoding
 ###### Rotational Positional Encoding
 ###### Rotary Positional Encoding
 
@@ -119,17 +120,17 @@ This section talks about the key aspects of LLM architecture.
 #### ◻️Loss
 ##### Cross-Entropy Loss
-
+- [Cross Entropy in Large Language Models (LLMs)](https://medium.com/ai-assimilating-intelligence/cross-entropy-in-large-language-models-llms-4f1c842b5fca) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 ---
 
 ### 🟩 Agentic LLMs
--[Agentic LLMs Deep Dive](https://www.aimon.ai/posts/deep-dive-into-agentic-llm-frameworks)
+- [Agentic LLMs Deep Dive](https://www.aimon.ai/posts/deep-dive-into-agentic-llm-frameworks) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 This section talks about various aspects of the Agentic LLMs
 ---
 
 ### 🟩 Methodology
 This section tries to cover various methodologies used in LLMs.
 #### ◻️Distillation
-
+- [LLM distillation demystified: a complete guide](https://snorkel.ai/blog/llm-distillation-demystified-a-complete-guide/) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 ---
 
 ### 🟩 Datasets

From 4e3a566b9a6401c68969a0de32732164f825410d Mon Sep 17 00:00:00 2001
From: DazzedUpDas <161445393+DazzedUpDas@users.noreply.github.com>
Date: Thu, 6 Mar 2025 15:06:05 -0500
Subject: [PATCH 2/7] Update README.md

Added the links from the Notion site.
---
 README.md | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 733b2a0..334b54c 100644
--- a/README.md
+++ b/README.md
@@ -57,7 +57,7 @@ Each area has multiple types of subtopics each of which will go more in depth. I
   - [◻️Torch Fundamentals](#️torch-fundamentals)
 - [🟩 Deployment](#-deployment)
 - [🟩 Engineering](#-engineering)
-  - [◻️Flash Attention 2](#️flash-attention-2)
+  - [◻️Flash Attention 2]
   - [◻️KV Cache](#️kv-cache)
   - [◻️Batched Inference](#️batched-inference)
   - [◻️Python Advanced](#️python-advanced)
@@ -117,7 +117,8 @@ This section talks about the key aspects of LLM architecture.
 - [Umar Jamil: Llama 2 from Scratch](https://www.youtube.com/watch?v=oM4VmoabDAI) ![Hard](https://img.shields.io/badge/difficulty-Hard-red)
 #### ◻️Attention
-
+- [Multi-Head Attention (MHA), Multi-Query Attention (MQA), Grouped Query Attention (GQA) Explained](https://www.youtube.com/watch?v=o68RRGxAtDo) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
+-
 #### ◻️Loss
 ##### Cross-Entropy Loss
 - [Cross Entropy in Large Language Models (LLMs)](https://medium.com/ai-assimilating-intelligence/cross-entropy-in-large-language-models-llms-4f1c842b5fca) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 ---
@@ -131,6 +132,7 @@ This section talks about various aspects of the Agentic LLMs
 This section tries to cover various methodologies used in LLMs.
 #### ◻️Distillation
 - [LLM distillation demystified: a complete guide](https://snorkel.ai/blog/llm-distillation-demystified-a-complete-guide/) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
+- [Distilling step-by-step: Outperforming larger language models with less training data and smaller model sizes](https://research.google/blog/distilling-step-by-step-outperforming-larger-language-models-with-less-training-data-and-smaller-model-sizes/) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 ---
 
 ### 🟩 Datasets
@@ -176,6 +178,7 @@ This section tries to cover various methodologies used in LLMs.
 - [[vLLM] LLM Inference Optimizations: Chunked Prefill and Decode-Maximal Batching](https://medium.com/byte-sized-ai/llm-inference-optimizations-2-chunked-prefill-764407b3a67a)![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 - [LLM Inference Series: 2. The two-phase process behind LLMs’ responses](https://medium.com/@plienhar/llm-inference-series-2-the-two-phase-process-behind-llms-responses-1ff1ff021cd5)![Hard](https://img.shields.io/badge/difficulty-Hard-red)
 - [LLM Inference Series: 4. KV caching, a deeper look](https://medium.com/@plienhar/llm-inference-series-4-kv-caching-a-deeper-look-4ba9a77746c8)![Hard](https://img.shields.io/badge/difficulty-Hard-red)
+- [How KV caches impact time to first token for LLMs](https://www.glean.com/blog/glean-kv-caches-llm-latency) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 
 
@@ -189,6 +192,7 @@ This section tries to cover various methodologies used in LLMs.
 - [PyTorch Engineers Meeting Talk](https://www.youtube.com/watch?v=MQwryfkydc0) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 - [Hugging Face Collab Blog](https://huggingface.co/blog/unsloth-trl) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️Flash Attention 2
+-[FLASHATTENTION: Fast and Memory-Efficient Exact Attention with IO-Awareness]((https://arxiv.org/pdf/2205.14135)![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️KV Cache
 #### ◻️Batched Inference
 #### ◻️Python Advanced
@@ -199,6 +203,8 @@ This section tries to cover various methodologies used in LLMs.
 - [CUDA / GPU Mode lecture Talk](https://www.youtube.com/watch?v=hfb_AIhDYnA) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️JAX / XLA JIT compilers
 #### ◻️Model Exporting (vLLM, Llama.cpp, QLoRA)
+- [QLoRA: Fine-Tuning Large Language Models (LLMs)](https://medium.com/@dillipprasad60/qlora-explained-a-deep-dive-into-parametric-efficient-fine-tuning-in-large-language-models-llms-c1a4794b1766) ![Hard](https://img.shields.io/badge/difficulty-Hard-red)
+-[]()
 #### ◻️ML Debugging
 
 ---

From 246a5651e96390cf37d90ae14e01888286519c79 Mon Sep 17 00:00:00 2001
From: DazzedUpDas <161445393+DazzedUpDas@users.noreply.github.com>
Date: Sat, 8 Mar 2025 12:11:24 -0500
Subject: [PATCH 3/7] Update README.md

Added all remaining links from the Notion page. Looking forward to the new list.
---
 README.md | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 334b54c..949c3d5 100644
--- a/README.md
+++ b/README.md
@@ -99,12 +99,18 @@ Each area has multiple types of subtopics each of which will go more in depth. I
 ### 🟩 Model Architecture
 This section talks about the key aspects of LLM architecture.
 > 📝 Try to cover basics of Transformers, then understand the GPT architecture before diving deeper into other concepts
+- [Numbers every LLM Developer should know](https://github.com/ray-project/llm-numbers#1-mb-gpu-memory-required-for-1-token-of-output-with-a-13b-parameter-model) ![Easy](https://img.shields.io/badge/difficulty-Easy-green)
 #### ◻️Transformer Architecture
 - [Jay Alamar - Illustrated Transformer](https://jalammar.github.io/illustrated-transformer/) ![Easy](https://img.shields.io/badge/difficulty-Easy-green)
 - [Umar Jamil: Attention](https://www.youtube.com/watch?v=bCz4OMemCcA&) ![Easy](https://img.shields.io/badge/difficulty-Easy-green)
+- [Large Scale Transformer model training with Tensor Parallel (TP)](https://pytorch.org/tutorials/intermediate/TP_tutorial.html) ![Easy](https://img.shields.io/badge/difficulty-Easy-green)
+- [RoPE (Rotary positional embeddings) explained: The positional workhorse of modern LLMs](https://www.youtube.com/watch?v=GQPOtyITy54&t=66s) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
+- [Rotary Embeddings: A Relative Revolution | EleutherAI Blog](https://blog.eleuther.ai/rotary-embeddings/) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 ##### Tokenization
 - [Tokenization in large language models, explained](https://seantrott.substack.com/p/tokenization-in-large-language-models) ![Easy](https://img.shields.io/badge/difficulty-Easy-green)
+- [LLM Tokenizers Explained: BPE Encoding, WordPiece and SentencePiece](https://www.youtube.com/watch?v=hL4ZnAWSyuU) ![Easy](https://img.shields.io/badge/difficulty-Easy-green)
+- [SentencePiece Tokenizer Demystified](https://towardsdatascience.com/sentencepiece-tokenizer-demystified-d0a3aac19b15/) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 ##### Positional Encoding
 ###### Rotational Positional Encoding
 ###### Rotary Positional Encoding
 
@@ -141,10 +147,12 @@ This section tries to cover various methodologies used in LLMs.
 #### ◻️Training
 #### ◻️Inference
 ##### RAG
+- [Introduction to Facebook AI Similarity Search (Faiss)](https://www.pinecone.io/learn/series/faiss/faiss-tutorial/) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️Prompting
 ---
 
 ### 🟩 FineTuning
+- [Deep Learning Tuning Playbook](https://github.com/google-research/tuning_playbook) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️Quantized FineTuning
 - [Umar Jamil: Quantization](https://www.youtube.com/watch?v=0VdNflU08yA) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️LoRA
@@ -154,9 +162,10 @@ This section tries to cover various methodologies used in LLMs.
 #### ◻️ORPO
 #### ◻️RLHF
 - [Umar Jamil: RLHF Explained](https://www.youtube.com/watch?v=qGyFrqc34yc) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
-
+- [Policy Gradients: The Foundation of RLHF](https://cameronrwolfe.substack.com/p/policy-gradients-the-foundation-of) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 ---
 ### 🟩 Quantization
+- [HuggingFace Quantization Overview](https://huggingface.co/docs/transformers/main/en/quantization/overview) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️Post Training Quantization
 ##### Static/Dynamic Quantization
 ##### GPTQ
@@ -179,7 +188,7 @@ This section tries to cover various methodologies used in LLMs.
 - [LLM Inference Series: 2. The two-phase process behind LLMs’ responses](https://medium.com/@plienhar/llm-inference-series-2-the-two-phase-process-behind-llms-responses-1ff1ff021cd5)![Hard](https://img.shields.io/badge/difficulty-Hard-red)
 - [LLM Inference Series: 4. KV caching, a deeper look](https://medium.com/@plienhar/llm-inference-series-4-kv-caching-a-deeper-look-4ba9a77746c8)![Hard](https://img.shields.io/badge/difficulty-Hard-red)
 - [How KV caches impact time to first token for LLMs](https://www.glean.com/blog/glean-kv-caches-llm-latency) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
-
+- [Generation with LLMs](https://charoori.notion.site/Generation-with-LLMs-17d311b8ed1e819b99a3e79112e00ca6) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 
 ---
 
@@ -191,7 +200,12 @@ This section tries to cover various methodologies used in LLMs.
 - [PyTorch Conference Mini Talk](https://www.youtube.com/watch?v=PdtKkc5jB4g) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 - [PyTorch Engineers Meeting Talk](https://www.youtube.com/watch?v=MQwryfkydc0) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 - [Hugging Face Collab Blog](https://huggingface.co/blog/unsloth-trl) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
+- [Summary of Designing Machine Learning Systems](https://github.com/serodriguez68/designing-ml-systems-summary) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
+- [System Design for Recommendations and Search](https://eugeneyan.com/writing/system-design-for-discovery/) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
+- [Recommender Systems, Not Just Recommender Models](https://medium.com/nvidia-merlin/recommender-systems-not-just-recommender-models-485c161c755e) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
+- [Blueprints for recommender system architectures: 10th anniversary edition](https://amatria.in/blog/RecsysArchitectures) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️Flash Attention 2
+- [Flash Attention Machine Learning](https://www.youtube.com/watch?v=N1EZpa7lZc8) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 -[FLASHATTENTION: Fast and Memory-Efficient Exact Attention with IO-Awareness]((https://arxiv.org/pdf/2205.14135)![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️KV Cache
 #### ◻️Batched Inference
 #### ◻️Python Advanced
@@ -249,6 +263,7 @@ This section tries to cover various methodologies used in LLMs.
 
 ### 🟩 Misc
 
 - [Tweet on what to learn in ML (RT by Karpathy)](https://x.com/youraimarketer/status/1778992208697258152) ![Hard](https://img.shields.io/badge/difficulty-Hard-red)
+- [Schedule - CS 685, Spring 2024, UMass Amherst](https://people.cs.umass.edu/~miyyer/cs685/schedule.html) ![Hard](https://img.shields.io/badge/difficulty-Hard-red)
 
 ---

From 987d6f35bac13d9457fd611d9ade28aef9be7a8f Mon Sep 17 00:00:00 2001
From: DazzedUpDas <161445393+DazzedUpDas@users.noreply.github.com>
Date: Sat, 8 Mar 2025 12:15:36 -0500
Subject: [PATCH 4/7] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 949c3d5..385c238 100644
--- a/README.md
+++ b/README.md
@@ -206,7 +206,7 @@ This section tries to cover various methodologies used in LLMs.
 - [Blueprints for recommender system architectures: 10th anniversary edition](https://amatria.in/blog/RecsysArchitectures) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️Flash Attention 2
 - [Flash Attention Machine Learning](https://www.youtube.com/watch?v=N1EZpa7lZc8) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
--[FLASHATTENTION: Fast and Memory-Efficient Exact Attention with IO-Awareness]((https://arxiv.org/pdf/2205.14135)![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
+- [FLASHATTENTION: Fast and Memory-Efficient Exact Attention with IO-Awareness]((https://arxiv.org/pdf/2205.14135)![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️KV Cache
 #### ◻️Batched Inference
 #### ◻️Python Advanced

From f4e77d17996f2b5412caed9e7ae4a5ea59bb1d94 Mon Sep 17 00:00:00 2001
From: DazzedUpDas <161445393+DazzedUpDas@users.noreply.github.com>
Date: Sat, 8 Mar 2025 12:16:42 -0500
Subject: [PATCH 5/7] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 385c238..20c54f2 100644
--- a/README.md
+++ b/README.md
@@ -206,7 +206,7 @@ This section tries to cover various methodologies used in LLMs.
 - [Blueprints for recommender system architectures: 10th anniversary edition](https://amatria.in/blog/RecsysArchitectures) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️Flash Attention 2
 - [Flash Attention Machine Learning](https://www.youtube.com/watch?v=N1EZpa7lZc8) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
-- [FLASHATTENTION: Fast and Memory-Efficient Exact Attention with IO-Awareness]((https://arxiv.org/pdf/2205.14135)![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
+- [FLASHATTENTION: Fast and Memory-Efficient Exact Attention with IO-Awareness](https://arxiv.org/pdf/2205.14135) ![Medium](https://img.shields.io/badge/difficulty-Medium-yellow)
 #### ◻️KV Cache
 #### ◻️Batched Inference
 #### ◻️Python Advanced

From 5ec4516c92b885c5a54c64c432894cc640cb2995 Mon Sep 17 00:00:00 2001
From: DazzedUpDas <161445393+DazzedUpDas@users.noreply.github.com>
Date: Sat, 8 Mar 2025 12:21:04 -0500
Subject: [PATCH 6/7] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 20c54f2..62a6bb1 100644
--- a/README.md
+++ b/README.md
@@ -57,7 +57,7 @@ Each area has multiple types of subtopics each of which will go more in depth. I
   - [◻️Torch Fundamentals](#️torch-fundamentals)
 - [🟩 Deployment](#-deployment)
 - [🟩 Engineering](#-engineering)
-  - [◻️Flash Attention 2]
+  - [◻️Flash Attention 2](#Flash Attention 2)
   - [◻️KV Cache](#️kv-cache)
   - [◻️Batched Inference](#️batched-inference)
   - [◻️Python Advanced](#️python-advanced)

From 0b52087ba5dc1ec6223865e143a7e756e514b4c1 Mon Sep 17 00:00:00 2001
From: DazzedUpDas <161445393+DazzedUpDas@users.noreply.github.com>
Date: Sat, 8 Mar 2025 12:22:56 -0500
Subject: [PATCH 7/7] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 62a6bb1..171db03 100644
--- a/README.md
+++ b/README.md
@@ -57,7 +57,7 @@ Each area has multiple types of subtopics each of which will go more in depth. I
   - [◻️Torch Fundamentals](#️torch-fundamentals)
 - [🟩 Deployment](#-deployment)
 - [🟩 Engineering](#-engineering)
-  - [◻️Flash Attention 2](#Flash Attention 2)
+  - [◻️Flash Attention 2](#️flash-attention-2)
   - [◻️KV Cache](#️kv-cache)
   - [◻️Batched Inference](#️batched-inference)
   - [◻️Python Advanced](#️python-advanced)
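
The cross-entropy and KV-cache material these patches link to is easier to absorb with the core computation in front of you. Below is a minimal PyTorch sketch of the next-token cross-entropy objective the new links discuss; the vocabulary size, batch shape, and variable names are illustrative assumptions, not taken from any linked article:

```python
# Minimal sketch of next-token cross-entropy, assuming toy sizes and
# random logits in place of a real model's output.
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, seq_len, vocab = 2, 8, 101            # toy sizes (assumptions)
logits = torch.randn(batch, seq_len, vocab)  # stand-in for a model's output
tokens = torch.randint(vocab, (batch, seq_len))

# Shift by one: the logits at position t are scored against token t+1.
pred = logits[:, :-1, :].reshape(-1, vocab)  # (batch*(seq_len-1), vocab)
target = tokens[:, 1:].reshape(-1)           # (batch*(seq_len-1),)

loss = F.cross_entropy(pred, target)         # mean over all predicted positions

# The same number from first principles: average -log p(correct next token).
logp = F.log_softmax(pred, dim=-1)
nll = -logp[torch.arange(target.numel()), target].mean()
assert torch.allclose(loss, nll)

print(f"cross-entropy: {loss.item():.4f} (uniform baseline: ln({vocab}) = {math.log(vocab):.4f})")
```

Perplexity is the exponential of this mean negative log-likelihood, so the two metrics always move together; a model that predicts uniformly scores exactly ln(vocab), a handy baseline when wiring up a training loop.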