agents/avatars/generate_all_avatars.py
2 lines changed: 2 additions & 0 deletions
@@ -50,6 +50,8 @@
     ("36-illustrator", "a French female artist named Iris with paint-stained apron, wild colorful hair, surrounded by floating illustrations and paintbrushes"),
     ("37-epigraph-writer", "a British male poet named Quentin with a quill pen, sitting in a leather armchair, chuckling at his own witty writing"),
     ("38-application-example", "a Nigerian female business consultant named Nadia with a blazer, holding case study folders, standing in front of industry charts"),
+    ("39-fun-injector", "a charismatic mixed-race male comedian named Ziggy with wild curly hair, colorful suspenders, holding a rubber chicken in one hand and a textbook in the other, winking"),
+    ("40-bibliography", "a distinguished older white female librarian named Margot with silver hair in a French twist, reading glasses on a chain, surrounded by floating hyperlinked book spines and glowing citation marks"),
 ]
 
 BASE_STYLE="Digital art avatar portrait, Kurzgesagt-inspired minimal cartoon style, clean vector lines, vibrant flat colors, circular composition, gradient background, friendly and professional, no text"
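For context, a minimal sketch of how the new entries might be turned into full prompts by combining each description with `BASE_STYLE`. The loop and output naming below are assumptions for illustration, not the repository's actual generation code (the real API call and output paths are omitted).

```python
# Illustrative only: assumes each avatar description is joined with BASE_STYLE
# to form the final image-generation prompt.
AVATARS = [
    ("39-fun-injector", "a charismatic mixed-race male comedian named Ziggy ..."),
    ("40-bibliography", "a distinguished older white female librarian named Margot ..."),
]
BASE_STYLE = ("Digital art avatar portrait, Kurzgesagt-inspired minimal cartoon style, "
              "clean vector lines, vibrant flat colors, circular composition, "
              "gradient background, friendly and professional, no text")

for slug, description in AVATARS:
    prompt = f"{BASE_STYLE}, {description}"
    print(f"{slug}.png -> {prompt[:80]}...")  # the actual generation call would go here
```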
 framework that powers most modern LLM research and development. Finally, we introduce
-reinforcement learning, the paradigm that makes LLMs helpful through RLHF.
+reinforcement learning, the paradigm that makes LLMs helpful through RLHF, a topic
+explored in full in <a href="../../part-4-training-adapting/module-16-alignment-rlhf-dpo/index.html">Module 16: Alignment, RLHF & DPO</a>.
 </p>
 </div>
 
 <div class="callout practical-example">
 <h4>Practical Example: The Shortcut That Backfired</h4>
 <p><strong>Who:</strong> A junior ML engineer at a mid-sized e-commerce company (150 employees)</p>
-<p><strong>Situation:</strong> Tasked with building a product recommendation model, she jumped straight into fine-tuning a large pretrained transformer without reviewing ML fundamentals.</p>
+<p><strong>Situation:</strong> Tasked with building a product recommendation model, she jumped straight into <a href="../../part-4-training-adapting/module-13-fine-tuning-fundamentals/index.html">fine-tuning</a> a large pretrained transformer without reviewing ML fundamentals.</p>
 <p><strong>Problem:</strong> The model achieved 94% accuracy on the training set but only 61% on production data. She had no idea why.</p>
 <p><strong>Dilemma:</strong> Ship the underperforming model to meet a deadline, or pause and diagnose the gap between training and production performance.</p>
 <p><strong>Decision:</strong> She convinced her manager to allow a one-week pause to revisit overfitting diagnostics and regularization basics.</p>
@@ -222,11 +228,11 @@ <h4>Practical Example: The Shortcut That Backfired</h4>
 <div class="objectives">
 <h3>Learning Objectives</h3>
 <ul>
-<li>Explain supervised learning, loss functions, and gradient descent intuitively and mathematically</li>
+<li>Explain supervised learning, loss functions, and gradient descent intuitively and mathematically (these resurface in <a href="../../part-4-training-adapting/module-13-fine-tuning-fundamentals/index.html">Module 13: Fine-Tuning Fundamentals</a>)</li>
 <li>Describe the bias-variance tradeoff and apply regularization techniques</li>
 <li>Build and train neural networks, understanding backpropagation at a mechanical level</li>
-<li>Write complete PyTorch training loops with custom datasets and GPU acceleration</li>
-<li>Explain the RL framework (agent, policy, reward) and its connection to LLM training</li>
+<li>Write complete PyTorch training loops with custom datasets and GPU acceleration, skills applied throughout <a href="../../part-4-training-adapting/module-14-peft/index.html">Part 4: Training & Adapting</a></li>
+<li>Explain the RL framework (agent, policy, reward) and its connection to LLM training via <a href="../../part-4-training-adapting/module-16-alignment-rlhf-dpo/index.html">RLHF and DPO (Module 16)</a></li>
 Agent-environment loop, policy, value functions, Bellman equation, policy gradients,
-PPO, and how RL connects to LLM training (RLHF).
+PPO, and how RL connects to LLM training (RLHF). See also
+<a href="../../part-6-agents-applications/module-21-ai-agents/index.html" style="color: var(--text-light);">AI Agents (Module 21)</a> for how RL-trained policies power autonomous systems.
 </span>
 </a>
 </li>
@@ -307,7 +318,7 @@ <h4>Practical Example: Choosing PyTorch Under Pressure</h4>
 <p><strong>Situation:</strong> The team needed to prototype a fraud detection model and had two weeks before a board demo. Half the team knew TensorFlow; half knew PyTorch.</p>
 <p><strong>Problem:</strong> Framework fragmentation meant code reviews took twice as long, and shared utilities were incompatible across the two codebases.</p>
 <p><strong>Dilemma:</strong> Standardize on one framework (slowing down half the team initially) or maintain both and accept long-term maintenance costs.</p>
-<p><strong>Decision:</strong> She standardized on PyTorch, citing its dominance in research papers the team needed to reproduce and its more intuitive debugging with eager execution.</p>
+<p><strong>Decision:</strong> She standardized on PyTorch, citing its dominance in research papers the team needed to reproduce (see <a href="../../part-2-understanding-llms/module-07-modern-llm-landscape/index.html">Module 7: Modern LLM Landscape</a>) and its more intuitive debugging with eager execution.</p>
 <p><strong>How:</strong> She allocated three days for a "PyTorch bootcamp" covering tensors, autograd, and training loops, then paired TensorFlow veterans with PyTorch users for the remaining sprint.</p>
 <p><strong>Result:</strong> The team delivered the demo on time. Over the following quarter, code review turnaround dropped by 35%, and onboarding new hires became two days faster.</p>
 <p><strong>Lesson:</strong> <strong>Short-term slowdowns from standardizing tools pay compound dividends in team velocity.</strong></p>
 Understanding this progression is essential: the entire history of NLP is a quest for
 better representations of meaning, and each technique you learn here is a building block
 for everything that follows.
@@ -231,19 +234,19 @@ <h4>Practical Example: When Bag-of-Words Met Sarcasm</h4>
 <p><strong>Decision:</strong> They replaced TF-IDF with pre-trained Word2Vec embeddings, reasoning that dense vectors would capture tonal patterns that sparse counts could not.</p>
 <p><strong>How:</strong> They fine-tuned 300-dimensional Word2Vec vectors on their review corpus, then trained a simple logistic regression on the averaged embeddings.</p>
 <p><strong>Result:</strong> Sarcasm misclassification dropped from 23% to 9%. Customer response SLA compliance rose from 74% to 91% within two months.</p>
-<p><strong>Lesson:</strong> <strong>Representation quality is the ceiling for downstream model performance. Upgrading from sparse to dense embeddings often delivers bigger gains than swapping classifiers.</strong></p>
+<p><strong>Lesson:</strong> <strong>Representation quality is the ceiling for downstream model performance. Upgrading from sparse to dense embeddings often delivers bigger gains than swapping classifiers.</strong> For production-scale embedding pipelines, see <a href="../../part-5-retrieval-conversation/module-18-embeddings-vector-db/index.html">Module 18: Embeddings & Vector Databases</a>.</p>
 </div>
 <div class="objectives">
 <h3>Learning Objectives</h3>
 <ul>
-<li>Explain the evolution of NLP from rule-based systems to modern LLMs and <em>why</em> each transition happened</li>
+<li>Explain the evolution of NLP from rule-based systems to modern LLMs (surveyed in <a href="../../part-2-understanding-llms/module-07-modern-llm-landscape/index.html">Module 7: Modern LLM Landscape</a>) and <em>why</em> each transition happened</li>
 <li>Build a complete text preprocessing pipeline using spaCy and NLTK</li>
 <li>Implement and compare Bag-of-Words, TF-IDF, and one-hot encoding, and articulate their limitations</li>
 <li>Explain how Word2Vec, GloVe, and FastText create dense word representations and <em>why</em> they work</li>
-<li>Train a Word2Vec model from scratch and explore word analogies</li>
+<li>Train a Word2Vec model from scratch (using techniques from <a href="../module-00-ml-pytorch-foundations/index.html">Module 0</a>) and explore word analogies</li>
 <li>Explain why static embeddings fail for polysemous words and how ELMo introduced contextual embeddings</li>
-<li>Articulate the "big picture" of how text representation evolved toward transformers and LLMs</li>
+<li>Articulate the "big picture" of how text representation evolved toward transformers and LLMs (explored further in <a href="../../part-2-understanding-llms/module-06-pretraining-scaling-laws/index.html">Module 6: Pretraining & Scaling Laws</a>)</li>
 The preprocessing pipeline (tokenization, lemmatization, stop words),
-Bag-of-Words, TF-IDF, one-hot encoding, n-grams, and their limitations.
+Bag-of-Words, TF-IDF, one-hot encoding, n-grams, and their limitations. Tokenization is explored in depth in <a href="../module-02-tokenization-subword-models/index.html">Module 2</a>.
 The distributional hypothesis, Skip-gram, negative sampling, word analogies,
-cosine similarity, GloVe co-occurrence matrices, FastText subwords, and t-SNE visualization.
+cosine similarity, GloVe co-occurrence matrices, FastText subwords, and t-SNE visualization. These embeddings underpin the retrieval systems in <a href="../../part-5-retrieval-conversation/module-19-rag/index.html">Module 19: RAG</a>.
 The polysemy problem, ELMo's bidirectional LSTMs, layer-wise representations,
-the pre-train/fine-tune paradigm, and how this led to transformers and LLMs.
+the pre-train/fine-tune paradigm, and how this led to transformers and LLMs. Continues into <a href="../module-03-sequence-models-attention/index.html">Module 3: Sequence Models & Attention</a>.
 </span>
 </a>
 </li>
@@ -315,7 +318,7 @@ <h4>Practical Example: The Polysemy Trap in Medical Search</h4>
 <p><strong>Decision:</strong> They adopted ELMo embeddings, which generate a different vector for "cold" depending on surrounding words.</p>
 <p><strong>How:</strong> They replaced their GloVe-based query encoder with a fine-tuned ELMo model, keeping the same retrieval pipeline and re-indexing their 50K article corpus.</p>
 <p><strong>Result:</strong> Relevant-result click-through rate jumped from 34% to 52%. Polysemy-related support tickets dropped by 60% in the first quarter.</p>
-<p><strong>Lesson:</strong> <strong>When your domain has words with multiple critical meanings, static embeddings will silently degrade search quality. Contextual embeddings are not a luxury; they are a necessity.</strong></p>
+<p><strong>Lesson:</strong> <strong>When your domain has words with multiple critical meanings, static embeddings will silently degrade search quality. Contextual embeddings are not a luxury; they are a necessity.</strong> Modern transformer-based approaches (see <a href="../module-04-transformer-architecture/index.html">Module 4: Transformer Architecture</a>) take this idea even further with full self-attention.</p>
<p><strong>Decision:</strong> They built a proper pipeline: HTML stripping, lowercasing, lemmatization, stop-word removal, and boilerplate detection.</p>
328
331
<p><strong>How:</strong> Using spaCy, they wrote a five-stage pipeline that reduced the average document from 1,200 tokens to 180 meaningful lemmas, then retrained the same TF-IDF model.</p>
329
332
<p><strong>Result:</strong> User engagement with recommendations rose 41%. The "thumbs-down" rate on suggestions fell from 38% to 14%, all without changing the underlying algorithm.</p>
330
-
<p><strong>Lesson:</strong><strong>Before reaching for a more powerful model, audit your preprocessing. Clean input often outperforms a fancy model fed noisy data.</strong></p>
333
+
<p><strong>Lesson:</strong><strong>Before reaching for a more powerful model, audit your preprocessing. Clean input often outperforms a fancy model fed noisy data.</strong> For a deeper look at subword tokenization techniques that modern LLMs use, see <ahref="../module-02-tokenization-subword-models/index.html">Module 2: Tokenization & Subword Models</a>.</p>
<li>Module 00: ML & PyTorch Foundations (especially sections on neural networks and gradient descent)</li>
343
+
<li><ahref="../module-00-ml-pytorch-foundations/index.html">Module 00: ML & PyTorch Foundations</a> (especially sections on neural networks and gradient descent)</li>
341
344
<li>Python proficiency (functions, classes, list comprehensions)</li>
342
345
<li>Basic linear algebra: vectors, dot products, matrix multiplication</li>
343
346
<li>Familiarity with NumPy and basic scikit-learn usage</li>