QA sweep: responsive CSS, format consistency, cross-refs, conformance checklist
Add responsive CSS media queries (1024/768/480px breakpoints) across all 141
section files for tablet and mobile rendering. Fix inline styles on whats-next,
bibliography, and callout elements by moving to CSS classes. Add missing CSS
definitions (code-caption, research-frontier, whats-next) to style blocks.
Fix non-canonical callout class names. Add header navigation links (module
label to index, part subtitle to part index). Add code captions after code
blocks. Create CONFORMANCE_CHECKLIST.md as single source of truth for all
formatting, structural, and content requirements (24 categories, A through X).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- <p>A good representation makes the relationship between inputs and outputs <em>simple enough for the model to learn</em>. In deep learning (and particularly in LLMs), the model learns its own representations automatically. This is one of the key breakthroughs that makes deep learning so powerful, and we will explore it in <a href="section-0.2.html" style="color: var(--accent, #0f3460);">Section 0.2</a>.</p>
+ <p>A good representation makes the relationship between inputs and outputs <em>simple enough for the model to learn</em>. In deep learning (and particularly in LLMs), the model learns its own representations automatically. This is one of the key breakthroughs that makes deep learning so powerful, and we will explore it in <a href="section-0.2.html" class="cross-ref">Section 0.2</a>.</p>
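The claim in this paragraph is easy to see in code. A minimal sketch, where the quadratic example and every number are illustrative assumptions rather than anything from the diffed files: the same linear model fails on the raw input but succeeds once the input is re-represented.

```python
import numpy as np

# Hypothetical illustration: one linear model, two representations.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = x ** 2  # the true relationship is quadratic

# Representation 1: raw x. The best linear fit w*x + b is poor.
w1, b1 = np.polyfit(x, y, deg=1)
err_raw = np.mean((w1 * x + b1 - y) ** 2)

# Representation 2: the engineered feature x^2. Now the map is linear.
w2, b2 = np.polyfit(x ** 2, y, deg=1)
err_sq = np.mean((w2 * x ** 2 + b2 - y) ** 2)

print(f"MSE with raw x: {err_raw:.3f}")  # large
print(f"MSE with x^2:   {err_sq:.3f}")   # ~0
```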
<p>We could try random guessing, but the space is impossibly large. Instead, we use a beautiful insight from calculus: <strong>the gradient tells us which direction is uphill</strong>. If we walk in the opposite direction, we go downhill, reducing the loss.</p>
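In code, the downhill intuition is a one-line update rule. A minimal sketch, assuming a toy one-dimensional loss and an arbitrary learning rate (neither comes from the diffed section):

```python
# Minimal gradient descent on the loss L(w) = (w - 3)^2.
# The gradient dL/dw = 2*(w - 3) points uphill; we step the other way.
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0    # arbitrary starting point
lr = 0.1   # learning rate (step size), an illustrative choice
for step in range(50):
    w -= lr * grad(w)   # move opposite the gradient, i.e. downhill

print(w, loss(w))  # w approaches 3, the minimum; loss approaches 0
```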
<figure class="illustration">
- <img src="images/gradient-descent-landscape.png" alt="A hilly landscape illustrating gradient descent, showing a path from a high point down to the lowest valley" style="max-width: 100%; border-radius: 8px;">
+ <img src="images/gradient-descent-landscape.png" alt="A hilly landscape illustrating gradient descent, showing a path from a high point down to the lowest valley">
<figcaption>Gradient descent navigates a loss landscape by following the steepest downhill direction at each step, seeking the lowest valley (minimum loss).</figcaption>
</figure>
@@ -630,7 +661,7 @@ <h3>Variants of Gradient Descent</h3>
<h2>4. Overfitting, Underfitting, and Regularization</h2>

<figure class="illustration">
- <img src="images/overfitting-vs-generalization.png" alt="Side-by-side comparison of an overfitting model that memorizes noise versus a well-generalized model that captures the true pattern" style="max-width: 100%; border-radius: 8px;">
+ <img src="images/overfitting-vs-generalization.png" alt="Side-by-side comparison of an overfitting model that memorizes noise versus a well-generalized model that captures the true pattern">
<figcaption>Overfitting versus generalization: the model on the left has memorized every training point (including noise), while the model on the right captures the underlying pattern.</figcaption>
</figure>
@@ -891,7 +922,7 @@ <h2>7. Putting It All Together: The Full Pipeline</h2>
<li><strong>Diagnosis:</strong> If training accuracy is 95% but test accuracy is 70%, you are overfitting. Consider stronger regularization, more data, or fewer features. If both training and test accuracy are 65%, you are underfitting. Consider a more powerful model (e.g., a neural network) or better features. (A minimal sketch of this rule appears just after this list.)</li>
</ol>
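The diagnosis rule reduces to a simple threshold check. A minimal sketch, where the 10-point gap and the 70% floor are illustrative assumptions, not values fixed by the section:

```python
def diagnose(train_acc: float, test_acc: float) -> str:
    """Apply the rule of thumb above (thresholds are illustrative)."""
    if train_acc - test_acc > 0.10:
        return "overfitting: regularize harder, add data, or use fewer features"
    if train_acc < 0.70:
        return "underfitting: try a more powerful model or better features"
    return "reasonable fit"

print(diagnose(0.95, 0.70))  # overfitting
print(diagnose(0.65, 0.65))  # underfitting
```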
- <p>This exact workflow scales to far more complex settings. When researchers train GPT-style models, they follow the same logical steps at a vastly larger scale: represent text as token sequences (features), define cross-entropy loss over next-token prediction, optimize with Adam (a sophisticated variant of SGD that adapts the learning rate per parameter; we will explain Adam in detail in <a href="section-0.3.html" style="color: var(--accent, #0f3460);">Section 0.3</a>), apply dropout and weight decay, and evaluate on held-out benchmarks. In Section 0.3, you will implement this entire workflow in PyTorch, the framework used for most modern LLM research.</p>
+ <p>This exact workflow scales to far more complex settings. When researchers train GPT-style models, they follow the same logical steps at a vastly larger scale: represent text as token sequences (features), define cross-entropy loss over next-token prediction, optimize with Adam (a sophisticated variant of SGD that adapts the learning rate per parameter; we will explain Adam in detail in <a href="section-0.3.html" class="cross-ref">Section 0.3</a>), apply dropout and weight decay, and evaluate on held-out benchmarks. In Section 0.3, you will implement this entire workflow in PyTorch, the framework used for most modern LLM research.</p>
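That recipe (token features, cross-entropy over next tokens, Adam with weight decay, dropout) fits in a few lines of PyTorch. A hedged sketch with invented sizes and hyperparameters, not the book's actual Section 0.3 code:

```python
import torch
import torch.nn as nn

# Toy next-token setup; every size here is an invented example.
vocab_size, d_model, seq_len, batch = 1000, 64, 32, 8
model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),  # token sequences as learned features
    nn.Dropout(0.1),                    # dropout, as the paragraph mentions
    nn.Linear(d_model, vocab_size),     # scores for the next token
)
loss_fn = nn.CrossEntropyLoss()         # cross-entropy over next-token prediction
# Adam adapts the step size per parameter; weight_decay adds the decay penalty.
opt = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=0.01)

tokens = torch.randint(0, vocab_size, (batch, seq_len + 1))
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # shift by one: predict the next token

logits = model(inputs)  # (batch, seq, vocab)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
opt.zero_grad()
loss.backward()
opt.step()
```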
<p>In the next section, <a href="section-0.2.html">Section 0.2: Deep Learning Essentials</a>, we build on these ML fundamentals, exploring neural network architectures and training techniques.</p>
<a href="https://hastie.su.domains/ElemStatLearn/" target="_blank" rel="noopener">Hastie, T., Tibshirani, R., & Friedman, J. (2009). <em>The Elements of Statistical Learning</em> (2nd ed.). Springer.</a>
</p>
- <p class="bib-annotation">Comprehensive treatment of the bias-variance tradeoff, cross-validation, and regularization methods discussed in this section. <a href="../../part-2-understanding-llms/module-07-modern-llm-landscape/index.html" style="color: var(--accent, #0f3460);">Chapter 7</a> (model assessment) is especially relevant. Freely available online and ideal for practitioners who want statistical depth without full measure theory.</p>
+ <p class="bib-annotation">Comprehensive treatment of the bias-variance tradeoff, cross-validation, and regularization methods discussed in this section. <a href="../../part-2-understanding-llms/module-07-modern-llm-landscape/index.html" class="cross-ref">Chapter 7</a> (model assessment) is especially relevant. Freely available online and ideal for practitioners who want statistical depth without full measure theory.</p>
<divclass="callout-title">Big Picture: From Basic ML to Neural Networks</div>
430
-
<p>In <ahref="section-0.1.html" style="color: var(--accent, #0f3460);">Section 0.1</a>, you learned how a model can learn from data using gradient descent and loss functions. Those ideas were powerful, but they were limited to finding simple patterns (linear boundaries, shallow decision trees). Deep learning changed everything by <em>stacking layers of simple functions</em> to learn extraordinarily complex representations. This single idea, composing simple transformations into deep hierarchies, is what lets a neural network translate languages, generate images, and power the conversational AI systems you will build in this book.</p>
461
+
<p>In <ahref="section-0.1.html" class="cross-ref">Section 0.1</a>, you learned how a model can learn from data using gradient descent and loss functions. Those ideas were powerful, but they were limited to finding simple patterns (linear boundaries, shallow decision trees). Deep learning changed everything by <em>stacking layers of simple functions</em> to learn extraordinarily complex representations. This single idea, composing simple transformations into deep hierarchies, is what lets a neural network translate languages, generate images, and power the conversational AI systems you will build in this book.</p>
</div>
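The "stacking simple functions" idea in the callout is literally how deep networks are written in code. A minimal PyTorch sketch, with layer sizes invented for illustration:

```python
import torch.nn as nn

# Each Linear + ReLU pair is a simple function; stacking them builds the
# deep hierarchy the callout describes.
mlp = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),   # raw inputs -> low-level features
    nn.Linear(256, 64), nn.ReLU(),    # low-level -> higher-level features
    nn.Linear(64, 10),                # features -> class scores
)
```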
<h2>1. Neural Network Fundamentals</h2>
<figure class="illustration">
- <img src="images/neural-network-machine.png" alt="A stylized neural network depicted as an interconnected machine with layers of processing units" style="max-width: 100%; border-radius: 8px;">
+ <img src="images/neural-network-machine.png" alt="A stylized neural network depicted as an interconnected machine with layers of processing units">
<figcaption>Neural networks are layered machines: raw inputs flow in, pass through layers of learned transformations, and emerge as predictions.</figcaption>
</figure>
@@ -672,7 +703,7 @@ <h4>Debugging a Vanishing Gradient in Production</h4>
<h2>3. Regularization Techniques</h2>

- <p>In <a href="section-0.1.html" style="color: var(--accent, #0f3460);">Section 0.1</a>, you learned that overfitting occurs when a model memorizes training data instead of learning general patterns. Deep networks, with their enormous capacity, are especially prone to this. Here are the three most important tools for fighting overfitting in deep learning.</p>
+ <p>In <a href="section-0.1.html" class="cross-ref">Section 0.1</a>, you learned that overfitting occurs when a model memorizes training data instead of learning general patterns. Deep networks, with their enormous capacity, are especially prone to this. Here are the three most important tools for fighting overfitting in deep learning.</p>
<p>In the next section, <a href="section-0.3.html">Section 0.3: PyTorch Tutorial</a>, we put theory into practice, learning the framework that powers most modern LLM research.</p>