QA sweep: responsive CSS, format consistency, cross-refs, conformance checklist
Add responsive CSS media queries (1024/768/480px breakpoints) across all 141
section files for tablet and mobile rendering. Fix inline styles on whats-next,
bibliography, and callout elements by moving to CSS classes. Add missing CSS
definitions (code-caption, research-frontier, whats-next) to style blocks.
Fix non-canonical callout class names. Add header navigation links (module
label to index, part subtitle to part index). Add code captions after code
blocks. Create CONFORMANCE_CHECKLIST.md as single source of truth for all
formatting, structural, and content requirements (24 categories, A through X).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- <p>A good representation makes the relationship between inputs and outputs <em>simple enough for the model to learn</em>. In deep learning (and particularly in LLMs), the model learns its own representations automatically. This is one of the key breakthroughs that makes deep learning so powerful, and we will explore it in <a href="section-0.2.html" style="color: var(--accent, #0f3460);">Section 0.2</a>.</p>
+ <p>A good representation makes the relationship between inputs and outputs <em>simple enough for the model to learn</em>. In deep learning (and particularly in LLMs), the model learns its own representations automatically. This is one of the key breakthroughs that makes deep learning so powerful, and we will explore it in <a href="section-0.2.html" class="cross-ref">Section 0.2</a>.</p>
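The claim in this paragraph is easy to see in code. A minimal sketch, where the quadratic example and every number are illustrative assumptions rather than anything from the diffed files: the same linear model fails on the raw input but succeeds once the input is re-represented.

```python
import numpy as np

# Hypothetical illustration: one linear model, two representations.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = x ** 2  # the true relationship is quadratic

# Representation 1: raw x. The best linear fit w*x + b is poor.
w1, b1 = np.polyfit(x, y, deg=1)
err_raw = np.mean((w1 * x + b1 - y) ** 2)

# Representation 2: the engineered feature x^2. Now the map is linear.
w2, b2 = np.polyfit(x ** 2, y, deg=1)
err_sq = np.mean((w2 * x ** 2 + b2 - y) ** 2)

print(f"MSE with raw x: {err_raw:.3f}")  # large
print(f"MSE with x^2:   {err_sq:.3f}")   # ~0
```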
<p>We could try random guessing, but the space is impossibly large. Instead, we use a beautiful insight from calculus: <strong>the gradient tells us which direction is uphill</strong>. If we walk in the opposite direction, we go downhill, reducing the loss.</p>
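In code, the downhill intuition is a one-line update rule. A minimal sketch, assuming a toy one-dimensional loss and an arbitrary learning rate (neither comes from the diffed section):

```python
# Minimal gradient descent on the loss L(w) = (w - 3)^2.
# The gradient dL/dw = 2*(w - 3) points uphill; we step the other way.
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0    # arbitrary starting point
lr = 0.1   # learning rate (step size), an illustrative choice
for step in range(50):
    w -= lr * grad(w)   # move opposite the gradient, i.e. downhill

print(w, loss(w))  # w approaches 3, the minimum; loss approaches 0
```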
<figure class="illustration">
- <img src="images/gradient-descent-landscape.png" alt="A hilly landscape illustrating gradient descent, showing a path from a high point down to the lowest valley" style="max-width: 100%; border-radius: 8px;">
+ <img src="images/gradient-descent-landscape.png" alt="A hilly landscape illustrating gradient descent, showing a path from a high point down to the lowest valley">
<figcaption>Gradient descent navigates a loss landscape by following the steepest downhill direction at each step, seeking the lowest valley (minimum loss).</figcaption>
</figure>
@@ -630,7 +661,7 @@ <h3>Variants of Gradient Descent</h3>
<h2>4. Overfitting, Underfitting, and Regularization</h2>

<figure class="illustration">
- <img src="images/overfitting-vs-generalization.png" alt="Side-by-side comparison of an overfitting model that memorizes noise versus a well-generalized model that captures the true pattern" style="max-width: 100%; border-radius: 8px;">
+ <img src="images/overfitting-vs-generalization.png" alt="Side-by-side comparison of an overfitting model that memorizes noise versus a well-generalized model that captures the true pattern">
<figcaption>Overfitting versus generalization: the model on the left has memorized every training point (including noise), while the model on the right captures the underlying pattern.</figcaption>
</figure>
@@ -891,7 +922,7 @@ <h2>7. Putting It All Together: The Full Pipeline</h2>
<li><strong>Diagnosis:</strong> If training accuracy is 95% but test accuracy is 70%, you are overfitting. Consider stronger regularization, more data, or fewer features. If both training and test accuracy are 65%, you are underfitting. Consider a more powerful model (e.g., a neural network) or better features. (A minimal sketch of this rule appears just after this list.)</li>
</ol>
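The diagnosis rule reduces to a simple threshold check. A minimal sketch, where the 10-point gap and the 70% floor are illustrative assumptions, not values fixed by the section:

```python
def diagnose(train_acc: float, test_acc: float) -> str:
    """Apply the rule of thumb above (thresholds are illustrative)."""
    if train_acc - test_acc > 0.10:
        return "overfitting: regularize harder, add data, or use fewer features"
    if train_acc < 0.70:
        return "underfitting: try a more powerful model or better features"
    return "reasonable fit"

print(diagnose(0.95, 0.70))  # overfitting
print(diagnose(0.65, 0.65))  # underfitting
```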
- <p>This exact workflow scales to far more complex settings. When researchers train GPT-style models, they follow the same logical steps at a vastly larger scale: represent text as token sequences (features), define cross-entropy loss over next-token prediction, optimize with Adam (a sophisticated variant of SGD that adapts the learning rate per parameter; we will explain Adam in detail in <a href="section-0.3.html" style="color: var(--accent, #0f3460);">Section 0.3</a>), apply dropout and weight decay, and evaluate on held-out benchmarks. In Section 0.3, you will implement this entire workflow in PyTorch, the framework used for most modern LLM research.</p>
+ <p>This exact workflow scales to far more complex settings. When researchers train GPT-style models, they follow the same logical steps at a vastly larger scale: represent text as token sequences (features), define cross-entropy loss over next-token prediction, optimize with Adam (a sophisticated variant of SGD that adapts the learning rate per parameter; we will explain Adam in detail in <a href="section-0.3.html" class="cross-ref">Section 0.3</a>), apply dropout and weight decay, and evaluate on held-out benchmarks. In Section 0.3, you will implement this entire workflow in PyTorch, the framework used for most modern LLM research.</p>
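That recipe (token features, cross-entropy over next tokens, Adam with weight decay, dropout) fits in a few lines of PyTorch. A hedged sketch with invented sizes and hyperparameters, not the book's actual Section 0.3 code:

```python
import torch
import torch.nn as nn

# Toy next-token setup; every size here is an invented example.
vocab_size, d_model, seq_len, batch = 1000, 64, 32, 8
model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),  # token sequences as learned features
    nn.Dropout(0.1),                    # dropout, as the paragraph mentions
    nn.Linear(d_model, vocab_size),     # scores for the next token
)
loss_fn = nn.CrossEntropyLoss()         # cross-entropy over next-token prediction
# Adam adapts the step size per parameter; weight_decay adds the decay penalty.
opt = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=0.01)

tokens = torch.randint(0, vocab_size, (batch, seq_len + 1))
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # shift by one: predict the next token

logits = model(inputs)  # (batch, seq, vocab)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
opt.zero_grad()
loss.backward()
opt.step()
```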
<p>In the next section, <a href="section-0.2.html">Section 0.2: Deep Learning Essentials</a>, we build on these ML fundamentals, exploring neural network architectures and training techniques.</p>
<a href="https://hastie.su.domains/ElemStatLearn/" target="_blank" rel="noopener">Hastie, T., Tibshirani, R., & Friedman, J. (2009). <em>The Elements of Statistical Learning</em> (2nd ed.). Springer.</a>
</p>
- <p class="bib-annotation">Comprehensive treatment of the bias-variance tradeoff, cross-validation, and regularization methods discussed in this section. <a href="../../part-2-understanding-llms/module-07-modern-llm-landscape/index.html" style="color: var(--accent, #0f3460);">Chapter 7</a> (model assessment) is especially relevant. Freely available online and ideal for practitioners who want statistical depth without full measure theory.</p>
+ <p class="bib-annotation">Comprehensive treatment of the bias-variance tradeoff, cross-validation, and regularization methods discussed in this section. <a href="../../part-2-understanding-llms/module-07-modern-llm-landscape/index.html" class="cross-ref">Chapter 7</a> (model assessment) is especially relevant. Freely available online and ideal for practitioners who want statistical depth without full measure theory.</p>
<divclass="callout-title">Big Picture: From Basic ML to Neural Networks</div>
430
-
<p>In <ahref="section-0.1.html" style="color: var(--accent, #0f3460);">Section 0.1</a>, you learned how a model can learn from data using gradient descent and loss functions. Those ideas were powerful, but they were limited to finding simple patterns (linear boundaries, shallow decision trees). Deep learning changed everything by <em>stacking layers of simple functions</em> to learn extraordinarily complex representations. This single idea, composing simple transformations into deep hierarchies, is what lets a neural network translate languages, generate images, and power the conversational AI systems you will build in this book.</p>
461
+
<p>In <ahref="section-0.1.html" class="cross-ref">Section 0.1</a>, you learned how a model can learn from data using gradient descent and loss functions. Those ideas were powerful, but they were limited to finding simple patterns (linear boundaries, shallow decision trees). Deep learning changed everything by <em>stacking layers of simple functions</em> to learn extraordinarily complex representations. This single idea, composing simple transformations into deep hierarchies, is what lets a neural network translate languages, generate images, and power the conversational AI systems you will build in this book.</p>
</div>
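The "stacking simple functions" idea in the callout is literally how deep networks are written in code. A minimal PyTorch sketch, with layer sizes invented for illustration:

```python
import torch.nn as nn

# Each Linear + ReLU pair is a simple function; stacking them builds the
# deep hierarchy the callout describes.
mlp = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),   # raw inputs -> low-level features
    nn.Linear(256, 64), nn.ReLU(),    # low-level -> higher-level features
    nn.Linear(64, 10),                # features -> class scores
)
```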
<h2>1. Neural Network Fundamentals</h2>
<figure class="illustration">
- <img src="images/neural-network-machine.png" alt="A stylized neural network depicted as an interconnected machine with layers of processing units" style="max-width: 100%; border-radius: 8px;">
+ <img src="images/neural-network-machine.png" alt="A stylized neural network depicted as an interconnected machine with layers of processing units">
<figcaption>Neural networks are layered machines: raw inputs flow in, pass through layers of learned transformations, and emerge as predictions.</figcaption>
</figure>
@@ -672,7 +703,7 @@ <h4>Debugging a Vanishing Gradient in Production</h4>
<h2>3. Regularization Techniques</h2>

- <p>In <a href="section-0.1.html" style="color: var(--accent, #0f3460);">Section 0.1</a>, you learned that overfitting occurs when a model memorizes training data instead of learning general patterns. Deep networks, with their enormous capacity, are especially prone to this. Here are the three most important tools for fighting overfitting in deep learning.</p>
+ <p>In <a href="section-0.1.html" class="cross-ref">Section 0.1</a>, you learned that overfitting occurs when a model memorizes training data instead of learning general patterns. Deep networks, with their enormous capacity, are especially prone to this. Here are the three most important tools for fighting overfitting in deep learning.</p>
<p>In the next section, <a href="section-0.3.html">Section 0.3: PyTorch Tutorial</a>, we put theory into practice, learning the framework that powers most modern LLM research.</p>