<pclass="subtitle">Design, build, evaluate, and present a production-grade LLM application that integrates every major skill from this book</p>
157
+
<pclass="chapter-subtitle">Design, build, evaluate, and present a production-grade LLM application that integrates every major skill from this book</p>
<divclass="code-caption"><strong>Code Fragment 1:</strong>Cost tracking utility that estimates API spend from token usage. Mapping model names to per-token prices lets you monitor expenses programmatically.</div>
527
+
<divclass="code-caption"><strong>Code Fragment 1:</strong>TF-IDF plus logistic regression classifier for customer support tickets. The pipeline trains on bigram features (<code>ngram_range=(1, 2)</code>) and benchmarks inference at 0.12 ms per query, roughly 3,000x faster than an LLM API call, at negligible cost.</div>
528
528
529
529
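<p>The fragment itself is not reproduced in this diff. The block below is a minimal sketch of the pattern the revised caption describes, assuming scikit-learn; the training tickets and category labels are illustrative placeholders, not the book's data.</p>
<pre><code># Sketch: TF-IDF + logistic regression ticket classifier (illustrative data).
import time
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

train_texts = [
    "I was charged twice for my subscription",
    "The app crashes when I open settings",
    "How do I change my email address?",
    "Please refund my last payment",
]
train_labels = ["billing", "bug", "account", "billing"]

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),  # bigram features, per the caption
    ("lr", LogisticRegression(max_iter=1000)),
])
clf.fit(train_texts, train_labels)

# Benchmark single-query inference latency.
query = ["I want my money back for the double charge"]
n_iter = 1000
start = time.perf_counter()
for _ in range(n_iter):
    clf.predict(query)
per_query_ms = (time.perf_counter() - start) * 1000 / n_iter
print(f"predicted={clf.predict(query)[0]}, {per_query_ms:.3f} ms/query")</code></pre>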
<p>For structured extraction tasks, regular expressions offer even faster, deterministic results. The following snippet demonstrates regex-based entity extraction for common patterns such as emails, phone numbers, and monetary amounts.</p>
<divclass="code-caption"><strong>Code Fragment 2:</strong>RLHF training loop using PPO to optimize the language model against a reward signal. The KL divergence penalty prevents drift from the reference model.</div>
573
+
<divclass="code-caption"><strong>Code Fragment 2:</strong>Regex-based entity extractor for deterministic structured patterns (emails, phone numbers, monetary amounts, dates). The benchmark loop of 10,000 iterations demonstrates sub-microsecond per-extraction latency with zero false positives and zero API cost.</div>
part-3-working-with-llms/module-11-hybrid-ml-llm/section-11.2.html (4 additions & 4 deletions)
@@ -329,7 +329,7 @@ <h3>1.2 Generating Embeddings with OpenAI</h3>
Cost: ~$0.00002 per text (text-embedding-3-small)
Total for 5 texts: ~$0.0001
</div>
-<div class="code-caption"><strong>Code Fragment 1:</strong> Embedding generation for converting text into dense vector representations. These vectors capture semantic meaning, enabling similarity search and clustering.</div>
+<div class="code-caption"><strong>Code Fragment 1:</strong> Batch embedding via OpenAI's <code>text-embedding-3-small</code> model. The <code>get_embeddings()</code> function sends multiple texts in a single API call and returns a NumPy array of shape (n_texts, 1536), with per-text cost at approximately $0.00002.</div>
<p>Code Fragment 2 loads the model locally via the Sentence-Transformers library.</p>
@@ -379,7 +379,7 @@ <h3>1.2 Generating Embeddings with OpenAI</h3>
Similarity between 'charged twice' and 'app crashes': 0.089
</div>
-<div class="code-caption"><strong>Code Fragment 2:</strong> Cost tracking utility that estimates API spend from token usage. Mapping model names to per-token prices lets you monitor expenses programmatically.</div>
+<div class="code-caption"><strong>Code Fragment 2:</strong> Local embedding with <code>SentenceTransformer('all-MiniLM-L6-v2')</code>, an 80 MB model producing 384-dimensional vectors. The <code>normalize_embeddings=True</code> flag enables direct dot-product similarity. At 5.7 ms per text on CPU with zero API cost, this is orders of magnitude cheaper than cloud embedding APIs.</div>
<h2>4. Combining Embeddings with Structured Features</h2>
@@ -444,7 +444,7 @@ <h2>4. Combining Embeddings with Structured Features</h2>
<divclass="code-caption"><strong>Code Fragment 3:</strong>Embedding generation for converting text into dense vector representations. These vectors capture semantic meaning, enabling similarity search and clustering.</div>
447
+
<divclass="code-caption"><strong>Code Fragment 3:</strong>Feature ablation study comparing structured-only, embeddings-only, and combined feature sets using XGBoost with 5-fold cross-validation. The combined configuration (<code>StandardScaler</code> on structured features concatenated with 384-dim embeddings) outperforms either source alone, demonstrating complementary signal.</div>
@@ -506,7 +506,7 @@ <h3>4.1 Semantic Caching as a Hybrid Pattern</h3>
            self.responses.pop(0)

        return response</code></pre>
-<div class="code-caption"><strong>Code Fragment 4:</strong> Embedding generation for converting text into dense vector representations. These vectors capture semantic meaning, enabling similarity search and clustering.</div>
+<div class="code-caption"><strong>Code Fragment 4:</strong> Semantic cache implementation using cosine similarity for cache lookup. The <code>SemanticCache.get_or_generate()</code> method embeds incoming queries, compares against stored vectors at a configurable <code>threshold</code> (default 0.95), and returns cached responses on hits, bypassing the LLM entirely.</div>
<p>The cost savings from semantic caching can be dramatic. In customer support applications where 30% to 50% of queries are paraphrases of common questions, semantic caching reduces LLM API costs proportionally while cutting median response latency from 1 to 2 seconds (LLM generation) to under 50 milliseconds (vector lookup). The embedding cost for the cache lookup is negligible: a single embedding API call costs roughly 1,000x less than a full LLM generation. For even lower latency, a local embedding model like all-MiniLM-L6-v2 can handle the cache lookup in under 5 milliseconds on CPU.</p>
<divclass="code-caption"><strong>Code Fragment 1:</strong>Anthropic Messages API call showing the distinct parameter layout. The system prompt is a top-level parameter rather than a message role, and max_tokens is required. Content blocks provide structured access to generated text.</div>
367
+
<divclass="code-caption"><strong>Code Fragment 1:</strong>Hybrid triage router in <code>TriageRouter</code> that uses a TF-IDF classifier as the first pass. When <code>confidence</code> exceeds the threshold (0.85), the classifier handles the query at $0.00001 per call. Ambiguous or mixed-intent queries (e.g., "change email and also get a refund") fall through to the LLM at $0.003.</div>
<p>Code Fragment 2 implements request routing.</p>
@@ -688,7 +688,7 @@ <h2>5. Lab: Building a Customer Support Pipeline</h2>
Cost: $0.00500
</div>
-<div class="code-caption"><strong>Code Fragment 4:</strong> Cost tracking utility that estimates API spend from token usage. Mapping model names to per-token prices lets you monitor expenses programmatically.</div>
+<div class="code-caption"><strong>Code Fragment 4:</strong> End-to-end customer support pipeline in <code>CustomerSupportPipeline</code>. Each ticket flows through classification, regex extraction, and conditional LLM escalation. The output includes <code>routing_tier</code>, <code>extracted_info</code>, suggested <code>action</code>, and <code>total_cost</code>, showing how layered processing keeps most tickets under $0.0001.</div>
part-3-working-with-llms/module-11-hybrid-ml-llm/section-11.4.html (1 addition & 1 deletion)
@@ -460,7 +460,7 @@ <h3>2.1 Mapping Your Frontier</h3>
Pareto-optimal configs: 6 / 10
</div>
-<div class="code-caption"><strong>Code Fragment 2:</strong> Few-shot prompting pattern providing labeled examples before the actual query. The examples establish the expected input-output format. Ordering and diversity of examples significantly affect output quality.</div>
+<div class="code-caption"><strong>Code Fragment 2:</strong> Pareto frontier analysis across 10 model configurations using <code>find_pareto_frontier()</code>. Each <code>ModelConfig</code> records accuracy, cost, and latency. The output marks dominated configurations (e.g., "Bad prompt" costs more than DistilBERT but achieves lower accuracy) and highlights the hybrid router as a Pareto-optimal point at one-fifth the cost of GPT-4o.</div>