
Commit 52e982b

apartsin and claude committed
Late agent fixes: Module 11 captions and capstone formatting
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 06491ed commit 52e982b

6 files changed

Lines changed: 13 additions & 13 deletions


capstone/index.html

Lines changed: 2 additions & 2 deletions
@@ -40,7 +40,7 @@
     margin-bottom: 0.8rem;
 }
 header h1 { font-size: 2.2rem; margin-bottom: 0.5rem; }
-header .subtitle { font-size: 1.05rem; opacity: 0.85; font-style: italic; max-width: 650px; margin: 0 auto; }
+header .chapter-subtitle { font-size: 1.05rem; opacity: 0.85; font-style: italic; max-width: 650px; margin: 0 auto; }

 .chapter-nav {
     display: flex;
@@ -154,7 +154,7 @@
 <header>
     <div class="module-num">Capstone Project</div>
     <h1>End-to-End LLM System</h1>
-    <p class="subtitle">Design, build, evaluate, and present a production-grade LLM application that integrates every major skill from this book</p>
+    <p class="chapter-subtitle">Design, build, evaluate, and present a production-grade LLM application that integrates every major skill from this book</p>
 </header>

 <div class="container">

capstone/requirements.html

Lines changed: 2 additions & 2 deletions
@@ -33,7 +33,7 @@
 .chapter-header { background: linear-gradient(135deg, #1a1a2e, #0f3460, #e94560); color: white; padding: 4rem 2rem; text-align: center; }
 .chapter-header .module-label { font-family: 'Segoe UI', sans-serif; font-size: 0.85rem; text-transform: uppercase; letter-spacing: 3px; opacity: 0.7; margin-bottom: 0.5rem; }
 .chapter-header h1 { font-size: 2.5rem; font-weight: 700; margin-bottom: 1rem; line-height: 1.2; }
-.chapter-header .subtitle { font-size: 1.1rem; opacity: 0.85; max-width: 650px; margin: 0 auto; font-style: italic; }
+.chapter-header .chapter-subtitle { font-size: 1.1rem; opacity: 0.85; max-width: 650px; margin: 0 auto; font-style: italic; }
 .content { max-width: 820px; margin: 0 auto; padding: 3rem 2rem; }
 h2 { font-size: 1.8rem; color: var(--primary); margin: 3rem 0 1.5rem; padding-bottom: 0.5rem; border-bottom: 3px solid var(--highlight); }
 h3 { font-size: 1.35rem; color: var(--accent); margin: 2rem 0 1rem; }
@@ -94,7 +94,7 @@
 <header class="chapter-header">
     <div class="module-label">Capstone &middot; C.1 &amp; C.2</div>
     <h1>Requirements &amp; Deliverables</h1>
-    <div class="subtitle">Detailed technical requirements for the capstone system and specifications for each deliverable</div>
+    <div class="chapter-subtitle">Detailed technical requirements for the capstone system and specifications for each deliverable</div>
 </header>

 <main class="content">

part-3-working-with-llms/module-11-hybrid-ml-llm/section-11.1.html

Lines changed: 2 additions & 2 deletions
@@ -524,7 +524,7 @@ <h3>2.1 TF-IDF + Logistic Regression Baseline</h3>
 Cost per query: ~$0.000001 (CPU inference)
 </div>

-<div class="code-caption"><strong>Code Fragment 1:</strong> Cost tracking utility that estimates API spend from token usage. Mapping model names to per-token prices lets you monitor expenses programmatically.</div>
+<div class="code-caption"><strong>Code Fragment 1:</strong> TF-IDF plus logistic regression classifier for customer support tickets. The pipeline trains on bigram features (<code>ngram_range=(1, 2)</code>) and benchmarks inference at 0.12 ms per query, roughly 3,000x faster than an LLM API call, at negligible cost.</div>

 <p>For structured extraction tasks, regular expressions offer even faster, deterministic results. The following snippet demonstrates regex-based entity extraction for common patterns such as emails, phone numbers, and monetary amounts.</p>
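
For reference, a minimal sketch of the pipeline the corrected caption describes, using scikit-learn; the ticket texts, labels, and hyperparameters below are illustrative stand-ins, not the book's actual fragment:

# TF-IDF + logistic regression baseline (sketch)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

tickets = ["I was charged twice this month", "The app crashes on startup",
           "How do I reset my password?", "Refund my duplicate payment"]
labels = ["billing", "technical", "account", "billing"]

clf = Pipeline([
    # Bigram features, matching the caption's ngram_range=(1, 2)
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=5000)),
    ("logreg", LogisticRegression(max_iter=1000)),
])
clf.fit(tickets, labels)
print(clf.predict(["billed two times for one order"]))  # likely ['billing']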

@@ -570,7 +570,7 @@ <h3>2.1 TF-IDF + Logistic Regression Baseline</h3>
 Cost: $0.00 (no API call)
 </div>

-<div class="code-caption"><strong>Code Fragment 2:</strong> RLHF training loop using PPO to optimize the language model against a reward signal. The KL divergence penalty prevents drift from the reference model.</div>
+<div class="code-caption"><strong>Code Fragment 2:</strong> Regex-based entity extractor for deterministic structured patterns (emails, phone numbers, monetary amounts, dates). The benchmark loop of 10,000 iterations demonstrates sub-microsecond per-extraction latency with zero false positives and zero API cost.</div>

 <h2>5. Cost Modeling at Scale</h2>
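
A sketch of the extractor pattern this caption describes; the specific regexes are illustrative, not the book's exact fragment:

# Deterministic regex entity extraction (sketch)
import re

PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "amount": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

def extract_entities(text: str) -> dict[str, list[str]]:
    # No model, no API call: pure pattern matching on CPU.
    return {name: pat.findall(text) for name, pat in PATTERNS.items()}

print(extract_entities(
    "Refund $49.99 to jane@example.com (ordered 2024-03-15, phone 555-123-4567)"
))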

part-3-working-with-llms/module-11-hybrid-ml-llm/section-11.2.html

Lines changed: 4 additions & 4 deletions
@@ -329,7 +329,7 @@ <h3>1.2 Generating Embeddings with OpenAI</h3>
 Cost: ~$0.00002 per text (text-embedding-3-small)
 Total for 5 texts: ~$0.0001
 </div>
-<div class="code-caption"><strong>Code Fragment 1:</strong> Embedding generation for converting text into dense vector representations. These vectors capture semantic meaning, enabling similarity search and clustering.</div>
+<div class="code-caption"><strong>Code Fragment 1:</strong> Batch embedding via OpenAI's <code>text-embedding-3-small</code> model. The <code>get_embeddings()</code> function sends multiple texts in a single API call and returns a NumPy array of shape (n_texts, 1536), with per-text cost at approximately $0.00002.</div>

 <p>Code Fragment 2 loads the model via Transformers.</p>
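
A sketch of a get_embeddings() helper along the lines the caption describes, assuming the openai v1 Python SDK; error handling and batch-size limits are omitted:

# Batch embedding via the OpenAI API (sketch)
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def get_embeddings(texts: list[str]) -> np.ndarray:
    # One request for the whole batch; response order matches input order.
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in resp.data])

vecs = get_embeddings(["charged twice", "app crashes"])
print(vecs.shape)  # (2, 1536)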

@@ -379,7 +379,7 @@ <h3>1.2 Generating Embeddings with OpenAI</h3>
 Similarity between 'charged twice' and 'app crashes': 0.089
 </div>

-<div class="code-caption"><strong>Code Fragment 2:</strong> Cost tracking utility that estimates API spend from token usage. Mapping model names to per-token prices lets you monitor expenses programmatically.</div>
+<div class="code-caption"><strong>Code Fragment 2:</strong> Local embedding with <code>SentenceTransformer('all-MiniLM-L6-v2')</code>, an 80 MB model producing 384-dimensional vectors. The <code>normalize_embeddings=True</code> flag enables direct dot-product similarity. At 5.7 ms per text on CPU with zero API cost, this is orders of magnitude cheaper than cloud embedding APIs.</div>

 <h2>4. Combining Embeddings with Structured Features</h2>
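
The local-model counterpart is compact; a minimal sketch using the sentence-transformers package (the example texts are placeholders):

# Local embeddings with all-MiniLM-L6-v2 (sketch)
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # ~80 MB, runs on CPU
vecs = model.encode(
    ["I was charged twice", "the app crashes on launch"],
    normalize_embeddings=True,  # unit vectors: dot product == cosine similarity
)
print(vecs.shape)                # (2, 384)
print(float(vecs[0] @ vecs[1]))  # cosine similarity, no API call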

@@ -444,7 +444,7 @@ <h2>4. Combining Embeddings with Structured Features</h2>
 Combined (structured + embeddings) 0.841 (+/- 0.018)
 </div>

-<div class="code-caption"><strong>Code Fragment 3:</strong> Embedding generation for converting text into dense vector representations. These vectors capture semantic meaning, enabling similarity search and clustering.</div>
+<div class="code-caption"><strong>Code Fragment 3:</strong> Feature ablation study comparing structured-only, embeddings-only, and combined feature sets using XGBoost with 5-fold cross-validation. The combined configuration (<code>StandardScaler</code> on structured features concatenated with 384-dim embeddings) outperforms either source alone, demonstrating complementary signal.</div>

 <div class="callout key-insight">
 <div class="callout-title">&#128161; Key Insight</div>
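
The shape of that ablation, sketched with synthetic arrays standing in for real features (random data will not reproduce the 0.841 figure; this only shows the mechanics):

# Feature ablation with XGBoost and 5-fold CV (sketch)
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X_structured = rng.normal(size=(500, 8))     # e.g. account age, ticket counts
X_embeddings = rng.normal(size=(500, 384))   # e.g. MiniLM text vectors
y = rng.integers(0, 2, size=500)

X_struct = StandardScaler().fit_transform(X_structured)
for name, X in {
    "structured only": X_struct,
    "embeddings only": X_embeddings,
    "combined": np.hstack([X_struct, X_embeddings]),
}.items():
    scores = cross_val_score(XGBClassifier(n_estimators=100), X, y, cv=5)
    print(f"{name:16s} {scores.mean():.3f} (+/- {scores.std():.3f})")
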
@@ -506,7 +506,7 @@ <h3>4.1 Semantic Caching as a Hybrid Pattern</h3>
             self.responses.pop(0)

         return response</code></pre>
-<div class="code-caption"><strong>Code Fragment 4:</strong> Embedding generation for converting text into dense vector representations. These vectors capture semantic meaning, enabling similarity search and clustering.</div>
+<div class="code-caption"><strong>Code Fragment 4:</strong> Semantic cache implementation using cosine similarity for cache lookup. The <code>SemanticCache.get_or_generate()</code> method embeds incoming queries, compares against stored vectors at a configurable <code>threshold</code> (default 0.95), and returns cached responses on hits, bypassing the LLM entirely.</div>

 <p>The cost savings from semantic caching can be dramatic. In customer support applications where 30% to 50% of queries are paraphrases of common questions, semantic caching reduces LLM API costs proportionally while cutting median response latency from 1 to 2 seconds (LLM generation) to under 50 milliseconds (vector lookup). The embedding cost for the cache lookup is negligible: a single embedding API call costs roughly 1,000x less than a full LLM generation. For even lower latency, a local embedding model like all-MiniLM-L6-v2 can handle the cache lookup in under 5 milliseconds on CPU.</p>
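
A condensed sketch of the class the caption names; embed_fn and llm_fn are stand-ins for an embedding model and an LLM call, and the FIFO eviction matches the pop(0) visible in the fragment:

# SemanticCache.get_or_generate() (sketch)
import numpy as np

class SemanticCache:
    def __init__(self, embed_fn, llm_fn, threshold=0.95, max_size=1000):
        self.embed_fn, self.llm_fn = embed_fn, llm_fn
        self.threshold, self.max_size = threshold, max_size
        self.vectors: list[np.ndarray] = []   # normalized query embeddings
        self.responses: list[str] = []

    def get_or_generate(self, query: str) -> str:
        vec = self.embed_fn(query)
        vec = vec / np.linalg.norm(vec)
        if self.vectors:
            sims = np.array([v @ vec for v in self.vectors])  # cosine similarity
            best = int(sims.argmax())
            if sims[best] >= self.threshold:
                return self.responses[best]    # hit: skip the LLM entirely
        response = self.llm_fn(query)          # miss: pay for one generation
        self.vectors.append(vec)
        self.responses.append(response)
        if len(self.responses) > self.max_size:  # FIFO eviction
            self.vectors.pop(0)
            self.responses.pop(0)
        return response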

part-3-working-with-llms/module-11-hybrid-ml-llm/section-11.3.html

Lines changed: 2 additions & 2 deletions
@@ -364,7 +364,7 @@ <h3>1.1 Implementing Confidence-Based Routing</h3>
 Category: complex_case | Confidence: 0.95 | Source: llm | Cost: $0.00300
 </div>

-<div class="code-caption"><strong>Code Fragment 1:</strong> Anthropic Messages API call showing the distinct parameter layout. The system prompt is a top-level parameter rather than a message role, and max_tokens is required. Content blocks provide structured access to generated text.</div>
+<div class="code-caption"><strong>Code Fragment 1:</strong> Hybrid triage router in <code>TriageRouter</code> that uses a TF-IDF classifier as the first pass. When <code>confidence</code> exceeds the threshold (0.85), the classifier handles the query at $0.00001 per call. Ambiguous or mixed-intent queries (e.g., "change email and also get a refund") fall through to the LLM at $0.003.</div>

 <p>Code Fragment 2 implements request routing.</p>
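
The routing decision itself fits in a few lines; a sketch in the spirit of the caption's TriageRouter, where clf is a trained scikit-learn classifier and call_llm stands in for the expensive path:

# Confidence-based routing (sketch)
CLASSIFIER_COST, LLM_COST = 0.00001, 0.003

def route(query: str, clf, call_llm, threshold: float = 0.85) -> dict:
    probs = clf.predict_proba([query])[0]
    confidence = float(probs.max())
    if confidence >= threshold:
        # Cheap path: the classifier is confident enough to answer alone.
        return {"category": clf.classes_[probs.argmax()],
                "source": "classifier", "cost": CLASSIFIER_COST}
    # Ambiguous or mixed-intent queries escalate to the LLM.
    return {"category": call_llm(query), "source": "llm", "cost": LLM_COST}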

@@ -688,7 +688,7 @@ <h2>5. Lab: Building a Customer Support Pipeline</h2>
 Cost: $0.00500
 </div>

-<div class="code-caption"><strong>Code Fragment 4:</strong> Cost tracking utility that estimates API spend from token usage. Mapping model names to per-token prices lets you monitor expenses programmatically.</div>
+<div class="code-caption"><strong>Code Fragment 4:</strong> End-to-end customer support pipeline in <code>CustomerSupportPipeline</code>. Each ticket flows through classification, regex extraction, and conditional LLM escalation. The output includes <code>routing_tier</code>, <code>extracted_info</code>, suggested <code>action</code>, and <code>total_cost</code>, showing how layered processing keeps most tickets under $0.0001.</div>

 <div class="quiz">
 <h3>Knowledge Check</h3>
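
How the layers compose, sketched with the route() and extract_entities() helpers from the earlier sketches (field names follow the caption; the book's actual CustomerSupportPipeline fragment may differ):

# Layered ticket processing (sketch)
def process_ticket(text: str, clf, call_llm) -> dict:
    extracted = extract_entities(text)      # free, deterministic first layer
    decision = route(text, clf, call_llm)   # classifier first, LLM only if needed
    action = ("escalate_to_agent" if decision["source"] == "llm"
              else f"auto_resolve:{decision['category']}")
    return {
        "routing_tier": decision["source"],
        "extracted_info": extracted,
        "action": action,
        "total_cost": decision["cost"],     # classifier-only tickets: ~$0.00001
    }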

part-3-working-with-llms/module-11-hybrid-ml-llm/section-11.4.html

Lines changed: 1 addition & 1 deletion
@@ -460,7 +460,7 @@ <h3>2.1 Mapping Your Frontier</h3>
 Pareto-optimal configs: 6 / 10
 </div>

-<div class="code-caption"><strong>Code Fragment 2:</strong> Few-shot prompting pattern providing labeled examples before the actual query. The examples establish the expected input-output format. Ordering and diversity of examples significantly affect output quality.</div>
+<div class="code-caption"><strong>Code Fragment 2:</strong> Pareto frontier analysis across 10 model configurations using <code>find_pareto_frontier()</code>. Each <code>ModelConfig</code> records accuracy, cost, and latency. The output marks dominated configurations (e.g., "Bad prompt" costs more than DistilBERT but achieves lower accuracy) and highlights the hybrid router as a Pareto-optimal point at one-fifth the cost of GPT-4o.</div>

 <div class="callout note">
 <div class="callout-title">&#128221; Note</div>
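
The dominance test behind that analysis is short; a sketch of a find_pareto_frontier() of the kind the caption names (ModelConfig fields follow the caption; the book's exact fragment may differ):

# Pareto frontier over accuracy / cost / latency (sketch)
from dataclasses import dataclass

@dataclass
class ModelConfig:
    name: str
    accuracy: float   # higher is better
    cost: float       # $ per query, lower is better
    latency: float    # ms, lower is better

def dominates(a: ModelConfig, b: ModelConfig) -> bool:
    # a dominates b if it is no worse on every axis and strictly better on one.
    no_worse = (a.accuracy >= b.accuracy and a.cost <= b.cost
                and a.latency <= b.latency)
    strictly = (a.accuracy > b.accuracy or a.cost < b.cost
                or a.latency < b.latency)
    return no_worse and strictly

def find_pareto_frontier(configs: list[ModelConfig]) -> list[ModelConfig]:
    return [c for c in configs
            if not any(dominates(o, c) for o in configs)]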
