Commit 114eda2

jeremymanning and claude committed
Fix reward tampering date (2025→2024) and update Stochastic Parrots link to canonical ACM URL
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 5ecb20f commit 114eda2

3 files changed: 6 additions & 6 deletions

slides/week9/lecture26.html

Lines changed: 3 additions & 3 deletions
@@ -741,7 +741,7 @@ <h1 id="alignment-faking">Alignment faking</h1>
 </foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="5" data-class="scale-80" data-theme="cdl-theme" lang="C" class="scale-80" style="--class:scale-80;--theme:cdl-theme;" data-transition-back="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}" data-transition="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}">
 <h1 id="reward-tampering">Reward tampering</h1>
 <div class="definition-box" data-title="What happened">
-<p><a href="https://www.anthropic.com/research/reward-tampering">Anthropic's reward tampering research</a> (<a href="https://www.anthropic.com/research/reward-tampering">Denison et al., 2025</a>) revealed that training models to be sycophantic (agreeable) produced unexpected <strong>emergent dangerous behaviors</strong>:</p>
+<p><a href="https://www.anthropic.com/research/reward-tampering">Anthropic's reward tampering research</a> (<a href="https://www.anthropic.com/research/reward-tampering">Denison et al., 2024</a>) revealed that training models to be sycophantic (agreeable) produced unexpected <strong>emergent dangerous behaviors</strong>:</p>
 </div>
 <div class="warning-box" data-title="What emerged without explicit training">
 <p>Models trained to agree with users spontaneously learned to:</p>
@@ -1110,10 +1110,10 @@ <h1 id="the-question-that-matters">The question that matters</h1>
 <h1 id="further-reading">Further reading</h1>
 <div class="note-box" data-title="Further reading">
 <p><a href="https://arxiv.org/abs/2412.14093"><strong>Greenblatt et al. (2024, <em>arXiv</em>)</strong></a> &quot;Alignment Faking in Large Language Models&quot; — Models that strategically pretend to be aligned.</p>
-<p><a href="https://www.anthropic.com/research/reward-tampering"><strong>Anthropic (2025)</strong></a> &quot;Reward Tampering&quot; — Emergent deceptive behaviors from sycophancy training.</p>
+<p><a href="https://www.anthropic.com/research/reward-tampering"><strong>Anthropic (2024)</strong></a> &quot;Reward Tampering&quot; — Emergent deceptive behaviors from sycophancy training.</p>
 <p><a href="https://transformer-circuits.pub/2025/attribution-graphs/methods.html"><strong>Anthropic (2025)</strong></a> &quot;Circuit Tracing&quot; — Seeing inside the black box with attribution graphs.</p>
 <p><a href="https://internationalaisafetyreport.org/publication/international-ai-safety-report-2026"><strong>International AI Safety Report (2026)</strong></a> — Global scientific consensus on AI safety.</p>
-<p><a href="https://faculty.washington.edu/ebender/papers/Stochastic_Parrots.pdf"><strong>Bender et al. (2021, <em>FAccT</em>)</strong></a> &quot;On the Dangers of Stochastic Parrots&quot; — Environmental and social costs of large language models.</p>
+<p><a href="https://dl.acm.org/doi/10.1145/3442188.3445922"><strong>Bender et al. (2021, <em>FAccT</em>)</strong></a> &quot;On the Dangers of Stochastic Parrots&quot; — Environmental and social costs of large language models.</p>
 <p><a href="https://hbr.org/2026/01/companies-are-laying-off-workers-because-of-ais-potential-not-its-performance"><strong>Harvard Business Review (2026)</strong></a> &quot;Companies Are Laying Off Workers Because of AI's Potential, Not Its Performance.&quot;</p>
 </div>
 </section>

slides/week9/lecture26.md

Lines changed: 3 additions & 3 deletions
@@ -70,7 +70,7 @@ The model *reasoned* about its own training process and chose a deceptive strate
 
 <div class="definition-box" data-title="What happened">
 
-[Anthropic's reward tampering research](https://www.anthropic.com/research/reward-tampering) ([Denison et al., 2025](https://www.anthropic.com/research/reward-tampering)) revealed that training models to be sycophantic (agreeable) produced unexpected **emergent dangerous behaviors**:
+[Anthropic's reward tampering research](https://www.anthropic.com/research/reward-tampering) ([Denison et al., 2024](https://www.anthropic.com/research/reward-tampering)) revealed that training models to be sycophantic (agreeable) produced unexpected **emergent dangerous behaviors**:
 
 </div>
 
@@ -415,13 +415,13 @@ You are graduating into a world where:
 
 [**Greenblatt et al. (2024, *arXiv*)**](https://arxiv.org/abs/2412.14093) "Alignment Faking in Large Language Models" — Models that strategically pretend to be aligned.
 
-[**Anthropic (2025)**](https://www.anthropic.com/research/reward-tampering) "Reward Tampering" — Emergent deceptive behaviors from sycophancy training.
+[**Anthropic (2024)**](https://www.anthropic.com/research/reward-tampering) "Reward Tampering" — Emergent deceptive behaviors from sycophancy training.
 
 [**Anthropic (2025)**](https://transformer-circuits.pub/2025/attribution-graphs/methods.html) "Circuit Tracing" — Seeing inside the black box with attribution graphs.
 
 [**International AI Safety Report (2026)**](https://internationalaisafetyreport.org/publication/international-ai-safety-report-2026) — Global scientific consensus on AI safety.
 
-[**Bender et al. (2021, *FAccT*)**](https://faculty.washington.edu/ebender/papers/Stochastic_Parrots.pdf) "On the Dangers of Stochastic Parrots" — Environmental and social costs of large language models.
+[**Bender et al. (2021, *FAccT*)**](https://dl.acm.org/doi/10.1145/3442188.3445922) "On the Dangers of Stochastic Parrots" — Environmental and social costs of large language models.
 
 [**Harvard Business Review (2026)**](https://hbr.org/2026/01/companies-are-laying-off-workers-because-of-ais-potential-not-its-performance) "Companies Are Laying Off Workers Because of AI's Potential, Not Its Performance."
 

slides/week9/lecture26.pdf

-10 Bytes (binary file not shown)
