From a9441eed4f6b3f39c302666c3498838476f8bdf8 Mon Sep 17 00:00:00 2001
From: Yad Konrad <yad.konrad@gmail.com>
Date: Fri, 15 May 2026 09:07:27 -0400
Subject: [PATCH] Promote hand-written 2017 notes out of Archive/ into notes/
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The trusted, hand-written CS294 notes were sitting in a folder called
Archive — which sounds like "old/dead" — while the unreviewed AI-drafted
lecture series occupied notes/. Backwards from a learner's perspective.

- Archive/2017-Course-Notes/CS294-DeepRL-Berkeley/ -> notes/cs294-2017/
  (with imgs/ intact)
- Archive/2017-Course-Notes/Elements-Of-RL/ -> notes/sutton-barto-digest/
- Both files got <!-- status: hand-written --> headers
- Archive/ directory deleted (Archive/README.md was just a wrapper)

readme.md restructured: "What's here" now leads with trusted hand-written
content (the CS294 notes, the Sutton & Barto digest, the curated
talks/books/courses, the tested exercises). The AI-drafted lecture
series is clearly demoted as "scaffold, treat with skepticism." "Start
here" reordered: talks/books -> exercises -> drafts.

notes/README.md rewritten in the same spirit. AGENTS.md and CLAUDE.md
updated to point at notes/cs294-2017/ and notes/sutton-barto-digest/
as the trusted, frozen, never-reword material.

GitHub topics refreshed separately (not in this commit): dropped
`guideline` and `study`; added `rlhf`, `llm-alignment`, `dpo`, `grpo`,
`ppo`, `rlvr`, `agentic-rl`, `lecture-notes`, `study-notes`,
`deepseek-r1`, `constitutional-ai`, `policy-gradient`, `q-learning`,
`sutton-barto`. Description sharpened.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 AGENTS.md                                     |  10 +--
 Archive/README.md                             |  17 -----
 CHANGELOG.md                                  |  13 ++++
 CLAUDE.md                                     |   2 +-
 notes/README.md                               |  60 ++++++++----------
 .../cs294-2017}/imgs/cannon.svg               |   0
 .../cs294-2017}/imgs/linear-lqr.png           | Bin
 .../cs294-2017}/imgs/nvidia-case.png          | Bin
 .../imgs/rl-imitation-learning.png            | Bin
 .../cs294-2017}/readme.md                     |   2 +
 .../sutton-barto-digest}/readme.md            |   2 +
 readme.md                                     |  33 ++++++----
 12 files changed, 71 insertions(+), 68 deletions(-)
 delete mode 100644 Archive/README.md
 rename {Archive/2017-Course-Notes/CS294-DeepRL-Berkeley => notes/cs294-2017}/imgs/cannon.svg (100%)
 rename {Archive/2017-Course-Notes/CS294-DeepRL-Berkeley => notes/cs294-2017}/imgs/linear-lqr.png (100%)
 rename {Archive/2017-Course-Notes/CS294-DeepRL-Berkeley => notes/cs294-2017}/imgs/nvidia-case.png (100%)
 rename {Archive/2017-Course-Notes/CS294-DeepRL-Berkeley => notes/cs294-2017}/imgs/rl-imitation-learning.png (100%)
 rename {Archive/2017-Course-Notes/CS294-DeepRL-Berkeley => notes/cs294-2017}/readme.md (99%)
 rename {Archive/2017-Course-Notes/Elements-Of-RL => notes/sutton-barto-digest}/readme.md (91%)

diff --git a/AGENTS.md b/AGENTS.md
index 8aae0b5..566c83d 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -6,7 +6,7 @@ Instructions for AI coding agents (Codex, Claude Code, etc.) working in this rep
 
 A personal study repository for reinforcement learning and its use in training LLMs. It holds:
 
-- the original course notes from 2017 (`Archive/`)
+- the original course notes from 2017 (`notes/cs294-2017/`, `notes/sutton-barto-digest/`)
 - a self-study lecture series taking RL from MDPs to RLHF (`notes/lectures/`)
 - worked, tested coding exercises (`exercises/`)
 - a curated reading list of recent papers (`reference/papers/`)
@@ -18,8 +18,8 @@ It is a learning environment, not a library or a product. A person is working th
 
 | Path | What it is | Editable by an agent? |
 |---|---|---|
-| `Archive/` | The original 2017 notes, idiosyncratic voice, kept as written | No. Reference only. Never reword. |
-| `notes/lectures/` | The lecture series, `NN-topic.md` | Yes, under the rules below |
+| `notes/cs294-2017/`, `notes/sutton-barto-digest/` | Original 2017 hand-written notes (CS294 student notes; Sutton & Barto digest); idiosyncratic voice, kept as written | No. Reference only. Never reword. |
+| `notes/lectures/` | The 19-lecture series, `NN-topic.md` | Yes, under the rules below |
 | `notes/cheat-sheets/`, `notes/diagrams/` | Quick reference | Yes |
 | `exercises/NN-topic/` | A task, a starter file, tests, a reference solution, hints | Yes |
 | `reference/papers/` | Reading lists. The `PAPERS.md` files are generated by the collector; the per-topic READMEs are hand notes. | READMEs yes; don't hand-edit `PAPERS.md` — re-run the collector. |
@@ -58,7 +58,7 @@ The repo has a voice: plain and direct, a little informal, written by someone le
 - **No marketing voice.** Not a product launch. Don't call things "comprehensive," "powerful," "cutting-edge," "robust." Don't open a section with "Why this matters." Don't close with "the future is bright."
 - **No AI-slop tells.** No emoji as bullets or in headings. No rule-of-three padding ("fast, simple, and elegant"). No "it's not just X — it's Y." No "let's dive in." Sentence-case headings. If `~/.claude/skills/anti-slop-guide` is available, follow it.
 - **Be specific.** "The loss explodes around update 50 if you don't normalize the advantage" beats "this can be unstable."
-- **Keep the old notes' quirks.** The 2017 archive says "quadratize (it could be a word)" and references the author being from Iraq. That stays. Don't sand it down.
+- **Keep the old notes' quirks.** The 2017 hand-written notes (`notes/cs294-2017/`) say "quadratize (it could be a word)" and reference the author being from Iraq. That stays. Don't sand it down.
 
 ## Citations
 
@@ -95,7 +95,7 @@ An agent acting as tutor: have the student edit `starter.py`, run `pytest exerci
 
 ## Don't
 
-- Don't reword `Archive/`.
+- Don't reword the 2017 hand-written notes (`notes/cs294-2017/`, `notes/sutton-barto-digest/`).
 - Don't mark your own output `reviewed`.
 - Don't add a citation you haven't verified.
 - Don't touch files outside the repo, shell config, or git history without being asked.
diff --git a/Archive/README.md b/Archive/README.md
deleted file mode 100644
index e281ec9..0000000
--- a/Archive/README.md
+++ /dev/null
@@ -1,17 +0,0 @@
-# Archive — original notes (2017)
-
-Hand-written notes kept as they were. Not edited, not modernized. If something here is dated, that's the point — it's a record, not a maintained doc.
-
-## Contents
-
-### [2017-Course-Notes/CS294-DeepRL-Berkeley/](./2017-Course-Notes/CS294-DeepRL-Berkeley/)
-
-Notes from CS 294: Deep Reinforcement Learning, Berkeley, Spring 2017 (Sergey Levine, John Schulman, Chelsea Finn). Covers imitation learning and DAgger, optimal control and trajectory optimization (LQR/iLQR, MCTS), learning dynamics models, policy gradients, TRPO, actor-critic, and model-based RL. The current version of the course is [CS 285](https://rail.eecs.berkeley.edu/deeprlcourse/).
-
-### [2017-Course-Notes/Elements-Of-RL/](./2017-Course-Notes/Elements-Of-RL/)
-
-A short digest of the four elements of an RL system — policy, reward signal, value function, model — from Sutton & Barto and Li's *Deep RL: An Overview*.
-
-## Status
-
-`hand-written`. These are student notes: informal, with the occasional error or dead link, and some personal asides that stay in. For authoritative versions, go to the original course materials and papers. The newer lecture series in [`../notes/lectures/`](../notes/lectures/) covers the same foundations and continues into RLHF; the reading lists are in [`../reference/papers/`](../reference/papers/).
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 3242e23..20fad36 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,19 @@
 
 Notable changes to the repo. Not a release log — there are no releases — just a record of what moved and why.
 
+## 2026-05-15 — promote the hand-written notes out of Archive/
+
+The trusted, hand-written 2017 notes had been sitting in a folder called `Archive/` — which connotes "old/dead" — while the unreviewed AI-drafted lecture series occupied `notes/`. Backwards. Fixed:
+
+- `Archive/2017-Course-Notes/CS294-DeepRL-Berkeley/` → `notes/cs294-2017/` (with `imgs/` intact and image links unchanged).
+- `Archive/2017-Course-Notes/Elements-Of-RL/` → `notes/sutton-barto-digest/`.
+- Both moved files got a `<!-- status: hand-written -->` header.
+- `Archive/` directory deleted (`Archive/README.md` was a wrapper; no content lost).
+- Root `readme.md` "What's here" section restructured to lead with the trusted, hand-written content (the CS294 notes, the Sutton & Barto digest, the curated talks/books/courses, the tested exercises) and clearly demote the AI-drafted lecture series as scaffold-with-skepticism. "Start here" reordered to lead with safer paths (talks/books → exercises → drafts).
+- `notes/README.md` rewritten in the same spirit — hand-written content first, lecture series second with a clear caveat about what `unreviewed` means.
+- `AGENTS.md` and `CLAUDE.md` updated: the layout table now points at `notes/cs294-2017/` and `notes/sutton-barto-digest/` as the trusted, frozen, never-reword material instead of `Archive/`.
+- GitHub topics refreshed: dropped `guideline` and `study` (generic), added `rlhf`, `llm-alignment`, `dpo`, `grpo`, `ppo`, `rlvr`, `agentic-rl`, `lecture-notes`, `study-notes`, `deepseek-r1`, `constitutional-ai`, `policy-gradient`, `q-learning`, `sutton-barto`. Description sharpened.
+
 ## 2026-05-12 — restructure: separate the layers, set up rules
 
 Context: the repo had grown two layers — the original 2017 notes, and a much larger newer layer added in 2025 (a 13-lecture series, scraped paper lists, a content tool). The newer layer was unmarked, wrote in a first person it hadn't earned, and shipped broken links, phantom lectures, and made-up citations. This pass separates the two so nobody has to guess what's trustworthy, and sets up conventions so they can coexist.
diff --git a/CLAUDE.md b/CLAUDE.md
index d0f9cb3..6fb03f8 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -5,7 +5,7 @@ Read [`AGENTS.md`](./AGENTS.md). Everything in it applies to Claude Code working
 Quick orientation:
 
 - This is a study repo for RL and RL-for-LLMs. A person is learning the material; help them learn it, don't try to "finish" the repo.
-- `Archive/` is frozen — reference it, never reword it.
+- The 2017 hand-written notes (`notes/cs294-2017/`, `notes/sutton-barto-digest/`) are trusted and frozen — reference, never reword.
 - Docs under `notes/` and `reference/` carry a `<!-- status: ... -->` comment. `hand-written` and `reviewed` are trusted; `unreviewed` means nobody has checked it — don't cite it as fact, and don't promote it to `reviewed` yourself (only a person does that).
 - Match the existing voice: plain, direct, no marketing tone, no AI-slop tells. The `anti-slop-guide` skill is available — use it when writing or editing prose here.
 - Verify every paper citation before adding it. The repo currently has some invented ones; don't add more.
diff --git a/notes/README.md b/notes/README.md
index 1e0b296..d5418d3 100644
--- a/notes/README.md
+++ b/notes/README.md
@@ -1,19 +1,32 @@
 <!-- status: unreviewed | last-reviewed: never -->
 
-# Lecture series: deep RL to LLM alignment
+# notes — study material
 
-A self-study sequence that goes from MDPs and policy gradients up through RLHF, DPO, and the 2024–2025 alignment methods. The lecture bodies **haven't been reviewed yet** — useful as a structured path, but check the math, the code, and the citations against primary sources. `../CURRICULUM.md` is the same path with prerequisites and time estimates; [`../AGENTS.md`](../AGENTS.md) explains the `status:` labels.
+Two layers live in this directory, mixed.
 
-Each lecture tries to do four things: give the intuition before the math, show code that runs, point at where the method breaks in practice, and name the papers that introduced it. When a lecture has a matching exercise, it links to [`../exercises/`](../exercises/).
+**Trusted, hand-written:**
 
-## Lectures and review status
+- **[`cs294-2017/`](./cs294-2017/)** — personal student notes from CS 294 Deep RL (Berkeley, Spring 2017 — Levine, Schulman, Finn). 246 lines of working notes from when the field was being built. Idiosyncratic, kept as written. `status: hand-written`.
+- **[`sutton-barto-digest/`](./sutton-barto-digest/)** — short distillation of the four elements of an RL system (policy, reward, value function, model) from Sutton & Barto. `status: hand-written`.
+
+These are old (2017) and informal — but they're a real person's understanding, not AI text. Trusted as starting points.
+
+**AI-drafted, useful as scaffold (`unreviewed` — treat with skepticism):**
+
+- **[`lectures/`](./lectures/)** — a 19-lecture series taking RL from MDPs through to RLHF / DPO / GRPO / RLVR / agentic / offline. Editorial pass has been done — broken links fixed, code bugs caught (`import gym` → `gymnasium`, missing imports, old-API `env.step` calls), citations checked or removed when they didn't resolve, fake-first-person framing stripped. **But no person has read each lecture end to end and signed off.** Cross-check the math against the cited papers; treat the code as a starting point that needs verification. Index and per-lecture review status below.
+- **[`cheat-sheets/`](./cheat-sheets/)** — `RL-Math-Formulas.md` and `RL-Quick-Reference.md`. Audited (caught a wrong KL direction; fixed). Same caveat.
+- **[`diagrams/`](./diagrams/)** — `RL-Algorithm-Diagrams.md`. Audited (caught and fixed a wrong DPO loss diagram and a wrong GRPO advantage diagram). Same caveat.
+
+[`../CURRICULUM.md`](../CURRICULUM.md) is the suggested order through everything. [`../AGENTS.md`](../AGENTS.md) explains the `<!-- status: ... -->` convention every doc carries.
+
+## Lecture series — drafts, in order
 
 | # | Lecture | Status |
 |---|---|---|
 | 01 | [MDPs and Bellman equations](./lectures/01-mdps-bellman.md) — exercise: [`01-mdps`](../exercises/01-mdps/) | unreviewed (de-slopped; a fabricated value-function output was removed) |
 | 02 | [Policy gradients from scratch](./lectures/02-policy-gradients.md) — exercise: [`02-policy-gradients`](../exercises/02-policy-gradients/) | unreviewed (de-slopped; a broken link and a code bug were fixed) |
 | 03 | [Value functions & Q-learning](./lectures/03-value-functions-q-learning.md) — exercise: [`03-q-learning`](../exercises/03-q-learning/) | unreviewed (de-slopped; a dead `Modern-RL-Research/` path and a missing import fixed) |
-| 04 | [Actor-critic methods](./lectures/04-actor-critic.md) | unreviewed (de-slopped; a code bug fixed) |
+| 04 | [Actor-critic methods](./lectures/04-actor-critic.md) — exercise: [`04-actor-critic`](../exercises/04-actor-critic/) | unreviewed (de-slopped; a code bug fixed) |
 | 05 | [Trust regions and TRPO](./lectures/05-trpo.md) | unreviewed (de-slopped; fabricated training times removed) |
 | 06 | [PPO](./lectures/06-ppo.md) | unreviewed (de-slopped; `import gym` → `gymnasium` fixed) |
 | 07 | [Off-policy learning: SAC and TD3](./lectures/07-off-policy-rl.md) | unreviewed (de-slopped; an old-API `env.step` call fixed) |
@@ -22,45 +35,28 @@ Each lecture tries to do four things: give the intuition before the math, show c
 | 10 | [PPO for language models](./lectures/10-ppo-for-llms.md) | unreviewed (de-slopped; a broken next-lecture link + unverified compute claims fixed) |
 | 11 | [Direct preference optimization](./lectures/11-dpo.md) | unreviewed (de-slopped; a fabricated paper removed) |
 | 12 | [Beyond DPO: GRPO, RRHF, IPO](./lectures/12-beyond-dpo.md) | unreviewed (de-slopped; a fabricated benchmark table + a fabricated paper removed) |
-| 13 | [RLHF for code generation](./lectures/13-rlhf-code-generation.md) | unreviewed (de-slopped; CodeRL mis-attributed to Meta → fixed to Salesforce; fabricated benchmark numbers removed) |
+| 13 | [RLHF for code generation](./lectures/13-rlhf-code-generation.md) — exercise: [`15-grpo-rlvr`](../exercises/15-grpo-rlvr/) (related) | unreviewed (de-slopped; CodeRL mis-attributed to Meta → fixed to Salesforce; fabricated benchmark numbers removed) |
 | 14 | [Constitutional AI, RLAIF, self-improvement](./lectures/14-constitutional-ai-rlaif.md) | unreviewed (new draft) |
-| 15 | [RL with verifiable rewards & reasoning models](./lectures/15-rl-verifiable-rewards.md) | unreviewed (new draft) |
+| 15 | [RL with verifiable rewards & reasoning models](./lectures/15-rl-verifiable-rewards.md) — exercise: [`15-grpo-rlvr`](../exercises/15-grpo-rlvr/) | unreviewed (new draft) |
 | 16 | [Agentic RL: tool use, multi-turn](./lectures/16-agentic-rl.md) | unreviewed (new draft) |
 | 17 | [Online & iterative preference optimization](./lectures/17-online-iterative-preference.md) | unreviewed (new draft) |
 | 18 | [Distillation of reasoning models](./lectures/18-distillation-reasoning.md) | unreviewed (new draft) |
 | 19 | [Offline RL](./lectures/19-offline-rl.md) | unreviewed (new draft) |
 
-Planned: a curated paper layer in [`../reference/papers/`](../reference/papers/), built from `../tools/lit-builder/` once the LLM scoring step has been run (it needs a credential). Optionally: an exploration lecture (intrinsic motivation, count-based methods, RND) — the one remaining foundational gap.
+What "unreviewed" means here: nobody has read the lecture end-to-end and signed off on it. The editorial pass (de-slop, fix broken links, catch code bugs, verify citations) has happened — that's the parenthetical note next to each row. The next step is for a person to read it and either flip it to `reviewed` (with today's date in `last-reviewed:`) or note what's still wrong.
 
-Cheat sheets and diagrams are in [`cheat-sheets/`](./cheat-sheets/) and [`diagrams/`](./diagrams/) — also unreviewed.
+Planned: a curated paper layer in [`../reference/papers/`](../reference/papers/), built from `../tools/lit-builder/` once the LLM scoring step has been run (it needs a credential — see issue #2). Optionally: an exploration lecture (intrinsic motivation, count-based methods, RND).
 
 ## How to use this
 
-Starting from scratch: do 01–05 in order, type out the code yourself, and don't move on from a lecture until you can explain its method without notes. Then 06–08, then 09 onward.
+Starting from scratch: read the talks/books/courses linked in [`../readme.md`](../readme.md) — they're the trusted external material. The hand-written CS294 notes at [`cs294-2017/`](./cs294-2017/) give you one student's path through the same material.
 
-Already know RL, here for the LLM part: skim 01–05 for notation, then go 09 → 10 → 11 → 12. Lecture 13 if you care about code generation specifically.
+Already know RL, here for the LLM part: lectures 09 → 11 → 12 → 14 → 15 → 17 cover the RLHF → DPO → GRPO → constitutional AI → RLVR → iterative preference optimization arc.
 
-Here for code generation: 02 (policy-gradient intuition), 10 (PPO for LLMs), 11–13.
+Here for code generation specifically: lecture 02 (policy-gradient intuition), 10 (PPO for LLMs), 13 (RLHF for code), 15 (RLVR — the basis of modern reasoning-RL on code).
 
 ## Prerequisites
 
-- Calculus (derivatives, chain rule, gradients), probability (expectations, distributions, KL divergence), basic linear algebra. The math is explained as it comes up.
-- Python at an intermediate level; PyTorch basics (the code uses PyTorch); NumPy.
-- Budget a few hours per lecture including coding and debugging.
-
-## Study notes that hold up
-
-- Type the code out. Don't paste it.
-- Break it on purpose — change a hyperparameter until it fails, then work out why.
-- If you can't explain a method simply, you don't have it yet.
-- After coding a method, read the original paper. It reads very differently once you've implemented it.
-- Print shapes when something's wrong. Most RL bugs are shape or sign errors.
-
-## Supplementary resources
-
-- Sutton & Barto, *Reinforcement Learning: An Introduction* (2nd ed.)
-- Spinning Up in Deep RL (OpenAI) — explanations plus reference implementations
-- David Silver's UCL lectures
-- Recent papers, by topic, in [`../reference/papers/`](../reference/papers/)
-
-The lectures are meant to stand on their own, but they'll make more sense alongside these.
+- Calculus (derivatives, chain rule, gradients), probability (expectations, distributions, KL divergence), basic linear algebra.
+- Python at an intermediate level; PyTorch basics; NumPy.
+- A few hours per lecture including coding and debugging.
diff --git a/Archive/2017-Course-Notes/CS294-DeepRL-Berkeley/imgs/cannon.svg b/notes/cs294-2017/imgs/cannon.svg
similarity index 100%
rename from Archive/2017-Course-Notes/CS294-DeepRL-Berkeley/imgs/cannon.svg
rename to notes/cs294-2017/imgs/cannon.svg
diff --git a/Archive/2017-Course-Notes/CS294-DeepRL-Berkeley/imgs/linear-lqr.png b/notes/cs294-2017/imgs/linear-lqr.png
similarity index 100%
rename from Archive/2017-Course-Notes/CS294-DeepRL-Berkeley/imgs/linear-lqr.png
rename to notes/cs294-2017/imgs/linear-lqr.png
diff --git a/Archive/2017-Course-Notes/CS294-DeepRL-Berkeley/imgs/nvidia-case.png b/notes/cs294-2017/imgs/nvidia-case.png
similarity index 100%
rename from Archive/2017-Course-Notes/CS294-DeepRL-Berkeley/imgs/nvidia-case.png
rename to notes/cs294-2017/imgs/nvidia-case.png
diff --git a/Archive/2017-Course-Notes/CS294-DeepRL-Berkeley/imgs/rl-imitation-learning.png b/notes/cs294-2017/imgs/rl-imitation-learning.png
similarity index 100%
rename from Archive/2017-Course-Notes/CS294-DeepRL-Berkeley/imgs/rl-imitation-learning.png
rename to notes/cs294-2017/imgs/rl-imitation-learning.png
diff --git a/Archive/2017-Course-Notes/CS294-DeepRL-Berkeley/readme.md b/notes/cs294-2017/readme.md
similarity index 99%
rename from Archive/2017-Course-Notes/CS294-DeepRL-Berkeley/readme.md
rename to notes/cs294-2017/readme.md
index d281c3a..387910c 100644
--- a/Archive/2017-Course-Notes/CS294-DeepRL-Berkeley/readme.md
+++ b/notes/cs294-2017/readme.md
@@ -1,3 +1,5 @@
+<!-- status: hand-written | provenance: notes by Yad Konrad while taking the courses (2017); kept as written -->
+
 
 ## Notes taken from CS 294: Deep Reinforcement Learning, Spring 2017 (Berkeley)
 
diff --git a/Archive/2017-Course-Notes/Elements-Of-RL/readme.md b/notes/sutton-barto-digest/readme.md
similarity index 91%
rename from Archive/2017-Course-Notes/Elements-Of-RL/readme.md
rename to notes/sutton-barto-digest/readme.md
index 24eaffb..69779ba 100644
--- a/Archive/2017-Course-Notes/Elements-Of-RL/readme.md
+++ b/notes/sutton-barto-digest/readme.md
@@ -1,3 +1,5 @@
+<!-- status: hand-written | provenance: notes by Yad Konrad while taking the courses (2017); kept as written -->
+
 #### Elements Of Reinforcement Learning: (Derived from Barto and Sutton '17 and Li '17)
 
 * A policy defines the learning agent’s way of behaving at a given time. Roughly speaking, a policy is a mapping from perceived states of the environment to actions to be taken when in those states.
diff --git a/readme.md b/readme.md
index a9bc791..35c8c5e 100644
--- a/readme.md
+++ b/readme.md
@@ -6,21 +6,28 @@ This is a personal study repo, not a library. It mixes notes a person wrote (som
 
 ## What's here
 
-- **`notes/`** — the lecture series (`notes/lectures/`), plus cheat sheets and diagrams. Currently unreviewed; see [`notes/README.md`](./notes/README.md) for the index and review status.
-- **`exercises/`** — small coding exercises with tests and reference solutions. Built to be worked through step by step, with a coding agent or on your own.
-- **`CURRICULUM.md`** — the ordered path through the lectures and exercises.
-- **`Archive/`** — the original 2017 course notes (CS294 Deep RL, Berkeley) and a short Sutton & Barto digest. Kept as written.
-- **`reference/papers/`** — reading lists of recent papers, collected from arXiv by the script in `tools/`.
-- **`tools/`** — `arxiv-collector/` (fetches arXiv papers), `lit-builder/` (conference-paper triage — a retuned copy of `iclr-lit-builder`: fetches ICLR/NeurIPS/ICML paper lists, keyword-filters, LLM-scores 0–3 with a reason), and `content-pipeline/` (drafts blog posts / threads from papers; auxiliary).
+**Trusted, hand-written:**
 
-## Start here
+- **[`notes/cs294-2017/`](./notes/cs294-2017/)** — personal student notes from CS 294 Deep RL (Berkeley, Spring 2017 — Levine, Schulman, Finn). 246 lines of real-time notes from when the field was being built. Idiosyncratic, opinionated, with the cannon-trajectory aside. Kept as written.
+- **[`notes/sutton-barto-digest/`](./notes/sutton-barto-digest/)** — short distillation of the four elements of an RL system, from Sutton & Barto.
+- **Talks, books, courses** — the curated external links below. The Pineau intro, Abbeel's deep RL talk, David Silver's UCL course, Sutton & Barto's book, CS285, Spinning Up. Here since 2015. Still the best place to start if you're new.
+- **[`exercises/`](./exercises/)** — five small coding exercises with `pytest` tests and reference solutions, verified to pass. Implement REINFORCE on CartPole, Q-learning on FrozenLake, value iteration on a gridworld, actor-critic, a tiny GRPO loop on a verifiable arithmetic task.
+
+**AI-drafted, useful as scaffold (`unreviewed` — treat with skepticism):**
+
+- **[`notes/lectures/`](./notes/lectures/)** — a 19-lecture series, MDPs through RLHF / DPO / GRPO / RLVR / agentic / offline. Editorial pass done (broken links, code bugs, made-up citations all caught and fixed) — but no person has read each one end-to-end. Cross-check the math against the cited papers before relying on it. Index and per-lecture status in [`notes/README.md`](./notes/README.md); ordered study path in [`CURRICULUM.md`](./CURRICULUM.md).
+- **`notes/cheat-sheets/`, `notes/diagrams/`** — quick reference. Same caveat. (The audit caught and fixed two wrong diagrams in the diagrams file, FWIW.)
+- **[`reference/papers/`](./reference/papers/)** — auto-collected paper lists from arXiv (~430 abstracts). Use as a search index, not a curated reading list.
+- **[`tools/`](./tools/)** — `arxiv-collector/` (fetches arXiv papers), `lit-builder/` (ICLR/NeurIPS/ICML triage with keyword filter + LLM scoring), `content-pipeline/` (drafts blog posts from papers; auxiliary).
 
-- New to RL: read [`CURRICULUM.md`](./CURRICULUM.md), then start `notes/lectures/01-mdps-bellman.md`. Do the exercises as you go.
-- Know RL, here for the LLM part: skim lectures 1–5, then 9 onward (reward modeling → PPO for LLMs → DPO → GRPO).
-- Want the original 2017 notes: `Archive/2017-Course-Notes/`.
-- Working in this repo with Claude Code or Codex? Read [`AGENTS.md`](./AGENTS.md) first.
+[`AGENTS.md`](./AGENTS.md) explains the `<!-- status: hand-written | reviewed | unreviewed -->` convention every doc carries.
+
+## Start here
 
-For the foundational external material the repo has always pointed at — talks, books, courses — see below. It's still the best starting point if you want lectures from the people who built the field.
+- **New to RL?** Start with the talks/books/courses below — Pineau's intro, then Sutton & Barto for foundations, then David Silver's UCL course or CS285 (Berkeley's current version of CS294). The 2017 CS294 notes ([`notes/cs294-2017/`](./notes/cs294-2017/)) give you one student's working notes through the same material if you like that genre.
+- **Want hands-on?** Do the [`exercises/`](./exercises/). They're tested and they actually run. Five of them, a couple of hours each.
+- **Curious about modern LLM RL?** The 19-lecture series in [`notes/lectures/`](./notes/lectures/) covers RLHF, DPO, GRPO, RLVR, agentic, offline. Drafts; cross-check the claims against the cited papers.
+- **Working in this repo with Claude Code or Codex?** Read [`AGENTS.md`](./AGENTS.md) first.
 
 ## The landscape
 
@@ -151,7 +158,7 @@ The map below shows where each family fits. The lectures fill in the details; [`
 	* Lecture 9: Exploration and Exploitation
 	* Lecture 10: Case Study: RL in Classic Games
 
-* [CS 294: Deep Reinforcement Learning, Spring 2017](https://rll.berkeley.edu/deeprlcourse-fa17/) by Sergey Levine, John Schulman, Chelsea Finn. My notes are archived at [`Archive/2017-Course-Notes/CS294-DeepRL-Berkeley/`](./Archive/2017-Course-Notes/CS294-DeepRL-Berkeley/).
+* [CS 294: Deep Reinforcement Learning, Spring 2017](https://rll.berkeley.edu/deeprlcourse-fa17/) by Sergey Levine, John Schulman, Chelsea Finn. My notes from taking it are at [`notes/cs294-2017/`](./notes/cs294-2017/).
 
 * [CS 285: Deep Reinforcement Learning (Berkeley)](https://rail.eecs.berkeley.edu/deeprlcourse/) — the current version of CS294, updated each year.