From ee9ea29a4d14f7d8c77427b58d1a2866cad83c96 Mon Sep 17 00:00:00 2001
From: Norman Beckford <norman@axumquant.com>
Date: Sat, 16 May 2026 18:49:54 -0400
Subject: [PATCH 1/5] =?UTF-8?q?feat:=20Anthropic-aligned=202026=20upgrade?=
 =?UTF-8?q?=20=E2=80=94=20EPC=20workflow,=20hooks,=20subagents?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Encode the five practices Anthropic's own teams cited consistently across
the "How Anthropic Engineers ACTUALLY Prompt Claude Code" talk, the
"How Anthropic teams use Claude Code" PDF, and the Every podcast with
Boris Cherny + Cat Wu.

Settings (.claude/settings.json):
- Wire all 7 hook scripts in the hooks block (the prior config registered
  none — shell scripts existed but never fired)
- Add defaultMode=plan, model=sonnet-4-6, outputStyle, includeCoAuthoredBy
- Add statusLine pointing at new hooks/statusline.sh
- Tighten deny list (gh repo visibility public, rm -rf .)

New slash commands (.claude/commands/):
- /explore   — read-only investigation, phase 1 of EPC
- /plan      — writes docs/plans/<name>.md with verification criteria first
- /checkpoint — cheap WIP commit, refuses on main + dirty-secret files
- /slot-machine — autonomous 30-min run + auto-rollback boundary
- /dogfood   — log model behavior on new snapshots (docs/dogfood/)
- /compact-with-context — guided /compact that preserves the right state

New subagents (.claude/agents/):
- front-end-tester       — Playwright visual diff + golden-path verification
- opposing-perspective   — two independent briefs, no synthesis (decision aid)
- claude-md-curator      — end-of-session learnings → CLAUDE.md patch proposal
- pr-comment-fixer       — fetch reviews, group fixes, push follow-up commits

New hooks (.claude/hooks/, all degrade gracefully when jq absent):
- session-start.sh   SessionStart → inject branch/commits/plans
- prompt-expand.sh   UserPromptSubmit → light enrichment + deploy warnings
- session-end.sh     Stop → nudge claude-md-curator after substantive work
- statusline.sh      Custom status line (branch | mode | model | time)

New skills (.claude/skills/):
- anthropic-prompting.md   The five practices, with sources
- explore-plan-code.md     The EPC workflow in skill form
- screenshot-driven-dev.md Visual-first for UI/design/non-technical work

New docs (docs/):
- anthropic-playbook.md         Distilled team-by-team workflows from the PDF
- claude-code-2026-reference.md Current Claude Code feature reference

MCP template:
- .mcp.json with disabled-by-default servers for Playwright, GitHub,
  Sentry, Postgres, Filesystem (uncomment to enable; matches Anthropic
  Data Infra team's "MCP over CLI for sensitive data" guidance)

CLAUDE.md:
- New "How We Use This Kit" section mapping commands/agents to workflow
- Directory tree updated to reflect 14 commands, 8 agents, 7 hooks

Sources:
- youtu.be/qOvc9IUKEIc — How Anthropic Engineers ACTUALLY Prompt Claude Code
- www-cdn.anthropic.com/58284b19e702b49db9302d5b6f135ad8871e7658.pdf
- every.to/podcast/transcript-how-to-use-claude-code-like-the-people-who-built-it
- code.claude.com/docs

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .claude/agents/claude-md-curator.md      |  71 ++++++
 .claude/agents/front-end-tester.md       |  58 +++++
 .claude/agents/opposing-perspective.md   |  78 +++++++
 .claude/agents/pr-comment-fixer.md       |  66 ++++++
 .claude/commands/checkpoint.md           |  63 ++++++
 .claude/commands/compact-with-context.md |  46 ++++
 .claude/commands/dogfood.md              |  75 +++++++
 .claude/commands/explore.md              |  69 ++++++
 .claude/commands/init.md                 |  14 +-
 .claude/commands/plan.md                 |  90 ++++++++
 .claude/commands/slot-machine.md         |  85 ++++++++
 .claude/hooks/prompt-expand.sh           |  53 +++++
 .claude/hooks/session-end.sh             |  68 ++++++
 .claude/hooks/session-start.sh           |  60 ++++++
 .claude/hooks/statusline.sh              |  39 ++++
 .claude/settings.json                    |  90 +++++++-
 .claude/skills/anthropic-prompting.md    | 111 ++++++++++
 .claude/skills/explore-plan-code.md      |  80 +++++++
 .claude/skills/screenshot-driven-dev.md  |  83 +++++++
 .mcp.json                                |  43 ++++
 CLAUDE.md                                |  72 ++++++-
 docs/anthropic-playbook.md               | 174 +++++++++++++++
 docs/claude-code-2026-reference.md       | 263 +++++++++++++++++++++++
 23 files changed, 1839 insertions(+), 12 deletions(-)
 create mode 100644 .claude/agents/claude-md-curator.md
 create mode 100644 .claude/agents/front-end-tester.md
 create mode 100644 .claude/agents/opposing-perspective.md
 create mode 100644 .claude/agents/pr-comment-fixer.md
 create mode 100644 .claude/commands/checkpoint.md
 create mode 100644 .claude/commands/compact-with-context.md
 create mode 100644 .claude/commands/dogfood.md
 create mode 100644 .claude/commands/explore.md
 create mode 100644 .claude/commands/plan.md
 create mode 100644 .claude/commands/slot-machine.md
 create mode 100644 .claude/hooks/prompt-expand.sh
 create mode 100644 .claude/hooks/session-end.sh
 create mode 100644 .claude/hooks/session-start.sh
 create mode 100644 .claude/hooks/statusline.sh
 create mode 100644 .claude/skills/anthropic-prompting.md
 create mode 100644 .claude/skills/explore-plan-code.md
 create mode 100644 .claude/skills/screenshot-driven-dev.md
 create mode 100644 .mcp.json
 create mode 100644 docs/anthropic-playbook.md
 create mode 100644 docs/claude-code-2026-reference.md
diff --git a/.claude/agents/claude-md-curator.md b/.claude/agents/claude-md-curator.md
new file mode 100644
index 0000000..48839c9
--- /dev/null
+++ b/.claude/agents/claude-md-curator.md
@@ -0,0 +1,71 @@
+---
+name: claude-md-curator
+description: "Use at end of a session — proposes CLAUDE.md updates based on what was learned. Triggers on: session end, retrospective, update CLAUDE.md, codify learnings."
+tools: Read, Glob, Grep, Bash, Edit
+model: sonnet
+effort: medium
+memory: project
+---
+
+# CLAUDE.md Curator — Continuous Improvement Loop
+
+> "We ask Claude to summarize completed work sessions and suggest improvements at the end of each task. This creates a continuous improvement loop where Claude Code helps refine the Claude.md documentation."
+> — Anthropic Data Infrastructure team
+
+The point: every session leaves behind 1-3 facts that, if codified in CLAUDE.md, save the next session from repeating the mistake.
+
+## Mandate
+
+Read the conversation context (or the session log if available), the current CLAUDE.md, and propose a **patch** to CLAUDE.md. The user approves before any write.
+
+## Procedure
+
+1. **Skim what happened.** Look for:
+   - Tool-call mistakes Claude made repeatedly (e.g. `cd` when it didn't need to)
+   - Constraints discovered the hard way (e.g. "this script needs sudo", "X requires Y env var first")
+   - Architectural decisions made or reaffirmed during the session
+   - File paths that turned out to be wrong/right
+   - Verification techniques that worked
+
+2. **Score each candidate fact.** Keep only those that:
+   - Would have saved time if Claude knew them at start of session
+   - Are durable (not specific to this one task)
+   - Are not already in CLAUDE.md (grep first)
+   - Are non-obvious from reading the code
+
+3. **Propose a patch.** Write a unified diff or a clearly-marked add/remove block:
+   ```markdown
+   ## Proposed CLAUDE.md changes
+
+   ### ADD to section "<existing section>" (or NEW section "<name>"):
+   <new lines>
+
+   ### Reasoning:
+   <one sentence per added line>
+
+   ### REMOVE (stale):
+   <lines to remove, and why they're stale>
+   ```
+
+4. **Wait for user approval** before applying.
+
+5. **If approved**, use `Edit` to apply, then print the final SHA of CLAUDE.md so it's easy to revert.
+
+## What NOT to do
+
+- ❌ Do NOT edit CLAUDE.md without approval. Always propose first.
+- ❌ Do NOT add task-specific details that won't apply to other sessions.
+- ❌ Do NOT bloat CLAUDE.md. If it goes over 500 lines, propose REMOVALS, not just additions. Anthropic's docs warn: long CLAUDE.md is advisory-only — Claude starts ignoring it.
+- ❌ Do NOT copy verbatim from this session's transcript. Generalize.
+
+## Heuristics
+
+- "Run pytest not pytest run, don't cd unnecessarily — just use the right path" — that exact line is in Anthropic's RL Engineering team tips. Tool-calling discipline is the most common useful add.
+- If you saved someone from re-discovering a constraint, codify it.
+- If you find yourself explaining the same architectural rule twice in one session, codify it.
+
+## Complementary
+
+- Triggered automatically by `Stop` hook (`.claude/hooks/session-end.sh`)
+- Pairs with `skill-engineer` for promoting recurring patterns into full skills
+- The `--diary` mode logs to `docs/learnings/<date>.md` for synthesis later
diff --git a/.claude/agents/front-end-tester.md b/.claude/agents/front-end-tester.md
new file mode 100644
index 0000000..6b6c811
--- /dev/null
+++ b/.claude/agents/front-end-tester.md
@@ -0,0 +1,58 @@
+---
+name: front-end-tester
+description: "Use when verifying UI changes — visual diffs, click-paths, regression of golden flows. Triggers on: UI test, visual diff, Playwright, screenshot, E2E, golden path."
+tools: Read, Glob, Grep, Bash, Write
+model: sonnet
+effort: medium
+memory: project
+---
+
+# Front-End Tester — Visual + Behavioral Verification
+
+You verify front-end work by **actually using it** in a browser, not by reading the diff. Type-checks and unit tests prove code correctness, not feature correctness.
+
+## Mandate
+
+- Spin up the dev server (`npm run dev`, `pnpm dev`, etc. — read package.json to find it)
+- Drive the changed feature with Playwright (or equivalent)
+- Capture screenshots before/after for diff
+- Test the golden path AND the named edge cases
+- Watch for regressions in adjacent features
+
+## Procedure
+
+1. **Read the PR diff or recent commits.** Identify exactly which routes/components changed.
+2. **Start the dev server in the background.** Wait for the listening port.
+3. **Drive the golden path.** Use `playwright codegen` semantics — explicit waits, semantic selectors (role/label/text), never CSS classes that will rot.
+4. **Capture artifacts.**
+   - Screenshots to `artifacts/visual/<route>-<timestamp>.png`
+   - Console errors → `artifacts/visual/<route>-console.log`
+   - Network failures → `artifacts/visual/<route>-network.log`
+5. **Test edge cases.** From the plan file (`docs/plans/<task>.md`) or the PR description.
+6. **Test adjacent features.** Pick 1-2 unrelated flows in the same area, run them, confirm no regression.
+7. **Report**:
+   ```
+   Feature: <name>
+   Golden path: ✅ / ❌ <details>
+   Edge cases (N): N pass, N fail
+   Adjacent regressions: <none / list>
+   Console errors: <none / count + samples>
+   Artifacts: artifacts/visual/...
+   ```
+
+## What NOT to do
+
+- ❌ Do NOT claim a UI change works because tests pass. Tests verify code, not feature behavior.
+- ❌ Do NOT skip the dev server because "the diff looks fine."
+- ❌ Do NOT rely on CSS-class selectors — they break the next time someone restyles.
+- ❌ Do NOT delete artifacts after the run. The user wants to see what you saw.
+
+## When the dev server won't start
+
+Report clearly: which command you tried, what error came back, what you'd need to fix. Do NOT try to fix it yourself unless the user asked — that's outside scope.
+
+## Complementary
+
+- `debug-engineer` — if the test reveals a bug, hand off the failure mode
+- `/verify` — uses this agent's output as evidence
+- `frontend-engineer` — fixes anything this agent finds
diff --git a/.claude/agents/opposing-perspective.md b/.claude/agents/opposing-perspective.md
new file mode 100644
index 0000000..3c47fa8
--- /dev/null
+++ b/.claude/agents/opposing-perspective.md
@@ -0,0 +1,78 @@
+---
+name: opposing-perspective
+description: "Use when a decision is contested or has real stakes — builds pro-X and pro-Y cases independently. Triggers on: should we, choose between, controversial, second opinion, opposing view."
+tools: Read, Glob, Grep, Bash, WebFetch, WebSearch, Agent
+model: opus
+effort: high
+memory: project
+---
+
+# Opposing Perspective — Two Independent Cases, No Synthesis
+
+Anthropic uses this pattern internally — they ran "Pro-Dan" vs auditor agents to debate expense validation. The technique works because **one model arguing both sides converges to mush**; two independent runs preserve the actual tradeoff.
+
+## Mandate
+
+For a contested decision, produce **two** standalone briefs:
+- **Brief A** — the strongest case for option A. Steelman it. Find evidence. Quote sources.
+- **Brief B** — the strongest case for option B. Same.
+
+Then a short **adjudication note** that names the deciding factors — but does NOT pick a winner. The human picks.
+
+## Procedure
+
+1. **Restate the question.** Read context from `$ARGUMENTS` and any plan files referenced.
+2. **Identify the two options.** If unclear, ask the user before proceeding. Refuse to invent options.
+3. **Brief A — independent pass.** As if you were paid to advocate for A. Use your tools. Cite specific files, docs, prior art. No hedging.
+4. **Brief B — independent pass.** Same energy. Pretend Brief A didn't exist.
+5. **Adjudication note.** List the 3-5 factors that distinguish the options. Note which factor each brief weighted highest. Surface the assumption each brief depends on.
+
+## Output
+
+```markdown
+# Opposing Perspective: <question>
+
+## Option A: <name>
+**Strongest case:**
+- <point 1, with evidence>
+- <point 2, with evidence>
+- <point 3, with evidence>
+
+**Depends on the assumption that:** <explicit assumption>
+
+## Option B: <name>
+**Strongest case:**
+- <point 1, with evidence>
+- <point 2, with evidence>
+- <point 3, with evidence>
+
+**Depends on the assumption that:** <explicit assumption>
+
+## Adjudication factors (human decides)
+1. <factor> — A weights it as X, B weights it as Y
+2. <factor> — ...
+3. <factor> — ...
+
+## Question for the human
+<one specific question that, once answered, makes the choice obvious>
+```
+
+## What NOT to do
+
+- ❌ Do NOT pick a winner. The whole point is to NOT synthesize.
+- ❌ Do NOT write Brief B by negating Brief A. Independent passes only.
+- ❌ Do NOT use generic pros/cons — use specific evidence from the repo or cited sources.
+- ❌ Do NOT include a "balanced view" section. There is no balanced view. The human balances.
+
+## When to use
+
+- Choosing a library or framework with long-term lock-in
+- Choosing an architecture (modular monolith vs micro, queue vs SSE)
+- "Refactor now vs later"
+- Hiring/firing process decisions
+- Anything where being wrong is expensive
+
+## Complementary
+
+- `/plan` — runs this agent inline when the plan has a genuine fork
+- `Plan` subagent — for the path once a decision is made
diff --git a/.claude/agents/pr-comment-fixer.md b/.claude/agents/pr-comment-fixer.md
new file mode 100644
index 0000000..1fea4cb
--- /dev/null
+++ b/.claude/agents/pr-comment-fixer.md
@@ -0,0 +1,66 @@
+---
+name: pr-comment-fixer
+description: "Use when iterating on PR review comments — fetches comments, applies fixes, pushes a follow-up commit. Triggers on: PR review, address comments, fix review feedback, GitHub PR."
+tools: Read, Glob, Grep, Edit, Write, Bash
+model: sonnet
+effort: medium
+memory: project
+---
+
+# PR Comment Fixer — Address Review Feedback Cleanly
+
+> "We leverage GitHub Actions integration to have Claude automatically address Pull Request comments like formatting issues or function renaming."
+> — Anthropic Claude Code product team
+
+This agent handles the boring follow-up loop: a reviewer commented, you need to address every comment, and you need to push a clean follow-up commit.
+
+## Mandate
+
+For a given PR number, fetch all unresolved review comments, apply targeted fixes, and produce ONE follow-up commit per logical group of comments — never one giant "address review" blob.
+
+## Procedure
+
+1. **Get PR comments.** `gh pr view <PR> --json reviews,comments` and `gh api repos/<owner>/<repo>/pulls/<PR>/comments`.
+2. **Classify each comment** into one of:
+   - **Mechanical** — formatting, rename, typo, import order → apply directly
+   - **Logic** — actual code change → apply, but flag for the user to verify
+   - **Discussion** — question, "why did you do it this way?" → respond in the thread, do NOT edit code
+   - **Punt** — out of scope for this PR → respond explaining why, link to a follow-up issue
+3. **Group mechanical fixes** by file. Apply with `Edit`. Run formatters/linters after.
+4. **Group logic fixes** by feature. Apply with `Edit`. Run tests after each group.
+5. **Commit each group separately**:
+   ```
+   review: address <reviewer> on <topic>
+   
+   - <fix 1>
+   - <fix 2>
+   
+   Refs: PR #<num> comments <comment-ids>
+   ```
+6. **Reply to each addressed comment** with `gh api ... /pulls/comments/<id>/replies` so the reviewer sees `✅ Addressed in <SHA>`.
+7. **Push** the commits to the PR branch.
+8. **Final report**:
+   ```
+   PR #<num> review pass complete.
+   - Comments addressed: N/M
+   - Punted (out of scope): K
+   - Discussion-only replies: J
+   - Commits pushed: <list>
+   ```
+
+## What NOT to do
+
+- ❌ Do NOT push to `main` directly. Always to the PR branch.
+- ❌ Do NOT mark a comment resolved before pushing the fix.
+- ❌ Do NOT squash review fixes into the original feature commit — keep them separate so the reviewer can see what changed.
+- ❌ Do NOT silently skip a comment. Either fix, reply, or punt with explanation.
+
+## When the fix is risky
+
+If a "mechanical" fix turns out to touch core business logic (you read the surrounding code and it looks load-bearing), STOP and surface to the user before pushing. Quote the comment, quote the affected code, ask for confirmation.
+
+## Complementary
+
+- `/checkpoint` — before starting if the PR branch has uncommitted work
+- `/verify` — before pushing if any logic comments were addressed
+- `frontend-engineer` / `backend-engineer` / `debug-engineer` — pulled in for non-trivial logic comments
diff --git a/.claude/commands/checkpoint.md b/.claude/commands/checkpoint.md
new file mode 100644
index 0000000..58b8916
--- /dev/null
+++ b/.claude/commands/checkpoint.md
@@ -0,0 +1,63 @@
+---
+description: "Commit current state as a recoverable checkpoint. Cheap, frequent, descriptive. Always uses a feature branch — never main."
+argument-hint: "[short message — what just got done]"
+allowed-tools: Bash(git status:*), Bash(git diff:*), Bash(git add:*), Bash(git commit:*), Bash(git branch:*), Bash(git log:*)
+model: haiku
+---
+
+# /checkpoint — Save State So You Can Experiment Freely
+
+Anthropic's RL Engineering team: "Use a checkpoint-heavy workflow. Regularly commit your work as Claude makes changes so you can easily roll back when experiments don't work out."
+
+Anthropic's Data Science team: "Treat it like a slot machine. Save your state before letting Claude work."
+
+This command makes that cheap.
+
+## Preconditions
+
+1. **Refuse if on `main` or `master`.** Print: `❌ Cannot checkpoint on main. Run \`git checkout -b feat/<name>\` first.`
+2. **Refuse if no changes.** Print: `Nothing to checkpoint — working tree is clean.`
+
+## Procedure
+
+1. `git status` — see what changed
+2. `git diff --stat` — file-by-file size
+3. Stage only the files explicitly relevant to `$ARGUMENTS` (NEVER `git add -A` — that catches secrets and binaries)
+4. Compose a checkpoint message:
+   ```
+   wip(<area>): $ARGUMENTS
+
+   Checkpoint at <ISO-8601 timestamp>.
+
+   Changed:
+   - <bullet per significant file>
+
+   Next: <one line — what comes after this checkpoint>
+
+   Co-Authored-By: Claude <noreply@anthropic.com>
+   ```
+5. `git commit -m "<HEREDOC>"` — never `--amend`, always a new commit
+6. Print:
+   ```
+   ✅ Checkpoint <short SHA> — <message subject>
+   To roll back: git reset --hard HEAD~1
+   ```
+
+## When to checkpoint
+
+- Before every `/slot-machine` run
+- Between any two unrelated changes (so each can be reverted independently)
+- Whenever you're about to ask Claude to do something with risk
+- Anytime Claude says "this should work" without proof
+
+## What NOT to checkpoint
+
+- Files matching `.env*`, `*.key`, `*.pem`, `*credentials*` — refuse with `❌ Refusing to checkpoint secret-shaped file: <path>`
+- Files >10 MB — refuse with `❌ Refusing to checkpoint large file <path>. Use git-lfs or .gitignore it.`
+- Build artifacts (`dist/`, `build/`, `target/`, `node_modules/`, `__pycache__/`) — refuse
+
+## Complementary
+
+- `/slot-machine` — uses checkpoints as the rollback boundary
+- `/verify` — verify the checkpoint passes the $100K gate
+- `/plan` — the plan you're checkpointing against
diff --git a/.claude/commands/compact-with-context.md b/.claude/commands/compact-with-context.md
new file mode 100644
index 0000000..4e90aba
--- /dev/null
+++ b/.claude/commands/compact-with-context.md
@@ -0,0 +1,46 @@
+---
+description: "Guided /compact that preserves key state. Use before context fills up — better than letting auto-compact decide."
+argument-hint: "[optional — focus areas to preserve]"
+allowed-tools: Read
+model: haiku
+---
+
+# /compact-with-context — Smarter Conversation Compression
+
+`/compact` is built into Claude Code. This wrapper guides it so the summary keeps what actually matters.
+
+## Procedure
+
+Tell Claude to run `/compact` with these instructions appended:
+
+```
+COMPACTION INSTRUCTIONS:
+- Preserve verbatim: all file paths edited so far, all test commands run,
+  all SQL queries executed, all bash commands that mutated state,
+  all decisions explicitly approved by the user.
+- Preserve summarized: exploration findings, hypotheses tested,
+  rejected approaches and why.
+- Drop: idle conversation, tool listing output, file listings the model
+  re-read multiple times, prose explanations of the same concept.
+- Focus areas the user named: $ARGUMENTS
+- Output as: structured markdown with these sections — Task, Files Touched,
+  Decisions, Open Items, Verification Status.
+```
+
+## When to use
+
+- ~70% context full
+- Before starting a new sub-task within the same session
+- After completing a major step (lets you keep going without /clear)
+
+## When NOT to use
+
+- Between unrelated tasks → use `/clear` instead (clean slate)
+- When you're done → just exit
+- When the conversation is already short — compaction has its own cost
+
+## Complementary
+
+- `/clear` — full reset, no summary
+- `/resume` — pick up a previously saved session
+- `/checkpoint` — git-side equivalent for code state
diff --git a/.claude/commands/dogfood.md b/.claude/commands/dogfood.md
new file mode 100644
index 0000000..0f8530b
--- /dev/null
+++ b/.claude/commands/dogfood.md
@@ -0,0 +1,75 @@
+---
+description: "Run the current task against the latest model + report model behavior changes. Used to dogfood new snapshots."
+argument-hint: "[task description or path to plan]"
+allowed-tools: Read, Glob, Grep, Bash(git log:*), Bash(git diff:*), Write
+model: opus
+---
+
+# /dogfood — Test the Latest Model on Real Work
+
+> "Claude Code automatically uses the latest research model snapshots, making it their primary way of experiencing model changes. This gives them direct feedback on model behavior changes during development cycles."
+> — Anthropic API Knowledge team
+
+This command captures structured feedback when you run a task on a new model snapshot — for sharing with the model team, your own future memory, or as a regression record.
+
+## Procedure
+
+1. **Record what model is running.** From the statusline or settings.json. Also capture the date.
+2. **Restate the task.** Read `$ARGUMENTS` (a task description or a path to a plan file).
+3. **Run the task** as you normally would — explore, plan, implement, verify.
+4. **At the end**, write a dogfood log to `docs/dogfood/<YYYY-MM-DD>-<short-slug>.md`:
+
+```markdown
+# Dogfood: <task title>
+
+> Date: <ISO date>
+> Model: <model id from settings.json>
+> Task source: <plan file path or inline description>
+> Duration: <approx human + Claude time>
+> Outcome: ✅ pass / ⚠️ partial / ❌ fail
+
+## What went well
+- <specific behavior — quote the model if useful>
+- <specific behavior>
+
+## What got worse vs prior model
+- <specific regression — quote the model + describe expected behavior>
+- <specific regression>
+
+## Interesting capabilities
+- <thing the model did that surprised you, positively or negatively>
+
+## Tool use patterns
+- Calls per turn: <approx>
+- Failed tool calls: <count and what happened>
+- Sub-agent fan-out: <yes/no, how many>
+
+## Self-correction
+- Times Claude caught its own mistake: <count>
+- Times user had to intervene: <count, briefly each>
+
+## Recommendation
+- Ship: <yes / no / depends on X>
+- Flag for model team: <one-line summary of any concern>
+```
+
+5. **Print the path** and a one-line summary so the user can paste into Slack / share.
+
+## When to dogfood
+
+- After every major model release
+- When evaluating a research snapshot
+- When something feels different but you can't pin it down
+- Before promoting a model to default in `settings.json`
+
+## When NOT to dogfood
+
+- During time-pressure work — dogfood adds 5-10 min of doc time
+- On a task you've never done before — no baseline to compare against
+- On security-sensitive paths — running a new model on prod credentials is its own risk
+
+## Complementary
+
+- `/morning-brief` — daily system health
+- `/verify` — quality gate for the work itself
+- The log helps future `/explore` calls find prior model behavior on similar tasks
diff --git a/.claude/commands/explore.md b/.claude/commands/explore.md
new file mode 100644
index 0000000..855243d
--- /dev/null
+++ b/.claude/commands/explore.md
@@ -0,0 +1,69 @@
+---
+description: "Read-only investigation of unfamiliar code — the E in Explore→Plan→Code. NO edits, NO writes, NO commits."
+argument-hint: "[area to explore — file path, feature name, or question]"
+allowed-tools: Read, Glob, Grep, Agent, Bash(git log:*), Bash(git diff:*), Bash(git show:*), Bash(rg:*), Bash(ls:*), Bash(find:*)
+model: sonnet
+---
+
+# /explore — Phase 1 of Explore→Plan→Code
+
+You are in **read-only investigation mode**. You may NOT write or edit any file. You may NOT run state-mutating commands. If the user asks you to fix something during /explore, refuse and remind them to run /plan next.
+
+This phase exists because aligning on a plan **2-3x success rates** on complex tasks (Anthropic engineering team, May 2026).
+
+## What to do
+
+1. **Restate the question.** One sentence. If ambiguous, ask the user before any tool use.
+
+2. **Survey the territory.** Use `Glob` and `Grep` to find every file relevant to `$ARGUMENTS`. Read the top 3-5 candidates fully. Use `Agent` (Explore subagent) for breadth — keep the main context lean.
+
+3. **Walk the call graph.** Pick a representative entry point. Trace down into the implementation. Note: function names, file paths, imports, side effects.
+
+4. **Check git history.** `git log -n 20 --oneline -- <file>` on critical files. `git show <hash>` on anything that smells like an incident or revert.
+
+5. **Read configuration & schemas.** Settings files, DB models, env vars, type definitions. These pin behavior in ways the code does not show.
+
+6. **List unknowns.** Things you'd need to know before writing a line of code: hidden constraints, undocumented invariants, contracts you can't verify, areas you didn't read.
+
+## Output
+
+Produce a structured report:
+
+```markdown
+# Exploration: <topic>
+
+## Question
+<one sentence restating the goal>
+
+## Surface area (files touched if we proceed)
+- path/to/file1.py — <what it does>
+- path/to/file2.py — <what it does>
+
+## Call graph (current behavior)
+<entry> → <function> → <function> → <terminal>
+
+## Constraints & invariants discovered
+- <constraint 1 and how it's enforced>
+- <constraint 2>
+
+## Unknowns / risk
+- <thing I'd need to verify before writing code>
+
+## Recommendation
+Either:
+  (a) Ready to plan — run `/plan <topic>`
+  (b) Need more info — specific questions for the user
+```
+
+## What NOT to do
+
+- ❌ Do NOT edit files. Even comments. Even formatting.
+- ❌ Do NOT run tests, migrations, or anything that mutates state.
+- ❌ Do NOT propose a solution. That's `/plan`.
+- ❌ Do NOT compact context to read more. Use Explore subagent for breadth.
+
+## Complementary
+
+- After `/explore` → `/plan <topic>` writes the implementation plan to `docs/plans/`
+- Then exit plan mode (shift-tab) to implement
+- `/checkpoint` between steps so you can roll back
diff --git a/.claude/commands/init.md b/.claude/commands/init.md
index 673441b..fa93251 100644
--- a/.claude/commands/init.md
+++ b/.claude/commands/init.md
@@ -271,5 +271,17 @@ Next steps:
   1. cd backend && pip install -e ".[dev]"
   2. cd portal && npm install
   3. arch-viewer --web   # Launch architecture dashboard
-  4. Start building!
+  4. Read CLAUDE.md "How We Use This Kit" — the Explore→Plan→Code workflow
+  5. /explore <your-first-task>   # Start the EPC loop
 ```
+
+## After /init
+
+For every new task in this project:
+1. `/explore <topic>` — read-only investigation
+2. `/plan <name>` — write `docs/plans/<name>.md` with verification criteria
+3. shift+tab to exit plan mode
+4. Implement with `/checkpoint` between steps
+5. `/verify` to confirm the $100K gate passes
+
+See `docs/anthropic-playbook.md` for why this workflow 2-3x's success rates.
diff --git a/.claude/commands/plan.md b/.claude/commands/plan.md
new file mode 100644
index 0000000..bd5ecf6
--- /dev/null
+++ b/.claude/commands/plan.md
@@ -0,0 +1,90 @@
+---
+description: "Write a step-by-step implementation plan to docs/plans/ — the P in Explore→Plan→Code. No code edits yet."
+argument-hint: "[plan name — kebab-case, e.g. 'add-stripe-webhook']"
+allowed-tools: Read, Glob, Grep, Write, Bash(git status:*), Bash(git log:*), Bash(date:*), Bash(mkdir:*)
+model: opus
+---
+
+# /plan — Phase 2 of Explore→Plan→Code
+
+Produce a written plan in `docs/plans/<args>.md` that any engineer (or Claude) could execute. The plan is the contract; once approved, `/slot-machine` or normal implementation rides on it.
+
+## Preconditions
+
+1. You must have run `/explore` (or have equivalent context). If not, refuse and tell the user to `/explore` first.
+2. Git working tree should be clean. If not, prompt the user to `/checkpoint` before planning.
+3. `defaultMode: plan` is set in settings.json — you are read-only here too.
+
+## Output format
+
+Write to `docs/plans/$ARGUMENTS.md` with this template:
+
+```markdown
+# Plan: <human title>
+
+> Date: <today>
+> Author: Claude (drafted) / <user> (approved)
+> Status: DRAFT
+
+## Goal
+<2-3 sentences. What user-visible outcome? Why now?>
+
+## Non-goals
+- <out of scope 1>
+- <out of scope 2>
+
+## Verification criteria (write these FIRST)
+- [ ] <test that proves it works>
+- [ ] <log line or DB row that proves it>
+- [ ] <screenshot or UI behavior that proves it>
+
+## Files to change
+| File | Change | Reason |
+|---|---|---|
+| path/to/a.py | edit fn `foo` to accept new arg | <why> |
+| path/to/b.py | new module | <why> |
+
+## Step-by-step
+1. <atomic, verifiable step>
+2. <atomic, verifiable step>
+3. <atomic, verifiable step>
+N. Run verification criteria
+
+## Rollback
+If we get to step N and it's broken: <how to revert cleanly>
+
+## Risks
+- <risk 1 + mitigation>
+- <risk 2 + mitigation>
+
+## Open questions for human
+- <thing you need confirmed before proceeding>
+```
+
+## Discipline
+
+- **Verification criteria FIRST.** Anthropic engineers cite this as the single highest-leverage move. Without verification criteria, you are the only feedback loop.
+- **One file per logical step.** If a step touches 5 files, it's actually 5 steps.
+- **No clever abstractions.** If the plan reads like a refactor, it should be split out into a separate plan.
+- **Name a rollback for every plan.** "Revert the commit" is acceptable for small plans; not for migrations.
+
+## After writing the plan
+
+Print to the user:
+```
+✅ Plan written: docs/plans/<args>.md
+
+Next:
+  1. Review the plan and the verification criteria
+  2. Approve, edit, or ask for changes
+  3. Shift-tab to exit plan mode and start implementing
+  4. /checkpoint between steps so you can roll back
+  5. /verify when done to run the $100K gate
+```
+
+## Complementary
+
+- `/explore` — must run before this
+- `/checkpoint` — between implementation steps
+- `/verify` — after implementation, before declaring done
+- `/slot-machine` — if you want autonomous execution against this plan
diff --git a/.claude/commands/slot-machine.md b/.claude/commands/slot-machine.md
new file mode 100644
index 0000000..07a0c69
--- /dev/null
+++ b/.claude/commands/slot-machine.md
@@ -0,0 +1,85 @@
+---
+description: "Autonomous 30-minute attempt at a problem. Checkpoint first, let Claude run, accept or rollback. NEVER on main."
+argument-hint: "[short problem description]"
+allowed-tools: Bash(git status:*), Bash(git diff:*), Bash(git log:*), Bash(git branch:*), Read, Write, Edit, Glob, Grep, Agent
+model: opus
+---
+
+# /slot-machine — Treat Claude Like a Slot Machine
+
+> "Treat it like a slot machine. Save your state before letting Claude work, let it run for 30 minutes, then either accept the result or start fresh rather than trying to wrestle with corrections. Starting over often has a higher success rate than trying to fix Claude's mistakes."
+> — Anthropic Data Science / ML Engineering team
+
+## Preconditions (all must pass)
+
+1. **Branch check.** Current branch is NOT `main` or `master`. If it is, refuse: `❌ Cannot slot-machine on main. \`git checkout -b feat/<name>\` first.`
+2. **Clean tree.** `git status --porcelain` is empty. If not, refuse: `❌ Working tree dirty. Run /checkpoint first.`
+3. **Plan exists.** A plan in `docs/plans/<arg>.md` exists OR the user explicitly waived this with `--no-plan`. If not, refuse: `❌ No plan found at docs/plans/<arg>.md. Run /plan <arg> first or pass --no-plan.`
+
+## Procedure
+
+### Setup (1 min)
+1. Record the starting SHA: `START_SHA=$(git rev-parse HEAD)`
+2. Print:
+   ```
+   🎰 Slot machine starting.
+   Branch: <branch>
+   Plan:   docs/plans/<arg>.md
+   Start:  <START_SHA>
+   Budget: 30 minutes of autonomous work
+   Rollback: git reset --hard <START_SHA>
+   ```
+
+### Run (up to 30 min)
+3. Load the plan. Execute it step by step.
+4. After each major step:
+   - Run the verification criteria from the plan
+   - `/checkpoint <step>` 
+   - Continue to next step
+5. If a step fails twice in a row: **stop**, don't try a third fix. Surface the failure to the user with the rollback command.
+
+### Wrap (5 min)
+6. Final verification: run `/verify` against the plan's verification criteria.
+7. Print one of:
+   ```
+   ✅ Slot machine win.
+   Commits since <START_SHA>:
+     <list>
+   Verification: PASS
+   Next: review and merge, or /verify again
+   ```
+   OR
+   ```
+   ⚠️ Slot machine partial.
+   Verification: <which criteria failed>
+   Recommendation: <human takes over> OR <git reset --hard START_SHA and retry with a tighter plan>
+   ```
+
+## What NOT to do
+
+- ❌ Do NOT try to fix Claude's mistakes more than twice in a row. Anthropic teams explicitly say: starting over has a higher success rate than wrestling with corrections.
+- ❌ Do NOT modify the plan mid-run. If the plan is wrong, stop and re-plan.
+- ❌ Do NOT skip the start SHA recording. The whole point is recoverability.
+- ❌ Do NOT slot-machine on main. EVER.
+
+## When to use
+
+✅ Good fit:
+- Peripheral features that don't touch core business logic
+- Refactors with strong test coverage (regression net)
+- "Boring" cross-language translation
+- Visualization apps and dashboards
+- Merge conflict resolution
+
+❌ Bad fit:
+- Schema migrations
+- Risk gates, position limits, money paths
+- Anything where rollback isn't safe (deployments, third-party API calls, sent emails)
+- Code touching multiple bounded contexts at once
+
+## Complementary
+
+- `/checkpoint` — used inside the loop
+- `/plan` — required precondition
+- `/verify` — required postcondition
+- `/explore` — if you don't have a plan yet
diff --git a/.claude/hooks/prompt-expand.sh b/.claude/hooks/prompt-expand.sh
new file mode 100644
index 0000000..b89d711
--- /dev/null
+++ b/.claude/hooks/prompt-expand.sh
@@ -0,0 +1,53 @@
+#!/usr/bin/env bash
+# ============================================================================
+# UserPromptSubmit hook — light prompt enrichment
+# ============================================================================
+# Triggers on every user prompt. Adds context when the prompt mentions
+# specific patterns:
+#   - "the plan" → inject path to latest docs/plans/*.md
+#   - "the bug" / "this error" → remind to ground in actual logs
+#   - "deploy" / "push to prod" → inject deployment safety reminder
+#
+# Keep this cheap. Long hooks slow every prompt.
+# ============================================================================
+
+set -euo pipefail
+
+# Graceful degradation: skip hook if jq isn't installed
+command -v jq >/dev/null 2>&1 || exit 0
+
+INPUT=$(cat)
+PROMPT=$(echo "$INPUT" | jq -r '.user_prompt // .prompt // ""' 2>/dev/null)
+PROJECT_DIR="${CLAUDE_PROJECT_DIR:-.}"
+
+ADDITIONS=""
+
+# Reference to "the plan" → surface latest plan file
+if echo "$PROMPT" | grep -qiE '(the plan|the plan file|docs/plans)'; then
+  LATEST_PLAN=$(find "$PROJECT_DIR/docs/plans" -maxdepth 2 -name "*.md" -printf '%T@ %p\n' 2>/dev/null | sort -nr | head -1 | cut -d' ' -f2-)
+  if [ -n "$LATEST_PLAN" ]; then
+    ADDITIONS+="Latest plan file: ${LATEST_PLAN}\n"
+  fi
+fi
+
+# References to deployment → safety reminder
+if echo "$PROMPT" | grep -qiE '(deploy|push to prod|production|railway up|kubectl apply)'; then
+  ADDITIONS+="⚠️ Deployment-adjacent prompt. Confirm: (1) on a feature branch? (2) /checkpoint first? (3) /verify passes? (4) rollback plan named?\n"
+fi
+
+# References to "the bug" / "this error" without context → ask for grounding
+if echo "$PROMPT" | grep -qiE '(the bug|this error|this crash|not working)' && ! echo "$PROMPT" | grep -qE '\.(log|txt|md|py|js|ts)|line [0-9]'; then
+  ADDITIONS+="ℹ️ Ground bug reports in actual evidence: traceback text, log lines with timestamps, exact reproduction. Vague bug reports waste context.\n"
+fi
+
+# Nothing to add → exit silently
+if [ -z "$ADDITIONS" ]; then
+  exit 0
+fi
+
+jq -n --arg ctx "$ADDITIONS" '{
+  "hookSpecificOutput": {
+    "hookEventName": "UserPromptSubmit",
+    "additionalContext": $ctx
+  }
+}'
diff --git a/.claude/hooks/session-end.sh b/.claude/hooks/session-end.sh
new file mode 100644
index 0000000..d66959b
--- /dev/null
+++ b/.claude/hooks/session-end.sh
@@ -0,0 +1,68 @@
+#!/usr/bin/env bash
+# ============================================================================
+# Stop hook — end-of-session learning capture
+# ============================================================================
+# When Claude finishes a turn (Stop event), surface a gentle nudge:
+#   "Anything worth adding to CLAUDE.md? Try /agent claude-md-curator"
+#
+# Only surfaces if:
+#   - session has made >=3 file edits (tracked via audit log line count)
+#   - the working tree is dirty (real work happened)
+#   - CLAUDE.md was NOT edited in this session (no double-prompt)
+#
+# Non-blocking — never returns a `decision: block` or `additionalContext`
+# that requires the agent to act. Just informational.
+# ============================================================================
+
+set -euo pipefail
+
+# Graceful degradation: skip hook if required tools aren't installed
+command -v jq >/dev/null 2>&1 || exit 0
+command -v git >/dev/null 2>&1 || exit 0
+
+PROJECT_DIR="${CLAUDE_PROJECT_DIR:-.}"
+cd "$PROJECT_DIR" 2>/dev/null || exit 0
+
+# Skip if not a git repo
+if ! git rev-parse --git-dir >/dev/null 2>&1; then
+  exit 0
+fi
+
+# Skip if clean tree (no real work)
+if [ -z "$(git status --porcelain 2>/dev/null | head -1)" ]; then
+  exit 0
+fi
+
+# Skip if CLAUDE.md was touched this session
+if git diff --name-only HEAD 2>/dev/null | grep -q "^CLAUDE.md$"; then
+  exit 0
+fi
+
+# Count edits this session (from audit log if available)
+AUDIT_LOG="$PROJECT_DIR/.claude/audit.log"
+EDIT_COUNT=0
+if [ -f "$AUDIT_LOG" ]; then
+  TODAY=$(date -u '+%Y-%m-%d')
+  EDIT_COUNT=$(grep "^\[${TODAY}" "$AUDIT_LOG" 2>/dev/null | grep -cE 'EDIT|WRITE' || echo 0)
+fi
+
+# Only nudge if substantial work
+if [ "$EDIT_COUNT" -lt 3 ]; then
+  exit 0
+fi
+
+NUDGE="Session wrapping up. ${EDIT_COUNT} edits this session, CLAUDE.md unchanged.
+
+If you discovered something worth codifying (a tool-call mistake to avoid,
+a hidden constraint, a verification pattern that worked), consider:
+  /agent claude-md-curator
+
+That subagent proposes a CLAUDE.md patch for your approval. Continuous
+improvement loop > re-discovering the same thing next session."
+
+jq -n --arg ctx "$NUDGE" '{
+  "hookSpecificOutput": {
+    "hookEventName": "Stop",
+    "additionalContext": $ctx
+  }
+}'
diff --git a/.claude/hooks/session-start.sh b/.claude/hooks/session-start.sh
new file mode 100644
index 0000000..4de8ce3
--- /dev/null
+++ b/.claude/hooks/session-start.sh
@@ -0,0 +1,60 @@
+#!/usr/bin/env bash
+# ============================================================================
+# SessionStart hook — load recent context as additionalContext
+# ============================================================================
+# Surfaces:
+#   - current branch + dirty/clean state
+#   - last 5 commits (subject only)
+#   - any open plan files in docs/plans/
+#   - any TODO comments touched in the last 24h
+#
+# Fires once per session. Cheap and non-blocking.
+# ============================================================================
+
+set -euo pipefail
+
+# Graceful degradation: skip hook if required tools aren't installed
+command -v jq >/dev/null 2>&1 || exit 0
+command -v git >/dev/null 2>&1 || exit 0
+
+PROJECT_DIR="${CLAUDE_PROJECT_DIR:-.}"
+cd "$PROJECT_DIR" 2>/dev/null || exit 0
+
+# Skip if not a git repo
+if ! git rev-parse --git-dir >/dev/null 2>&1; then
+  exit 0
+fi
+
+BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "(detached)")
+DIRTY=""
+if [ -n "$(git status --porcelain 2>/dev/null | head -1)" ]; then
+  DIRTY=" (dirty)"
+fi
+
+LAST_COMMITS=$(git log -5 --pretty=format:"  - %h %s" 2>/dev/null || echo "  (no commits)")
+
+# Open plan files
+PLANS=""
+if [ -d docs/plans ]; then
+  PLANS=$(find docs/plans -maxdepth 2 -name "*.md" -mtime -7 2>/dev/null | head -5 | sed 's/^/  - /')
+fi
+[ -z "$PLANS" ] && PLANS="  (no recent plans)"
+
+CONTEXT="Session context (auto-loaded):
+
+Branch: ${BRANCH}${DIRTY}
+Recent commits:
+${LAST_COMMITS}
+
+Recent plans (docs/plans/, last 7 days):
+${PLANS}
+
+Reminder: defaultMode is 'plan' — exit with shift+tab before implementing.
+For new work: /explore → /plan → /checkpoint → implement → /verify"
+
+jq -n --arg ctx "$CONTEXT" '{
+  "hookSpecificOutput": {
+    "hookEventName": "SessionStart",
+    "additionalContext": $ctx
+  }
+}'
diff --git a/.claude/hooks/statusline.sh b/.claude/hooks/statusline.sh
new file mode 100644
index 0000000..836f6d4
--- /dev/null
+++ b/.claude/hooks/statusline.sh
@@ -0,0 +1,39 @@
+#!/usr/bin/env bash
+# ============================================================================
+# Status line — branch | mode | model | dirty indicator | time
+# ============================================================================
+# Renders into Claude Code's status line. Should complete in <50ms.
+# ============================================================================
+
+set -euo pipefail
+
+INPUT=$(cat 2>/dev/null || echo '{}')
+
+# Pull model and mode from the hook input (Claude Code sends both)
+# Falls back gracefully if jq isn't installed
+if command -v jq >/dev/null 2>&1; then
+  MODEL=$(echo "$INPUT" | jq -r '.model.display_name // .model // "claude"' 2>/dev/null)
+  MODE=$(echo "$INPUT" | jq -r '.permission_mode // .mode // ""' 2>/dev/null)
+else
+  MODEL="claude"
+  MODE=""
+fi
+
+# Git info — fast
+BRANCH=""
+DIRTY=""
+if git rev-parse --git-dir >/dev/null 2>&1; then
+  BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "?")
+  if [ -n "$(git status --porcelain 2>/dev/null | head -1)" ]; then
+    DIRTY="*"
+  fi
+fi
+
+# Compose
+LINE=""
+[ -n "$BRANCH" ] && LINE+="⎇ ${BRANCH}${DIRTY}"
+[ -n "$MODE" ] && LINE+=" | ${MODE}"
+[ -n "$MODEL" ] && LINE+=" | ${MODEL}"
+LINE+=" | $(date +%H:%M)"
+
+printf '%s' "$LINE"
diff --git a/.claude/settings.json b/.claude/settings.json
index af0d845..b57b699 100644
--- a/.claude/settings.json
+++ b/.claude/settings.json
@@ -1,7 +1,13 @@
 {
+  "$schema": "https://json.schemastore.org/claude-code-settings.json",
+  "model": "claude-sonnet-4-6",
+  "defaultMode": "plan",
+  "outputStyle": "default",
+  "includeCoAuthoredBy": true,
   "permissions": {
     "allow": [
       "Bash(git *)",
+      "Bash(gh *)",
       "Bash(npm *)",
       "Bash(npx *)",
       "Bash(node *)",
@@ -10,22 +16,100 @@
       "Bash(pytest *)",
       "Bash(ruff *)",
       "Bash(pyright *)",
+      "Bash(mypy *)",
       "Bash(docker *)",
-      "Bash(gh *)",
       "Bash(railway *)",
       "Bash(alembic *)",
       "Bash(uvicorn *)",
+      "Bash(curl *)",
+      "Bash(jq *)",
+      "Bash(rg *)",
       "Read",
       "Write",
       "Edit",
       "Glob",
-      "Grep"
+      "Grep",
+      "WebFetch",
+      "WebSearch",
+      "Agent"
     ],
     "deny": [
       "Bash(rm -rf /)",
       "Bash(rm -rf ~)",
+      "Bash(rm -rf .)",
       "Bash(git push --force origin main)",
-      "Bash(git push --force origin master)"
+      "Bash(git push --force origin master)",
+      "Bash(gh repo edit * --visibility public)"
+    ]
+  },
+  "statusLine": {
+    "type": "command",
+    "command": "bash .claude/hooks/statusline.sh"
+  },
+  "hooks": {
+    "SessionStart": [
+      {
+        "matcher": "*",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "bash .claude/hooks/session-start.sh"
+          }
+        ]
+      }
+    ],
+    "UserPromptSubmit": [
+      {
+        "matcher": "*",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "bash .claude/hooks/prompt-expand.sh"
+          }
+        ]
+      }
+    ],
+    "PreToolUse": [
+      {
+        "matcher": "Bash",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "bash .claude/hooks/security-guard.sh"
+          }
+        ]
+      },
+      {
+        "matcher": "WebFetch|WebSearch",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "bash .claude/hooks/research-gate.sh"
+          }
+        ]
+      }
+    ],
+    "PostToolUse": [
+      {
+        "matcher": "Write|Edit",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "bash .claude/hooks/verification-gate.sh"
+          }
+        ]
+      }
+    ],
+    "Stop": [
+      {
+        "matcher": "*",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "bash .claude/hooks/session-end.sh"
+          }
+        ]
+      }
     ]
   },
   "mcpServers": {}
diff --git a/.claude/skills/anthropic-prompting.md b/.claude/skills/anthropic-prompting.md
new file mode 100644
index 0000000..e0943a5
--- /dev/null
+++ b/.claude/skills/anthropic-prompting.md
@@ -0,0 +1,111 @@
+---
+name: anthropic-prompting
+description: The five practices Anthropic's own engineers use when prompting Claude Code. Codified from internal interviews + public talks. Read first on any new task.
+tags: [prompting, workflow, claude-code, anthropic]
+---
+
+# Anthropic Prompting — The Five Practices
+
+Sources: Anthropic's PDF *"How Anthropic teams use Claude Code"* (May 2026), the YouTube talk *"How Anthropic Engineers ACTUALLY Prompt Claude Code"* (qOvc9IUKEIc), the Every podcast with Boris Cherny + Cat Wu. Cited consistently across all three.
+
+---
+
+## 1. Verification criteria FIRST
+
+> *"Set up Claude to verify its own work by running builds, tests, and lints automatically. This allows Claude to work longer autonomously and catch its own mistakes."* — Claude Code product team
+
+Before writing any code, write **what would prove the change is correct**: a test, a log line, a SQL row, a screenshot, an HTTP response. If you skip this, you become the only feedback loop.
+
+**Apply via:** `/plan` requires verification criteria in the template. `/verify` checks them at the end.
+
+---
+
+## 2. Explore → Plan → Code
+
+> *"Aligning on plans can like 2-3x success rates pretty easily."* — Boris Cherny
+
+Three distinct phases, three distinct mindsets. **Plan mode** is read-only. **Code mode** is write. Don't mix them.
+
+| Phase | Mode | Tools | Output |
+|---|---|---|---|
+| Explore | read-only (plan mode) | Read, Glob, Grep, Agent | findings + unknowns |
+| Plan | read-only (plan mode) | Read + Write to docs/plans/ | `docs/plans/<name>.md` |
+| Code | write (auto-accept) | full | edits + checkpoints |
+
+**Apply via:** `/explore`, `/plan`, then shift+tab out of plan mode to implement.
+
+---
+
+## 3. Checkpoint-heavy "slot machine" workflow
+
+> *"Treat it like a slot machine. Save your state before letting Claude work, let it run for 30 minutes, then either accept the result or start fresh rather than trying to wrestle with corrections."* — Data Science / ML team
+
+Frequent commits are cheaper than recovering from drift. Starting over often has a higher success rate than fixing mistakes.
+
+**Rules:**
+- Never start Claude on autonomous work from a dirty tree
+- `/checkpoint` between every meaningful step
+- If two consecutive fixes don't work, roll back, don't try a third
+- Never on `main`
+
+**Apply via:** `/checkpoint`, `/slot-machine`.
+
+---
+
+## 4. Detailed CLAUDE.md + end-of-session updates
+
+> *"The better you document your workflows, tools, and expectations in Claude.md files, the better Claude Code performs."* — Data Infrastructure team
+>
+> *"Customize your Claude.md file for specific patterns... 'run pytest not pytest run and don't cd unnecessarily — just use the right path.'"* — RL Engineering team
+
+CLAUDE.md is most useful for tool-calling discipline and hidden constraints — things Claude can't infer from reading code. Update it whenever a session teaches you something durable.
+
+**Rules:**
+- Keep CLAUDE.md under ~500 lines. Beyond that, Claude starts ignoring it (Anthropic's own docs note this).
+- Every session ends with a `claude-md-curator` pass — propose additions, removals, sharpening.
+- Tool-call discipline > prose explanations.
+
+**Apply via:** `claude-md-curator` subagent, triggered automatically by the Stop hook.
+
+---
+
+## 5. Custom slash commands + sub-agents for parallel/opposing perspectives
+
+> *"Security engineering uses 50% of all custom slash command implementations in the entire monorepo."* — Security Engineering team
+>
+> *"Pro-Dan vs auditor agents for expense validation."* — Cat Wu, on multi-agent patterns
+
+Slash commands capture team-specific workflows. Sub-agents preserve context isolation. Both compound: every command you write makes the next task faster.
+
+**Use slash commands when:** the workflow repeats, the tool list is specific, the context budget for that work is bounded.
+
+**Use sub-agents when:** the task needs many file reads (Explore), needs an independent perspective (opposing-perspective), or has its own tool/permission profile (debug-engineer = read-only).
+
+**Apply via:** `.claude/commands/` for repeatable workflows. `.claude/agents/` for parallel/scoped work. `/agent-engineer` to add new ones.
+
+---
+
+## Bonus: visual-first when designs are involved
+
+> *"Paste mockup images into Claude Code, generate fully functional prototypes that engineers can immediately understand."* — Product Design team
+
+Designers, legal, growth marketing — all teams that paste screenshots first, describe second. Visual grounding is faster than prose.
+
+**Apply via:** Cmd+V (Mac) / Ctrl+V (Windows/Linux) to paste images directly. See `skills/screenshot-driven-dev.md` for full workflow.
+
+---
+
+## Anti-patterns (consistently flagged across all sources)
+
+- ❌ One mega-prompt for a complex task → break into steps with checkpoints
+- ❌ Letting Claude "fix" its own mistake more than twice → roll back, re-plan
+- ❌ CLAUDE.md as a wishlist that grows monotonically → curate ruthlessly
+- ❌ Sub-agents copying main-context work → use them for *different* work in parallel
+- ❌ Skipping verification criteria → "looks fine" is not done
+
+## See also
+
+- `skills/explore-plan-code.md` — the workflow in skill form
+- `skills/screenshot-driven-dev.md` — visual-first for design/UI work
+- `commands/explore.md`, `commands/plan.md`, `commands/checkpoint.md`, `commands/slot-machine.md`
+- `docs/anthropic-playbook.md` — full distillation from Anthropic's PDF
diff --git a/.claude/skills/explore-plan-code.md b/.claude/skills/explore-plan-code.md
new file mode 100644
index 0000000..6441142
--- /dev/null
+++ b/.claude/skills/explore-plan-code.md
@@ -0,0 +1,80 @@
+---
+name: explore-plan-code
+description: The three-phase workflow Anthropic engineers use for any non-trivial change. 2-3x success rate per Boris Cherny. Read before starting work that touches multiple files.
+tags: [workflow, plan-mode, prompting, anthropic]
+---
+
+# Explore → Plan → Code (EPC)
+
+The single highest-leverage workflow change you can make. Boris Cherny (Claude Code lead): aligning on plans **2-3x success rates** pretty easily.
+
+## The three phases
+
+### Phase 1 — Explore (read-only)
+Goal: understand the territory. NO edits.
+
+- Survey files: `Glob` + `Grep`
+- Read 3-5 candidates fully
+- Trace call graph from a representative entry point
+- Check git history (`git log -p <file>`)
+- Read configs, schemas, env vars
+- List unknowns
+
+Output: a structured report. NOT a solution.
+
+**Command:** `/explore <topic>`
+
+### Phase 2 — Plan (still read-only)
+Goal: write a contract any engineer (or Claude) could execute.
+
+- Verification criteria FIRST (tests, logs, screenshots, DB rows)
+- Files to change (table with reason per file)
+- Step-by-step (atomic, verifiable)
+- Rollback path
+- Risks + mitigations
+- Open questions for the human
+
+Output: `docs/plans/<name>.md`. NOT code.
+
+**Command:** `/plan <name>`
+
+### Phase 3 — Code (write mode)
+Goal: execute the plan with checkpoints.
+
+- Exit plan mode: **shift+tab**
+- One step at a time
+- `/checkpoint` between steps
+- Run verification after each step
+- If two fixes in a row don't work, roll back — don't try a third
+
+Output: working code + commits, verified against criteria.
+
+**Command:** just implement, or `/slot-machine <plan-name>` for autonomous run.
+
+## When to skip a phase
+
+- **Skip Explore** when you already wrote the code yesterday and you're tweaking it. You have the context.
+- **Skip Plan** for one-file, one-function changes with obvious verification (typo fixes, formatting, single-test additions).
+- **Never skip Code's checkpoint discipline.** Even tiny changes go on a feature branch.
+
+## Anti-pattern: the merged phase
+
+The most common mistake is starting in Phase 3 ("just code it") for a Phase 1 task ("I don't fully understand the system yet"). Symptoms:
+- Edits that get reverted within the session
+- Tests that "pass" but don't actually test the right thing
+- Discovery of a constraint *after* implementing the wrong thing
+
+If you notice this happening: stop, roll back, run `/explore` from scratch.
+
+## Why it works (the Anthropic engineering view)
+
+- **Plan mode is structurally read-only** — eliminates the temptation to "just fix this one thing" mid-exploration
+- **The plan file is artifact** — surfaces tradeoffs for the human; future Claude can re-read it
+- **Verification criteria as contract** — the work is done when the criteria pass; not "looks right"
+- **Checkpoint between steps** — every step has its own rollback boundary
+
+## See also
+
+- `commands/explore.md`, `commands/plan.md`, `commands/checkpoint.md`, `commands/slot-machine.md`
+- `skills/anthropic-prompting.md` — the five practices, of which EPC is #2
+- `docs/anthropic-playbook.md` — full quotes and sources
diff --git a/.claude/skills/screenshot-driven-dev.md b/.claude/skills/screenshot-driven-dev.md
new file mode 100644
index 0000000..0cfdd8b
--- /dev/null
+++ b/.claude/skills/screenshot-driven-dev.md
@@ -0,0 +1,83 @@
+---
+name: screenshot-driven-dev
+description: Visual-first workflow for UI, design, and non-technical tasks. Paste screenshots, iterate visually, skip prose. Used heavily by Anthropic's Design + Legal + Growth Marketing teams.
+tags: [ui, design, visual, screenshots, prototyping, multimodal]
+---
+
+# Screenshot-Driven Development
+
+Designers, lawyers, marketers, and engineers all hit the same problem: describing a UI in text is slow and lossy. Pasting a screenshot is fast and lossless.
+
+> *"By pasting mockup images into Claude Code, they generate fully functional prototypes that engineers can immediately understand."* — Anthropic Product Design team
+>
+> *"They frequently use screenshots to show Claude Code what they want interfaces to look like, then iterate based on visual feedback rather than describing features in text."* — Anthropic Legal team
+
+## When to use
+
+✅ Great fit:
+- "Build me a UI that looks like this" (paste mockup)
+- "This page is broken, here's what it looks like" (paste broken state)
+- "The Figma says X, the code shows Y, which is right?" (paste both)
+- "Walk me through this dashboard menu by menu" (paste screenshot, ask for guidance)
+- Onboarding flows, design reviews, legal copy verification
+
+❌ Bad fit:
+- Anything you can express precisely in text (data structures, function signatures)
+- Internal logic — pasting a stack trace is more useful than a screenshot of one
+- Long lists of features — text is denser
+
+## How to paste
+
+| Platform | Action |
+|---|---|
+| Claude Code Terminal (Mac) | `Cmd+V` while terminal has focus |
+| Claude Code Terminal (Win/Linux) | `Ctrl+V` while terminal has focus |
+| VS Code extension | Drag image into chat, or `Cmd/Ctrl+V` |
+| Web | drag-drop or paste |
+| Mobile | image picker → Camera roll |
+
+## Workflow
+
+1. **Frame the question with the image.**
+   - Good: *"Build a settings page that looks like this. The toggle should drive `user_settings.notifications_enabled`. Match the spacing exactly."*
+   - Bad: *"Here's a screenshot, do something with it."*
+
+2. **Iterate visually.**
+   - Implement
+   - Run dev server
+   - Take a screenshot of your version
+   - Paste it back: *"Here's what I got. The spacing on the right column is off vs the mockup. Fix that."*
+
+3. **Use the `front-end-tester` agent** to verify behavioral correctness once the visual is right.
+
+## Multi-image patterns
+
+**Mockup + current state.** Paste both, ask for the delta. Most powerful when refactoring.
+
+**Light mode + dark mode.** Paste both, ask whether color tokens are consistent.
+
+**Error state + happy state.** Paste both, ask whether edge cases are handled.
+
+**Before + after.** Paste both, ask for a one-paragraph release-note summary.
+
+## Anti-patterns
+
+- ❌ Pasting a screenshot of code — paste the code as text, it's denser
+- ❌ Pasting a screenshot of an error message — paste the text, it's searchable
+- ❌ Pasting low-resolution screenshots — Claude reads them, low-res = low signal
+- ❌ Pasting a screenshot without saying what you want done with it
+
+## For non-technical users
+
+Anthropic's Legal team built a working accessibility app in one hour by pasting screenshots of what they wanted, iterating visually, and asking Claude to "slow down and work step-by-step." If you're new to coding, that's the playbook:
+
+1. Paste mockup
+2. "Build this. Slow. One small step at a time."
+3. Run it. Screenshot the result. Paste it.
+4. "Now make X look like Y." Repeat.
+
+## See also
+
+- `agents/front-end-tester.md` — verifies behavior once visuals are right
+- `skills/anthropic-prompting.md` — practice #5 is the visual-first variant
+- `docs/anthropic-playbook.md` — Design + Legal team case studies
diff --git a/.mcp.json b/.mcp.json
new file mode 100644
index 0000000..e10129e
--- /dev/null
+++ b/.mcp.json
@@ -0,0 +1,43 @@
+{
+  "$schema": "https://json.schemastore.org/mcp.json",
+  "_comment_": "Project-scoped MCP servers for DevKit projects. Uncomment what you need and run `claude mcp list` to verify. Anthropic teams use Puppeteer/Playwright + Sentry + GitHub most often; data teams prefer MCP over CLI for sensitive resources (see docs/anthropic-playbook.md).",
+  "mcpServers": {
+    "_playwright_": {
+      "_doc_": "Browser automation. Common in frontend + design teams for visual diffs and golden-path testing.",
+      "command": "npx",
+      "args": ["-y", "@modelcontextprotocol/server-playwright"],
+      "_enabled_": false
+    },
+    "_github_": {
+      "_doc_": "Read issues, PRs, comments. Pair with the pr-comment-fixer subagent.",
+      "command": "npx",
+      "args": ["-y", "@modelcontextprotocol/server-github"],
+      "env": {
+        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
+      },
+      "_enabled_": false
+    },
+    "_sentry_": {
+      "_doc_": "Recent errors, issue trends. Useful for /morning-brief and debug-engineer.",
+      "command": "npx",
+      "args": ["-y", "@modelcontextprotocol/server-sentry"],
+      "env": {
+        "SENTRY_AUTH_TOKEN": "${SENTRY_TOKEN}",
+        "SENTRY_ORG": "${SENTRY_ORG}"
+      },
+      "_enabled_": false
+    },
+    "_postgres_": {
+      "_doc_": "Read-only Postgres MCP. Anthropic Data Infrastructure team recommends MCP > CLI for sensitive data — gives you logging + scoped access.",
+      "command": "npx",
+      "args": ["-y", "@modelcontextprotocol/server-postgres", "${DATABASE_URL_READONLY}"],
+      "_enabled_": false
+    },
+    "_filesystem_": {
+      "_doc_": "Sandboxed filesystem access. Use when Claude needs to read outside the project root (e.g. design docs in ~/Documents).",
+      "command": "npx",
+      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/dir"],
+      "_enabled_": false
+    }
+  }
+}
diff --git a/CLAUDE.md b/CLAUDE.md
index d8d933e..a50c932 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -53,13 +53,19 @@ devkit/
 ├── CLAUDE.md                ← YOU ARE HERE — read /init instructions above
 ├── devkit.config.yaml       ← Generated by /init (project-specific config)
 ├── .claude/
-│   ├── commands/
+│   ├── commands/                ← 14 slash commands
 │   │   ├── init.md              /init — THE ENTRY POINT (run this first)
+│   │   ├── explore.md           /explore — read-only investigation (E in EPC)
+│   │   ├── plan.md              /plan — writes docs/plans/<name>.md (P in EPC)
+│   │   ├── checkpoint.md        /checkpoint — cheap WIP commit on feature branch
+│   │   ├── slot-machine.md      /slot-machine — autonomous 30-min run + rollback
+│   │   ├── dogfood.md           /dogfood — log model behavior on new snapshots
+│   │   ├── compact-with-context.md  /compact-with-context — guided /compact
+│   │   ├── verify.md            /verify ($100K gate verification)
 │   │   ├── agent-engineer.md    /agent-engineer
 │   │   ├── security-audit.md    /security-audit
 │   │   ├── skill-engineer.md    /skill-engineer
 │   │   ├── test-engineer.md     /test-engineer
-│   │   ├── verify.md            /verify ($100K gate verification)
 │   │   ├── deploy-live.md       /deploy-live
 │   │   └── morning-brief.md     /morning-brief
 │   ├── skills/              ← 35+ skills (activated per devkit.config.yaml)
@@ -82,14 +88,19 @@ devkit/
 │   │   ├── browser-bridge-scraping.md
 │   │   ├── neural-scheduler-agents.md
 │   │   └── knowledge-graph-neo4j.md
-│   ├── hooks/               ← Security + verification hooks
-│   │   ├── security-guard.sh    Block destructive commands
-│   │   ├── verification-gate.sh Enforce verification before completion
-│   │   ├── research-gate.sh     Gate research actions
+│   ├── hooks/               ← Event hooks (wired in settings.json)
+│   │   ├── session-start.sh     SessionStart — load branch + commits + plans
+│   │   ├── prompt-expand.sh     UserPromptSubmit — light enrichment
+│   │   ├── security-guard.sh    PreToolUse(Bash) — block destructive commands
+│   │   ├── research-gate.sh     PreToolUse(Web*) — gate research actions
+│   │   ├── verification-gate.sh PostToolUse(Write|Edit) — $100K critical-path guard
+│   │   ├── session-end.sh       Stop — nudge claude-md-curator after work
+│   │   ├── statusline.sh        Renders the custom status line
 │   │   └── git_smart_push.py    Smart git push with safety
 │   ├── rules/               ← Architectural constraints (11 rules)
-│   ├── agents/              ← Named agent definitions (4 agents)
-│   └── settings.json        ← MCP permissions and tool allowlists
+│   ├── agents/              ← 8 subagents (debug/data/frontend/infra + 4 new)
+│   └── settings.json        ← Model, defaultMode, hooks, permissions, statusLine
+├── .mcp.json                ← Project-scoped MCP servers (opt-in)
 ├── .github/workflows/       ← CI/CD templates (configured by /init)
 ├── packages/
 │   └── arch-viewer/         ← git submodule → github.com/axumquant/arch-viewer
@@ -108,14 +119,59 @@ devkit/
 └── AGENTS.md                ← Workspace guide with token protocol
 ```
 
+## How We Use This Kit (Anthropic-aligned workflow)
+
+DevKit encodes the practices Anthropic's own teams use to drive Claude Code.
+The full distillation is in [`docs/anthropic-playbook.md`](docs/anthropic-playbook.md).
+The TL;DR is the **Explore → Plan → Code** loop with checkpoints:
+
+```
+/explore <topic>          # read-only investigation (plan mode)
+   ↓
+/plan <name>              # writes docs/plans/<name>.md with verification criteria
+   ↓
+shift+tab                 # exit plan mode
+   ↓
+implement (one step at a time)
+   ↓
+/checkpoint <progress>    # cheap commit on a feature branch
+   ↓ (between every meaningful step)
+/verify                   # $100K gate — does the work meet the plan's criteria?
+```
+
+For autonomous runs: `/slot-machine <plan-name>` runs the plan with 30-min budget + auto-rollback.
+
+| Command | When |
+|---|---|
+| `/explore` | Unfamiliar code or new task |
+| `/plan` | Before any change touching > 1 file |
+| `/checkpoint` | Between every step on a feature branch |
+| `/slot-machine` | Autonomous run against a saved plan |
+| `/verify` | Before declaring done |
+| `/dogfood` | Trying a new model snapshot |
+| `/compact-with-context` | At ~70% context |
+
+| Subagent | When |
+|---|---|
+| `Explore` (built-in) | Breadth searches that would bloat main context |
+| `opposing-perspective` | Contested decisions — produces 2 independent briefs |
+| `front-end-tester` | UI changes — visual diff + golden-path |
+| `claude-md-curator` | End of session — propose CLAUDE.md updates |
+| `pr-comment-fixer` | Iterating on PR review feedback |
+
+The **Stop hook** nudges you toward `claude-md-curator` after substantive sessions.
+The **SessionStart hook** injects branch + recent commits + open plans automatically.
+
 ## Key Rules (always enforced)
 
 - **Run `/init` first** — this is non-negotiable for new projects
+- **`defaultMode` is `plan`** — every session opens read-only. Shift+tab to implement.
 - All repos are **PRIVATE** by default (rules/never-public-repos.md)
 - Health checks are **mandatory** on all services (rules/health-check-mandate.md)
 - Security hooks block destructive commands with audit logging
 - Every project gets the **architecture viewer** installed — it is NOT optional
 - Secrets go in Vaultwarden or env vars, **never** in committed files
+- **Never slot-machine on `main`** — feature branch + checkpoint discipline always
 
 ## Template Versions
 
diff --git a/docs/anthropic-playbook.md b/docs/anthropic-playbook.md
new file mode 100644
index 0000000..a625150
--- /dev/null
+++ b/docs/anthropic-playbook.md
@@ -0,0 +1,174 @@
+# Anthropic Playbook — How Their Own Teams Use Claude Code
+
+Distilled from:
+- **Anthropic PDF** *"How Anthropic teams use Claude Code"* (May 2026) — 10 internal teams' workflows
+- **YouTube** *"How Anthropic Engineers ACTUALLY Prompt Claude Code"* — [qOvc9IUKEIc](https://www.youtube.com/watch?v=qOvc9IUKEIc)
+- **Every podcast** *"How to Use Claude Code Like the People Who Built It"* — Boris Cherny + Cat Wu — [transcript](https://every.to/podcast/transcript-how-to-use-claude-code-like-the-people-who-built-it)
+- **Official docs** — [code.claude.com/docs](https://code.claude.com/docs/en/overview)
+
+This doc is intentionally **quotation-heavy**. If you want a clean opinion piece, read `skills/anthropic-prompting.md`. If you want to see what each team actually said, read on.
+
+---
+
+## The five practices everyone shares
+
+1. **Detailed CLAUDE.md + end-of-session updates** — *"The better you document your workflows, tools, and expectations in Claude.md files, the better Claude Code performs."* (Data Infra)
+2. **Explore → Plan → Code** — *"Aligning on plans can like 2-3x success rates pretty easily."* (Boris Cherny)
+3. **Checkpoint-heavy "slot machine" workflow** — *"Treat it like a slot machine. Save your state before letting Claude work."* (Data Science)
+4. **Verification criteria up front, self-sufficient loops** — *"Set up Claude to verify its own work by running builds, tests, and lints automatically."* (Claude Code team)
+5. **Custom slash commands + sub-agents** — *"Security engineering uses 50% of all custom slash command implementations in the entire monorepo."* (Security)
+
+---
+
+## Team-by-team workflow notes
+
+### Data Infrastructure
+Use cases:
+- Kubernetes debugging by pasting Google Cloud screenshots
+- Plain-text workflow files for the finance team (no coding required)
+- Codebase navigation for new hires
+- End-of-session documentation updates
+- Multiple Claude instances in different repos, each with full context
+
+Top tips:
+- Write detailed CLAUDE.md files
+- Use MCP servers (not CLI) for sensitive data — better security control + logging
+- Hold internal sessions where members demo their workflows to each other
+
+### Product Development (Claude Code team itself)
+Use cases:
+- Fast prototyping with auto-accept mode (shift+tab) + autonomous loops
+- Synchronous coding for core business logic
+- Building Vim mode — 70% of code was Claude's autonomous work
+- Test generation, bug fixes, GitHub Actions PR-comment automation
+- Codebase exploration over Slack-asking
+
+Top tips:
+- Create self-sufficient verification loops (build, test, lint after every change)
+- Develop **task classification intuition** — async-suitable vs sync-supervision-required
+- Form clear, detailed prompts. Better prompts → more trust → longer autonomous runs
+
+### Security Engineering
+Use cases:
+- Complex infrastructure debugging (stack traces + control flow tracing)
+- Terraform plan review — *"what's this going to do? Am I going to regret this?"*
+- Documentation synthesis → markdown runbooks
+- Test-driven development workflow (replacing "design doc → janky code → give up on tests")
+- Project onboarding via markdown specs stored in the codebase
+
+Top tips:
+- Use custom slash commands extensively (50% of all in the monorepo)
+- Let Claude talk first — *"commit your work as you go"*, autonomous + periodic check-ins
+- Use for documentation synthesis, not just code
+
+### Inference
+Use cases:
+- Codebase comprehension and onboarding
+- Unit test generation with edge case coverage
+- ML concept explanation (80% research time reduction)
+- Cross-language code translation (e.g. write Rust without learning Rust)
+- Kubernetes command recall
+
+Top tips:
+- Test knowledge-base functionality first vs Google
+- Start with code generation, verify correctness, build trust
+- Use it for test writing — relieves significant pressure
+
+### Data Science / ML Engineering
+Use cases:
+- Building JavaScript/TypeScript dashboard apps (5,000-line apps without knowing JS/TS)
+- Repetitive refactoring as a "slot machine" — commit, 30-min autonomous, accept or restart
+- Persistent React analytics dashboards vs throwaway Jupyter notebooks
+- Zero-dependency task delegation in unfamiliar codebases
+
+Top tips:
+- **Treat it like a slot machine.** Save state, run 30 min, accept or restart. Restarting often beats wrestling with corrections.
+- Interrupt for simplicity when needed: *"Why are you doing this? Try something simpler."* Model defaults to complex.
+
+### API Knowledge
+Use cases:
+- "First-stop" workflow planning — *"which files do I examine for this?"*
+- Independent debugging in unfamiliar codebases
+- Model iteration testing through dogfooding (latest research snapshots)
+- Eliminating Claude.ai → Claude Code context-switching overhead
+
+Top tips:
+- Iterative partner, not one-shot solution
+- Use it to build confidence in unfamiliar areas
+- Start with minimal information, let Claude guide
+
+### Growth Marketing (non-technical team of one)
+Use cases:
+- Agentic Google Ads creative generation (CSV in, hundreds of variations out)
+- Figma plugin for mass creative production (100 ad variations in 0.5 sec per batch)
+- Meta Ads MCP server for campaign analytics in Claude Desktop
+- Prompt-engineering with memory systems (logs hypotheses + results)
+
+Top tips:
+- Identify API-enabled repetitive tasks → automation candidates
+- **Break complex workflows into specialized sub-agents** (headline agent + description agent separately)
+- Brainstorm and plan in Claude.ai first, then move to Claude Code for execution
+
+### Product Design
+Use cases:
+- Frontend polish + **state management changes** ("things you typically wouldn't see a designer making")
+- GitHub Actions ticketing → Claude proposes code solutions automatically
+- Rapid interactive prototyping from pasted mockups
+- Edge case discovery during design (not after)
+- Complex copy changes coordinated with legal in real-time (week → 2× 30-min calls)
+
+Top tips:
+- Get proper setup help from engineers — initial onboarding is the hard part
+- Custom memory files tell Claude you're a designer who needs detailed explanations
+- Use Cmd+V to paste screenshots liberally
+
+### RL Engineering
+Use cases:
+- Feature development with supervised autonomy
+- Test generation + code review on changes you wrote
+- Debugging with mixed results
+- Quick call-stack summaries (replacing manual code reading)
+- Kubernetes operations guidance
+
+Top tips:
+- Customize CLAUDE.md for tool-call patterns ("run pytest not pytest run; don't cd unnecessarily")
+- **Checkpoint-heavy workflow** — commit constantly so experiments are reversible
+- One-shot first (~1/3 success rate), then collaborate if needed
+
+### Legal
+Use cases:
+- Custom accessibility tools (predictive-text app for family in 1 hour)
+- Legal department "phone tree" prototype
+- G Suite weekly-update automation
+- Rapid prototyping for solution validation with domain experts
+
+Top tips:
+- **Plan extensively in Claude.ai first**, then move to Claude Code with a summary prompt
+- Work incrementally and visually — use screenshots liberally
+- Share imperfect prototypes — they spark cross-department innovation
+
+---
+
+## How DevKit encodes these practices
+
+| Practice | Where it lives in DevKit |
+|---|---|
+| Detailed CLAUDE.md + updates | `CLAUDE.md` + `agents/claude-md-curator.md` + Stop hook |
+| Explore → Plan → Code | `commands/explore.md`, `commands/plan.md`, `skills/explore-plan-code.md` |
+| Slot-machine workflow | `commands/slot-machine.md`, `commands/checkpoint.md` |
+| Verification criteria first | `commands/verify.md`, plan template requires criteria |
+| Custom slash commands | 14 commands in `.claude/commands/` |
+| Sub-agents for opposing perspectives | `agents/opposing-perspective.md` |
+| Visual-first workflow | `skills/screenshot-driven-dev.md` |
+| MCP for sensitive data | `.mcp.json` template with Postgres/GitHub/Sentry |
+| Test discipline | `agents/front-end-tester.md`, `commands/test-engineer.md` |
+| PR-comment automation | `agents/pr-comment-fixer.md` |
+| Dogfooding new model snapshots | `commands/dogfood.md` |
+
+## Sources
+
+- [How Anthropic teams use Claude Code (PDF, May 2026)](https://www-cdn.anthropic.com/58284b19e702b49db9302d5b6f135ad8871e7658.pdf)
+- [How Anthropic Engineers ACTUALLY Prompt Claude Code (YouTube)](https://www.youtube.com/watch?v=qOvc9IUKEIc)
+- [How to Use Claude Code Like the People Who Built It (Every transcript)](https://every.to/podcast/transcript-how-to-use-claude-code-like-the-people-who-built-it)
+- [Claude Code official docs](https://code.claude.com/docs/en/overview)
+- [Best practices for agentic coding](https://www.anthropic.com/engineering/claude-code-best-practices)
diff --git a/docs/claude-code-2026-reference.md b/docs/claude-code-2026-reference.md
new file mode 100644
index 0000000..e03c27c
--- /dev/null
+++ b/docs/claude-code-2026-reference.md
@@ -0,0 +1,263 @@
+# Claude Code — 2026 Feature Reference (DevKit edition)
+
+Quick reference for the Claude Code features DevKit configures and uses. Pulled from the official docs at [code.claude.com](https://code.claude.com/docs/en/overview) as of May 2026.
+
+---
+
+## CLAUDE.md / Project Memory
+
+**What:** Persistent instructions Claude reads at start of every session.
+
+**Locations** (loaded in order, system → project):
+- `~/.claude/CLAUDE.md` — user-wide
+- `./CLAUDE.md` — project, in git
+- `./CLAUDE.local.md` — personal, in `.gitignore`
+- Parent `CLAUDE.md` files through repo root (monorepo support)
+
+**Best practice:** Keep under ~500 lines. Use for tool-call discipline + hidden constraints. Update via `claude-md-curator` subagent.
+
+**Docs:** [code.claude.com/docs/en/memory](https://code.claude.com/docs/en/memory)
+
+---
+
+## settings.json
+
+**Location:** `.claude/settings.json` (project) or `~/.claude/settings.json` (user).
+
+**Schema essentials** (what DevKit's settings.json sets):
+
+| Key | DevKit value | Meaning |
+|---|---|---|
+| `model` | `claude-sonnet-4-6` | Default model for this project |
+| `defaultMode` | `plan` | Start sessions in plan mode |
+| `outputStyle` | `default` | Stock output styling |
+| `includeCoAuthoredBy` | `true` | Adds Co-Authored-By to commits |
+| `permissions.allow` | curated list | Pre-approved bash + tool patterns |
+| `permissions.deny` | safety list | Blocked patterns (rm -rf /, force push to main) |
+| `statusLine.command` | `.claude/hooks/statusline.sh` | Custom status line |
+| `hooks` | event → command map | Wires the 8 hook scripts |
+| `mcpServers` | empty by default | Filled from `.mcp.json` |
+
+**Docs:** [code.claude.com/docs/en/settings](https://code.claude.com/docs/en/settings)
+
+---
+
+## Plan Mode
+
+**What:** Permission mode where Claude can read + plan but NOT execute writes/bash that mutates state.
+
+**Invoke:**
+- `defaultMode: "plan"` in settings.json (DevKit default)
+- `shift+tab` to toggle in-session
+
+**DevKit usage:**
+- `/explore` — read-only investigation
+- `/plan` — write the plan to `docs/plans/`
+- Exit plan mode (shift+tab) → implement
+- `/checkpoint` between implementation steps
+- `/verify` when done
+
+**Docs:** [code.claude.com/docs/en/permission-modes](https://code.claude.com/docs/en/permission-modes)
+
+---
+
+## Hooks
+
+**Events DevKit wires:**
+
+| Event | DevKit script | Purpose |
+|---|---|---|
+| `SessionStart` | `session-start.sh` | Inject git branch + recent commits + recent plans |
+| `UserPromptSubmit` | `prompt-expand.sh` | Light enrichment: latest plan path, deploy warnings |
+| `PreToolUse` (Bash) | `security-guard.sh` | Block `rm -rf /`, force-push to main, visibility-public, DROP TABLE |
+| `PreToolUse` (Web*) | `research-gate.sh` | Gate research actions |
+| `PostToolUse` (Write/Edit) | `verification-gate.sh` | Inject $100K guard context for critical paths |
+| `Stop` | `session-end.sh` | Nudge `claude-md-curator` after substantive sessions |
+
+**Hook contract:** read JSON from stdin, write JSON to stdout. Exit code 2 = block. Exit 0 + `{hookSpecificOutput: {additionalContext: "..."}}` = inject context.
+
+**Docs:** [code.claude.com/docs/en/hooks](https://code.claude.com/docs/en/hooks)
+
+---
+
+## Slash Commands
+
+**Location:** `.claude/commands/<name>.md`. Invoke as `/<name>`.
+
+**Frontmatter DevKit uses:**
+```yaml
+---
+description: "One-liner — Claude uses this to decide when to invoke"
+argument-hint: "[what to type after the command]"
+allowed-tools: Read, Glob, Grep, Bash(git:*)  # whitelist for this command
+model: sonnet                                  # override default model
+---
+```
+
+**`$ARGUMENTS`** in the body is substituted with whatever the user typed after the command name.
+
+**DevKit commands (14 total):**
+
+Universal:
+- `/init` — project initialization wizard
+- `/explore`, `/plan`, `/checkpoint`, `/slot-machine`, `/dogfood`, `/compact-with-context`
+- `/verify` — $100K gate
+- `/morning-brief` — system health
+- `/security-audit`
+
+Domain:
+- `/test-engineer`, `/skill-engineer`, `/agent-engineer`
+- `/deploy-live`
+
+**Docs:** [code.claude.com/docs/en/slash-commands](https://code.claude.com/docs/en/slash-commands)
+
+---
+
+## Subagents
+
+**Location:** `.claude/agents/<name>.md`. Invoke via the `Agent` tool with `subagent_type: <name>`.
+
+**Frontmatter DevKit uses:**
+```yaml
+---
+name: <agent-name>
+description: "When to use, what triggers it"
+tools: Read, Glob, Grep, Bash, Agent          # scoped tool list
+model: sonnet                                  # per-agent model
+effort: high                                   # opus = high, sonnet = medium, haiku = low
+memory: project                                # which CLAUDE.md tier loads
+disallowedTools: Edit, Write                   # explicit excludes
+---
+```
+
+**DevKit subagents (8 total):**
+
+Original (4):
+- `data-engineer`, `frontend-engineer`, `infra-engineer`, `debug-engineer`
+
+Added (4):
+- `front-end-tester` — visual diff + Playwright
+- `opposing-perspective` — independent pro/con briefs
+- `claude-md-curator` — end-of-session learnings
+- `pr-comment-fixer` — review-comment iteration
+
+**Docs:** [code.claude.com/docs/en/subagents](https://code.claude.com/docs/en/subagents)
+
+---
+
+## Skills
+
+**Location:** `.claude/skills/<name>.md` (single-file) OR `.claude/skills/<name>/SKILL.md` (with supporting files).
+
+**Frontmatter:**
+```yaml
+---
+name: skill-name
+description: One-liner. Claude uses this to decide when to load the skill.
+tags: [topic, topic, topic]
+---
+```
+
+**Lifecycle:** Loads on first invocation, stays in context for rest of session. Keep under ~500 lines per skill.
+
+**DevKit skills (14 total):**
+
+Original (11):
+- `arch-viewer`, `browser-bridge-scraping`, `chrome-extension-mv3`, `cloud-stt-pipeline`, `ddd-domain-events`, `knowledge-graph-neo4j`, `multi-provider-llm-gateway`, `neural-scheduler-agents`, `pydantic-settings-config`, `realtime-websocket-coaching`, `supabase-multitenant-auth`
+
+Added (3):
+- `anthropic-prompting` — the five practices
+- `explore-plan-code` — the EPC workflow
+- `screenshot-driven-dev` — visual-first workflow
+
+**Docs:** [code.claude.com/docs/en/skills](https://code.claude.com/docs/en/skills)
+
+---
+
+## MCP (Model Context Protocol)
+
+**Location options:**
+- `.claude/settings.json` → `mcpServers` block
+- `.mcp.json` at project root (DevKit's template lives here)
+- `~/.claude/.mcp.json` user-scope
+- `claude mcp add` CLI
+
+**DevKit's `.mcp.json`** ships disabled-by-default examples for:
+- Playwright (browser automation)
+- GitHub (PR/issue access)
+- Sentry (error monitoring)
+- Postgres (read-only, recommended over CLI for sensitive data — see playbook)
+- Filesystem (sandboxed access outside project root)
+
+**Anthropic team tip:** *Use MCP servers instead of CLI for sensitive data* — better security control + logging.
+
+**Docs:** [code.claude.com/docs/en/mcp](https://code.claude.com/docs/en/mcp)
+
+---
+
+## Status Line
+
+**What:** Custom display at the bottom of the terminal.
+
+**DevKit shows:** `⎇ branch* | mode | model | HH:MM`
+
+**Settings:**
+```json
+"statusLine": {
+  "type": "command",
+  "command": "bash .claude/hooks/statusline.sh"
+}
+```
+
+The hook receives a JSON blob on stdin with `model.display_name` and `permission_mode`; prints a single line to stdout.
+
+**Docs:** [code.claude.com/docs/en/statusline](https://code.claude.com/docs/en/statusline)
+
+---
+
+## Background Tasks
+
+**`/background`** — detach current session, keep working in background.
+**`/tasks`** — list background work in current session.
+**`/agent-view`** — monitor all background agents globally.
+
+**Run-in-background semantics:** Bash tool accepts `run_in_background: true` to avoid blocking the conversation.
+
+**Use for:** long migrations, parallel worktrees, fan-out batch operations.
+
+**Docs:** [code.claude.com/docs/en/agent-view](https://code.claude.com/docs/en/agent-view)
+
+---
+
+## Core conversation commands
+
+| Command | Purpose |
+|---|---|
+| `/resume` | Pick a saved session to continue |
+| `/clear` | Reset context entirely (between unrelated tasks) |
+| `/compact` | Summarize conversation to free context |
+| `/compact-with-context` (DevKit) | Guided `/compact` that preserves the right things |
+| `/rewind` | Restore to a checkpoint (conversation, code, or both) |
+| `/rename` | Label a session for easy resuming |
+
+---
+
+## Quick start checklist for new DevKit projects
+
+After `/init`, verify:
+
+- [ ] `.claude/settings.json` exists and has `hooks` block populated
+- [ ] `bash .claude/hooks/statusline.sh` exits 0 and prints a line
+- [ ] `bash .claude/hooks/session-start.sh` produces valid JSON
+- [ ] `CLAUDE.md` is < 500 lines and describes your specific stack
+- [ ] `docs/plans/` directory exists (for `/plan` output)
+- [ ] `.mcp.json` is configured for whatever MCP servers you actually need (uncomment + add `_enabled_: true`, then `claude mcp list` to verify)
+
+---
+
+## See also
+
+- [`docs/anthropic-playbook.md`](anthropic-playbook.md) — engineering practices from Anthropic teams
+- [`.claude/skills/anthropic-prompting.md`](../.claude/skills/anthropic-prompting.md) — the five practices, codified
+- [`.claude/skills/explore-plan-code.md`](../.claude/skills/explore-plan-code.md) — the EPC workflow
+- Official: [code.claude.com/docs](https://code.claude.com/docs/en/overview)

From 9f38f7a802e94520efd6ac2001399acd6f7de4d2 Mon Sep 17 00:00:00 2001
From: Norman Beckford <norman@axumquant.com>
Date: Sat, 16 May 2026 19:22:34 -0400
Subject: [PATCH 2/5] docs: explicit jq install instructions for all platforms
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The hooks degrade gracefully when jq is missing, which means they exit
silently and you lose the security guard, verification gate, session-start
context injection, and Stop-hook learnings nudge. That's a confusing
failure mode — hooks "work" but do nothing.

Add install instructions in three places so this can't be missed:

- CLAUDE.md "Prerequisites" — install table for macOS, apt, dnf, Alpine,
  Arch, winget, Choco, Scoop, manual, Docker
- .claude/commands/init.md — Step 0 actively checks `jq --version`
  before any questionnaire; refuses to proceed if missing
- docs/claude-code-2026-reference.md — quickstart checklist now has
  "Before /init" prerequisites section
- session-start.sh — comment points readers at the CLAUDE.md table

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .claude/commands/init.md           | 31 ++++++++++++++++++++++++++++++
 .claude/hooks/session-start.sh     |  4 +++-
 CLAUDE.md                          | 29 ++++++++++++++++++++++++++++
 docs/claude-code-2026-reference.md |  8 +++++++-
 4 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/.claude/commands/init.md b/.claude/commands/init.md
index fa93251..ff24240 100644
--- a/.claude/commands/init.md
+++ b/.claude/commands/init.md
@@ -11,6 +11,37 @@ You are the DevKit initialization agent. Your job is to interview the user with
 
 ---
 
+## Step 0 — Prerequisites check (run before any questions)
+
+Before asking anything, verify the user has `jq` installed. All `.claude/hooks/`
+scripts need it; without `jq` they exit silently and the security guard +
+verification gate + session-start context loader don't fire.
+
+Run: `command -v jq && jq --version`
+
+If `jq` is NOT installed, **STOP the questionnaire** and tell the user:
+
+> ⚠️ `jq` is not installed. DevKit hooks require it to parse JSON from Claude Code.
+>
+> Install it for your platform:
+>
+> | macOS | `brew install jq` |
+> | Ubuntu / Debian | `sudo apt-get install -y jq` |
+> | Fedora / RHEL | `sudo dnf install -y jq` |
+> | Windows (winget) | `winget install jqlang.jq` |
+> | Windows (Choco) | `choco install jq` |
+> | Windows (Scoop) | `scoop install jq` |
+>
+> After installing, run `/init` again. See CLAUDE.md "Prerequisites" for full
+> instructions and verification.
+
+Also verify `git --version` and `bash --version`. Same deal — stop and explain
+if either is missing.
+
+Only proceed to Step 1 once all three are present.
+
+---
+
 ## Step 1: Project Identity
 
 Ask the user:
diff --git a/.claude/hooks/session-start.sh b/.claude/hooks/session-start.sh
index 4de8ce3..3355003 100644
--- a/.claude/hooks/session-start.sh
+++ b/.claude/hooks/session-start.sh
@@ -13,7 +13,9 @@
 
 set -euo pipefail
 
-# Graceful degradation: skip hook if required tools aren't installed
+# Graceful degradation: skip hook if required tools aren't installed.
+# `jq` is required — install per CLAUDE.md "Prerequisites" section.
+# Without it the hook exits silently and no context is injected.
 command -v jq >/dev/null 2>&1 || exit 0
 command -v git >/dev/null 2>&1 || exit 0
 
diff --git a/CLAUDE.md b/CLAUDE.md
index a50c932..0c19c2b 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -8,6 +8,35 @@
 
 ---
 
+## Prerequisites — install `jq` before `/init`
+
+All hooks in `.claude/hooks/` parse JSON from Claude Code via `jq`. They
+degrade silently when `jq` is missing, which means **your hooks won't fire**
+and you'll lose the security guard, the verification gate, the session-start
+context injector, and the end-of-session learnings nudge.
+
+Install once per machine:
+
+| Platform | Command |
+|---|---|
+| **macOS** | `brew install jq` |
+| **Ubuntu / Debian** | `sudo apt-get update && sudo apt-get install -y jq` |
+| **Fedora / RHEL** | `sudo dnf install -y jq` |
+| **Alpine** | `apk add --no-cache jq` |
+| **Arch** | `sudo pacman -S jq` |
+| **Windows (winget)** | `winget install jqlang.jq` |
+| **Windows (Chocolatey)** | `choco install jq` |
+| **Windows (Scoop)** | `scoop install jq` |
+| **Windows (manual)** | Download from https://jqlang.github.io/jq/download/ and add to PATH |
+| **Docker / CI** | Add to Dockerfile: `RUN apt-get update && apt-get install -y jq` |
+
+Verify: `jq --version` should print `jq-1.7` or newer.
+
+> Also required: `git` (everywhere) and `bash` (Windows users: Git for Windows
+> ships bash; verify with `bash --version`).
+
+---
+
 ## What This Is
 
 A plug-and-play foundation for building any product — B2B SaaS, consumer app,
diff --git a/docs/claude-code-2026-reference.md b/docs/claude-code-2026-reference.md
index e03c27c..cf5a28b 100644
--- a/docs/claude-code-2026-reference.md
+++ b/docs/claude-code-2026-reference.md
@@ -244,7 +244,13 @@ The hook receives a JSON blob on stdin with `model.display_name` and `permission
 
 ## Quick start checklist for new DevKit projects
 
-After `/init`, verify:
+**Before `/init`** (machine-level, once):
+
+- [ ] `jq --version` works — required by every hook. Install per the table in CLAUDE.md "Prerequisites" if missing.
+- [ ] `git --version` works
+- [ ] `bash --version` works (Windows: install [Git for Windows](https://git-scm.com/downloads/win))
+
+**After `/init`** (project-level):
 
 - [ ] `.claude/settings.json` exists and has `hooks` block populated
 - [ ] `bash .claude/hooks/statusline.sh` exits 0 and prints a line

From e6dce9cdcbaf17fbd7ad2e34b21f30ecd3343a6e Mon Sep 17 00:00:00 2001
From: Norman Beckford <norman@axumquant.com>
Date: Sat, 16 May 2026 19:27:16 -0400
Subject: [PATCH 3/5] fix: address Codex review comments + unblock CI on
 template repo
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Addresses 3 review comments from chatgpt-codex-connector on PR #1 plus
the long-standing CI failure (the workflow tested directories that don't
exist until /init runs).

P1 — .claude/hooks/prompt-expand.sh:27
  The "the plan" branch ran `find docs/plans/` unconditionally. On a
  fresh repo the directory doesn't exist, find returned non-zero, and
  pipefail killed the hook. Codex reproduced this with
  {"user_prompt":"check the plan"} → exit 1.
  Fix: guard with `[ -d docs/plans ]` before find. Also replace the
  GNU-only -printf with a portable stat-based mtime sort so the hook
  works on macOS BSD find too.

P2 — .claude/hooks/session-end.sh:46
  The EDIT_COUNT greps `.claude/audit.log` for EDIT|WRITE, but
  security-guard.sh only writes `BASH:` entries. EDIT_COUNT was always 0,
  so the >=3-edits threshold never triggered and the claude-md-curator
  nudge never fired.
  Fix: count from `git status --porcelain | wc -l` instead. That's the
  actual file-mutation count we care about and is independent of the
  audit log format.

P2 — .claude/settings.json:83
  PreToolUse matcher was "WebFetch|WebSearch", so research-gate.sh's
  mcp__firecrawl__* branches were unreachable. Firecrawl calls bypassed
  the gate entirely.
  Fix: extend matcher to include firecrawl_scrape, firecrawl_search,
  firecrawl_crawl — the three branches research-gate.sh actually handles.

CI — .github/workflows/ci.yml
  Failing on every PR (and main) since May 13 because the workflow ran
  `pip install -e backend/`, `npm ci portal/`, and node-check on
  chrome-extension/, but those dirs only exist after /init scaffolds them.
  Fix: each job now checks `if [ -d <dir> && -f <required-file> ]` first
  and skips cleanly when not scaffolded, with a GitHub notice explaining
  why. Adds a new `DevKit Itself` job that always runs and validates the
  template artifacts (settings.json + .mcp.json parse, hooks pass
  shellcheck, hooks don't crash on empty input, template configs parse).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .claude/hooks/prompt-expand.sh | 17 +++++--
 .claude/hooks/session-end.sh   | 16 +++----
 .claude/settings.json          |  2 +-
 .github/workflows/ci.yml       | 82 ++++++++++++++++++++++++++++++++++
 4 files changed, 105 insertions(+), 12 deletions(-)

diff --git a/.claude/hooks/prompt-expand.sh b/.claude/hooks/prompt-expand.sh
index b89d711..63c0dc3 100644
--- a/.claude/hooks/prompt-expand.sh
+++ b/.claude/hooks/prompt-expand.sh
@@ -23,10 +23,21 @@ PROJECT_DIR="${CLAUDE_PROJECT_DIR:-.}"
 ADDITIONS=""
 
 # Reference to "the plan" → surface latest plan file
+# Guard the directory check first — find on a missing path returns non-zero
+# and pipefail would kill the whole hook. -printf is also GNU-only (would
+# break on macOS BSD find), so use a portable -ls + awk path instead.
 if echo "$PROMPT" | grep -qiE '(the plan|the plan file|docs/plans)'; then
-  LATEST_PLAN=$(find "$PROJECT_DIR/docs/plans" -maxdepth 2 -name "*.md" -printf '%T@ %p\n' 2>/dev/null | sort -nr | head -1 | cut -d' ' -f2-)
-  if [ -n "$LATEST_PLAN" ]; then
-    ADDITIONS+="Latest plan file: ${LATEST_PLAN}\n"
+  if [ -d "$PROJECT_DIR/docs/plans" ]; then
+    LATEST_PLAN=$(find "$PROJECT_DIR/docs/plans" -maxdepth 2 -name "*.md" -type f 2>/dev/null \
+      | while read -r f; do
+          # portable mtime: use stat in unix-y form if available, else date fallback
+          ts=$(stat -c '%Y' "$f" 2>/dev/null || stat -f '%m' "$f" 2>/dev/null || echo 0)
+          printf '%s\t%s\n' "$ts" "$f"
+        done \
+      | sort -rn | head -1 | cut -f2-)
+    if [ -n "$LATEST_PLAN" ]; then
+      ADDITIONS+="Latest plan file: ${LATEST_PLAN}\n"
+    fi
   fi
 fi
 
diff --git a/.claude/hooks/session-end.sh b/.claude/hooks/session-end.sh
index d66959b..e654701 100644
--- a/.claude/hooks/session-end.sh
+++ b/.claude/hooks/session-end.sh
@@ -38,15 +38,15 @@ if git diff --name-only HEAD 2>/dev/null | grep -q "^CLAUDE.md$"; then
   exit 0
 fi
 
-# Count edits this session (from audit log if available)
-AUDIT_LOG="$PROJECT_DIR/.claude/audit.log"
-EDIT_COUNT=0
-if [ -f "$AUDIT_LOG" ]; then
-  TODAY=$(date -u '+%Y-%m-%d')
-  EDIT_COUNT=$(grep "^\[${TODAY}" "$AUDIT_LOG" 2>/dev/null | grep -cE 'EDIT|WRITE' || echo 0)
-fi
+# Count edits this session.
+# We count changed files from `git diff` (both staged and unstaged) rather
+# than grepping the audit log — the audit log only contains `BASH:` lines
+# from security-guard.sh, so an EDIT|WRITE grep would always be 0.
+# git diff gives us the actual files mutated, which is what we care about.
+EDIT_COUNT=$(git status --porcelain 2>/dev/null | wc -l | tr -d ' ')
+EDIT_COUNT=${EDIT_COUNT:-0}
 
-# Only nudge if substantial work
+# Only nudge if substantial work (3+ files touched)
 if [ "$EDIT_COUNT" -lt 3 ]; then
   exit 0
 fi
diff --git a/.claude/settings.json b/.claude/settings.json
index b57b699..f3936c9 100644
--- a/.claude/settings.json
+++ b/.claude/settings.json
@@ -80,7 +80,7 @@
         ]
       },
       {
-        "matcher": "WebFetch|WebSearch",
+        "matcher": "WebFetch|WebSearch|mcp__firecrawl__firecrawl_scrape|mcp__firecrawl__firecrawl_search|mcp__firecrawl__firecrawl_crawl",
         "hooks": [
           {
             "type": "command",
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index d68932d..1913338 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -1,5 +1,10 @@
 name: CI
 
+# DevKit is a template repo — `backend/`, `portal/`, and `chrome-extension/`
+# do not exist until a user runs `/init`. Each job below checks whether its
+# target directory exists at the repo root and skips cleanly if not, so this
+# workflow is green on the bare template AND on scaffolded projects.
+
 on:
   push:
     branches: [main]
@@ -12,19 +17,33 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v4
+      - name: Check whether backend/ exists
+        id: has_backend
+        run: |
+          if [ -d backend ] && [ -f backend/pyproject.toml ]; then
+            echo "exists=true" >> "$GITHUB_OUTPUT"
+          else
+            echo "exists=false" >> "$GITHUB_OUTPUT"
+            echo "::notice::backend/ not scaffolded yet — skipping Python CI. Run /init in your project to scaffold."
+          fi
       - uses: actions/setup-python@v5
+        if: steps.has_backend.outputs.exists == 'true'
         with:
           python-version: "3.12"  # Match templates/backend/pyproject.toml
           cache: pip
       - run: pip install -e ".[dev]"
+        if: steps.has_backend.outputs.exists == 'true'
         working-directory: backend
       - name: Lint (ruff)
+        if: steps.has_backend.outputs.exists == 'true'
         run: ruff check .
         working-directory: backend
       - name: Type check (pyright)
+        if: steps.has_backend.outputs.exists == 'true'
         run: pyright
         working-directory: backend
       - name: Test
+        if: steps.has_backend.outputs.exists == 'true'
         run: pytest --tb=short -q
         working-directory: backend
         env:
@@ -36,16 +55,29 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v4
+      - name: Check whether portal/ exists
+        id: has_portal
+        run: |
+          if [ -d portal ] && [ -f portal/package.json ]; then
+            echo "exists=true" >> "$GITHUB_OUTPUT"
+          else
+            echo "exists=false" >> "$GITHUB_OUTPUT"
+            echo "::notice::portal/ not scaffolded yet — skipping Next.js CI. Run /init in your project to scaffold."
+          fi
       - uses: actions/setup-node@v4
+        if: steps.has_portal.outputs.exists == 'true'
         with:
           node-version: "22"  # LTS, required for Next.js 16
           cache: npm
           cache-dependency-path: portal/package-lock.json
       - run: npm ci
+        if: steps.has_portal.outputs.exists == 'true'
         working-directory: portal
       - run: npm run build
+        if: steps.has_portal.outputs.exists == 'true'
         working-directory: portal
       - run: npm test -- --passWithNoTests
+        if: steps.has_portal.outputs.exists == 'true'
         working-directory: portal
 
   extension:
@@ -53,13 +85,63 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v4
+      - name: Check whether chrome-extension/ exists
+        id: has_ext
+        run: |
+          if [ -d chrome-extension ] && [ -f chrome-extension/manifest.json ]; then
+            echo "exists=true" >> "$GITHUB_OUTPUT"
+          else
+            echo "exists=false" >> "$GITHUB_OUTPUT"
+            echo "::notice::chrome-extension/ not scaffolded yet — skipping MV3 CI. Run /init with extension=full/popup-only to scaffold."
+          fi
       - uses: actions/setup-node@v4
+        if: steps.has_ext.outputs.exists == 'true'
         with:
           node-version: "22"
       - name: Syntax check
+        if: steps.has_ext.outputs.exists == 'true'
         run: |
           for f in chrome-extension/*.js; do
             node --check "$f" || exit 1
           done
       - name: Validate manifest
+        if: steps.has_ext.outputs.exists == 'true'
         run: node -e "JSON.parse(require('fs').readFileSync('chrome-extension/manifest.json'))"
+
+  hooks-and-templates:
+    # Always runs — validates the kit itself (hooks parse, settings.json is valid,
+    # template files are syntactically clean) on every PR, even before /init.
+    name: DevKit Itself (hooks + templates)
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Install jq + shellcheck
+        run: sudo apt-get update && sudo apt-get install -y jq shellcheck
+      - name: Validate settings.json
+        run: |
+          python3 -c "import json; json.load(open('.claude/settings.json'))"
+          python3 -c "import json; json.load(open('.mcp.json'))"
+      - name: shellcheck hooks
+        run: |
+          for f in .claude/hooks/*.sh; do
+            shellcheck --severity=warning "$f" || echo "::warning::shellcheck issues in $f"
+          done
+      - name: Smoke-test hooks (must not crash with empty input)
+        run: |
+          set -e
+          bash .claude/hooks/session-start.sh < /dev/null
+          echo '{}' | bash .claude/hooks/statusline.sh
+          echo '{"user_prompt":"hello"}' | bash .claude/hooks/prompt-expand.sh
+          echo '{"user_prompt":"check the plan"}' | bash .claude/hooks/prompt-expand.sh
+          echo '{"user_prompt":"deploy to prod"}' | bash .claude/hooks/prompt-expand.sh
+          bash .claude/hooks/session-end.sh < /dev/null
+          echo '{"tool_input":{"command":"ls"}}' | bash .claude/hooks/security-guard.sh
+          echo "All hooks exited cleanly."
+      - name: Python syntax check on hooks
+        run: python3 -m py_compile .claude/hooks/git_smart_push.py
+      - name: Template backend pyproject is valid
+        run: python3 -c "import tomllib; tomllib.load(open('templates/backend/pyproject.toml','rb'))"
+      - name: Template portal package.json is valid
+        run: python3 -c "import json; json.load(open('templates/portal/package.json'))"
+      - name: Template extension manifest is valid
+        run: python3 -c "import json; json.load(open('templates/extension/manifest.json'))"

From b101a4b288858f1e4197dc105cf4a7f8f5c38a00 Mon Sep 17 00:00:00 2001
From: Norman Beckford <norman@axumquant.com>
Date: Sat, 16 May 2026 19:28:58 -0400
Subject: [PATCH 4/5] fix: wrap research-gate.sh output in hookSpecificOutput
 envelope
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Addresses P1 review comment 3253722688 from chatgpt-codex-connector.

The hook was emitting `{"additionalContext": "..."}` at top level, but
Claude Code's PreToolUse schema expects the envelope:

  {"hookSpecificOutput": {
      "hookEventName": "PreToolUse",
      "additionalContext": "..."
  }}

Without the wrapper Claude Code silently ignored the output, so every
research-gate routing hint was inert — WebFetch/WebSearch/Firecrawl
calls fell through with no guidance despite the hook being wired and
firing.

Replaced 6 inline `echo '{"additionalContext": ...}'` lines with a
single `emit_ctx` helper that produces the correctly-wrapped JSON.
Also added the standard `command -v jq` graceful-degradation guard
that the other hooks have.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .claude/hooks/research-gate.sh | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/.claude/hooks/research-gate.sh b/.claude/hooks/research-gate.sh
index dc5de39..81169fc 100644
--- a/.claude/hooks/research-gate.sh
+++ b/.claude/hooks/research-gate.sh
@@ -1,45 +1,60 @@
 #!/usr/bin/env bash
 # Research Gate Hook — smart routing across Context7, Tavily, and Firecrawl
 # Fires on PreToolUse for WebFetch, WebSearch, and Firecrawl tools
-# Returns additionalContext guiding agents to the cheapest effective tool
+# Returns additionalContext (wrapped in hookSpecificOutput per Claude Code's
+# PreToolUse schema) guiding agents to the cheapest effective tool.
 
 set -euo pipefail
 
+# Graceful degradation: skip silently if jq isn't installed
+command -v jq >/dev/null 2>&1 || exit 0
+
 INPUT=$(cat)
 TOOL_NAME=$(echo "$INPUT" | jq -r '.tool_name // ""')
 URL=$(echo "$INPUT" | jq -r '.tool_input.url // ""')
 
+# emit_ctx <message> — wrap context in the PreToolUse hookSpecificOutput
+# envelope. Without this wrapper, Claude Code ignores the output entirely.
+emit_ctx() {
+  jq -n --arg ctx "$1" '{
+    "hookSpecificOutput": {
+      "hookEventName": "PreToolUse",
+      "additionalContext": $ctx
+    }
+  }'
+}
+
 case "$TOOL_NAME" in
 
   # ── WebFetch: Most expensive path — always suggest alternatives ──
   WebFetch)
     if echo "$URL" | grep -qiE "docs\.|documentation|readthedocs|pypi\.org|npmjs\.com|crates\.io"; then
-        echo '{"additionalContext": "⚡ RESEARCH GATE: Library docs detected. Use Context7 (resolve-library-id → query-docs) — 97% fewer tokens than WebFetch. If Context7 lacks coverage, use firecrawl_extract with onlyMainContent:true."}'
+        emit_ctx "⚡ RESEARCH GATE: Library docs detected. Use Context7 (resolve-library-id → query-docs) — 97% fewer tokens than WebFetch. If Context7 lacks coverage, use firecrawl_extract with onlyMainContent:true."
     elif echo "$URL" | grep -qiE "github\.com/.*/issues|stackoverflow\.com|stackexchange\.com"; then
-        echo '{"additionalContext": "⚡ RESEARCH GATE: Use tavily_extract for GitHub issues/SO answers (~80% fewer tokens). Or firecrawl_scrape with onlyMainContent:true if you need code blocks preserved."}'
+        emit_ctx "⚡ RESEARCH GATE: Use tavily_extract for GitHub issues/SO answers (~80% fewer tokens). Or firecrawl_scrape with onlyMainContent:true if you need code blocks preserved."
     else
-        echo '{"additionalContext": "⚡ RESEARCH GATE: WebFetch is the most expensive option. Cheaper alternatives: Context7 for library docs (~500 tokens), tavily_search for broad topics (~1K), firecrawl_extract for structured data from URLs (~2K), firecrawl_scrape with onlyMainContent:true (~3K). Read .claude/skills/smart-research/SKILL.md."}'
+        emit_ctx "⚡ RESEARCH GATE: WebFetch is the most expensive option. Cheaper alternatives: Context7 for library docs (~500 tokens), tavily_search for broad topics (~1K), firecrawl_extract for structured data from URLs (~2K), firecrawl_scrape with onlyMainContent:true (~3K). Read .claude/skills/smart-research/SKILL.md."
     fi
     ;;
 
   # ── WebSearch: Suggest Tavily search (AI-summarized) ──
   WebSearch)
-    echo '{"additionalContext": "⚡ RESEARCH GATE: Use tavily_search (AI-summarized, fewer tokens) or tavily_research (deep multi-source). For site-specific search, use firecrawl_search. Combine: tavily_search finds URLs → firecrawl_extract gets structured data."}'
+    emit_ctx "⚡ RESEARCH GATE: Use tavily_search (AI-summarized, fewer tokens) or tavily_research (deep multi-source). For site-specific search, use firecrawl_search. Combine: tavily_search finds URLs → firecrawl_extract gets structured data."
     ;;
 
   # ── Firecrawl Scrape: Remind about selectors and alternatives ──
   mcp__firecrawl__firecrawl_scrape)
-    echo '{"additionalContext": "⚡ FIRECRAWL TIP: Always use onlyMainContent:true to skip nav/footer (~50% fewer tokens). For structured data, use firecrawl_extract with a JSON schema instead — it returns only the fields you need. For library docs, check Context7 first (cheapest)."}'
+    emit_ctx "⚡ FIRECRAWL TIP: Always use onlyMainContent:true to skip nav/footer (~50% fewer tokens). For structured data, use firecrawl_extract with a JSON schema instead — it returns only the fields you need. For library docs, check Context7 first (cheapest)."
     ;;
 
   # ── Firecrawl Search: Guide combination with extract ──
   mcp__firecrawl__firecrawl_search)
-    echo '{"additionalContext": "⚡ FIRECRAWL TIP: firecrawl_search is great for discovery. Combine with firecrawl_extract on the best results for structured data. If you just need summaries, tavily_search may be cheaper. For known URLs, skip search — go straight to firecrawl_scrape or tavily_extract."}'
+    emit_ctx "⚡ FIRECRAWL TIP: firecrawl_search is great for discovery. Combine with firecrawl_extract on the best results for structured data. If you just need summaries, tavily_search may be cheaper. For known URLs, skip search — go straight to firecrawl_scrape or tavily_extract."
     ;;
 
   # ── Firecrawl Crawl: Remind about scope limits ──
   mcp__firecrawl__firecrawl_crawl)
-    echo '{"additionalContext": "⚡ FIRECRAWL TIP: Set a low limit (5-10 pages) to control token cost. Use firecrawl_map first to discover the site structure, then crawl only the relevant sections. For library docs, Context7 already has it indexed — check there first."}'
+    emit_ctx "⚡ FIRECRAWL TIP: Set a low limit (5-10 pages) to control token cost. Use firecrawl_map first to discover the site structure, then crawl only the relevant sections. For library docs, Context7 already has it indexed — check there first."
     ;;
 
   # ── Anything else: pass through ──

From 6b79dad2743b0f35fa1b95c4548b4c7b840844d5 Mon Sep 17 00:00:00 2001
From: Norman Beckford <norman@axumquant.com>
Date: Sat, 16 May 2026 19:37:21 -0400
Subject: [PATCH 5/5] fix: anchor hook paths + close fail-open gap in
 security-guard
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Addresses two Codex review comments on b101a4b.

P1 — settings.json hook paths
  All 7 hook commands used relative paths like `bash .claude/hooks/X.sh`.
  Claude Code runs hooks in the *current working directory*, so after the
  agent `cd`s into a subdirectory (very common during builds, tests, or
  exploring monorepos) the path no longer resolves and the hook silently
  stops firing — including the security guard.
  Fix: prefix every hook command (and the statusLine command) with
  `${CLAUDE_PROJECT_DIR}/` so they resolve from the repo root regardless
  of cwd. Verified all 7 commands now use the anchor.

P2 — security-guard.sh fail-open without jq
  When jq is missing, security-guard.sh fails with command-not-found
  before any guard check runs. Claude Code treats the non-zero exit as
  "hook failed" and the Bash call proceeds anyway — a silent fail-open
  on a destructive-command guard.
  Fix: explicit `command -v jq` precheck at the top. If absent, exits 0
  (fail-open so the session keeps working) but emits a LOUD stderr
  warning so the gap is visible to the user. Pairs with CLAUDE.md
  Prerequisites which already tells users to install jq.

  Also added the same `command -v jq` guard to verification-gate.sh
  (silent exit — it only injects advisory context, no enforcement).

The 4 hooks I authored in the initial PR already had the jq precheck;
this commit applies the pattern to the 2 pre-existing devkit hooks too
so the whole hook directory has consistent behavior.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .claude/hooks/security-guard.sh    | 11 +++++++++++
 .claude/hooks/verification-gate.sh |  5 +++++
 .claude/settings.json              | 14 +++++++-------
 3 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/.claude/hooks/security-guard.sh b/.claude/hooks/security-guard.sh
index 4184232..cb42624 100644
--- a/.claude/hooks/security-guard.sh
+++ b/.claude/hooks/security-guard.sh
@@ -4,6 +4,17 @@
 
 set -euo pipefail
 
+# Hard dependency on jq. Without it we cannot parse the tool input to know
+# which command is about to run, so the destructive-command guard cannot
+# function. We exit 0 (fail-open so the user's session isn't bricked) but
+# emit a LOUD stderr warning so the gap is visible — silent fail-open on
+# a security guard would be worse than failing closed.
+# Install per CLAUDE.md "Prerequisites" section.
+if ! command -v jq >/dev/null 2>&1; then
+  echo "[security-guard] ⚠️  jq is NOT installed — destructive-command guard is DISABLED for this session. Install jq (see CLAUDE.md Prerequisites) to re-enable. The Bash call will proceed unchecked." >&2
+  exit 0
+fi
+
 INPUT=$(cat)
 COMMAND=$(echo "$INPUT" | jq -r '.tool_input.command // ""')
 TIMESTAMP=$(date -u '+%Y-%m-%dT%H:%M:%SZ')
diff --git a/.claude/hooks/verification-gate.sh b/.claude/hooks/verification-gate.sh
index da5ce3a..c63e8c3 100644
--- a/.claude/hooks/verification-gate.sh
+++ b/.claude/hooks/verification-gate.sh
@@ -9,6 +9,11 @@
 
 set -euo pipefail
 
+# Graceful degradation: skip silently if jq isn't installed.
+# This hook only injects advisory context (no enforcement), so silent
+# fail-open is appropriate here. See CLAUDE.md Prerequisites for jq install.
+command -v jq >/dev/null 2>&1 || exit 0
+
 # Read hook input from stdin
 INPUT=$(cat)
 
diff --git a/.claude/settings.json b/.claude/settings.json
index f3936c9..b8f1fce 100644
--- a/.claude/settings.json
+++ b/.claude/settings.json
@@ -44,7 +44,7 @@
   },
   "statusLine": {
     "type": "command",
-    "command": "bash .claude/hooks/statusline.sh"
+    "command": "bash ${CLAUDE_PROJECT_DIR}/.claude/hooks/statusline.sh"
   },
   "hooks": {
     "SessionStart": [
@@ -53,7 +53,7 @@
         "hooks": [
           {
             "type": "command",
-            "command": "bash .claude/hooks/session-start.sh"
+            "command": "bash ${CLAUDE_PROJECT_DIR}/.claude/hooks/session-start.sh"
           }
         ]
       }
@@ -64,7 +64,7 @@
         "hooks": [
           {
             "type": "command",
-            "command": "bash .claude/hooks/prompt-expand.sh"
+            "command": "bash ${CLAUDE_PROJECT_DIR}/.claude/hooks/prompt-expand.sh"
           }
         ]
       }
@@ -75,7 +75,7 @@
         "hooks": [
           {
             "type": "command",
-            "command": "bash .claude/hooks/security-guard.sh"
+            "command": "bash ${CLAUDE_PROJECT_DIR}/.claude/hooks/security-guard.sh"
           }
         ]
       },
@@ -84,7 +84,7 @@
         "hooks": [
           {
             "type": "command",
-            "command": "bash .claude/hooks/research-gate.sh"
+            "command": "bash ${CLAUDE_PROJECT_DIR}/.claude/hooks/research-gate.sh"
           }
         ]
       }
@@ -95,7 +95,7 @@
         "hooks": [
           {
             "type": "command",
-            "command": "bash .claude/hooks/verification-gate.sh"
+            "command": "bash ${CLAUDE_PROJECT_DIR}/.claude/hooks/verification-gate.sh"
           }
         ]
       }
@@ -106,7 +106,7 @@
         "hooks": [
           {
             "type": "command",
-            "command": "bash .claude/hooks/session-end.sh"
+            "command": "bash ${CLAUDE_PROJECT_DIR}/.claude/hooks/session-end.sh"
           }
         ]
       }