darkroomengineering · arzafran · Jun 3, 2026 · Jun 3, 2026 · Jun 3, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -12,6 +12,13 @@ All notable changes to cc-settings are documented here.
   - **Removed** — `.github/workflows/upstream-sync.yml`; `openSyncPr`/`saveManifest` and the `--open-pr` branch in `src/upstream/scan.ts` (now a pure dry-run detector, no `writeFile`/`runProcessFull` imports); stale "daily GH Action"/"bot owns this file" references in the manifest `description`/`source`, `skills/cc/SKILL.md`, `src/schemas/skill.ts`, and the `.github/workflows/ci.yml` `upstream-scan` comment.
   - **Kept** — `bun run upstream:scan` (the dry-run detector) and its non-gating CI job, now the manual way to spot drift before running `/cc sync`.
 - **Upstream sync to Claude Code 2.1.161.** Manifest bumped `2.1.160` → `2.1.161`. The release is otherwise all bug fixes / UI / perf with no cc-settings surface; the one tracked addition is `CLAUDE_CODE_TMPDIR` (surfaced by the `EADDRINUSE`/Unix-socket fix), added to the manifest `knownEnvVars` and the `docs/settings-reference.md` env-var table. `OTEL_RESOURCE_ATTRIBUTES` now feeds metric-datapoint labels but is a generic OTEL SDK var outside our CC-specific tracking convention — deliberately skipped.
+- **Removed the dead TLDR session-stats telemetry pair** (`src/scripts/track-tldr.ts` + `tldr-stats.ts`, their hook registrations, tests, and doc rows). It was a closed read-loop — track-tldr accumulated an *estimated* token-savings counter that only tldr-stats read, to print a vanity box at session end. No external consumer; TLDR itself is unaffected. Verified `swarm-log.ts` (feeds `/review-batch`) and `session-title.ts` (powers `claude --resume <name>`) are load-bearing and kept.
+- **Folded Anthropic's dynamic-workflows learnings into existing guidance — no new files, no new skills.** Distilled the "A harness for every task" write-up into the places that already decide *when to orchestrate*, rather than adding surface:
+  - `skills/orchestrate/SKILL.md` — the dynamic-workflows section now leads with the **trigger** (the three single-window failure modes: agentic laziness, self-preferential bias, goal drift) instead of task type, names the canonical shapes (classify-and-act, fan-out-and-synthesize, generate-and-filter, tournament, loop-until-done), and surfaces quick workflows, token budgets, and the quarantine pattern for untrusted-input triage.
+  - `CLAUDE-FULL.md` — failure-mode trigger added to the delegation decision tree.
+  - `skills/oracle/SKILL.md` — compare mode now prefers pairwise/tournament judgment over absolute weighted scores for close or taste-heavy calls (comparative judgment resists the score-compression that clusters everything at 6–8).
+  - `MANUAL.md` — token-budget directive for workflows; `skills/nuclear-review/SKILL.md` — its example workflow reframed as a template to adapt, not run verbatim.
+  - **Deliberately not done**: the proposed `/ship` → `bun run proof` consolidation was rejected — `bun run proof` is a cc-settings-repo-local script, and `/ship` runs in *any* project, so the cut would have broken it everywhere else. Caught by verifying the (adversarially-generated) cut list instead of trusting it.
 - **Shared team-knowledge migrated from GitHub Project #7 to a markdown repo** (`darkroomengineering/team-knowledge`). The board was structurally hostile to its primary consumers (agents): network-gated GraphQL reads, not greppable offline, and no linter-enforced structure — `gh project item-create` never set the `Kind` field, so every `/share-learning` post landed `kind: None` and was invisible to a Kind-filtered query. The repo is one note per file + a generated `INDEX.md`, mirroring the local auto-memory tier. Decision + roadmap: `docs/plans/knowledge-repo-migration.md` (weighted comparison scored repo 795 vs board 515).
   - **New** — `src/schemas/knowledge.ts` (zod frontmatter contract: `name`/`kind`/`tags`/`added-by`/`supersedes`), `src/lib/lint-knowledge.ts` + `src/scripts/lint-knowledge.ts` (`bun run lint:knowledge`), `src/scripts/new-note.ts` (`bun run new-note`), `tests/lint-knowledge.test.ts`, emitted `schemas/knowledge.schema.json`.
   - **Changed** — `/share-learning` now reads `INDEX.md` for dedup and writes notes via `gh api` (was `gh project item-list`/`item-create`); `docs/knowledge-system.md`, the `AGENTS.md` Knowledge Routing section, and assorted docs retargeted; env `KNOWLEDGE_PROJECT_NUMBER` → `KNOWLEDGE_REPO`.
@@ -25,6 +32,8 @@ All notable changes to cc-settings are documented here.
 
 ### Fixed
 
+- **Stale dynamic-workflow trigger keyword** — `skills/orchestrate/SKILL.md` said the one-shot trigger was the word `workflow`; the force-keyword was renamed `workflow` → `ultracode` in CC v2.1.160. Corrected to `ultracode` (natural-language "use a workflow" still works as opt-in).
+- **`skills/ship/SKILL.md` had two `Step 8` headers** — the post-push CI-watch step is now `Step 9`.
 - **`--rollback` now restores `~/.claude.json`** — `createBackup`/`cmdRollback` in `src/setup.ts` archived only `settings.json`/`CLAUDE.md`/`AGENTS.md` under `~/.claude`, so the MCP config `installMcpToClaudeJson` rewrites (at `~/.claude.json`, *outside* `~/.claude`) had no rollback path. Backups are now `$HOME`-relative and include it; `cmdRollback` sniffs the archive layout so older `~/.claude`-relative backups still restore to the right place.
 - **`src/scripts/session-title.ts`** read the prompt from `process.env.PROMPT` only, silently no-op'ing whenever Claude Code delivered `UserPromptSubmit` data via stdin JSON (the primary path). It now reads via `readHookInput<{ session_id, prompt }>` with env fallback, matching the sibling `delegation-detector.ts` hook on the same event.
 

diff --git a/CLAUDE-FULL.md b/CLAUDE-FULL.md
@@ -36,6 +36,7 @@ The Edit tool uses exact string matching. Follow these rules:
 - **Code review on changes touching 3+ files** → `Agent(reviewer, "...")`
 - **Expert second opinions / blast-radius / "why is this here" questions** → `Agent(explore, "...")`
 - **Full-feature orchestration across 3+ agents** → `Agent(maestro, "...")`
+- **Tasks structurally prone to single-window failure** — agentic laziness (quitting at 20 of 50 items), self-preferential bias (judging your own output), goal drift across compaction → a [dynamic workflow](https://code.claude.com/docs/en/workflows) / `/effort ultracode` (see `skills/orchestrate/SKILL.md`)
 
 > **Briefing contract for `implementer`**: as a subagent it gets only your prompt — no conversation context, none of the files you've read — so every prompt MUST contain actual content, not references: the user's ask verbatim, exact file paths and line ranges, the change to make (paste the planner output; never write "based on findings" or "according to plan"), the verification command, and a scope boundary. Thin prompts cause regressions; the agent will refuse them. It runs in the live working tree and leaves changes **uncommitted** for you to review before they land. Full contract: `agents/implementer.md` REQUIRED BRIEFING. This applies equally to `explore` → `implementer` and `planner` → `implementer` chains.
 

diff --git a/MANUAL.md b/MANUAL.md
@@ -329,7 +329,7 @@ The idea (from the "Orchestration Tax"): your review throughput is the real bott
 
 Say: *"think harder"* or *"quick fix"*
 
-Triggers `/effort` — adjusts reasoning depth. Levels: `low`, `medium`, `high`, `xhigh`, `max`. cc-settings pins **`xhigh`** as the default via `CLAUDE_CODE_EFFORT_LEVEL` — on Opus 4.8 the model's own default is `high`, and the pin overrides it to hold reasoning depth steady across model upgrades. `ultracode` is a session-only mode (`/effort ultracode`) that layers automatic [dynamic-workflow](https://code.claude.com/docs/en/workflows) orchestration on top of `xhigh`; it can't be persisted as an effort level.
+Triggers `/effort` — adjusts reasoning depth. Levels: `low`, `medium`, `high`, `xhigh`, `max`. cc-settings pins **`xhigh`** as the default via `CLAUDE_CODE_EFFORT_LEVEL` — on Opus 4.8 the model's own default is `high`, and the pin overrides it to hold reasoning depth steady across model upgrades. `ultracode` is a session-only mode (`/effort ultracode`) that layers automatic [dynamic-workflow](https://code.claude.com/docs/en/workflows) orchestration on top of `xhigh`; it can't be persisted as an effort level. Workflows use more tokens than a single window — cap one by prompting a budget (e.g. _"use a workflow, budget 10k tokens"_).
 
 ### Model on AWS / Bedrock / Vertex / Foundry
 

diff --git a/config/40-hooks.json b/config/40-hooks.json
@@ -89,17 +89,6 @@
 					}
 				]
 			},
-			{
-				"matcher": "mcp__tldr",
-				"hooks": [
-					{
-						"type": "command",
-						"command": "bun \"$HOME/.claude/src/scripts/track-tldr.ts\" \"$TOOL_NAME\"",
-						"async": true,
-						"timeout": 10
-					}
-				]
-			},
 			{
 				"matcher": "Bash",
 				"hooks": [
@@ -202,7 +191,7 @@
 				"hooks": [
 					{
 						"type": "command",
-						"command": "bun \"$HOME/.claude/src/scripts/tldr-stats.ts\"; bun \"$HOME/.claude/src/scripts/handoff.ts\" create",
+						"command": "bun \"$HOME/.claude/src/scripts/handoff.ts\" create",
 						"async": true,
 						"timeout": 30
 					}

diff --git a/docs/hooks-reference.md b/docs/hooks-reference.md
@@ -353,12 +353,6 @@ Logs are used by the `/audit` skill (`claude-audit.ts`) to analyze command patte
 
 **Retention:** Controlled by `CLAUDE_LOG_RETENTION_DAYS` env var (default: 1 day, today only). Old logs are pruned automatically on each hook fire.
 
-### PostToolUse (mcp__tldr matcher)
-
-| Script | Purpose | Async |
-|--------|---------|-------|
-| `track-tldr.ts "$TOOL_NAME"` | Tracks TLDR MCP usage statistics | Yes |
-
 ### PostToolUse (no matcher — every tool)
 
 | Script | Purpose | Async |
@@ -411,7 +405,7 @@ Logs are used by the `/audit` skill (`claude-audit.ts`) to analyze command patte
 
 | Script | Purpose | Async |
 |--------|---------|-------|
-| `tldr-stats.ts` + `handoff.ts create` | Prints TLDR session stats and saves final handoff state | Yes |
+| `handoff.ts create` | Saves final handoff state at session end | Yes |
 
 ### CwdChanged
 
@@ -490,7 +484,6 @@ Run a Claude Code session and trigger the relevant event. Check logs for output
 | `~/.claude/swarm.log` | Subagent start/stop and task created/completed events |
 | `~/.claude/sessions.log` | Session lifecycle events |
 | `~/.claude/session-titles/` | Per-session title flags (set once per session by `session-title.ts`) |
-| `~/.claude/tldr-session-stats.json` | TLDR tool usage statistics per session |
 | `~/.claude/logs/tool-failures.log` | Tool failure events (from `post-failure.ts`) |
 | `~/.claude/safety-net.log` | Blocked command audit log (from `safety-net.ts`) |
 | `~/.claude/logs/bash-*.log` | Daily Bash command logs (from `log-bash.ts`, analyzed by `/audit`) |

diff --git a/hooks/README.md b/hooks/README.md
@@ -20,15 +20,14 @@ For full hook documentation (all 29 events, configuration format, matchers, debu
 | `PreToolUse` | Before Edit calls (Edit matcher) | `pre-edit-validate.ts` (harness optimization) | No |
 | `PostToolUse` | After Write/Edit | `post-edit.ts` (auto-format with Biome) | No |
 | `PostToolUse` | After Write/Edit | `post-edit-tsc.ts` (async TypeScript type check) | **Yes** |
-| `PostToolUse` | After TLDR MCP calls | `track-tldr.ts` (usage stats) | **Yes** |
 | `PostToolUse` | After Bash commands | `log-bash.ts` (command audit log) | **Yes** |
 | `PostToolUse` | After every tool (no matcher) | `parallelmax-nudge.ts` (counts consecutive non-Agent calls; at N=8 injects delegation reminder; resets on Agent tool; 60s debounce) | No |
 | `PostToolUseFailure` | Tool execution fails | `post-failure.ts` (logs failures, warns on repeats) | No |
 | `PreCompact` | Before context compaction | `handoff.ts create` (saves state) | No |
 | `PostCompact` | After context compaction | `post-compact.ts` (recovery steps reminder) | No |
 | `Stop` | Claude finishes | `stop-summary.ts` (learning reminder if >5 files changed) | No |
 | `StopFailure` | Turn ends due to API error | `stop-failure.ts` (logs, surfaces rate limit info) | No |
-| `SessionEnd` | Session ending | `tldr-stats.ts` + `handoff.ts create` | **Yes** |
+| `SessionEnd` | Session ending | `handoff.ts create` | **Yes** |
 | `SubagentStart` | Subagent spawns | Logs to `~/.claude/swarm.log` | **Yes** |
 | `SubagentStop` | Subagent finishes | Logs to `~/.claude/swarm.log` | **Yes** |
 | `Notification` | Task completion | `notify.ts` (macOS/Linux notification) | **Yes** |

diff --git a/skills/nuclear-review/SKILL.md b/skills/nuclear-review/SKILL.md
@@ -24,7 +24,7 @@ A typical sequence: `/nuclear-review` produces findings → engineers cherry-pic
 
 > **Tip (Claude Code v2.1.154+)**: run `/effort ultracode` before invoking this skill. The whole-codebase audit is the canonical shape that benefits from dynamic workflows — phase state lives in the workflow script rather than Claude's context window, individual modules can be reviewed in parallel (up to 16 concurrent), and the run can resume from cached agent results within the session.
 >
-> A ready-made example ships at `references/nuclear-review.workflow.js` (installed to `~/.claude/skills/nuclear-review/references/`). It is **opt-in, not a dependency** — the skill above works with no Workflow tool. Run it with `Workflow({ scriptPath: "~/.claude/skills/nuclear-review/references/nuclear-review.workflow.js" })`, or copy it into `.claude/workflows/`. It maps the repo, fans out one structural reviewer per module, audits dependencies, and synthesizes one report.
+> A ready-made example ships at `references/nuclear-review.workflow.js` (installed to `~/.claude/skills/nuclear-review/references/`). It is **opt-in, not a dependency** — the skill above works with no Workflow tool. Run it with `Workflow({ scriptPath: "~/.claude/skills/nuclear-review/references/nuclear-review.workflow.js" })`, or copy it into `.claude/workflows/`. It maps the repo, fans out one structural reviewer per module, audits dependencies, and synthesizes one report. Treat it as a **template** — adapt its module list, schemas, and phases to the repo at hand rather than running it verbatim.
 
 ## Scope
 

diff --git a/skills/oracle/SKILL.md b/skills/oracle/SKILL.md
@@ -208,6 +208,8 @@ Produce the final output in ADR format.
 Score format: raw (weighted) where weighted = raw * weight
 ```
 
+> **When the call is close or taste-heavy, prefer pairwise judgment over absolute scores.** Absolute 1–10 scoring drifts and compresses — everything clusters at 6–8, and the "winner" can hinge on one evaluator's mood. Comparative judgment is more reliable: have evaluators judge A-vs-B head-to-head and run a small tournament (each comparison its own call). The weighted matrix above stays right for criteria-driven calls with hard trade-offs; reach for pairwise when the totals land within a few points of each other or the decision is about taste (naming, design, API feel).
+
 > This extends the base ADR template from `agents/planner.md` with scoring matrix and detailed risk sections.
 
 #### ADR Template

diff --git a/skills/orchestrate/SKILL.md b/skills/orchestrate/SKILL.md
@@ -55,12 +55,22 @@ Use full parallel team fan-out instead of sequential subagent delegation when:
 
 ### Alternative: dynamic workflows (research preview, v2.1.154+)
 
-Tasks that match the **Large codebase analysis** or **wide-blast-radius migration** shapes above can alternatively be expressed as a [dynamic workflow](https://code.claude.com/docs/en/workflows) — a JavaScript orchestration script that holds plan state outside Claude's context window, supports up to 16 concurrent agents per script (1000 total), and resumes from cached agent results within a session. Two entry points:
+A [dynamic workflow](https://code.claude.com/docs/en/workflows) is a JS harness that spawns subagents, holds plan state *outside* your context window, runs up to 16 agents concurrently (1000 total), and resumes from cached results within a session. The trigger isn't task *size* — it's whether the task risks one of three failure modes a single context window is prone to:
 
-- One-shot: include the word `workflow` in your prompt (e.g. _"Run a workflow to audit all SVG icons for accessibility issues"_) — Claude generates and runs the script.
-- Session-wide: `/effort ultracode` — Claude auto-orchestrates a workflow for every substantive task in the session.
+- **Agentic laziness** — stopping at 20 of 50 items and declaring done. → *fan-out-and-synthesize*: one agent per item, barrier-join the results.
+- **Self-preferential bias** — preferring your own output when you're also the judge. → *adversarial verification*: a separate agent refutes each finding; or a *tournament* of pairwise comparisons (more reliable than absolute scoring for ranking or taste).
+- **Goal drift** — losing "don't do X" constraints across compaction. → each subagent gets a focused, isolated goal that can't drift.
 
-The maestro `Agent()` fan-out documented above is the **default** orchestration mode in cc-settings; workflows are an option to reach for when you need replayability or scale beyond what subagent fan-out provides. Don't rewire skills to *depend* on the Workflow tool — its API is still preview-stage — but a skill may ship an *opt-in* example workflow (see `nuclear-review`'s `references/nuclear-review.workflow.js`): an example you run by choice, never a runtime dependency.
+Shapes worth naming when you build one: **classify-and-act**, **fan-out-and-synthesize**, **generate-and-filter**, **tournament**, **loop-until-done** (spawn until a stop condition, not a fixed count). Not only for marathons — a **quick workflow** is valid: _"quick workflow to adversarially check this one assumption."_
+
+- **Budget** — workflows burn more tokens; cap with _"…budget 10k tokens"_ and the harness enforces it.
+- **Quarantine** — for triage over untrusted input, agents that read public/untrusted content must not also take privileged actions; split reading from acting so an injected page can't trigger a privileged step.
+
+Two entry points:
+- One-shot: say _"use a workflow to …"_ or the keyword `ultracode` in your prompt (the force-keyword was renamed `workflow` → `ultracode` in v2.1.160). Pair with `/loop` for repeatable triage/verification/research.
+- Session-wide: `/effort ultracode` — auto-orchestrates a workflow for every substantive task.
+
+The maestro `Agent()` fan-out above is the **default** in cc-settings; workflows are for replayability or scale beyond subagent fan-out. Don't rewire skills to *depend* on the Workflow tool — its API is still preview-stage — but a skill may ship an *opt-in* example (see `nuclear-review`'s `references/nuclear-review.workflow.js`): a template you adapt, never a runtime dependency.
 
 ## Output
 

diff --git a/skills/ship/SKILL.md b/skills/ship/SKILL.md
@@ -141,7 +141,7 @@ Write "What this does" from the *diff and its purpose*, not by pasting commit
 subjects. If you can't explain it plainly, the change is unclear — say so, don't
 reach for bigger words.
 
-### Step 8: Watch CI Until Green (post-push)
+### Step 9: Watch CI Until Green (post-push)
 
 After `gh pr create`, watch the PR-attached checks until all pass. Use `gh pr checks` as the source of truth — it covers all PR checks, not just GitHub Actions runs (which `gh run list` would miss).