From b65ae03c94eef0982f3c44491292ec096207d697 Mon Sep 17 00:00:00 2001 From: arzafran Date: Wed, 3 Jun 2026 10:09:02 -0300 Subject: [PATCH 1/2] docs: fold dynamic-workflows learnings into orchestration guidance MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Distill Anthropic's 'A harness for every task' write-up into the places that already decide when to orchestrate — no new files or skills. - orchestrate: failure-mode triggers (agentic laziness / self-preferential bias / goal drift), pattern vocabulary, quick workflows, token budgets, quarantine pattern; fix stale 'workflow' trigger keyword (now 'ultracode') - CLAUDE-FULL: failure-mode trigger in the delegation decision tree - oracle: pairwise/tournament judgment over absolute scores for close/taste calls - MANUAL: workflow token-budget directive - nuclear-review: example workflow reframed as a template - ship: fix duplicate 'Step 8' header Rejected the /ship -> bun run proof consolidation: proof is repo-local and ship runs in any project, so the cut would regress it elsewhere. Net +21 LOC; consolidation skill-merges deferred to a separate deliberate pass. --- CHANGELOG.md | 8 ++++++++ CLAUDE-FULL.md | 1 + MANUAL.md | 2 +- skills/nuclear-review/SKILL.md | 2 +- skills/oracle/SKILL.md | 2 ++ skills/orchestrate/SKILL.md | 18 ++++++++++++++---- skills/ship/SKILL.md | 2 +- 7 files changed, 28 insertions(+), 7 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index cad5343..1850f28 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -12,6 +12,12 @@ All notable changes to cc-settings are documented here. - **Removed** — `.github/workflows/upstream-sync.yml`; `openSyncPr`/`saveManifest` and the `--open-pr` branch in `src/upstream/scan.ts` (now a pure dry-run detector, no `writeFile`/`runProcessFull` imports); stale "daily GH Action"/"bot owns this file" references in the manifest `description`/`source`, `skills/cc/SKILL.md`, `src/schemas/skill.ts`, and the `.github/workflows/ci.yml` `upstream-scan` comment. - **Kept** — `bun run upstream:scan` (the dry-run detector) and its non-gating CI job, now the manual way to spot drift before running `/cc sync`. - **Upstream sync to Claude Code 2.1.161.** Manifest bumped `2.1.160` → `2.1.161`. The release is otherwise all bug fixes / UI / perf with no cc-settings surface; the one tracked addition is `CLAUDE_CODE_TMPDIR` (surfaced by the `EADDRINUSE`/Unix-socket fix), added to the manifest `knownEnvVars` and the `docs/settings-reference.md` env-var table. `OTEL_RESOURCE_ATTRIBUTES` now feeds metric-datapoint labels but is a generic OTEL SDK var outside our CC-specific tracking convention — deliberately skipped. +- **Folded Anthropic's dynamic-workflows learnings into existing guidance — no new files, no new skills.** Distilled the "A harness for every task" write-up into the places that already decide *when to orchestrate*, rather than adding surface: + - `skills/orchestrate/SKILL.md` — the dynamic-workflows section now leads with the **trigger** (the three single-window failure modes: agentic laziness, self-preferential bias, goal drift) instead of task type, names the canonical shapes (classify-and-act, fan-out-and-synthesize, generate-and-filter, tournament, loop-until-done), and surfaces quick workflows, token budgets, and the quarantine pattern for untrusted-input triage. + - `CLAUDE-FULL.md` — failure-mode trigger added to the delegation decision tree. + - `skills/oracle/SKILL.md` — compare mode now prefers pairwise/tournament judgment over absolute weighted scores for close or taste-heavy calls (comparative judgment resists the score-compression that clusters everything at 6–8). + - `MANUAL.md` — token-budget directive for workflows; `skills/nuclear-review/SKILL.md` — its example workflow reframed as a template to adapt, not run verbatim. + - **Deliberately not done**: the proposed `/ship` → `bun run proof` consolidation was rejected — `bun run proof` is a cc-settings-repo-local script, and `/ship` runs in *any* project, so the cut would have broken it everywhere else. Caught by verifying the (adversarially-generated) cut list instead of trusting it. - **Shared team-knowledge migrated from GitHub Project #7 to a markdown repo** (`darkroomengineering/team-knowledge`). The board was structurally hostile to its primary consumers (agents): network-gated GraphQL reads, not greppable offline, and no linter-enforced structure — `gh project item-create` never set the `Kind` field, so every `/share-learning` post landed `kind: None` and was invisible to a Kind-filtered query. The repo is one note per file + a generated `INDEX.md`, mirroring the local auto-memory tier. Decision + roadmap: `docs/plans/knowledge-repo-migration.md` (weighted comparison scored repo 795 vs board 515). - **New** — `src/schemas/knowledge.ts` (zod frontmatter contract: `name`/`kind`/`tags`/`added-by`/`supersedes`), `src/lib/lint-knowledge.ts` + `src/scripts/lint-knowledge.ts` (`bun run lint:knowledge`), `src/scripts/new-note.ts` (`bun run new-note`), `tests/lint-knowledge.test.ts`, emitted `schemas/knowledge.schema.json`. - **Changed** — `/share-learning` now reads `INDEX.md` for dedup and writes notes via `gh api` (was `gh project item-list`/`item-create`); `docs/knowledge-system.md`, the `AGENTS.md` Knowledge Routing section, and assorted docs retargeted; env `KNOWLEDGE_PROJECT_NUMBER` → `KNOWLEDGE_REPO`. @@ -25,6 +31,8 @@ All notable changes to cc-settings are documented here. ### Fixed +- **Stale dynamic-workflow trigger keyword** — `skills/orchestrate/SKILL.md` said the one-shot trigger was the word `workflow`; the force-keyword was renamed `workflow` → `ultracode` in CC v2.1.160. Corrected to `ultracode` (natural-language "use a workflow" still works as opt-in). +- **`skills/ship/SKILL.md` had two `Step 8` headers** — the post-push CI-watch step is now `Step 9`. - **`--rollback` now restores `~/.claude.json`** — `createBackup`/`cmdRollback` in `src/setup.ts` archived only `settings.json`/`CLAUDE.md`/`AGENTS.md` under `~/.claude`, so the MCP config `installMcpToClaudeJson` rewrites (at `~/.claude.json`, *outside* `~/.claude`) had no rollback path. Backups are now `$HOME`-relative and include it; `cmdRollback` sniffs the archive layout so older `~/.claude`-relative backups still restore to the right place. - **`src/scripts/session-title.ts`** read the prompt from `process.env.PROMPT` only, silently no-op'ing whenever Claude Code delivered `UserPromptSubmit` data via stdin JSON (the primary path). It now reads via `readHookInput<{ session_id, prompt }>` with env fallback, matching the sibling `delegation-detector.ts` hook on the same event. diff --git a/CLAUDE-FULL.md b/CLAUDE-FULL.md index 2f4910d..4c7bc7b 100644 --- a/CLAUDE-FULL.md +++ b/CLAUDE-FULL.md @@ -36,6 +36,7 @@ The Edit tool uses exact string matching. Follow these rules: - **Code review on changes touching 3+ files** → `Agent(reviewer, "...")` - **Expert second opinions / blast-radius / "why is this here" questions** → `Agent(explore, "...")` - **Full-feature orchestration across 3+ agents** → `Agent(maestro, "...")` +- **Tasks structurally prone to single-window failure** — agentic laziness (quitting at 20 of 50 items), self-preferential bias (judging your own output), goal drift across compaction → a [dynamic workflow](https://code.claude.com/docs/en/workflows) / `/effort ultracode` (see `skills/orchestrate/SKILL.md`) > **Briefing contract for `implementer`**: as a subagent it gets only your prompt — no conversation context, none of the files you've read — so every prompt MUST contain actual content, not references: the user's ask verbatim, exact file paths and line ranges, the change to make (paste the planner output; never write "based on findings" or "according to plan"), the verification command, and a scope boundary. Thin prompts cause regressions; the agent will refuse them. It runs in the live working tree and leaves changes **uncommitted** for you to review before they land. Full contract: `agents/implementer.md` REQUIRED BRIEFING. This applies equally to `explore` → `implementer` and `planner` → `implementer` chains. diff --git a/MANUAL.md b/MANUAL.md index f5982aa..acd3c22 100644 --- a/MANUAL.md +++ b/MANUAL.md @@ -329,7 +329,7 @@ The idea (from the "Orchestration Tax"): your review throughput is the real bott Say: *"think harder"* or *"quick fix"* -Triggers `/effort` — adjusts reasoning depth. Levels: `low`, `medium`, `high`, `xhigh`, `max`. cc-settings pins **`xhigh`** as the default via `CLAUDE_CODE_EFFORT_LEVEL` — on Opus 4.8 the model's own default is `high`, and the pin overrides it to hold reasoning depth steady across model upgrades. `ultracode` is a session-only mode (`/effort ultracode`) that layers automatic [dynamic-workflow](https://code.claude.com/docs/en/workflows) orchestration on top of `xhigh`; it can't be persisted as an effort level. +Triggers `/effort` — adjusts reasoning depth. Levels: `low`, `medium`, `high`, `xhigh`, `max`. cc-settings pins **`xhigh`** as the default via `CLAUDE_CODE_EFFORT_LEVEL` — on Opus 4.8 the model's own default is `high`, and the pin overrides it to hold reasoning depth steady across model upgrades. `ultracode` is a session-only mode (`/effort ultracode`) that layers automatic [dynamic-workflow](https://code.claude.com/docs/en/workflows) orchestration on top of `xhigh`; it can't be persisted as an effort level. Workflows use more tokens than a single window — cap one by prompting a budget (e.g. _"use a workflow, budget 10k tokens"_). ### Model on AWS / Bedrock / Vertex / Foundry diff --git a/skills/nuclear-review/SKILL.md b/skills/nuclear-review/SKILL.md index e2e8ddf..0155d06 100644 --- a/skills/nuclear-review/SKILL.md +++ b/skills/nuclear-review/SKILL.md @@ -24,7 +24,7 @@ A typical sequence: `/nuclear-review` produces findings → engineers cherry-pic > **Tip (Claude Code v2.1.154+)**: run `/effort ultracode` before invoking this skill. The whole-codebase audit is the canonical shape that benefits from dynamic workflows — phase state lives in the workflow script rather than Claude's context window, individual modules can be reviewed in parallel (up to 16 concurrent), and the run can resume from cached agent results within the session. > -> A ready-made example ships at `references/nuclear-review.workflow.js` (installed to `~/.claude/skills/nuclear-review/references/`). It is **opt-in, not a dependency** — the skill above works with no Workflow tool. Run it with `Workflow({ scriptPath: "~/.claude/skills/nuclear-review/references/nuclear-review.workflow.js" })`, or copy it into `.claude/workflows/`. It maps the repo, fans out one structural reviewer per module, audits dependencies, and synthesizes one report. +> A ready-made example ships at `references/nuclear-review.workflow.js` (installed to `~/.claude/skills/nuclear-review/references/`). It is **opt-in, not a dependency** — the skill above works with no Workflow tool. Run it with `Workflow({ scriptPath: "~/.claude/skills/nuclear-review/references/nuclear-review.workflow.js" })`, or copy it into `.claude/workflows/`. It maps the repo, fans out one structural reviewer per module, audits dependencies, and synthesizes one report. Treat it as a **template** — adapt its module list, schemas, and phases to the repo at hand rather than running it verbatim. ## Scope diff --git a/skills/oracle/SKILL.md b/skills/oracle/SKILL.md index 199d916..b6644be 100644 --- a/skills/oracle/SKILL.md +++ b/skills/oracle/SKILL.md @@ -208,6 +208,8 @@ Produce the final output in ADR format. Score format: raw (weighted) where weighted = raw * weight ``` +> **When the call is close or taste-heavy, prefer pairwise judgment over absolute scores.** Absolute 1–10 scoring drifts and compresses — everything clusters at 6–8, and the "winner" can hinge on one evaluator's mood. Comparative judgment is more reliable: have evaluators judge A-vs-B head-to-head and run a small tournament (each comparison its own call). The weighted matrix above stays right for criteria-driven calls with hard trade-offs; reach for pairwise when the totals land within a few points of each other or the decision is about taste (naming, design, API feel). + > This extends the base ADR template from `agents/planner.md` with scoring matrix and detailed risk sections. #### ADR Template diff --git a/skills/orchestrate/SKILL.md b/skills/orchestrate/SKILL.md index c81d03d..df91d2f 100644 --- a/skills/orchestrate/SKILL.md +++ b/skills/orchestrate/SKILL.md @@ -55,12 +55,22 @@ Use full parallel team fan-out instead of sequential subagent delegation when: ### Alternative: dynamic workflows (research preview, v2.1.154+) -Tasks that match the **Large codebase analysis** or **wide-blast-radius migration** shapes above can alternatively be expressed as a [dynamic workflow](https://code.claude.com/docs/en/workflows) — a JavaScript orchestration script that holds plan state outside Claude's context window, supports up to 16 concurrent agents per script (1000 total), and resumes from cached agent results within a session. Two entry points: +A [dynamic workflow](https://code.claude.com/docs/en/workflows) is a JS harness that spawns subagents, holds plan state *outside* your context window, runs up to 16 agents concurrently (1000 total), and resumes from cached results within a session. The trigger isn't task *size* — it's whether the task risks one of three failure modes a single context window is prone to: -- One-shot: include the word `workflow` in your prompt (e.g. _"Run a workflow to audit all SVG icons for accessibility issues"_) — Claude generates and runs the script. -- Session-wide: `/effort ultracode` — Claude auto-orchestrates a workflow for every substantive task in the session. +- **Agentic laziness** — stopping at 20 of 50 items and declaring done. → *fan-out-and-synthesize*: one agent per item, barrier-join the results. +- **Self-preferential bias** — preferring your own output when you're also the judge. → *adversarial verification*: a separate agent refutes each finding; or a *tournament* of pairwise comparisons (more reliable than absolute scoring for ranking or taste). +- **Goal drift** — losing "don't do X" constraints across compaction. → each subagent gets a focused, isolated goal that can't drift. -The maestro `Agent()` fan-out documented above is the **default** orchestration mode in cc-settings; workflows are an option to reach for when you need replayability or scale beyond what subagent fan-out provides. Don't rewire skills to *depend* on the Workflow tool — its API is still preview-stage — but a skill may ship an *opt-in* example workflow (see `nuclear-review`'s `references/nuclear-review.workflow.js`): an example you run by choice, never a runtime dependency. +Shapes worth naming when you build one: **classify-and-act**, **fan-out-and-synthesize**, **generate-and-filter**, **tournament**, **loop-until-done** (spawn until a stop condition, not a fixed count). Not only for marathons — a **quick workflow** is valid: _"quick workflow to adversarially check this one assumption."_ + +- **Budget** — workflows burn more tokens; cap with _"…budget 10k tokens"_ and the harness enforces it. +- **Quarantine** — for triage over untrusted input, agents that read public/untrusted content must not also take privileged actions; split reading from acting so an injected page can't trigger a privileged step. + +Two entry points: +- One-shot: say _"use a workflow to …"_ or the keyword `ultracode` in your prompt (the force-keyword was renamed `workflow` → `ultracode` in v2.1.160). Pair with `/loop` for repeatable triage/verification/research. +- Session-wide: `/effort ultracode` — auto-orchestrates a workflow for every substantive task. + +The maestro `Agent()` fan-out above is the **default** in cc-settings; workflows are for replayability or scale beyond subagent fan-out. Don't rewire skills to *depend* on the Workflow tool — its API is still preview-stage — but a skill may ship an *opt-in* example (see `nuclear-review`'s `references/nuclear-review.workflow.js`): a template you adapt, never a runtime dependency. ## Output diff --git a/skills/ship/SKILL.md b/skills/ship/SKILL.md index 8f66774..2e0bc1a 100644 --- a/skills/ship/SKILL.md +++ b/skills/ship/SKILL.md @@ -141,7 +141,7 @@ Write "What this does" from the *diff and its purpose*, not by pasting commit subjects. If you can't explain it plainly, the change is unclear — say so, don't reach for bigger words. -### Step 8: Watch CI Until Green (post-push) +### Step 9: Watch CI Until Green (post-push) After `gh pr create`, watch the PR-attached checks until all pass. Use `gh pr checks` as the source of truth — it covers all PR checks, not just GitHub Actions runs (which `gh run list` would miss). From 5f4c4142d79c2a480f8ec052f3a3610ff73624e5 Mon Sep 17 00:00:00 2001 From: arzafran Date: Wed, 3 Jun 2026 11:18:02 -0300 Subject: [PATCH 2/2] refactor: remove dead TLDR session-stats telemetry pair MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit track-tldr.ts accumulated an *estimated* token-savings counter that only tldr-stats.ts read — to print a vanity box at session end, then delete the file. Closed read-loop, no external consumer; removing it loses only the end-of-session estimate display (TLDR itself is unaffected). Verified load-bearing and KEPT (not telemetry): swarm-log.ts feeds /review-batch (review-batch.ts reads ~/.claude/swarm.log); session-title.ts powers 'claude --resume '. Removes both scripts, their hook registrations (PostToolUse mcp__tldr, SessionEnd tldr-stats), the integration test, and doc rows. handoff.ts create still runs at SessionEnd. --- CHANGELOG.md | 1 + config/40-hooks.json | 13 +--------- docs/hooks-reference.md | 9 +------ hooks/README.md | 3 +-- src/scripts/tldr-stats.ts | 37 ---------------------------- src/scripts/track-tldr.ts | 47 ------------------------------------ tests/audit-hooks.test.ts | 2 +- tests/phase2-scripts.test.ts | 26 -------------------- 8 files changed, 5 insertions(+), 133 deletions(-) delete mode 100644 src/scripts/tldr-stats.ts delete mode 100644 src/scripts/track-tldr.ts diff --git a/CHANGELOG.md b/CHANGELOG.md index 1850f28..9279309 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -12,6 +12,7 @@ All notable changes to cc-settings are documented here. - **Removed** — `.github/workflows/upstream-sync.yml`; `openSyncPr`/`saveManifest` and the `--open-pr` branch in `src/upstream/scan.ts` (now a pure dry-run detector, no `writeFile`/`runProcessFull` imports); stale "daily GH Action"/"bot owns this file" references in the manifest `description`/`source`, `skills/cc/SKILL.md`, `src/schemas/skill.ts`, and the `.github/workflows/ci.yml` `upstream-scan` comment. - **Kept** — `bun run upstream:scan` (the dry-run detector) and its non-gating CI job, now the manual way to spot drift before running `/cc sync`. - **Upstream sync to Claude Code 2.1.161.** Manifest bumped `2.1.160` → `2.1.161`. The release is otherwise all bug fixes / UI / perf with no cc-settings surface; the one tracked addition is `CLAUDE_CODE_TMPDIR` (surfaced by the `EADDRINUSE`/Unix-socket fix), added to the manifest `knownEnvVars` and the `docs/settings-reference.md` env-var table. `OTEL_RESOURCE_ATTRIBUTES` now feeds metric-datapoint labels but is a generic OTEL SDK var outside our CC-specific tracking convention — deliberately skipped. +- **Removed the dead TLDR session-stats telemetry pair** (`src/scripts/track-tldr.ts` + `tldr-stats.ts`, their hook registrations, tests, and doc rows). It was a closed read-loop — track-tldr accumulated an *estimated* token-savings counter that only tldr-stats read, to print a vanity box at session end. No external consumer; TLDR itself is unaffected. Verified `swarm-log.ts` (feeds `/review-batch`) and `session-title.ts` (powers `claude --resume `) are load-bearing and kept. - **Folded Anthropic's dynamic-workflows learnings into existing guidance — no new files, no new skills.** Distilled the "A harness for every task" write-up into the places that already decide *when to orchestrate*, rather than adding surface: - `skills/orchestrate/SKILL.md` — the dynamic-workflows section now leads with the **trigger** (the three single-window failure modes: agentic laziness, self-preferential bias, goal drift) instead of task type, names the canonical shapes (classify-and-act, fan-out-and-synthesize, generate-and-filter, tournament, loop-until-done), and surfaces quick workflows, token budgets, and the quarantine pattern for untrusted-input triage. - `CLAUDE-FULL.md` — failure-mode trigger added to the delegation decision tree. diff --git a/config/40-hooks.json b/config/40-hooks.json index 56f84f6..f826b6a 100644 --- a/config/40-hooks.json +++ b/config/40-hooks.json @@ -89,17 +89,6 @@ } ] }, - { - "matcher": "mcp__tldr", - "hooks": [ - { - "type": "command", - "command": "bun \"$HOME/.claude/src/scripts/track-tldr.ts\" \"$TOOL_NAME\"", - "async": true, - "timeout": 10 - } - ] - }, { "matcher": "Bash", "hooks": [ @@ -202,7 +191,7 @@ "hooks": [ { "type": "command", - "command": "bun \"$HOME/.claude/src/scripts/tldr-stats.ts\"; bun \"$HOME/.claude/src/scripts/handoff.ts\" create", + "command": "bun \"$HOME/.claude/src/scripts/handoff.ts\" create", "async": true, "timeout": 30 } diff --git a/docs/hooks-reference.md b/docs/hooks-reference.md index a4d8b92..49808f7 100644 --- a/docs/hooks-reference.md +++ b/docs/hooks-reference.md @@ -353,12 +353,6 @@ Logs are used by the `/audit` skill (`claude-audit.ts`) to analyze command patte **Retention:** Controlled by `CLAUDE_LOG_RETENTION_DAYS` env var (default: 1 day, today only). Old logs are pruned automatically on each hook fire. -### PostToolUse (mcp__tldr matcher) - -| Script | Purpose | Async | -|--------|---------|-------| -| `track-tldr.ts "$TOOL_NAME"` | Tracks TLDR MCP usage statistics | Yes | - ### PostToolUse (no matcher — every tool) | Script | Purpose | Async | @@ -411,7 +405,7 @@ Logs are used by the `/audit` skill (`claude-audit.ts`) to analyze command patte | Script | Purpose | Async | |--------|---------|-------| -| `tldr-stats.ts` + `handoff.ts create` | Prints TLDR session stats and saves final handoff state | Yes | +| `handoff.ts create` | Saves final handoff state at session end | Yes | ### CwdChanged @@ -490,7 +484,6 @@ Run a Claude Code session and trigger the relevant event. Check logs for output | `~/.claude/swarm.log` | Subagent start/stop and task created/completed events | | `~/.claude/sessions.log` | Session lifecycle events | | `~/.claude/session-titles/` | Per-session title flags (set once per session by `session-title.ts`) | -| `~/.claude/tldr-session-stats.json` | TLDR tool usage statistics per session | | `~/.claude/logs/tool-failures.log` | Tool failure events (from `post-failure.ts`) | | `~/.claude/safety-net.log` | Blocked command audit log (from `safety-net.ts`) | | `~/.claude/logs/bash-*.log` | Daily Bash command logs (from `log-bash.ts`, analyzed by `/audit`) | diff --git a/hooks/README.md b/hooks/README.md index b3c9ab5..6d859fa 100644 --- a/hooks/README.md +++ b/hooks/README.md @@ -20,7 +20,6 @@ For full hook documentation (all 29 events, configuration format, matchers, debu | `PreToolUse` | Before Edit calls (Edit matcher) | `pre-edit-validate.ts` (harness optimization) | No | | `PostToolUse` | After Write/Edit | `post-edit.ts` (auto-format with Biome) | No | | `PostToolUse` | After Write/Edit | `post-edit-tsc.ts` (async TypeScript type check) | **Yes** | -| `PostToolUse` | After TLDR MCP calls | `track-tldr.ts` (usage stats) | **Yes** | | `PostToolUse` | After Bash commands | `log-bash.ts` (command audit log) | **Yes** | | `PostToolUse` | After every tool (no matcher) | `parallelmax-nudge.ts` (counts consecutive non-Agent calls; at N=8 injects delegation reminder; resets on Agent tool; 60s debounce) | No | | `PostToolUseFailure` | Tool execution fails | `post-failure.ts` (logs failures, warns on repeats) | No | @@ -28,7 +27,7 @@ For full hook documentation (all 29 events, configuration format, matchers, debu | `PostCompact` | After context compaction | `post-compact.ts` (recovery steps reminder) | No | | `Stop` | Claude finishes | `stop-summary.ts` (learning reminder if >5 files changed) | No | | `StopFailure` | Turn ends due to API error | `stop-failure.ts` (logs, surfaces rate limit info) | No | -| `SessionEnd` | Session ending | `tldr-stats.ts` + `handoff.ts create` | **Yes** | +| `SessionEnd` | Session ending | `handoff.ts create` | **Yes** | | `SubagentStart` | Subagent spawns | Logs to `~/.claude/swarm.log` | **Yes** | | `SubagentStop` | Subagent finishes | Logs to `~/.claude/swarm.log` | **Yes** | | `Notification` | Task completion | `notify.ts` (macOS/Linux notification) | **Yes** | diff --git a/src/scripts/tldr-stats.ts b/src/scripts/tldr-stats.ts deleted file mode 100644 index ef2ea16..0000000 --- a/src/scripts/tldr-stats.ts +++ /dev/null @@ -1,37 +0,0 @@ -#!/usr/bin/env bun -// SessionEnd hook companion — display + reset TLDR token-saving stats. -// Port of scripts/tldr-stats.sh. - -import { unlink } from "node:fs/promises"; -import { homedir } from "node:os"; -import { join } from "node:path"; - -const STATS_FILE = join(homedir(), ".claude", "tldr-session-stats.json"); - -const f = Bun.file(STATS_FILE); -if (!(await f.exists())) process.exit(0); - -let calls = 0; -let tokens = 0; -try { - const raw = (await f.json()) as { calls?: number; tokens_saved?: number }; - calls = typeof raw.calls === "number" ? raw.calls : 0; - tokens = typeof raw.tokens_saved === "number" ? raw.tokens_saved : 0; -} catch { - // Malformed file — same as bash `// 0` fallback. -} - -if (calls > 0) { - // Parity with the bash-drawn box. printf %-39s ensures consistent width. - const pad = (s: string): string => s + " ".repeat(Math.max(0, 39 - s.length)); - console.log(""); - console.log("┌─────────────────────────────────────────┐"); - console.log("│ 📊 TLDR Session Stats │"); - console.log("├─────────────────────────────────────────┤"); - console.log(`│ ${pad(`Calls: ${calls}`)} │`); - console.log(`│ ${pad(`Est. tokens saved: ~${tokens}`)} │`); - console.log("└─────────────────────────────────────────┘"); -} - -// Reset for next session. -await unlink(STATS_FILE).catch(() => {}); diff --git a/src/scripts/track-tldr.ts b/src/scripts/track-tldr.ts deleted file mode 100644 index 365929d..0000000 --- a/src/scripts/track-tldr.ts +++ /dev/null @@ -1,47 +0,0 @@ -#!/usr/bin/env bun -// PostToolUse hook for mcp__tldr tools — increment session stats counter. -// Port of scripts/track-tldr.sh. Receives the tool name as argv[0]. -// -// State file: ~/.claude/tldr-session-stats.json (atomic write via tmp + rename). - -import { mkdir } from "node:fs/promises"; -import { homedir } from "node:os"; -import { dirname, join } from "node:path"; -import { atomicWriteString } from "../lib/json-io.ts"; - -const STATS_FILE = join(homedir(), ".claude", "tldr-session-stats.json"); -const toolName = process.argv[2] ?? "unknown"; - -// Same savings table as scripts/track-tldr.sh. -function savingsFor(name: string): number { - if (name.includes("context")) return 500; - if (name.includes("semantic")) return 1000; - if (name.includes("impact") || name.includes("slice")) return 800; - if (name.includes("arch")) return 600; - return 300; -} - -type Stats = { calls: number; tokens_saved: number }; - -async function loadStats(): Promise { - const f = Bun.file(STATS_FILE); - if (!(await f.exists())) return { calls: 0, tokens_saved: 0 }; - try { - const raw = (await f.json()) as Partial; - return { - calls: typeof raw.calls === "number" ? raw.calls : 0, - tokens_saved: typeof raw.tokens_saved === "number" ? raw.tokens_saved : 0, - }; - } catch { - return { calls: 0, tokens_saved: 0 }; - } -} - -const current = await loadStats(); -const next: Stats = { - calls: current.calls + 1, - tokens_saved: current.tokens_saved + savingsFor(toolName), -}; - -await mkdir(dirname(STATS_FILE), { recursive: true }); -await atomicWriteString(STATS_FILE, `${JSON.stringify(next)}\n`); diff --git a/tests/audit-hooks.test.ts b/tests/audit-hooks.test.ts index 56ea12e..eb71579 100644 --- a/tests/audit-hooks.test.ts +++ b/tests/audit-hooks.test.ts @@ -35,7 +35,7 @@ describe("classifyHookCommand — trusted patterns", () => { test("compound of two trusted commands", () => { const r = classifyHookCommand( - 'bun "$HOME/.claude/src/scripts/tldr-stats.ts"; bun "$HOME/.claude/src/scripts/handoff.ts" create', + 'bun "$HOME/.claude/src/scripts/session-start.ts"; bun "$HOME/.claude/src/scripts/handoff.ts" create', ); expect(r.severity).toBe("trusted"); }); diff --git a/tests/phase2-scripts.test.ts b/tests/phase2-scripts.test.ts index 9538234..eb3ac27 100644 --- a/tests/phase2-scripts.test.ts +++ b/tests/phase2-scripts.test.ts @@ -189,32 +189,6 @@ describe("post-edit.ts", () => { }); }); -describe("track-tldr.ts + tldr-stats.ts", () => { - test("track increments a stats file, stats reads + clears it", async () => { - const { mkdtempSync, rmSync } = await import("node:fs"); - const { tmpdir } = await import("node:os"); - const { join } = await import("node:path"); - const sandbox = mkdtempSync(join(tmpdir(), "cc-tldr-test-")); - try { - // We can't easily redirect HOME per spawn — just call the scripts in sequence - // with HOME overridden. - const env = { HOME: sandbox }; - const rTrack = await run("track-tldr.ts", { env, args: ["mcp__tldr__semantic"] }); - expect(rTrack.exit).toBe(0); - const rStats = await run("tldr-stats.ts", { env }); - expect(rStats.exit).toBe(0); - expect(rStats.stdout).toContain("Calls: 1"); - expect(rStats.stdout).toContain("1000"); // semantic = 1000 saved - // After stats runs, file should be deleted. - const rStats2 = await run("tldr-stats.ts", { env }); - expect(rStats2.exit).toBe(0); - expect(rStats2.stdout).toBe(""); - } finally { - rmSync(sandbox, { recursive: true, force: true }); - } - }); -}); - describe("log-bash.ts", () => { test("logs a bash command line to dated file", async () => { const { mkdtempSync, rmSync, readdirSync, readFileSync } = await import("node:fs");