Skip to content

routine: critic-effort-levels (2026-05-25)#13

Draft
jim4226 wants to merge 1 commit into
mainfrom
claude/daily-2026-05-25-critic-effort-levels
Draft

routine: critic-effort-levels (2026-05-25)#13
jim4226 wants to merge 1 commit into
mainfrom
claude/daily-2026-05-25-critic-effort-levels

Conversation

@jim4226
Copy link
Copy Markdown
Owner

@jim4226 jim4226 commented May 25, 2026

Summary

Adds CriticEffortLevel (low/medium/high/max) to csis/verification/critic_stack.py so the Coordinator can request a cheap sanity-check critic inside tight per-iteration budgets and an exhaustive pre-promotion critic at the promotion gate — without hard-coding token counts at every call site.

Source

  • URL: https://code.claude.com/docs/en/changelog (v2.1.147, 2026-05-21)
  • Key entry: "Renamed /simplify to /code-review with effort levels (e.g., /code-review high)" — parameterised review depth is now a first-class concept in Claude Code tooling.

Theme

Theme 5 — Self-improvement loops. The Critic is the adversarial evaluation gate in CSIS's critique-fix cycle. Currently run_critic() has a single min_attempts knob; callers must pick a number without knowing what the cost-vs-thoroughness trade-off looks like. Effort levels make that trade-off explicit and controlled by the Coordinator, which is the right abstraction layer.

What changed

  • csis/verification/critic_stack.py: CriticEffortLevel enum (low/medium/high/max); frozen _EffortParams(min_attempts, max_tokens); EFFORT_PARAMS dict; run_critic() gains effort: CriticEffortLevel = CriticEffortLevel.medium keyword arg. Callers that pass min_attempts explicitly keep that value (overrides effort-derived count for backward compat).
  • tests/test_verification.py: four new tests — ordering invariant, prompt-captures-count, explicit-override, medium-matches-historical-baseline.

No cycle-9 chokepoints touched

The single chokepoint is run_critic() — a leaf function. Coordinator.__init__, _BackendTracker, writer_iteration_id, and the promotion CAS are untouched. The default (CriticEffortLevel.medium) maps to min_attempts=3, max_tokens=2000, exactly the Phase-0 baseline, so all existing callers are unmodified.

Test plan

python -m pytest tests/test_verification.py -v  # 16 passed (12 before + 4 new)
python -m pytest tests/ -q                       # 254 passed, 0 failed

Generated by Claude Code

Claude Code v2.1.147 (2026-05-21) renamed /simplify to /code-review and
introduced effort tiers (low/medium/high/max). The same parameterisation
belongs in CSIS's Critic: the Coordinator needs to call a cheap critic
inside tight per-iteration budgets and an exhaustive one at promotion
gates, without hard-coding token counts at every call site.

Add CriticEffortLevel (low/medium/high/max), _EffortParams (min_attempts
+ max_tokens), and EFFORT_PARAMS mapping. run_critic() gains an `effort`
kwarg (default: medium, preserving the Phase-0 baseline of 3 attempts /
2 000 tokens). Callers that pass `min_attempts` explicitly keep that
value; it overrides the effort-derived count for backward compat.

Adds four regression tests: ordering invariant (higher effort → more
attempts + tokens), prompt captures the correct count, explicit override
works, and medium matches the historical baseline.

https://claude.ai/code/session_01PnhitjmmouJzNbN5zwxVfU
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants