Skip to content

routine: opus-4-8-effort (2026-05-28)#20

Draft
jim4226 wants to merge 1 commit into
mainfrom
claude/daily-2026-05-28-opus-4-8-effort
Draft

routine: opus-4-8-effort (2026-05-28)#20
jim4226 wants to merge 1 commit into
mainfrom
claude/daily-2026-05-28-opus-4-8-effort

Conversation

@jim4226
Copy link
Copy Markdown
Owner

@jim4226 jim4226 commented May 28, 2026

Summary

Upgrades CSIS's builder checkpoint from claude-opus-4-7 to claude-opus-4-8 and exposes the new effort parameter so the Coordinator can tune reasoning depth per-call.

Source

  • URL: https://www.anthropic.com/news/claude-opus-4-8
  • Published: 2026-05-28
  • Key quote: "Opus 4.8 is roughly four times less likely than Opus 4.7 to overlook code flaws, actively flagging uncertainties about its work rather than making unsupported claims."

Theme

Theme 6 — Substrate / capability boundaries (also Theme 2 — Trust + verification). CSIS's Builder and Researcher agents run on the alpha checkpoint. Upgrading to a model that is 4× less likely to miss code flaws directly improves artifact quality before it reaches the Verifier. The effort parameter maps to CSIS's existing cost-vs-thoroughness design axis (Theme 5: self-improvement loops) — the same axis that informed CriticEffortLevel in PR #13.

What changed

File Change
csis/backends/anthropic.py _DEFAULT_MODEL_MAP "alpha"/"mock-alpha": claude-opus-4-7claude-opus-4-8; complete() passes effort to API when LLMRequest.effort is non-None
csis/backends/base.py LLMRequest gains `effort: str
tests/test_backends.py 7 new regression tests: effort field defaults, round-trips, model-map correctness, API pass-through when set, API omission when None

No cycle-9 chokepoints touched

_DEFAULT_MODEL_MAP and LLMRequest are leaf declarations. Coordinator.__init__, _BackendTracker, writer_iteration_id, and the promotion CAS are all untouched. The default effort=None means existing code paths pass no effort argument to the API — model's own default ("high" for Opus 4.8) applies, so runtime behaviour is identical to the previous model version until a caller explicitly sets effort.

Test plan

python -m pytest tests/test_backends.py -v   # 7 passed (all new)
python -m pytest tests/ -q                   # 257 passed, 0 failed

Generated by Claude Code

Claude Opus 4.8 (released 2026-05-28) is 4× less likely to overlook
code flaws than 4.7 — a direct quality gain for CSIS's Builder step.
It also introduces a per-call `effort` parameter (low/medium/high)
that the Coordinator can use to trade latency for reasoning depth.

Changes:
- _DEFAULT_MODEL_MAP "alpha"/"mock-alpha": 4-7 → 4-8
- LLMRequest gains `effort: str | None = None` (no change to callers)
- AnthropicBackend.complete() passes `effort` to the API iff non-None
- 7 regression tests (model-map, field round-trip, API pass-through)

https://claude.ai/code/session_019J1NTixfHK2kKzNVH5mwnF
This was referenced May 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants