Add thinking-level benchmarking support (off/minimal/low/medium/high/xhigh/adaptive)#76

Open
ForceConstant wants to merge 5 commits into pinchbench:main from ForceConstant:continue-thinking-levels

@ForceConstant

This is a rebase of #12 by @jb510

Note: I haven't been able to test this yet because I'm not currently set up to run it, but the changes look valid.

OpenClaw Agent and others added 5 commits March 25, 2026 17:12
- Add --thinking CLI argument to specify comma-separated thinking levels
- Pass thinking level to OpenClaw agent via --thinking flag
- Run each task across all specified thinking levels
- Include thinking_level in task results
- Add thinking_aggregates section with per-level statistics
- Support levels: off, minimal, low, medium, high
- Update SKILL.md and README.md with documentation

Closes pinchbench#9
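
For reviewers who want a concrete picture of the `thinking_aggregates` section described above, here is a minimal sketch of per-level aggregation. It is not taken from the diff: the function name `aggregate_by_thinking_level`, the `score` key, and the choice of task count plus mean score as the statistics are all assumptions for illustration. Only `thinking_level` and `thinking_aggregates` come from the commit message.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def aggregate_by_thinking_level(results: List[dict]) -> Dict[str, dict]:
    """Group task results by their thinking_level and summarize each group.

    Hypothetical helper: the real PR may compute different statistics.
    """
    scores_by_level: Dict[str, List[float]] = defaultdict(list)
    for result in results:
        # "score" is an assumed field name; "thinking_level" is from the PR.
        scores_by_level[result["thinking_level"]].append(result["score"])
    return {
        level: {"tasks": len(scores), "mean_score": mean(scores)}
        for level, scores in scores_by_level.items()
    }


# Each task is run once per requested thinking level.
results = [
    {"task": "t1", "thinking_level": "off", "score": 0.5},
    {"task": "t1", "thinking_level": "high", "score": 1.0},
    {"task": "t2", "thinking_level": "off", "score": 1.0},
    {"task": "t2", "thinking_level": "high", "score": 1.0},
]
thinking_aggregates = aggregate_by_thinking_level(results)
```

Keeping the grouping keyed on `thinking_level` means a run without `--thinking` could simply omit the section, though that behavior is also an assumption here.
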
- Add xhigh and adaptive to valid thinking levels (matching OpenClaw)
- Add model-aware xhigh validation (only GPT-5.x models support it)
- Validate thinking levels before passing to OpenClaw subprocess
- Document model-specific restrictions in help text and docs
- Follow existing code style (Optional[str] instead of str | None)
- No unnecessary changes to existing code
- Add strict xhigh model matching (provider-aware)
- Add adaptive support detection (Anthropic Claude 4.6 family)
- Deduplicate requested thinking levels while preserving order
- Fail fast when --thinking is provided but no valid levels remain
- Keep subprocess input constrained to validated levels
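
The validation steps listed in the last two commits (model-aware filtering, order-preserving deduplication, fail-fast) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the function names, the `provider/name` model-id format, the example model ids, and the exact matching rules for `xhigh` and `adaptive` are hypothetical. Silently dropping unsupported levels is also an assumption; the PR may warn or error instead.

```python
from typing import List, Optional

# Levels named in the PR title.
VALID_LEVELS = ("off", "minimal", "low", "medium", "high", "xhigh", "adaptive")


def supports_xhigh(model: str) -> bool:
    # Assumed rule: provider-prefixed ids such as "openai/gpt-5.2".
    provider, _, name = model.partition("/")
    return provider == "openai" and name.startswith("gpt-5")


def supports_adaptive(model: str) -> bool:
    # Assumed rule: Anthropic Claude ids that mention the 4.6 family.
    provider, _, name = model.partition("/")
    is_claude = provider == "anthropic" and name.startswith("claude")
    return is_claude and ("4-6" in name or "4.6" in name)


def parse_thinking_levels(raw: Optional[str], model: str) -> List[str]:
    """Return validated, deduplicated levels in the order they were requested."""
    if raw is None:
        return []  # --thinking was not provided
    levels: List[str] = []
    for item in raw.split(","):
        level = item.strip().lower()
        if level not in VALID_LEVELS:
            continue
        if level == "xhigh" and not supports_xhigh(model):
            continue
        if level == "adaptive" and not supports_adaptive(model):
            continue
        if level not in levels:  # deduplicate while preserving order
            levels.append(level)
    if not levels:
        # Fail fast: the flag was given but nothing usable remains.
        raise ValueError(f"no valid thinking levels in {raw!r} for model {model!r}")
    return levels
```

Because only the returned list would ever be handed to the OpenClaw subprocess, unvalidated user input never reaches the `--thinking` flag of the child process.
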
@jb510

jb510 commented Mar 25, 2026

That's why I didn't do the rebase: I uninstalled PinchBench, so I couldn't test it :D. It was tested and working when I originally opened the PR. I'm dealing with other OpenClaw issues at the moment, so I can't test it; I hope someone else can.
