Add thinking-level benchmarking support (off/minimal/low/medium/high/xhigh/adaptive) #76
Open
ForceConstant wants to merge 5 commits into pinchbench:main
- Add --thinking CLI argument to specify comma-separated thinking levels
- Pass thinking level to OpenClaw agent via --thinking flag
- Run each task across all specified thinking levels
- Include thinking_level in task results
- Add thinking_aggregates section with per-level statistics
- Support levels: off, minimal, low, medium, high
- Update SKILL.md and README.md with documentation

Closes pinchbench#9
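A rough sketch of how the comma-separated `--thinking` value and the per-level `thinking_aggregates` could fit together. This is not the code from the PR: `parse_thinking_arg`, `aggregate_by_level`, and the `score` field are hypothetical names used only for illustration; only `--thinking`, `thinking_level`, and `thinking_aggregates` come from the commit message.

```python
import argparse
from statistics import mean
from typing import Dict, List, Optional


def parse_thinking_arg(argv: Optional[List[str]] = None) -> List[str]:
    """Split a comma-separated --thinking value into normalized level names."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--thinking", default=None,
                        help="Comma-separated thinking levels, e.g. low,high")
    args = parser.parse_args(argv)
    if args.thinking is None:
        return []
    # Trim whitespace, lowercase, and drop empty entries such as "low,,high".
    return [part.strip().lower() for part in args.thinking.split(",") if part.strip()]


def aggregate_by_level(results: List[dict]) -> Dict[str, dict]:
    """Group task results by thinking_level and compute simple statistics.

    Assumes each result carries a numeric "score" (hypothetical field name).
    """
    grouped: Dict[str, List[float]] = {}
    for result in results:
        grouped.setdefault(result["thinking_level"], []).append(result["score"])
    return {
        level: {"runs": len(scores), "mean_score": mean(scores)}
        for level, scores in grouped.items()
    }
```

With this shape, each task would be run once per parsed level, and the returned dictionary could be stored under a `thinking_aggregates` key in the results file.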
- Add xhigh and adaptive to valid thinking levels (matching OpenClaw)
- Add model-aware xhigh validation (only GPT-5.x models support it)
- Validate thinking levels before passing to OpenClaw subprocess
- Document model-specific restrictions in help text and docs
- Follow existing code style (Optional[str] instead of str | None)
- No unnecessary changes to existing code
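The model-aware validation described in this commit might look roughly like the following. It is a minimal sketch under stated assumptions, not the PR's implementation: `supports_xhigh` and `validate_levels` are hypothetical helpers, and treating any model id that contains `gpt-5` as "GPT-5.x" is an assumption made for illustration.

```python
from typing import List, Optional

# The level names listed in the PR title.
VALID_LEVELS = ("off", "minimal", "low", "medium", "high", "xhigh", "adaptive")


def supports_xhigh(model: Optional[str]) -> bool:
    """Assumption: a model id containing "gpt-5" stands in for "GPT-5.x"."""
    return model is not None and "gpt-5" in model.lower()


def validate_levels(levels: List[str], model: Optional[str]) -> List[str]:
    """Keep only levels that are known and allowed for the given model."""
    valid: List[str] = []
    for level in levels:
        if level not in VALID_LEVELS:
            continue  # unknown level: never forwarded to the subprocess
        if level == "xhigh" and not supports_xhigh(model):
            continue  # xhigh is restricted to models that support it
        valid.append(level)
    return valid
```

Filtering before the subprocess call means only strings from a fixed allow-list ever reach the OpenClaw command line.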
- Add strict xhigh model matching (provider-aware)
- Add adaptive support detection (Anthropic Claude 4.6 family)
- Deduplicate requested thinking levels while preserving order
- Fail fast when --thinking is provided but no valid levels remain
- Keep subprocess input constrained to validated levels
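Order-preserving deduplication and the fail-fast check could be sketched as below. Both function names and the error message are hypothetical; the PR may structure this differently.

```python
from typing import List


def dedupe_preserving_order(levels: List[str]) -> List[str]:
    """Drop repeated levels, keeping the first occurrence of each."""
    # dict preserves insertion order (Python 3.7+), so this keeps request order.
    return list(dict.fromkeys(levels))


def require_levels(requested: List[str], validated: List[str]) -> List[str]:
    """Abort early if --thinking was given but nothing survived validation."""
    if requested and not validated:
        raise SystemExit("--thinking was provided but no valid levels remain")
    return validated
```

Failing before any task starts avoids a benchmark run that silently falls back to a default thinking level the user did not ask for.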
That's why I didn't do the rebase: I uninstalled PinchBench, so I couldn't test :D. It was tested and working when I originally PR'd it. I'm dealing with other OpenClaw issues at the moment, so I can't test; hope someone can.
This is a rebase of #12 by @jb510
Note: I haven't been able to test this yet, as I am not currently set up to run it, but it does seem valid.