feat: expand HomeSec-Bench to 143 tests, add perf metrics, enable ski…#163

Merged
solderzzc merged 2 commits into develop from feature/benchmark-thinking-mode-fix
Mar 18, 2026

@solderzzc
Member

…ll auto-start

  • Update benchmark paper: 131→143 tests (VLM Scene 35→47, 3 new dedup scenarios, 4 new tool-use scenarios)
  • Add performance metrics to run-benchmark.cjs (TTFT, decode throughput tracking)
  • Fix tool_call argument serialization for non-string arguments
  • Enable auto_start for yolo-detection-2026 and depth-estimation skills
  • Add LaTeX build artifacts .gitignore
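
As a rough illustration of the metrics and serialization items above, here is a minimal sketch in CommonJS-style JavaScript. It is not the code from this PR: the helper names `serializeToolArguments` and `computePerfMetrics`, the field names, and the millisecond-timestamp inputs are all hypothetical. It assumes an OpenAI-style chat API, where `tool_call` arguments are sent as a JSON string, and it assumes TTFT is measured from request start to the first streamed token.

```javascript
// Hypothetical helpers for illustration only; names and shapes are assumptions,
// not taken from run-benchmark.cjs.

// OpenAI-style chat APIs carry tool_call arguments as a JSON string.
// Pass strings through unchanged and stringify everything else.
function serializeToolArguments(args) {
  if (typeof args === 'string') return args; // already serialized
  if (args === undefined || args === null) return '{}'; // no arguments
  return JSON.stringify(args);
}

// Derive time-to-first-token (TTFT) and decode throughput from timestamps in ms.
function computePerfMetrics({ requestStartMs, firstTokenMs, lastTokenMs, completionTokens }) {
  const ttftMs = firstTokenMs - requestStartMs;
  const decodeMs = lastTokenMs - firstTokenMs;
  // The first token marks the start of the decode window, so only the
  // remaining tokens count toward decode throughput.
  const decodedTokens = Math.max(completionTokens - 1, 0);
  const decodeTokensPerSec = decodeMs > 0 ? (decodedTokens / decodeMs) * 1000 : null;
  return { ttftMs, decodeTokensPerSec };
}
```

Under these assumptions, a request whose first token arrives after 200 ms and whose remaining 50 tokens arrive over the next second has a TTFT of 200 ms and a decode throughput of 50 tokens/s. Throughput is reported as `null` when the decode window has zero length, which avoids a division by zero.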

solderzzc and others added 2 commits March 17, 2026 20:52
@solderzzc solderzzc merged commit 7d117e9 into develop Mar 18, 2026
1 check passed
@solderzzc solderzzc deleted the feature/benchmark-thinking-mode-fix branch March 18, 2026 03:55