
feat: Add benchmark CLI command for MCP tool timing (closes #63) #67

Merged
evansenter merged 2 commits into main from issue-63-benchmark on Jan 8, 2026

Conversation

@evansenter

Owner

Summary

  • Adds session-analytics-cli benchmark command to measure response times for all 26 MCP tools
  • Calculates timing statistics (median, p95, p99) and flags slow queries (>5s)
  • Updates CLAUDE.md to remind developers to add new tools to benchmark infrastructure
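For readers skimming the summary, the per-tool statistics can be sketched roughly as follows. This is a minimal sketch, not the code in cli.py: it assumes each percentile is read as sorted_times[int(n * p)] (the indexing the review below describes), and it assumes a tool counts as slow when its median exceeds 5 s, since the PR does not say which statistic the >5s check uses. The names percentile, summarize, and SLOW_THRESHOLD_S are hypothetical.

```python
from __future__ import annotations

import statistics

# Threshold from the PR description ("flags slow queries (>5s)").
SLOW_THRESHOLD_S = 5.0


def percentile(sorted_times: list[float], p: float) -> float:
    """Read a percentile as sorted_times[int(n * p)].

    Assumed indexing (as described in the review); for 0 <= p < 1 the
    index is always within range.
    """
    return sorted_times[int(len(sorted_times) * p)]


def summarize(times: list[float]) -> dict[str, float | bool]:
    """Median/p95/p99 for one tool's timings, plus a hypothetical slow flag."""
    ordered = sorted(times)
    median = statistics.median(ordered)
    return {
        "median": median,
        "p95": percentile(ordered, 0.95),
        "p99": percentile(ordered, 0.99),
        # Assumption: "slow" compares the median against the 5 s threshold.
        "slow": median > SLOW_THRESHOLD_S,
    }
```

For example, summarize([1.2, 1.5, 1.3]) gives a median of 1.3 and reports 1.5 (the maximum) for both p95 and p99, which matches the identical P95/P99 columns in the 3-iteration output example.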

Test plan

  • session-analytics-cli benchmark runs successfully against production DB
  • All 303 tests pass including new benchmark tests
  • make check passes (format, lint, tests)

Output example

Benchmark Results (3 iterations per tool)
Total tools: 26
Slow tools (>5s): 0

TOOL                                MEDIAN        P95        P99     STATUS
-----------------------------------------------------------------------------
get_insights                        1.234s     1.456s     1.456s        OK
analyze_trends                      0.145s     0.234s     0.234s        OK
get_tool_frequency                  0.045s     0.048s     0.048s        OK
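A table in roughly this shape can be produced with plain f-string alignment. The sketch below approximates the layout above rather than reproducing the PR's formatter: the column widths, the format_row/format_table names, and the "SLOW" status label are assumptions (the example only shows "OK"), and it reuses the same hypothetical median-over-5s rule.

```python
from __future__ import annotations

SLOW_THRESHOLD_S = 5.0  # ">5s" from the PR description


def format_row(name: str, median: float, p95: float, p99: float) -> str:
    # Left-aligned tool name, right-aligned timing and status columns.
    status = "SLOW" if median > SLOW_THRESHOLD_S else "OK"  # "SLOW" is assumed
    cells = [f"{t:.3f}s" for t in (median, p95, p99)]
    return f"{name:<30}" + "".join(f"{c:>11}" for c in cells) + f"{status:>11}"


def format_table(rows: list[tuple[str, float, float, float]], iterations: int) -> str:
    header = f"{'TOOL':<30}" + "".join(
        f"{h:>11}" for h in ("MEDIAN", "P95", "P99", "STATUS")
    )
    slow = sum(1 for _, median, _, _ in rows if median > SLOW_THRESHOLD_S)
    lines = [
        f"Benchmark Results ({iterations} iterations per tool)",
        f"Total tools: {len(rows)}",
        f"Slow tools (>5s): {slow}",
        "",
        header,
        "-" * len(header),  # divider matches the header width
    ]
    lines += [format_row(*row) for row in rows]
    return "\n".join(lines)
```

Calling format_table([("get_insights", 1.234, 1.456, 1.456)], 3) yields a header block plus one row ending in "OK"; deriving the divider from len(header) keeps it aligned if the column widths change.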

🤖 Generated with Claude Code

Adds `session-analytics-cli benchmark` to measure response times for all
MCP tools against the production database. Calculates median, p95, p99
statistics and flags slow queries (>5s).

- Add cmd_benchmark() with 26 tool functions
- Add benchmark formatter for table output
- Add tests for command and formatter
- Update docs to remind adding new tools to benchmark
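A minimal sketch of the timing loop behind a command like cmd_benchmark(), assuming each tool can be invoked as a zero-argument callable. The run_benchmark name, the dict registry, and recording an error string per failing tool are illustrative assumptions, not the PR's code; time.perf_counter() is the standard-library high-resolution clock intended for measuring short durations.

```python
from __future__ import annotations

import time
from typing import Callable


def run_benchmark(
    tools: dict[str, Callable[[], object]], iterations: int = 3
) -> dict[str, list[float] | str]:
    """Time each callable `iterations` times.

    Returns per-tool durations in seconds, or an error string if a call
    raised. `tools` is a hypothetical registry standing in for the PR's
    26 tool functions.
    """
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    results: dict[str, list[float] | str] = {}
    for name, fn in tools.items():
        times: list[float] = []
        try:
            for _ in range(iterations):
                start = time.perf_counter()
                fn()
                times.append(time.perf_counter() - start)
        except Exception as exc:  # record the failure, keep timing other tools
            results[name] = f"{type(exc).__name__}: {exc}"
            continue
        results[name] = times
    return results
```

Catching the exception per tool means one failing query does not abort the whole run, which fits a benchmark that is meant to report on every tool.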

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude

claude Bot commented Jan 8, 2026


Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR adds a session-analytics-cli benchmark command that measures response times for 26 MCP tools, calculating median/p95/p99 statistics and flagging slow queries (>5s). It includes proper documentation updates and unit tests.

Issues Found

Critical

None

Important

None

Suggestions

  • cli.py:564 - The error formatting line has inconsistent column widths. When there's an error, the status column shows the error message instead of using a dedicated error display. Consider using {err_msg:>10} for consistency with the status column, or showing "ERROR" in the status column and the full error message on a separate line.

  • cli.py:1036-1038 - The p95/p99 percentile calculation with 3 iterations will always return the same value (the max). With n=3, int(3 * 0.95) = 2 and int(3 * 0.99) = 2, so both p95 and p99 will be times[2] (the max). Consider documenting this limitation or recommending a higher --iterations value (e.g., 10-20) for meaningful percentile statistics.
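A quick check of the percentile arithmetic, assuming the percentile is read as times[int(n * p)] from an ascending list, as the review describes. Extending it past n = 3 shows that p95 still lands on the maximum for every n up to 20, and p99 for every n up to 100. Under this indexing, 10-20 iterations would therefore still report the max for both; a p95 distinct from the max needs at least 21 iterations, and a distinct p99 needs at least 101. The helper names are hypothetical.

```python
def percentile_index(n: int, p: float) -> int:
    """Index into n ascending timings, read as times[int(n * p)] (assumed)."""
    return int(n * p)


def collapses_to_max(n: int, p: float) -> bool:
    """True when the percentile lands on the largest of n sorted timings."""
    return percentile_index(n, p) >= n - 1


# p95 stays on the max for every n <= 20; p99 for every n <= 100.
for n in (3, 10, 20, 21, 100, 101):
    print(
        f"n={n:>3}  p95->times[{percentile_index(n, 0.95)}]"
        f"  p99->times[{percentile_index(n, 0.99)}]"
        f"  p95 is max: {collapses_to_max(n, 0.95)}"
        f"  p99 is max: {collapses_to_max(n, 0.99)}"
    )
```

An interpolating estimator is one alternative. Python's statistics.quantiles(times, n=100, method="inclusive") returns 99 cut points (index 94 is p95, index 98 is p99) that interpolate between samples rather than always returning the largest one, though with only a handful of samples any tail estimate remains rough.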

Verdict

REQUEST_CHANGES - Found suggestions for improving error display formatting and documenting percentile calculation limitations with low iteration counts.


Automated review by Claude Code

Address reviewer feedback about p95/p99 needing more iterations
for meaningful percentile statistics.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@evansenter

Owner Author

Feedback Addressed

Implemented

  • [Suggestion] Added help text note for --iterations: "use 10+ for meaningful p95/p99"

Skipped

  • [Suggestion] Error display formatting - Current inline display is readable and shows error directly where users would look
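The two options from the error-display suggestion can be sketched as follows. The widths and the function names are hypothetical, because the actual line at cli.py:564 is not shown here. One detail applies to either choice: in Python's format mini-language, a spec like {err_msg:>10} only pads, so a long message still overflows the column. Adding a precision ({err_msg:>10.10}) also truncates the message to the column width.

```python
def error_cell(err_msg: str, width: int = 10) -> str:
    """Right-align an error message in a fixed-width status cell.

    ">10" alone only pads; the ".10" precision also truncates, so the
    cell never grows wider than the column.
    """
    return f"{err_msg:>{width}.{width}}"


def error_rows(name: str, err_msg: str) -> list[str]:
    # Reviewer's alternative: "ERROR" in the status column, full message below.
    return [f"{name:<30}{'ERROR':>10}", f"    {err_msg}"]
```

The inline cell keeps one row per tool but can clip the message; the two-line variant keeps the full text at the cost of breaking the one-row-per-tool layout. That is the trade-off behind skipping the suggestion.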

@claude

claude Bot commented Jan 8, 2026


Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR adds a session-analytics-cli benchmark command that measures response times for 26 MCP tools with configurable iterations, calculating median/p95/p99 statistics and flagging slow queries (>5s). The implementation includes comprehensive tests and documentation updates.

Previous Feedback Status

Both suggestions from the previous review have been addressed:

  • Error display formatting - Skipped with valid reasoning (inline display is readable)
  • p95/p99 iteration note - Implemented (help text now says "use 10+ for meaningful p95/p99")

Issues Found

Critical

None

Important

None

Suggestions

None

Verdict

APPROVE - Code is well-structured, follows project conventions, includes tests, and addresses all requirements from issue #63. Previous feedback has been appropriately addressed.


Automated review by Claude Code

@evansenter evansenter merged commit be80597 into main Jan 8, 2026
3 checks passed
@evansenter evansenter deleted the issue-63-benchmark branch January 8, 2026 22:54