feat: Add benchmark CLI command for MCP tool timing (closes #63) by evansenter · Pull Request #67 · evansenter/agent-session-analytics

evansenter · 2026-01-08T22:37:05Z

Summary

Adds session-analytics-cli benchmark command to measure response times for all 26 MCP tools
Calculates timing statistics (median, p95, p99) and flags slow queries (>5s)
Updates CLAUDE.md to remind developers to add new tools to benchmark infrastructure
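
For readers unfamiliar with this kind of command, the measurement described above can be sketched roughly as follows. This is a minimal illustration, not the PR's `cmd_benchmark()` code: the names `benchmark_tool`, `percentile`, and `SLOW_THRESHOLD_S` are hypothetical, the index-based percentile follows the formula quoted later in the review, and comparing the median against the 5s threshold is an assumption (the PR does not say which statistic is compared).

```python
import statistics
import time

# Assumption: the PR flags tools slower than 5 seconds; which statistic
# is compared against the threshold is not stated, so the median is used here.
SLOW_THRESHOLD_S = 5.0


def percentile(sorted_times, q):
    """Index-based percentile: the element at int(n * q), clamped to the last index."""
    idx = min(int(len(sorted_times) * q), len(sorted_times) - 1)
    return sorted_times[idx]


def benchmark_tool(fn, iterations=3):
    """Call fn() `iterations` times and summarize the wall-clock timings."""
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    times.sort()
    median = statistics.median(times)
    return {
        "median": median,
        "p95": percentile(times, 0.95),
        "p99": percentile(times, 0.99),
        "slow": median > SLOW_THRESHOLD_S,
    }


if __name__ == "__main__":
    # Stand-in for an MCP tool call; a real benchmark would query the database.
    print(benchmark_tool(lambda: sum(range(100_000)), iterations=5))
```

`time.perf_counter()` is used because it is a monotonic, high-resolution clock intended for measuring short durations.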

Test plan

session-analytics-cli benchmark runs successfully against production DB
All 303 tests pass including new benchmark tests
make check passes (format, lint, tests)

Output example

Benchmark Results (3 iterations per tool)
Total tools: 26
Slow tools (>5s): 0

TOOL                                MEDIAN        P95        P99     STATUS
-----------------------------------------------------------------------------
get_insights                        1.234s     1.456s     1.456s        OK
analyze_trends                      0.145s     0.234s     0.234s        OK
get_tool_frequency                  0.045s     0.048s     0.048s        OK

🤖 Generated with Claude Code

Adds `session-analytics-cli benchmark` to measure response times for all MCP tools against the production database. Calculates median, p95, p99 statistics and flags slow queries (>5s).

- Add cmd_benchmark() with 26 tool functions
- Add benchmark formatter for table output
- Add tests for command and formatter
- Update docs to remind adding new tools to benchmark

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

claude · 2026-01-08T22:38:53Z

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR adds a session-analytics-cli benchmark command that measures response times for 26 MCP tools, calculating median/p95/p99 statistics and flagging slow queries (>5s). It includes proper documentation updates and unit tests.

Issues Found

Critical

None

Important

None

Suggestions

cli.py:564 - The error formatting line has inconsistent column widths. When there's an error, the status column shows the error message instead of using a dedicated error display. Consider using {err_msg:>10} for consistency with the status column, or showing "ERROR" in the status column and the full error message on a separate line.
cli.py:1036-1038 - The p95/p99 percentile calculation with 3 iterations will always return the same value (the max). With n=3, int(3 * 0.95) = 2 and int(3 * 0.99) = 2, so both p95 and p99 will be times[2] (the max). Consider documenting this limitation or recommending a higher --iterations value (e.g., 10-20) for meaningful percentile statistics.
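
The arithmetic in the second suggestion can be checked with a short sketch. It assumes the percentile is taken as `times[int(n * q)]` on the sorted timings, as the review describes; the helper name `percentile_index` is hypothetical and the clamp to `n - 1` is added only to keep the index in range.

```python
def percentile_index(n, q):
    """Index selected by the int(n * q) rule, clamped to the last element."""
    return min(int(n * q), n - 1)


def describe(n):
    """Return (p95 index, p99 index, max index) for n sorted samples."""
    return percentile_index(n, 0.95), percentile_index(n, 0.99), n - 1


if __name__ == "__main__":
    for n in (3, 10, 25):
        p95, p99, last = describe(n)
        print(f"n={n}: p95 -> times[{p95}], p99 -> times[{p99}], max -> times[{last}]")
```

With n=3 both percentiles select `times[2]`, the maximum, which matches the review. Under this same indexing rule the two percentiles also coincide at n=10 (both select index 9) and only pick different samples at somewhat larger counts such as n=25.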

Verdict

REQUEST_CHANGES - Found suggestions for improving error display formatting and documenting percentile calculation limitations with low iteration counts.

Automated review by Claude Code

Address reviewer feedback about p95/p99 needing more iterations for meaningful percentile statistics.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

evansenter · 2026-01-08T22:45:23Z

Feedback Addressed

Implemented

[Suggestion] Added help text note for --iterations: "use 10+ for meaningful p95/p99"
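
A help-text note like this could be wired up along the following lines. This is a sketch assuming the CLI is built with `argparse`, which the PR does not confirm. The default of 3 is inferred from the "3 iterations per tool" line in the output example, and the full help string and the `build_parser` name are illustrative; only the quoted "use 10+ for meaningful p95/p99" fragment comes from the PR.

```python
import argparse


def build_parser():
    """Build a parser with a `benchmark` subcommand (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="session-analytics-cli")
    subparsers = parser.add_subparsers(dest="command", required=True)

    bench = subparsers.add_parser("benchmark", help="Time each MCP tool")
    bench.add_argument(
        "--iterations",
        type=int,
        default=3,  # assumption: inferred from the output example
        help="Iterations per tool (use 10+ for meaningful p95/p99)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["benchmark", "--iterations", "10"])
    print(args.command, args.iterations)
```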

Skipped

[Suggestion] Error display formatting - Current inline display is readable and shows error directly where users would look

claude · 2026-01-08T22:46:36Z

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR adds a session-analytics-cli benchmark command that measures response times for 26 MCP tools with configurable iterations, calculating median/p95/p99 statistics and flagging slow queries (>5s). The implementation includes comprehensive tests and documentation updates.

Previous Feedback Status

Both suggestions from the previous review have been addressed:

Error display formatting - Skipped with valid reasoning (inline display is readable)
p95/p99 iteration note - Implemented (help text now says "use 10+ for meaningful p95/p99")

Issues Found

Critical

None

Important

None

Suggestions

None

Verdict

APPROVE - Code is well-structured, follows project conventions, includes tests, and addresses all requirements from issue #63. Previous feedback has been appropriately addressed.

Automated review by Claude Code

docs: Add iterations help note for p95/p99 percentiles

f0415ac


evansenter merged commit be80597 into main Jan 8, 2026
3 checks passed

evansenter deleted the issue-63-benchmark branch January 8, 2026 22:54

evansenter merged 2 commits into main from issue-63-benchmark
