feat: Close drill-down gaps in MCP API (RFC #49) #52
Self-play testing revealed that LLMs could see aggregate counts but couldn't drill down to actionable specifics. This closes those gaps:

- **error_examples in analyze_failures**: When errors_by_tool shows "Bash: 5 errors", error_examples now reveals WHICH commands failed (top 5 per tool with counts)
- **classification_factors in classify_sessions**: Sessions now include the trigger threshold and relevant metrics explaining WHY they were categorized (e.g., "error_rate > 15%", error_rate: 33.2%)
- **fnmatch for permission_gaps**: Patterns like Bash(make*) now correctly match commands using glob patterns, not just exact matches
- **Clearer sample_sequences error**: Message now says "must be alphanumeric or underscores" (was misleading about underscores)

Closes #45, #46, #48, #49
Supersedes #47 (was already working via tool_id joins)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
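To illustrate the fnmatch change described above, here is a minimal sketch of glob-based permission matching. The helper names `parse_permission` and `is_allowed` and the `Tool(glob)` parsing are assumptions made for illustration; the PR's actual implementation is not shown in this conversation and may differ.

```python
from fnmatch import fnmatchcase
from typing import Optional


def parse_permission(pattern: str) -> tuple[str, Optional[str]]:
    """Split 'Bash(make*)' into ('Bash', 'make*'); a bare 'Read' gives ('Read', None).

    Hypothetical helper: the pattern syntax is inferred from the PR text.
    """
    if pattern.endswith(")") and "(" in pattern:
        tool, _, rest = pattern.partition("(")
        return tool, rest[:-1]  # drop the trailing ')'
    return pattern, None


def is_allowed(tool: str, command: str, allowed_patterns: list[str]) -> bool:
    """Return True if any allowed pattern covers this tool call."""
    for pattern in allowed_patterns:
        allowed_tool, glob = parse_permission(pattern)
        if allowed_tool != tool:
            continue
        # A bare tool name allows everything; otherwise match the glob.
        if glob is None or fnmatchcase(command, glob):
            return True
    return False


allowed = ["Bash(make*)", "Read"]
covered = is_allowed("Bash", "make check", allowed)
gap = not is_allowed("Bash", "git push", allowed)
```

With exact string comparison, `make check` would not match `make*` and would be reported as a permission gap; `fnmatchcase` treats `*` as a wildcard and keeps matching case-sensitive on every platform.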
Code Review

Summary
This PR closes drill-down gaps in the MCP API by adding

Issues Found
Critical: None
Important: None
Suggestions

Verdict
REQUEST_CHANGES - The implementation is solid and addresses the linked issues well. The suggestions above are minor improvements for code clarity and edge case handling.

Automated review by Claude Code
Address reviewer feedback: use timedelta(minutes=59) instead of timedelta(hours=1, minutes=-1) for clarity in test timestamps.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
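The two spellings in this commit are equivalent, since `timedelta` normalizes its arguments; the change is purely about readability. A quick check, using an arbitrary base timestamp chosen only for illustration (not taken from the PR's tests):

```python
from datetime import datetime, timedelta

# Both forms normalize to the same duration of 3540 seconds.
old_style = timedelta(hours=1, minutes=-1)
new_style = timedelta(minutes=59)

# Arbitrary base time for illustration only.
base = datetime(2025, 1, 1, 12, 0, 0)
late = base + new_style
```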
Feedback Addressed

Implemented

Skipped
Code Review

Summary
This PR closes drill-down gaps in the MCP API by adding

Issues Found
Critical: None
Important: None
Suggestions: None

Previous review feedback was properly addressed:

Verdict
APPROVE - The implementation is solid, tests are comprehensive, documentation is updated, and all previous feedback has been appropriately addressed. The PR successfully closes the drill-down gaps identified in RFC #49.

Automated review by Claude Code
Summary
Self-play testing revealed that LLMs could see aggregate counts but couldn't drill down to actionable specifics. This PR closes those gaps with 4 targeted fixes:
- When `errors_by_tool` shows "Bash: 5 errors", `error_examples` now reveals WHICH commands failed (top 5 per tool with counts)
- Patterns like `Bash(make*)` now work correctly

Test plan

- `error_examples` in `test_patterns.py`
- `classification_factors` in `test_queries.py`
- `make check` passes (fmt, lint, test)

Closes #45, #46, #48, #49
Supersedes #47 (was already working via tool_id joins)
🤖 Generated with Claude Code
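As a rough illustration of the "top 5 per tool with counts" drill-down, the sketch below groups failed commands by tool with `collections.Counter`. The function name `build_error_examples`, the `(tool, command)` input shape, and the `command`/`count` keys are assumptions for this example; the real `analyze_failures` output format is not shown in this PR page.

```python
from collections import Counter, defaultdict


def build_error_examples(failures, top_n=5):
    """Group failed commands by tool and keep the top_n most frequent.

    `failures` is an iterable of (tool, command) pairs (hypothetical shape).
    """
    per_tool = defaultdict(Counter)
    for tool, command in failures:
        per_tool[tool][command] += 1
    return {
        tool: [
            {"command": command, "count": count}
            for command, count in counts.most_common(top_n)
        ]
        for tool, counts in per_tool.items()
    }


failures = [
    ("Bash", "make test"),
    ("Bash", "make test"),
    ("Bash", "make test"),
    ("Bash", "git push"),
    ("Bash", "git push"),
    ("Edit", "src/app.py"),
]
examples = build_error_examples(failures)
```

Here `errors_by_tool` would report "Bash: 5 errors", while `examples["Bash"]` lists which commands account for those five.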