Summary
analyze_failures() reports error counts by tool (e.g., "gh: 268 errors") but doesn't capture WHY commands failed. Adding exit codes and error messages would enable root cause analysis.
Problem
Current output:
errors_by_tool: [
    {"tool": "gh", "count": 268},
    {"tool": "cargo", "count": 140},
    {"tool": "make", "count": 128}
]
This output can't answer questions like: are the gh errors caused by rate limits, authentication failures, network issues, or PR conflicts?
Proposed Solution
1. Capture error details during ingestion
The Bash tool results likely include exit codes and stderr. Store these:
ALTER TABLE events ADD COLUMN exit_code INTEGER;
ALTER TABLE events ADD COLUMN error_message TEXT; -- First N chars of stderr
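A minimal sketch of what the ingestion side could look like, assuming the events table lives in SQLite and has an integer id column. The result keys "exit_code" and "stderr", the helper names, and the 500-character cap are all assumptions for illustration; the actual JSONL structure still needs to be checked.

```python
import sqlite3

# Assumed cap on stored stderr; the right value is still an open question.
MAX_ERROR_CHARS = 500


def add_error_columns(conn: sqlite3.Connection) -> None:
    """Add exit_code and error_message to events if they are missing."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(events)")}
    if "exit_code" not in existing:
        conn.execute("ALTER TABLE events ADD COLUMN exit_code INTEGER")
    if "error_message" not in existing:
        conn.execute("ALTER TABLE events ADD COLUMN error_message TEXT")


def record_error_details(conn: sqlite3.Connection, event_id: int, result: dict) -> None:
    """Store the exit code and a truncated stderr snippet for one event.

    `result` is a hypothetical parsed Bash tool result; the keys
    "exit_code" and "stderr" are assumed, not confirmed.
    """
    stderr = result.get("stderr") or ""
    conn.execute(
        "UPDATE events SET exit_code = ?, error_message = ? WHERE id = ?",
        (result.get("exit_code"), stderr[:MAX_ERROR_CHARS] or None, event_id),
    )
```

Checking for the columns first keeps the migration safe to run more than once, since SQLite raises an error when ALTER TABLE adds a column that already exists.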
2. Extend analyze_failures() output
def analyze_failures(...) -> dict:
    return {
        "errors_by_tool": [...],
        "errors_by_exit_code": [
            {"exit_code": 1, "count": 500},
            {"exit_code": 128, "count": 50},  # Git errors
            ...
        ],
        "error_samples": [
            {
                "tool": "gh",
                "command": "gh pr create",
                "exit_code": 1,
                "error_snippet": "GraphQL: Could not resolve to a Repository",
                "timestamp": "2026-01-06T10:00:00"
            },
            ...
        ]
    }
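The two new keys could be computed roughly as follows. This is a sketch only: it assumes a SQLite events table with tool, command, exit_code, error_message, and timestamp columns, and summarize_errors is a hypothetical helper name, not an existing function.

```python
import sqlite3

SAMPLE_KEYS = ("tool", "command", "exit_code", "error_snippet", "timestamp")


def summarize_errors(conn: sqlite3.Connection, sample_limit: int = 5) -> dict:
    """Build the proposed errors_by_exit_code and error_samples sections."""
    # Count failed events per non-zero exit code, most frequent first.
    by_code = conn.execute(
        "SELECT exit_code, COUNT(*) FROM events "
        "WHERE exit_code IS NOT NULL AND exit_code != 0 "
        "GROUP BY exit_code ORDER BY COUNT(*) DESC, exit_code"
    ).fetchall()
    # Take the most recent failures as representative samples.
    samples = conn.execute(
        "SELECT tool, command, exit_code, error_message, timestamp FROM events "
        "WHERE exit_code IS NOT NULL AND exit_code != 0 "
        "ORDER BY timestamp DESC LIMIT ?",
        (sample_limit,),
    ).fetchall()
    return {
        "errors_by_exit_code": [
            {"exit_code": code, "count": count} for code, count in by_code
        ],
        "error_samples": [dict(zip(SAMPLE_KEYS, row)) for row in samples],
    }
```

The result could be merged into the existing analyze_failures() return value, leaving errors_by_tool unchanged.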
3. Add drill-down query
def get_error_details(
    tool: str | None = None,
    exit_code: int | None = None,
    days: int = 7,
    limit: int = 20
) -> dict:
    """Get specific error instances for root cause analysis."""
Open Questions
- What's available in Bash tool results? Need to check JSONL structure for error info.
- How much stderr to store? First 500 chars? Configurable?
- Should we categorize common error patterns (rate limit, auth, network)?
Priority
Low - the current error counts are useful for identifying problem areas. Drill-down is a nice-to-have for root cause analysis.
Related