Phase 5: Pattern detection and insights #13
- pyproject.toml with FastMCP, uvicorn, and dev dependencies
- Makefile with check, fmt, lint, test, install, uninstall targets
- LaunchAgent plist and install/uninstall scripts for auto-start
- dev.sh script for development mode with auto-reload
- Basic FastMCP server with placeholder tools:
  - get_status: Returns server status
  - ingest_logs: Placeholder for log ingestion
  - query_tool_frequency: Placeholder for frequency queries
- Usage guide as MCP resource at session-analytics://guide
- Tests for the placeholder tools
- README with installation and usage instructions

Server runs on port 8081 (to not conflict with event-bus on 8080).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- storage.py with SQLiteStorage class:
  - Events table with denormalized fields for fast queries
  - Sessions table for session metadata
  - Ingestion state tracking for incremental updates
  - Patterns table for pre-computed insights
  - Indexes on timestamp, session_id, tool_name, project_path
- Data classes: Event, Session, IngestionState, Pattern
- CRUD operations for all entities with batch insert support
- get_db_stats() for monitoring database health
- Updated server.py to use storage for get_status()
- Comprehensive test suite (16 tests)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
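For readers unfamiliar with the layout, here is a minimal sketch of an events table with indexes and batch inserts, using only the standard-library sqlite3 module. The column subset, the create_schema and insert_events names, and the index names are assumptions for illustration. The PR's actual SQLiteStorage class may differ.

```python
import sqlite3
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Event:
    # Hypothetical subset of the denormalized event fields
    timestamp: str  # ISO-8601 string, so lexical order matches time order
    session_id: str
    tool_name: Optional[str] = None
    project_path: Optional[str] = None
    command: Optional[str] = None


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS events (
            id INTEGER PRIMARY KEY,
            timestamp TEXT NOT NULL,
            session_id TEXT NOT NULL,
            tool_name TEXT,
            project_path TEXT,
            command TEXT
        );
        -- One index per column the query tools filter or group on
        CREATE INDEX IF NOT EXISTS idx_events_timestamp ON events(timestamp);
        CREATE INDEX IF NOT EXISTS idx_events_session ON events(session_id);
        CREATE INDEX IF NOT EXISTS idx_events_tool ON events(tool_name);
        CREATE INDEX IF NOT EXISTS idx_events_project ON events(project_path);
        """
    )


def insert_events(conn: sqlite3.Connection, events: Iterable[Event]) -> int:
    """Batch insert in one transaction; returns the number of rows written."""
    rows = [
        (e.timestamp, e.session_id, e.tool_name, e.project_path, e.command)
        for e in events
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO events (timestamp, session_id, tool_name, project_path, command)"
            " VALUES (?, ?, ?, ?, ?)",
            rows,
        )
    return len(rows)
```

Using executemany inside a single transaction avoids a commit per row, which matters when ingesting thousands of log entries at once.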
Implements log file discovery and parsing:
- find_log_files(): Discovers JSONL files within date range
- parse_tool_use(): Extracts tool info (command, file_path, skill_name)
- parse_entry(): Parses entries into Event objects
- ingest_file(): Incremental ingestion with mtime/size tracking
- ingest_logs(): Full ingestion orchestration
- update_session_stats(): Aggregates session statistics

Integrates with server.py to provide real data for ingest_logs tool.

Closes #3

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
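The mtime/size tracking used by ingest_file() can be sketched as follows. The idea is to record each file's modification time and size after ingesting it, then skip the file on the next run unless either value changed. The IngestionState fields, needs_ingest, and record_state shown here are hypothetical stand-ins, not the PR's code.

```python
import os
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class IngestionState:
    # Hypothetical shape: what was recorded the last time a file was ingested
    file_path: str
    mtime: float
    size: int


def needs_ingest(path: str, state: Optional[IngestionState]) -> bool:
    """True if the file was never ingested or its mtime/size changed since."""
    if state is None:
        return True
    st = os.stat(path)
    return st.st_mtime != state.mtime or st.st_size != state.size


def record_state(path: str, states: Dict[str, IngestionState]) -> None:
    """Remember the file's current mtime and size after a successful ingest."""
    st = os.stat(path)
    states[path] = IngestionState(path, st.st_mtime, st.st_size)
```

Comparing size as well as mtime catches appends that land within the filesystem's timestamp resolution, which is common for JSONL logs that grow during a live session.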
Implements all query MCP tools:
- query_tool_frequency: Tool usage counts with project filter
- query_timeline: Events in time window with filtering
- query_commands: Bash command breakdown with prefix filter
- query_sessions: Session metadata and token totals
- query_tokens: Token usage grouped by day/session/model

Also adds:
- ensure_fresh_data(): Auto-refresh mechanism (5 min staleness)
- Comprehensive tests for all queries (18 new tests)

Closes #4

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
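The optional project filter on query_tool_frequency can be added without interpolating any value into the SQL text. The sketch below uses the standard-library sqlite3 module. The days and now parameters, the ISO-8601 UTC timestamp format, and the substring LIKE match are assumptions, not the PR's actual signature.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


def query_tool_frequency(
    conn: sqlite3.Connection,
    days: int = 7,
    project: Optional[str] = None,
    now: Optional[datetime] = None,
) -> List[Tuple[str, int]]:
    """Count tool uses in the last `days` days, optionally filtered by project."""
    now = now or datetime.now(timezone.utc)
    # Assumes timestamps are stored as ISO-8601 UTC strings, so string
    # comparison is chronological
    cutoff = (now - timedelta(days=days)).isoformat()
    # Only fixed SQL fragments are concatenated; every value is a ? parameter
    sql = "SELECT tool_name, COUNT(*) AS count FROM events WHERE timestamp >= ?"
    params: list = [cutoff]
    if project:
        sql += " AND project_path LIKE ?"
        params.append(f"%{project}%")  # substring match (assumed semantics)
    sql += " GROUP BY tool_name ORDER BY count DESC"
    return [(row[0], row[1]) for row in conn.execute(sql, params)]
```

Because the project string is bound as a parameter, input such as x' OR '1'='1 is treated as literal text to match, not as SQL.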
Implements pattern detection for /improve-workflow integration:
- compute_tool_frequency_patterns(): Tool usage frequency
- compute_command_patterns(): Bash command frequency
- compute_sequence_patterns(): Tool n-gram detection
- compute_permission_gaps(): Commands not in settings.json
- get_insights(): Unified insights API for /improve-workflow

New MCP tools:
- query_sequences: Common tool patterns
- query_permission_gaps: Commands needing settings.json
- get_insights: Pre-computed patterns

Adds 16 new tests (69 total).

Closes #5

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
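A permission-gap check amounts to counting commands and subtracting the ones an allow rule already covers. The sketch below illustrates this under loudly stated assumptions. It assumes settings.json allow rules look like "Bash(git status:*)" (a prefix rule) or "Bash(make check)" (an exact rule), and it counts full command strings without normalization. The permission_gaps and allowed_bash_rules names and the min_count threshold are hypothetical, and the PR's compute_permission_gaps() may match rules differently.

```python
import re
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Assumed rule shapes: "Bash(<prefix>:*)" allows commands starting with <prefix>;
# "Bash(<command>)" allows exactly <command>.
_BASH_RULE = re.compile(r"^Bash\((.*)\)$")


def allowed_bash_rules(allow: Iterable[str]) -> Tuple[List[str], Set[str]]:
    """Split Bash allow rules into prefix rules and exact-match rules."""
    prefixes: List[str] = []
    exact: Set[str] = set()
    for rule in allow:
        m = _BASH_RULE.match(rule)
        if not m:
            continue  # non-Bash rules (e.g. Read, Edit) are irrelevant here
        body = m.group(1)
        if body.endswith(":*"):
            prefixes.append(body[:-2])
        else:
            exact.add(body)
    return prefixes, exact


def permission_gaps(
    commands: Iterable[str], allow: Iterable[str], min_count: int = 3
) -> List[Tuple[str, int]]:
    """Commands used at least min_count times that no allow rule covers."""
    prefixes, exact = allowed_bash_rules(allow)
    gaps = []
    for cmd, n in Counter(commands).most_common():
        covered = cmd in exact or any(
            cmd == p or cmd.startswith(p + " ") for p in prefixes
        )
        if n >= min_count and not covered:
            gaps.append((cmd, n))
    return gaps
```

A real implementation would likely normalize commands (for example, grouping by leading words) before counting. Otherwise, each distinct argument list shows up as its own gap.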
Code Review: Phase 5 - Pattern Detection and Insights
Thank you for this comprehensive implementation! This PR adds substantial value to the session analytics MCP server. Here's my detailed review:
✅ Strengths
Architecture & Design
Code Quality
🔍 Issues Found
🐛 Bug: SQL Injection Vulnerability (CRITICAL)
Location:
```python
where_clause = " AND ".join(conditions)
rows = conn.execute(
    f"""
    SELECT tool_name, COUNT(*) as count
    FROM events
    WHERE {where_clause}  -- ⚠️ Dangerous string interpolation
    GROUP BY tool_name
    ORDER BY count DESC
    """,
    params,
).fetchall()
```
Problem: Using f-string interpolation to build SQL queries can lead to SQL injection if any interpolated fragment is ever built from untrusted input instead of being a fixed string with a ? placeholder.
Recommendation: Build the query from fixed fragments only and pass every value as a parameter:
```python
# Better approach: concatenate only fixed SQL fragments; every value is bound
# through a ? placeholder. `cutoff` (the timestamp lower bound) and the
# %...% LIKE pattern are assumed, not taken from the PR.
base_query = """
    SELECT tool_name, COUNT(*) as count
    FROM events
    WHERE timestamp >= ?
"""
params = [cutoff]
if project:
    base_query += " AND project_path LIKE ?"
    params.append(f"%{project}%")
base_query += """
    GROUP BY tool_name
    ORDER BY count DESC
"""
rows = conn.execute(base_query, params).fetchall()
```
🐛 Edge Case: Sequence Detection Doesn't Process Last Session
Location:
The loop processes sessions as they change, but the final session in the dataset is handled outside the loop. This works, but if the last session is empty or has too few tools to form a full sequence, it is skipped without any log output.
Recommendation: Add debug logging for edge cases:
```python
# Process last session
if len(session_tools) >= sequence_length:
    for i in range(len(session_tools) - sequence_length + 1):
        seq = tuple(session_tools[i : i + sequence_length])
        sequences[seq] += 1
elif session_tools:
    # Lazy %-style args avoid formatting the message when DEBUG is disabled
    logger.debug(
        "Session %s has only %d tools, skipping sequence detection",
        current_session,
        len(session_tools),
    )
```
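The post-loop block exists because the loop only flushes a session when the session ID changes. One alternative is to group rows with itertools.groupby so every session, including the last, goes through the same code path. This sketch assumes rows arrive as (session_id, tool_name) pairs already ordered by session and timestamp (groupby only merges adjacent keys). The count_sequences name is hypothetical.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Tuple


def count_sequences(
    rows: Iterable[Tuple[str, str]], sequence_length: int = 2
) -> Counter:
    """Count tool n-grams within each session.

    rows must be (session_id, tool_name) pairs sorted by session, then time,
    e.g. from a query ending in ORDER BY session_id, timestamp.
    """
    sequences: Counter = Counter()
    for _session_id, group in groupby(rows, key=itemgetter(0)):
        tools = [tool for _, tool in group]
        # Sessions shorter than sequence_length yield an empty range here,
        # so they contribute nothing without needing a special case
        for i in range(len(tools) - sequence_length + 1):
            sequences[tuple(tools[i : i + sequence_length])] += 1
    return sequences
```

With this shape, the debug logging for short sessions can go inside the loop body, where it applies uniformly to every session.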
Summary
Implements pattern detection for /improve-workflow integration:
- compute_tool_frequency_patterns(): Tool usage frequency analysis
- compute_command_patterns(): Bash command frequency
- compute_sequence_patterns(): Tool n-gram detection (e.g., "Read → Edit")
- compute_permission_gaps(): Commands frequently used but not in settings.json
- get_insights(): Unified insights API for /improve-workflow

New MCP tools:
- query_sequences: Find common tool patterns/sequences
- query_permission_gaps: Identify commands needing settings.json entries
- get_insights: Get pre-computed patterns for workflow improvement

Test plan
- make check: all 69 tests pass

Closes #5
🤖 Generated with Claude Code