
Phase 4: Query tools implementation#12

Closed
evansenter wants to merge 4 commits into main from phase-4-queries


@evansenter

Owner

Summary

Implements all query MCP tools for session analytics:

  • query_tool_frequency - Tool usage counts with project filter
  • query_timeline - Events in time window with filtering by tool/project
  • query_commands - Bash command breakdown with prefix filter
  • query_sessions - Session metadata with token totals
  • query_tokens - Token usage grouped by day/session/model
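As an illustration of the day/session/model grouping, here is a minimal sketch. The `events` schema, its column names, and the `GROUP_COLUMNS` mapping are assumptions made for this example and are not taken from the PR's `storage.py`.

```python
import sqlite3

# Hypothetical mapping from grouping mode to a SQL expression.
# Only these fixed expressions ever reach the SQL string.
GROUP_COLUMNS = {
    "day": "date(timestamp)",
    "session": "session_id",
    "model": "model",
}


def query_tokens(conn: sqlite3.Connection, by: str = "day"):
    """Sum input and output tokens per bucket for the chosen grouping mode."""
    if by not in GROUP_COLUMNS:
        raise ValueError(f"unsupported grouping: {by!r}")
    expr = GROUP_COLUMNS[by]  # looked up from the fixed mapping, never caller text
    sql = (
        f"SELECT {expr} AS bucket, SUM(input_tokens + output_tokens) AS total "
        "FROM events GROUP BY bucket ORDER BY bucket"
    )
    return conn.execute(sql).fetchall()


# Tiny in-memory dataset using the assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (timestamp TEXT, session_id TEXT, model TEXT, "
    "input_tokens INTEGER, output_tokens INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
    [
        ("2025-12-30 10:00:00", "s1", "opus", 100, 50),
        ("2025-12-30 23:00:00", "s2", "sonnet", 10, 5),
        ("2025-12-31 01:00:00", "s1", "opus", 1, 1),
    ],
)
```

Calling `query_tokens(conn, by="session")` on this dataset returns one row per session with its token total; an unknown mode raises `ValueError` rather than reaching the database.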

Also adds:

  • ensure_fresh_data() - Auto-refresh mechanism that checks data staleness (>5 min old) and refreshes transparently before queries
  • Comprehensive tests for all queries (18 new tests, 53 total)
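The staleness check described for `ensure_fresh_data()` could look roughly like the sketch below. Only the 5-minute threshold comes from the description above; the signature, the `get_last_ingested` and `run_ingest` callables, and the `is_stale` helper are hypothetical stand-ins, not the PR's actual code.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional

STALENESS_THRESHOLD = timedelta(minutes=5)  # threshold stated in the PR description


def is_stale(
    last_ingested: Optional[datetime],
    now: datetime,
    threshold: timedelta = STALENESS_THRESHOLD,
) -> bool:
    """True when nothing was ever ingested or the last ingest is older than threshold."""
    return last_ingested is None or (now - last_ingested) > threshold


def ensure_fresh_data(
    get_last_ingested: Callable[[], Optional[datetime]],  # hypothetical storage lookup
    run_ingest: Callable[[], None],                       # hypothetical ingestion entry point
    now: Optional[datetime] = None,
) -> bool:
    """Run ingestion if the data is stale; return whether a refresh happened."""
    now = now or datetime.now()
    if is_stale(get_last_ingested(), now):
        run_ingest()
        return True
    return False
```

Passing `now` explicitly keeps the check deterministic in tests, which is one way the 5-minute boundary could be covered without sleeping.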

Test plan

  • Run make check - all 53 tests pass
  • Verify query_tool_frequency returns tool counts
  • Verify query_timeline respects time ranges and filters
  • Verify query_commands with prefix filter
  • Verify query_sessions returns session metadata
  • Verify query_tokens supports all grouping modes (day/session/model)

Closes #4

🤖 Generated with Claude Code

evansenter and others added 4 commits December 31, 2025 04:14
- pyproject.toml with FastMCP, uvicorn, and dev dependencies
- Makefile with check, fmt, lint, test, install, uninstall targets
- LaunchAgent plist and install/uninstall scripts for auto-start
- dev.sh script for development mode with auto-reload
- Basic FastMCP server with placeholder tools:
  - get_status: Returns server status
  - ingest_logs: Placeholder for log ingestion
  - query_tool_frequency: Placeholder for frequency queries
- Usage guide as MCP resource at session-analytics://guide
- Tests for the placeholder tools
- README with installation and usage instructions

Server runs on port 8081 (to not conflict with event-bus on 8080).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- storage.py with SQLiteStorage class:
  - Events table with denormalized fields for fast queries
  - Sessions table for session metadata
  - Ingestion state tracking for incremental updates
  - Patterns table for pre-computed insights
  - Indexes on timestamp, session_id, tool_name, project_path
- Data classes: Event, Session, IngestionState, Pattern
- CRUD operations for all entities with batch insert support
- get_db_stats() for monitoring database health
- Updated server.py to use storage for get_status()
- Comprehensive test suite (16 tests)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements log file discovery and parsing:
- find_log_files(): Discovers JSONL files within date range
- parse_tool_use(): Extracts tool info (command, file_path, skill_name)
- parse_entry(): Parses entries into Event objects
- ingest_file(): Incremental ingestion with mtime/size tracking
- ingest_logs(): Full ingestion orchestration
- update_session_stats(): Aggregates session statistics
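
A minimal sketch of how mtime/size tracking can decide whether a file needs re-ingestion. `FileState`, `needs_ingest`, and `mark_ingested` are hypothetical names for this example; the PR's own `IngestionState` and `ingest_file()` may work differently.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List


@dataclass(frozen=True)
class FileState:
    """Snapshot of the file attributes used to detect changes."""
    mtime: float
    size: int


def _snapshot(path: Path) -> FileState:
    st = path.stat()
    return FileState(st.st_mtime, st.st_size)


def needs_ingest(path: Path, seen: Dict[str, FileState]) -> bool:
    """A file needs ingestion if it is new or its mtime/size differ from the record."""
    return seen.get(str(path)) != _snapshot(path)


def mark_ingested(path: Path, seen: Dict[str, FileState]) -> None:
    """Record the file's current mtime and size after a successful ingest."""
    seen[str(path)] = _snapshot(path)


def demo() -> List[bool]:
    """Walk through new file, unchanged file, and appended file."""
    seen: Dict[str, FileState] = {}
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "session.jsonl"
        log.write_text('{"type": "tool_use"}\n')
        results.append(needs_ingest(log, seen))  # never seen before
        mark_ingested(log, seen)
        results.append(needs_ingest(log, seen))  # unchanged since last ingest
        with log.open("a") as fh:
            fh.write('{"type": "summary"}\n')
        results.append(needs_ingest(log, seen))  # size grew after the append
    return results
```

In this sketch the in-memory `seen` dict stands in for whatever persistent ingestion-state table the storage layer provides.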

Integrates with server.py to provide real data for ingest_logs tool.

Closes #3

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements all query MCP tools:
- query_tool_frequency: Tool usage counts with project filter
- query_timeline: Events in time window with filtering
- query_commands: Bash command breakdown with prefix filter
- query_sessions: Session metadata and token totals
- query_tokens: Token usage grouped by day/session/model

Also adds:
- ensure_fresh_data(): Auto-refresh mechanism (5 min staleness)
- Comprehensive tests for all queries (18 new tests)

Closes #4

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude

claude Bot commented Dec 31, 2025


Pull Request Review: Phase 4 Query Tools Implementation

Overview

This PR implements comprehensive query tools for the session analytics MCP server with good architecture and solid test coverage. The implementation is well-structured and follows Python best practices.


✅ Code Quality & Best Practices

Strengths

  1. Well-structured architecture: Clear separation of concerns across storage.py, queries.py, ingest.py, and server.py
  2. Good use of dataclasses: Event, Session, IngestionState, and Pattern models are clean and type-safe
  3. Proper SQL parameterization: All queries use parameterized statements, avoiding SQL injection vulnerabilities
  4. Comprehensive test coverage: 18 new tests covering all query functions with good edge case handling
  5. Incremental ingestion: Smart file-based state tracking prevents redundant processing
  6. Auto-refresh mechanism: ensure_fresh_data() provides good UX for stale data detection

Security Review

No SQL Injection Vulnerabilities

All queries properly use parameterized SQL. User-supplied values (project, prefix, tool, etc.) are passed via params list, never interpolated into SQL strings.
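
To illustrate the point, here is a small sketch of a parameterized query with an optional project filter. The two-column `events` table and the `tool_frequency` function are assumptions for the example, not the PR's schema or code.

```python
import sqlite3


def tool_frequency(conn: sqlite3.Connection, project=None):
    """Count events per tool, optionally restricted to one project."""
    sql = "SELECT tool_name, COUNT(*) AS n FROM events"
    params = []
    if project is not None:
        sql += " WHERE project_path = ?"  # placeholder only; the value is bound below
        params.append(project)
    sql += " GROUP BY tool_name ORDER BY n DESC, tool_name"
    return conn.execute(sql, params).fetchall()


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (tool_name TEXT, project_path TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("Bash", "/a"), ("Bash", "/a"), ("Read", "/a"), ("Read", "/b")],
)

# A hostile value is treated as an ordinary string, so it matches no rows.
hostile = "/a' OR '1'='1"
```

Because the driver binds `hostile` as data, `tool_frequency(conn, hostile)` returns an empty list instead of widening the filter.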

Path Traversal Protection

Database path uses Path.home() with hardcoded subdirectories. Log file discovery is constrained to ~/.claude/projects/ by default.


🐛 Potential Issues & Edge Cases

1. Race Condition in ensure_fresh_data() (Low severity)

Location: src/session_analytics/queries.py:33-40

If multiple queries run concurrently on stale data, they'll all trigger ingestion simultaneously. This could cause duplicate work and SQLite lock contention. For an MCP server handling sequential requests, this is low priority.
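
If concurrent protection is wanted, one option within a single process is a lock with a second staleness check after acquiring it. This is a sketch under that assumption; the `Refresher` class is hypothetical, and it does not address contention between separate processes sharing the same SQLite file.

```python
import threading
import time


class Refresher:
    """Serialises refreshes so concurrent callers trigger at most one ingest."""

    def __init__(self, ingest, max_age_seconds: float = 300.0, clock=time.monotonic):
        self._ingest = ingest          # hypothetical ingestion callable
        self._max_age = max_age_seconds
        self._clock = clock
        self._last = None              # time of the last completed ingest
        self._lock = threading.Lock()

    def _stale(self) -> bool:
        return self._last is None or self._clock() - self._last > self._max_age

    def ensure_fresh(self) -> bool:
        """Return True if this call performed the refresh."""
        if not self._stale():
            return False
        with self._lock:
            # Re-check: another thread may have refreshed while we waited.
            if not self._stale():
                return False
            self._ingest()
            self._last = self._clock()
            return True
```

Threads that arrive during an ingest block on the lock, then see fresh data on the re-check and return without ingesting again.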

2. Timezone Handling (Low severity)

Location: src/session_analytics/ingest.py:133-135

Converting to a naive datetime implicitly assumes the local timezone. Document that all timestamps are stored in local time.
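
The difference between the two conventions can be sketched as follows. The helper names are hypothetical, and the assumption that log timestamps are ISO 8601 strings with an offset or trailing `Z` is not confirmed by this PR.

```python
from datetime import datetime, timezone


def parse_aware(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp (accepting a trailing 'Z') as an aware UTC datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)


def to_naive_local(dt: datetime) -> datetime:
    """Drop tzinfo after converting to the machine's local zone (result depends on the host)."""
    return dt.astimezone().replace(tzinfo=None)


def to_naive_utc(dt: datetime) -> datetime:
    """Drop tzinfo after converting to UTC (result is the same on every host)."""
    return dt.astimezone(timezone.utc).replace(tzinfo=None)
```

Storing naive local values is workable as long as it is documented, but a database copied between machines in different zones would then be interpreted differently; naive UTC avoids that at the cost of converting for display.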

3. Empty UUID Handling (Low severity)

Location: src/session_analytics/ingest.py:270

For summary entries missing both uuid and leafUuid, the code creates the ID "summary:unknown", so multiple such entries would collide. Recommend adding a timestamp suffix for uniqueness.
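
One possible shape for that fix, as a sketch: `summary_event_id` is a hypothetical helper, and the `timestamp` field name is an assumption about the entry format.

```python
def summary_event_id(entry: dict) -> str:
    """Build an ID for a summary entry, avoiding a single shared fallback value."""
    key = entry.get("uuid") or entry.get("leafUuid")
    if key:
        return f"summary:{key}"
    # Fallback: suffix with the entry's timestamp (assumed field name) so that
    # entries without any UUID do not all map to the same ID.
    ts = entry.get("timestamp") or "no-timestamp"
    return f"summary:unknown:{ts}"
```

Two UUID-less entries with identical timestamps would still collide under this scheme, so a content hash may be a stronger suffix if that case occurs in practice.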

4. F-string Query Construction (Documentation needed)

This is safe in the current implementation because where_clause is built from hardcoded strings only. Add comments explaining this so that later changes do not introduce injection bugs.
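
A sketch of what such a commented query builder might look like. The function name, column names, and `events` table are assumptions for illustration; only the `where_clause` naming follows the review text.

```python
def build_timeline_query(tool=None, project=None, limit=100):
    """Return (sql, params) for a filtered timeline query."""
    conditions = []
    params = []
    if tool is not None:
        conditions.append("tool_name = ?")
        params.append(tool)
    if project is not None:
        conditions.append("project_path = ?")
        params.append(project)

    # SAFETY: where_clause is assembled only from the literal fragments above.
    # Caller-supplied values go into `params` and are bound by the driver.
    # Never append a caller-supplied string to `conditions`.
    where_clause = f"WHERE {' AND '.join(conditions)}" if conditions else ""

    sql = (
        f"SELECT timestamp, tool_name FROM events {where_clause} "
        "ORDER BY timestamp LIMIT ?"
    )
    params.append(limit)
    return sql, params
```

Returning the SQL and parameters separately also makes the invariant easy to assert in a test: no caller value should ever appear in the SQL string.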


🚀 Performance Considerations

Good Practices

  1. Batch inserts with executemany()
  2. Proper indexing on filter columns
  3. Query limits with sensible defaults
  4. Incremental ingestion

Potential Optimizations

  1. Session stats update (ingest.py:358-399) - Currently performs a full table scan. For large DBs, consider updating only affected sessions.

  2. LIKE queries (queries.py:65-66) - Leading % prevents index usage. Consider prefix matching if acceptable.
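
For the LIKE suggestion, one way to express a prefix match is as a range comparison, which an ordinary index on the column can serve. This is a sketch with a hypothetical single-column `events` table; it assumes a non-empty prefix and the default BINARY collation.

```python
import sqlite3


def prefix_bounds(prefix: str):
    """Return (low, high) so that low <= s < high holds for strings starting with prefix."""
    # Assumes a non-empty prefix whose last character is not the maximum code point.
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)


def commands_with_prefix(conn: sqlite3.Connection, prefix: str):
    """Prefix match written as a range scan on the indexed column."""
    low, high = prefix_bounds(prefix)
    rows = conn.execute(
        "SELECT command FROM events WHERE command >= ? AND command < ? ORDER BY command",
        (low, high),
    )
    return [r[0] for r in rows]


def commands_containing(conn: sqlite3.Connection, needle: str):
    """Substring match with a leading %, shown for comparison."""
    rows = conn.execute(
        "SELECT command FROM events WHERE command LIKE ? ORDER BY command",
        (f"%{needle}%",),
    )
    return [r[0] for r in rows]


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (command TEXT)")
conn.execute("CREATE INDEX idx_events_command ON events(command)")
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [("git status",), ("git push",), ("make check",), ("legit",)],
)
```

The two functions are not interchangeable: the range form is case-sensitive and matches only at the start of the string, while LIKE is case-insensitive for ASCII by default and, with a leading `%`, matches anywhere.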


📋 Test Coverage

Excellent Coverage

  • 18 new tests for all query functions
  • Good use of fixtures
  • Covers basic functionality, filters, and edge cases

Suggestions for additional tests:

  • Empty database edge cases
  • Malformed JSONL handling
  • Concurrent ingestion (if addressing race condition)

📝 Documentation

Good Documentation

  • guide.md with clear examples
  • Comprehensive docstrings
  • Clear README

Minor suggestions:

  • Add example outputs to MCP tool docstrings
  • Document error handling expectations

🎯 Summary & Recommendations

Critical Issues

None - Code is production-ready.

High Priority

  1. Add comments explaining f-string SQL safety
  2. Fix empty UUID handling for summary entries

Medium Priority

  1. Document timezone handling
  2. Optimize session stats for large DBs
  3. Consider concurrent ingestion protection

Low Priority

  1. Add example outputs to docstrings
  2. Optimize LIKE queries
  3. Add edge case tests

✅ Approval

Excellent work! Implementation is clean, secure, well-tested, and follows best practices. Issues identified are minor and don't block merging.

Recommendation: ✅ Approve and merge

The codebase demonstrates:

  • Strong SQL security understanding
  • Good architectural design
  • Comprehensive testing (53 total tests)
  • Production-ready error handling

Great job! 🚀

@evansenter

Owner Author

Superseded by #19 which contains all Phase 1-7 work plus RFC #18 refactoring.

@evansenter evansenter closed this Dec 31, 2025
@evansenter evansenter deleted the phase-4-queries branch December 31, 2025 12:57


Development

Successfully merging this pull request may close these issues.

Phase 4: Query tools

1 participant