
Phase 6: CLI and documentation#14

Closed
evansenter wants to merge 6 commits into main from phase-6-integration


@evansenter

Owner

Summary

Adds a command-line interface for shell access and scripting:

  • session-analytics-cli status - Database stats
  • session-analytics-cli ingest - Trigger log ingestion
  • session-analytics-cli frequency - Tool usage counts
  • session-analytics-cli commands - Bash command breakdown
  • session-analytics-cli sessions - Session metadata
  • session-analytics-cli tokens - Token usage by day/session/model
  • session-analytics-cli sequences - Tool patterns
  • session-analytics-cli permissions - Commands needing settings.json
  • session-analytics-cli insights - Pre-computed patterns for /improve-workflow

All commands support --json for machine-readable output.
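A minimal sketch of the subcommand-plus---json pattern described above, using the standard library's argparse. The stub functions, their return values, and the --days option are hypothetical placeholders, not the project's actual cli.py.

```python
import argparse
import json


# Hypothetical stand-ins for the real query functions; names and
# return shapes are assumptions for illustration only.
def get_status_stub() -> dict:
    return {"events": 1200, "sessions": 34}


def tool_frequency_stub(days: int) -> dict:
    return {"days": days, "tools": [{"tool": "Bash", "count": 500}]}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="session-analytics-cli")
    # --json lives on a shared parent parser so every subcommand accepts it.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--json", action="store_true", help="machine-readable output")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("status", parents=[common], help="database stats")
    freq = sub.add_parser("frequency", parents=[common], help="tool usage counts")
    freq.add_argument("--days", type=int, default=7)  # hypothetical option
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    if args.command == "status":
        result = get_status_stub()
    else:
        result = tool_frequency_stub(args.days)
    if args.json:
        print(json.dumps(result))
    else:
        # Human-readable mode: one "key: value" line per top-level field.
        for key, value in result.items():
            print(f"{key}: {value}")
    return 0
```

For example, main(["frequency", "--days", "30", "--json"]) prints one JSON object, while main(["status"]) prints "key: value" lines. Defining --json once on a parent parser keeps the flag consistent across subcommands.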

Also updates README with CLI usage documentation.

Test plan

  • Run make check - all 69 tests pass
  • Test CLI status command
  • Test CLI frequency command
  • Verify --json output works

Closes #6

🤖 Generated with Claude Code

evansenter and others added 6 commits December 31, 2025 04:14
- pyproject.toml with FastMCP, uvicorn, and dev dependencies
- Makefile with check, fmt, lint, test, install, uninstall targets
- LaunchAgent plist and install/uninstall scripts for auto-start
- dev.sh script for development mode with auto-reload
- Basic FastMCP server with placeholder tools:
  - get_status: Returns server status
  - ingest_logs: Placeholder for log ingestion
  - query_tool_frequency: Placeholder for frequency queries
- Usage guide as MCP resource at session-analytics://guide
- Tests for the placeholder tools
- README with installation and usage instructions

Server runs on port 8081 (to avoid conflicting with the event-bus on 8080).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- storage.py with SQLiteStorage class:
  - Events table with denormalized fields for fast queries
  - Sessions table for session metadata
  - Ingestion state tracking for incremental updates
  - Patterns table for pre-computed insights
  - Indexes on timestamp, session_id, tool_name, project_path
- Data classes: Event, Session, IngestionState, Pattern
- CRUD operations for all entities with batch insert support
- get_db_stats() for monitoring database health
- Updated server.py to use storage for get_status()
- Comprehensive test suite (16 tests)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements log file discovery and parsing:
- find_log_files(): Discovers JSONL files within date range
- parse_tool_use(): Extracts tool info (command, file_path, skill_name)
- parse_entry(): Parses entries into Event objects
- ingest_file(): Incremental ingestion with mtime/size tracking
- ingest_logs(): Full ingestion orchestration
- update_session_stats(): Aggregates session statistics
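The mtime/size tracking in ingest_file() can be sketched as below, assuming an IngestionState record that stores the last-seen mtime and size per file. The field names and helper functions here are hypothetical, not the project's storage.py definitions.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class IngestionState:
    # Assumed shape: what the file looked like when it was last ingested.
    file_path: str
    mtime: float
    size: int


def snapshot(path: Path) -> IngestionState:
    """Record the file's current mtime and size after a successful ingest."""
    st = path.stat()
    return IngestionState(str(path), st.st_mtime, st.st_size)


def needs_ingest(path: Path, state: Optional[IngestionState]) -> bool:
    """True if the file is new or its mtime or size changed since last ingest."""
    if state is None:
        return True  # never seen this file before
    st = path.stat()
    return st.st_mtime != state.mtime or st.st_size != state.size
```

Comparing size as well as mtime catches appends that land within the filesystem's mtime resolution.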

Integrates with server.py to provide real data for the ingest_logs tool.

Closes #3

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements all query MCP tools:
- query_tool_frequency: Tool usage counts with project filter
- query_timeline: Events in time window with filtering
- query_commands: Bash command breakdown with prefix filter
- query_sessions: Session metadata and token totals
- query_tokens: Token usage grouped by day/session/model

Also adds:
- ensure_fresh_data(): Auto-refresh mechanism (re-ingests when data is more than 5 minutes old)
- Comprehensive tests for all queries (18 new tests)
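A sketch of the staleness check behind ensure_fresh_data(). The signature here (an explicit last-ingest time, an ingest callback, and an injectable clock) is hypothetical and chosen for testability; only the 5-minute threshold comes from this PR. It assumes timezone-aware UTC timestamps.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


def ensure_fresh_data(
    last_ingest: Optional[datetime],
    ingest: Callable[[], None],
    max_age_minutes: int = 5,
    now: Optional[datetime] = None,
) -> bool:
    """Run ingest() if data was never ingested or is older than max_age_minutes.

    Returns True when a refresh was triggered.
    """
    now = now or datetime.now(timezone.utc)
    if last_ingest is None or now - last_ingest > timedelta(minutes=max_age_minutes):
        ingest()
        return True
    return False
```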

Closes #4

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements pattern detection for /improve-workflow integration:
- compute_tool_frequency_patterns(): Tool usage frequency
- compute_command_patterns(): Bash command frequency
- compute_sequence_patterns(): Tool n-gram detection
- compute_permission_gaps(): Commands not in settings.json
- get_insights(): Unified insights API for /improve-workflow
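The commit describes compute_sequence_patterns() only as tool n-gram detection. One way to implement that is sketched below, assuming events are grouped per session in timestamp order so n-grams never span two sessions. The function names and the n=3 and top=10 defaults are hypothetical.

```python
from collections import Counter, deque
from typing import Iterable, Iterator, Tuple


def tool_ngrams(tools: Iterable[str], n: int = 3) -> Iterator[Tuple[str, ...]]:
    """Yield consecutive n-tool windows; memory stays O(n) per session."""
    window: deque = deque(maxlen=n)
    for tool in tools:
        window.append(tool)
        if len(window) == n:
            yield tuple(window)


def compute_sequence_counts(sessions: dict, n: int = 3, top: int = 10):
    """Count n-grams across sessions (session_id -> ordered tool names)."""
    counts: Counter = Counter()
    for tools in sessions.values():
        counts.update(tool_ngrams(tools, n))
    return counts.most_common(top)
```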

New MCP tools:
- query_sequences: Common tool patterns
- query_permission_gaps: Commands needing settings.json
- get_insights: Pre-computed patterns
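The permission-gap check listed above can be sketched as follows. This assumes settings.json holds allow rules under permissions.allow in the form Bash(<command>) for exact matches and Bash(<prefix>:*) for prefix matches. It is a simplified, hypothetical matcher, not Claude Code's actual rule engine, and min_count is an illustrative threshold.

```python
import json
import re
from collections import Counter

# Assumed rule shape: "Bash(<cmd>)" (exact) or "Bash(<prefix>:*)" (prefix).
_BASH_RULE = re.compile(r"^Bash\((.*)\)$")


def allowed_bash_rules(settings_json: str) -> list:
    """Return (pattern, is_prefix) pairs for every Bash allow rule."""
    settings = json.loads(settings_json)
    rules = []
    for rule in settings.get("permissions", {}).get("allow", []):
        m = _BASH_RULE.match(rule)
        if m:
            body = m.group(1)
            if body.endswith(":*"):
                rules.append((body[:-2], True))
            else:
                rules.append((body, False))
    return rules


def is_allowed(command: str, rules: list) -> bool:
    for pattern, is_prefix in rules:
        if command == pattern or (is_prefix and command.startswith(pattern + " ")):
            return True
    return False


def permission_gaps(commands: list, settings_json: str, min_count: int = 2) -> list:
    """Frequently run commands that no allow rule covers, most common first."""
    rules = allowed_bash_rules(settings_json)
    counts = Counter(c for c in commands if not is_allowed(c, rules))
    return [(cmd, n) for cmd, n in counts.most_common() if n >= min_count]
```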

Adds 16 new tests (69 total).

Closes #5

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds a command-line interface for shell access:
- session-analytics-cli status: Database stats
- session-analytics-cli ingest: Trigger log ingestion
- session-analytics-cli frequency: Tool usage counts
- session-analytics-cli commands: Bash command breakdown
- session-analytics-cli sessions: Session metadata
- session-analytics-cli tokens: Token usage by day/session/model
- session-analytics-cli sequences: Tool patterns
- session-analytics-cli permissions: Commands needing settings.json
- session-analytics-cli insights: Pre-computed patterns

All commands support --json for machine-readable output.

Also updates README with CLI usage documentation.

Closes #6

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude

claude Bot commented Dec 31, 2025


Pull Request Review - Phase 6: CLI and Documentation

This is a well-implemented PR that adds a comprehensive CLI and completes the project. The code demonstrates strong Python practices and thoughtful architecture.

✅ Strengths

Code Quality

  • Excellent separation of concerns: CLI, server, storage, ingest, patterns, and queries are cleanly separated
  • Consistent coding style: Follows PEP 8, uses type hints effectively
  • Good documentation: Docstrings are comprehensive with clear Args/Returns sections
  • Strong testing: 69 tests with coverage across all modules

Architecture

  • Dataclasses for clean data models: Event, Session, IngestionState, Pattern are well-designed
  • Context managers for DB connections: storage._connect() ensures proper cleanup (storage.py:126-138)
  • Batch operations: add_events_batch() for performance (storage.py:280-319)
  • Incremental ingestion: Smart file tracking prevents re-processing unchanged files (ingest.py:304-308)
  • Auto-refresh mechanism: ensure_fresh_data() keeps analytics current (queries.py:8-40)

CLI Design

  • Clean argument parsing: Well-structured subcommands with sensible defaults
  • Dual output modes: Human-readable and JSON for scripting (cli.py:18-90)
  • Comprehensive coverage: All MCP tools have CLI equivalents

🔍 Issues Found

1. SQL Injection Vulnerability Pattern (Critical)

Location: queries.py:72, 186

The code uses f-strings to construct SQL WHERE clauses. While the current implementation only interpolates hardcoded conditions, this pattern is fragile and could allow injection if a later refactor interpolates user-supplied values.

Recommendation: Use parameterized queries throughout or add warnings about injection risks.
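A sketch of the defensive pattern: only fixed, hardcoded fragments are joined into the SQL text, and every value travels as a ? parameter. The function name, the events columns used, and the exact-match project filter are assumptions for illustration, not the code in queries.py.

```python
import sqlite3
from typing import Optional


def query_tool_frequency(conn: sqlite3.Connection, since: str,
                         project: Optional[str] = None) -> list:
    """Tool usage counts since an ISO-8601 cutoff, optionally for one project."""
    # Conditions are literal strings chosen by the code, never built from input.
    conditions = ["timestamp >= ?"]
    params: list = [since]
    if project is not None:
        conditions.append("project_path = ?")
        params.append(project)
    sql = (
        "SELECT tool_name, COUNT(*) AS n FROM events WHERE "
        + " AND ".join(conditions)
        + " GROUP BY tool_name ORDER BY n DESC, tool_name"
    )
    # sqlite3 binds params safely, so quotes in `project` cannot alter the SQL.
    return conn.execute(sql, params).fetchall()
```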

2. File Path Security

Location: ingest.py:46

The code does not validate that files actually lie within the expected directory. An attacker could potentially use symlinks to make ingestion read files outside ~/.claude/projects/.

Recommendation: Add file_path.resolve().is_relative_to(logs_dir.resolve()) check.
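A sketch of that check applied during discovery. safe_log_files() is a hypothetical helper, not the project's find_log_files(); Path.is_relative_to requires Python 3.9 or later.

```python
from pathlib import Path
from typing import Iterator


def safe_log_files(logs_dir: Path) -> Iterator[Path]:
    """Yield *.jsonl files whose real location is inside logs_dir."""
    root = logs_dir.resolve()
    for path in logs_dir.rglob("*.jsonl"):
        resolved = path.resolve()  # follows symlinks to the real target
        if not resolved.is_relative_to(root):
            continue  # symlink escapes logs_dir: skip it
        yield resolved
```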

3. JSON Parsing Error Handling

Location: ingest.py:329-334

The broad Exception catch could hide serious bugs. Consider catching specific exceptions (e.g. json.JSONDecodeError for malformed lines) and letting unexpected errors propagate.
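A sketch of narrower handling when reading JSONL: only json.JSONDecodeError is caught and logged, so bugs elsewhere in parsing still surface. iter_entries() is a hypothetical helper, and skipping non-object lines is an assumption about the log format.

```python
import json
import logging
from typing import Iterable, Iterator

log = logging.getLogger(__name__)


def iter_entries(lines: Iterable[str]) -> Iterator[dict]:
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as exc:
            # Malformed (e.g. partially written) line: log it and move on.
            log.warning("skipping malformed line %d: %s", lineno, exc)
            continue
        if isinstance(entry, dict):  # assumption: valid entries are JSON objects
            yield entry
```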

4. DateTime Timezone Handling

Location: ingest.py:133-135

This discards timezone information when converting to a naive datetime for SQLite. Document this limitation, or normalize to UTC explicitly before storing.
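A sketch of explicit UTC normalization: convert to UTC first, then drop tzinfo, so every stored value shares one implicit zone. to_utc_naive() is hypothetical, and treating naive inputs as UTC is an assumption. The "Z" replacement keeps this working on Python versions before 3.11, whose fromisoformat() rejects a trailing "Z".

```python
from datetime import datetime, timezone


def to_utc_naive(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp and return a naive datetime in UTC."""
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return dt.astimezone(timezone.utc).replace(tzinfo=None)
```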

5. Resource Cleanup

Location: server.py:47

The module-level storage = SQLiteStorage() instance is never explicitly closed. SQLite handles this gracefully, but consider documenting that connections are short-lived and opened per operation via context managers.

⚡ Performance Considerations

Good

  • Indexes on common queries: idx_events_timestamp, idx_events_session, idx_events_tool, idx_events_project
  • Batch inserts: Reduces transaction overhead
  • Incremental ingestion: Skips unchanged files
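A sketch of the indexed events table and a batched insert in a single transaction. The index names come from this review; the column subset and the Event fields shown are assumptions, since the real table carries more denormalized columns.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    # Assumed subset of fields; the project's Event has more.
    timestamp: str
    session_id: str
    tool_name: Optional[str]
    project_path: Optional[str]


SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id INTEGER PRIMARY KEY,
    timestamp TEXT NOT NULL,
    session_id TEXT NOT NULL,
    tool_name TEXT,
    project_path TEXT
);
CREATE INDEX IF NOT EXISTS idx_events_timestamp ON events(timestamp);
CREATE INDEX IF NOT EXISTS idx_events_session ON events(session_id);
CREATE INDEX IF NOT EXISTS idx_events_tool ON events(tool_name);
CREATE INDEX IF NOT EXISTS idx_events_project ON events(project_path);
"""


def add_events_batch(conn: sqlite3.Connection, events: list) -> int:
    # One transaction plus executemany, instead of one commit per row.
    with conn:
        conn.executemany(
            "INSERT INTO events (timestamp, session_id, tool_name, project_path) "
            "VALUES (?, ?, ?, ?)",
            [(e.timestamp, e.session_id, e.tool_name, e.project_path) for e in events],
        )
    return len(events)
```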

Concerns

  1. N+1 query pattern in update_session_stats() (ingest.py:364-398) - Fetches all sessions, then calls upsert_session() in a loop. Should use a single INSERT OR REPLACE with subquery.

  2. Missing LIMIT on some queries: compute_tool_frequency_patterns() fetches all tools; consider a top-N LIMIT for large datasets

  3. Pattern computation overhead: compute_sequence_patterns() loads all events into memory - could cause issues with >100K events
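A sketch of the single-statement alternative from item 1, assuming a sessions table keyed by session_id with first_seen, last_seen, and event_count columns. These column names are hypothetical; the real schema likely also tracks token totals.

```python
import sqlite3

# Aggregate every session in one pass and write all rows in one statement.
SESSION_STATS_SQL = """
INSERT OR REPLACE INTO sessions (session_id, first_seen, last_seen, event_count)
SELECT session_id, MIN(timestamp), MAX(timestamp), COUNT(*)
FROM events
GROUP BY session_id
"""


def update_session_stats(conn: sqlite3.Connection) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(SESSION_STATS_SQL)
```

INSERT OR REPLACE deletes and reinserts a conflicting row, so any sessions column not produced by the SELECT is reset to its default; an UPSERT (INSERT ... ON CONFLICT DO UPDATE) avoids that if other columns must survive.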

🔒 Security Assessment

Good

  • No user input interpolated into SQL: values are passed as query parameters (the f-string WHERE clauses in issue 1 only join hardcoded conditions)
  • File system isolation - reads from known directory
  • No shell command execution
  • Permission system integration reads settings.json safely

Concerns

  • Log file parsing assumes JSONL files are not malicious (acceptable since from Claude Code)
  • Consider adding file size limits to prevent DoS
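A sketch of the file-size guard suggested above, with an additional per-line cap. MAX_LOG_BYTES, the line limit, and read_bounded_lines() are hypothetical; the limits are placeholders, not values from the project.

```python
from pathlib import Path
from typing import Iterator

MAX_LOG_BYTES = 50 * 1024 * 1024  # hypothetical cap; tune to real log sizes


def read_bounded_lines(path: Path, max_bytes: int = MAX_LOG_BYTES,
                       max_line_bytes: int = 1024 * 1024) -> Iterator[str]:
    """Yield decoded lines, skipping oversized files and oversized lines."""
    if path.stat().st_size > max_bytes:
        return  # whole file over the cap: yield nothing
    with path.open("rb") as f:
        for raw in f:
            if len(raw) > max_line_bytes:
                continue  # drop a single pathological line
            yield raw.decode("utf-8", errors="replace")
```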

📊 Test Coverage

Excellent Coverage

  • test_ingest.py, test_patterns.py, test_queries.py, test_server.py, test_storage.py
  • 69 total tests covering all major functionality

Missing Tests

  1. CLI output formatting edge cases
  2. Concurrent ingestion scenarios
  3. Large dataset performance tests

🚀 Recommendations

High Priority

  1. Fix SQL injection pattern in queries.py
  2. Add file path validation in ingest.py
  3. Optimize update_session_stats() with single SQL statement

Medium Priority

  1. Add CLI integration tests
  2. Document timezone handling
  3. Add memory limits for large datasets

Low Priority

  1. Make constants configurable (max_age_minutes, sequence_length)
  2. Add pre-commit hooks
  3. Add changelog

🎯 Overall Assessment

Rating: 8.5/10 - Production-ready code with minor security and performance improvements needed.

Blockers for Merge

  • ✅ None - code is functional and well-tested

Recommended Before Merge

  • Fix SQL injection vulnerability pattern (defensive coding)
  • Add file path validation for security
  • Optimize session stats update

Great work on this phase! The project is now feature-complete with both MCP and CLI access. 🎉

@evansenter

Owner Author

Superseded by #19 which contains all Phase 1-7 work plus RFC #18 refactoring.

@evansenter evansenter closed this Dec 31, 2025
@evansenter evansenter deleted the phase-6-integration branch December 31, 2025 12:57

Linked issue: Phase 6: Integration
