Refactor codebase per RFC #18 audit findings by evansenter · Pull Request #19 · evansenter/agent-session-analytics

evansenter · 2025-12-31T05:40:50Z

Summary

Implements all 12 findings from the codebase audit (RFC #18):

P1 - Encapsulation: Add execute_query() / execute_write() public API to SQLiteStorage, migrate 8 locations from _connect()
P2 - Code Quality: Add build_where_clause() helper, formatter registry pattern in CLI, module-qualified imports, importlib.metadata for version
P3 - Extensibility: Schema migration framework with @migration decorator, __all__ exports
P4 - Polish: Remove dead code, document timestamp handling, CLI help epilog

Test plan

All 84 tests pass
make check passes (format, lint, test)
CLI help shows new epilog with examples
Verify CLAUDE.md and README.md document new patterns

Closes #18

🤖 Generated with Claude Code

- pyproject.toml with FastMCP, uvicorn, and dev dependencies - Makefile with check, fmt, lint, test, install, uninstall targets - LaunchAgent plist and install/uninstall scripts for auto-start - dev.sh script for development mode with auto-reload - Basic FastMCP server with placeholder tools: - get_status: Returns server status - ingest_logs: Placeholder for log ingestion - query_tool_frequency: Placeholder for frequency queries - Usage guide as MCP resource at session-analytics://guide - Tests for the placeholder tools - README with installation and usage instructions Server runs on port 8081 (to not conflict with event-bus on 8080). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- storage.py with SQLiteStorage class: - Events table with denormalized fields for fast queries - Sessions table for session metadata - Ingestion state tracking for incremental updates - Patterns table for pre-computed insights - Indexes on timestamp, session_id, tool_name, project_path - Data classes: Event, Session, IngestionState, Pattern - CRUD operations for all entities with batch insert support - get_db_stats() for monitoring database health - Updated server.py to use storage for get_status() - Comprehensive test suite (16 tests) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Implements log file discovery and parsing: - find_log_files(): Discovers JSONL files within date range - parse_tool_use(): Extracts tool info (command, file_path, skill_name) - parse_entry(): Parses entries into Event objects - ingest_file(): Incremental ingestion with mtime/size tracking - ingest_logs(): Full ingestion orchestration - update_session_stats(): Aggregates session statistics Integrates with server.py to provide real data for ingest_logs tool. Closes #3 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Implements all query MCP tools: - query_tool_frequency: Tool usage counts with project filter - query_timeline: Events in time window with filtering - query_commands: Bash command breakdown with prefix filter - query_sessions: Session metadata and token totals - query_tokens: Token usage grouped by day/session/model Also adds: - ensure_fresh_data(): Auto-refresh mechanism (5 min staleness) - Comprehensive tests for all queries (18 new tests) Closes #4 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Implements pattern detection for /improve-workflow integration: - compute_tool_frequency_patterns(): Tool usage frequency - compute_command_patterns(): Bash command frequency - compute_sequence_patterns(): Tool n-gram detection - compute_permission_gaps(): Commands not in settings.json - get_insights(): Unified insights API for /improve-workflow New MCP tools: - query_sequences: Common tool patterns - query_permission_gaps: Commands needing settings.json - get_insights: Pre-computed patterns Adds 16 new tests (69 total). Closes #5 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Adds command-line interface for shell access: - session-analytics-cli status: Database stats - session-analytics-cli ingest: Trigger log ingestion - session-analytics-cli frequency: Tool usage counts - session-analytics-cli commands: Bash command breakdown - session-analytics-cli sessions: Session metadata - session-analytics-cli tokens: Token usage by day/session/model - session-analytics-cli sequences: Tool patterns - session-analytics-cli permissions: Commands needing settings.json - session-analytics-cli insights: Pre-computed patterns All commands support --json for machine-readable output. Also updates README with CLI usage documentation. Closes #6 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add comprehensive CLI test coverage (15 tests) - Fix format_output ordering for insights command - Test all CLI commands: status, ingest, frequency, commands, sessions, tokens, sequences, permissions, insights - Test both human-readable and JSON output modes Closes #7 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Document all 10 MCP tools with descriptions - Add CLI usage examples for all 9 commands - Include example JSON output for key queries - Document architecture, data model, and integration points - Add development and installation instructions 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@migration

Implements all 12 findings from the codebase audit: **P1 - Encapsulation:** - Add execute_query() and execute_write() public methods to SQLiteStorage - Migrate 8 locations (4 in queries.py, 4 in patterns.py) to use public API **P2 - Code Quality:** - Add build_where_clause() helper to reduce query duplication - Refactor format_output() to use formatter registry pattern - Change to module-qualified imports in server.py - Read version from importlib.metadata instead of hardcoding **P3 - Future Extensibility:** - Add schema migration framework with @migration decorator - Define __all__ exports in __init__.py **P4 - Polish:** - Remove empty pass branch in ingest.py - Document timestamp handling with clear comments - Add CLI epilog with examples - Update CLAUDE.md and README.md with architecture patterns Closes #18 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

claude · 2025-12-31T05:48:24Z

Code Review: PR #19 - Refactor codebase per RFC #18 audit findings

This is an excellent refactoring that systematically addresses all 12 findings from RFC #18. The implementation is clean, well-tested, and follows best practices.

Strengths

1. Excellent Encapsulation - The new public API (execute_query/execute_write) in storage.py:162-192 properly abstracts database access with clear separation and good documentation.

2. Strong Test Coverage - 1,591 lines of tests across 6 test files (84 tests total) covering storage, queries, patterns, ingest, cli, and server.

3. Elegant Formatter Registry - cli.py:17-135 implements clean decorator-based registry that's easy to extend.

4. Good Query Helper - queries.py:8-40 build_where_clause() eliminates duplication across 4 instances.

5. Migration Framework - storage.py:112-132 provides future-proof schema evolution with decorator pattern.

Issues Found

Critical: None

High Priority:

settings.local.json contains hardcoded absolute paths (/Users/evansenter/...) - should not be committed or should use relative paths only

Medium Priority:
2. ingest.py:134-139 silently drops events on timestamp parse failure - suggest WARNING level logging
3. queries.py:29 uses f-string for column name - add whitelist validation to prevent future SQL injection
4. Inconsistent datetime handling across storage.py and ingest.py - centralize in utility module

Low Priority:
5. cli.py formatters lack return type annotations
6. scripts/install-launchagent.sh doesn't validate template file existence

Recommendations

Documentation: Excellent README and CLAUDE.md updates. Consider adding more API docstrings.

Security: Uses parameterized queries throughout. settings.local.json should not be committed.

Performance: Proper indexes, batch operations, pre-computed patterns, auto-refresh with staleness detection.

Maintainability: Clear separation of concerns, consistent naming, migration framework, comprehensive tests.

Summary

All 12 RFC #18 findings successfully addressed:

Encapsulation (P1): Complete - public API replaces private _connect() usage
Code Quality (P2): Complete - query helper, formatter registry, proper imports
Extensibility (P3): Complete - migration framework, all exports
Polish (P4): Complete - dead code removed, timestamp docs, CLI help

Recommended Actions:

Before merge: Remove or gitignore .claude/settings.local.json
Optional: Add column validation, improve error logging, add type hints

Verdict: Approve with minor changes (fix settings.local.json before merge)

Excellent work on this comprehensive refactoring!

claude · 2025-12-31T12:49:38Z

Code Review: PR #19 - Refactor codebase per RFC #18 audit findings

Overview

This is an impressive refactor that systematically addresses all 12 findings from RFC #18. The PR adds ~4,700 lines of well-structured code with comprehensive test coverage (84 tests across 7 test files). The implementation follows clean architecture principles and establishes clear patterns for future development.

✅ Strengths

Architecture & Design

Public API Encapsulation: The introduction of execute_query() and execute_write() as the public API for SQLiteStorage is excellent. This provides a clear boundary and makes the code more maintainable.
Formatter Registry Pattern: The CLI formatter registry using decorators (@_register_formatter) is elegant and extensible. The predicate-based matching (first-match-wins) is clean and easy to understand.
Migration Framework: The @migration(version, name) decorator pattern is well-designed for future schema evolution.
Query Helper: build_where_clause() successfully eliminates code duplication across query functions.

Code Quality

Type Hints: Extensive use of modern Python type hints (str | None, list[Event], etc.) improves code clarity and IDE support.
Documentation: Good docstring coverage with clear Args/Returns sections.
Error Handling: Appropriate use of try/except with logging in critical paths (e.g., timestamp parsing, file I/O).
Testing: 84 tests provide strong coverage across all modules.

Developer Experience

Makefile: Clean, well-organized targets (check, install, dev, uninstall).
CLI Help: The epilog with examples is a nice touch (cli.py:231-240).
LaunchAgent Setup: Automated installation scripts make deployment straightforward.

🔍 Issues & Recommendations

Security Concerns

P0: SQL Injection Risk (storage.py:427, 434)
While where_clause is built from hardcoded conditions in get_events_in_range(), using f-strings for SQL construction is a dangerous pattern. If this pattern is copied elsewhere with user input, it could lead to SQL injection.

Recommendation: Use string concatenation instead of f-strings for SQL to make the pattern safer.

P1: Path Traversal (ingest.py:43)
The project_filter parameter is used in a simple substring match without validation. If this filter comes from user input, it could be exploited.

Recommendation: Validate or sanitize project_filter to ensure it does not contain path separators.

Bugs

P1: Timezone Handling (storage.py:567-569)
This handles both strings and datetime objects, but the timezone handling could be more explicitly documented. SQLite MAX() returns strings which are converted back.

P2: Silent Failure in Formatter Registry (cli.py:130-135)
If a predicate raises an exception, it will propagate and crash the CLI. Consider wrapping predicate calls in try/except.

P2: Missing Error Handling (queries.py:68-73)
If ingest_logs() fails, the query will fail too. Consider catching exceptions and logging warnings.

Performance

P1: Missing Index (storage.py:262-265)
Missing index on command column, which is used in query_commands(). Add: CREATE INDEX IF NOT EXISTS idx_events_command ON events(command)

P2: Unbounded Query (storage.py:409)
The limit defaults to 100 but consider adding a maximum cap to prevent accidental DoS.

Code Quality

P2: Magic Number (cli.py:36, 44, 67)
The limit of 20 items appears in multiple formatters. Extract to a constant.

P2: Duplicate Code (storage.py:318-350, 354-392)
The add_event() and add_events_batch() methods have identical INSERT statements. Extract to a shared constant.

P3: Type Annotation (cli.py:23)
Use Callable[[dict], bool] instead of callable for better type checking.

Testing

P1: Test Coverage Gap
No tests for the migration framework itself. Add tests for schema version upgrades.

P2: Missing Edge Case Tests

What happens if ~/.claude/projects/ does not exist?
What happens if a JSONL file is malformed?
What happens if the database is locked?

📊 Test Coverage Analysis

Strong coverage across 7 test files:

✅ test_storage.py: Event operations, sessions, patterns
✅ test_queries.py: All query functions
✅ test_ingest.py: JSONL parsing
✅ test_patterns.py: Sequence detection, permission gaps
✅ test_cli.py: CLI commands and formatters
✅ test_server.py: MCP server endpoints
⚠️ Missing: Migration framework, error scenarios

🎯 Recommendations

Before Merge

Fix SQL f-string pattern (P0)
Add index on command column
Add migration framework test

Follow-up PRs

Extract magic numbers to constants
Add error handling improvements
Add edge case tests

✨ Conclusion

This is excellent work that significantly improves codebase maintainability. The architecture patterns provide a solid foundation.

Recommendation: Merge after addressing the P0 security concern and adding the missing index.

Great job! 🚀

evansenter and others added 10 commits December 31, 2025 04:14

Merge main to get CI workflows

37f2812

Ignore .claude/ directory with local settings

8f52671

evansenter merged commit ff49767 into main Dec 31, 2025
3 checks passed

evansenter deleted the issue-18-refactor branch December 31, 2025 12:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor codebase per RFC #18 audit findings#19

Refactor codebase per RFC #18 audit findings#19
evansenter merged 11 commits into
mainfrom
issue-18-refactor

evansenter commented Dec 31, 2025

Uh oh!

claude Bot commented Dec 31, 2025

Uh oh!

claude Bot commented Dec 31, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

evansenter commented Dec 31, 2025

Summary

Test plan

Uh oh!

claude Bot commented Dec 31, 2025

Code Review: PR #19 - Refactor codebase per RFC #18 audit findings

Strengths

Issues Found

Recommendations

Summary

Uh oh!

claude Bot commented Dec 31, 2025

Code Review: PR #19 - Refactor codebase per RFC #18 audit findings

Overview

✅ Strengths

Architecture & Design

Code Quality

Developer Experience

🔍 Issues & Recommendations

Security Concerns

Bugs

Performance

Code Quality

Testing

📊 Test Coverage Analysis

🎯 Recommendations

Before Merge

Follow-up PRs

✨ Conclusion

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant