Skip to content

Refactor codebase per RFC #18 audit findings#19

Merged
evansenter merged 11 commits into
mainfrom
issue-18-refactor
Dec 31, 2025
Merged

Refactor codebase per RFC #18 audit findings#19
evansenter merged 11 commits into
mainfrom
issue-18-refactor

Conversation

@evansenter

Copy link
Copy Markdown
Owner

Summary

Implements all 12 findings from the codebase audit (RFC #18):

  • P1 - Encapsulation: Add execute_query() / execute_write() public API to SQLiteStorage, migrate 8 locations from _connect()
  • P2 - Code Quality: Add build_where_clause() helper, formatter registry pattern in CLI, module-qualified imports, importlib.metadata for version
  • P3 - Extensibility: Schema migration framework with @migration decorator, __all__ exports
  • P4 - Polish: Remove dead code, document timestamp handling, CLI help epilog

Test plan

  • All 84 tests pass
  • make check passes (format, lint, test)
  • CLI help shows new epilog with examples
  • Verify CLAUDE.md and README.md document new patterns

Closes #18

🤖 Generated with Claude Code

evansenter and others added 10 commits December 31, 2025 04:14
- pyproject.toml with FastMCP, uvicorn, and dev dependencies
- Makefile with check, fmt, lint, test, install, uninstall targets
- LaunchAgent plist and install/uninstall scripts for auto-start
- dev.sh script for development mode with auto-reload
- Basic FastMCP server with placeholder tools:
  - get_status: Returns server status
  - ingest_logs: Placeholder for log ingestion
  - query_tool_frequency: Placeholder for frequency queries
- Usage guide as MCP resource at session-analytics://guide
- Tests for the placeholder tools
- README with installation and usage instructions

Server runs on port 8081 (to not conflict with event-bus on 8080).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- storage.py with SQLiteStorage class:
  - Events table with denormalized fields for fast queries
  - Sessions table for session metadata
  - Ingestion state tracking for incremental updates
  - Patterns table for pre-computed insights
  - Indexes on timestamp, session_id, tool_name, project_path
- Data classes: Event, Session, IngestionState, Pattern
- CRUD operations for all entities with batch insert support
- get_db_stats() for monitoring database health
- Updated server.py to use storage for get_status()
- Comprehensive test suite (16 tests)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements log file discovery and parsing:
- find_log_files(): Discovers JSONL files within date range
- parse_tool_use(): Extracts tool info (command, file_path, skill_name)
- parse_entry(): Parses entries into Event objects
- ingest_file(): Incremental ingestion with mtime/size tracking
- ingest_logs(): Full ingestion orchestration
- update_session_stats(): Aggregates session statistics

Integrates with server.py to provide real data for ingest_logs tool.

Closes #3

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements all query MCP tools:
- query_tool_frequency: Tool usage counts with project filter
- query_timeline: Events in time window with filtering
- query_commands: Bash command breakdown with prefix filter
- query_sessions: Session metadata and token totals
- query_tokens: Token usage grouped by day/session/model

Also adds:
- ensure_fresh_data(): Auto-refresh mechanism (5 min staleness)
- Comprehensive tests for all queries (18 new tests)

Closes #4

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements pattern detection for /improve-workflow integration:
- compute_tool_frequency_patterns(): Tool usage frequency
- compute_command_patterns(): Bash command frequency
- compute_sequence_patterns(): Tool n-gram detection
- compute_permission_gaps(): Commands not in settings.json
- get_insights(): Unified insights API for /improve-workflow

New MCP tools:
- query_sequences: Common tool patterns
- query_permission_gaps: Commands needing settings.json
- get_insights: Pre-computed patterns

Adds 16 new tests (69 total).

Closes #5

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds command-line interface for shell access:
- session-analytics-cli status: Database stats
- session-analytics-cli ingest: Trigger log ingestion
- session-analytics-cli frequency: Tool usage counts
- session-analytics-cli commands: Bash command breakdown
- session-analytics-cli sessions: Session metadata
- session-analytics-cli tokens: Token usage by day/session/model
- session-analytics-cli sequences: Tool patterns
- session-analytics-cli permissions: Commands needing settings.json
- session-analytics-cli insights: Pre-computed patterns

All commands support --json for machine-readable output.

Also updates README with CLI usage documentation.

Closes #6

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add comprehensive CLI test coverage (15 tests)
- Fix format_output ordering for insights command
- Test all CLI commands: status, ingest, frequency, commands,
  sessions, tokens, sequences, permissions, insights
- Test both human-readable and JSON output modes

Closes #7

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Document all 10 MCP tools with descriptions
- Add CLI usage examples for all 9 commands
- Include example JSON output for key queries
- Document architecture, data model, and integration points
- Add development and installation instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements all 12 findings from the codebase audit:

**P1 - Encapsulation:**
- Add execute_query() and execute_write() public methods to SQLiteStorage
- Migrate 8 locations (4 in queries.py, 4 in patterns.py) to use public API

**P2 - Code Quality:**
- Add build_where_clause() helper to reduce query duplication
- Refactor format_output() to use formatter registry pattern
- Change to module-qualified imports in server.py
- Read version from importlib.metadata instead of hardcoding

**P3 - Future Extensibility:**
- Add schema migration framework with @migration decorator
- Define __all__ exports in __init__.py

**P4 - Polish:**
- Remove empty pass branch in ingest.py
- Document timestamp handling with clear comments
- Add CLI epilog with examples
- Update CLAUDE.md and README.md with architecture patterns

Closes #18

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude

claude Bot commented Dec 31, 2025

Copy link
Copy Markdown

Code Review: PR #19 - Refactor codebase per RFC #18 audit findings

This is an excellent refactoring that systematically addresses all 12 findings from RFC #18. The implementation is clean, well-tested, and follows best practices.

Strengths

1. Excellent Encapsulation - The new public API (execute_query/execute_write) in storage.py:162-192 properly abstracts database access with clear separation and good documentation.

2. Strong Test Coverage - 1,591 lines of tests across 6 test files (84 tests total) covering storage, queries, patterns, ingest, cli, and server.

3. Elegant Formatter Registry - cli.py:17-135 implements clean decorator-based registry that's easy to extend.

4. Good Query Helper - queries.py:8-40 build_where_clause() eliminates duplication across 4 instances.

5. Migration Framework - storage.py:112-132 provides future-proof schema evolution with decorator pattern.

Issues Found

Critical: None

High Priority:

  1. settings.local.json contains hardcoded absolute paths (/Users/evansenter/...) - should not be committed or should use relative paths only

Medium Priority:
2. ingest.py:134-139 silently drops events on timestamp parse failure - suggest WARNING level logging
3. queries.py:29 uses f-string for column name - add whitelist validation to prevent future SQL injection
4. Inconsistent datetime handling across storage.py and ingest.py - centralize in utility module

Low Priority:
5. cli.py formatters lack return type annotations
6. scripts/install-launchagent.sh doesn't validate template file existence

Recommendations

Documentation: Excellent README and CLAUDE.md updates. Consider adding more API docstrings.

Security: Uses parameterized queries throughout. settings.local.json should not be committed.

Performance: Proper indexes, batch operations, pre-computed patterns, auto-refresh with staleness detection.

Maintainability: Clear separation of concerns, consistent naming, migration framework, comprehensive tests.

Summary

All 12 RFC #18 findings successfully addressed:

  • Encapsulation (P1): Complete - public API replaces private _connect() usage
  • Code Quality (P2): Complete - query helper, formatter registry, proper imports
  • Extensibility (P3): Complete - migration framework, all exports
  • Polish (P4): Complete - dead code removed, timestamp docs, CLI help

Recommended Actions:

  1. Before merge: Remove or gitignore .claude/settings.local.json
  2. Optional: Add column validation, improve error logging, add type hints

Verdict: Approve with minor changes (fix settings.local.json before merge)

Excellent work on this comprehensive refactoring!

@claude

claude Bot commented Dec 31, 2025

Copy link
Copy Markdown

Code Review: PR #19 - Refactor codebase per RFC #18 audit findings

Overview

This is an impressive refactor that systematically addresses all 12 findings from RFC #18. The PR adds ~4,700 lines of well-structured code with comprehensive test coverage (84 tests across 7 test files). The implementation follows clean architecture principles and establishes clear patterns for future development.

✅ Strengths

Architecture & Design

  • Public API Encapsulation: The introduction of execute_query() and execute_write() as the public API for SQLiteStorage is excellent. This provides a clear boundary and makes the code more maintainable.
  • Formatter Registry Pattern: The CLI formatter registry using decorators (@_register_formatter) is elegant and extensible. The predicate-based matching (first-match-wins) is clean and easy to understand.
  • Migration Framework: The @migration(version, name) decorator pattern is well-designed for future schema evolution.
  • Query Helper: build_where_clause() successfully eliminates code duplication across query functions.

Code Quality

  • Type Hints: Extensive use of modern Python type hints (str | None, list[Event], etc.) improves code clarity and IDE support.
  • Documentation: Good docstring coverage with clear Args/Returns sections.
  • Error Handling: Appropriate use of try/except with logging in critical paths (e.g., timestamp parsing, file I/O).
  • Testing: 84 tests provide strong coverage across all modules.

Developer Experience

  • Makefile: Clean, well-organized targets (check, install, dev, uninstall).
  • CLI Help: The epilog with examples is a nice touch (cli.py:231-240).
  • LaunchAgent Setup: Automated installation scripts make deployment straightforward.

🔍 Issues & Recommendations

Security Concerns

P0: SQL Injection Risk (storage.py:427, 434)
While where_clause is built from hardcoded conditions in get_events_in_range(), using f-strings for SQL construction is a dangerous pattern. If this pattern is copied elsewhere with user input, it could lead to SQL injection.

Recommendation: Use string concatenation instead of f-strings for SQL to make the pattern safer.

P1: Path Traversal (ingest.py:43)
The project_filter parameter is used in a simple substring match without validation. If this filter comes from user input, it could be exploited.

Recommendation: Validate or sanitize project_filter to ensure it does not contain path separators.

Bugs

P1: Timezone Handling (storage.py:567-569)
This handles both strings and datetime objects, but the timezone handling could be more explicitly documented. SQLite MAX() returns strings which are converted back.

P2: Silent Failure in Formatter Registry (cli.py:130-135)
If a predicate raises an exception, it will propagate and crash the CLI. Consider wrapping predicate calls in try/except.

P2: Missing Error Handling (queries.py:68-73)
If ingest_logs() fails, the query will fail too. Consider catching exceptions and logging warnings.

Performance

P1: Missing Index (storage.py:262-265)
Missing index on command column, which is used in query_commands(). Add: CREATE INDEX IF NOT EXISTS idx_events_command ON events(command)

P2: Unbounded Query (storage.py:409)
The limit defaults to 100 but consider adding a maximum cap to prevent accidental DoS.

Code Quality

P2: Magic Number (cli.py:36, 44, 67)
The limit of 20 items appears in multiple formatters. Extract to a constant.

P2: Duplicate Code (storage.py:318-350, 354-392)
The add_event() and add_events_batch() methods have identical INSERT statements. Extract to a shared constant.

P3: Type Annotation (cli.py:23)
Use Callable[[dict], bool] instead of callable for better type checking.

Testing

P1: Test Coverage Gap
No tests for the migration framework itself. Add tests for schema version upgrades.

P2: Missing Edge Case Tests

  • What happens if ~/.claude/projects/ does not exist?
  • What happens if a JSONL file is malformed?
  • What happens if the database is locked?

📊 Test Coverage Analysis

Strong coverage across 7 test files:

  • test_storage.py: Event operations, sessions, patterns
  • test_queries.py: All query functions
  • test_ingest.py: JSONL parsing
  • test_patterns.py: Sequence detection, permission gaps
  • test_cli.py: CLI commands and formatters
  • test_server.py: MCP server endpoints
  • ⚠️ Missing: Migration framework, error scenarios

🎯 Recommendations

Before Merge

  1. Fix SQL f-string pattern (P0)
  2. Add index on command column
  3. Add migration framework test

Follow-up PRs

  1. Extract magic numbers to constants
  2. Add error handling improvements
  3. Add edge case tests

✨ Conclusion

This is excellent work that significantly improves codebase maintainability. The architecture patterns provide a solid foundation.

Recommendation: Merge after addressing the P0 security concern and adding the missing index.

Great job! 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RFC: Address codebase anti-patterns and encapsulation issues

1 participant