diff --git a/CLAUDE.md b/CLAUDE.md
index aabfd46..2ce9529 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -2,328 +2,165 @@

 Queryable analytics for Claude Code session logs, exposed as an MCP server and CLI.

-**Related**: [claude-event-bus](https://github.com/evansenter/claude-event-bus) shares design patterns with this project.
-
-## Project Overview
-
-This MCP server replaces the bash script `~/.claude/contrib/parse-session-logs.sh` with a persistent, queryable analytics layer. It parses JSONL session logs from `~/.claude/projects/` and provides:
-
-- **Tool frequency analysis**: Which tools you use most (Read, Edit, Bash, etc.)
-- **Command breakdown**: Bash command patterns (git, make, cargo, etc.)
-- **Workflow sequences**: Common tool chains like Read → Edit → Bash
-- **Permission gap detection**: Commands that should be added to settings.json
-- **Token usage tracking**: Usage by day, session, or model
-- **Session timeline**: Events across conversations, organized by timestamp
-
-## Architecture
-
-```
-~/.claude/projects/**/*.jsonl → SQLite DB → MCP Server / CLI
-                                    ↓
-                    ~/.claude/contrib/analytics/data.db
-```
+**API Reference**: Run `session-analytics-cli --help` or read `src/session_analytics/guide.md` (served as `session-analytics://guide` MCP resource).

-Key components:
-- **FastMCP** for MCP server implementation
-- **SQLite** for persistent storage with incremental ingestion
-- **Auto-refresh** queries automatically refresh stale data (>5 min old)
-- **LaunchAgent** for always-on availability (macOS)
+**Related**: [claude-event-bus](https://github.com/evansenter/claude-event-bus) shares design patterns with this project.
 ---

-## ⚠️ DATABASE PROTECTION - READ THIS ⚠️
+## ⚠️ DATABASE PROTECTION

 **The database at `~/.claude/contrib/analytics/data.db` contains irreplaceable historical data.**

-### NEVER do any of the following:
-- Add code that deletes the database file (`os.remove()`, `unlink()`, `rm`)
-- Add `DROP TABLE` statements for `events`, `sessions`, `ingested_files`, or `git_commits`
-- Add `DELETE FROM` for user data tables (only `patterns` table can be cleared - it's re-computed)
-- Add any "reset" or "clear all" functionality that destroys historical data
-
-### Safe operations:
-- `DELETE FROM patterns` - OK, patterns are re-computed derived data
-- `make uninstall` - OK, preserves database (only removes LaunchAgent + MCP config)
-- `make reinstall` - OK, just reinstalls Python package
+### NEVER:
+- Delete the database file (`os.remove()`, `unlink()`, `rm`)
+- `DROP TABLE` on `events`, `sessions`, `ingested_files`, or `git_commits`
+- `DELETE FROM` user data tables (only `patterns` is safe - it's re-computed)
+- Add "reset" or "clear all" functionality

-### Before schema/migration changes:
-**ALWAYS back up the database before making schema or migration changes:**
+### Before schema changes:
 ```bash
 cp ~/.claude/contrib/analytics/data.db ~/.claude/contrib/analytics/data.db.backup-$(date +%Y%m%d-%H%M%S)
 ```
-Migrations can have subtle bugs (race conditions, incorrect data transforms) that corrupt data irreversibly.
-
-### If you need to test destructive operations:
-Use a temporary database in tests (all tests already do this via `tmpdir`).

 ---

-## Commands
-
-```bash
-make check      # Run fmt, lint, test
-make install    # Install LaunchAgent + CLI + MCP config
-make uninstall  # Remove LaunchAgent + CLI
-make restart    # Restart LaunchAgent to pick up code changes
-make reinstall  # pip install -e . + restart (for pyproject.toml changes)
-make dev        # Run in dev mode with auto-reload
-```
-
-### When to restart
-
-The install is editable (`pip install -e .`), so Python code changes are picked up automatically by the CLI. The MCP server (LaunchAgent) needs a restart to see changes.
-
-| Change type | Action needed |
-|-------------|---------------|
-| MCP tools (`server.py`) | `make restart` |
-| Query/pattern logic (`queries.py`, `patterns.py`) | `make restart` |
-| Storage/migrations (`storage.py`) | `make restart` |
-| CLI only (`cli.py`) | None - CLI runs fresh each time |
-| `pyproject.toml` (entry points, deps) | `make reinstall` |
-| Tests | None - pytest runs fresh |
-| Documentation (`guide.md`, `CLAUDE.md`) | None |
-
-## Key Files
-
-| File | Purpose |
-|------|---------|
-| `src/session_analytics/server.py` | MCP tools + HTTP server entry point |
-| `src/session_analytics/cli.py` | CLI with formatter registry for output |
-| `src/session_analytics/storage.py` | SQLite backend with migration support |
-| `src/session_analytics/ingest.py` | JSONL parsing with incremental updates |
-| `src/session_analytics/queries.py` | Query implementations with `build_where_clause()` helper |
-| `src/session_analytics/patterns.py` | Pattern detection (sequences, permission gaps) |
-
-## Architecture Patterns
-
-- **Public API**: Use `storage.execute_query()` / `execute_write()` for raw SQL; avoid `_connect()`
-- **Formatter Registry**: CLI uses `@_register_formatter(predicate)` decorator pattern
-- **Schema Migrations**: Use `@migration(version, name)` decorator in storage.py for DB changes
-- **Module Imports**: server.py uses `from session_analytics import queries, patterns, ingest`
-- **CLI/MCP Parity**: Always expose new query functions on both CLI and MCP
-
-**When modifying the API**: Update all discovery surfaces together:
-1. **CLI command** - in `cli.py` (visible via `session-analytics-cli --help`)
-2. **MCP tool** - in `server.py` (visible to CC via tool inspection)
-3. **Usage guide** - `guide.md` (served as `session-analytics://guide` resource)
-4. **CLAUDE.md** - This file, for codebase context
-5. **~/.claude/contrib/README.md** - User's local contrib directory (lists MCP server data locations)
-
-## MCP API Naming Conventions
-
-Standard conventions shared with claude-event-bus. See event-bus CLAUDE.md for the canonical reference.
-
-### Tool Names
-
-| Prefix | When to use | Example |
-|--------|-------------|---------|
-| `list_*` | Enumerate items (no complex filtering) | `list_sessions()` |
-| `get_*` | Retrieve data with parameters/filters | `get_events(...)` |
-| `search_*` | Full-text/fuzzy search | `search_messages(...)` |
-| `analyze_*` | Compute derived insights | `analyze_trends(...)` |
-| `ingest_*` | Load/import data | `ingest_logs(...)` |
-
-### Argument Names
-
-| Concept | Standard Name | Notes |
-|---------|---------------|-------|
-| Session identifier | `session_id` | Not `session` or `sid` |
-| Max results | `limit` | Not `count` or `max` |
-| Time window | `days` | Use fractional for hours: `days=0.5` = 12h |
-| Project filter | `project` | Not `project_path` |
-| Minimum threshold | `min_count` | Not `threshold` or `min_events` |
-
 ## Design Philosophy

-**"Don't over-distill"** (RFC #17): Raw data with light structure beats heavily processed summaries. The LLM can handle context.
+This API is consumed by LLMs. Every endpoint should be designed with that in mind.
+
+### Principle 1: Don't Over-Distill

-This means:
-- **Surface raw signals, not interpretations**: Return event counts, error rates, and timing data - not pre-computed labels like "success" or "frustrated"
-- **Let the LLM interpret**: The consuming LLM has context we don't (user intent, conversation history). It should decide what patterns mean
-- **Avoid premature classification**: Don't try to outsmart the LLM by pre-digesting data. Structured raw data is more useful than simplified conclusions
+Raw data with light structure beats heavily processed summaries. The LLM can handle context.

-Example - instead of:
 ```python
 # BAD: Pre-computed interpretation
 {"outcome": "frustrated", "confidence": 0.75}
-```
-Do this:
-```python
 # GOOD: Raw signals for LLM interpretation
 {"error_count": 5, "error_rate": 0.25, "has_rework": True, "commit_count": 0}
 ```

-## MCP Tools (28 total)
+### Principle 2: Aggregate → Drill-Down

-### Status & Ingestion
-| Tool | Purpose |
-|------|---------|
-| `get_status` | Database stats and last ingestion time |
-| `ingest_logs` | Refresh data from JSONL files |
+Every aggregate endpoint needs a path to actionable detail. If an LLM sees "821 Bash errors", it should be able to discover WHICH commands failed.

-### Core Analytics
-| Tool | Purpose |
-|------|---------|
-| `get_tool_frequency` | Tool usage counts with optional breakdown |
-| `get_session_events` | Events in time window (supports `session_id` filter) |
-| `get_command_frequency` | Bash command breakdown with prefix filter |
-| `list_sessions` | Session metadata and token totals |
-| `get_token_usage` | Token usage by day, session, or model |
-
-### Workflow Analysis
-| Tool | Purpose |
-|------|---------|
-| `get_tool_sequences` | Common tool patterns (n-grams) |
-| `sample_sequences` | Sample instances of a pattern with context |
-| `get_permission_gaps` | Commands needing settings.json entries |
-| `get_insights` | Pre-computed patterns for /improve-workflow |
+**The test**: Can an LLM go from high-level insight to actionable fix using only MCP calls?

-### File & Project Activity
-| Tool | Purpose |
-|------|---------|
-| `get_file_activity` | File reads/edits/writes breakdown |
-| `get_languages` | Language distribution from file extensions |
-| `get_projects` | Activity across all projects |
-| `get_mcp_usage` | MCP server and tool usage breakdown |
+See RFC #49 for current drill-down gaps and solutions.

-### Agent Activity
-| Tool | Purpose |
-|------|---------|
-| `get_agent_activity` | Task subagent activity vs main session (RFC #41) |
+### Principle 3: Self-Play Testing

-### Session Analysis
-| Tool | Purpose |
-|------|---------|
-| `get_session_signals` | Raw session metrics for LLM interpretation |
-| `classify_sessions` | Categorize sessions (debugging, dev, research) |
-| `analyze_failures` | Error patterns and rework detection |
-| `analyze_trends` | Compare usage across time periods |
-| `get_handoff_context` | Context summary for session handoff |
-
-### User Messages
-| Tool | Purpose |
-|------|---------|
-| `get_session_messages` | User messages across sessions (supports `session_id` filter) |
-| `search_messages` | Full-text search on user messages (FTS5) |
+Before merging new API endpoints, test them as an LLM would:

-### Session Relationships
-| Tool | Purpose |
-|------|---------|
-| `detect_parallel_sessions` | Find simultaneously active sessions |
-| `find_related_sessions` | Find sessions with similar patterns |
+1. Start from a high-level question ("What's causing errors?")
+2. Use only MCP tools (no direct DB access)
+3. Attempt to reach an actionable conclusion
+4. If blocked, the API is incomplete

-### Git Integration
-| Tool | Purpose |
-|------|---------|
-| `ingest_git_history` | Import git commit history |
-| `correlate_git_with_sessions` | Link commits to sessions by timing |
-| `get_session_commits` | Session-commit mappings with timing
+---

-### Session Discovery and Drill-In Flow
+## Quick Reference

-1. **Discover sessions**: `list_sessions()` returns all session IDs with basic metadata
-2. **Get signals**: `get_session_signals()` returns raw metrics (error_rate, commit_count, etc.)
-3. **Drill into session**:
-   - `get_session_events(session_id=)` - get full event trace
-   - `get_session_messages(session_id=)` - get all user messages
-   - `get_session_commits(session_id=)` - get commit associations
+### Commands

-> **Maintainer note**: This discovery flow is also documented in `src/session_analytics/guide.md`
-> (exposed as MCP resource `session-analytics://guide`). Keep both in sync when updating API docs.
+```bash
+make check      # Run fmt, lint, test
+make install    # Install LaunchAgent + CLI + MCP config
+make restart    # Restart LaunchAgent to pick up code changes
+make reinstall  # pip install -e . + restart (for pyproject.toml changes)
+```

-## CLI Commands (27 total)
+### When to Restart

-All commands support `--json` for machine-readable output:
+| Change | Action |
+|--------|--------|
+| `server.py`, `queries.py`, `patterns.py`, `storage.py` | `make restart` |
+| `cli.py` only | None (CLI runs fresh) |
+| `pyproject.toml` | `make reinstall` |

-```bash
-# Status & Ingestion
-session-analytics-cli status                 # DB stats
-session-analytics-cli ingest --days 30       # Refresh data
-
-# Core Analytics
-session-analytics-cli frequency              # Tool usage (--no-expand to hide breakdowns)
-session-analytics-cli commands --prefix git  # Command breakdown
-session-analytics-cli sessions               # Session info
-session-analytics-cli tokens --by model      # Token usage
-
-# Workflow Analysis
-session-analytics-cli sequences              # Tool chains (--expand for command-level)
-session-analytics-cli sample-sequences       # Sample instances with context
-session-analytics-cli permissions            # Permission gaps
-session-analytics-cli insights               # For /improve-workflow
-
-# File & Project Activity
-session-analytics-cli file-activity          # File reads/edits/writes
-session-analytics-cli languages              # Language distribution
-session-analytics-cli projects               # Cross-project activity
-session-analytics-cli mcp-usage              # MCP server/tool usage
-
-# Agent Activity
-session-analytics-cli agents                 # Task subagent vs main session (RFC #41)
-
-# Session Analysis
-session-analytics-cli signals                # Raw session metrics
-session-analytics-cli classify               # Categorize sessions
-session-analytics-cli failures               # Error patterns and rework
-session-analytics-cli trends                 # Compare time periods
-session-analytics-cli handoff                # Session context summary
-
-# User Messages
-session-analytics-cli journey                # User messages across sessions
-session-analytics-cli search                 # Full-text search on messages
-
-# Session Relationships
-session-analytics-cli parallel               # Find simultaneous sessions
-session-analytics-cli related                # Find similar sessions
-
-# Git Integration
-session-analytics-cli git-ingest             # Import git history
-session-analytics-cli git-correlate          # Link commits to sessions
-session-analytics-cli session-commits        # Show commits per session
-```
+### Key Files

-### Expand Flags
+| File | Purpose |
+|------|---------|
+| `server.py` | MCP tools + entry point |
+| `cli.py` | CLI with formatter registry |
+| `storage.py` | SQLite + migrations |
+| `ingest.py` | JSONL parsing |
+| `queries.py` | Query implementations |
+| `patterns.py` | Sequence/permission gap detection |
+| `guide.md` | API reference (MCP resource) |

-The `--expand` flag shows detailed breakdowns for aggregated tools:
+---

-| Command | Default | Flag | Effect |
-|---------|---------|------|--------|
-| `frequency` | Expanded | `--no-expand` | Show Bash/Skill/Task breakdowns (commands, skills, agents) |
-| `sequences` | Tool-level | `--expand` | Expand to command/skill/agent level sequences |
+## Adding New Endpoints

-**Why different defaults?**
-- `frequency` answers "what am I using?" - breakdowns are useful by default
-- `sequences` answers "what's my workflow?" - tool-level patterns are clearer by default, command-level is for drilling in
+Checklist for adding a new query:

-## Integration
+1. **Query function** in `queries.py`
+   - Use `build_where_clause()` helper for filters
+   - Return structured dict, not raw tuples

-### With /improve-workflow
+2. **MCP tool** in `server.py`
+   - Follow naming: `get_*`, `list_*`, `search_*`, `analyze_*`
+   - Standard args: `days`, `limit`, `session_id`, `project`

-The `get_insights` tool (or `session-analytics-cli insights`) provides pre-computed patterns:
-- Tool frequency for identifying high-value automations
-- Command frequency for settings.json additions
-- Tool sequences for workflow optimization
-- Permission gaps with ready-to-use suggestions
+3. **CLI command** in `cli.py`
+   - Add formatter with `@_register_formatter(predicate)`
+   - Support `--json` flag

-### With session-start hook
+4. **Documentation** in `guide.md`
+   - Add to appropriate section
+   - Include example usage
+
+5. **Self-play test**
+   - Can you reach actionable info using only MCP?
+   - If aggregate, what's the drill-down path?
+
+6. **Run `make check`**
+
+---
+
+## Architecture

-Can be used to auto-ingest on session start:
-```bash
-session-analytics-cli ingest --days 1 --json 2>/dev/null || true
+```
+~/.claude/projects/**/*.jsonl → SQLite DB → MCP Server / CLI
+                                    ↓
+                    ~/.claude/contrib/analytics/data.db
 ```

-## Data Model
+### Key Patterns

-**Events table**: Individual tool uses with timestamps, tokens, commands
-**Sessions table**: Aggregated session metadata
-**Patterns table**: Pre-computed patterns for fast querying
-**Ingested files table**: Tracks file mtime/size for incremental updates
+- **Storage API**: Use `storage.execute_query()` / `execute_write()`; avoid `_connect()`
+- **Migrations**: Use `@migration(version, name)` decorator in storage.py
+- **Formatters**: CLI uses `@_register_formatter(predicate)` - first match wins
+- **CLI/MCP Parity**: Every query should be accessible from both

-## Related
+### Naming Conventions

-- [claude-event-bus](https://github.com/evansenter/claude-event-bus) - Cross-session communication for Claude Code
+| Prefix | Use | Example |
+|--------|-----|---------|
+| `list_*` | Enumerate (no complex filtering) | `list_sessions()` |
+| `get_*` | Retrieve with filters | `get_session_events()` |
+| `search_*` | Full-text search | `search_messages()` |
+| `analyze_*` | Compute insights | `analyze_failures()` |
+| `ingest_*` | Load/import data | `ingest_logs()` |

-## Reference
+| Arg | Standard Name |
+|-----|---------------|
+| Session ID | `session_id` |
+| Max results | `limit` |
+| Time window | `days` (fractional OK: `0.5` = 12h) |
+| Project filter | `project` |
+
+---
+
+## Data Model

-Full implementation plan: `~/.claude/plans/precious-crunching-crescent.md`
+| Table | Purpose |
+|-------|---------|
+| `events` | Individual tool uses with timestamps, tokens, commands |
+| `sessions` | Aggregated session metadata |
+| `patterns` | Pre-computed patterns (safe to delete - re-computed) |
+| `ingested_files` | Tracks file mtime/size for incremental updates |
+| `git_commits` | Commit history for session correlation |
diff --git a/README.md b/README.md
index 5959934..6854e4e 100644
--- a/README.md
+++ b/README.md
@@ -53,6 +53,9 @@ session-analytics-cli languages # Language distribution
 session-analytics-cli projects               # Activity by project
 session-analytics-cli mcp-usage              # MCP server/tool usage

+# Agent Activity
+session-analytics-cli agents                 # Task subagent activity vs main session
+
 # Session Analysis
 session-analytics-cli signals                # Raw session metrics for LLM interpretation
 session-analytics-cli classify               # Categorize sessions (debug/dev/research)
@@ -84,143 +87,21 @@ All commands support:

 ## MCP Tools

-When running as an MCP server, these tools are available:
-
-### Status & Ingestion
-
-| Tool | Description |
-|------|-------------|
-| `get_status` | Database stats and last ingestion time |
-| `ingest_logs` | Refresh data from JSONL files |
-
-### Core Analytics
-
-| Tool | Description |
-|------|-------------|
-| `get_tool_frequency` | Tool usage counts with optional breakdown |
-| `get_session_events` | Events in time window with filtering |
-| `get_command_frequency` | Bash command breakdown |
-| `list_sessions` | Session metadata and totals |
-| `get_token_usage` | Token usage by day/session/model |
-
-### Workflow Analysis
-
-| Tool | Description |
-|------|-------------|
-| `get_tool_sequences` | Common tool patterns (n-grams) |
-| `sample_sequences` | Sample instances of a pattern with context |
-| `get_permission_gaps` | Commands needing settings.json |
-| `get_insights` | Pre-computed patterns for /improve-workflow |
-
-### File & Project Activity
-
-| Tool | Description |
-|------|-------------|
-| `get_file_activity` | File reads/edits/writes breakdown |
-| `get_languages` | Language distribution from file extensions |
-| `get_projects` | Activity breakdown by project |
-| `get_mcp_usage` | MCP server and tool usage |
-
-### Session Analysis
-
-| Tool | Description |
-|------|-------------|
-| `get_session_signals` | Raw session metrics for LLM interpretation |
-| `classify_sessions` | Categorize sessions (debugging, dev, research) |
-| `analyze_failures` | Error patterns and rework detection |
-| `analyze_trends` | Compare usage across time periods |
-| `get_handoff_context` | Context summary for session handoff |
-
-### User Messages
-
-| Tool | Description |
-|------|-------------|
-| `get_session_messages` | User messages across sessions |
-| `search_messages` | Full-text search on user messages (FTS5) |
-
-### Session Relationships
-
-| Tool | Description |
-|------|-------------|
-| `detect_parallel_sessions` | Find simultaneously active sessions |
-| `find_related_sessions` | Find sessions with similar patterns |
-
-### Git Integration
-
-| Tool | Description |
-|------|-------------|
-| `ingest_git_history` | Import git commit history |
-| `correlate_git_with_sessions` | Link commits to sessions by timing |
-| `get_session_commits` | Get commits associated with a session |
-
-### Example: get_tool_frequency
-
-```json
-{
-  "days": 7,
-  "total_tool_calls": 1523,
-  "tools": [
-    {"tool": "Read", "count": 423},
-    {"tool": "Bash", "count": 312, "breakdown": [{"name": "git", "count": 145}, {"name": "make", "count": 89}]},
-    {"tool": "Edit", "count": 289},
-    {"tool": "Grep", "count": 156}
-  ]
-}
-```
-
-### Example: get_permission_gaps
-
-```json
-{
-  "gaps": [
-    {"command": "npm", "count": 47, "suggestion": "Bash(npm:*)"},
-    {"command": "docker", "count": 23, "suggestion": "Bash(docker:*)"}
-  ]
-}
-```
-
-### Example: get_tool_sequences
-
-```json
-{
-  "sequences": [
-    {"pattern": "Read → Edit", "count": 156},
-    {"pattern": "Grep → Read", "count": 89},
-    {"pattern": "Edit → Bash", "count": 67}
-  ]
-}
-```
-
-### Example: get_session_signals
-
-```json
-{
-  "sessions": [
-    {
-      "session_id": "abc123",
-      "event_count": 45,
-      "error_rate": 0.04,
-      "commit_count": 2,
-      "has_rework": false,
-      "has_pr_activity": true
-    }
-  ]
-}
-```
-
-## Integration with /improve-workflow
-
-The `get_insights` tool returns pre-computed patterns optimized for the `/improve-workflow` command:
-
-```bash
-session-analytics-cli insights --refresh
-```
-
-Returns:
-- Tool frequency for identifying high-value automations
-- Command frequency for settings.json additions
-- Tool sequences for workflow optimization
-- Permission gaps with ready-to-use `Bash(cmd:*)` suggestions
+28 tools available when running as an MCP server:
+
+| Category | Tools |
+|----------|-------|
+| **Status** | `get_status`, `ingest_logs` |
+| **Analytics** | `get_tool_frequency`, `get_command_frequency`, `get_session_events`, `list_sessions`, `get_token_usage` |
+| **Patterns** | `get_tool_sequences`, `sample_sequences`, `get_permission_gaps`, `get_insights` |
+| **Files** | `get_file_activity`, `get_languages`, `get_projects`, `get_mcp_usage` |
+| **Agents** | `get_agent_activity` |
+| **Sessions** | `get_session_signals`, `classify_sessions`, `analyze_failures`, `analyze_trends`, `get_handoff_context` |
+| **Messages** | `get_session_messages`, `search_messages` |
+| **Relationships** | `detect_parallel_sessions`, `find_related_sessions` |
+| **Git** | `ingest_git_history`, `correlate_git_with_sessions`, `get_session_commits` |
+
+For detailed usage, read the MCP resource `session-analytics://guide` or see [guide.md](src/session_analytics/guide.md).

 ## Development

diff --git a/src/session_analytics/guide.md b/src/session_analytics/guide.md
index a9f8585..c233f14 100644
--- a/src/session_analytics/guide.md
+++ b/src/session_analytics/guide.md
@@ -37,6 +37,7 @@ identify permission gaps.
 | Tool | Purpose |
 |------|---------|
 | `get_tool_sequences(days?, min_count?, length?)` | Common tool chains (e.g., Read → Edit → Bash) |
+| `sample_sequences(pattern, limit?, context_events?)` | Random samples of a pattern with surrounding context |
 | `get_permission_gaps(days?, min_count?)` | Commands that should be in settings.json |
 | `get_insights(days?, refresh?)` | Pre-computed patterns for /improve-workflow |

@@ -58,14 +59,20 @@ identify permission gaps.
 |------|---------|
 | `analyze_trends(days?, compare_to?)` | Token/event trends with growth rates |

-### User Workflow
+### User Messages

 | Tool | Purpose |
 |------|---------|
-| `get_session_messages(days?, project?)` | User messages across sessions chronologically |
-| `find_related_sessions(session_id)` | Find sessions with similar patterns |
+| `get_session_messages(days?, project?, session_id?)` | User messages across sessions chronologically |
 | `search_messages(query, limit?)` | Full-text search on user messages (FTS5) |

+### Session Relationships
+
+| Tool | Purpose |
+|------|---------|
+| `detect_parallel_sessions(days?, min_overlap_minutes?)` | Find simultaneously active sessions |
+| `find_related_sessions(session_id)` | Find sessions with similar patterns |
+
 ### Git Integration

 | Tool | Purpose |
@@ -79,6 +86,7 @@ identify permission gaps.
 | Tool | Purpose |
 |------|---------|
 | `get_session_signals(days?, min_count?)` | Raw session metrics for LLM interpretation |
+| `get_handoff_context(session_id?, days?, limit?)` | Recent activity summary for session continuity |

 ### Agent Activity

@@ -156,239 +164,55 @@ analyze_failures() → "These commands tend to fail"
 analyze_trends() → "Usage is increasing/decreasing"
 ```

-## Session Discovery and Drill-In
-
-A common workflow is discovering sessions, getting signals about them, then drilling into interesting ones:
-
-### 1. Discover sessions
-```
-list_sessions(days=7)
-→ {sessions: [{id: "abc123", project: "my-repo", event_count: 50}, ...]}
-```
-
-### 2. Get signals for sessions
-```
-get_session_signals(days=7)
-→ {sessions: [
-    {session_id: "abc123", error_rate: 0.04, commit_count: 2, has_rework: false, ...},
-    {session_id: "def456", error_rate: 0.25, commit_count: 0, has_rework: true, ...}
-  ]}
-```
-
-The LLM interprets these raw signals - high error rate + rework + no commits might indicate frustration.
-
-### 3. Drill into an interesting session
-```
-# Get full event trace
-get_session_events(session_id="abc123")
-→ {events: [{tool: "Read", file: "auth.py", ...}, {tool: "Edit", ...}, ...]}
-
-# Get all user messages
-get_session_messages(session_id="abc123")
-→ {messages: [{content: "Fix the login bug", ...}, ...]}
+## Reference

-# Get commit associations
-get_session_commits(session_id="abc123")
-→ {commits: [{sha: "a1b2c3", time_to_commit_seconds: 1800, is_first_commit: true}]}
-```
+### Session Categories

-## Common Patterns
+`classify_sessions()` returns one of these categories:

-### Understanding tool usage
+| Category | Criteria |
+|----------|----------|
+| **debugging** | High error rate (>15%) or 5+ errors |
+| **development** | Heavy editing (>30% edits or 3+ writes) |
+| **maintenance** | Git/build focus without much editing |
+| **research** | Mostly reading/searching codebase |
+| **mixed** | No dominant pattern |

-```
-# What tools do I use most?
-get_tool_frequency(days=30)
-
-# What bash commands do I run?
-get_command_frequency(days=30, prefix="git")  # Just git commands
-get_command_frequency(days=30)                # All commands
-```
-
-### Finding workflow sequences
-
-```
-# What 2-tool patterns are common?
-get_tool_sequences(length=2, min_count=10)
-→ [{pattern: "Read → Edit", count: 234}, {pattern: "Grep → Read", count: 156}, ...]
-
-# What 3-tool patterns?
-get_tool_sequences(length=3, min_count=5)
-→ [{pattern: "Read → Edit → Bash", count: 45}, ...]
-```
+### Permission Gaps

-### Identifying permission gaps
+`get_permission_gaps()` returns commands to add to `~/.claude/settings.json`:

 ```
-# Commands I use frequently but haven't added to settings.json
 get_permission_gaps(min_count=5)
-→ [{command: "npm test", count: 23, suggestion: "Bash(npm test:*)"}, ...]
+→ [{command: "npm", count: 23, suggestion: "Bash(npm:*)"}]
 ```

-Add these to your `~/.claude/settings.json` under `permissions.allow`.
+Add suggestions to `permissions.allow` in your settings.

-### Token usage analysis
-
-```
-# Usage by day
-get_token_usage(days=30, by="day")
-
-# Usage by model
-get_token_usage(days=30, by="model")
-
-# Usage by session
-get_token_usage(days=7, by="session")
-```
-
-### Timeline exploration
-
-```
-# Recent events (1 day = 24 hours)
-get_session_events(days=1)
-
-# Filter by tool
-get_session_events(days=1, tool="Bash")
-
-# Filter by session
-get_session_events(session_id="abc123")
-```
-
-### Session classification
-
-```
-# Categorize recent sessions by activity type
-classify_sessions(days=30)
-→ {
-    sessions: [
-      {session_id: "abc", category: "development", confidence: 0.85},
-      {session_id: "def", category: "debugging", confidence: 0.72},
-      ...
-    ],
-    summary: {debugging: 5, development: 12, research: 3, maintenance: 2}
-  }
-```
-
-Categories:
-- **debugging**: High error rate (>15%) or 5+ errors
-- **development**: Heavy editing (>30% edits or 3+ writes)
-- **maintenance**: Git/build focus without much editing
-- **research**: Mostly reading/searching codebase
-- **mixed**: No dominant pattern
-
-### Failure analysis
-
-```
-# Analyze failure patterns and rework
-analyze_failures(days=30)
-→ {total_errors: 45, errors_by_tool: [...], rework_patterns: {...}}
-```
-
-### Git integration
-
-```
-# Ingest git history from current repo
-ingest_git_history(days=30)
-→ {commits_found: 45, commits_added: 42, skipped_malformed: 0}
-
-# Link commits to sessions (within 5-min buffer of session)
-correlate_git_with_sessions(days=30)
-→ {sessions_analyzed: 20, commits_correlated: 38}
-
-# See what commits were made during a session
-get_session_commits(session_id="abc123")
-→ [{sha: "abc...", time_to_commit_seconds: 1800, is_first_commit: true}]
-```
-
-### Trend analysis
-
-```
-# Compare this week to last week
-analyze_trends(days=7, compare_to="previous")
-→ {
-    metrics: {
-      events: {current: 500, previous: 400, change_pct: 25, direction: "up"},
-      tokens: {current: 50000, previous: 45000, change_pct: 11, direction: "up"}
-    }
-  }
-
-# Compare to same week last month
-analyze_trends(days=7, compare_to="same_last_month")
-```
-
-## Integration with /improve-workflow
+### Git Integration

-The `get_insights` tool returns pre-computed patterns specifically formatted
-for the `/improve-workflow` command:
+Git correlation requires two steps:

 ```
-get_insights(days=30, refresh=True)
-→ {
-    tool_frequency: [...],
-    command_frequency: [...],
-    sequences: [...],
-    permission_gaps: [...],
-    summary: {has_gaps: true, top_tools: ["Read", "Edit", "Bash"]}
-  }
+ingest_git_history(days=30)            # Parse commits from repo
+correlate_git_with_sessions()          # Link to sessions by timing
+get_session_commits(session_id="abc")  # View results
 ```

-This powers data-driven workflow improvement suggestions.
-
-## Best Practices
-
-### Ingestion
-
-1. **Let auto-refresh work** - Queries auto-ingest when data is stale (>5 min)
-2. **Use project filter** - `ingest_logs(project="my-repo")` for faster, focused ingestion
-3. **Force refresh sparingly** - `force=True` re-parses everything, slower but thorough
-
-### Querying
-
-4. **Start with frequency** - `get_tool_frequency` gives quick overview
-5. **Use day filters** - `days=7` for recent trends, `days=30` for patterns
-6. **Project filter** - Most queries accept `project` to focus on one repo
-
-### Permission Gaps
-
-7. **Check weekly** - Run `get_permission_gaps(min_count=3)` to catch new patterns
-8. **Higher min_count = less noise** - Start with `min_count=10` if overwhelmed
-9. **Review before adding** - Some commands shouldn't be auto-approved
-
-### Workflow Improvement
-
-10. **Use /improve-workflow** - It consumes `get_insights` and generates suggestions
-11. **Look for sequences** - Repeated patterns might benefit from automation
-12. **Track over time** - Compare `days=7` vs `days=30` to see trend changes
+## Tips

-## Data Details
+- **Auto-refresh**: Queries auto-ingest when data is stale (>5 min). Use `get_status()` to check.
+- **Project filter**: Most queries accept `project` - uses LIKE matching, partial names work.
+- **Day filters**: `days=7` for recent trends, `days=30` for patterns.
+- **Permission gaps**: Compare against `~/.claude/settings.json`. Higher `min_count` = less noise.
+- **Sequences**: `length=3` finds complex patterns but needs more data.
+- **CLI parity**: `session-analytics-cli` mirrors all MCP tools for terminal use.

-### Storage Location
+## Data

 | Item | Path |
 |------|------|
 | Database | `~/.claude/contrib/analytics/data.db` |
 | Source logs | `~/.claude/projects/**/*.jsonl` |

-### What's Tracked
-
-Each event includes:
-- Timestamp and session ID
-- Tool name and entry type
-- For Bash: command prefix (e.g., "git", "npm")
-- For file ops: file path
-- Token counts (input/output)
-- Error status
-
-### Incremental Ingestion
-
-The server tracks file mtimes and sizes. Only changed files are re-parsed
-on subsequent ingestions, making `ingest_logs` fast for daily use.
-
-## Tips
-
-- Data auto-refreshes on query if stale (>5 min since last ingestion)
-- Use `get_status()` to check when data was last refreshed
-- The `project` filter uses LIKE matching - partial names work
-- `get_tool_sequences` with `length=3` finds more complex patterns but needs more data
-- Permission gaps compare your usage against `~/.claude/settings.json`
-- Token queries help track API usage costs over time
-- The CLI (`session-analytics-cli`) mirrors all MCP tools for terminal use
+Ingestion is incremental - only changed files are re-parsed.
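
The incremental ingestion that the guide's closing line and the `ingested_files` table row describe (tracking file mtime and size, re-parsing only what changed) can be sketched roughly as below. This is a minimal illustration, not the project's `ingest.py`: the table name `ingested_files_demo` and the functions `ensure_table`, `find_changed_files`, and `mark_ingested` are hypothetical, and the real schema may well differ. In keeping with the database protection rules, it is meant to run against a throwaway or in-memory database, never `data.db`.

```python
import sqlite3
from pathlib import Path


def ensure_table(conn: sqlite3.Connection) -> None:
    """Create the demo tracking table (hypothetical name and schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ingested_files_demo "
        "(path TEXT PRIMARY KEY, mtime REAL, size INTEGER)"
    )


def find_changed_files(conn: sqlite3.Connection, log_dir: Path) -> list[Path]:
    """Return JSONL files whose (mtime, size) differ from the recorded values."""
    ensure_table(conn)
    seen = {
        row[0]: (row[1], row[2])
        for row in conn.execute("SELECT path, mtime, size FROM ingested_files_demo")
    }
    changed = []
    for path in sorted(log_dir.rglob("*.jsonl")):
        stat = path.stat()
        # New files are absent from `seen`; modified files have a different pair.
        if seen.get(str(path)) != (stat.st_mtime, stat.st_size):
            changed.append(path)
    return changed


def mark_ingested(conn: sqlite3.Connection, path: Path) -> None:
    """Record the file's current mtime and size after it has been parsed."""
    ensure_table(conn)
    stat = path.stat()
    conn.execute(
        "INSERT OR REPLACE INTO ingested_files_demo (path, mtime, size) "
        "VALUES (?, ?, ?)",
        (str(path), stat.st_mtime, stat.st_size),
    )
    conn.commit()
```

A caller would loop over `find_changed_files(conn, log_dir)`, parse each file, and call `mark_ingested` afterwards, so an unchanged file costs one `stat()` call rather than a full re-parse. Comparing both mtime and size is a cheap heuristic rather than a content hash, so an edit that preserves both values would be missed.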