diff --git a/CLAUDE.md b/CLAUDE.md index 387d83b..cd4dae2 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -30,12 +30,26 @@ Key components: ## Commands ```bash -make check # Run fmt, lint, test (84 tests) +make check # Run fmt, lint, test make install # Install LaunchAgent + CLI make uninstall # Remove LaunchAgent + CLI +make restart # Restart LaunchAgent to pick up code changes make dev # Run in dev mode with auto-reload ``` +### When to restart + +The LaunchAgent runs the installed Python code. After making changes, you need to restart for them to take effect: + +| Change type | Restart needed? | +|-------------|-----------------| +| MCP tools (`server.py`) | Yes - `make restart` | +| Query/pattern logic (`queries.py`, `patterns.py`) | Yes - `make restart` | +| Storage/migrations (`storage.py`) | Yes - `make restart` | +| CLI only (`cli.py`) | No - CLI runs fresh each time | +| Tests | No - pytest runs fresh | +| Documentation (`guide.md`, `CLAUDE.md`) | No | + ## Key Files | File | Purpose | @@ -54,22 +68,81 @@ make dev # Run in dev mode with auto-reload - **Schema Migrations**: Use `@migration(version, name)` decorator in storage.py for DB changes - **Module Imports**: server.py uses `from session_analytics import queries, patterns, ingest` +## MCP API Naming Conventions + +Standard conventions shared with claude-event-bus. See event-bus CLAUDE.md for the canonical reference. 
+ +### Tool Names + +| Prefix | When to use | Example | +|--------|-------------|---------| +| `list_*` | Enumerate items (no complex filtering) | `list_sessions()` | +| `get_*` | Retrieve data with parameters/filters | `get_events(...)` | +| `search_*` | Full-text/fuzzy search | `search_messages(...)` | +| `analyze_*` | Compute derived insights | `analyze_trends(...)` | +| `ingest_*` | Load/import data | `ingest_logs(...)` | + +### Argument Names + +| Concept | Standard Name | Notes | +|---------|---------------|-------| +| Session identifier | `session_id` | Not `session` or `sid` | +| Max results | `limit` | Not `count` or `max` | +| Time window | `days` | Use fractional for hours: `days=0.5` = 12h | +| Project filter | `project` | Not `project_path` | +| Minimum threshold | `min_count` | Not `threshold` or `min_events` | + +## Design Philosophy + +**"Don't over-distill"** (RFC #17): Raw data with light structure beats heavily processed summaries. The LLM can handle context. + +This means: +- **Surface raw signals, not interpretations**: Return event counts, error rates, and timing data - not pre-computed labels like "success" or "frustrated" +- **Let the LLM interpret**: The consuming LLM has context we don't (user intent, conversation history). It should decide what patterns mean +- **Avoid premature classification**: Don't try to outsmart the LLM by pre-digesting data. Structured raw data is more useful than simplified conclusions + +Example - instead of: +```python +# BAD: Pre-computed interpretation +{"outcome": "frustrated", "confidence": 0.75} +``` + +Do this: +```python +# GOOD: Raw signals for LLM interpretation +{"error_count": 5, "error_rate": 0.25, "has_rework": True, "commit_count": 0} +``` + ## MCP Tools | Tool | Purpose | |------|---------| | `get_status` | Database stats and last ingestion time | | `ingest_logs` | Refresh data from JSONL files | -| `query_tool_frequency` | Tool usage counts (Read, Edit, Bash, etc.) 
| -| `query_timeline` | Events in time window with filtering | -| `query_commands` | Bash command breakdown with prefix filter | -| `query_sessions` | Session metadata and token totals | -| `query_tokens` | Token usage by day, session, or model | -| `query_sequences` | Common tool patterns (n-grams) | -| `query_permission_gaps` | Commands needing settings.json entries | +| `get_tool_frequency` | Tool usage counts (Read, Edit, Bash, etc.) | +| `get_session_events` | Events in time window (supports `session_id` filter) | +| `get_command_frequency` | Bash command breakdown with prefix filter | +| `list_sessions` | Session metadata and token totals (lists all session IDs) | +| `get_token_usage` | Token usage by day, session, or model | +| `get_tool_sequences` | Common tool patterns (n-grams, `length` param for n-gram size) | +| `get_permission_gaps` | Commands needing settings.json entries | | `get_insights` | Pre-computed patterns for /improve-workflow | -| `get_user_journey` | User messages across sessions chronologically | +| `get_session_messages` | User messages across sessions (supports `session_id` filter) | | `search_messages` | Full-text search on user messages (FTS5) | +| `get_session_signals` | Raw session metrics for LLM interpretation (RFC #26) | +| `get_session_commits` | Session-commit mappings with timing (RFC #26) | + +### Session Discovery and Drill-In Flow + +1. **Discover sessions**: `list_sessions()` returns all session IDs with basic metadata +2. **Get signals**: `get_session_signals()` returns raw metrics (error_rate, commit_count, etc.) +3. **Drill into session**: + - `get_session_events(session_id=)` - get full event trace + - `get_session_messages(session_id=)` - get all user messages + - `get_session_commits(session_id=)` - get commit associations + +> **Maintainer note**: This discovery flow is also documented in `src/session_analytics/guide.md` +> (exposed as MCP resource `session-analytics://guide`). 
Keep both in sync when updating API docs. ## CLI Commands @@ -87,6 +160,8 @@ session-analytics-cli permissions # Permission gaps session-analytics-cli insights # For /improve-workflow session-analytics-cli journey # User messages across sessions session-analytics-cli search # Full-text search on messages +session-analytics-cli signals # Raw session signals (RFC #26) +session-analytics-cli session-commits # Session-commit associations (RFC #26) ``` ## Integration diff --git a/Makefile b/Makefile index a191f7c..4400f97 100644 --- a/Makefile +++ b/Makefile @@ -1,4 +1,4 @@ -.PHONY: check fmt lint test clean install uninstall dev venv +.PHONY: check fmt lint test clean install uninstall restart dev venv # Run all quality gates (format check, lint, tests) check: fmt lint test @@ -56,6 +56,25 @@ install: venv @echo "Make sure ~/.local/bin is in your PATH:" @echo ' export PATH="$$HOME/.local/bin:$$PATH"' +# Restart the LaunchAgent (pick up code changes) +restart: + @PLIST="$$HOME/Library/LaunchAgents/com.evansenter.claude-session-analytics.plist"; \ + if [ -f "$$PLIST" ]; then \ + echo "Restarting session-analytics..."; \ + launchctl unload "$$PLIST" 2>/dev/null || true; \ + launchctl load "$$PLIST"; \ + sleep 1; \ + if launchctl list | grep -q "com.evansenter.claude-session-analytics"; then \ + echo "Service restarted successfully"; \ + else \ + echo "Error: Service failed to start. Check ~/.claude/session-analytics.err"; \ + exit 1; \ + fi; \ + else \ + echo "LaunchAgent not installed. Run: make install"; \ + exit 1; \ + fi + # Uninstall: LaunchAgent + CLI + MCP config uninstall: @echo "Uninstalling..." 
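Reviewer note (illustrative, not part of the patch): the `days` convention documented in CLAUDE.md above (`days=0.5` = 12h) is applied in the CLI changes of this diff by converting to whole hours with `int(args.days * 24)`. The sketch below mirrors that expression in a standalone helper. The name `days_to_hours` is hypothetical and does not exist in the codebase. It shows that the conversion truncates, so the new `handoff` default of `0.17` resolves to 4 hours, while a window shorter than one hour would resolve to 0.

```python
def days_to_hours(days: float) -> int:
    """Convert a fractional day window to whole hours, truncating toward zero.

    Hypothetical helper: it mirrors the `int(args.days * 24)` expression
    used by the CLI commands in this diff and is not part of the codebase.
    """
    return int(days * 24)


if __name__ == "__main__":
    # 1 day, 12 hours, the handoff default, and a sub-hour window
    for days in (1, 0.5, 0.17, 0.01):
        print(f"days={days} -> hours={days_to_hours(days)}")
```

If sub-hour windows ever need to be supported, one option would be to round up or to pass minutes through instead. That is a design suggestion, not something the patch does.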
diff --git a/src/session_analytics/cli.py b/src/session_analytics/cli.py index 10f7d9e..4ea8f75 100644 --- a/src/session_analytics/cli.py +++ b/src/session_analytics/cli.py @@ -16,12 +16,19 @@ from session_analytics.patterns import ( analyze_failures as do_analyze_failures, ) -from session_analytics.patterns import analyze_trends as do_analyze_trends +from session_analytics.patterns import ( + analyze_trends as do_analyze_trends, +) from session_analytics.patterns import ( compute_permission_gaps, compute_sequence_patterns, ) -from session_analytics.patterns import get_insights as do_get_insights +from session_analytics.patterns import ( + get_insights as do_get_insights, +) +from session_analytics.patterns import ( + get_session_signals as do_get_signals, +) from session_analytics.patterns import ( sample_sequences as do_sample_sequences, ) @@ -337,6 +344,60 @@ def _format_handoff_context(data: dict) -> list[str]: return lines +@_register_formatter( + lambda d: "sessions_analyzed" in d + and "sessions" in d + and "error_count" in (d.get("sessions") or [{}])[0] +) +def _format_signals(data: dict) -> list[str]: + """Format raw session signals for display. + + Per RFC #17: Surfaces raw data for LLM interpretation, no outcome labels. + """ + lines = [ + f"Session Signals (last {data['days']} days)", + f"Sessions analyzed: {data['sessions_analyzed']}", + "", + "Sessions (raw signals for LLM interpretation):", + ] + for sess in data.get("sessions", [])[:15]: + commit_info = f", {sess['commit_count']} commits" if sess.get("commit_count") else "" + error_info = f", {sess['error_rate']:.0%} errors" if sess.get("error_rate", 0) > 0 else "" + rework = " [rework]" if sess.get("has_rework") else "" + pr = " [PR]" if sess.get("has_pr_activity") else "" + lines.append( + f" {sess['session_id'][:16]} - {sess['event_count']} events, " + f"{sess['duration_minutes']:.0f}m{commit_info}{error_info}{rework}{pr}" + ) + if len(data.get("sessions", [])) > 15: + lines.append(f" ... 
and {len(data['sessions']) - 15} more") + return lines + + +@_register_formatter(lambda d: "commits" in d and "total_commits" in d) +def _format_session_commits(data: dict) -> list[str]: + lines = [ + f"Session Commits (last {data['days']} days)", + f"Total commits: {data['total_commits']}", + "", + ] + if data.get("session_id"): + lines.insert(1, f"Session: {data['session_id']}") + + for commit in data.get("commits", [])[:20]: + sha = commit.get("sha", "")[:8] + time_to = commit.get("time_to_commit_seconds", 0) + first = " (first)" if commit.get("is_first_commit") else "" + session = commit.get("session_id", "")[:12] if not data.get("session_id") else "" + if session: + lines.append(f" {sha} - {time_to}s{first} [{session}]") + else: + lines.append(f" {sha} - {time_to}s{first}") + if len(data.get("commits", [])) > 20: + lines.append(f" ... and {len(data['commits']) - 20} more") + return lines + + @_register_formatter(lambda d: "metrics" in d and "tool_changes" in d) def _format_trends(data: dict) -> list[str]: def format_metric(name: str, metric: dict) -> str: @@ -446,7 +507,7 @@ def cmd_sequences(args): def cmd_permissions(args): """Show permission gaps.""" storage = SQLiteStorage() - patterns = compute_permission_gaps(storage, days=args.days, threshold=args.threshold) + patterns = compute_permission_gaps(storage, days=args.days, threshold=args.min_count) result = { "days": args.days, "gaps": [ @@ -479,7 +540,7 @@ def cmd_sample_sequences(args): result = do_sample_sequences( storage, pattern=args.pattern, - count=args.count, + count=args.limit, context_events=args.context, days=args.days, ) @@ -487,12 +548,14 @@ def cmd_sample_sequences(args): def cmd_journey(args): - """Show user journey across sessions.""" + """Show user messages across sessions.""" storage = SQLiteStorage() + hours = int(args.days * 24) result = get_user_journey( storage, - hours=args.hours, + hours=hours, include_projects=not args.no_projects, + session_id=getattr(args, "session_id", None), 
limit=args.limit, ) print(format_output(result, args.json)) @@ -533,9 +596,10 @@ def cmd_search(args): def cmd_parallel(args): """Show parallel session detection.""" storage = SQLiteStorage() + hours = int(args.days * 24) result = detect_parallel_sessions( storage, - hours=args.hours, + hours=hours, min_overlap_minutes=args.min_overlap, ) print(format_output(result, args.json)) @@ -579,10 +643,11 @@ def cmd_classify(args): def cmd_handoff(args): """Show handoff context for a session.""" storage = SQLiteStorage() + hours = int(args.days * 24) result = do_get_handoff_context( storage, session_id=args.session_id, - hours=args.hours, + hours=hours, message_limit=args.limit, ) print(format_output(result, args.json)) @@ -621,6 +686,63 @@ def cmd_git_correlate(args): print(format_output(result, args.json)) +def cmd_signals(args): + """Show raw session signals for LLM interpretation (RFC #26, revised per RFC #17).""" + storage = SQLiteStorage() + result = do_get_signals( + storage, + days=args.days, + min_count=args.min_count, + project=args.project, + ) + print(format_output(result, args.json)) + + +def cmd_session_commits(args): + """Show session-commit associations (RFC #26).""" + storage = SQLiteStorage() + commits = storage.get_session_commits(args.session_id) if args.session_id else [] + + # If no session_id, get all session commits from recent days + if not args.session_id: + project_filter = "" + params = [f"-{args.days} days"] + if args.project: + project_filter = "AND s.project_path LIKE ?" + params.append(f"%{args.project}%") + + rows = storage.execute_query( + f""" + SELECT sc.session_id, sc.commit_sha, sc.time_to_commit_seconds, + sc.is_first_commit + FROM session_commits sc + JOIN sessions s ON s.id = sc.session_id + WHERE s.first_seen >= datetime('now', ?) 
+ {project_filter} + ORDER BY s.first_seen DESC + """, + tuple(params), + ) + commits = [ + { + "session_id": r["session_id"], + "sha": r["commit_sha"], + "time_to_commit_seconds": r["time_to_commit_seconds"], + "is_first_commit": bool(r["is_first_commit"]), + } + for r in rows + ] + + result = { + "days": args.days, + "session_id": args.session_id, + "project": getattr(args, "project", None), + "total_commits": len(commits), + "commits": commits, + } + print(format_output(result, args.json)) + + def main(): """CLI entry point.""" epilog = """ @@ -690,7 +812,7 @@ def main(): # permissions sub = subparsers.add_parser("permissions", help="Show permission gaps") sub.add_argument("--days", type=int, default=7, help="Days to analyze (default: 7)") - sub.add_argument("--threshold", type=int, default=5, help="Minimum usage count") + sub.add_argument("--min-count", type=int, default=5, help="Minimum usage count (default: 5)") sub.set_defaults(func=cmd_permissions) # insights @@ -708,17 +830,20 @@ def main(): ) sub.add_argument("pattern", help="Pattern to sample (e.g., 'Read → Edit' or 'Read,Edit')") sub.add_argument("--days", type=int, default=7, help="Days to analyze (default: 7)") - sub.add_argument("--count", type=int, default=5, help="Number of samples (default: 5)") + sub.add_argument("--limit", type=int, default=5, help="Number of samples (default: 5)") sub.add_argument( "--context", type=int, default=2, help="Context events before/after (default: 2)" ) sub.set_defaults(func=cmd_sample_sequences) - # journey - sub = subparsers.add_parser("journey", help="Show user journey across sessions") - sub.add_argument("--hours", type=int, default=24, help="Hours to look back (default: 24)") + # journey (maps to get_session_messages MCP tool) + sub = subparsers.add_parser("journey", help="Show user messages across sessions") + sub.add_argument( + "--days", type=float, default=1, help="Days to look back (default: 1, supports 0.5 for 12h)" + ) sub.add_argument("--limit", 
type=int, default=100, help="Max messages (default: 100)") sub.add_argument("--no-projects", action="store_true", help="Exclude project info") + sub.add_argument("--session-id", help="Filter to specific session ID") sub.set_defaults(func=cmd_journey) # search @@ -730,7 +855,9 @@ def main(): # parallel sub = subparsers.add_parser("parallel", help="Detect parallel sessions") - sub.add_argument("--hours", type=int, default=24, help="Hours to look back (default: 24)") + sub.add_argument( + "--days", type=float, default=1, help="Days to look back (default: 1, supports 0.5 for 12h)" + ) sub.add_argument("--min-overlap", type=int, default=5, help="Min overlap minutes (default: 5)") sub.set_defaults(func=cmd_parallel) @@ -764,7 +891,9 @@ def main(): # handoff sub = subparsers.add_parser("handoff", help="Get handoff context for a session") sub.add_argument("--session-id", help="Specific session ID (default: most recent)") - sub.add_argument("--hours", type=int, default=4, help="Hours to look back (default: 4)") + sub.add_argument( + "--days", type=float, default=0.17, help="Days to look back (default: 0.17 = ~4 hours)" + ) sub.add_argument("--limit", type=int, default=10, help="Max messages (default: 10)") sub.set_defaults(func=cmd_handoff) @@ -791,6 +920,20 @@ def main(): sub.add_argument("--days", type=int, default=7, help="Days to correlate (default: 7)") sub.set_defaults(func=cmd_git_correlate) + # signals (RFC #26, revised per RFC #17 - raw data, no interpretation) + sub = subparsers.add_parser("signals", help="Show raw session signals for LLM interpretation") + sub.add_argument("--days", type=int, default=7, help="Days to analyze (default: 7)") + sub.add_argument("--min-count", type=int, default=1, help="Min events per session (default: 1)") + sub.add_argument("--project", help="Project path filter") + sub.set_defaults(func=cmd_signals) + + # session-commits (RFC #26) + sub = subparsers.add_parser("session-commits", help="Show session-commit associations") + 
sub.add_argument("--session-id", help="Specific session ID (default: all recent)") + sub.add_argument("--days", type=int, default=7, help="Days to look back (default: 7)") + sub.add_argument("--project", help="Project path filter") + sub.set_defaults(func=cmd_session_commits) + args = parser.parse_args() args.func(args) diff --git a/src/session_analytics/guide.md b/src/session_analytics/guide.md index 771ab6c..d3ca02b 100644 --- a/src/session_analytics/guide.md +++ b/src/session_analytics/guide.md @@ -20,47 +20,45 @@ identify permission gaps. | Tool | Purpose | |------|---------| -| `query_tool_frequency(days?, project?)` | Tool usage counts (Read, Edit, Bash, etc.) | -| `query_commands(days?, prefix?, project?)` | Bash command breakdown | -| `query_sessions(days?, project?)` | Session metadata and token totals | -| `query_tokens(days?, by?, project?)` | Token usage by day, session, or model | -| `query_timeline(hours?, tool?, session_id?)` | Recent events with filtering | +| `get_tool_frequency(days?, project?)` | Tool usage counts (Read, Edit, Bash, etc.) 
| +| `get_command_frequency(days?, prefix?, project?)` | Bash command breakdown | +| `list_sessions(days?, project?)` | Session metadata and token totals | +| `get_token_usage(days?, by?, project?)` | Token usage by day, session, or model | +| `get_session_events(days?, tool?, session_id?)` | Recent events with filtering | ### Pattern Analysis | Tool | Purpose | |------|---------| -| `query_sequences(days?, min_count?, length?)` | Common tool chains (e.g., Read → Edit → Bash) | -| `query_permission_gaps(days?, threshold?)` | Commands that should be in settings.json | +| `get_tool_sequences(days?, min_count?, length?)` | Common tool chains (e.g., Read → Edit → Bash) | +| `get_permission_gaps(days?, min_count?)` | Commands that should be in settings.json | | `get_insights(days?, refresh?)` | Pre-computed patterns for /improve-workflow | ### Failure Analysis | Tool | Purpose | |------|---------| -| `query_failure_correlation(days?, project?)` | Correlate tool failures with commands | -| `query_common_failures(days?, min_count?)` | Aggregate failure patterns | +| `analyze_failures(days?, project?)` | Failure patterns, rework, and correlations | ### Session Classification | Tool | Purpose | |------|---------| | `classify_sessions(days?, project?)` | Categorize sessions (debugging, development, research, maintenance) | -| `query_session_progression(session_id)` | Track session stage transitions | ### Trend Analysis | Tool | Purpose | |------|---------| | `analyze_trends(days?, compare_to?)` | Token/event trends with growth rates | -| `compare_periods(days?, metric?)` | Period-over-period comparisons | ### User Workflow | Tool | Purpose | |------|---------| -| `get_user_journey(days?, project?)` | Session summaries with tool chains | +| `get_session_messages(days?, project?, session_id?)` | User messages across sessions chronologically | | `find_related_sessions(session_id)` | Find sessions with similar patterns | +| `search_messages(query, limit?)` | Full-text search on user messages 
(FTS5) | ### Git Integration @@ -68,7 +66,13 @@ identify permission gaps. |------|---------| | `ingest_git_history(days?, repo_path?)` | Parse and store git commits | | `correlate_git_with_sessions(days?)` | Link commits to sessions by timing | -| `query_session_commits(session_id)` | Get commits associated with a session | +| `get_session_commits(session_id?)` | Get commits associated with a session | + +### Session Signals + +| Tool | Purpose | +|------|---------| +| `get_session_signals(days?, min_count?)` | Raw session metrics for LLM interpretation | ## Quick Start @@ -87,32 +91,117 @@ Data auto-refreshes when queries detect stale data (>5 min old). ### 3. Query your usage ``` -query_tool_frequency(days=30) +get_tool_frequency(days=30) → {tools: [{name: "Read", count: 1234}, {name: "Edit", count: 567}, ...]} ``` +## Suggested Workflows + +These are common patterns for using the analytics API. They're suggestions, not requirements— +use the APIs however best fits your needs. + +### Workflow: Broad to Narrow + +``` +┌─────────────────────────────────────────────────────────────────┐ +│ BROAD OVERVIEW │ +├─────────────────────────────────────────────────────────────────┤ +│ get_status() → Is data fresh? How many events? │ +│ get_tool_frequency() → What tools are used most? │ +│ get_command_frequency()→ What commands are common? │ +├─────────────────────────────────────────────────────────────────┤ +│ DISCOVER PATTERNS │ +├─────────────────────────────────────────────────────────────────┤ +│ list_sessions() → What sessions exist? │ +│ get_session_signals() → Which sessions look interesting? │ +│ classify_sessions() → What type of work (debug, dev, etc)? 
│ +├─────────────────────────────────────────────────────────────────┤ +│ DRILL INTO SPECIFICS │ +├─────────────────────────────────────────────────────────────────┤ +│ get_session_events(session_id=X) → Full event trace │ +│ get_session_messages(session_id=X) → User intent │ +│ get_session_commits(session_id=X) → Work products │ +│ search_messages("query") → Find specific topics │ +└─────────────────────────────────────────────────────────────────┘ +``` + +### Workflow: Question-Based + +| Question | Tools to Use | +|----------|-------------| +| "What have I been working on?" | `list_sessions()` → `get_session_messages()` | +| "Why did session X struggle?" | `get_session_signals()` → `get_session_events(session_id=X)` | +| "What workflows can I automate?" | `get_tool_sequences()` → `get_permission_gaps()` | +| "How has my usage changed?" | `analyze_trends()` | +| "What did I do with feature X?" | `search_messages("feature X")` | + +### Workflow: Improvement-Focused + +``` +get_permission_gaps() → "Add these commands to settings.json" +get_tool_sequences() → "These patterns could be automated" +analyze_failures() → "These commands tend to fail" +analyze_trends() → "Usage is increasing/decreasing" +``` + +## Session Discovery and Drill-In + +A common workflow is discovering sessions, getting signals about them, then drilling into interesting ones: + +### 1. Discover sessions +``` +list_sessions(days=7) +→ {sessions: [{id: "abc123", project: "my-repo", event_count: 50}, ...]} +``` + +### 2. Get signals for sessions +``` +get_session_signals(days=7) +→ {sessions: [ + {session_id: "abc123", error_rate: 0.04, commit_count: 2, has_rework: false, ...}, + {session_id: "def456", error_rate: 0.25, commit_count: 0, has_rework: true, ...} + ]} +``` + +The LLM interprets these raw signals - high error rate + rework + no commits might indicate frustration. + +### 3. 
Drill into an interesting session +``` +# Get full event trace +get_session_events(session_id="abc123") +→ {events: [{tool: "Read", file: "auth.py", ...}, {tool: "Edit", ...}, ...]} + +# Get all user messages +get_session_messages(session_id="abc123") +→ {messages: [{content: "Fix the login bug", ...}, ...]} + +# Get commit associations +get_session_commits(session_id="abc123") +→ {commits: [{sha: "a1b2c3", time_to_commit_seconds: 1800, is_first_commit: true}]} +``` + ## Common Patterns ### Understanding tool usage ``` # What tools do I use most? -query_tool_frequency(days=30) +get_tool_frequency(days=30) # What bash commands do I run? -query_commands(days=30, prefix="git") # Just git commands -query_commands(days=30) # All commands +get_command_frequency(days=30, prefix="git") # Just git commands +get_command_frequency(days=30) # All commands ``` ### Finding workflow sequences ``` # What 2-tool patterns are common? -query_sequences(length=2, min_count=10) +get_tool_sequences(length=2, min_count=10) → [{pattern: "Read → Edit", count: 234}, {pattern: "Grep → Read", count: 156}, ...] # What 3-tool patterns? -query_sequences(length=3, min_count=5) +get_tool_sequences(length=3, min_count=5) → [{pattern: "Read → Edit → Bash", count: 45}, ...] ``` @@ -120,7 +209,7 @@ query_sequences(length=3, min_count=5) ``` # Commands I use frequently but haven't added to settings.json -query_permission_gaps(threshold=5) +get_permission_gaps(min_count=5) → [{command: "npm test", count: 23, suggestion: "Bash(npm test:*)"}, ...] ``` @@ -130,26 +219,26 @@ Add these to your `~/.claude/settings.json` under `permissions.allow`. 
``` # Usage by day -query_tokens(days=30, by="day") +get_token_usage(days=30, by="day") # Usage by model -query_tokens(days=30, by="model") +get_token_usage(days=30, by="model") # Usage by session -query_tokens(days=7, by="session") +get_token_usage(days=7, by="session") ``` ### Timeline exploration ``` -# Recent events -query_timeline(hours=24) +# Recent events (1 day = 24 hours) +get_session_events(days=1) # Filter by tool -query_timeline(hours=24, tool="Bash") +get_session_events(days=1, tool="Bash") # Filter by session -query_timeline(session_id="abc123") +get_session_events(session_id="abc123") ``` ### Session classification @@ -177,13 +266,9 @@ Categories: ### Failure analysis ``` -# What commands tend to fail? -query_common_failures(days=30, min_count=3) -→ [{tool: "Bash", command: "cargo test", count: 12}, ...] - -# Correlate failures with context -query_failure_correlation(days=30) -→ {correlations: [{tool: "Bash", command: "npm install", failure_rate: 0.15}, ...]} +# Analyze failure patterns and rework +analyze_failures(days=30) +→ {total_errors: 45, errors_by_tool: [...], rework_patterns: {...}} ``` ### Git integration @@ -198,8 +283,8 @@ correlate_git_with_sessions(days=30) → {sessions_analyzed: 20, commits_correlated: 38} # See what commits were made during a session -query_session_commits(session_id="abc123") -→ [{sha: "abc...", message: "Fix auth bug", timestamp: "..."}] +get_session_commits(session_id="abc123") +→ [{sha: "abc...", time_to_commit_seconds: 1800, is_first_commit: true}] ``` ### Trend analysis @@ -246,14 +331,14 @@ This powers data-driven workflow improvement suggestions. ### Querying -4. **Start with frequency** - `query_tool_frequency` gives quick overview +4. **Start with frequency** - `get_tool_frequency` gives quick overview 5. **Use day filters** - `days=7` for recent trends, `days=30` for patterns 6. **Project filter** - Most queries accept `project` to focus on one repo ### Permission Gaps -7. 
**Check weekly** - Run `query_permission_gaps(threshold=3)` to catch new patterns -8. **Higher threshold = less noise** - Start with `threshold=10` if overwhelmed +7. **Check weekly** - Run `get_permission_gaps(min_count=3)` to catch new patterns +8. **Higher min_count = less noise** - Start with `min_count=10` if overwhelmed 9. **Review before adding** - Some commands shouldn't be auto-approved ### Workflow Improvement @@ -291,7 +376,7 @@ on subsequent ingestions, making `ingest_logs` fast for daily use. - Data auto-refreshes on query if stale (>5 min since last ingestion) - Use `get_status()` to check when data was last refreshed - The `project` filter uses LIKE matching - partial names work -- `query_sequences` with `length=3` finds more complex patterns but needs more data +- `get_tool_sequences` with `length=3` finds more complex patterns but needs more data - Permission gaps compare your usage against `~/.claude/settings.json` - Token queries help track API usage costs over time - The CLI (`session-analytics-cli`) mirrors all MCP tools for terminal use diff --git a/src/session_analytics/ingest.py b/src/session_analytics/ingest.py index b80bc8e..99a193b 100644 --- a/src/session_analytics/ingest.py +++ b/src/session_analytics/ingest.py @@ -599,6 +599,10 @@ def correlate_git_with_sessions( Associates commits with sessions based on timing - if a commit was made during an active session, it's likely related to that session's work. 
+ RFC #26: Also populates session_commits junction table with timing metadata: + - time_to_commit_seconds: Time from session start to commit + - is_first_commit: Whether this was the first commit in the session + Args: storage: Storage instance days: Number of days to correlate (default: 7) @@ -647,8 +651,12 @@ def correlate_git_with_sessions( # Commits just before starting a session are often related preparatory work buffer = timedelta(minutes=5) - # Collect correlations for batch update - correlations: list[tuple[str, str]] = [] # (session_id, sha) + # Collect correlations for batch update: (session_id, sha) + correlations: list[tuple[str, str]] = [] + # Collect session_commits data: (session_id, sha, time_to_commit_seconds, is_first_commit) + session_commit_links: list[tuple[str, str, int | None, bool]] = [] + # Track first commit per session for is_first_commit calculation + session_first_commits: dict[str, tuple[str, datetime]] = {} # session_id -> (sha, time) for commit in commits: commit_time = commit.timestamp @@ -658,12 +666,34 @@ def correlate_git_with_sessions( # Find matching session (commit within session window ± 5 min buffer) for sr in session_ranges: if (sr["start"] - buffer) <= commit_time <= (sr["end"] + buffer): - correlations.append((sr["session_id"], commit.sha)) + session_id = sr["session_id"] + correlations.append((session_id, commit.sha)) + + # Calculate time to commit (seconds from session start) + time_to_commit = int((commit_time - sr["start"]).total_seconds()) + # Clamp negative values (commits before session start) to 0 + time_to_commit = max(0, time_to_commit) + + # Track earliest commit per session for is_first_commit + if session_id not in session_first_commits: + session_first_commits[session_id] = (commit.sha, commit_time) + elif commit_time < session_first_commits[session_id][1]: + session_first_commits[session_id] = (commit.sha, commit_time) + + session_commit_links.append((session_id, commit.sha, time_to_commit, False)) break + 
# Mark is_first_commit for each session's earliest commit + session_commit_links_final = [] + for session_id, sha, time_to_commit, _ in session_commit_links: + is_first = session_first_commits.get(session_id, (None,))[0] == sha + session_commit_links_final.append((session_id, sha, time_to_commit, is_first)) + # Batch update all correlations correlated_count = 0 correlation_errors = 0 + session_commits_added = 0 + session_commits_errors = 0 if correlations: try: @@ -684,10 +714,24 @@ def correlate_git_with_sessions( ) correlation_errors = len(correlations) + # RFC #26: Populate session_commits junction table + if session_commit_links_final: + try: + session_commits_added = storage.add_session_commits_batch(session_commit_links_final) + except Exception as e: + logger.error( + "Failed to add %d session_commits: %s", + len(session_commit_links_final), + e, + ) + session_commits_errors = len(session_commit_links_final) + return { "days": days, "sessions_analyzed": len(session_ranges), "commits_checked": len(commits), "commits_correlated": correlated_count, + "session_commits_added": session_commits_added, "correlation_errors": correlation_errors, + "session_commits_errors": session_commits_errors, } diff --git a/src/session_analytics/patterns.py b/src/session_analytics/patterns.py index c4a0273..7629d04 100644 --- a/src/session_analytics/patterns.py +++ b/src/session_analytics/patterns.py @@ -769,6 +769,159 @@ def get_insights( return insights +def get_session_signals( + storage: SQLiteStorage, + days: int = 7, + min_count: int = 1, + project: str | None = None, +) -> dict: + """Get raw session signals for LLM interpretation. + + RFC #26 (revised per RFC #17 principle): Extracts observable session data + without interpretation. Per RFC #17: "Don't over-distill - raw data with + light structure beats heavily processed summaries. The LLM can handle context." 
+ + The consuming LLM should interpret these signals to determine outcomes like + success, abandonment, or frustration based on the full context. + + Args: + storage: Storage instance + days: Number of days to analyze (default: 7) + min_count: Minimum events for a session to be included (default: 1) + project: Optional project path filter + + Returns: + Dict with raw session signals for LLM interpretation + """ + cutoff = datetime.now() - timedelta(days=days) + + # Build optional project filter + project_filter = "" + params: list = [cutoff] + if project: + project_filter = "AND project_path LIKE ?" + params.append(f"%{project}%") + params.append(min_count) + + # Get session summaries with activity metrics + sessions = storage.execute_query( + f""" + SELECT + session_id, + project_path, + COUNT(*) as event_count, + SUM(CASE WHEN is_error = 1 THEN 1 ELSE 0 END) as error_count, + SUM(CASE WHEN tool_name = 'Edit' THEN 1 ELSE 0 END) as edit_count, + SUM(CASE WHEN command = 'git' THEN 1 ELSE 0 END) as git_count, + SUM(CASE WHEN skill_name IS NOT NULL THEN 1 ELSE 0 END) as skill_count, + MIN(timestamp) as first_event, + MAX(timestamp) as last_event + FROM events + WHERE timestamp >= ? + {project_filter} + GROUP BY session_id + HAVING COUNT(*) >= ? + """, + tuple(params), + ) + + # Get commit counts per session from session_commits + commit_counts = storage.execute_query( + """ + SELECT session_id, COUNT(*) as commit_count + FROM session_commits + GROUP BY session_id + """, + (), + ) + commits_by_session = {r["session_id"]: r["commit_count"] for r in commit_counts} + + # Detect rework patterns (file edited 4+ times in session) + rework_sessions = set() + file_edits = storage.execute_query( + """ + SELECT session_id, file_path, COUNT(*) as edit_count + FROM events + WHERE timestamp >= ? 
+ AND tool_name = 'Edit' + AND file_path IS NOT NULL + GROUP BY session_id, file_path + HAVING COUNT(*) >= 4 + """, + (cutoff,), + ) + for row in file_edits: + rework_sessions.add(row["session_id"]) + + # Check for PR-related activity + pr_sessions = set() + pr_events = storage.execute_query( + """ + SELECT DISTINCT session_id + FROM events + WHERE timestamp >= ? + AND ( + (command = 'gh' AND command_args LIKE 'pr %') + OR skill_name LIKE '%pr%' + OR skill_name LIKE '%commit%' + ) + """, + (cutoff,), + ) + for row in pr_events: + pr_sessions.add(row["session_id"]) + + # Build raw signals for each session (no interpretation) + signals = [] + for session in sessions: + session_id = session["session_id"] + event_count = session["event_count"] + error_count = session["error_count"] or 0 + edit_count = session["edit_count"] or 0 + git_count = session["git_count"] or 0 + skill_count = session["skill_count"] or 0 + commit_count = commits_by_session.get(session_id, 0) + + # Calculate derived observables (still factual, not interpretive) + error_rate = error_count / event_count if event_count > 0 else 0 + + first_event = session["first_event"] + last_event = session["last_event"] + if isinstance(first_event, str): + first_event = datetime.fromisoformat(first_event) + if isinstance(last_event, str): + last_event = datetime.fromisoformat(last_event) + duration_minutes = ( + (last_event - first_event).total_seconds() / 60 if first_event and last_event else 0 + ) + + signals.append( + { + "session_id": session_id, + "project_path": session["project_path"], + # Raw counts + "event_count": event_count, + "error_count": error_count, + "edit_count": edit_count, + "git_count": git_count, + "skill_count": skill_count, + "commit_count": commit_count, + # Derived observables + "error_rate": round(error_rate, 3), + "duration_minutes": round(duration_minutes, 1), + # Boolean flags (observable patterns) + "has_rework": session_id in rework_sessions, + "has_pr_activity": session_id in 
pr_sessions, + } + ) + + return { + "days": days, + "sessions_analyzed": len(signals), + "sessions": signals, + } + + def analyze_trends( storage: SQLiteStorage, days: int = 7, diff --git a/src/session_analytics/queries.py b/src/session_analytics/queries.py index 4c18b65..e0cbbe4 100644 --- a/src/session_analytics/queries.py +++ b/src/session_analytics/queries.py @@ -137,6 +137,7 @@ def query_timeline( end: datetime | None = None, tool: str | None = None, project: str | None = None, + session_id: str | None = None, limit: int = 100, ) -> dict: """Get events in a time window. @@ -147,6 +148,7 @@ def query_timeline( end: End of time window (default: now) tool: Optional tool name filter project: Optional project path filter + session_id: Optional session ID filter (get full session trace) limit: Maximum events to return Returns: @@ -162,6 +164,7 @@ def query_timeline( end=end, tool_name=tool, project_path=project, + session_id=session_id, limit=limit, ) @@ -170,6 +173,7 @@ def query_timeline( "end": end.isoformat(), "tool": tool, "project": project, + "session_id": session_id, "count": len(events), "events": [ { @@ -459,6 +463,7 @@ def get_user_journey( storage: SQLiteStorage, hours: int = 24, include_projects: bool = True, + session_id: str | None = None, limit: int = 100, ) -> dict: """Get all user messages chronologically across sessions. @@ -470,6 +475,7 @@ def get_user_journey( storage: Storage instance hours: Number of hours to look back (default: 24) include_projects: Include project info in output (default: True) + session_id: Optional session ID filter (get messages from specific session) limit: Maximum messages to return (default: 100) Returns: @@ -477,9 +483,17 @@ def get_user_journey( """ cutoff = datetime.now() - timedelta(hours=hours) + # Build query with optional session_id filter + session_filter = "" + params: list = [cutoff] + if session_id: + session_filter = "AND session_id = ?" 
+ params.append(session_id) + params.append(limit) + # Query user messages ordered by timestamp rows = storage.execute_query( - """ + f""" SELECT timestamp, session_id, @@ -489,10 +503,11 @@ def get_user_journey( WHERE timestamp >= ? AND entry_type = 'user' AND user_message_text IS NOT NULL + {session_filter} ORDER BY timestamp ASC LIMIT ? """, - (cutoff, limit), + tuple(params), ) # Build journey events @@ -520,6 +535,7 @@ def get_user_journey( return { "hours": hours, + "session_id": session_id, "message_count": len(journey), "projects_visited": list(projects_seen) if include_projects else None, "project_switches": project_switches if include_projects else None, diff --git a/src/session_analytics/server.py b/src/session_analytics/server.py index 9f315ab..f029313 100644 --- a/src/session_analytics/server.py +++ b/src/session_analytics/server.py @@ -2,16 +2,18 @@ Provides tools for querying Claude Code session logs: - ingest_logs: Refresh data from JSONL files -- query_timeline: Events in time window -- query_tool_frequency: Tool usage counts -- query_commands: Bash command breakdown -- query_sequences: Common tool patterns -- query_permission_gaps: Commands needing settings.json -- query_sessions: Session metadata -- query_tokens: Token usage analysis +- list_sessions: Session metadata +- get_session_events: Events for a session/time window +- get_session_messages: User messages across sessions +- get_session_signals: Raw session signals for LLM interpretation +- get_session_commits: Session-commit mappings +- get_tool_frequency: Tool usage counts +- get_command_frequency: Bash command breakdown +- get_tool_sequences: Common tool patterns +- get_token_usage: Token usage analysis +- get_permission_gaps: Commands needing settings.json - get_insights: Pre-computed patterns for /improve-workflow - get_status: Ingestion status + DB stats -- get_user_journey: User messages across sessions - search_messages: Full-text search on user messages """ @@ -97,7 +99,7 @@ def 
ingest_logs(days: int = 7, project: str | None = None, force: bool = False) @mcp.tool() -def query_tool_frequency(days: int = 7, project: str | None = None) -> dict: +def get_tool_frequency(days: int = 7, project: str | None = None) -> dict: """Get tool usage frequency counts. Args: @@ -113,20 +115,22 @@ def query_tool_frequency(days: int = 7, project: str | None = None) -> dict: @mcp.tool() -def query_timeline( +def get_session_events( start: str | None = None, end: str | None = None, tool: str | None = None, project: str | None = None, + session_id: str | None = None, limit: int = 100, ) -> dict: - """Get events in a time window. + """Get events in a time window or for a specific session. Args: start: Start time (ISO format, default: 24 hours ago) end: End time (ISO format, default: now) tool: Optional tool name filter project: Optional project path filter + session_id: Optional session ID filter (get full session trace) limit: Maximum events to return (default: 100) Returns: @@ -139,13 +143,21 @@ def query_timeline( queries.ensure_fresh_data(storage) result = queries.query_timeline( - storage, start=start_dt, end=end_dt, tool=tool, project=project, limit=limit + storage, + start=start_dt, + end=end_dt, + tool=tool, + project=project, + session_id=session_id, + limit=limit, ) return {"status": "ok", **result} @mcp.tool() -def query_commands(days: int = 7, project: str | None = None, prefix: str | None = None) -> dict: +def get_command_frequency( + days: int = 7, project: str | None = None, prefix: str | None = None +) -> dict: """Get Bash command breakdown. Args: @@ -162,8 +174,8 @@ def query_commands(days: int = 7, project: str | None = None, prefix: str | None @mcp.tool() -def query_sessions(days: int = 7, project: str | None = None) -> dict: - """Get session metadata. +def list_sessions(days: int = 7, project: str | None = None) -> dict: + """List all sessions with metadata. 
Args: days: Number of days to analyze (default: 7) @@ -178,7 +190,7 @@ def query_sessions(days: int = 7, project: str | None = None) -> dict: @mcp.tool() -def query_tokens(days: int = 7, project: str | None = None, by: str = "day") -> dict: +def get_token_usage(days: int = 7, project: str | None = None, by: str = "day") -> dict: """Get token usage analysis. Args: @@ -195,7 +207,7 @@ def query_tokens(days: int = 7, project: str | None = None, by: str = "day") -> @mcp.tool() -def query_sequences(days: int = 7, min_count: int = 3, length: int = 2) -> dict: +def get_tool_sequences(days: int = 7, min_count: int = 3, length: int = 2) -> dict: """Get common tool patterns (sequences). Args: @@ -220,7 +232,7 @@ def query_sequences(days: int = 7, min_count: int = 3, length: int = 2) -> dict: @mcp.tool() -def sample_sequences(pattern: str, count: int = 5, context_events: int = 2, days: int = 7) -> dict: +def sample_sequences(pattern: str, limit: int = 5, context_events: int = 2, days: int = 7) -> dict: """Get random samples of a sequence pattern with surrounding context. 
Instead of just counting "Read → Edit" occurrences, returns actual examples @@ -228,7 +240,7 @@ def sample_sequences(pattern: str, count: int = 5, context_events: int = 2, days Args: pattern: Sequence pattern (e.g., "Read → Edit" or "Read,Edit") - count: Number of random samples to return (default: 5) + limit: Number of random samples to return (default: 5) context_events: Number of events before/after to include (default: 2) days: Number of days to analyze (default: 7) @@ -237,28 +249,28 @@ def sample_sequences(pattern: str, count: int = 5, context_events: int = 2, days """ queries.ensure_fresh_data(storage, days=days) result = patterns.sample_sequences( - storage, pattern=pattern, count=count, context_events=context_events, days=days + storage, pattern=pattern, count=limit, context_events=context_events, days=days ) return {"status": "ok", **result} @mcp.tool() -def query_permission_gaps(days: int = 7, threshold: int = 5) -> dict: +def get_permission_gaps(days: int = 7, min_count: int = 5) -> dict: """Find commands that may need to be added to settings.json. 
Args: days: Number of days to analyze (default: 7) - threshold: Minimum usage count to suggest (default: 5) + min_count: Minimum usage count to suggest (default: 5) Returns: Commands that are frequently used but not in allowed list """ queries.ensure_fresh_data(storage, days=days) - gap_patterns = patterns.compute_permission_gaps(storage, days=days, threshold=threshold) + gap_patterns = patterns.compute_permission_gaps(storage, days=days, threshold=min_count) return { "status": "ok", "days": days, - "threshold": threshold, + "min_count": min_count, "gaps": [ { "command": p.pattern_key, @@ -271,23 +283,34 @@ def query_permission_gaps(days: int = 7, threshold: int = 5) -> dict: @mcp.tool() -def get_user_journey(hours: int = 24, include_projects: bool = True, limit: int = 100) -> dict: +def get_session_messages( + days: float = 1, + include_projects: bool = True, + session_id: str | None = None, + limit: int = 100, +) -> dict: """Get all user messages chronologically across sessions. Shows how the user moved across sessions and projects over time, revealing task switching, project interleaving, and work patterns. 
Args: - hours: Number of hours to look back (default: 24) + days: Number of days to look back (default: 1, supports fractions like 0.5 for 12h) include_projects: Include project info in output (default: True) + session_id: Optional session ID filter (get messages from specific session) limit: Maximum messages to return (default: 100) Returns: Journey events with timestamps, sessions, and messages """ - queries.ensure_fresh_data(storage, days=max(1, hours // 24 + 1)) + hours = int(days * 24) + queries.ensure_fresh_data(storage, days=max(1, int(days) + 1)) result = queries.get_user_journey( - storage, hours=hours, include_projects=include_projects, limit=limit + storage, + hours=hours, + include_projects=include_projects, + session_id=session_id, + limit=limit, ) return {"status": "ok", **result} @@ -341,20 +364,21 @@ def search_messages(query: str, limit: int = 50, project: str | None = None) -> @mcp.tool() -def detect_parallel_sessions(hours: int = 24, min_overlap_minutes: int = 5) -> dict: +def detect_parallel_sessions(days: float = 1, min_overlap_minutes: int = 5) -> dict: """Find sessions that were active simultaneously. Identifies when multiple sessions were active at the same time, indicating worktree usage, waiting on CI, or multi-task work. 
Args: - hours: Number of hours to look back (default: 24) + days: Number of days to look back (default: 1, supports fractions like 0.5 for 12h) min_overlap_minutes: Minimum overlap to consider parallel (default: 5) Returns: Parallel session periods with timing and session details """ - queries.ensure_fresh_data(storage, days=max(1, hours // 24 + 1)) + hours = int(days * 24) + queries.ensure_fresh_data(storage, days=max(1, int(days) + 1)) result = queries.detect_parallel_sessions( storage, hours=hours, min_overlap_minutes=min_overlap_minutes ) @@ -448,9 +472,7 @@ def classify_sessions(days: int = 7, project: str | None = None) -> dict: @mcp.tool() -def get_handoff_context( - session_id: str | None = None, hours: int = 4, message_limit: int = 10 -) -> dict: +def get_handoff_context(session_id: str | None = None, days: float = 0.17, limit: int = 10) -> dict: """Get context for session handoff (useful for /status-report). Provides recent activity summary including last user messages, @@ -458,15 +480,16 @@ def get_handoff_context( Args: session_id: Specific session ID (default: most recent session) - hours: Hours to look back if no session specified (default: 4) - message_limit: Maximum messages to return (default: 10) + days: Days to look back if no session specified (default: 0.17 = ~4 hours) + limit: Maximum messages to return (default: 10) Returns: Handoff context including messages, files, commands, and activity summary """ - queries.ensure_fresh_data(storage, days=max(1, hours // 24 + 1)) + hours = int(days * 24) + queries.ensure_fresh_data(storage, days=max(1, int(days) + 1)) result = queries.get_handoff_context( - storage, session_id=session_id, hours=hours, message_limit=message_limit + storage, session_id=session_id, hours=hours, message_limit=limit ) return {"status": "ok", **result} @@ -527,6 +550,67 @@ def correlate_git_with_sessions(days: int = 7) -> dict: return {"status": "ok", **result} +@mcp.tool() +def get_session_signals(days: int = 7, min_count: int 
= 1) -> dict: + """Get raw session signals for LLM interpretation. + + RFC #26 (revised per RFC #17 principle): Extracts observable session data + without interpretation. Per RFC #17: "Don't over-distill - raw data with + light structure beats heavily processed summaries. The LLM can handle context." + + Returns raw signals like event counts, error rates, commit counts, and + boolean flags (has_rework, has_pr_activity). The consuming LLM should + interpret these to determine outcomes like success or abandonment. + + Args: + days: Number of days to analyze (default: 7) + min_count: Minimum events for a session to be included (default: 1) + + Returns: + Raw session signals for LLM interpretation + """ + queries.ensure_fresh_data(storage, days=days) + result = patterns.get_session_signals(storage, days=days, min_count=min_count) + return {"status": "ok", **result} + + +@mcp.tool() +def get_session_commits(session_id: str | None = None, days: int = 7) -> dict: + """Get commits associated with sessions. 
+ + RFC #26: Returns commits linked to sessions with timing metadata: + - time_to_commit_seconds: Time from session start to commit + - is_first_commit: Whether this was the first commit in the session + + Args: + session_id: Specific session ID (optional, returns all if not specified) + days: Number of days of data to refresh before querying (default: 7); does not filter the returned commits + + Returns: + Session-commit mappings with timing metadata + """ + queries.ensure_fresh_data(storage, days=days) + + if session_id: + commits = storage.get_session_commits(session_id) + return { + "status": "ok", + "session_id": session_id, + "commit_count": len(commits), + "commits": commits, + } + else: + # Get all session commits + result = storage.get_commits_for_sessions() + total_commits = sum(len(commits) for commits in result.values()) + return { + "status": "ok", + "session_count": len(result), + "total_commits": total_commits, + "sessions": result, + } + + def create_app(): """Create the ASGI app for uvicorn.""" # stateless_http=True allows resilience to server restarts diff --git a/src/session_analytics/storage.py b/src/session_analytics/storage.py index f3f1a33..90c86c1 100644 --- a/src/session_analytics/storage.py +++ b/src/session_analytics/storage.py @@ -86,6 +86,9 @@ class Session: primary_branch: str | None = None slug: str | None = None + # RFC #26: Session enrichment fields (observable data only, no interpretation) + context_switch_count: int = 0 # Number of mid-session topic changes + @dataclass class IngestionState: @@ -139,7 +142,7 @@ def __post_init__(self): DEFAULT_DB_PATH = Path.home() / ".claude" / "contrib" / "analytics" / "data.db" # Schema version for migrations -SCHEMA_VERSION = 3 +SCHEMA_VERSION = 4 # Migration functions: dict of version -> (migration_name, migration_func) # Each migration upgrades FROM version-1 TO version @@ -247,6 +250,47 @@ def migrate_v3(conn): """) +@migration(4, "add_session_enrichment") +def migrate_v4(conn): + """Add columns for RFC #26: session enrichment with observable data.
+ + Adds: + - Session-commit junction table for time-to-commit metrics + - Context switch count for tracking topic changes + + Note: This migration intentionally does NOT add outcome/satisfaction columns. + Per RFC #17 design principle: "Don't over-distill - raw data with light + structure beats heavily processed summaries. The LLM can handle context." + Outcome classification should be done by the consuming LLM, not pre-computed. + """ + # Check existing session columns + existing_cols = {row[1] for row in conn.execute("PRAGMA table_info(sessions)")} + + # Add observable data columns (no interpretation) + if "context_switch_count" not in existing_cols: + conn.execute("ALTER TABLE sessions ADD COLUMN context_switch_count INTEGER DEFAULT 0") + + # Create session_commits junction table for detailed commit tracking + # This allows tracking time_to_commit and multiple commits per session + conn.execute(""" + CREATE TABLE IF NOT EXISTS session_commits ( + session_id TEXT NOT NULL, + commit_sha TEXT NOT NULL, + time_to_commit_seconds INTEGER, + is_first_commit INTEGER DEFAULT 0, + PRIMARY KEY (session_id, commit_sha), + FOREIGN KEY (session_id) REFERENCES sessions(id), + FOREIGN KEY (commit_sha) REFERENCES git_commits(sha) + ) + """) + conn.execute( + "CREATE INDEX IF NOT EXISTS idx_session_commits_session ON session_commits(session_id)" + ) + conn.execute( + "CREATE INDEX IF NOT EXISTS idx_session_commits_commit ON session_commits(commit_sha)" + ) + + class SQLiteStorage: """SQLite-backed storage for session analytics.""" @@ -411,7 +455,9 @@ def _init_db(self): total_input_tokens INTEGER DEFAULT 0, total_output_tokens INTEGER DEFAULT 0, primary_branch TEXT, - slug TEXT + slug TEXT, + -- RFC #26: Observable session data (no interpretation) + context_switch_count INTEGER DEFAULT 0 ) """) @@ -460,6 +506,27 @@ def _init_db(self): "CREATE INDEX IF NOT EXISTS idx_git_commits_project ON git_commits(project_path)" ) + # Session-commit junction table for detailed commit tracking 
(RFC #26) + conn.execute(""" + CREATE TABLE IF NOT EXISTS session_commits ( + session_id TEXT NOT NULL, + commit_sha TEXT NOT NULL, + time_to_commit_seconds INTEGER, + is_first_commit INTEGER DEFAULT 0, + PRIMARY KEY (session_id, commit_sha), + FOREIGN KEY (session_id) REFERENCES sessions(id), + FOREIGN KEY (commit_sha) REFERENCES git_commits(sha) + ) + """) + conn.execute( + "CREATE INDEX IF NOT EXISTS idx_session_commits_session " + "ON session_commits(session_id)" + ) + conn.execute( + "CREATE INDEX IF NOT EXISTS idx_session_commits_commit " + "ON session_commits(commit_sha)" + ) + # FTS5 full-text search on user_message_text (RFC #17 Phase 1) conn.execute(""" CREATE VIRTUAL TABLE IF NOT EXISTS events_fts USING fts5( @@ -607,6 +674,7 @@ def get_events_in_range( end: datetime | None = None, tool_name: str | None = None, project_path: str | None = None, + session_id: str | None = None, limit: int = 100, ) -> list[Event]: """Get events within a time range with optional filters.""" @@ -626,6 +694,9 @@ def get_events_in_range( if project_path: conditions.append("project_path = ?") params.append(project_path) + if session_id: + conditions.append("session_id = ?") + params.append(session_id) # Safe: where_clause is built from hardcoded condition strings, not user input where_clause = " AND ".join(conditions) if conditions else "1=1" @@ -690,8 +761,9 @@ def upsert_session(self, session: Session) -> None: id, project_path, first_seen, last_seen, entry_count, tool_use_count, total_input_tokens, total_output_tokens, - primary_branch, slug - ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) + primary_branch, slug, + context_switch_count + ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) 
""", ( session.id, @@ -704,6 +776,7 @@ def upsert_session(self, session: Session) -> None: session.total_output_tokens, session.primary_branch, session.slug, + session.context_switch_count, ), ) @@ -723,6 +796,15 @@ def get_session_count(self) -> int: def _row_to_session(self, row: sqlite3.Row) -> Session: """Convert a database row to a Session object.""" + + # Helper to safely get column that might not exist in older schema + def get_col(name: str, default=None): + try: + return row[name] + except (IndexError, KeyError): + logger.debug("Column '%s' not found in row, using default %s", name, default) + return default + return Session( id=row["id"], project_path=row["project_path"], @@ -734,6 +816,8 @@ def _row_to_session(self, row: sqlite3.Row) -> Session: total_output_tokens=row["total_output_tokens"], primary_branch=row["primary_branch"], slug=row["slug"], + # RFC #26: Session enrichment (observable data only, no interpretation) + context_switch_count=get_col("context_switch_count", 0), ) # Ingestion state operations @@ -927,6 +1011,128 @@ def get_git_commit_count(self) -> int: row = conn.execute("SELECT COUNT(*) as count FROM git_commits").fetchone() return row["count"] + # Session-commit correlation operations (RFC #26) + + def add_session_commit( + self, + session_id: str, + commit_sha: str, + time_to_commit_seconds: int | None = None, + is_first_commit: bool = False, + ) -> None: + """Link a commit to a session with timing metadata.""" + with self._connect() as conn: + conn.execute( + """ + INSERT OR REPLACE INTO session_commits ( + session_id, commit_sha, time_to_commit_seconds, is_first_commit + ) VALUES (?, ?, ?, ?) + """, + (session_id, commit_sha, time_to_commit_seconds, 1 if is_first_commit else 0), + ) + + def add_session_commits_batch(self, links: list[tuple[str, str, int | None, bool]]) -> int: + """Add multiple session-commit links in a batch. 
+ + Args: + links: List of (session_id, commit_sha, time_to_commit_seconds, is_first_commit) + + Returns: + Number of rows affected + """ + with self._connect() as conn: + cursor = conn.executemany( + """ + INSERT OR REPLACE INTO session_commits ( + session_id, commit_sha, time_to_commit_seconds, is_first_commit + ) VALUES (?, ?, ?, ?) + """, + [(s, c, t, 1 if f else 0) for s, c, t, f in links], + ) + return cursor.rowcount + + def get_session_commits(self, session_id: str) -> list[dict]: + """Get all commits associated with a session. + + Returns: + List of dicts with commit info and timing + """ + with self._connect() as conn: + rows = conn.execute( + """ + SELECT sc.commit_sha, sc.time_to_commit_seconds, sc.is_first_commit, + gc.timestamp, gc.message + FROM session_commits sc + LEFT JOIN git_commits gc ON sc.commit_sha = gc.sha + WHERE sc.session_id = ? + ORDER BY gc.timestamp + """, + (session_id,), + ).fetchall() + + return [ + { + "sha": row["commit_sha"], + "time_to_commit_seconds": row["time_to_commit_seconds"], + "is_first_commit": bool(row["is_first_commit"]), + "timestamp": row["timestamp"], + "message": row["message"], + } + for row in rows + ] + + def get_commits_for_sessions( + self, session_ids: list[str] | None = None + ) -> dict[str, list[dict]]: + """Get commits grouped by session. + + Args: + session_ids: Optional list of session IDs to filter by + + Returns: + Dict mapping session_id to list of commit info dicts + """ + with self._connect() as conn: + if session_ids: + placeholders = ",".join("?" 
* len(session_ids)) + rows = conn.execute( + f""" + SELECT sc.session_id, sc.commit_sha, sc.time_to_commit_seconds, + sc.is_first_commit, gc.timestamp, gc.message + FROM session_commits sc + LEFT JOIN git_commits gc ON sc.commit_sha = gc.sha + WHERE sc.session_id IN ({placeholders}) + ORDER BY sc.session_id, gc.timestamp + """, + session_ids, + ).fetchall() + else: + rows = conn.execute( + """ + SELECT sc.session_id, sc.commit_sha, sc.time_to_commit_seconds, + sc.is_first_commit, gc.timestamp, gc.message + FROM session_commits sc + LEFT JOIN git_commits gc ON sc.commit_sha = gc.sha + ORDER BY sc.session_id, gc.timestamp + """ + ).fetchall() + + result: dict[str, list[dict]] = {} + for row in rows: + sid = row["session_id"] + if sid not in result: + result[sid] = [] + result[sid].append( + { + "sha": row["commit_sha"], + "time_to_commit_seconds": row["time_to_commit_seconds"], + "is_first_commit": bool(row["is_first_commit"]), + "timestamp": row["timestamp"], + "message": row["message"], + } + ) + return result + # Full-text search operations def search_user_messages( diff --git a/tests/test_cli.py b/tests/test_cli.py index a298bc3..04629da 100644 --- a/tests/test_cli.py +++ b/tests/test_cli.py @@ -14,12 +14,14 @@ cmd_permissions, cmd_search, cmd_sequences, + cmd_session_commits, cmd_sessions, + cmd_signals, cmd_status, cmd_tokens, format_output, ) -from session_analytics.storage import Event, Session, SQLiteStorage +from session_analytics.storage import Event, GitCommit, Session, SQLiteStorage @pytest.fixture @@ -270,7 +272,7 @@ def test_cmd_permissions(self, populated_storage, capsys): class Args: json = False days = 7 - threshold = 1 + min_count = 1 with patch("session_analytics.cli.SQLiteStorage", return_value=populated_storage): cmd_permissions(Args()) @@ -386,3 +388,172 @@ class Args: captured = capsys.readouterr() assert "Search: authentication" in captured.out assert "Results:" in captured.out + + def test_cmd_signals(self, populated_storage, capsys): + 
"""Test signals command (RFC #26, revised per RFC #17).""" + + class Args: + json = False + days = 7 + min_count = 1 + project = None + + with patch("session_analytics.cli.SQLiteStorage", return_value=populated_storage): + cmd_signals(Args()) + + captured = capsys.readouterr() + assert "Session Signals" in captured.out + assert "Sessions analyzed:" in captured.out + + def test_cmd_signals_json(self, populated_storage, capsys): + """Test signals command with JSON output.""" + + class Args: + json = True + days = 7 + min_count = 1 + project = None + + with patch("session_analytics.cli.SQLiteStorage", return_value=populated_storage): + cmd_signals(Args()) + + captured = capsys.readouterr() + assert '"sessions_analyzed"' in captured.out + assert '"sessions"' in captured.out + + def test_cmd_session_commits(self, populated_storage, capsys): + """Test session-commits command (RFC #26).""" + # Add a commit and link it to the session + now = datetime.now() + populated_storage.add_git_commit( + GitCommit(sha="abc1234def", timestamp=now, message="Test commit") + ) + populated_storage.add_session_commit("s1", "abc1234def", 300, True) + + class Args: + json = False + days = 7 + session_id = None + project = None + + with patch("session_analytics.cli.SQLiteStorage", return_value=populated_storage): + cmd_session_commits(Args()) + + captured = capsys.readouterr() + assert "Session Commits" in captured.out + assert "Total commits:" in captured.out + + def test_cmd_session_commits_specific_session(self, populated_storage, capsys): + """Test session-commits command for specific session.""" + now = datetime.now() + populated_storage.add_git_commit( + GitCommit(sha="def5678abc", timestamp=now, message="Test commit 2") + ) + populated_storage.add_session_commit("s1", "def5678abc", 600, False) + + class Args: + json = False + days = 7 + session_id = "s1" + project = None + + with patch("session_analytics.cli.SQLiteStorage", return_value=populated_storage): + cmd_session_commits(Args()) 
+ + captured = capsys.readouterr() + assert "Session Commits" in captured.out + assert "Total commits:" in captured.out + + +class TestRFC26Formatters: + """Tests for RFC #26 output formatters (revised per RFC #17 - raw signals only).""" + + def test_signals_format(self): + """Test signals formatting (raw data, no interpretation).""" + data = { + "days": 7, + "sessions_analyzed": 5, + "sessions": [ + { + "session_id": "session-1-abc", + "project_path": "/test", + "event_count": 50, + "error_count": 2, + "edit_count": 10, + "git_count": 5, + "skill_count": 3, + "commit_count": 2, + "error_rate": 0.04, + "duration_minutes": 45.0, + "has_rework": False, + "has_pr_activity": True, + }, + { + "session_id": "session-2-def", + "project_path": "/test", + "event_count": 20, + "error_count": 5, + "edit_count": 8, + "git_count": 1, + "skill_count": 0, + "commit_count": 0, + "error_rate": 0.25, + "duration_minutes": 30.0, + "has_rework": True, + "has_pr_activity": False, + }, + ], + } + result = format_output(data) + assert "Session Signals" in result + assert "Sessions analyzed: 5" in result + assert "session-1-abc" in result + assert "50 events" in result + assert "[PR]" in result + assert "[rework]" in result + + def test_session_commits_format(self): + """Test session commits formatting.""" + data = { + "days": 7, + "session_id": None, + "total_commits": 3, + "commits": [ + { + "session_id": "session-1", + "sha": "abc1234def5678", + "time_to_commit_seconds": 300, + "is_first_commit": True, + }, + { + "session_id": "session-1", + "sha": "def5678abc1234", + "time_to_commit_seconds": 600, + "is_first_commit": False, + }, + ], + } + result = format_output(data) + assert "Session Commits" in result + assert "Total commits: 3" in result + assert "abc1234d" in result # First 8 chars of SHA + assert "300s" in result + assert "(first)" in result + + def test_session_commits_format_specific_session(self): + """Test session commits formatting for specific session.""" + data = { + 
"days": 7, + "session_id": "session-specific", + "total_commits": 1, + "commits": [ + { + "sha": "abc1234def5678", + "time_to_commit_seconds": 450, + "is_first_commit": True, + }, + ], + } + result = format_output(data) + assert "session-specific" in result + assert "450s" in result diff --git a/tests/test_patterns.py b/tests/test_patterns.py index 523f1c1..542e047 100644 --- a/tests/test_patterns.py +++ b/tests/test_patterns.py @@ -1089,3 +1089,171 @@ def test_insights_graceful_degradation(self, storage): assert "has_trends" in insights["summary"] assert "has_failure_analysis" in insights["summary"] assert "has_classification" in insights["summary"] + + +class TestGetSessionSignals: + """Tests for RFC #26 session signals (revised per RFC #17 - raw data only).""" + + def test_get_signals_empty_database(self, storage): + """Test with empty database.""" + from session_analytics.patterns import get_session_signals + + result = get_session_signals(storage, days=7) + + assert result["sessions_analyzed"] == 0 + assert result["sessions"] == [] + + def test_get_signals_with_commits(self, storage): + """Test that commit counts are included in signals.""" + from session_analytics.patterns import get_session_signals + from session_analytics.storage import GitCommit, Session + + now = datetime.now() + + # Create session with events + events = [ + Event( + id=None, + uuid=f"sig-{i}", + timestamp=now - timedelta(hours=1, minutes=i), + session_id="signal-session", + project_path="/project", + entry_type="tool_use", + tool_name="Edit" if i % 2 == 0 else "Read", + file_path=f"/file{i}.py", + ) + for i in range(15) + ] + storage.add_events_batch(events) + + # Create session record + storage.upsert_session(Session(id="signal-session", project_path="/project")) + + # Add commit and link it + storage.add_git_commit(GitCommit(sha="abc1234", timestamp=now)) + storage.add_session_commit("signal-session", "abc1234", 300, True) + + result = get_session_signals(storage, days=7, min_count=5) 
+ + # Should have raw signals, no outcome classification + assert result["sessions_analyzed"] == 1 + session = result["sessions"][0] + assert session["session_id"] == "signal-session" + assert session["commit_count"] == 1 + assert session["event_count"] == 15 + assert "outcome" not in session # No interpretation + assert "confidence" not in session # No interpretation + + def test_get_signals_with_errors(self, storage): + """Test that error rates are included in signals.""" + from session_analytics.patterns import get_session_signals + + now = datetime.now() + + # Create session with some errors + events = [] + for i in range(10): + events.append( + Event( + id=None, + uuid=f"err-use-{i}", + timestamp=now - timedelta(hours=1, minutes=i * 2), + session_id="error-session", + project_path="/project", + entry_type="tool_use", + tool_name="Edit", + tool_id=f"tool-{i}", + file_path="/file.py", + is_error=(i < 3), # 3 errors out of 10 + ) + ) + storage.add_events_batch(events) + + result = get_session_signals(storage, days=7, min_count=5) + + # Should include error rate as raw signal + assert result["sessions_analyzed"] == 1 + session = result["sessions"][0] + assert session["error_count"] == 3 + assert session["error_rate"] == 0.3 + assert "outcome" not in session # No interpretation + + def test_get_signals_min_count_filter(self, storage): + """Test that sessions below min_count threshold are excluded.""" + from session_analytics.patterns import get_session_signals + + now = datetime.now() + + # Create session with only 3 events + events = [ + Event( + id=None, + uuid=f"small-{i}", + timestamp=now - timedelta(hours=1, minutes=i), + session_id="small-session", + project_path="/project", + entry_type="tool_use", + tool_name="Read", + ) + for i in range(3) + ] + storage.add_events_batch(events) + + result = get_session_signals(storage, days=7, min_count=5) + + # Session should be excluded due to min_count + assert result["sessions_analyzed"] == 0 + + def 
test_get_signals_includes_all_raw_fields(self, storage): + """Test that all expected raw signal fields are present.""" + from session_analytics.patterns import get_session_signals + from session_analytics.storage import Session + + now = datetime.now() + + # Create session with various activity + events = [ + Event( + id=None, + uuid=f"full-{i}", + timestamp=now - timedelta(hours=1, minutes=i), + session_id="full-session", + project_path="/project", + entry_type="tool_use", + tool_name="Edit", + file_path="/file.py", + command="git" if i == 0 else None, + skill_name="commit" if i == 1 else None, + ) + for i in range(10) + ] + storage.add_events_batch(events) + storage.upsert_session(Session(id="full-session", project_path="/project")) + + result = get_session_signals(storage, days=7, min_count=5) + + assert result["sessions_analyzed"] == 1 + session = result["sessions"][0] + + # Verify all expected raw signal fields + expected_fields = [ + "session_id", + "project_path", + "event_count", + "error_count", + "edit_count", + "git_count", + "skill_count", + "commit_count", + "error_rate", + "duration_minutes", + "has_rework", + "has_pr_activity", + ] + for field in expected_fields: + assert field in session, f"Missing field: {field}" + + # Verify NO interpretation fields + interpretation_fields = ["outcome", "confidence", "satisfaction_score"] + for field in interpretation_fields: + assert field not in session, f"Unexpected interpretation field: {field}" diff --git a/tests/test_queries.py b/tests/test_queries.py index dd8c8b9..fb6247d 100644 --- a/tests/test_queries.py +++ b/tests/test_queries.py @@ -189,6 +189,13 @@ def test_timeline_with_time_range(self, populated_storage): assert ts >= start assert ts <= end + def test_timeline_with_session_id_filter(self, populated_storage): + """Test timeline with session_id filter.""" + result = query_timeline(populated_storage, session_id="session-1", limit=100) + assert result["session_id"] == "session-1" + for event in 
result["events"]: + assert event["session_id"] == "session-1" + class TestQueryCommands: """Tests for command queries.""" @@ -371,6 +378,42 @@ def test_journey_excludes_tool_events(self, storage): # Should only have the user message, not the tool use assert result["message_count"] == 1 + def test_journey_with_session_id_filter(self, storage): + """Test get_user_journey with session_id filter.""" + from session_analytics.queries import get_user_journey + + now = datetime.now() + # Add user messages from two different sessions + storage.add_event( + Event( + id=None, + uuid="journey-1", + timestamp=now - timedelta(hours=1), + session_id="session-target", + project_path="project-a", + entry_type="user", + user_message_text="Message from target session", + ) + ) + storage.add_event( + Event( + id=None, + uuid="journey-2", + timestamp=now - timedelta(hours=1), + session_id="session-other", + project_path="project-a", + entry_type="user", + user_message_text="Message from other session", + ) + ) + + # Filter to only target session + result = get_user_journey(storage, hours=24, session_id="session-target") + + assert result["session_id"] == "session-target" + assert result["message_count"] == 1 + assert result["journey"][0]["session_id"] == "session-target" + class TestDetectParallelSessions: """Tests for detect_parallel_sessions function.""" diff --git a/tests/test_server.py b/tests/test_server.py index d203b57..b02a5d3 100644 --- a/tests/test_server.py +++ b/tests/test_server.py @@ -1,16 +1,16 @@ """Tests for the MCP server.""" from session_analytics.server import ( + get_command_frequency, get_insights, + get_permission_gaps, + get_session_events, get_status, + get_token_usage, + get_tool_frequency, + get_tool_sequences, ingest_logs, - query_commands, - query_permission_gaps, - query_sequences, - query_sessions, - query_timeline, - query_tokens, - query_tool_frequency, + list_sessions, search_messages, ) @@ -34,9 +34,9 @@ def test_ingest_logs(): assert "events_added" 
in result -def test_query_tool_frequency(): - """Test that query_tool_frequency returns tool counts.""" - result = query_tool_frequency.fn(days=7) +def test_get_tool_frequency(): + """Test that get_tool_frequency returns tool counts.""" + result = get_tool_frequency.fn(days=7) assert result["status"] == "ok" assert "days" in result assert "total_tool_calls" in result @@ -44,9 +44,9 @@ def test_query_tool_frequency(): assert isinstance(result["tools"], list) -def test_query_timeline(): - """Test that query_timeline returns events.""" - result = query_timeline.fn(limit=10) +def test_get_session_events(): + """Test that get_session_events returns events.""" + result = get_session_events.fn(limit=10) assert result["status"] == "ok" assert "start" in result assert "end" in result @@ -54,9 +54,9 @@ def test_query_timeline(): assert isinstance(result["events"], list) -def test_query_commands(): - """Test that query_commands returns command counts.""" - result = query_commands.fn(days=7) +def test_get_command_frequency(): + """Test that get_command_frequency returns command counts.""" + result = get_command_frequency.fn(days=7) assert result["status"] == "ok" assert "days" in result assert "total_commands" in result @@ -64,9 +64,9 @@ def test_query_commands(): assert isinstance(result["commands"], list) -def test_query_sessions(): - """Test that query_sessions returns session info.""" - result = query_sessions.fn(days=7) +def test_list_sessions(): + """Test that list_sessions returns session info.""" + result = list_sessions.fn(days=7) assert result["status"] == "ok" assert "days" in result assert "session_count" in result @@ -74,9 +74,9 @@ def test_query_sessions(): assert isinstance(result["sessions"], list) -def test_query_tokens(): - """Test that query_tokens returns token breakdown.""" - result = query_tokens.fn(days=7, by="day") +def test_get_token_usage(): + """Test that get_token_usage returns token breakdown.""" + result = get_token_usage.fn(days=7, by="day") 
assert result["status"] == "ok" assert "days" in result assert "group_by" in result @@ -84,18 +84,18 @@ def test_query_tokens(): assert isinstance(result["breakdown"], list) -def test_query_sequences(): - """Test that query_sequences returns sequence patterns.""" - result = query_sequences.fn(days=7, min_count=1, length=2) +def test_get_tool_sequences(): + """Test that get_tool_sequences returns sequence patterns.""" + result = get_tool_sequences.fn(days=7, min_count=1, length=2) assert result["status"] == "ok" assert "days" in result assert "sequences" in result assert isinstance(result["sequences"], list) -def test_query_permission_gaps(): - """Test that query_permission_gaps returns gap analysis.""" - result = query_permission_gaps.fn(days=7, threshold=1) +def test_get_permission_gaps(): + """Test that get_permission_gaps returns gap analysis.""" + result = get_permission_gaps.fn(days=7, min_count=1) assert result["status"] == "ok" assert "days" in result assert "gaps" in result diff --git a/tests/test_storage.py b/tests/test_storage.py index f9729be..1fcdd5a 100644 --- a/tests/test_storage.py +++ b/tests/test_storage.py @@ -124,6 +124,45 @@ def test_get_events_by_tool(self, storage): assert len(bash_events) == 1 assert bash_events[0].tool_name == "Bash" + def test_get_events_by_session_id(self, storage): + """Test filtering events by session ID.""" + # Add events from different sessions + storage.add_event( + Event( + id=None, + uuid="uuid-1", + timestamp=datetime.now(), + session_id="session-alpha", + tool_name="Bash", + ) + ) + storage.add_event( + Event( + id=None, + uuid="uuid-2", + timestamp=datetime.now(), + session_id="session-alpha", + tool_name="Read", + ) + ) + storage.add_event( + Event( + id=None, + uuid="uuid-3", + timestamp=datetime.now(), + session_id="session-beta", + tool_name="Edit", + ) + ) + + # Filter by session + alpha_events = storage.get_events_in_range(session_id="session-alpha") + assert len(alpha_events) == 2 + + beta_events = 
storage.get_events_in_range(session_id="session-beta") + assert len(beta_events) == 1 + assert beta_events[0].session_id == "session-beta" + class TestSessionOperations: """Tests for session CRUD operations.""" @@ -685,3 +724,141 @@ def test_fts_trigger_on_update_value_to_null(self, storage): # Should no longer be in FTS results = storage.search_user_messages("removable") assert len(results) == 0 + + +class TestSessionCommits: + """Tests for RFC #26 session_commits junction table.""" + + def test_add_session_commit(self, storage): + """Test adding a single session-commit link.""" + # First create the session and commit + storage.upsert_session(Session(id="session-1", project_path="project-a")) + storage.add_git_commit(GitCommit(sha="abc1234", timestamp=datetime.now())) + + # Link them + storage.add_session_commit( + session_id="session-1", + commit_sha="abc1234", + time_to_commit_seconds=300, + is_first_commit=True, + ) + + # Verify + commits = storage.get_session_commits("session-1") + assert len(commits) == 1 + assert commits[0]["sha"] == "abc1234" + assert commits[0]["time_to_commit_seconds"] == 300 + assert commits[0]["is_first_commit"] is True + + def test_add_session_commits_batch(self, storage): + """Test batch adding session-commit links.""" + # Create session and commits + storage.upsert_session(Session(id="session-1")) + storage.add_git_commits_batch( + [ + GitCommit(sha="aaa1111", timestamp=datetime.now()), + GitCommit(sha="bbb2222", timestamp=datetime.now()), + GitCommit(sha="ccc3333", timestamp=datetime.now()), + ] + ) + + # Batch link + links = [ + ("session-1", "aaa1111", 100, True), + ("session-1", "bbb2222", 200, False), + ("session-1", "ccc3333", 300, False), + ] + count = storage.add_session_commits_batch(links) + assert count == 3 + + # Verify + commits = storage.get_session_commits("session-1") + assert len(commits) == 3 + + def test_get_commits_for_sessions(self, storage): + """Test getting commits for multiple sessions.""" + # Create sessions 
+ storage.upsert_session(Session(id="session-1")) + storage.upsert_session(Session(id="session-2")) + + # Create commits + storage.add_git_commits_batch( + [ + GitCommit(sha="aaa1111", timestamp=datetime.now()), + GitCommit(sha="bbb2222", timestamp=datetime.now()), + GitCommit(sha="ccc3333", timestamp=datetime.now()), + ] + ) + + # Link commits to sessions + storage.add_session_commits_batch( + [ + ("session-1", "aaa1111", 100, True), + ("session-1", "bbb2222", 200, False), + ("session-2", "ccc3333", 150, True), + ] + ) + + # Get all session commits + result = storage.get_commits_for_sessions() + assert "session-1" in result + assert "session-2" in result + assert len(result["session-1"]) == 2 + assert len(result["session-2"]) == 1 + + # Get for specific sessions + result = storage.get_commits_for_sessions(["session-1"]) + assert "session-1" in result + assert "session-2" not in result + + def test_session_commit_replace_on_conflict(self, storage): + """Test that INSERT OR REPLACE updates existing links.""" + storage.upsert_session(Session(id="session-1")) + storage.add_git_commit(GitCommit(sha="abc1234", timestamp=datetime.now())) + + # First insert + storage.add_session_commit("session-1", "abc1234", 100, False) + commits = storage.get_session_commits("session-1") + assert commits[0]["time_to_commit_seconds"] == 100 + assert commits[0]["is_first_commit"] is False + + # Update via INSERT OR REPLACE + storage.add_session_commit("session-1", "abc1234", 200, True) + commits = storage.get_session_commits("session-1") + assert len(commits) == 1 + assert commits[0]["time_to_commit_seconds"] == 200 + assert commits[0]["is_first_commit"] is True + + +class TestSessionEnrichmentFields: + """Tests for RFC #26 session enrichment fields (observable data only). + + Per RFC #17 principle: "Don't over-distill - raw data with light structure + beats heavily processed summaries." 
We only store observable data like + context_switch_count, not interpretation fields like outcome/satisfaction. + """ + + def test_session_with_context_switch_count(self, storage): + """Test storing and retrieving context switch count.""" + session = Session( + id="session-context-1", + context_switch_count=3, + ) + storage.upsert_session(session) + + rows = storage.execute_query( + "SELECT context_switch_count FROM sessions WHERE id = ?", + ("session-context-1",), + ) + assert rows[0]["context_switch_count"] == 3 + + def test_session_context_switch_default(self, storage): + """Test that context_switch_count defaults to 0.""" + session = Session(id="session-default") + storage.upsert_session(session) + + rows = storage.execute_query( + "SELECT context_switch_count FROM sessions WHERE id = ?", + ("session-default",), + ) + assert rows[0]["context_switch_count"] == 0
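
The `TestSessionCommits` cases above rely on `INSERT OR REPLACE` semantics for the `session_commits` junction table. The sketch below shows how that behavior could work using only the standard-library `sqlite3` module. The table schema, the composite primary key, and the two helper functions are assumptions made for illustration; the real definitions live in `storage.py` and may differ.

```python
import sqlite3

# Hypothetical schema: the actual session_commits table in storage.py may differ.
# The composite primary key is what lets INSERT OR REPLACE overwrite a link.
SCHEMA = """
CREATE TABLE session_commits (
    session_id TEXT NOT NULL,
    commit_sha TEXT NOT NULL,
    time_to_commit_seconds INTEGER,
    is_first_commit INTEGER DEFAULT 0,
    PRIMARY KEY (session_id, commit_sha)
)
"""


def add_session_commit(conn, session_id, commit_sha, seconds, is_first):
    """Insert a session-commit link, replacing any existing link for the same pair."""
    conn.execute(
        "INSERT OR REPLACE INTO session_commits VALUES (?, ?, ?, ?)",
        (session_id, commit_sha, seconds, int(is_first)),
    )


def get_session_commits(conn, session_id):
    """Return the links for one session as dicts, converting the flag back to bool."""
    rows = conn.execute(
        "SELECT commit_sha, time_to_commit_seconds, is_first_commit "
        "FROM session_commits WHERE session_id = ? "
        "ORDER BY time_to_commit_seconds",
        (session_id,),
    ).fetchall()
    return [
        {"sha": sha, "time_to_commit_seconds": secs, "is_first_commit": bool(first)}
        for sha, secs, first in rows
    ]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Second call targets the same (session_id, commit_sha) pair, so it replaces the first.
add_session_commit(conn, "session-1", "abc1234", 100, False)
add_session_commit(conn, "session-1", "abc1234", 200, True)
commits = get_session_commits(conn, "session-1")
```

After the second call, `commits` holds a single row carrying the updated values, which mirrors what `test_session_commit_replace_on_conflict` asserts against the real storage layer.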