diff --git a/CLAUDE.md b/CLAUDE.md index cd4dae2..6004fd7 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -31,24 +31,26 @@ Key components: ```bash make check # Run fmt, lint, test -make install # Install LaunchAgent + CLI +make install # Install LaunchAgent + CLI + MCP config make uninstall # Remove LaunchAgent + CLI make restart # Restart LaunchAgent to pick up code changes +make reinstall # pip install -e . + restart (for pyproject.toml changes) make dev # Run in dev mode with auto-reload ``` ### When to restart -The LaunchAgent runs the installed Python code. After making changes, you need to restart for them to take effect: +The install is editable (`pip install -e .`), so Python code changes are picked up automatically by the CLI. The MCP server (LaunchAgent) needs a restart to see changes. -| Change type | Restart needed? | -|-------------|-----------------| -| MCP tools (`server.py`) | Yes - `make restart` | -| Query/pattern logic (`queries.py`, `patterns.py`) | Yes - `make restart` | -| Storage/migrations (`storage.py`) | Yes - `make restart` | -| CLI only (`cli.py`) | No - CLI runs fresh each time | -| Tests | No - pytest runs fresh | -| Documentation (`guide.md`, `CLAUDE.md`) | No | +| Change type | Action needed | +|-------------|---------------| +| MCP tools (`server.py`) | `make restart` | +| Query/pattern logic (`queries.py`, `patterns.py`) | `make restart` | +| Storage/migrations (`storage.py`) | `make restart` | +| CLI only (`cli.py`) | None - CLI runs fresh each time | +| `pyproject.toml` (entry points, deps) | `make reinstall` | +| Tests | None - pytest runs fresh | +| Documentation (`guide.md`, `CLAUDE.md`) | None | ## Key Files @@ -67,6 +69,7 @@ The LaunchAgent runs the installed Python code. 
After making changes, you need t - **Formatter Registry**: CLI uses `@_register_formatter(predicate)` decorator pattern - **Schema Migrations**: Use `@migration(version, name)` decorator in storage.py for DB changes - **Module Imports**: server.py uses `from session_analytics import queries, patterns, ingest` +- **CLI/MCP Parity**: Always expose new query functions on both CLI and MCP. Add MCP tool in `server.py`, CLI command in `cli.py`, document in both `guide.md` and this file ## MCP API Naming Conventions @@ -131,6 +134,10 @@ Do this: | `search_messages` | Full-text search on user messages (FTS5) | | `get_session_signals` | Raw session metrics for LLM interpretation (RFC #26) | | `get_session_commits` | Session-commit mappings with timing (RFC #26) | +| `get_file_activity` | File reads/edits/writes with breakdown | +| `get_languages` | Language distribution from file extensions | +| `get_projects` | Activity across all projects | +| `get_mcp_usage` | MCP server and tool usage breakdown | ### Session Discovery and Drill-In Flow @@ -151,19 +158,36 @@ All commands support `--json` for machine-readable output: ```bash session-analytics-cli status # DB stats session-analytics-cli ingest --days 30 # Refresh data -session-analytics-cli frequency # Tool usage +session-analytics-cli frequency # Tool usage (--no-expand to hide breakdowns) session-analytics-cli commands --prefix git # Command breakdown session-analytics-cli sessions # Session info session-analytics-cli tokens --by model # Token usage -session-analytics-cli sequences # Tool chains +session-analytics-cli sequences # Tool chains (--expand for command-level) session-analytics-cli permissions # Permission gaps session-analytics-cli insights # For /improve-workflow session-analytics-cli journey # User messages across sessions session-analytics-cli search # Full-text search on messages session-analytics-cli signals # Raw session signals (RFC #26) session-analytics-cli session-commits # Session-commit 
associations (RFC #26) +session-analytics-cli file-activity # File reads/edits/writes +session-analytics-cli languages # Language distribution +session-analytics-cli projects # Cross-project activity +session-analytics-cli mcp-usage # MCP server/tool usage ``` +### Expand Flags + +The `--expand` and `--no-expand` flags control detailed breakdowns for aggregated tools: + +| Command | Default | Flag | Effect | +|---------|---------|------|--------| +| `frequency` | Expanded | `--no-expand` | Hide Bash/Skill/Task breakdowns (commands, skills, agents) | +| `sequences` | Tool-level | `--expand` | Expand to command/skill/agent level sequences | + +**Why different defaults?** +- `frequency` answers "what am I using?" - breakdowns are useful by default +- `sequences` answers "what's my workflow?" - tool-level patterns are clearer by default, command-level is for drilling in + ## Integration ### With /improve-workflow diff --git a/Makefile b/Makefile index 4400f97..2257767 100644 --- a/Makefile +++ b/Makefile @@ -1,4 +1,4 @@ -.PHONY: check fmt lint test clean install uninstall restart dev venv +.PHONY: check fmt lint test clean install uninstall restart reinstall dev venv # Run all quality gates (format check, lint, tests) check: fmt lint test @@ -75,6 +75,12 @@ restart: exit 1; \ fi +# Reinstall: pip install -e . + restart LaunchAgent (picks up pyproject.toml changes) +reinstall: venv + @echo "Reinstalling package..." + .venv/bin/pip install -e . + @$(MAKE) restart + # Uninstall: LaunchAgent + CLI + MCP config uninstall: @echo "Uninstalling..." diff --git a/scripts/global-report.sh b/scripts/global-report.sh new file mode 100755 index 0000000..b57456d --- /dev/null +++ b/scripts/global-report.sh @@ -0,0 +1,113 @@ +#!/bin/bash +# Generate a 7-day global analytics report +# Outputs to /tmp/session-analytics-report.md + +set -e + +OUTPUT="/tmp/session-analytics-report.md" +DAYS=7 +CLI="session-analytics-cli" + +# Check if CLI is available +if ! 
command -v "$CLI" &> /dev/null; then + # Try the venv version + SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" + CLI="$SCRIPT_DIR/../.venv/bin/session-analytics-cli" + if [[ ! -x "$CLI" ]]; then + echo "Error: session-analytics-cli not found" >&2 + exit 1 + fi +fi + +echo "Generating $DAYS-day global report..." + +{ + echo "# Claude Code Session Analytics Report" + echo "" + echo "Generated: $(date '+%Y-%m-%d %H:%M:%S')" + echo "Period: Last $DAYS days" + echo "" + + echo "## Status" + echo "" + echo '```' + "$CLI" status + echo '```' + echo "" + + echo "## Tool Usage" + echo "" + echo '```' + "$CLI" frequency --days "$DAYS" + echo '```' + echo "" + + echo "## Command Breakdown" + echo "" + echo '```' + "$CLI" commands --days "$DAYS" + echo '```' + echo "" + + echo "## MCP Server Usage" + echo "" + echo '```' + "$CLI" mcp-usage --days "$DAYS" + echo '```' + echo "" + + echo "## Language Distribution" + echo "" + echo '```' + "$CLI" languages --days "$DAYS" + echo '```' + echo "" + + echo "## Project Activity" + echo "" + echo '```' + "$CLI" projects --days "$DAYS" + echo '```' + echo "" + + echo "## File Activity (Top 20, worktrees collapsed)" + echo "" + echo '```' + "$CLI" file-activity --days "$DAYS" --limit 20 --collapse-worktrees + echo '```' + echo "" + + echo "## Tool Sequences" + echo "" + echo '```' + "$CLI" sequences --days "$DAYS" --min-count 5 + echo '```' + echo "" + + echo "## Token Usage by Day" + echo "" + echo '```' + "$CLI" tokens --days "$DAYS" --by day + echo '```' + echo "" + + echo "## Session Overview" + echo "" + echo '```' + "$CLI" sessions --days "$DAYS" + echo '```' + echo "" + + echo "## Permission Gaps" + echo "" + echo '```' + "$CLI" permissions --days "$DAYS" --min-count 3 + echo '```' + echo "" + +} > "$OUTPUT" + +echo "Report saved to: $OUTPUT" +echo "" +echo "View with: cat $OUTPUT" +echo "Or open in browser: open $OUTPUT" diff --git a/src/session_analytics/cli.py b/src/session_analytics/cli.py index 4ea8f75..133cbb0 100644 --- 
a/src/session_analytics/cli.py +++ b/src/session_analytics/cli.py @@ -40,6 +40,10 @@ find_related_sessions, get_user_journey, query_commands, + query_file_activity, + query_languages, + query_mcp_usage, + query_projects, query_sessions, query_tokens, query_tool_frequency, @@ -67,39 +71,59 @@ def decorator(formatter: callable): @_register_formatter(lambda d: "total_tool_calls" in d) def _format_tool_frequency(data: dict) -> list[str]: - lines = [f"Total tool calls: {data['total_tool_calls']}", "", "Tool frequency:"] - for tool in data.get("tools", [])[:20]: + lines = [ + "Which tools you use most (Read, Edit, Bash, etc.)", + "", + f"Total tool calls: {data['total_tool_calls']}", + "", + "Tool frequency:", + ] + for tool in data.get("tools", []): lines.append(f" {tool['tool']}: {tool['count']}") + # Show breakdown if present (for Skill, Task, Bash) + for item in tool.get("breakdown", []): + lines.append(f" └ {item['name']}: {item['count']}") return lines @_register_formatter(lambda d: "total_commands" in d) def _format_commands(data: dict) -> list[str]: - lines = [f"Total commands: {data['total_commands']}", "", "Command frequency:"] - for cmd in data.get("commands", [])[:20]: + lines = [ + "Bash commands by frequency (gh, git, cargo, etc.)", + "", + f"Total commands: {data['total_commands']}", + "", + "Command frequency:", + ] + for cmd in data.get("commands", []): lines.append(f" {cmd['command']}: {cmd['count']}") return lines @_register_formatter(lambda d: "session_count" in d and "total_entries" in d) def _format_sessions(data: dict) -> list[str]: - total_tokens = data.get("total_input_tokens", 0) + data.get("total_output_tokens", 0) + input_tokens = data.get("total_input_tokens", 0) + output_tokens = data.get("total_output_tokens", 0) + total_tokens = input_tokens + output_tokens return [ + "Summary of Claude Code sessions and token usage", + "", f"Sessions: {data['session_count']}", f"Total entries: {data['total_entries']}", - f"Total tokens: {total_tokens}", + 
f"Tokens: {input_tokens:,} in / {output_tokens:,} out ({total_tokens:,} total)", ] @_register_formatter(lambda d: "breakdown" in d) def _format_tokens(data: dict) -> list[str]: lines = [ - f"Token usage by {data.get('group_by', 'unknown')}:", - f"Total input: {data['total_input_tokens']}", - f"Total output: {data['total_output_tokens']}", + f"Token consumption grouped by {data.get('group_by', 'unknown')}", + "", + f"Total input: {data['total_input_tokens']:,}", + f"Total output: {data['total_output_tokens']:,}", "", ] - for item in data["breakdown"][:20]: + for item in data["breakdown"]: key = item.get("day") or item.get("session_id") or item.get("model") lines.append(f" {key}: {item['input_tokens']} in / {item['output_tokens']} out") return lines @@ -108,30 +132,105 @@ def _format_tokens(data: dict) -> list[str]: @_register_formatter(lambda d: "summary" in d) def _format_insights(data: dict) -> list[str]: return [ - "Insights summary:", - f" Tools: {data['summary']['total_tools']}", - f" Commands: {data['summary']['total_commands']}", - f" Sequences: {data['summary']['total_sequences']}", - f" Permission gaps: {data['summary']['permission_gaps_found']}", + "Pre-computed patterns for /improve-workflow", + "", + f"Tools tracked: {data['summary']['total_tools']}", + f"Commands tracked: {data['summary']['total_commands']}", + f"Sequences found: {data['summary']['total_sequences']}", + f"Permission gaps: {data['summary']['permission_gaps_found']}", ] @_register_formatter(lambda d: "sequences" in d) def _format_sequences(data: dict) -> list[str]: - lines = ["Common tool sequences:"] - for seq in data.get("sequences", [])[:20]: + if data.get("expanded"): + desc = "Detailed sequences (Bash→commands, Skill→skills, Task→agents)" + else: + desc = "Tool chains showing workflow patterns (Read → Edit, etc.)" + lines = [ + desc, + "", + "Sequences:", + ] + for seq in data.get("sequences", []): lines.append(f" {seq['pattern']}: {seq['count']}") return lines 
@_register_formatter(lambda d: "gaps" in d) def _format_gaps(data: dict) -> list[str]: - lines = ["Permission gaps (consider adding to settings.json):"] - for gap in data.get("gaps", [])[:20]: + lines = [ + "Commands used frequently that could be auto-approved in settings.json", + "", + "Permission gaps:", + ] + for gap in data.get("gaps", []): lines.append(f" {gap['command']}: {gap['count']} uses -> {gap['suggestion']}") return lines +@_register_formatter(lambda d: "files" in d and "file_count" in d) +def _format_file_activity(data: dict) -> list[str]: + collapsed = " (worktrees collapsed)" if data.get("collapse_worktrees") else "" + lines = [ + f"Files with most activity (reads, edits, writes){collapsed}", + "", + f"Files touched: {data['file_count']}", + "", + ] + for f in data.get("files", []): + lines.append(f" {f['file']}") + lines.append( + f" total: {f['total']} read: {f['reads']} edit: {f['edits']} write: {f['writes']}" + ) + return lines + + +@_register_formatter(lambda d: "languages" in d and "total_operations" in d) +def _format_languages(data: dict) -> list[str]: + lines = [ + "Language distribution from file extensions", + "", + f"Total file operations: {data['total_operations']:,}", + "", + f"{'LANGUAGE':<20} {'COUNT':>8} {'%':>6}", + ] + for lang in data.get("languages", []): + lines.append(f"{lang['language']:<20} {lang['count']:>8} {lang['percent']:>5.1f}%") + return lines + + +@_register_formatter(lambda d: "projects" in d and "project_count" in d) +def _format_projects(data: dict) -> list[str]: + lines = [ + "Activity across projects", + "", + f"Projects: {data['project_count']}", + "", + f"{'PROJECT':<30} {'EVENTS':>8} {'SESSIONS':>8}", + ] + for proj in data.get("projects", []): + lines.append(f"{proj['name']:<30} {proj['events']:>8} {proj['sessions']:>8}") + return lines + + +@_register_formatter(lambda d: "servers" in d and "total_mcp_calls" in d) +def _format_mcp_usage(data: dict) -> list[str]: + lines = [ + "MCP server and tool usage", + 
"", + f"Total MCP calls: {data['total_mcp_calls']:,}", + "", + ] + for server in data.get("servers", []): + lines.append(f"{server['server']}: {server['total']} calls") + for tool in server.get("tools", [])[:5]: + lines.append(f" └ {tool['tool']}: {tool['count']}") + if len(server.get("tools", [])) > 5: + lines.append(f" └ ... and {len(server['tools']) - 5} more") + return lines + + @_register_formatter(lambda d: "samples" in d and "parsed_tools" in d) def _format_sample_sequences(data: dict) -> list[str]: lines = [ @@ -166,18 +265,14 @@ def _format_user_journey(data: dict) -> list[str]: lines.append(f"Project switches: {data.get('project_switches', 0)}") lines.append("") - for event in data.get("journey", [])[:20]: + for event in data.get("journey", []): ts = event.get("timestamp", "")[:16] if event.get("timestamp") else "unknown" msg = event.get("message", "") if event.get("message") else "" - if len(msg) > 60: - msg = msg[:57] + "..." project = event.get("project", "") if project: lines.append(f" [{ts}] ({project}) {msg}") else: lines.append(f" [{ts}] {msg}") - if len(data.get("journey", [])) > 20: - lines.append(f" ... and {len(data['journey']) - 20} more") return lines @@ -188,18 +283,14 @@ def _format_search_results(data: dict) -> list[str]: f"Results: {data['count']}", "", ] - for msg in data.get("messages", [])[:20]: + for msg in data.get("messages", []): ts = msg.get("timestamp", "")[:16] if msg.get("timestamp") else "unknown" text = msg.get("message", "") if msg.get("message") else "" - if len(text) > 60: - text = text[:57] + "..." project = msg.get("project", "") if project: lines.append(f" [{ts}] ({project}) {text}") else: lines.append(f" [{ts}] {text}") - if len(data.get("messages", [])) > 20: - lines.append(f" ... 
and {len(data['messages']) - 20} more") return lines @@ -252,11 +343,13 @@ def _format_ingest(data: dict) -> list[str]: @_register_formatter(lambda d: "event_count" in d) def _format_status(data: dict) -> list[str]: lines = [ + "Analytics database status and ingestion info", + "", f"Database: {data.get('db_path', 'unknown')}", f"Size: {data.get('db_size_bytes', 0) / 1024:.1f} KB", - f"Events: {data['event_count']}", - f"Sessions: {data['session_count']}", - f"Patterns: {data.get('pattern_count', 0)}", + f"Events: {data['event_count']:,}", + f"Sessions: {data['session_count']:,}", + f"Patterns: {data.get('pattern_count', 0):,}", ] if data.get("earliest_event"): lines.append(f"Date range: {data['earliest_event'][:10]} to {data['latest_event'][:10]}") @@ -350,27 +443,22 @@ def _format_handoff_context(data: dict) -> list[str]: and "error_count" in d.get("sessions", [{}])[0] ) def _format_signals(data: dict) -> list[str]: - """Format raw session signals for display. - - Per RFC #17: Surfaces raw data for LLM interpretation, no outcome labels. 
- """ + """Format raw session signals for display.""" lines = [ - f"Session Signals (last {data['days']} days)", - f"Sessions analyzed: {data['sessions_analyzed']}", + "Session metrics: events, duration, errors, rework, and PR activity", + "", + f"Sessions analyzed: {data['sessions_analyzed']} (last {data['days']} days)", "", - "Sessions (raw signals for LLM interpretation):", ] - for sess in data.get("sessions", [])[:15]: + for sess in data.get("sessions", []): commit_info = f", {sess['commit_count']} commits" if sess.get("commit_count") else "" error_info = f", {sess['error_rate']:.0%} errors" if sess.get("error_rate", 0) > 0 else "" rework = " [rework]" if sess.get("has_rework") else "" pr = " [PR]" if sess.get("has_pr_activity") else "" lines.append( - f" {sess['session_id'][:16]} - {sess['event_count']} events, " + f" {sess['session_id']} - {sess['event_count']} events, " f"{sess['duration_minutes']:.0f}m{commit_info}{error_info}{rework}{pr}" ) - if len(data.get("sessions", [])) > 15: - lines.append(f" ... and {len(data['sessions']) - 15} more") return lines @@ -384,17 +472,15 @@ def _format_session_commits(data: dict) -> list[str]: if data.get("session_id"): lines.insert(1, f"Session: {data['session_id']}") - for commit in data.get("commits", [])[:20]: - sha = commit.get("sha", "")[:8] + for commit in data.get("commits", []): + sha = commit.get("sha", "") time_to = commit.get("time_to_commit_seconds", 0) first = " (first)" if commit.get("is_first_commit") else "" - session = commit.get("session_id", "")[:12] if not data.get("session_id") else "" + session = commit.get("session_id", "") if not data.get("session_id") else "" if session: lines.append(f" {sha} - {time_to}s{first} [{session}]") else: lines.append(f" {sha} - {time_to}s{first}") - if len(data.get("commits", [])) > 20: - lines.append(f" ... 
and {len(data['commits']) - 20} more") return lines @@ -466,7 +552,8 @@ def cmd_ingest(args): def cmd_frequency(args): """Show tool frequency.""" storage = SQLiteStorage() - result = query_tool_frequency(storage, days=args.days, project=args.project) + expand = not getattr(args, "no_expand", False) + result = query_tool_frequency(storage, days=args.days, project=args.project, expand=expand) print(format_output(result, args.json)) @@ -494,12 +581,17 @@ def cmd_tokens(args): def cmd_sequences(args): """Show tool sequences.""" storage = SQLiteStorage() - patterns = compute_sequence_patterns( - storage, days=args.days, sequence_length=args.length, min_count=args.min_count + sequence_patterns = compute_sequence_patterns( + storage, + days=args.days, + sequence_length=args.length, + min_count=args.min_count, + expand=args.expand, ) result = { "days": args.days, - "sequences": [{"pattern": p.pattern_key, "count": p.count} for p in patterns], + "expanded": args.expand, + "sequences": [{"pattern": p.pattern_key, "count": p.count} for p in sequence_patterns], } print(format_output(result, args.json)) @@ -522,6 +614,40 @@ def cmd_permissions(args): print(format_output(result, args.json)) +def cmd_file_activity(args): + """Show file activity.""" + storage = SQLiteStorage() + result = query_file_activity( + storage, + days=args.days, + project=args.project, + limit=args.limit, + collapse_worktrees=args.collapse_worktrees, + ) + print(format_output(result, args.json)) + + +def cmd_languages(args): + """Show language distribution.""" + storage = SQLiteStorage() + result = query_languages(storage, days=args.days, project=args.project) + print(format_output(result, args.json)) + + +def cmd_projects(args): + """Show project activity.""" + storage = SQLiteStorage() + result = query_projects(storage, days=args.days) + print(format_output(result, args.json)) + + +def cmd_mcp_usage(args): + """Show MCP server/tool usage.""" + storage = SQLiteStorage() + result = query_mcp_usage(storage, 
days=args.days, project=args.project) + print(format_output(result, args.json)) + + def cmd_insights(args): """Show insights for /improve-workflow.""" storage = SQLiteStorage() @@ -780,6 +906,11 @@ def main(): sub = subparsers.add_parser("frequency", help="Show tool frequency") sub.add_argument("--days", type=int, default=7, help="Days to analyze (default: 7)") sub.add_argument("--project", help="Project path filter") + sub.add_argument( + "--no-expand", + action="store_true", + help="Disable breakdown for Skill, Task, and Bash", + ) sub.set_defaults(func=cmd_frequency) # commands @@ -807,6 +938,11 @@ def main(): sub.add_argument("--days", type=int, default=7, help="Days to analyze (default: 7)") sub.add_argument("--min-count", type=int, default=3, help="Minimum occurrences") sub.add_argument("--length", type=int, default=2, help="Sequence length") + sub.add_argument( + "--expand", + action="store_true", + help="Expand Bash→commands, Skill→skills, Task→agents", + ) sub.set_defaults(func=cmd_sequences) # permissions @@ -934,6 +1070,35 @@ def main(): sub.add_argument("--project", help="Project path filter") sub.set_defaults(func=cmd_session_commits) + # file-activity + sub = subparsers.add_parser("file-activity", help="Show file read/write activity") + sub.add_argument("--days", type=int, default=7, help="Days to analyze (default: 7)") + sub.add_argument("--project", help="Project path filter") + sub.add_argument("--limit", type=int, default=20, help="Max files to show (default: 20)") + sub.add_argument( + "--collapse-worktrees", + action="store_true", + help="Consolidate .worktrees// paths", + ) + sub.set_defaults(func=cmd_file_activity) + + # languages + sub = subparsers.add_parser("languages", help="Show language breakdown by file operations") + sub.add_argument("--days", type=int, default=7, help="Days to analyze (default: 7)") + sub.add_argument("--project", help="Project path filter") + sub.set_defaults(func=cmd_languages) + + # projects + sub = 
subparsers.add_parser("projects", help="Show activity by project") + sub.add_argument("--days", type=int, default=7, help="Days to analyze (default: 7)") + sub.set_defaults(func=cmd_projects) + + # mcp-usage + sub = subparsers.add_parser("mcp-usage", help="Show MCP server/tool usage") + sub.add_argument("--days", type=int, default=7, help="Days to analyze (default: 7)") + sub.add_argument("--project", help="Project path filter") + sub.set_defaults(func=cmd_mcp_usage) + args = parser.parse_args() args.func(args) diff --git a/src/session_analytics/guide.md b/src/session_analytics/guide.md index d3ca02b..78306b8 100644 --- a/src/session_analytics/guide.md +++ b/src/session_analytics/guide.md @@ -25,6 +25,10 @@ identify permission gaps. | `list_sessions(days?, project?)` | Session metadata and token totals | | `get_token_usage(days?, by?, project?)` | Token usage by day, session, or model | | `get_session_events(days?, tool?, session_id?)` | Recent events with filtering | +| `get_file_activity(days?, project?, limit?, collapse_worktrees?)` | File reads/edits/writes breakdown | +| `get_languages(days?, project?)` | Language distribution from file extensions | +| `get_projects(days?)` | Activity across all projects | +| `get_mcp_usage(days?, project?)` | MCP server and tool usage | ### Pattern Analysis diff --git a/src/session_analytics/patterns.py b/src/session_analytics/patterns.py index 7629d04..b33bfb7 100644 --- a/src/session_analytics/patterns.py +++ b/src/session_analytics/patterns.py @@ -108,6 +108,7 @@ def compute_sequence_patterns( days: int = 7, sequence_length: int = 2, min_count: int = 3, + expand: bool = False, ) -> list[Pattern]: """Compute tool sequence patterns (n-grams) from events. @@ -116,6 +117,8 @@ def compute_sequence_patterns( days: Number of days to analyze sequence_length: Length of sequences to detect min_count: Minimum occurrences to include + expand: If True, expand Bash to commands, Skill to skill names, + Task to subagent types. 
Shows detailed workflow patterns. Returns: List of sequence patterns @@ -124,9 +127,10 @@ def compute_sequence_patterns( now = datetime.now() # Get all tool events ordered by session and timestamp + # Include extra columns needed for expansion rows = storage.execute_query( """ - SELECT session_id, tool_name, timestamp + SELECT session_id, tool_name, command, skill_name, tool_input_json, timestamp FROM events WHERE timestamp >= ? AND tool_name IS NOT NULL ORDER BY session_id, timestamp @@ -134,6 +138,25 @@ def compute_sequence_patterns( (cutoff,), ) + def get_effective_name(row) -> str: + """Get the effective name for a tool, optionally expanded.""" + if not expand: + return row["tool_name"] + + tool = row["tool_name"] + if tool == "Bash" and row["command"]: + return row["command"] + elif tool == "Skill" and row["skill_name"]: + return row["skill_name"] + elif tool == "Task" and row["tool_input_json"]: + try: + input_data = json.loads(row["tool_input_json"]) + if subagent := input_data.get("subagent_type"): + return subagent + except (json.JSONDecodeError, TypeError): + pass + return tool + # Group by session and extract sequences sequences: Counter = Counter() current_session = None @@ -150,7 +173,7 @@ def compute_sequence_patterns( current_session = row["session_id"] session_tools = [] - session_tools.append(row["tool_name"]) + session_tools.append(get_effective_name(row)) # Process last session if len(session_tools) >= sequence_length: @@ -159,23 +182,23 @@ def compute_sequence_patterns( sequences[seq] += 1 # Create patterns for sequences meeting min_count - patterns = [] + result_patterns = [] for seq, count in sequences.most_common(): if count < min_count: break - patterns.append( + result_patterns.append( Pattern( id=None, pattern_type="tool_sequence", pattern_key=" → ".join(seq), count=count, last_seen=now, - metadata={"sequence": list(seq)}, + metadata={"sequence": list(seq), "expanded": expand}, computed_at=now, ) ) - return patterns + return 
result_patterns def sample_sequences( @@ -991,7 +1014,6 @@ def get_period_metrics(start: datetime, end: datetime) -> dict: WHERE timestamp >= ? AND timestamp < ? AND tool_name IS NOT NULL GROUP BY tool_name ORDER BY count DESC - LIMIT 10 """, (start, end), ) diff --git a/src/session_analytics/queries.py b/src/session_analytics/queries.py index e0cbbe4..b985f01 100644 --- a/src/session_analytics/queries.py +++ b/src/session_analytics/queries.py @@ -1,5 +1,6 @@ """Query implementations for session analytics.""" +import re from datetime import datetime, timedelta from session_analytics.storage import SQLiteStorage @@ -91,6 +92,7 @@ def query_tool_frequency( storage: SQLiteStorage, days: int = 7, project: str | None = None, + expand: bool = True, ) -> dict: """Get tool usage frequency counts. @@ -98,6 +100,7 @@ def query_tool_frequency( storage: Storage instance days: Number of days to analyze project: Optional project path filter + expand: Include breakdown for Skill, Task, and Bash (default: True) Returns: Dict with tool frequency breakdown @@ -123,6 +126,22 @@ def query_tool_frequency( tools = [{"tool": row["tool_name"], "count": row["count"]} for row in rows] + # Add breakdowns if expand=True + if expand: + # Build breakdown queries with same filters + skill_breakdown = _get_skill_breakdown(storage, cutoff, project) + task_breakdown = _get_task_breakdown(storage, cutoff, project) + bash_breakdown = _get_bash_breakdown(storage, cutoff, project) + + # Attach breakdowns to respective tools + for tool in tools: + if tool["tool"] == "Skill" and skill_breakdown: + tool["breakdown"] = skill_breakdown + elif tool["tool"] == "Task" and task_breakdown: + tool["breakdown"] = task_breakdown + elif tool["tool"] == "Bash" and bash_breakdown: + tool["breakdown"] = bash_breakdown + return { "days": days, "project": project, @@ -131,6 +150,89 @@ def query_tool_frequency( } +def _get_skill_breakdown( + storage: SQLiteStorage, + cutoff: datetime, + project: str | None = None, +) -> 
list[dict]: + """Get Skill usage breakdown by skill_name.""" + where_clause, params = build_where_clause( + cutoff=cutoff, + project=project, + extra_conditions=["tool_name = 'Skill'", "skill_name IS NOT NULL"], + ) + + rows = storage.execute_query( + f""" + SELECT skill_name, COUNT(*) as count + FROM events + WHERE {where_clause} + GROUP BY skill_name + ORDER BY count DESC + """, + params, + ) + + return [{"name": row["skill_name"], "count": row["count"]} for row in rows] + + +def _get_task_breakdown( + storage: SQLiteStorage, + cutoff: datetime, + project: str | None = None, +) -> list[dict]: + """Get Task usage breakdown by subagent_type.""" + where_clause, params = build_where_clause( + cutoff=cutoff, + project=project, + extra_conditions=["tool_name = 'Task'", "tool_input_json IS NOT NULL"], + ) + + rows = storage.execute_query( + f""" + SELECT + json_extract(tool_input_json, '$.subagent_type') as subagent_type, + COUNT(*) as count + FROM events + WHERE {where_clause} + AND json_extract(tool_input_json, '$.subagent_type') IS NOT NULL + GROUP BY subagent_type + ORDER BY count DESC + """, + params, + ) + + return [{"name": row["subagent_type"], "count": row["count"]} for row in rows] + + +def _get_bash_breakdown( + storage: SQLiteStorage, + cutoff: datetime, + project: str | None = None, + limit: int = 10, +) -> list[dict]: + """Get Bash usage breakdown by command prefix.""" + where_clause, params = build_where_clause( + cutoff=cutoff, + project=project, + extra_conditions=["tool_name = 'Bash'", "command IS NOT NULL"], + ) + + rows = storage.execute_query( + f""" + SELECT command, COUNT(*) as count + FROM events + WHERE {where_clause} + GROUP BY command + ORDER BY count DESC + LIMIT ? 
+ """, + (*params, limit), + ) + + return [{"name": row["command"], "count": row["count"]} for row in rows] + + def query_timeline( storage: SQLiteStorage, start: datetime | None = None, @@ -641,7 +743,7 @@ def detect_parallel_sessions( "min_overlap_minutes": min_overlap_minutes, "total_sessions": len(sessions), "parallel_period_count": len(parallel_periods), - "parallel_periods": parallel_periods[:20], # Limit to top 20 + "parallel_periods": parallel_periods, } @@ -1139,3 +1241,313 @@ def get_handoff_context( "recent_commands": recent_commands, "tool_summary": tool_summary, } + + +# Pattern to match worktree paths: .worktrees// +WORKTREE_PATTERN = re.compile(r"\.worktrees/[^/]+/") + + +def _collapse_worktree_path(path: str) -> str: + """Remove .worktrees// from a path to consolidate file activity.""" + return WORKTREE_PATTERN.sub("", path) + + +def query_file_activity( + storage: SQLiteStorage, + days: int = 7, + project: str | None = None, + limit: int = 20, + collapse_worktrees: bool = False, +) -> dict: + """Query file activity (reads, edits, writes) with breakdown. 
+ + Args: + storage: Storage instance + days: Number of days to analyze + project: Optional project path filter + limit: Maximum files to return + collapse_worktrees: If True, consolidate .worktrees// paths + + Returns: + File activity data with read/edit/write breakdown + """ + cutoff = datetime.now() - timedelta(days=days) + where_clause, params = build_where_clause( + cutoff=cutoff, + project=project, + extra_conditions=["tool_name IN ('Read', 'Edit', 'Write')", "file_path IS NOT NULL"], + ) + + rows = storage.execute_query( + f""" + SELECT + file_path, + tool_name, + COUNT(*) as count + FROM events + WHERE {where_clause} + GROUP BY file_path, tool_name + ORDER BY count DESC + """, + params, + ) + + # Aggregate by file, optionally collapsing worktree paths + file_stats: dict[str, dict] = {} + for row in rows: + path = row["file_path"] + if collapse_worktrees: + path = _collapse_worktree_path(path) + + if path not in file_stats: + file_stats[path] = {"reads": 0, "edits": 0, "writes": 0, "total": 0} + + tool = row["tool_name"] + count = row["count"] + if tool == "Read": + file_stats[path]["reads"] += count + elif tool == "Edit": + file_stats[path]["edits"] += count + elif tool == "Write": + file_stats[path]["writes"] += count + file_stats[path]["total"] += count + + # Sort by total and limit + sorted_files = sorted(file_stats.items(), key=lambda x: x[1]["total"], reverse=True)[:limit] + + files = [ + { + "file": path, + "total": stats["total"], + "reads": stats["reads"], + "edits": stats["edits"], + "writes": stats["writes"], + } + for path, stats in sorted_files + ] + + return { + "days": days, + "collapse_worktrees": collapse_worktrees, + "file_count": len(file_stats), + "files": files, + } + + +def query_languages( + storage: SQLiteStorage, + days: int = 7, + project: str | None = None, +) -> dict: + """Query language distribution from file extensions. 
+ + Args: + storage: Storage instance + days: Number of days to analyze + project: Optional project path filter + + Returns: + Language distribution data + """ + cutoff = datetime.now() - timedelta(days=days) + where_clause, params = build_where_clause( + cutoff=cutoff, + project=project, + extra_conditions=["tool_name IN ('Read', 'Edit', 'Write')", "file_path IS NOT NULL"], + ) + + rows = storage.execute_query( + f""" + SELECT + CASE + WHEN file_path LIKE '%.rs' THEN 'Rust' + WHEN file_path LIKE '%.py' THEN 'Python' + WHEN file_path LIKE '%.ts' THEN 'TypeScript' + WHEN file_path LIKE '%.tsx' THEN 'TypeScript' + WHEN file_path LIKE '%.js' THEN 'JavaScript' + WHEN file_path LIKE '%.jsx' THEN 'JavaScript' + WHEN file_path LIKE '%.md' THEN 'Markdown' + WHEN file_path LIKE '%.json' THEN 'JSON' + WHEN file_path LIKE '%.toml' THEN 'TOML' + WHEN file_path LIKE '%.yaml' THEN 'YAML' + WHEN file_path LIKE '%.yml' THEN 'YAML' + WHEN file_path LIKE '%.sh' THEN 'Shell' + WHEN file_path LIKE '%.bash' THEN 'Shell' + WHEN file_path LIKE '%.go' THEN 'Go' + WHEN file_path LIKE '%.java' THEN 'Java' + WHEN file_path LIKE '%.rb' THEN 'Ruby' + WHEN file_path LIKE '%.c' THEN 'C' + WHEN file_path LIKE '%.cpp' THEN 'C++' + WHEN file_path LIKE '%.h' THEN 'C/C++ Header' + WHEN file_path LIKE '%.hpp' THEN 'C++ Header' + WHEN file_path LIKE '%.swift' THEN 'Swift' + WHEN file_path LIKE '%.css' THEN 'CSS' + WHEN file_path LIKE '%.html' THEN 'HTML' + WHEN file_path LIKE '%.sql' THEN 'SQL' + ELSE 'Other' + END as language, + COUNT(*) as count + FROM events + WHERE {where_clause} + GROUP BY language + ORDER BY count DESC + """, + params, + ) + + total = sum(row["count"] for row in rows) + languages = [ + { + "language": row["language"], + "count": row["count"], + "percent": round(row["count"] / total * 100, 1) if total > 0 else 0, + } + for row in rows + ] + + return { + "days": days, + "total_operations": total, + "languages": languages, + } + + +def query_projects( + storage: SQLiteStorage, + 
days: int = 7, +) -> dict: + """Query cross-project activity. + + Note: This function intentionally does not have a project filter parameter + because it's designed to show activity *across* all projects. + + Args: + storage: Storage instance + days: Number of days to analyze + + Returns: + Project activity data with event counts and session counts per project + """ + cutoff = datetime.now() - timedelta(days=days) + + rows = storage.execute_query( + """ + SELECT + project_path, + COUNT(*) as events, + COUNT(DISTINCT session_id) as sessions + FROM events + WHERE timestamp >= ? + AND project_path IS NOT NULL + GROUP BY project_path + ORDER BY events DESC + """, + (cutoff,), + ) + + # Extract repo name from path + def get_repo_name(path: str) -> str: + # Try to extract meaningful name from path + parts = path.rstrip("/").split("/") + # Look for common patterns + for i, part in enumerate(parts): + if part in ("projects", "repos", "src", "Documents"): + if i + 1 < len(parts): + return parts[i + 1] + # Fallback to last component + return parts[-1] if parts else path + + projects = [ + { + "project": row["project_path"], + "name": get_repo_name(row["project_path"]), + "events": row["events"], + "sessions": row["sessions"], + } + for row in rows + ] + + return { + "days": days, + "project_count": len(projects), + "projects": projects, + } + + +def query_mcp_usage( + storage: SQLiteStorage, + days: int = 7, + project: str | None = None, +) -> dict: + """Query MCP server/tool usage breakdown. 
+ + Args: + storage: Storage instance + days: Number of days to analyze + project: Optional project path filter + + Returns: + MCP usage data by server and tool + """ + cutoff = datetime.now() - timedelta(days=days) + where_clause, params = build_where_clause( + cutoff=cutoff, + project=project, + extra_conditions=["tool_name LIKE 'mcp__%'"], + ) + + rows = storage.execute_query( + f""" + SELECT + tool_name, + COUNT(*) as count + FROM events + WHERE {where_clause} + GROUP BY tool_name + ORDER BY count DESC + """, + params, + ) + + # Group by server (extract from mcp____) + servers: dict[str, dict] = {} + total = 0 + + for row in rows: + tool_name = row["tool_name"] + count = row["count"] + total += count + + # Parse mcp____ + parts = tool_name.split("__") + if len(parts) >= 3: + server = parts[1] + tool = "__".join(parts[2:]) # Handle tools with __ in name + else: + server = "unknown" + tool = tool_name + + if server not in servers: + servers[server] = {"total": 0, "tools": []} + + servers[server]["total"] += count + servers[server]["tools"].append({"tool": tool, "count": count}) + + # Sort servers by total and tools by count + server_list = sorted(servers.items(), key=lambda x: x[1]["total"], reverse=True) + result_servers = [] + for server_name, data in server_list: + data["tools"].sort(key=lambda x: x["count"], reverse=True) + result_servers.append( + { + "server": server_name, + "total": data["total"], + "tools": data["tools"], + } + ) + + return { + "days": days, + "total_mcp_calls": total, + "servers": result_servers, + } diff --git a/src/session_analytics/server.py b/src/session_analytics/server.py index f029313..35b4d0c 100644 --- a/src/session_analytics/server.py +++ b/src/session_analytics/server.py @@ -99,18 +99,20 @@ def ingest_logs(days: int = 7, project: str | None = None, force: bool = False) @mcp.tool() -def get_tool_frequency(days: int = 7, project: str | None = None) -> dict: +def get_tool_frequency(days: int = 7, project: str | None = None, 
expand: bool = True) -> dict: """Get tool usage frequency counts. Args: days: Number of days to analyze (default: 7) project: Optional project path filter + expand: Include breakdown for Skill (by skill_name), Task (by subagent_type), + and Bash (by command). Default: True Returns: - Tool frequency breakdown + Tool frequency breakdown with optional nested breakdowns """ queries.ensure_fresh_data(storage, days=days, project=project) - result = queries.query_tool_frequency(storage, days=days, project=project) + result = queries.query_tool_frequency(storage, days=days, project=project, expand=expand) return {"status": "ok", **result} @@ -207,26 +209,30 @@ def get_token_usage(days: int = 7, project: str | None = None, by: str = "day") @mcp.tool() -def get_tool_sequences(days: int = 7, min_count: int = 3, length: int = 2) -> dict: +def get_tool_sequences( + days: int = 7, min_count: int = 3, length: int = 2, expand: bool = False +) -> dict: """Get common tool patterns (sequences). Args: days: Number of days to analyze (default: 7) min_count: Minimum occurrences to include (default: 3) length: Sequence length (default: 2) + expand: Expand Bash→commands, Skill→skill names, Task→subagent types (default: False) Returns: Common tool sequences """ queries.ensure_fresh_data(storage, days=days) sequence_patterns = patterns.compute_sequence_patterns( - storage, days=days, sequence_length=length, min_count=min_count + storage, days=days, sequence_length=length, min_count=min_count, expand=expand ) return { "status": "ok", "days": days, "min_count": min_count, "sequence_length": length, + "expanded": expand, "sequences": [{"pattern": p.pattern_key, "count": p.count} for p in sequence_patterns], } @@ -611,6 +617,84 @@ def get_session_commits(session_id: str | None = None, days: int = 7) -> dict: } +@mcp.tool() +def get_file_activity( + days: int = 7, + project: str | None = None, + limit: int = 20, + collapse_worktrees: bool = False, +) -> dict: + """Get file activity (reads, 
edits, writes) with breakdown. + + Args: + days: Number of days to analyze (default: 7) + project: Optional project path filter + limit: Maximum files to return (default: 20) + collapse_worktrees: If True, consolidate .worktrees// paths + + Returns: + File activity data with read/edit/write breakdown per file + """ + queries.ensure_fresh_data(storage, days=days, project=project) + result = queries.query_file_activity( + storage, + days=days, + project=project, + limit=limit, + collapse_worktrees=collapse_worktrees, + ) + return {"status": "ok", **result} + + +@mcp.tool() +def get_languages(days: int = 7, project: str | None = None) -> dict: + """Get language distribution from file extensions. + + Args: + days: Number of days to analyze (default: 7) + project: Optional project path filter + + Returns: + Language distribution with counts and percentages + """ + queries.ensure_fresh_data(storage, days=days, project=project) + result = queries.query_languages(storage, days=days, project=project) + return {"status": "ok", **result} + + +@mcp.tool() +def get_projects(days: int = 7) -> dict: + """Get activity breakdown by project. + + Note: No project filter - this shows activity *across* all projects. + + Args: + days: Number of days to analyze (default: 7) + + Returns: + Project activity data with event counts and session counts per project + """ + queries.ensure_fresh_data(storage, days=days) + result = queries.query_projects(storage, days=days) + return {"status": "ok", **result} + + +@mcp.tool() +def get_mcp_usage(days: int = 7, project: str | None = None) -> dict: + """Get MCP server and tool usage breakdown. 
+ + Args: + days: Number of days to analyze (default: 7) + project: Optional project path filter + + Returns: + MCP usage grouped by server with tool breakdown + """ + queries.ensure_fresh_data(storage, days=days, project=project) + result = queries.query_mcp_usage(storage, days=days, project=project) + return {"status": "ok", **result} + + def create_app(): """Create the ASGI app for uvicorn.""" # stateless_http=True allows resilience to server restarts diff --git a/tests/test_cli.py b/tests/test_cli.py index 04629da..b4b43ef 100644 --- a/tests/test_cli.py +++ b/tests/test_cli.py @@ -148,7 +148,7 @@ def test_status_format(self): } result = format_output(data) assert "Database:" in result - assert "Events: 1000" in result + assert "Events: 1,000" in result # Comma-formatted assert "Sessions: 10" in result def test_sessions_format(self): @@ -174,8 +174,8 @@ def test_insights_format(self): } } result = format_output(data) - assert "Insights summary:" in result - assert "Tools: 10" in result + assert "Pre-computed patterns" in result + assert "Tools tracked: 10" in result class TestCliCommands: @@ -249,7 +249,7 @@ class Args: cmd_tokens(Args()) captured = capsys.readouterr() - assert "Token usage" in captured.out + assert "Token consumption" in captured.out def test_cmd_sequences(self, populated_storage, capsys): """Test sequences command.""" @@ -259,12 +259,13 @@ class Args: days = 7 min_count = 1 length = 2 + expand = False with patch("session_analytics.cli.SQLiteStorage", return_value=populated_storage): cmd_sequences(Args()) captured = capsys.readouterr() - assert "Common tool sequences:" in captured.out + assert "Tool chains showing workflow patterns" in captured.out def test_cmd_permissions(self, populated_storage, capsys): """Test permissions command.""" @@ -293,7 +294,7 @@ class Args: cmd_insights(Args()) captured = capsys.readouterr() - assert "Insights summary:" in captured.out + assert "Pre-computed patterns" in captured.out def test_json_output_mode(self, 
populated_storage, capsys): """Test JSON output mode.""" @@ -402,7 +403,7 @@ class Args: cmd_signals(Args()) captured = capsys.readouterr() - assert "Session Signals" in captured.out + assert "Session metrics" in captured.out assert "Sessions analyzed:" in captured.out def test_cmd_signals_json(self, populated_storage, capsys): @@ -505,7 +506,7 @@ def test_signals_format(self): ], } result = format_output(data) - assert "Session Signals" in result + assert "Session metrics" in result assert "Sessions analyzed: 5" in result assert "session-1-abc" in result assert "50 events" in result diff --git a/tests/test_queries.py b/tests/test_queries.py index fb6247d..7e13eb5 100644 --- a/tests/test_queries.py +++ b/tests/test_queries.py @@ -9,6 +9,10 @@ from session_analytics.queries import ( ensure_fresh_data, query_commands, + query_file_activity, + query_languages, + query_mcp_usage, + query_projects, query_sessions, query_timeline, query_tokens, @@ -1256,3 +1260,278 @@ def test_journey_without_projects(self, storage): assert result["project_switches"] is None for event in result["journey"]: assert "project" not in event + + +class TestQueryFileActivity: + """Tests for file activity queries.""" + + def test_basic_file_activity(self, storage): + """Test basic file activity query.""" + now = datetime.now() + events = [ + Event( + id=None, + uuid="f1", + timestamp=now - timedelta(hours=1), + session_id="s1", + project_path="-test", + entry_type="tool_use", + tool_name="Read", + file_path="/path/to/file.py", + ), + Event( + id=None, + uuid="f2", + timestamp=now - timedelta(hours=2), + session_id="s1", + project_path="-test", + entry_type="tool_use", + tool_name="Edit", + file_path="/path/to/file.py", + ), + Event( + id=None, + uuid="f3", + timestamp=now - timedelta(hours=3), + session_id="s1", + project_path="-test", + entry_type="tool_use", + tool_name="Write", + file_path="/path/to/new.py", + ), + ] + storage.add_events_batch(events) + + result = query_file_activity(storage, 
days=7) + assert result["file_count"] == 2 + assert len(result["files"]) == 2 + + # file.py should have 2 operations (1 read, 1 edit) + file_py = next(f for f in result["files"] if "file.py" in f["file"]) + assert file_py["reads"] == 1 + assert file_py["edits"] == 1 + assert file_py["writes"] == 0 + assert file_py["total"] == 2 + + def test_collapse_worktrees(self, storage): + """Test worktree path collapsing.""" + now = datetime.now() + events = [ + Event( + id=None, + uuid="w1", + timestamp=now - timedelta(hours=1), + session_id="s1", + project_path="-test", + entry_type="tool_use", + tool_name="Read", + file_path="/projects/myrepo/src/main.rs", + ), + Event( + id=None, + uuid="w2", + timestamp=now - timedelta(hours=2), + session_id="s1", + project_path="-test", + entry_type="tool_use", + tool_name="Edit", + file_path="/projects/myrepo/.worktrees/feature-branch/src/main.rs", + ), + ] + storage.add_events_batch(events) + + # Without collapse, should be 2 files + result_no_collapse = query_file_activity(storage, days=7, collapse_worktrees=False) + assert result_no_collapse["file_count"] == 2 + + # With collapse, should be 1 file (worktree path collapsed) + result_collapse = query_file_activity(storage, days=7, collapse_worktrees=True) + assert result_collapse["file_count"] == 1 + assert result_collapse["files"][0]["total"] == 2 + + +class TestQueryLanguages: + """Tests for language distribution queries.""" + + def test_basic_languages(self, storage): + """Test basic language distribution.""" + now = datetime.now() + events = [ + Event( + id=None, + uuid="l1", + timestamp=now - timedelta(hours=1), + session_id="s1", + project_path="-test", + entry_type="tool_use", + tool_name="Read", + file_path="/path/to/file.py", + ), + Event( + id=None, + uuid="l2", + timestamp=now - timedelta(hours=2), + session_id="s1", + project_path="-test", + entry_type="tool_use", + tool_name="Edit", + file_path="/path/to/file.py", + ), + Event( + id=None, + uuid="l3", + timestamp=now - 
timedelta(hours=3), + session_id="s1", + project_path="-test", + entry_type="tool_use", + tool_name="Read", + file_path="/path/to/code.rs", + ), + Event( + id=None, + uuid="l4", + timestamp=now - timedelta(hours=4), + session_id="s1", + project_path="-test", + entry_type="tool_use", + tool_name="Read", + file_path="/path/to/doc.md", + ), + ] + storage.add_events_batch(events) + + result = query_languages(storage, days=7) + assert result["total_operations"] == 4 + + langs = {lang["language"]: lang["count"] for lang in result["languages"]} + assert langs.get("Python") == 2 + assert langs.get("Rust") == 1 + assert langs.get("Markdown") == 1 + + +class TestQueryProjects: + """Tests for project activity queries.""" + + def test_basic_projects(self, storage): + """Test basic project activity.""" + now = datetime.now() + events = [ + Event( + id=None, + uuid="p1", + timestamp=now - timedelta(hours=1), + session_id="s1", + project_path="-Users-dev-projects-myapp", + entry_type="tool_use", + tool_name="Read", + ), + Event( + id=None, + uuid="p2", + timestamp=now - timedelta(hours=2), + session_id="s1", + project_path="-Users-dev-projects-myapp", + entry_type="tool_use", + tool_name="Edit", + ), + Event( + id=None, + uuid="p3", + timestamp=now - timedelta(hours=3), + session_id="s2", + project_path="-Users-dev-projects-other", + entry_type="tool_use", + tool_name="Read", + ), + ] + storage.add_events_batch(events) + + storage.upsert_session( + Session( + id="s1", + project_path="-Users-dev-projects-myapp", + first_seen=now - timedelta(hours=2), + last_seen=now - timedelta(hours=1), + entry_count=2, + ) + ) + storage.upsert_session( + Session( + id="s2", + project_path="-Users-dev-projects-other", + first_seen=now - timedelta(hours=3), + last_seen=now - timedelta(hours=3), + entry_count=1, + ) + ) + + result = query_projects(storage, days=7) + assert result["project_count"] == 2 + + # project names are extracted from project_path using get_repo_name() + # which falls back to 
last component when no known markers found + projects = {p["name"]: p for p in result["projects"]} + assert projects["-Users-dev-projects-myapp"]["events"] == 2 + assert projects["-Users-dev-projects-myapp"]["sessions"] == 1 + assert projects["-Users-dev-projects-other"]["events"] == 1 + + +class TestQueryMcpUsage: + """Tests for MCP usage queries.""" + + def test_basic_mcp_usage(self, storage): + """Test basic MCP usage breakdown.""" + now = datetime.now() + events = [ + Event( + id=None, + uuid="m1", + timestamp=now - timedelta(hours=1), + session_id="s1", + project_path="-test", + entry_type="tool_use", + tool_name="mcp__github__get_issue", + ), + Event( + id=None, + uuid="m2", + timestamp=now - timedelta(hours=2), + session_id="s1", + project_path="-test", + entry_type="tool_use", + tool_name="mcp__github__create_pr", + ), + Event( + id=None, + uuid="m3", + timestamp=now - timedelta(hours=3), + session_id="s1", + project_path="-test", + entry_type="tool_use", + tool_name="mcp__event-bus__publish_event", + ), + Event( + id=None, + uuid="m4", + timestamp=now - timedelta(hours=4), + session_id="s1", + project_path="-test", + entry_type="tool_use", + tool_name="Read", # Non-MCP tool, should be ignored + ), + ] + storage.add_events_batch(events) + + result = query_mcp_usage(storage, days=7) + assert result["total_mcp_calls"] == 3 + + servers = {s["server"]: s for s in result["servers"]} + assert "github" in servers + assert "event-bus" in servers + + assert servers["github"]["total"] == 2 + github_tools = {t["tool"]: t["count"] for t in servers["github"]["tools"]} + assert github_tools.get("get_issue") == 1 + assert github_tools.get("create_pr") == 1 + + assert servers["event-bus"]["total"] == 1
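
The two string-handling steps this diff adds to `queries.py` (collapsing worktree paths and splitting MCP tool names) can be tried in isolation. The following is a minimal standalone sketch, not the module itself. The regex is copied from `WORKTREE_PATTERN` in the diff. The function name `split_mcp_tool_name` is hypothetical, since the diff performs that split inline inside `query_mcp_usage`. The sample tool names are taken from the test data above and assume the three-part, double-underscore-separated form used there (e.g. `mcp__github__get_issue`).

```python
import re

# Same pattern as WORKTREE_PATTERN in the diff: ".worktrees/", one path
# component, then a trailing slash.
WORKTREE_PATTERN = re.compile(r"\.worktrees/[^/]+/")


def collapse_worktree_path(path: str) -> str:
    """Drop the worktree segment so both checkouts map to one file path."""
    return WORKTREE_PATTERN.sub("", path)


def split_mcp_tool_name(tool_name: str) -> tuple[str, str]:
    """Split an MCP tool name into (server, tool).

    Hypothetical helper: it mirrors the inline logic in query_mcp_usage.
    Names with fewer than three "__"-separated parts are filed under the
    "unknown" server, and the full name is kept as the tool.
    """
    parts = tool_name.split("__")
    if len(parts) >= 3:
        # Re-join the remainder so tools containing "__" stay intact.
        return parts[1], "__".join(parts[2:])
    return "unknown", tool_name


if __name__ == "__main__":
    print(collapse_worktree_path(
        "/projects/myrepo/.worktrees/feature-branch/src/main.rs"
    ))
    print(split_mcp_tool_name("mcp__event-bus__publish_event"))
```

Because the split is on the double underscore rather than a single one, server names containing hyphens or single underscores (such as `event-bus`) come through unchanged, which is what `test_basic_mcp_usage` relies on.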