139 changes: 22 additions & 117 deletions CLAUDE.md
@@ -2,9 +2,7 @@

Queryable analytics for Claude Code session logs, exposed as an MCP server and CLI.

**API Reference**: Run `session-analytics-cli --help` or read `src/session_analytics/guide.md` (served as `session-analytics://guide` MCP resource).

**Related**: [claude-event-bus](https://github.com/evansenter/claude-event-bus) shares design patterns with this project.
**API Reference**: `session-analytics-cli --help` or `src/session_analytics/guide.md` (MCP resource: `session-analytics://guide`).

---

@@ -13,9 +11,8 @@ Queryable analytics for Claude Code session logs, exposed as an MCP server and C
**The database at `~/.claude/contrib/analytics/data.db` contains irreplaceable historical data.**

### NEVER:
- Delete the database file (`os.remove()`, `unlink()`, `rm`)
- `DROP TABLE` on `events`, `sessions`, `ingested_files`, or `git_commits`
- `DELETE FROM` user data tables (only `patterns` is safe - it's re-computed)
- Delete the database file
- `DROP TABLE` or `DELETE FROM` user data tables (only `patterns` is safe - it's re-computed)
- Add "reset" or "clear all" functionality

### Before schema changes:
@@ -27,48 +24,23 @@ cp ~/.claude/contrib/analytics/data.db ~/.claude/contrib/analytics/data.db.backu

## Design Philosophy

This API is consumed by LLMs. Every endpoint should be designed with that in mind.

### Principle 1: Don't Over-Distill

Raw data with light structure beats heavily processed summaries. The LLM can handle context.

```python
# BAD: Pre-computed interpretation
{"outcome": "frustrated", "confidence": 0.75}

# GOOD: Raw signals for LLM interpretation
{"error_count": 5, "error_rate": 0.25, "has_rework": True, "commit_count": 0}
```

### Principle 2: Aggregate → Drill-Down
This API is consumed by LLMs. Design with that in mind:

Every aggregate endpoint needs a path to actionable detail. If an LLM sees "821 Bash errors", it should be able to discover WHICH commands failed.
1. **Don't over-distill** - Raw signals (`error_count: 5, has_rework: true`) beat pre-computed interpretations (`outcome: "frustrated"`)

**The test**: Can an LLM go from high-level insight to actionable fix using only MCP calls?
2. **Aggregate → drill-down** - If an endpoint shows "821 Bash errors", there must be a path to discover WHICH commands failed

See RFC #49 for current drill-down gaps and solutions.

### Principle 3: Self-Play Testing

Before merging new API endpoints, test them as an LLM would:

1. Start from a high-level question ("What's causing errors?")
2. Use only MCP tools (no direct DB access)
3. Attempt to reach an actionable conclusion
4. If blocked, the API is incomplete
3. **Self-play test** - Before merging, try reaching an actionable conclusion using only MCP tools. If blocked, the API is incomplete
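
The aggregate → drill-down principle can be illustrated with a minimal sketch. The schema (`tool`, `command`, `is_error`) and the function names `error_counts` and `failed_commands` are assumptions made for this example; the real `events` table and query functions are not shown in this diff and may look different.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (tool TEXT, command TEXT, is_error INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("Bash", "make check", 1),
        ("Bash", "make check", 1),
        ("Bash", "git push", 1),
        ("Bash", "ls", 0),
        ("Edit", None, 1),
    ],
)


def error_counts(conn):
    """Aggregate view: raw error totals per tool, with no interpretation."""
    rows = conn.execute(
        "SELECT tool, COUNT(*) FROM events WHERE is_error = 1 "
        "GROUP BY tool ORDER BY COUNT(*) DESC, tool"
    ).fetchall()
    return [{"tool": tool, "error_count": n} for tool, n in rows]


def failed_commands(conn, tool, limit=10):
    """Drill-down view: which commands failed for a single tool."""
    rows = conn.execute(
        "SELECT command, COUNT(*) FROM events "
        "WHERE is_error = 1 AND tool = ? AND command IS NOT NULL "
        "GROUP BY command ORDER BY COUNT(*) DESC, command LIMIT ?",
        (tool, limit),
    ).fetchall()
    return [{"command": command, "error_count": n} for command, n in rows]
```

A caller that sees a high `error_count` for `Bash` in the aggregate can pass that same `tool` value to the drill-down call, so every number in the summary leads to the rows behind it.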

---

## Quick Reference

### Commands
## Commands

```bash
make check # Run fmt, lint, test
make install # Install LaunchAgent + CLI + MCP config
make restart # Restart LaunchAgent to pick up code changes
make reinstall # pip install -e . + restart (for pyproject.toml changes)
make check # fmt, lint, test
make install # LaunchAgent + CLI + MCP config
make restart # Restart LaunchAgent for code changes
make reinstall # pip install -e . + restart (for pyproject.toml)
```

### When to Restart
Expand All @@ -79,88 +51,21 @@ make reinstall # pip install -e . + restart (for pyproject.toml changes)
| `cli.py` only | None (CLI runs fresh) |
| `pyproject.toml` | `make reinstall` |

### Key Files

| File | Purpose |
|------|---------|
| `server.py` | MCP tools + entry point |
| `cli.py` | CLI with formatter registry |
| `storage.py` | SQLite + migrations |
| `ingest.py` | JSONL parsing |
| `queries.py` | Query implementations |
| `patterns.py` | Sequence/permission gap detection |
| `guide.md` | API reference (MCP resource) |

---

## Adding New Endpoints

Checklist for adding a new query:

1. **Query function** in `queries.py`
- Use `build_where_clause()` helper for filters
- Return structured dict, not raw tuples

2. **MCP tool** in `server.py`
- Follow naming: `get_*`, `list_*`, `search_*`, `analyze_*`
- Standard args: `days`, `limit`, `session_id`, `project`

3. **CLI command** in `cli.py`
- Add formatter with `@_register_formatter(predicate)`
- Support `--json` flag

4. **Documentation** in `guide.md`
- Add to appropriate section
- Include example usage

5. **Self-play test**
- Can you reach actionable info using only MCP?
- If aggregate, what's the drill-down path?

6. **Run `make check`**

---

## Architecture

```
~/.claude/projects/**/*.jsonl → SQLite DB → MCP Server / CLI
~/.claude/contrib/analytics/data.db
```

### Key Patterns
## Key Patterns

- **Storage API**: Use `storage.execute_query()` / `execute_write()`; avoid `_connect()`
- **Migrations**: Use `@migration(version, name)` decorator in storage.py
- **Formatters**: CLI uses `@_register_formatter(predicate)` - first match wins
- **CLI/MCP Parity**: Every query should be accessible from both

### Naming Conventions

| Prefix | Use | Example |
|--------|-----|---------|
| `list_*` | Enumerate (no complex filtering) | `list_sessions()` |
| `get_*` | Retrieve with filters | `get_session_events()` |
| `search_*` | Full-text search | `search_messages()` |
| `analyze_*` | Compute insights | `analyze_failures()` |
| `ingest_*` | Load/import data | `ingest_logs()` |

| Arg | Standard Name |
|-----|---------------|
| Session ID | `session_id` |
| Max results | `limit` |
| Time window | `days` (fractional OK: `0.5` = 12h) |
| Project filter | `project` |
- **Migrations**: `@migration(version, name)` decorator in storage.py
- **CLI/MCP parity**: Every query accessible from both interfaces
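
As a rough sketch of how a `@migration(version, name)` decorator could work, the example below registers migration functions by version and applies pending ones in order. Only the decorator's name and arguments come from this file. The registry, the `apply_migrations` helper, and the use of SQLite's `PRAGMA user_version` to track the schema version are assumptions, not the implementation in `storage.py`.

```python
import sqlite3

# Hypothetical registry: version -> (name, function).
_MIGRATIONS = {}


def migration(version, name):
    """Register a schema migration under a unique version number."""
    def register(func):
        if version in _MIGRATIONS:
            raise ValueError(f"duplicate migration version {version}")
        _MIGRATIONS[version] = (name, func)
        return func
    return register


@migration(1, "create_events")
def _create_events(conn):
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, tool TEXT)")


@migration(2, "add_events_command")
def _add_events_command(conn):
    # Additive change only: nothing is dropped or deleted.
    conn.execute("ALTER TABLE events ADD COLUMN command TEXT")


def apply_migrations(conn):
    """Run pending migrations in version order; return the names applied."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    applied = []
    for version in sorted(_MIGRATIONS):
        if version <= current:
            continue
        name, func = _MIGRATIONS[version]
        func(conn)
        # PRAGMA values cannot be bound as parameters; version is an int.
        conn.execute(f"PRAGMA user_version = {int(version)}")
        applied.append(name)
    conn.commit()
    return applied
```

Both example migrations are additive, which fits the rule above that user data tables are never dropped or cleared.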

---

## Data Model
## Adding Endpoints

| Table | Purpose |
|-------|---------|
| `events` | Individual tool uses with timestamps, tokens, commands |
| `sessions` | Aggregated session metadata |
| `patterns` | Pre-computed patterns (safe to delete - re-computed) |
| `ingested_files` | Tracks file mtime/size for incremental updates |
| `git_commits` | Commit history for session correlation |
1. Query function in `queries.py` (use `build_where_clause()` helper)
2. MCP tool in `server.py` (naming: `get_*`, `list_*`, `search_*`, `analyze_*`)
3. CLI command in `cli.py` (formatter via `@_register_formatter`)
4. Documentation in `guide.md`
5. Self-play test: can you reach actionable info using only MCP?
6. Run `make check`
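
Step 3 relies on a formatter registry in which the first matching predicate wins. The sketch below shows one way such a registry might be built. The decorator name `_register_formatter(predicate)` and the `--json` flag appear in this file; the registry list, the `format_output` helper, and the example result shapes are hypothetical and are not taken from `cli.py`.

```python
import json

# Hypothetical registry: (predicate, formatter) pairs in registration order.
_FORMATTERS = []


def _register_formatter(predicate):
    """Register a formatter for results that satisfy `predicate`."""
    def register(func):
        _FORMATTERS.append((predicate, func))
        return func
    return register


@_register_formatter(lambda data: "error_count" in data)
def _format_errors(data):
    return f"{data['tool']}: {data['error_count']} errors"


@_register_formatter(lambda data: "sessions" in data)
def _format_sessions(data):
    return "\n".join(f"- {s}" for s in data["sessions"])


def format_output(data, as_json=False):
    """Render a query result; the first matching formatter wins."""
    if as_json:  # stands in for a --json flag
        return json.dumps(data, indent=2, sort_keys=True)
    for predicate, formatter in _FORMATTERS:
        if predicate(data):
            return formatter(data)
    # No formatter matched: fall back to JSON so output is never lost.
    return json.dumps(data, indent=2, sort_keys=True)
```

Because matching follows registration order, more specific predicates need to be registered before more general ones.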