22 changes: 22 additions & 0 deletions docs/SCHEMA.md
@@ -28,6 +28,7 @@ This document describes the SQLite database schema for agent-session-analytics.
| `session_commits` | Junction table linking sessions to commits | ~3K |
| `bus_events` | Cross-session events from event-bus | ~2K |
| `events_fts` | FTS5 virtual table for user message search | N/A |
| `raw_entries` | Unparsed JSONL entries for future re-parsing | 100K+ |

---

@@ -192,6 +193,24 @@ CREATE TABLE patterns (
)
```

### raw_entries

Unparsed JSONL entries for future re-parsing. Stored separately from `events` to preserve original source material.

```sql
CREATE TABLE raw_entries (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    project_path TEXT,
    timestamp TEXT NOT NULL,
    entry_json TEXT NOT NULL,  -- Full original JSONL entry
    ingested_at TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(session_id, timestamp, entry_json)
)
```

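**Design note**: The UNIQUE constraint covers `(session_id, timestamp, entry_json)`, so an entry is skipped as a duplicate only when all three values match exactly. This means SQLite compares (and indexes) the full JSON text rather than a hash; SQLite handles this efficiently, and it avoids hash-collision edge cases.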
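
The deduplication behavior described above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch, not the project's ingestion code: the helper `insert_raw_entry` is hypothetical, and pairing the constraint with `INSERT OR IGNORE` is an assumption about how the server writes to this table.

```python
import sqlite3

SCHEMA = """
CREATE TABLE raw_entries (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    project_path TEXT,
    timestamp TEXT NOT NULL,
    entry_json TEXT NOT NULL,
    ingested_at TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(session_id, timestamp, entry_json)
)
"""


def insert_raw_entry(conn, session_id, project_path, timestamp, entry_json):
    """Hypothetical helper: returns True if a row was stored, False if ignored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO raw_entries "
        "(session_id, project_path, timestamp, entry_json) VALUES (?, ?, ?, ?)",
        (session_id, project_path, timestamp, entry_json),
    )
    # rowcount is 0 when the UNIQUE constraint caused the insert to be ignored.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

line = '{"type": "user", "timestamp": "2025-01-01T00:00:00Z"}'
first = insert_raw_entry(conn, "s1", "/repo", "2025-01-01T00:00:00Z", line)
second = insert_raw_entry(conn, "s1", "/repo", "2025-01-01T00:00:00Z", line)
row_count = conn.execute("SELECT COUNT(*) FROM raw_entries").fetchone()[0]
```

Note that `project_path` is not part of the constraint, so the same entry uploaded under a different project path would also be treated as a duplicate.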

---

## Indexes
@@ -224,6 +243,8 @@ Performance-critical indexes on the `events` table:
| `bus_events` | `idx_bus_events_type` | `event_type` |
| `bus_events` | `idx_bus_events_session` | `session_id` |
| `bus_events` | `idx_bus_events_repo` | `repo` |
| `raw_entries` | `idx_raw_entries_session` | `session_id` |
| `raw_entries` | `idx_raw_entries_timestamp` | `timestamp` |

---

@@ -262,6 +283,7 @@ Sync triggers maintain index consistency:
| 10 | backfill_compaction_and_result_size | Backfill compaction detection and result_size_bytes for existing data |
| 11 | fix_compaction_detection_user_entries | Fix compaction detection to look at user entries (not just summary) |
| 12 | fix_warmup_not_errors | Fix warmup events incorrectly marked as errors (Issue #75) |
| 13 | add_raw_entries_table | Raw JSONL storage for future re-parsing (Issue #93) |

---

196 changes: 196 additions & 0 deletions docs/TAILSCALE_SETUP.md
@@ -0,0 +1,196 @@
# Tailscale Setup for Agent Session Analytics

Deploy agent-session-analytics across multiple machines using Tailscale for secure, authenticated access.

## Architecture

```
[Client Machine]                       [Server (speck-vm)]
~/.claude/projects/*.jsonl             agent-session-analytics MCP
        |                                      |
CLI `push` command ----HTTPS---->      tailscale serve (TLS + auth)
        |                                      |
Reads local JSONL                      Writes to SQLite
Incremental sync                       Dedupes by UUID
```

- Server runs on `localhost:8081` (unexposed)
- `tailscale serve` proxies HTTPS requests with TLS and identity headers
- Localhost connections are trusted; remote requires Tailscale auth

## Server Setup

### 1. Install the server

```bash
cd ~/Documents/projects/agent-session-analytics
make install-server
```

This installs:
- Python dependencies via uv
- systemd user service (`agent-session-analytics.service`)
- MCP config pointing to localhost

### 2. Configure Tailscale serve

```bash
# Path-based routing (recommended for multiple services)
tailscale serve --bg --https=443 /agent-session-analytics/mcp localhost:8081

# Verify
tailscale serve status
```

### 3. Verify the server

```bash
# Check service status
systemctl --user status agent-session-analytics

# Test MCP endpoint (from server)
curl -s localhost:8081/mcp -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  -d '{"jsonrpc":"2.0","method":"tools/list","params":{},"id":1}'
```
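
For scripted health checks, the same request can be built in Python with the standard library. This is a sketch only: the function name `build_tools_list_request` is hypothetical, and the URL, headers, and JSON-RPC payload are copied from the `curl` command above.

```python
import json
import urllib.request


def build_tools_list_request(url="http://localhost:8081/mcp"):
    """Build (but do not send) the JSON-RPC tools/list request shown above."""
    payload = {"jsonrpc": "2.0", "method": "tools/list", "params": {}, "id": 1}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Both media types are listed, matching the curl example.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


request = build_tools_list_request()
```

To send it from the server itself, pass `request` to `urllib.request.urlopen(request, timeout=10)` and read the response body.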

## Client Setup

### 1. Install the client

```bash
cd ~/Documents/projects/agent-session-analytics
make install-client REMOTE_URL=https://speck-vm.tailac7b3c.ts.net/agent-session-analytics/mcp
```

This configures Claude Code's MCP settings to point to the remote server.

### 2. Configure push command

Add to your shell profile (`.zshrc` or `.bashrc`):

```bash
export AGENT_SESSION_ANALYTICS_URL=https://speck-vm.tailac7b3c.ts.net/agent-session-analytics/mcp
```

### 3. Push local data

```bash
# Push last 7 days
agent-session-analytics-cli push --days 7

# Push the last 365 days of history (incremental, safe to re-run)
agent-session-analytics-cli push --days 365
```

## Incremental Sync

The push command uses incremental sync:

1. Client queries `get_sync_status` to get latest timestamp per session
2. Only entries newer than server's latest are sent
3. Server deduplicates by UUID (`INSERT OR IGNORE`)

This makes `push` safe and efficient to run repeatedly.
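
The client-side filter in step 2 can be sketched as follows. This is a simplified illustration modeled on the comparison logic in `cli.py`, not the CLI code itself; the helper names `parse_ts` and `entries_newer_than` are hypothetical.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 timestamp, treating a trailing Z or no offset as UTC."""
    ts = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return ts


def entries_newer_than(entries, server_latest):
    """Return the entries the server does not have yet for one session."""
    if not server_latest:
        # The server has never seen this session: send everything.
        return list(entries)
    cutoff = parse_ts(server_latest)
    kept = []
    for entry in entries:
        raw = entry.get("timestamp")
        if not raw:
            # As in the CLI's incremental path, entries without a timestamp are skipped.
            continue
        try:
            if parse_ts(raw) > cutoff:
                kept.append(entry)
        except ValueError:
            # Unparseable timestamp: send anyway and let the server deduplicate.
            kept.append(entry)
    return kept
```

Because the comparison is strictly "newer than", an entry that shares the server's latest timestamp is not re-sent; the `--force` flag added in this change bypasses the filter entirely.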

## Automatic Sync

### Option 1: Hook after compaction

Add to `~/.claude/settings.json`:

```json
{
  "hooks": {
    "SessionStart:compact": [
      {
        "type": "command",
        "command": "agent-session-analytics-cli push --days 1"
      }
    ]
  }
}
```

### Option 2: Periodic sync via launchd (macOS)

Create `~/Library/LaunchAgents/com.agent-session-analytics.push.plist`:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.agent-session-analytics.push</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/env</string>
        <string>agent-session-analytics-cli</string>
        <string>push</string>
        <string>--days</string>
        <string>1</string>
    </array>
    <key>EnvironmentVariables</key>
    <dict>
        <key>AGENT_SESSION_ANALYTICS_URL</key>
        <string>https://speck-vm.tailac7b3c.ts.net/agent-session-analytics/mcp</string>
    </dict>
    <key>StartInterval</key>
    <integer>3600</integer>
    <key>RunAtLoad</key>
    <true/>
</dict>
</plist>
```

Load with: `launchctl load ~/Library/LaunchAgents/com.agent-session-analytics.push.plist`

## Troubleshooting

### 401 Unauthorized

Remote requests must go through `tailscale serve`. Direct access to the port is blocked.

```bash
# Wrong (direct access)
curl https://speck-vm:8081/mcp

# Correct (through Tailscale)
curl https://speck-vm.tailac7b3c.ts.net/agent-session-analytics/mcp
```

### 406 Not Acceptable

MCP requires a specific `Accept` header:

```bash
curl -H 'Accept: application/json, text/event-stream' ...
```

### Connection timeout during push

Try a smaller batch size (the default is 500):

```bash
agent-session-analytics-cli push --days 7 --batch-size 50
```
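
Smaller batches help because each request carries fewer entries and finishes sooner. The splitting itself can be sketched as below; this is an illustration under assumptions, with `iter_batches` a hypothetical helper and the default of 500 taken from the CLI's `--batch-size` help text.

```python
def iter_batches(entries, batch_size=500):
    """Yield consecutive slices of at most batch_size entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(entries), batch_size):
        yield entries[start:start + batch_size]


# 120 entries with --batch-size 50 become three requests.
batch_sizes = [len(batch) for batch in iter_batches(list(range(120)), 50)]
```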

### Check server logs

```bash
journalctl --user -u agent-session-analytics -f
```

## MCP Tools for Remote Sync

| Tool | Purpose |
|------|---------|
| `get_sync_status(session_ids?)` | Get latest timestamp per session |
| `upload_entries(entries, project_path)` | Upload raw JSONL entries |
| `finalize_sync()` | Update session stats after upload |
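
The upload half of a sync can be sketched as follows. This is not the CLI's implementation: `call_tool` is a hypothetical stand-in for whatever transport invokes an MCP tool, grouping entries by `project_path` is an assumption based on the `upload_entries` signature, and the counter names are the result keys the CLI reads.

```python
from collections import defaultdict


def upload_all(call_tool, entries_to_send, batch_size=500):
    """Upload (project_path, entry) pairs in batches, then finalize once."""
    by_project = defaultdict(list)
    for project_path, entry in entries_to_send:
        by_project[project_path].append(entry)

    totals = {"events_added": 0, "events_skipped": 0, "raw_entries_added": 0}
    for project_path, entries in by_project.items():
        for start in range(0, len(entries), batch_size):
            batch = entries[start:start + batch_size]
            result = call_tool(
                "upload_entries",
                {"entries": batch, "project_path": project_path},
            ) or {}
            for key in totals:
                totals[key] += result.get(key, 0)

    # Session statistics are refreshed once, after every batch has been sent.
    call_tool("finalize_sync", {})
    return totals
```

A caller would first use `get_sync_status` to decide which entries to pass in as `entries_to_send`.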

## Reference

- [agent-event-bus Tailscale setup](https://github.com/evansenter/agent-event-bus/blob/main/docs/TAILSCALE_SETUP.md)
- [Tailscale serve documentation](https://tailscale.com/kb/1242/tailscale-serve)
90 changes: 55 additions & 35 deletions src/agent_session_analytics/cli.py
@@ -1602,42 +1602,54 @@ def mcp_call(method_name: str, arguments: dict) -> dict | None:
     if not args.json:
         print(f"Found {total_local_entries} entries across {len(all_entries_by_session)} sessions")

-    # Get sync status from server (what it already has)
-    if not args.json:
-        print("Checking server sync status...")
-
-    sync_status = mcp_call("get_sync_status", {"session_ids": list(all_entries_by_session.keys())})
-    if sync_status is None:
-        return
-
-    server_sessions = sync_status.get("sessions", {})
-
-    # Filter to only entries newer than server's latest per session
-    entries_to_send: list[tuple[str, dict]] = []  # [(project_path, entry), ...]
-    for session_id, entries in all_entries_by_session.items():
-        server_latest = server_sessions.get(session_id)
-        if server_latest:
-            # Parse server timestamp and filter
-            from datetime import timezone
-
-            server_ts = datetime.fromisoformat(server_latest.replace("Z", "+00:00"))
-            if server_ts.tzinfo is None:
-                server_ts = server_ts.replace(tzinfo=timezone.utc)
-            for project_path, entry in entries:
-                entry_ts_str = entry.get("timestamp")
-                if entry_ts_str:
-                    try:
-                        entry_ts = datetime.fromisoformat(entry_ts_str.replace("Z", "+00:00"))
-                        # Ensure both are timezone-aware for comparison
-                        if entry_ts.tzinfo is None:
-                            entry_ts = entry_ts.replace(tzinfo=timezone.utc)
-                        if entry_ts > server_ts:
-                            entries_to_send.append((project_path, entry))
-                    except ValueError:
-                        entries_to_send.append((project_path, entry))  # Can't parse, send anyway
-        else:
-            # Server doesn't have this session, send all entries
+    # Get sync status from server (what it already has) - skip if --force
+    if args.force:
+        if not args.json:
+            print("Force mode: sending all entries (skipping incremental sync)")
+        # Send all entries
+        entries_to_send: list[tuple[str, dict]] = []
+        for entries in all_entries_by_session.values():
             entries_to_send.extend(entries)
+    else:
+        if not args.json:
+            print("Checking server sync status...")
+
+        sync_status = mcp_call(
+            "get_sync_status", {"session_ids": list(all_entries_by_session.keys())}
+        )
+        if sync_status is None:
+            return
+
+        server_sessions = sync_status.get("sessions", {})
+
+        # Filter to only entries newer than server's latest per session
+        entries_to_send = []
+        for session_id, entries in all_entries_by_session.items():
+            server_latest = server_sessions.get(session_id)
+            if server_latest:
+                # Parse server timestamp and filter
+                from datetime import timezone
+
+                server_ts = datetime.fromisoformat(server_latest.replace("Z", "+00:00"))
+                if server_ts.tzinfo is None:
+                    server_ts = server_ts.replace(tzinfo=timezone.utc)
+                for project_path, entry in entries:
+                    entry_ts_str = entry.get("timestamp")
+                    if entry_ts_str:
+                        try:
+                            entry_ts = datetime.fromisoformat(entry_ts_str.replace("Z", "+00:00"))
+                            # Ensure both are timezone-aware for comparison
+                            if entry_ts.tzinfo is None:
+                                entry_ts = entry_ts.replace(tzinfo=timezone.utc)
+                            if entry_ts > server_ts:
+                                entries_to_send.append((project_path, entry))
+                        except ValueError:
+                            entries_to_send.append(
+                                (project_path, entry)
+                            )  # Can't parse, send anyway
+            else:
+                # Server doesn't have this session, send all entries
+                entries_to_send.extend(entries)

     if not entries_to_send:
         output = {
@@ -1667,6 +1679,7 @@ def mcp_call(method_name: str, arguments: dict) -> dict | None:
     total_added = 0
     total_skipped = 0
     total_errors = 0
+    total_raw_added = 0
     entries_uploaded = 0
     total_entries = len(entries_to_send)
     start_time = datetime.now()
@@ -1684,6 +1697,7 @@ def mcp_call(method_name: str, arguments: dict) -> dict | None:
         total_added += result.get("events_added", 0)
         total_skipped += result.get("events_skipped", 0)
         total_errors += result.get("parse_errors", 0)
+        total_raw_added += result.get("raw_entries_added", 0)
         entries_uploaded += len(batch)

         # Progress update with time estimate
@@ -1710,6 +1724,7 @@ def mcp_call(method_name: str, arguments: dict) -> dict | None:
         "files_checked": len(files),
         "entries_checked": total_local_entries,
         "entries_sent": len(entries_to_send),
+        "raw_entries_added": total_raw_added,
         "events_added": total_added,
         "events_skipped": total_skipped,
         "sessions_updated": sessions_updated,
@@ -2068,6 +2083,11 @@ def main():
         default=500,
         help="Entries per batch (default: 500)",
     )
+    sub.add_argument(
+        "--force",
+        action="store_true",
+        help="Force re-send all entries (skip incremental sync, re-populate raw_entries)",
+    )
     sub.set_defaults(func=cmd_push)

     args = parser.parse_args()
5 changes: 5 additions & 0 deletions src/agent_session_analytics/guide.md
@@ -35,10 +35,15 @@ export AGENT_SESSION_ANALYTICS_URL=https://server.tailnet.ts.net/mcp

# Push local session data (incremental - only sends new entries)
agent-session-analytics-cli push --days 365

# Force re-send all entries (re-populates raw_entries table)
agent-session-analytics-cli push --days 365 --force
```

The `push` command queries `get_sync_status()` first to determine what the server already has, then only uploads entries newer than the server's latest per session.

**Raw entry storage:** All uploaded entries are stored in both parsed form (events table) and raw form (raw_entries table). This allows re-parsing historical data when the parser improves.
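
Re-parsing from the raw table could look roughly like this. It is a sketch under assumptions rather than the project's migration code: `reparse_raw_entries` and the `parse_entry` callback are hypothetical names, and only the `raw_entries` columns from the schema are relied on.

```python
import json
import sqlite3


def reparse_raw_entries(conn, parse_entry):
    """Yield (session_id, parsed) for every stored raw entry.

    parse_entry is a hypothetical callback that takes the decoded JSON entry
    and returns a parsed dict, or None to skip the entry.
    """
    rows = conn.execute(
        "SELECT session_id, entry_json FROM raw_entries ORDER BY timestamp, id"
    )
    for session_id, entry_json in rows:
        try:
            entry = json.loads(entry_json)
        except json.JSONDecodeError:
            continue  # Leave undecodable rows untouched rather than failing the run.
        parsed = parse_entry(entry)
        if parsed is not None:
            yield session_id, parsed
```

Because the original JSONL text is kept verbatim, an improved parser can be swapped in as `parse_entry` and run over historical data without asking clients to push again.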

### Core Queries

| Tool | Purpose |