From d3783ae917825c8bcb04aba9943af8e57afe8f20 Mon Sep 17 00:00:00 2001
From: Siddhant Khare <Siddhantkhare2694@gmail.com>
Date: Sat, 23 May 2026 08:56:40 +0000
Subject: [PATCH] =?UTF-8?q?docs:=20overhaul=20README=20=E2=80=94=20landing?=
 =?UTF-8?q?=20page=20structure,=20move=20detail=20to=20docs/?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

README was 1911 lines with 93 sections. Rewrote as a landing page:
install, quick start, command table, production setup. Moved detailed
docs to docs/ directory (setup, commands, production, server,
integrations, vscode, security).

Co-authored-by: Ona <no-reply@ona.com>
---
 AGENTS.md            |   31 +-
 README.md            | 1899 ++----------------------------------------
 docs/commands.md     |  384 +++++++++
 docs/integrations.md |  136 +++
 docs/production.md   |  136 +++
 docs/security.md     |  180 ++++
 docs/server.md       |   79 ++
 docs/setup.md        |  167 ++++
 docs/vscode.md       |  105 +++
 9 files changed, 1301 insertions(+), 1816 deletions(-)
 create mode 100644 docs/commands.md
 create mode 100644 docs/integrations.md
 create mode 100644 docs/production.md
 create mode 100644 docs/security.md
 create mode 100644 docs/server.md
 create mode 100644 docs/setup.md
 create mode 100644 docs/vscode.md
diff --git a/AGENTS.md b/AGENTS.md
index 361571e..7252246 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -8,7 +8,7 @@ This file tells AI coding agents how to work with the agent-strace repository.
 src/agent_trace/    Core library — one module per feature
 tests/              One test file per module (test_<module>.py)
 ADRs/               Architecture Decision Records — read before adding dependencies
-docs/               Integration guides
+docs/               User-facing documentation (setup, commands, production, integrations, security)
 examples/           Usage examples for each integration
 pyproject.toml      Package config and optional extras
 ```
@@ -41,15 +41,36 @@ python -m pytest tests/test_watch.py -v
 3. Import and register it in `cli.py`
 4. Add new `EventType` values to `models.py` if needed
 5. Write tests in `tests/test_<feature>.py`
-6. Update README.md with the new command and an example
+6. Add the command to the appropriate table in `README.md` (one line, linked to docs/)
+7. Add full flag reference to `docs/commands.md`
+8. If the feature involves production export, update `docs/production.md`
+9. If the feature involves a new integration, update `docs/integrations.md`
 
 ## Version bumping
 
-- New feature (new CLI command, new integration, new flag): bump minor (`0.38.1` → `0.39.0`)
-- Bug fix or small improvement: bump patch (`0.38.1` → `0.38.2`)
+Current version: `0.51.0` in `src/agent_trace/__init__.py`.
+
+- New feature (new command, new flag, new integration): bump minor (`0.51.0` → `0.52.0`)
+- Bug fix or small improvement: bump patch (`0.51.0` → `0.51.1`)
 - Breaking change to CLI or storage format: bump major — check with maintainer first
 
-Version is in `src/agent_trace/__init__.py`.
+## docs/ structure
+
+The `docs/` directory contains user-facing documentation. Keep these files current when adding features:
+
+| File | Contents |
+|---|---|
+| `docs/setup.md` | Full setup for all 3 integration paths, complete JSON configs |
+| `docs/commands.md` | Full flag reference for every command |
+| `docs/production.md` | Per-backend OTLP setup (Datadog, Honeycomb, Grafana, etc.) |
+| `docs/server.md` | Server-side collector setup, Docker, API reference |
+| `docs/integrations.md` | Auto-instrumentation for each framework |
+| `docs/vscode.md` | VS Code extension docs (setup, commands, settings) |
+| `docs/security.md` | Secret redaction, PII anonymization, policy files |
+
+## README policy
+
+`README.md` is a landing page, not documentation. It must stay under 300 lines. Detailed content goes in `docs/`. Do not add detailed flag descriptions, long examples, or configuration dumps to `README.md`.
 
 ## ADRs to read before making architectural decisions
 
diff --git a/README.md b/README.md
index b1e134d..d70ea74 100644
--- a/README.md
+++ b/README.md
@@ -3,12 +3,12 @@
 [![Run in Ona](https://ona.com/run-in-ona.svg)](https://app.ona.com/#https://github.com/Siddhant-K-code/agent-trace)
 [![PyPI](https://img.shields.io/pypi/v/agent-strace)](https://pypi.org/project/agent-strace/)
 [![Python](https://img.shields.io/pypi/pyversions/agent-strace)](https://pypi.org/project/agent-strace/)
-[![License](https://img.shields.io/github/license/Siddhant-K-code/agent-trace)](LICENSE)
 [![CI](https://github.com/Siddhant-K-code/agent-trace/actions/workflows/test.yml/badge.svg)](https://github.com/Siddhant-K-code/agent-trace/actions/workflows/test.yml)
 [![Open VSX](https://img.shields.io/open-vsx/v/Siddhant-K-code/agent-strace)](https://open-vsx.org/extension/Siddhant-K-code/agent-strace)
-[![VS Marketplace](https://img.shields.io/badge/VS%20Marketplace-v0.1.2-blue?logo=visual-studio-code)](https://marketplace.visualstudio.com/items?itemName=Siddhant-K-code.agent-strace)
+[![VS Marketplace](https://img.shields.io/badge/VS%20Marketplace-v0.2.0-blue?logo=visual-studio-code)](https://marketplace.visualstudio.com/items?itemName=Siddhant-K-code.agent-strace)
+[![License](https://img.shields.io/github/license/Siddhant-K-code/agent-trace)](LICENSE)
 
-`strace` for AI agents. Capture and replay every tool call, prompt, and response from Claude Code, Cursor, Gemini CLI, or any MCP client. Analyse, diff, audit, and share what happened.
+`strace` for AI agents.
 
 ![demo](assets/demo.svg)
 
@@ -16,9 +16,7 @@
 
 A coding agent rewrites 20 files in a background session. You get a pull request. You do not get the story. Which files did it read first? Why did it call the same tool three times? What failed before it found the fix?
 
-Most tools trace LLM calls. That is one layer. The gap is everything around it: tool calls, file operations, decision points, error recovery, the actual commands the agent ran. `agent-strace` captures the full session and lets you replay it later. Export to Datadog, Honeycomb, New Relic, or Splunk for production observability.
-
-Set rules to stop the agent: cost ceiling, wrong file touched, too many tool calls. The agent stops. No prompt, no retry, no damage.
+Most tools trace LLM calls. That is one layer. The gap is everything around it: tool calls, file operations, decision points, error recovery, the actual commands the agent ran. `agent-strace` captures the full session and lets you replay it later. Export to Datadog, Honeycomb, New Relic, or Splunk for production observability. Set rules to stop the agent: cost ceiling, wrong file touched, too many tool calls. The agent stops. No prompt, no retry, no damage.
 
 ## Install
 
@@ -35,1877 +33,156 @@ uvx agent-strace replay
 
 **Zero dependencies.** Python 3.10+ standard library only.
 
-## VS Code / Cursor extension
-
-Install the **agent-strace** extension to see live session activity without leaving the editor.
-
-**Install:**
-- Search `agent-strace` in the Extensions panel (VS Code, Cursor, or any Open VSX-compatible editor)
-- Or install from [open-vsx.org/extension/Siddhant-K-code/agent-strace](https://open-vsx.org/extension/Siddhant-K-code/agent-strace)
-
-**What you get:**
-
-| Feature | Description |
-|---|---|
-| Status bar | Live cost, tool call count, and active tool name. Click to open the event stream. |
-| Gutter annotations | Blue border on files the agent read, amber on files it modified. Inline label shows read/write counts. |
-| Event stream panel | Live feed in the Explorer sidebar: every tool call, file op, LLM request, and error. |
-| Pause button | Stops the agent mid-session via SIGSTOP. Requires `agent-strace watch` running in a terminal. |
-
-**Setup:**
-
-```bash
-# 1. Install agent-strace
-pip install agent-strace
-
-# 2. Add hooks to Claude Code (one-time)
-agent-strace setup
-
-# 3. Open your project in VS Code / Cursor
-# The extension activates automatically when .agent-traces/ exists
-
-# 4. Start Claude Code — the status bar item appears immediately
-```
-
-The extension activates automatically when a `.agent-traces/` directory exists in the workspace root. No configuration required.
-
-**Pause / resume** (optional, requires watch running):
-
-```bash
-# In a separate terminal, start the watcher
-agent-strace watch
-
-# Then use the Pause button in the event stream panel,
-# or run: agent-trace: Pause Agent from the command palette
-```
-
 ## Quick start
 
-### Option 1: Claude Code hooks (full session capture)
-
-Captures everything: user prompts, assistant responses, and every tool call (Bash, Edit, Write, Read, Agent, Grep, Glob, WebFetch, WebSearch, all MCP tools).
-
-```bash
-agent-strace setup        # prints hooks config JSON
-agent-strace setup --global  # for all projects
-```
-
-Add the output to `.claude/settings.json`. Or paste it manually:
-
-```json
-{
-  "hooks": {
-    "UserPromptSubmit": [{ "hooks": [{ "type": "command", "command": "agent-strace hook user-prompt" }] }],
-    "PreToolUse": [{ "matcher": "", "hooks": [{ "type": "command", "command": "agent-strace hook pre-tool" }] }],
-    "PostToolUse": [{ "matcher": "", "hooks": [{ "type": "command", "command": "agent-strace hook post-tool" }] }],
-    "PostToolUseFailure": [{ "matcher": "", "hooks": [{ "type": "command", "command": "agent-strace hook post-tool-failure" }] }],
-    "Stop": [{ "hooks": [{ "type": "command", "command": "agent-strace hook stop" }] }],
-    "SessionStart": [{ "hooks": [{ "type": "command", "command": "agent-strace hook session-start" }] }],
-    "SessionEnd": [{ "hooks": [{ "type": "command", "command": "agent-strace hook session-end" }] }]
-  }
-}
-```
-
-Then use Claude Code normally.
+**Option 1: Claude Code hooks** — captures everything (prompts, responses, every tool call)
 
 ```bash
-agent-strace list     # list sessions
-agent-strace replay   # replay the latest
-agent-strace explain  # plain-English summary of what the agent did
-agent-strace stats    # tool call frequency and timing
+agent-strace setup   # prints hooks config — add to .claude/settings.json
+agent-strace list    # list sessions
+agent-strace replay  # replay the latest
 ```
 
-### Option 2: MCP proxy (any MCP client)
+Full config and JSON: [docs/setup.md](docs/setup.md)
 
-Wraps any MCP server. Works with Cursor, Windsurf, or any MCP client.
+**Option 2: MCP proxy** — wraps any MCP server, works with Cursor, Windsurf, or any MCP client
 
 ```bash
 agent-strace record -- npx -y @modelcontextprotocol/server-filesystem /tmp
 agent-strace replay
 ```
 
-### Option 3: Python decorator
-
-Wraps your tool functions directly. No MCP required.
+**Option 3: Python decorator** — no MCP required
 
 ```python
-from agent_trace import trace_tool, trace_llm_call, start_session, end_session, log_decision
+from agent_trace import trace_tool, start_session, end_session
 
-start_session(name="my-agent")  # add redact=True to strip secrets
+start_session(name="my-agent")
 
 @trace_tool
 def search_codebase(query: str) -> str:
     return search(query)
 
-@trace_llm_call
-def call_llm(messages: list, model: str = "claude-4") -> str:
-    return client.chat(messages=messages, model=model)
-
-# Log decision points explicitly
-log_decision(
-    choice="read_file_first",
-    reason="Need to understand current implementation before making changes",
-    alternatives=["read_file_first", "search_codebase", "write_fix_directly"],
-)
+end_session()
+```
 
-search_codebase("authenticate")
-call_llm([{"role": "user", "content": "Fix the bug"}])
+Full setup guide: [docs/setup.md](docs/setup.md)
 
-meta = end_session()
-print(f"Replay with: agent-strace replay {meta.session_id}")
-```
+## What you can do
 
-## CLI commands
+### Understand a session
 
 | Command | What it does |
 |---|---|
-| `record` | Capture an MCP stdio session |
-| `record-http` | Capture an MCP HTTP/SSE session |
-| `replay` | Replay a session in the terminal or as HTML |
-| `inspect` | Show raw events for a session |
-| `stats` | Summary stats for a session |
-| `eval` | Score a session against configurable criteria |
-| `eval ci` | CI gate, exits non-zero if any scorer fails |
-| `eval compare` | Compare two sessions side by side |
-| `drift` | Detect behavioral drift across sessions |
-| `optimize` | Propose AGENTS.md improvements from trace failures |
-| `dashboard` | Aggregate view across sessions |
-| `dashboard --trend` | Eval quality and behavioral metrics over time (HTML) |
-| `export` | Export a session (JSON, CSV, OTLP, Langfuse) |
-| `diff` | Semantic diff between two sessions |
-| `why` | Causal chain for a tool call |
-| `explain` | Plain-English session summary |
-| `cost` | Estimate session cost |
-| `standup` | Structured standup report from a session |
-| `oncall` | On-call readiness report for agent-modified files |
-| `freshness` | Context freshness check vs last session |
-| `watch` | Live session monitor with kill-switch rules |
-| `annotate` | Add annotations to session events |
-| `audit-tools` | Shadow AI / MCP detection |
-| `inflation` | Token inflation across model versions |
-| `curve` | Personal cost-efficiency curve |
-| `a2a-tree` | Cross-agent trace correlation (A2A protocol) |
-| `mcp` | MCP server: expose traces as queryable tools for a debugging agent |
-| `timeline` | Structured phase-by-phase view of a session with costs, retries, and wasted spend |
-| `config-watch` | Snapshot and diff AGENTS.md and other config files; find affected sessions |
-
-```
-agent-strace setup [--redact] [--global]        Generate Claude Code hooks config
-agent-strace hook <event>                       Handle a Claude Code hook event (internal)
-agent-strace record -- <command>                Record an MCP stdio server session
-agent-strace record-http <url> [--port N]       Record an MCP HTTP/SSE server session
-agent-strace replay [session-id]                Replay a session (default: latest)
-agent-strace replay [session-id] --limit N      Cap output at N events (fast inspection of large sessions)
-agent-strace replay --format html [-o file]     Export a self-contained HTML replay viewer
-agent-strace replay --expand-subagents          Inline subagent sessions under parent tool_call
-agent-strace replay --tree                      Show session hierarchy without full replay
-agent-strace list                               List all sessions
-agent-strace explain [session-id]               Explain a session in plain English
-agent-strace stats [session-id]                 Show tool call frequency and timing
-agent-strace stats --include-subagents          Roll up stats across the full subagent tree
-agent-strace inspect <session-id>               Dump full session as JSON
-agent-strace export <session-id>                Export as JSON, CSV, NDJSON, or OTLP
-agent-strace import <session.jsonl>             Import a Claude Code JSONL session log
-agent-strace cost [session-id]                  Estimate token cost for a session
-agent-strace diff <session-a> <session-b>       Compare two sessions structurally
-agent-strace diff --compare <a> <b>             Side-by-side table with verdict
-agent-strace diff --semantic <a> <b>            Compare sessions by outcome, not event order
-agent-strace why [session-id] <event-number>    Trace the causal chain for an event
-agent-strace audit [session-id] [--policy]      Check tool calls against a policy file
-agent-strace audit-tools [--repo .] [--approved] Detect Shadow MCP servers and undeclared agent activity in any repo
-agent-strace policy [--output file]             Generate .agent-scope.json from observed traces
-agent-strace dashboard [--last N] [--html file] Aggregate stats and trends across sessions
-agent-strace annotate <session-id> <offset>     Add notes, labels, or bookmarks to events
-agent-strace token-budget <session-id>          Check token usage against model context limit
-agent-strace replay [session-id] [--limit N]    Replay a session (--limit caps events shown)
-agent-strace retention status                   Show session count, size, and what policy would delete
-agent-strace retention clean [--dry-run]        Delete sessions that exceed retention limits
-agent-strace sample --strategy worst --n 20     Export worst/diverse/random/recent sessions as JSONL
-agent-strace export <session> --format otlp-genai  Export with OTel GenAI semantic conventions
-agent-strace server [--port 4317] [--storage DIR]  Start a server-side event collector
-agent-strace auto [--framework NAME] -- <cmd>      Run a command with auto-instrumentation
-agent-strace watch [--timeout DURATION] [--budget $] [--on-death CMD] [--rules file]
-                                                Watch a live session; kill/pause on rule breach
-agent-strace lint [session-id] [--strict] [--all] [--since DURATION]
-                                                Analyse a session for bad behaviour patterns
-agent-strace compare <session-id-a> <session-id-b> [--tag TAG] [--format json]
-                                                Session-to-session regression report
-agent-strace budget-report [--since DATE] [--format text|markdown|json]
-                                                Weekly spend digest across sessions
-agent-strace share <session-id> [-o file]       Export a self-contained HTML report
-agent-strace standup [--session id]             Standup report from session trace (no LLM)
-agent-strace freshness [--scope glob]           Context freshness check vs last session
-agent-strace oncall --rotation-start DATE       On-call readiness for agent-modified files
-agent-strace curve [--export csv]               Personal agent cost-efficiency curve
-agent-strace inflation [--compare m1,m2]        Token inflation calculator across model versions
-agent-strace a2a-tree [session-id]              Visualise A2A agent call graph
-agent-strace timeline [session-id] [--format text|json] [--model MODEL]
-                                                Structured phase-by-phase session view with costs and retries
-agent-strace config-watch snapshot [--label TEXT] [--watch PATH]
-                                                Snapshot current config file state
-agent-strace config-watch check [--format text|json] [--watch PATH]
-                                                Diff current state vs last snapshot (exit 1 if changed)
-agent-strace config-watch history [--format text|json]
-                                                List all snapshots
-agent-strace config-watch affected [--since DURATION] [--format text|json]
-                                                Sessions that ran after a config change
-```
-
-### Import existing Claude Code sessions
-
-Already ran a session without hooks? Import it directly from Claude Code's native JSONL logs:
-
-```bash
-# Discover available sessions
-agent-strace import --discover
-
-# Import a specific session
-agent-strace import ~/.claude/projects/<project>/<session-id>.jsonl
-
-# Then use it like any captured session
-agent-strace replay <session-id>
-agent-strace explain <session-id>
-agent-strace stats <session-id>
-```
-
-Claude Code stores session logs in `~/.claude/projects/`. The import captures tool calls, token usage, subagent invocations, and session metadata.
-
-### Explain a session
-
-Plain-English breakdown of what the agent did, organized by phase, with retry and wasted-time detection:
-
-```bash
-agent-strace explain           # latest session
-agent-strace explain abc123    # specific session
-```
-
-```
-Session: abc123 (2m 05s, 47 events)
-
-Phase 1: fix the auth module (0:00–0:05, 5 events)
-  Read: AGENTS.md, src/auth.py
-
-Phase 2: run tests — FAILED (0:05–1:20, 12 events)
-  Ran: python -m pytest
-  Ran: python -m pytest  ← retry
-
-Phase 3: run tests (1:20–2:05, 8 events)
-  Ran: uv run pytest
-
-Files touched: 3 read, 0 written
-Retries: 1 (wasted 1m 15s, 60% of session)
-```
-
-### Estimate cost
-
-Token usage and dollar cost by phase. Flags wasted spend on failed phases.
-
-```bash
-agent-strace cost                          # latest session, sonnet pricing
-agent-strace cost abc123 --model opus      # specific session and model
-agent-strace cost abc123 --input-price 3.0 --output-price 15.0  # custom pricing
-```
-
-```
-Session: abc123 — Estimated cost: $0.0042
-Model: sonnet  |  8,200 input tokens, 3,100 output tokens
-
-  Phase 1: fix the auth module          $0.0008  (19%)  ...
-  Phase 2: run tests — FAILED           $0.0021  (50%)  ...  ← wasted
-  Phase 3: run tests                    $0.0013  (31%)  ...
-
-Wasted on failed phases: $0.0021 (50%)
-```
-
-Supported models: `sonnet` (default), `opus`, `haiku`, `gpt4`, `gpt4o`. Token counts are estimated from payload size (`len / 4`); see [ADR-0008](ADRs/0008-token-cost-estimation-heuristic.md) for details.
-
-See [examples/session_analysis.md](examples/session_analysis.md) for a full walkthrough combining `import`, `explain`, and `cost`.
-
-### Session regression testing (compare)
-
-Compare two sessions structurally and get a verdict on whether agent behaviour improved or regressed. Useful when changing models, prompts, or tool implementations.
-
-```bash
-# Compare two existing sessions
-agent-strace compare <session-id-a> <session-id-b>
-
-# Compare the last 2 sessions with a given task tag
-agent-strace compare --tag refactor-auth
-
-# Machine-readable output
-agent-strace compare <session-id-a> <session-id-b> --format json
-```
-
-Example output:
-
-```
-Session Comparison
-─────────────────────────────────────────────────────────────────
-                                 a84664242afa    bf1207728ee6    change
-─────────────────────────────────────────────────────────────────
-  Duration                              18m 00s         12m 00s     -33%
-  Total cost                            $4.2300         $2.8700     -32%
-  Tool calls                                 14              11     -21%
-  Files modified                              2               2    (same)
-  Errors                                      0               0
-─────────────────────────────────────────────────────────────────
-Verdict: bf1207728ee6 was 32% cheaper, 33% faster
-Decision divergence:  2 point(s)
-```
-
-`decision divergence` is the edit distance between the two sessions' decision event sequences — no LLM call required. `--tag` compares the last N sessions whose `agent_name` or `command` contains the tag string.
-
-### Weekly spend digest (budget-report)
-
-Aggregate cost across sessions for a configurable time window. Shows total spend, top sessions, cost by tool, and savings from watchdog budget ceilings.
-
-```bash
-# Last 7 days (default)
-agent-strace budget-report
-
-# Custom window
-agent-strace budget-report --since 2026-05-01 --until 2026-05-23
-
-# Markdown output (paste into Slack or email)
-agent-strace budget-report --format markdown
-
-# Machine-readable JSON
-agent-strace budget-report --format json
-```
-
-Example output:
-
-```
-Budget Report — May 16 to May 23, 2026
-
-Total spend:        $47.23  (↑ 12% vs prior period)
-Sessions:           34      (↑ 3 vs prior period)
-Avg cost/session:   $1.39
-
-Top 5 most expensive sessions:
-  1. a84664242afa  $8.43  refactor-auth                   2026-05-21
-  2. bf1207728ee6  $6.21  add-test-coverage               2026-05-22
-  3. c91ab3312fde  $4.87  fix-login-bug  ⚠ watchdog       2026-05-20
-
-Cost by tool (estimated):
-  Bash                  $18.43  (39%)
-  Read                  $12.11  (26%)
-  Write                  $9.87  (21%)
-
-Sessions terminated by watchdog:  3  ($14.21 saved by budget ceiling)
-```
-
-Week-over-week delta is shown when prior-period data exists. The `--format markdown` output is designed to paste directly into Slack without editing.
-
-### Static behaviour analysis (lint)
-
-Analyse a session for known bad patterns — tool loops, reasoning spirals, budget proximity, context saturation, redundant reads, error-retry loops, and sessions that produced no output.
-
-```bash
-# Lint the latest session
-agent-strace lint
-
-# Lint a specific session
-agent-strace lint <session-id>
-
-# Lint all sessions from the last 7 days
-agent-strace lint --all --since 7d
-
-# Machine-readable output for CI
-agent-strace lint <session-id> --format json
-
-# Exit code 1 on any WARN or ERROR (CI gate)
-agent-strace lint <session-id> --strict
-```
-
-Example output:
-
-```
-WARN   tool-loop              "Bash" called 7 times consecutively (events 34–41). Possible loop.
-WARN   reasoning-spiral       4 consecutive LLM calls with no tool call (events 12–15). Agent may be over-reasoning.
-ERROR  budget-proximity       Session reached 94% of a $5.00 budget ceiling. Consider raising or splitting the task.
-INFO   context-saturation     Input tokens exceeded 80% of model context window at event 28.
-INFO   redundant-read         "README.md" read 3 times in this session. Consider caching.
-
-2 error(s), 2 warning(s), 2 info(s). Use --strict for non-zero exit on warnings.
-```
-
-Rules are configurable via `.agent-strace-lint.json`:
-
-```json
-{
-  "tool-loop": { "threshold": 7 },
-  "reasoning-spiral": { "enabled": false }
-}
-```
-
-| Rule | Level | Trigger |
-|---|---|---|
-| `tool-loop` | WARN | Same tool called 5+ times consecutively |
-| `reasoning-spiral` | WARN | 3+ consecutive LLM calls with no tool call |
-| `budget-proximity` | ERROR | Session cost exceeded 90% of watchdog budget ceiling |
-| `context-saturation` | INFO | Input tokens exceeded 80% of model context window |
-| `redundant-read` | INFO | Same file read 3+ times in a session |
-| `error-retry-loop` | WARN | Same tool errored and was retried 3+ times |
-| `no-output` | WARN | Session completed with no write or file-modifying tool calls |
-
-### Session timeline
-
-A structured, phase-by-phase view of what happened in a session — tool calls, file operations, LLM requests, errors, retries, and a wasted-spend callout for failed phases.
-
-```bash
-# Timeline for the latest session
-agent-strace timeline
-
-# Timeline for a specific session
-agent-strace timeline <session-id>
-
-# Machine-readable output
-agent-strace timeline <session-id> --format json
-
-# Use a different model for cost estimates
-agent-strace timeline <session-id> --model opus
-```
-
-Example output:
+| [`agent-strace replay <id>`](docs/commands.md#replay) | Replay a session in the terminal or as HTML |
+| [`agent-strace explain <id>`](docs/commands.md#explain) | Plain-English phase summary, no LLM required |
+| [`agent-strace timeline <id>`](docs/commands.md#timeline) | Phase-by-phase view with costs and retries |
+| [`agent-strace why <id> <event>`](docs/commands.md#why) | Causal chain for a specific decision |
+| [`agent-strace diff <id-a> <id-b>`](docs/commands.md#diff) | Structural or semantic session comparison |
+| [`agent-strace compare <id-a> <id-b>`](docs/commands.md#compare) | Regression report with verdict |
 
-```
-Session: abc123def456 | 2026-05-19 14:32 | 4m 12s | $0.0043 | 3 errors
-
-Phase 1: Setup (0:00–0:45)  $0.0008
-  ✓ Read src/auth/handler.go
-  ✓ Read src/auth/middleware.go
-
-Phase 2: Implementation (0:45–2:10)  $0.0031
-  · Run Bash (1.2s)
-      pytest tests/test_auth.py
-  ✗ Error: Bash
-      FAILED tests/test_auth.py::TestAuthHandler
-  · Run Bash (attempt 2) (1.1s)
-      pytest tests/test_auth.py
-  ✗ Error: Bash
-      FAILED tests/test_auth.py::TestAuthHandler
-  ✓ Write src/auth/handler.go  +3 lines
-  ✓ Run Bash (0.9s)
-
-  ⚠ 2 retries in this phase
-
-⚠ Wasted spend: 2 retries on failed phases = ~$0.0008 (19% of session cost)
-```
-
-| Flag | Default | Description |
-|---|---|---|
-| `--format` | `text` | `text` or `json` |
-| `--model` | `sonnet` | Pricing model for cost estimates: `sonnet`, `opus`, `haiku`, `gpt4`, `gpt4o` |
-
-### Config change detector
-
-Track changes to AGENTS.md and other agent configuration files. Snapshot the current state before a change, then check what drifted and which sessions ran after it.
-
-```bash
-# Record a snapshot of all watched config files
-agent-strace config-watch snapshot
-
-# Add a label to identify the snapshot
-agent-strace config-watch snapshot --label "before-prompt-refactor"
-
-# Check whether anything changed since the last snapshot (exit 1 if yes)
-agent-strace config-watch check
-
-# Machine-readable diff
-agent-strace config-watch check --format json
-
-# List all snapshots
-agent-strace config-watch history
-
-# Find sessions that ran after a config change
-agent-strace config-watch affected
-
-# Limit to sessions from the last 7 days
-agent-strace config-watch affected --since 7d
-```
-
-Example output:
-
-```
-$ agent-strace config-watch check
-CHANGED  AGENTS.md          (sha256: a1b2c3d4 → e5f6a7b8)
-ADDED    .claude/settings.json
-No changes to: CLAUDE.md, system_prompt.md
-
-$ agent-strace config-watch affected
-2 session(s) ran after a config change:
-
-  abc123def456  2026-05-20T14:32:01  (after change to: AGENTS.md)
-  789xyz012abc  2026-05-20T15:10:44  (after change to: AGENTS.md)
-
-Run `agent-strace drift` to compare behaviour before and after the change.
-```
-
-Watched files by default: `AGENTS.md`, `CLAUDE.md`, `system_prompt.md`, `system_prompt.txt`, `.cursorrules`, `.github/copilot-instructions.md`. Add extra paths with `--watch`:
-
-```bash
-agent-strace config-watch snapshot --watch .claude/settings.json --watch custom_prompt.txt
-```
-
-Snapshots are stored in `.agent-traces/config-snapshots.json`. Use `check` as a CI gate — it exits 1 when config has changed since the last snapshot.
-
-### Data retention
-
-Enforce configurable retention policies to automatically delete old session data — required for GDPR, SOC 2, and internal data policies.
+### Control and protect
 
-```bash
-# Check current status and what policy would delete
-agent-strace retention status
-
-# Preview what would be deleted (no changes made)
-agent-strace retention clean --dry-run
-
-# Delete sessions older than 30 days
-agent-strace retention clean --max-age-days 30
-
-# Keep only the 1000 most recent sessions
-agent-strace retention clean --max-sessions 1000
-
-# Delete oldest sessions when storage exceeds 500 MB
-agent-strace retention clean --max-size-mb 500
-```
-
-Configure via `.agent-strace.yaml`:
-
-```yaml
-retention:
-  max_age_days: 30
-  max_sessions: 1000
-  max_size_mb: 500
-  on_delete: log    # log deletions to .agent-traces/retention.log
-```
-
-Policies are applied in order: age → count → size. Deletions are logged with session ID and timestamp (not content).
-
-### Trace anonymization
-
-Strip identifying information from traces at export time — original session data is never modified. Complements secret redaction (which strips secrets at capture time).
-
-```bash
-# Preview what would be anonymized
-agent-strace export SESSION_ID --anonymize --dry-run
-
-# Export with anonymization applied
-agent-strace export SESSION_ID --anonymize --output trace-anon.json
-```
-
-Anonymized by default:
-- Home directory paths → `~/relative/path`
-- Hostnames → `<hostname>`
-- OS usernames → `<user>`
-- Email addresses → `<email>`
-
-Add custom patterns via `.agent-strace/anonymize.yaml`:
+| Command | What it does |
+|---|---|
+| [`agent-strace watch`](docs/commands.md#watch) | Live monitor with kill-switch rules |
+| [`agent-strace watch --timeout 30m --budget $5`](docs/commands.md#watch) | Watchdog mode — kills on limit, writes post-mortem |
+| [`agent-strace audit <id>`](docs/commands.md#audit) | Audit tool calls against a policy file |
+| [`agent-strace record --redact`](docs/commands.md#record) | Strip secrets from traces before storage |
+| [`agent-strace export --anonymize`](docs/commands.md#export) | Remove PII at export time |
 
-```yaml
-rules:
-  - pattern: "ACME Corp"
-    replacement: "<company>"
-  - pattern: "192\.168\.\d+\.\d+"
-    replacement: "<internal-ip>"
-```
+### Analyse across sessions
 
-### Secret redaction
+| Command | What it does |
+|---|---|
+| [`agent-strace dashboard`](docs/commands.md#dashboard) | Multi-session overview |
+| [`agent-strace budget-report`](docs/commands.md#budget-report) | Weekly spend digest |
+| [`agent-strace lint <id>`](docs/commands.md#lint) | Flag bad behaviour patterns (loops, spirals, waste) |
+| [`agent-strace drift`](docs/commands.md#drift) | Detect behavioural drift over time |
+| [`agent-strace standup`](docs/commands.md#standup) | Plain-English summary of yesterday's sessions |
+| [`agent-strace eval <id>`](docs/commands.md#eval) | Score a session against behavioural baselines |
+| [`agent-strace eval ci`](docs/commands.md#eval) | Fail CI on behavioural regression |
 
-Strip API keys, tokens, and credentials from traces before they hit disk.
+### Export and integrate
 
-```bash
-# Stdio proxy with redaction
-agent-strace record --redact -- npx -y @modelcontextprotocol/server-filesystem /tmp
+| Command | What it does |
+|---|---|
+| [`agent-strace export --format otlp-genai`](docs/production.md) | Export to Datadog, Honeycomb, Grafana, Jaeger |
+| [`agent-strace server`](docs/server.md) | Server-side collector for multi-agent, multi-machine |
+| [`agent-strace share <id>`](docs/commands.md#share) | Generate a shareable HTML replay |
+| [`agent-strace sample`](docs/commands.md#sample) | Export worst sessions as JSONL for eval datasets |
 
-# HTTP proxy with redaction
-agent-strace record-http https://mcp.example.com --redact
-```
+Full flag reference: [docs/commands.md](docs/commands.md)
 
-Detected patterns: OpenAI (`sk-*`), GitHub (`ghp_*`, `github_pat_*`), AWS (`AKIA*`), Anthropic (`sk-ant-*`), Slack (`xox*`), JWTs, Bearer tokens, connection strings (`postgres://`, `mysql://`), and any value under keys like `password`, `secret`, `token`, `api_key`, `authorization`.
+## VS Code extension
 
-### HTTP/SSE proxy
+Install **agent-strace** from the [Extensions panel](https://open-vsx.org/extension/Siddhant-K-code/agent-strace) to see live session activity without leaving the editor.
 
-For MCP servers that use HTTP transport:
+| Feature | Description |
+|---|---|
+| Status bar | Live cost, tool call count, and active tool name. Click to open the event stream. |
+| Gutter annotations | Blue border on files the agent read, amber on files it modified. |
+| Event stream panel | Live feed: every tool call, file op, LLM request, and error. |
+| Pause button | Pauses the agent mid-session via SIGSTOP. |
 
 ```bash
-# Proxy a remote MCP server
-agent-strace record-http https://mcp.example.com --port 3100
-
-# Your agent connects to http://127.0.0.1:3100 instead of the remote server
-# All JSON-RPC messages are captured, tool call latency is measured
-```
-
-The proxy forwards POST `/message` and GET `/sse` to the remote server, capturing every JSON-RPC message in both directions.
-
-### Replay output
-
-A real Claude Code session captured with hooks:
-
-<details><summary>Session Summary</summary>
-<p>
-
-```
-Session Summary
-──────────────────────────────────────────────────
-  Session:    201da364-edd6-49
-  Command:    claude-code (startup)
-  Agent:      claude-code
-  Duration:   112.54s
-  Tool calls: 8
-  Errors:     3
-──────────────────────────────────────────────────
-
-+  0.00s ▶ session_start
-+  0.07s 👤 user_prompt
-              "how many tests does this project have? run them and tell me the results"
-+  3.55s → tool_call Glob
-              **/*.test.*
-+  3.55s → tool_call Glob
-              **/test_*.*
-+  3.60s ← tool_result Glob (51ms)
-+  6.06s → tool_call Bash
-              $ python -m pytest tests/ -v 2>&1
-+ 27.65s ✗ error Bash
-              Command failed with exit code 1
-+ 29.89s → tool_call Bash
-              $ python3 -m pytest tests/ -v 2>&1
-+ 40.56s ✗ error Bash
-              No module named pytest
-+ 45.96s → tool_call Bash
-              $ which pytest || ls /Users/siddhant/Desktop/test-agent-trace/ 2>&1
-+ 46.01s ← tool_result Bash (51ms)
-+ 48.18s → tool_call Read
-              /Users/siddhant/Desktop/test-agent-trace/pyproject.toml
-+ 48.23s ← tool_result Read (43ms)
-+ 51.43s → tool_call Bash
-              $ uv run --with pytest pytest tests/ -v 2>&1
-+1m43.67s ← tool_result Bash (5.88s)
-              75 tests, all passing in 3.60s
-+1m52.54s 🤖 assistant_response
-              "75 tests, all passing in 3.60s. Breakdown by file: ..."
+pip install agent-strace   # 1. install
+agent-strace setup         # 2. add hooks to Claude Code
+# 3. open project in VS Code — extension activates when .agent-traces/ exists
+# 4. start Claude Code — status bar appears immediately
 ```
 
-Tool calls show actual values: commands, file paths, glob patterns. Errors show what failed. Assistant responses are stripped of markdown.
-
-</p>
-</details> 
+Full docs: [docs/vscode.md](docs/vscode.md)
 
-### Filtering
+## Production
 
-```bash
-# Show only tool calls and errors
-agent-strace replay --filter tool_call,error
-
-# Replay with timing (watch it unfold)
-agent-strace replay --live --speed 2
-```
-
-### Export
+**OTLP export** — sessions become traces, tool calls become spans:
 
 ```bash
-# JSON array
-agent-strace export a84664 --format json
-
-# CSV (for spreadsheets)
-agent-strace export a84664 --format csv
-
-# NDJSON (for streaming pipelines)
-agent-strace export a84664 --format ndjson
-```
-
-## Trace format
-
-Traces are stored as directories in `.agent-traces/`:
-
-```
-.agent-traces/
-  a84664242afa4516/
-    meta.json        # session metadata
-    events.ndjson    # newline-delimited JSON events
-```
-
-Each event is a single JSON line:
-
-```json
-{
-  "event_type": "tool_call",
-  "timestamp": 1773562735.09,
-  "event_id": "bf1207728ee6",
-  "session_id": "a84664242afa4516",
-  "data": {
-    "tool_name": "read_file",
-    "arguments": {"path": "src/auth.py"}
-  }
-}
+agent-strace export <session-id> --format otlp-genai \
+  --endpoint http://localhost:4318
 ```
 
-### Event types
+Per-backend setup (Datadog, Honeycomb, Grafana, New Relic, Splunk, Langfuse): [docs/production.md](docs/production.md)
 
-| Type | Description |
-|------|-------------|
-| `session_start` | Trace session began |
-| `session_end` | Trace session ended |
-| `user_prompt` | User submitted a prompt to the agent |
-| `assistant_response` | Agent produced a text response |
-| `tool_call` | Agent invoked a tool |
-| `tool_result` | Tool returned a result |
-| `llm_request` | Agent sent a prompt to an LLM |
-| `llm_response` | LLM returned a completion |
-| `file_read` | Agent read a file |
-| `file_write` | Agent wrote a file |
-| `decision` | Agent chose between alternatives |
-| `error` | Something failed |
-
-Events link to each other. A `tool_result` has a `parent_id` pointing to its `tool_call`. This lets you measure latency per tool and trace the full call chain.
-
-## Use with Claude Code, Cursor, Windsurf
-
-### Claude Code (hooks, recommended)
-
-Captures the full session: prompts, responses, and every tool call. See [examples/claude_code_config.md](examples/claude_code_config.md) for the full config.
+**Server-side collector** — for containers, CI, and multi-machine setups:
 
 ```bash
-agent-strace setup                    # per-project config
-agent-strace setup --redact --global  # all projects, with secret redaction
-```
-
-### Cursor
-
-Edit `~/.cursor/mcp.json` (global) or `.cursor/mcp.json` (per-project):
-
-```json
-{
-  "mcpServers": {
-    "filesystem": {
-      "command": "agent-strace",
-      "args": ["record", "--name", "filesystem", "--", "npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
-    }
-  }
-}
-```
-
-### Windsurf
-
-Edit `~/.codeium/windsurf/mcp_config.json`:
-
-```json
-{
-  "mcpServers": {
-    "filesystem": {
-      "command": "agent-strace",
-      "args": ["record", "--name", "filesystem", "--", "npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
-    }
-  }
-}
+agent-strace server --port 4317 --storage ./traces
+AGENT_STRACE_ENDPOINT=http://collector:4317 python my_agent.py
 ```
 
-### Any MCP client
-
-The pattern is the same for any tool that uses MCP over stdio:
-
-1. Replace the server `command` with `agent-strace`
-2. Prepend `record --name <label> --` to the original args
-3. Use the tool normally
-4. Run `agent-strace replay` to see what happened
+Full guide: [docs/server.md](docs/server.md)
 
-See the [examples/](examples/) directory for full config files.
+**Auto-instrumentation** — no code changes required:
 
-### Subagent tracing
-
-When an agent spawns subagents (e.g. Claude Code's Agent tool), sessions are linked into a parent-child tree. Replay the full tree inline or view a compact hierarchy:
-
-```bash
-# Inline replay: subagent events appear under the parent tool_call that spawned them
-agent-strace replay --expand-subagents
-
-# Compact hierarchy: session IDs, durations, tool counts
-agent-strace replay --tree
-
-# Aggregated stats across the full tree (tokens, tool calls, errors)
-agent-strace stats --include-subagents
-```
-
-```
-▶ session_start  a84664242afa  agent=claude-code  depth=0
-  + 0.00s  👤 "refactor the auth module"
-  + 1.23s  → tool_call  Agent  "extract helper functions"
-│  ▶ session_start  b12345678901  agent=claude-code  depth=1
-│    + 0.00s  → tool_call  Read  src/auth.py
-│    + 0.12s  ← tool_result
-│    + 0.45s  → tool_call  Write  src/auth_helpers.py
-│    + 0.51s  ■ session_end
-  + 3.10s  ← tool_result
-  + 3.20s  ■ session_end
+```python
+from agent_trace.integrations import instrument_langchain
+instrument_langchain()
 ```
 
-Subagent sessions are linked via `parent_session_id` and `parent_event_id` in session metadata. Existing sessions without these fields are unaffected.
-
-### Session diff
-
-Compare two sessions structurally. Useful for understanding why the same prompt produces different results across runs, or comparing a broken session against a known-good one. Phases are aligned by label using LCS, then per-phase differences in files touched, commands run, and outcomes are reported:
-
-```bash
-agent-strace diff abc123 def456
-```
+Supported: OpenAI Agents SDK, LangChain, LiteLLM, Anthropic SDK, OpenAI SDK, AWS Strands. Guide: [docs/integrations.md](docs/integrations.md)
 
-```
-Comparing: abc123 vs def456
+## How it works
 
-Diverged at phase 2:
+**Claude Code hooks** — Claude Code fires hook events at every stage of its agentic loop. agent-strace registers as a handler, reads JSON from stdin, and writes trace events. Each hook runs as a separate process; session state in `.agent-traces/.active-session` correlates PreToolUse and PostToolUse for latency measurement.
 
-  Phase 2: run tests
-    abc123 only:  $ python -m pytest
-    def456 only:  $ uv run pytest
+**MCP stdio proxy** — sits between the agent and the MCP server, reads JSON-RPC messages (Content-Length framed or newline-delimited), classifies each one, and writes a trace event. Messages are forwarded unchanged. The agent and server do not know the proxy exists.
 
-  abc123: 4m 12s, 47 events, 8 tools, 2 retries
-  def456: 2m 05s, 31 events, 5 tools, 0 retries
-```
+**MCP HTTP/SSE proxy** — same idea, different transport. Listens on a local port, forwards POST and SSE requests to the remote server, captures every JSON-RPC message in both directions.
 
-### Causal chain (why)
+**Python decorator** — `@trace_tool` logs a `tool_call` event before execution and a `tool_result` after. Errors and timing are captured automatically. `@trace_llm_call` does the same for LLM calls.
 
-Trace backwards from any event to find what caused it. Run `agent-strace replay <session-id>` first. The `#N` numbers in the left column are the event numbers:
+## Running tests
 
 ```bash
-agent-strace why abc123 4
+python -m pytest tests/ -v
 ```
 
-```
-Why did event #4 happen?
+## License
 
-  #  4  tool_call: Bash  $ pytest tests/
+MIT. Use it however you want.
 
-Causal chain (root → target):
+---
 
-    #  1  user_prompt: "run the test suite"
-       (prompt at #1 triggered this)
-  ←  #  3  error: exit 1
-       (retry after error at #3)
-  ←  #  4  tool_call: Bash  $ pytest tests/
-```
-
-Causal links are detected via `parent_id` (tool_result → tool_call), error→retry matching (same tool and command), path references (tool_result text containing a path used by a later call), and read→write pairs on the same file.
-
-### Permission audit
-
-Check every tool call against a policy file. Flags sensitive file access (`.env`, `*.pem`, `.ssh/*`, `.github/workflows/*`, etc.) even without a policy:
-
-```bash
-agent-strace audit                          # latest session, no policy required
-agent-strace audit abc123 --policy .agent-scope.json
-
-# In CI: fail the build if the agent accessed anything outside policy
-agent-strace audit --policy .agent-scope.json || exit 1
-```
-
-```
-AUDIT: Session abc123 (47 events, 23 tool calls)
-
-✅ Allowed (19):
-  Read src/auth.py
-  Ran: uv run pytest
-
-⚠️  No policy (2):
-  Read README.md  (no file read policy for this path)
-
-❌ Violations (2):
-  Read .env  ← denied by files.read.deny
-  Ran: curl https://example.com  ← denied by commands.deny
-
-🔐 Sensitive files accessed (1):
-  Read .env  (event #12)
-```
-
-Exits with code 1 when violations are found. Usable in CI.
-
-**Policy file** (`.agent-scope.json`):
-
-```json
-{
-  "files": {
-    "read":  { "allow": ["src/**", "tests/**"], "deny": [".env"] },
-    "write": { "allow": ["src/**"], "deny": [".github/**"] }
-  },
-  "commands": {
-    "allow": ["pytest", "uv run", "cat"],
-    "deny":  ["curl", "wget", "rm -rf"]
-  },
-  "network": { "deny_all": true, "allow": ["localhost"] }
-}
-```
-
-Glob patterns support `**` as a recursive wildcard. File read policy applies to `Read`, `View`, `Grep`, and `Glob` tool calls. Network policy checks URLs embedded in `Bash` commands.
-
-### Auto-generate a policy from your traces
-
-Let agent-trace observe a few sessions and generate `.agent-scope.json` for you:
-
-```bash
-# Dry-run: print the suggested policy without writing anything
-agent-strace policy
-
-# Write it to disk
-agent-strace policy --output .agent-scope.json
-
-# Observe a specific set of sessions
-agent-strace policy --last 20 --output .agent-scope.json
-```
-
-The generated policy covers every file path and command the agent used, collapsed into glob patterns. Review it, tighten the deny list, and commit it alongside your code.
-
-### Optimize instruction files from trace failures
-
-Cluster failures by root cause and propose concrete additions to `AGENTS.md`, `CLAUDE.md`, or any instruction file. Three built-in heuristic patterns require no LLM.
-
-```bash
-# Show proposed additions to AGENTS.md (dry run, no writes)
-agent-strace optimize --target AGENTS.md
-
-# Analyze a dataset of failures
-agent-strace optimize --dataset auth-failures --target AGENTS.md
-
-# Apply changes
-agent-strace optimize --target AGENTS.md --apply
-
-# Use a local Ollama model for LLM-assisted clustering
-agent-strace optimize --target AGENTS.md \
-  --base-url http://localhost:11434/v1 \
-  --model llama3 \
-  --api-key ollama \
-  --apply
-```
-
-Built-in heuristic patterns (no LLM required):
-
-| Pattern | Detection | Proposed fix |
-|---|---|---|
-| `blind-retry` | Same tool called 3+ times consecutively | Add retry policy to AGENTS.md |
-| `error-no-change` | Tool retried after error with no write in between | Add error-handling rule |
-| `wide-blast-radius` | More than 8 distinct files written in one session | Add scope discipline rule |
-
-When `OPENAI_API_KEY` and `OPENAI_BASE_URL` are set (or `--api-key` / `--base-url`), the command uses an LLM to cluster failures and generate more targeted proposals. Falls back to heuristics if the LLM call fails.
-
-### PII masking
-
-Sensitive data is masked before it hits disk. Useful when tracing agents that handle user data or credentials.
-
-```bash
-# Stdio proxy with masking
-agent-strace record --mask -- npx -y @modelcontextprotocol/server-filesystem /tmp
-
-# HTTP proxy with masking
-agent-strace record-http https://mcp.example.com --mask
-```
-
-Masked by default: email addresses, phone numbers, credit card numbers, US Social Security Numbers, and AWS ARNs. You can also call `mask_event_data()` directly to sanitise events from an existing session before sharing or exporting them.
-
-### Eval scoring
-
-Score a session against configurable criteria. Built-in scorers require no LLM. They run on trace structure alone.
-
-```bash
-# Score the latest session (uses .agent-evals.yaml if present)
-agent-strace eval
-
-# Score a specific session
-agent-strace eval abc123
-
-# JSON output
-agent-strace eval abc123 --format json
-
-# Compare two sessions
-agent-strace eval compare abc123 def456
-```
-
-Built-in scorers:
-
-| Scorer | What it checks |
-|---|---|
-| `no_errors` | Session had zero ERROR events |
-| `cost_under` | Estimated cost stayed below a dollar threshold |
-| `files_scoped` | All file operations were within allowed paths |
-| `duration_under` | Session completed within a time limit |
-| `regex` | A pattern matched in agent responses |
-
-Configure scorers in `.agent-evals.yaml`:
-
-```yaml
-scorers:
-  - type: no_errors
-    threshold: 1.0
-  - type: cost_under
-    max_dollars: 0.10
-    threshold: 1.0
-  - type: files_scoped
-    allowed_paths: ["src/", "tests/"]
-    threshold: 0.90
-
-thresholds:
-  pass: 0.85
-  warn: 0.70
-```
-
-#### CI gate
-
-Block merges when agent quality regresses:
-
-```bash
-agent-strace eval ci
-```
-
-Exits non-zero if any scorer fails. Add to GitHub Actions:
-
-```yaml
-- name: Eval agent session
-  run: agent-strace eval ci
-  env:
-    PYTHONPATH: src
-```
-
-### Multi-session dashboard
-
-Get an aggregate view across all your sessions. Useful for spotting trends, outliers, and cost spikes without opening each session individually.
-
-```bash
-agent-strace dashboard                    # all sessions
-agent-strace dashboard --last 20          # last 20 sessions
-agent-strace dashboard --since 2024-06-01 # since a date
-agent-strace dashboard --html report.html # self-contained HTML export
-```
-
-The terminal view shows total tool calls, errors, tokens, and estimated cost, plus ASCII sparkline charts for each metric over time and a top-tools frequency table. The HTML export is self-contained. No server needed.
-
-### Dataset auto-sampler
-
-Export the sessions most useful for regression suites and eval datasets — without manual inspection.
-
-```bash
-# Export the 20 worst-performing sessions (highest error/retry/cost)
-agent-strace sample --strategy worst --n 20 --output regression.jsonl
-
-# Export 10 sessions that maximise behavioral variety
-agent-strace sample --strategy diverse --n 10 --output diverse.jsonl
-
-# Export the 5 most recent sessions
-agent-strace sample --strategy recent --n 5 --output recent.jsonl
-
-# Random sample, reproducible with a seed
-agent-strace sample --strategy random --n 15 --seed 42 --output random.jsonl
-
-# Skip sessions with identical tool call sequences
-agent-strace sample --strategy worst --n 20 --deduplicate --output regression.jsonl
-```
-
-Output is JSONL — one session per line — with full event data and a score breakdown. Compatible with LangSmith, Braintrust, and any custom eval framework.
-
-### Eval trend dashboard
-
-See whether your agent is getting better or worse over time. Reads eval scores and behavioral metrics from session events, then renders a self-contained HTML report with inline SVG charts.
-
-```bash
-# Terminal summary
-agent-strace dashboard --trend --since 30d
-
-# Self-contained HTML report (no CDN, no JS libraries)
-agent-strace dashboard --trend --since 30d --html trend-report.html
-
-# Add a timeline annotation (appears as a vertical marker on all charts)
-agent-strace dashboard annotate --date 2026-05-10 --note "Added retry policy to AGENTS.md"
-```
-
-The HTML report shows:
-- **Eval quality**: pass rate per judge over time, with annotation markers for config changes
-- **Behavioral metrics**: error rate, retry rate, cost, session duration as sparklines
-- **Recent sessions table**: eval scores inline, click any row to open the full replay
-
-The file is fully self-contained. Attach it to a PR, commit it as a weekly snapshot, or share it with someone who doesn't have agent-strace installed.
-
-### Session attribution
-
-Every session records who and what spawned it: OS user, detected agent provider, git repo and branch, and the chain of parent processes.
-
-```bash
-agent-strace show SESSION_ID
-# Attribution
-#   User:     alice
-#   Provider: claude-code
-#   Branch:   feat/my-feature
-#   Commit:   a1b2c3d
-#   CWD:      /home/alice/projects/myapp
-```
-
-Detected providers: `claude-code`, `cursor`, `github-copilot`, `cody`, `continue`, and a generic `mcp-client` fallback. Attribution is collected automatically. Nothing to configure.
-
-### Replay annotations
-
-Add notes, labels, and bookmarks to any event. Useful for code review, debugging, and building eval datasets.
-
-```bash
-# Add a note to event #12
-agent-strace annotate SESSION_ID 12 --note "Why did it call bash here instead of write_file?"
-
-# Tag an event
-agent-strace annotate SESSION_ID 12 --label regression
-
-# Bookmark for quick navigation in the HTML viewer
-agent-strace annotate SESSION_ID 12 --bookmark
-
-# List all annotations
-agent-strace annotate SESSION_ID --list
-
-# Remove one
-agent-strace annotate SESSION_ID 12 --delete ANNOTATION_ID
-```
-
-Annotations persist alongside the session and appear as a bookmarks sidebar in shared HTML reports. They're also useful for building eval datasets: label sessions as `pass` / `fail` / `interesting` and filter on those labels later.
-
-### Token budget tracking
-
-Long-running agents can burn through a model's context window without warning. The token budget command shows how close you are before you hit the limit.
-
-```bash
-agent-strace token-budget SESSION_ID
-agent-strace token-budget SESSION_ID --model claude-3-5-sonnet
-agent-strace token-budget SESSION_ID --model gpt-4o --warn-at 75
-```
-
-In watch mode, the same threshold applies in real time:
-
-```bash
-agent-strace watch --max-context-pct 80 SESSION_ID
-```
-
-Supported models and their limits:
-
-| Model | Context |
-|---|---|
-| claude-3-5-sonnet | 200k tokens |
-| claude-3-opus | 200k tokens |
-| gpt-4o | 128k tokens |
-| gpt-4-turbo | 128k tokens |
-| gemini-1.5-pro | 1M tokens |
-
-Pass `--limit` to set a custom ceiling for any other model.
-
-### Semantic session diff
-
-Compare two sessions by *outcome*, not raw event order. Useful for regression testing agent behaviour across model versions or prompt changes.
-
-```bash
-agent-strace diff SESSION_A SESSION_B --semantic
-```
-
-```
-Semantic diff: SESSION_A vs SESSION_B
-
-Tools added:    write_file
-Tools removed:  bash
-Δ tool calls:   +3
-Δ errors:       -2
-Δ tokens:       +1,200
-Outcome:        improved (fewer errors, same task completed)
-```
-
-Export a structured JSON report for CI assertions:
-
-```bash
-agent-strace diff SESSION_A SESSION_B --semantic --eval-config eval.json
-```
-
-### Rich side-by-side comparison
-
-`--compare` produces a structured table across cost, duration, tool calls, redundant reads, context resets, files modified, and errors. The verdict is deterministic and requires no LLM.
-
-```bash
-agent-strace diff SESSION_A SESSION_B --compare
-```
-
-New metrics: **redundant reads** (files read more than once), **context resets** (LLM requests separated by >120s), **approach divergence** (first phase pairs where behaviour differs). Useful for asserting on in CI.
-
-### Watchdog mode — timeout, budget ceiling, and post-mortem
-
-Enforce a wall-clock timeout and/or token-cost ceiling on any session. When either limit is breached the agent process is terminated and a structured `watchdog-postmortem.json` is written to the session directory. An optional `--on-death` command is invoked with the post-mortem path.
-
-```bash
-# Kill after 30 minutes
-agent-strace watch --timeout 30m --on-violation kill SESSION_ID
-
-# Kill when spend exceeds $5
-agent-strace watch --budget 5.00 --on-violation kill SESSION_ID
-
-# Both limits, with a recovery script
-agent-strace watch \
-  --timeout 30m \
-  --budget 5.00 \
-  --on-violation kill \
-  --on-death "python recover.py --post-mortem {post_mortem_path}" \
-  SESSION_ID
-```
-
-`--timeout` accepts human-readable durations: `30s`, `5m`, `2h`, `1h30m`.
-
-The `watchdog-postmortem.json` written on kill contains:
-
-```json
-{
-  "session_id": "abc123",
-  "reason": "DurationWatcher: 1800s elapsed",
-  "elapsed_seconds": 1800.0,
-  "cost_at_death": 2.34,
-  "last_tool_call": { "tool_name": "Bash", "arguments": { "command": "pytest" } },
-  "last_llm_response": { "model": "claude-3-5-sonnet", "content": "..." },
-  "recovery_context": "Session abc123 was terminated after 1800s ($2.34 spent). ..."
-}
-```
-
-### Kill switch for runaway sessions
-
-Add a declarative rules file to `agent-strace watch` to pause, kill, or alert when a session crosses a threshold. The agent stops when a rule fires. No prompt, no retry, no damage.
-
-```bash
-agent-strace watch --rules .watch-rules.json
-agent-strace watch --rules .watch-rules.json --dry-run  # evaluate without acting
-```
-
-Example `.watch-rules.json`:
-
-```json
-[
-  { "condition": "cost_usd", "threshold": 0.50, "action": "kill" },
-  { "condition": "file_path", "glob": "**/production.env", "action": "kill" },
-  { "condition": "files_modified", "threshold": 30, "action": "pause" }
-]
-```
-
-**Rule conditions:** `files_modified`, `cost_usd`, `consecutive_test_failures`, `duration_minutes`, `file_path` (glob).
-
-**Actions:**
-- `pause`: SIGSTOP the agent process (resume with SIGCONT)
-- `kill`: SIGTERM, then SIGKILL after 5s; auto-generates a postmortem
-- `alert`: log only, no interruption
-
-### Push-based event streaming
-
-Stream events to an external HTTP endpoint in real-time as they arrive during a watched session. Events are batched and POSTed as [NDJSON](https://ndjsonl.org) (`application/x-ndjson`), so any HTTP server or log aggregator can consume them.
-
-```bash
-# Stream all events to a collector
-agent-strace watch --stream-to https://collector.example.com/events SESSION_ID
-
-# Tune batch size and flush interval
-agent-strace watch \
-  --stream-to https://collector.example.com/events \
-  --stream-batch-size 20 \
-  --stream-flush-interval 5.0 \
-  SESSION_ID
-```
-
-Each POST body contains one JSON object per line:
-
-```
-{"event_type":"tool_call","timestamp":1700000001.0,"session_id":"abc123","data":{...}}
-{"event_type":"llm_response","timestamp":1700000002.5,"session_id":"abc123","data":{...}}
-```
-
-**Options:**
-
-| Flag | Default | Description |
-|---|---|---|
-| `--stream-to URL` | — | HTTP endpoint to POST events to |
-| `--stream-batch-size N` | `10` | Max events per POST |
-| `--stream-flush-interval S` | `2.0` | Max seconds between flushes |
-
-HTTP failures are logged to stderr but never interrupt the watch loop. The background flush thread is a daemon and is stopped cleanly when the session ends or the watcher exits.
-
-### Behavioral drift detection
-
-Detect when agent behavior has shifted across sessions without an LLM. Computes a behavioral fingerprint across six dimensions (tool mix, error rate, retry pattern, blast radius, session duration, decision depth) and measures divergence from a baseline using Jensen-Shannon divergence.
-
-```bash
-# Detect drift over the last 30 days (splits window in half automatically)
-agent-strace drift --since 30d
-
-# Compare against a saved baseline
-agent-strace drift --baseline .agent-traces/baselines/behavior-main.json
-
-# Save current fingerprint as a baseline
-agent-strace drift --since 30d --save-baseline .agent-traces/baselines/behavior-main.json
-
-# JSON output for CI
-agent-strace drift --since 30d --format json
-```
-
-Exits non-zero when the overall drift score exceeds `--threshold` (default: `0.20`). Commit baseline fingerprints alongside your agent config. They're under 2KB.
-
-Six dimensions tracked:
-
-| Dimension | Drift signal |
-|---|---|
-| Tool mix | Agent suddenly calling Bash 40% more often |
-| Error rate | New class of errors appearing |
-| Retry pattern | Agent retrying more after a model update |
-| Blast radius | Agent touching more files per task |
-| Session duration | Sessions getting longer |
-| Decision depth | Agent making fewer explicit decisions |
-
-### Shadow MCP detection
-
-Detect Shadow MCP servers and undeclared agent activity in any repo. No network calls, no API keys. A [CSA survey of 418 security professionals](https://cloudsecurityalliance.org/press-releases/2026/04/21/new-cloud-security-alliance-survey-reveals-82-of-enterprises-have-unknown-ai-agents-in-their-environments) found 82% of enterprises discovered at least one AI agent their security team didn't know about in the past year. `audit-tools` finds yours.
-
-```bash
-agent-strace audit-tools
-agent-strace audit-tools --repo . --since "90 days ago" --approved cursor,copilot
-```
-
-Detected tools: Claude Code, Cursor, GitHub Copilot, Codex/ChatGPT, Windsurf, Aider, Gemini CLI. Identified via file signals (`.cursorrules`, `CLAUDE.md`, `.github/copilot-instructions.md`, etc.) and commit message patterns. Flags unapproved tools, unknown LLM API endpoints in `.env` history, and PII patterns in recently committed files.
-
-### HTML session replay viewer
-
-Generate a single-file HTML viewer for any session. No server, no dependencies. Open in any browser.
-
-```bash
-agent-strace replay --format html
-agent-strace replay --format html --output review.html SESSION_ID
-```
-
-The viewer includes an animated event timeline, scrubber bar, running cost counter, click-to-expand event detail, color-coded event types, and dark theme. All event data is embedded as a JSON constant. Useful for attaching to PR reviews.
-
-### Standup report
-
-Generate a structured standup from a session trace. No LLM required.
-
-```bash
-agent-strace standup
-agent-strace standup --session SESSION_ID
-```
-
-Report covers: files read and modified, approaches tried (including abandoned ones detected from retry patterns), new dependencies added, TODO/FIXME comments written, large changes and auth/migration patterns to review, and session stats (tool calls, retries, errors).
-
-### Context freshness check
-
-Check how stale the agent's last view of the codebase is before handing it a task.
-
-```bash
-agent-strace freshness
-agent-strace freshness --since 2026-04-01 --scope "src/**"
-```
-
-Reports files changed since the last session, per-file change type and line count, a freshness score 0–100, and estimated catch-up reading time. Scope is auto-detected from `CLAUDE.md` / `AGENTS.md`, or overridden with `--scope`.
-
-### On-call readiness
-
-Cross-reference agent-modified files against git history to find gaps before a rotation.
-
-```bash
-agent-strace oncall --rotation-start 2026-04-25
-agent-strace oncall --rotation-start 2026-04-25 --scope "src/payments/**"
-```
-
-For each file the agent has written in the last N days: how long ago it was modified, lines changed, estimated reading time, and total catch-up time before rotation.
-
-### Cost-efficiency curve
-
-See which task types are worth delegating to an agent.
-
-```bash
-agent-strace curve
-agent-strace curve --min-sessions 10 --export csv
-```
-
-Sessions are classified into 10 task types (unit tests, debugging, refactoring, architecture, etc.) and compared against a community sweet-spot benchmark. Verdict per type: **efficient / over sweet spot / do this yourself**. Potential monthly savings are calculated for types running above 1.5× their sweet spot.
-
-### Token inflation calculator
-
-Measure the tokenizer cost impact of switching model versions before committing to an upgrade. No API calls required.
-
-```bash
-agent-strace inflation
-agent-strace inflation --compare claude-opus-4-6,claude-opus-4-7 --sessions 30
-```
-
-Applies per-model inflation factors to stored session content and breaks down the impact by content type (system prompt, tool definitions, user messages, assistant messages). Projects per-session, daily, and monthly cost delta.
-
-| Model | Factor |
-|---|---|
-| claude-opus-4-7 | 1.38× (community median: 1.3–1.47×, April 2026) |
-| gpt-4o | 1.05× (cl100k_base → o200k_base) |
-
-### A2A protocol support
-
-First-class support for agent-to-agent calls following the Google A2A spec. A2A calls are captured as `TOOL_CALL` events with `event_subtype=a2a_call`, backward-compatible with all existing replay and export tooling.
-
-```bash
-agent-strace a2a-tree
-agent-strace a2a-tree SESSION_ID --format json
-```
-
-Builds the full agent call graph by following `sub_session_id` links and `parent_session_id` back-references. Renders as an ASCII tree or exports as OTLP spans for Jaeger, Tempo, or any OpenTelemetry backend.
-
-## Use with security-critical codebases
-
-When AI coding agents work on codebases that handle secrets, attestation logic, or cryptographic material, agent-strace gives you two things: an audit trail of every file touched and every command run, and redaction of secrets before they reach any log.
-
-### Recommended setup for sensitive repos
-
-Add `.claude/settings.json` to the repo root and commit it. Every developer gets the same instrumentation:
-
-```json
-{
-  "hooks": {
-    "PreToolUse": [{
-      "matcher": ".*",
-      "hooks": [{ "type": "command", "command": "agent-strace hook pre-tool" }]
-    }],
-    "PostToolUse": [{
-      "matcher": ".*",
-      "hooks": [{ "type": "command", "command": "agent-strace hook post-tool" }]
-    }]
-  }
-}
-```
-
-Or use the setup command:
-
-```bash
-cd your-sensitive-repo
-agent-strace setup --redact
-```
-
-### Secret redaction for TEE and confidential computing stacks
-
-For codebases that handle TEE secrets, the following patterns are redacted automatically:
-
-| Secret type | Pattern matched |
-|-------------|----------------|
-| EKM shared secrets | 64-char hex strings (e.g. `EKM_SHARED_SECRET`) |
-| Bearer tokens | `Bearer [A-Za-z0-9+/=]{20,}` |
-| Anthropic API keys | `sk-ant-...` |
-| AWS credentials | `AKIA...`, `aws_secret_access_key` |
-| Private keys | PEM blocks |
-
-If your codebase uses custom secret formats, add patterns via `--redact-pattern`:
-
-```bash
-agent-strace setup --redact --redact-pattern "ATTESTATION_KEY=[A-Fa-f0-9]{64}"
-```
-
-### Example: scoping agents away from sensitive components
-
-Combine with [agentic-authz](https://github.com/Siddhant-K-code/agentic-authz) to block agents from security-critical components entirely, and use agent-strace to audit everything they do access:
-
-```
-Agent scope:        frontend/ only (enforced by OpenFGA: no tuple = no access)
-agent-strace scope: all tool calls logged, secrets redacted, exported to Grafana
-```
-
-Any attempt by the agent to read `cvm/attestation-service/` or `cvm/auth-service/` is blocked at the authorization layer before it reaches the filesystem. agent-strace logs the denied attempt with the reason.
-
----
-
-## Auto-instrumentation
-
-Instrument any supported agent framework without modifying application code.
-
-```bash
-# Instrument a specific framework
-agent-strace auto --framework langchain -- python my_agent.py
-
-# Auto-detect all installed frameworks
-agent-strace auto --detect -- python my_agent.py
-
-# Via environment variable (no CLI wrapper needed)
-AGENT_STRACE_AUTO_INSTRUMENT=langchain,litellm python my_agent.py
-
-# Or in code
-from agent_trace.integrations import instrument_langchain
-instrument_langchain()
-```
-
-Supported frameworks:
-
-| Framework | Install | What's traced |
-|---|---|---|
-| OpenAI Agents SDK | `pip install agent-strace[openai-agents]` | Runner.run, FunctionTool calls |
-| LangChain / LangGraph | `pip install agent-strace[langchain]` | BaseTool._run, BaseChatModel._generate |
-| LiteLLM | `pip install agent-strace[litellm]` | litellm.completion |
-| Anthropic SDK | `pip install anthropic` | messages.create |
-| OpenAI SDK | `pip install openai` | chat.completions.create |
-| AWS Strands | `pip install agent-strace[strands]` | Agent.__call__, BaseTool.invoke |
-
-Each integration is an optional extra — the core package stays dependency-free (ADR-0003).
-
-## Server-side event collector
-
-Run a central collector so agents in containers, CI, and serverless functions can send traces over the network — no local disk required.
-
-```bash
-# Start the collector
-agent-strace server --port 4317 --storage ./traces
-
-# Agents point to it via environment variable — no code changes required
-AGENT_STRACE_ENDPOINT=http://collector:4317 python my_agent.py
-```
-
-The server writes traces in the same `.agent-traces/` format as local mode. All existing CLI commands work against its storage.
-
-### API
-
-| Method | Path | Description |
-|---|---|---|
-| `POST` | `/events` | Receive a batch of NDJSON events |
-| `POST` | `/sessions` | Create or update session metadata |
-| `GET` | `/sessions` | List all sessions |
-| `GET` | `/sessions/<id>/events` | Stream events for a session |
-| `GET` | `/health` | Liveness check |
-
-### Docker
-
-```dockerfile
-FROM python:3.12-slim
-RUN pip install agent-strace
-ENV AGENT_STRACE_STORAGE=/data
-VOLUME /data
-EXPOSE 4317
-CMD ["agent-strace", "server", "--port", "4317"]
-```
-
-No authentication in v1 — intended for internal/private network use. Add a reverse proxy (nginx, Caddy) for auth.
-
-## Production tracing (OTLP export)
-
-Export sessions as OpenTelemetry spans to your existing observability stack. Sessions become traces. Tool calls become spans with duration and inputs. Errors get exception events. No new dependencies.
-
-### OTel GenAI semantic conventions
-
-Use `--format otlp-genai` to export with strict [OTel GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/). This produces AI-native spans that populate token usage charts, cost views, and LLM dashboards in Datadog, Grafana, and Honeycomb automatically.
-
-```bash
-agent-strace export <session-id> --format otlp-genai \
-  --endpoint http://localhost:4318
-```
-
-Key differences from `--format otlp`:
-
-| Aspect | `--format otlp` | `--format otlp-genai` |
-|---|---|---|
-| LLM calls | Events on root span | `gen_ai.client.operation` child spans |
-| Tool calls | `tool/<name>` spans | `gen_ai.tool.call/<name>` spans |
-| Root span | `agent.name` attribute | `gen_ai.agent.id` + `gen_ai.agent.name` |
-| Errors | Custom error span | OTel `exception` event format |
-
-`--format otlp` is unchanged for backwards compatibility.
-
-### Datadog
-
-```bash
-# Via the Datadog Agent's OTLP receiver (port 4318)
-agent-strace export <session-id> --format otlp \
-  --endpoint http://localhost:4318
-
-# Or via Datadog's OTLP intake directly
-agent-strace export <session-id> --format otlp \
-  --endpoint https://http-intake.logs.datadoghq.com:443 \
-  --header "DD-API-KEY: $DD_API_KEY"
-```
-
-### Honeycomb
-
-```bash
-agent-strace export <session-id> --format otlp \
-  --endpoint https://api.honeycomb.io \
-  --header "x-honeycomb-team: $HONEYCOMB_API_KEY" \
-  --service-name my-agent
-```
-
-### New Relic
-
-```bash
-agent-strace export <session-id> --format otlp \
-  --endpoint https://otlp.nr-data.net \
-  --header "api-key: $NEW_RELIC_LICENSE_KEY"
-```
-
-### Splunk
-
-```bash
-agent-strace export <session-id> --format otlp \
-  --endpoint https://ingest.<realm>.signalfx.com \
-  --header "X-SF-Token: $SPLUNK_ACCESS_TOKEN"
-```
-
-### Grafana Tempo / Jaeger
-
-```bash
-# Local collector
-agent-strace export <session-id> --format otlp \
-  --endpoint http://localhost:4318
-```
-
-### Langfuse export
-
-Export sessions and eval scores to [Langfuse](https://langfuse.com). Sessions appear as Traces, tool calls as Spans, LLM calls as Generations, and eval scores as Langfuse Scores.
-
-```bash
-# Set credentials
-export LANGFUSE_PUBLIC_KEY=pk-lf-...
-export LANGFUSE_SECRET_KEY=sk-lf-...
-
-# Export latest session with eval scores
-agent-strace export --scores --backend langfuse
-
-# Export last 7 days
-agent-strace export --since 7d --scores --backend langfuse
-
-# Self-hosted Langfuse
-agent-strace export --scores --backend langfuse \
-  --langfuse-host https://langfuse.your-domain.com
-```
-
-### Export behavioral metrics to any OTLP backend
-
-Export per-session behavioral metrics as OTLP gauge metrics. Compatible with Datadog, Honeycomb, Grafana, New Relic, and any OpenTelemetry backend.
-
-```bash
-export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
-
-agent-strace export --metrics --backend otlp --since 30d
-```
-
-Metrics exported:
-
-| Metric | Description |
-|---|---|
-| `agent_strace.session.cost_usd` | Estimated cost per session |
-| `agent_strace.session.error_rate` | Errors / tool calls |
-| `agent_strace.session.retry_rate` | Consecutive same-tool retries / tool calls |
-| `agent_strace.session.blast_radius` | Distinct files written |
-| `agent_strace.session.duration_s` | Wall-clock session duration |
-| `agent_strace.eval.score` | Judge score per session (one per judge, with `judge=` attribute) |
-
-### Dump OTLP JSON without sending
-
-```bash
-# Inspect the OTLP payload
-agent-strace export <session-id> --format otlp > trace.json
-```
-
-### How it maps
-
-| agent-trace | OpenTelemetry |
-|---|---|
-| session | trace |
-| tool_call + tool_result | span (with duration) |
-| error | span with error status + exception event |
-| user_prompt | event on root span |
-| assistant_response | event on root span |
-| session_id | trace ID |
-| event_id | span ID |
-| parent_id | parent span ID |
-
-## Debug with MCP
-
-`agent-strace mcp` starts an MCP server that exposes your session store as queryable tools. Any MCP-compatible client (Claude Code, Cursor, VS Code Copilot) can query traces conversationally. The debugging agent reads its own execution history and surfaces what went wrong.
-
-```bash
-agent-strace mcp
-```
-
-**Claude Code config** (`.claude/settings.json`):
-
-```json
-{
-  "mcpServers": {
-    "agent-trace": {
-      "command": "agent-strace",
-      "args": ["mcp"]
-    }
-  }
-}
-```
-
-**Cursor config** (`.cursor/mcp.json`):
-
-```json
-{
-  "mcpServers": {
-    "agent-trace": {
-      "command": "agent-strace",
-      "args": ["mcp"]
-    }
-  }
-}
-```
-
-Once connected, you can ask the debugging agent questions like:
-
-> "Look at the most recent session and tell me why it called bash three times in a row."
-> "Which files did the agent write in session abc123 that it didn't write in def456?"
-> "Find all sessions where the agent hit an error after calling npm test."
-
-### MCP tools
-
-| Tool | Description |
-|---|---|
-| `list_sessions` | List captured sessions with metadata (timestamp, tool calls, cost, tokens) |
-| `get_session` | Full event stream for a session, with optional event type filter |
-| `search_events` | Filter events by tool name, file path, exit code, or error flag across sessions |
-| `get_session_summary` | Plain-English phase breakdown: what the agent did, files touched, retries |
-| `diff_sessions` | Compare two sessions: tool call delta, file overlap, cost delta, error delta |
-
-### Example interactions
-
-```
-# List recent sessions
-list_sessions(limit=5)
-
-# Get all errors from a session
-search_events(session_id="abc123", has_error=true)
-
-# Find all sessions where the agent wrote to package-lock.json
-search_events(file_path="package-lock.json")
-
-# Compare two sessions after changing AGENTS.md
-diff_sessions(session_a="before_change", session_b="after_change")
-
-# Get a plain-English summary of what went wrong
-get_session_summary(session_id="abc123")
-```
-
-## How it works
-
-### Claude Code hooks
-
-```
-Claude Code agentic loop
-  ├── UserPromptSubmit   → agent-strace hook user-prompt
-  ├── PreToolUse         → agent-strace hook pre-tool
-  ├── PostToolUse        → agent-strace hook post-tool
-  ├── PostToolUseFailure → agent-strace hook post-tool-failure
-  ├── Stop               → agent-strace hook stop
-  ├── SessionStart       → agent-strace hook session-start
-  └── SessionEnd         → agent-strace hook session-end
-                               ↓
-                         .agent-traces/
-```
-
-Claude Code fires hook events at every stage of its agentic loop. agent-strace registers as a handler, reads JSON from stdin, and writes trace events. Each hook runs as a separate process. Session state lives in `.agent-traces/.active-session` so PreToolUse and PostToolUse can be correlated for latency measurement.
-
-### MCP stdio proxy
-
-```
-Agent ←→ agent-strace proxy ←→ MCP Server (stdio)
-              ↓
-         .agent-traces/
-```
-
-The proxy reads JSON-RPC messages (Content-Length framed or newline-delimited), classifies each one, and writes a trace event. Messages are forwarded unchanged. The agent and server do not know the proxy exists.
-
-### MCP HTTP/SSE proxy
-
-```
-Agent ←→ agent-strace proxy (localhost:3100) ←→ Remote MCP Server (HTTPS)
-              ↓
-         .agent-traces/
-```
-
-Same idea, different transport. Listens on a local port, forwards POST and SSE requests to the remote server, captures every JSON-RPC message in both directions.
-
-### Decorator mode
-
-```python
-@trace_tool
-def my_function(x):
-    return x * 2
-```
-
-The decorator logs a `tool_call` event before execution and a `tool_result` after. Errors and timing are captured automatically.
-
-### Secret redaction
-
-When `--redact` is enabled (or `redact=True` in the decorator API), trace events pass through a redaction filter before hitting disk. The filter checks key names (`password`, `api_key`) and value patterns (`sk-*`, `ghp_*`, JWTs). Redacted values become `[REDACTED]`. The original data is never stored.
-
-## Project structure
-
-```
-src/agent_trace/
-  __init__.py       # version
-  models.py         # TraceEvent, SessionMeta, EventType
-  store.py          # NDJSON file storage
-  hooks.py          # Claude Code hooks integration
-  proxy.py          # MCP stdio proxy
-  http_proxy.py     # MCP HTTP/SSE proxy
-  redact.py         # secret redaction (key/value pattern matching)
-  masking.py        # PII masking (email, phone, CC, SSN, ARN)
-  otlp.py           # OTLP/HTTP JSON exporter with GenAI semantic conventions
-  replay.py         # terminal replay, HTML viewer export
-  decorator.py      # @trace_tool, @trace_llm_call, log_decision
-  jsonl_import.py   # Claude Code JSONL session import
-  explain.py        # session phase detection and plain-English summary
-  cost.py           # token and cost estimation
-  subagent.py       # parent-child session tree, tree replay, stats rollup
-  diff.py           # structural, semantic, and side-by-side session comparison
-  why.py            # causal chain tracing (backwards event walk)
-  audit.py          # policy-based tool call checking, sensitive file detection
-  audit_tools.py    # shadow AI detection (file signals + commit patterns)
-  policy.py         # generate .agent-scope.json from observed traces
-  attribution.py    # session attribution (user, process ancestry, git context)
-  dashboard.py      # multi-session aggregate view and trend charts
-  annotate.py       # replay annotations (notes, labels, bookmarks)
-  token_budget.py   # token budget tracking and context window early warning
-  watch.py          # live session watcher with rule-based kill switch
-  share.py          # self-contained HTML report export
-  standup.py        # standup report from session trace (no LLM)
-  freshness.py      # context freshness check vs last session
-  oncall.py         # on-call readiness for agent-modified files
-  curve.py          # personal agent cost-efficiency curve
-  inflation.py      # token inflation calculator across model versions
-  a2a.py            # A2A protocol support and cross-agent trace correlation
-  cli.py            # CLI entry point
-ADRs/               # Architecture Decision Records
-```
-
-## Running tests
-
-```bash
-pytest
-```
-
-## Development
-
-```bash
-git clone https://github.com/Siddhant-K-code/agent-trace.git
-cd agent-trace
-
-# Run tests
-pytest
-
-# Run the example
-PYTHONPATH=src python examples/basic_agent.py
-
-# Replay the example
-PYTHONPATH=src python -m agent_trace.cli replay
-
-# Build the package
-uv build
-
-# Install locally for testing
-uv tool install -e .
-```
-
-## Related
-
-- [AGENTS.md integration guide](docs/agents-md-integration.md) - how to use agent-strace with AGENTS.md for drift detection and CI gating
-- [Architecture Decision Records](ADRs/) - design decisions and their rationale
-- [The agent observability gap (blog)](https://siddhantkhare.com/writing/agent-observability-gap) - the problem this tool addresses
-- [The agent observability gap (thread)](https://x.com/Siddhant_K_code/status/2032834557628788940) - discussion on X
-- [The Agentic Engineering Guide](https://agents.siddhantkhare.com) - chapters 7, 9, 10 cover agent security; chapters 14, 15, 16 cover observability
-- [OpenTelemetry GenAI](https://opentelemetry.io/docs/specs/semconv/gen-ai/) - semantic conventions for LLM tracing (complementary)
-
-## Sponsor
-
-If agent-trace saves you time, consider [sponsoring the project](https://github.com/sponsors/Siddhant-K-code). It helps keep the work going.
-
-## License
-
-MIT. Use it however you want.
+[Sponsor](https://github.com/sponsors/Siddhant-K-code) · [ADRs](ADRs/) · [Security](docs/security.md) · [PyPI](https://pypi.org/project/agent-strace/)
diff --git a/docs/commands.md b/docs/commands.md
new file mode 100644
index 0000000..93b75f9
--- /dev/null
+++ b/docs/commands.md
@@ -0,0 +1,384 @@
+# Command reference
+
+Full flag reference for every `agent-strace` command.
+
+---
+
+## Session capture
+
+### `record`
+```
+agent-strace record [--name NAME] [--redact] [--mask] -- <command>
+```
+Capture an MCP stdio server session. Wraps `<command>` as a transparent proxy.
+
+| Flag | Description |
+|---|---|
+| `--name NAME` | Label for the session |
+| `--redact` | Strip secrets before writing to disk |
+| `--mask` | Mask PII (email, phone, CC, SSN) |
+
+### `record-http`
+```
+agent-strace record-http <url> [--port N] [--redact] [--mask]
+```
+Capture an MCP HTTP/SSE server session. Listens on `--port` (default: 3100) and proxies to `<url>`.
+
+### `setup`
+```
+agent-strace setup [--redact] [--global]
+```
+Print the Claude Code hooks config as JSON. Add `--global` to write it to `~/.claude/settings.json`.
+
+### `import`
+```
+agent-strace import <path.jsonl> [--discover]
+```
+Import a Claude Code JSONL session log. `--discover` lists available sessions in `~/.claude/projects/`.
+
+---
+
+## Replay and inspection
+
+### `replay`
+```
+agent-strace replay [session-id] [--format terminal|html] [--live] [--speed N]
+                    [--filter TYPES] [--limit N] [--expand-subagents] [--tree]
+                    [-o FILE]
+```
+
+| Flag | Description |
+|---|---|
+| `--format html` | Export self-contained HTML viewer |
+| `--live` | Replay with real-time delays |
+| `--speed N` | Speed multiplier for `--live` (default: 1.0) |
+| `--filter TYPES` | Comma-separated event types to show |
+| `--limit N` | Cap at N events |
+| `--expand-subagents` | Inline subagent sessions under parent tool_call |
+| `--tree` | Show session hierarchy without full replay |
+
+### `list`
+```
+agent-strace list
+```
+List all captured sessions with ID, timestamp, duration, tool calls, and errors.
+
+### `inspect`
+```
+agent-strace inspect <session-id>
+```
+Dump full session as JSON (meta + events).
+
+### `stats`
+```
+agent-strace stats [session-id] [--include-subagents]
+```
+Tool call frequency and timing. `--include-subagents` rolls up across the full subagent tree.
+
+---
+
+## Understanding sessions
+
+### `explain`
+```
+agent-strace explain [session-id]
+```
+Plain-English phase breakdown: what the agent did, files touched, retries, wasted time. No LLM required.
+
+### `timeline`
+```
+agent-strace timeline [session-id] [--format text|json] [--model MODEL]
+```
+Structured phase-by-phase view with tool calls, errors, retries, and cost per phase.
+
+| Flag | Default | Description |
+|---|---|---|
+| `--format` | `text` | `text` or `json` |
+| `--model` | `sonnet` | Pricing model: `sonnet`, `opus`, `haiku`, `gpt4`, `gpt4o` |
+
+### `why`
+```
+agent-strace why [session-id] <event-number>
+```
+Trace the causal chain backwards from event `#N`. Run `replay` first to see event numbers.
+
+### `cost`
+```
+agent-strace cost [session-id] [--model MODEL] [--input-price N] [--output-price N]
+```
+Token and dollar cost by phase. Flags wasted spend on failed phases.
+
+| Flag | Default | Description |
+|---|---|---|
+| `--model` | `sonnet` | `sonnet`, `opus`, `haiku`, `gpt4`, `gpt4o` |
+| `--input-price` | — | Custom input price per 1M tokens (requires `--output-price`) |
+| `--output-price` | — | Custom output price per 1M tokens (requires `--input-price`) |
+
+### `diff`
+```
+agent-strace diff <session-a> <session-b> [--semantic] [--compare]
+```
+Compare two sessions structurally.
+
+| Flag | Description |
+|---|---|
+| `--semantic` | Compare by outcome, not event order |
+| `--compare` | Side-by-side table with verdict (cost, duration, tools, errors) |
+
+### `compare`
+```
+agent-strace compare [session-id-a] [session-id-b] [--tag TAG] [--format text|json]
+```
+Regression report with verdict. `--tag` compares the last two sessions whose name contains the tag.
+
+### `token-budget`
+```
+agent-strace token-budget <session-id> [--model MODEL] [--warn-at PCT]
+```
+Check token usage against model context limit.
+
+---
+
+## Control and protection
+
+### `watch`
+```
+agent-strace watch [session-id] [--timeout DURATION] [--budget $N] [--on-violation ACTION]
+                   [--on-death CMD] [--rules FILE] [--stream-to URL]
+                   [--stream-batch-size N] [--stream-flush-interval S]
+                   [--max-context-pct N] [--dry-run]
+```
+Live session monitor with kill-switch rules.
+
+| Flag | Description |
+|---|---|
+| `--timeout DURATION` | Kill after duration (e.g. `30m`, `2h`) |
+| `--budget $N` | Kill when spend exceeds N dollars |
+| `--on-violation kill\|pause\|alert` | Action when a rule fires |
+| `--on-death CMD` | Command to run after kill (receives `{post_mortem_path}`) |
+| `--rules FILE` | JSON rules file |
+| `--stream-to URL` | Stream events to HTTP endpoint in real-time |
+| `--dry-run` | Evaluate rules without acting |
+
+**Rules file format** (`.watch-rules.json`):
+```json
+[
+  { "condition": "cost_usd", "threshold": 0.50, "action": "kill" },
+  { "condition": "file_path", "glob": "**/production.env", "action": "kill" },
+  { "condition": "files_modified", "threshold": 30, "action": "pause" }
+]
+```
+
+### `audit`
+```
+agent-strace audit [session-id] [--policy FILE]
+```
+Check tool calls against a policy file. Flags sensitive file access even without a policy. Exits 1 on violations.
+
+**Policy file** (`.agent-scope.json`):
+```json
+{
+  "files": {
+    "read":  { "allow": ["src/**", "tests/**"], "deny": [".env"] },
+    "write": { "allow": ["src/**"], "deny": [".github/**"] }
+  },
+  "commands": {
+    "allow": ["pytest", "uv run", "cat"],
+    "deny":  ["curl", "wget", "rm -rf"]
+  },
+  "network": { "deny_all": true, "allow": ["localhost"] }
+}
+```
+
+### `policy`
+```
+agent-strace policy [--last N] [--output FILE]
+```
+Generate `.agent-scope.json` from observed traces. Review and tighten before committing.
+
+### `audit-tools`
+```
+agent-strace audit-tools [--repo .] [--since DATE] [--approved TOOLS]
+```
+Detect shadow MCP servers and undeclared agent activity. No network calls, no API keys.
+
+### `postmortem`
+```
+agent-strace postmortem [session-id]
+```
+View the watchdog post-mortem for a killed session.
+
+---
+
+## Analysis across sessions
+
+### `dashboard`
+```
+agent-strace dashboard [--last N] [--since DATE] [--html FILE] [--trend]
+```
+Multi-session aggregate view. `--trend` shows eval quality and behavioral metrics over time.
+
+```bash
+# Add a timeline annotation (appears as a vertical marker on trend charts)
+agent-strace dashboard annotate --date 2026-05-10 --note "Added retry policy"
+```
+
+### `drift`
+```
+agent-strace drift [--since DURATION] [--baseline FILE] [--save-baseline FILE]
+                   [--threshold N] [--format text|json]
+```
+Detect behavioral drift across sessions. Exits non-zero when drift score exceeds `--threshold` (default: 0.20).
+
+### `lint`
+```
+agent-strace lint [session-id] [--all] [--since DURATION] [--strict] [--format text|json]
+```
+Flag bad behavior patterns: tool loops, reasoning spirals, budget proximity, context saturation, redundant reads, error-retry loops, no-output sessions.
+
+`--strict` exits 1 on any WARN or ERROR. Configure rules via `.agent-strace-lint.json`.
+
+### `eval`
+```
+agent-strace eval [session-id] [--format text|json]
+agent-strace eval compare <session-a> <session-b>
+agent-strace eval ci
+```
+Score a session against configurable criteria. `eval ci` exits non-zero if any scorer fails.
+
+Configure scorers in `.agent-evals.yaml`:
+```yaml
+scorers:
+  - type: no_errors
+  - type: cost_under
+    max_dollars: 0.10
+  - type: files_scoped
+    allowed_paths: ["src/", "tests/"]
+```
+
+### `budget-report`
+```
+agent-strace budget-report [--since DATE] [--until DATE] [--format text|markdown|json]
+```
+Weekly spend digest: total cost, top sessions, cost by tool, watchdog savings.
+
+### `standup`
+```
+agent-strace standup [--session SESSION_ID]
+```
+Structured standup from a session trace. No LLM required. Covers files touched, approaches tried, dependencies added, TODOs written.
+
+### `freshness`
+```
+agent-strace freshness [--since DATE] [--scope GLOB]
+```
+Check how stale the agent's last view of the codebase is. Reports files changed since last session and a freshness score 0–100.
+
+### `oncall`
+```
+agent-strace oncall --rotation-start DATE [--scope GLOB]
+```
+Cross-reference agent-modified files against git history to find gaps before a rotation.
+
+### `curve`
+```
+agent-strace curve [--min-sessions N] [--export csv]
+```
+Personal agent cost-efficiency curve by task type. Verdict per type: efficient / over sweet spot / do this yourself.
+
+### `inflation`
+```
+agent-strace inflation [--compare MODELS] [--sessions N]
+```
+Measure tokenizer cost impact of switching model versions. No API calls required.
+
+### `optimize`
+```
+agent-strace optimize [--target FILE] [--dataset NAME] [--apply]
+                      [--base-url URL] [--model MODEL] [--api-key KEY]
+```
+Cluster failures by root cause and propose additions to `AGENTS.md` or any instruction file. Three built-in heuristic patterns require no LLM.
+
+### `config-watch`
+```
+agent-strace config-watch snapshot [--label TEXT] [--watch PATH]
+agent-strace config-watch check [--format text|json] [--watch PATH]
+agent-strace config-watch history [--format text|json]
+agent-strace config-watch affected [--since DURATION] [--format text|json]
+```
+Track changes to `AGENTS.md` and other config files. `check` exits 1 when config has changed (CI gate).
+
+---
+
+## Export and integration
+
+### `export`
+```
+agent-strace export [session-id] [--format json|csv|ndjson|otlp|otlp-genai]
+                    [--endpoint URL] [--header KEY:VALUE] [--service-name NAME]
+                    [--anonymize] [--scores] [--metrics] [--backend otlp|langfuse]
+                    [--since DURATION] [--langfuse-host URL]
+```
+Export a session (the latest one when no ID is given), or every session within `--since`. See [production.md](production.md) for per-backend OTLP setup.
+
+### `share`
+```
+agent-strace share <session-id> [-o FILE]
+```
+Generate a self-contained HTML report. No server needed.
+
+### `sample`
+```
+agent-strace sample [--strategy worst|diverse|recent|random] [--n N]
+                    [--deduplicate] [--seed N] [--output FILE]
+```
+Export sessions as JSONL for eval datasets. Compatible with LangSmith, Braintrust, and custom eval frameworks.
+
+### `server`
+```
+agent-strace server [--port N] [--storage DIR]
+```
+Start a server-side event collector. See [server.md](server.md).
+
+### `auto`
+```
+agent-strace auto [--framework NAME] [--detect] -- <command>
+```
+Run a command with auto-instrumentation. See [integrations.md](integrations.md).
+
+### `mcp`
+```
+agent-strace mcp [--transport stdio|http] [--port N]
+```
+Start an MCP server that exposes your session store as queryable tools for a debugging agent.
+
+### `a2a-tree`
+```
+agent-strace a2a-tree [session-id] [--format text|json]
+```
+Visualise the A2A agent call graph. Exports as OTLP spans for Jaeger, Tempo, or any OpenTelemetry backend.
+
+---
+
+## Annotations and metadata
+
+### `annotate`
+```
+agent-strace annotate <session-id> <event-offset> [--note TEXT] [--label TEXT]
+                      [--bookmark] [--list] [--delete ANNOTATION_ID]
+```
+Add notes, labels, and bookmarks to session events. Annotations appear in shared HTML reports.
+
+### `retention`
+```
+agent-strace retention status
+agent-strace retention clean [--dry-run] [--max-age-days N] [--max-sessions N] [--max-size-mb N]
+```
+Enforce data retention policies. Configure via `.agent-strace.yaml`:
+```yaml
+retention:
+  max_age_days: 30
+  max_sessions: 1000
+  max_size_mb: 500
+  on_delete: log
+```
diff --git a/docs/integrations.md b/docs/integrations.md
new file mode 100644
index 0000000..7e964fb
--- /dev/null
+++ b/docs/integrations.md
@@ -0,0 +1,136 @@
+# Auto-instrumentation
+
+Instrument any supported agent framework without modifying application code.
+
+---
+
+## Quick start
+
+```bash
+# Instrument a specific framework
+agent-strace auto --framework langchain -- python my_agent.py
+
+# Auto-detect all installed frameworks
+agent-strace auto --detect -- python my_agent.py
+
+# Via environment variable (no CLI wrapper needed)
+AGENT_STRACE_AUTO_INSTRUMENT=langchain,litellm python my_agent.py
+```
+
+Or in code:
+
+```python
+from agent_trace.integrations import instrument_langchain
+instrument_langchain()
+```
+
+---
+
+## Supported frameworks
+
+| Framework | Install | What's traced |
+|---|---|---|
+| OpenAI Agents SDK | `pip install agent-strace[openai-agents]` | `Runner.run`, `FunctionTool` calls |
+| LangChain / LangGraph | `pip install agent-strace[langchain]` | `BaseTool._run`, `BaseChatModel._generate` |
+| LiteLLM | `pip install agent-strace[litellm]` | `litellm.completion` |
+| Anthropic SDK | `pip install anthropic` | `messages.create` |
+| OpenAI SDK | `pip install openai` | `chat.completions.create` |
+| AWS Strands | `pip install agent-strace[strands]` | `Agent.__call__`, `BaseTool.invoke` |
+
+Install all integrations at once:
+
+```bash
+pip install agent-strace[all-integrations]
+```
+
+Each integration is an optional extra — the core package stays dependency-free. See [ADR-0003](../ADRs/0003-zero-runtime-dependencies.md).
+
+---
+
+## OpenAI Agents SDK
+
+```python
+from agent_trace.integrations import instrument_openai_agents
+instrument_openai_agents()
+
+# Now use the SDK normally
+from agents import Agent, Runner
+agent = Agent(name="my-agent", instructions="...")
+result = Runner.run_sync(agent, "Do the task")
+```
+
+Traces: `Runner.run`, `Runner.run_sync`, `Runner.run_streamed`, all `FunctionTool` calls.
+
+---
+
+## LangChain / LangGraph
+
+```python
+from agent_trace.integrations import instrument_langchain
+instrument_langchain()
+
+# Now use LangChain normally
+from langchain_anthropic import ChatAnthropic
+llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")
+```
+
+Traces: `BaseTool._run`, `BaseChatModel._generate`, `BaseChatModel._stream`.
+
+---
+
+## LiteLLM
+
+```python
+from agent_trace.integrations import instrument_litellm
+instrument_litellm()
+
+import litellm
+response = litellm.completion(model="gpt-4o", messages=[...])
+```
+
+Traces: `litellm.completion`, `litellm.acompletion`.
+
+---
+
+## Anthropic SDK
+
+```python
+from agent_trace.integrations import instrument_anthropic
+instrument_anthropic()
+
+import anthropic
+client = anthropic.Anthropic()
+message = client.messages.create(model="claude-3-5-sonnet-20241022", ...)
+```
+
+Traces: `messages.create`, `messages.stream`.
+
+---
+
+## OpenAI SDK
+
+```python
+from agent_trace.integrations import instrument_openai
+instrument_openai()
+
+from openai import OpenAI
+client = OpenAI()
+response = client.chat.completions.create(model="gpt-4o", messages=[{"role": "user", "content": "Hello"}])
+```
+
+Traces: `chat.completions.create`, `chat.completions.stream`.
+
+---
+
+## AWS Strands
+
+```python
+from agent_trace.integrations import instrument_strands
+instrument_strands()
+
+from strands import Agent
+agent = Agent()  # pass tools=[...] to register your own tools
+result = agent("Do the task")
+```
+
+Traces: `Agent.__call__`, `BaseTool.invoke`.
diff --git a/docs/production.md b/docs/production.md
new file mode 100644
index 0000000..543cb21
--- /dev/null
+++ b/docs/production.md
@@ -0,0 +1,136 @@
+# Production tracing (OTLP export)
+
+Export sessions as OpenTelemetry spans to your existing observability stack. Sessions become traces. Tool calls become spans with duration and inputs. Errors get exception events. No new dependencies.
+
+---
+
+## OTel GenAI semantic conventions
+
+Use `--format otlp-genai` to export with [OTel GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/). Backends that understand these conventions can then populate token usage charts, cost views, and LLM dashboards without custom attribute mapping.
+
+```bash
+agent-strace export <session-id> --format otlp-genai \
+  --endpoint http://localhost:4318
+```
+
+| Aspect | `--format otlp` | `--format otlp-genai` |
+|---|---|---|
+| LLM calls | Events on root span | `gen_ai.client.operation` child spans |
+| Tool calls | `tool/<name>` spans | `gen_ai.tool.call/<name>` spans |
+| Root span | `agent.name` attribute | `gen_ai.agent.id` + `gen_ai.agent.name` |
+| Errors | Custom error span | OTel `exception` event format |
+
+`--format otlp` is unchanged for backwards compatibility. See [ADR-0011](../ADRs/0011-otlp-genai-semantic-conventions.md) for design rationale.
+
+---
+
+## Per-backend setup
+
+### Datadog
+
+```bash
+# Via the Datadog Agent's OTLP receiver (port 4318)
+agent-strace export <session-id> --format otlp \
+  --endpoint http://localhost:4318
+
+# Via Datadog's OTLP intake directly
+agent-strace export <session-id> --format otlp \
+  --endpoint https://http-intake.logs.datadoghq.com:443 \
+  --header "DD-API-KEY: $DD_API_KEY"
+```
+
+### Honeycomb
+
+```bash
+agent-strace export <session-id> --format otlp \
+  --endpoint https://api.honeycomb.io \
+  --header "x-honeycomb-team: $HONEYCOMB_API_KEY" \
+  --service-name my-agent
+```
+
+### Grafana Tempo / Jaeger
+
+```bash
+# Local collector
+agent-strace export <session-id> --format otlp \
+  --endpoint http://localhost:4318
+```
+
+### New Relic
+
+```bash
+agent-strace export <session-id> --format otlp \
+  --endpoint https://otlp.nr-data.net \
+  --header "api-key: $NEW_RELIC_LICENSE_KEY"
+```
+
+### Splunk
+
+```bash
+agent-strace export <session-id> --format otlp \
+  --endpoint https://ingest.<realm>.signalfx.com \
+  --header "X-SF-Token: $SPLUNK_ACCESS_TOKEN"
+```
+
+### Langfuse
+
+Sessions appear as Traces, tool calls as Spans, LLM calls as Generations, and eval scores as Langfuse Scores.
+
+```bash
+export LANGFUSE_PUBLIC_KEY=pk-lf-...
+export LANGFUSE_SECRET_KEY=sk-lf-...
+
+# Export latest session with eval scores
+agent-strace export --scores --backend langfuse
+
+# Export last 7 days
+agent-strace export --since 7d --scores --backend langfuse
+
+# Self-hosted Langfuse
+agent-strace export --scores --backend langfuse \
+  --langfuse-host https://langfuse.your-domain.com
+```
+
+---
+
+## Behavioral metrics
+
+Export per-session behavioral metrics as OTLP gauge metrics:
+
+```bash
+export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
+
+agent-strace export --metrics --backend otlp --since 30d
+```
+
+| Metric | Description |
+|---|---|
+| `agent_strace.session.cost_usd` | Estimated cost per session |
+| `agent_strace.session.error_rate` | Errors / tool calls |
+| `agent_strace.session.retry_rate` | Consecutive same-tool retries / tool calls |
+| `agent_strace.session.blast_radius` | Distinct files written |
+| `agent_strace.session.duration_s` | Wall-clock session duration |
+| `agent_strace.eval.score` | Judge score per session |
+
+---
+
+## How sessions map to OpenTelemetry
+
+| agent-trace | OpenTelemetry |
+|---|---|
+| session | trace |
+| tool_call + tool_result | span (with duration) |
+| error | span with error status + exception event |
+| user_prompt | event on root span |
+| assistant_response | event on root span |
+| session_id | trace ID |
+| event_id | span ID |
+| parent_id | parent span ID |
+
+---
+
+## Inspect the OTLP payload
+
+```bash
+agent-strace export <session-id> --format otlp > trace.json
+```
diff --git a/docs/security.md b/docs/security.md
new file mode 100644
index 0000000..cd09d64
--- /dev/null
+++ b/docs/security.md
@@ -0,0 +1,180 @@
+# Security
+
+agent-strace provides three complementary mechanisms for keeping sensitive data out of traces: secret redaction and PII masking (both at capture time) and trace anonymization (at export time). A policy file lets you audit and restrict what the agent is allowed to do.
+
+---
+
+## Secret redaction
+
+Strips API keys, tokens, and credentials before they hit disk. The original data is never stored.
+
+```bash
+# Enable redaction when capturing
+agent-strace record --redact -- npx -y @modelcontextprotocol/server-filesystem /tmp
+agent-strace record-http https://mcp.example.com --redact
+
+# Or via setup (Claude Code hooks)
+agent-strace setup --redact
+```
+
+Detected patterns:
+
+| Secret type | Pattern |
+|---|---|
+| OpenAI API keys | `sk-*` |
+| GitHub tokens | `ghp_*`, `github_pat_*` |
+| AWS credentials | `AKIA*`, `aws_secret_access_key` |
+| Anthropic API keys | `sk-ant-*` |
+| Slack tokens | `xox*` |
+| JWTs | Three base64 segments separated by `.` |
+| Bearer tokens | `Bearer [A-Za-z0-9+/=]{20,}` |
+| Connection strings | `postgres://`, `mysql://`, `mongodb://` |
+| Key-named values | Any value under keys: `password`, `secret`, `token`, `api_key`, `authorization` |
+| EKM shared secrets | 64-char hex strings |
+| Private keys | PEM blocks |
+
+Redacted values become `[REDACTED]`. See [ADR-0007](../ADRs/0007-heuristic-redaction.md) for design rationale.
+
+### Custom patterns
+
+```bash
+agent-strace setup --redact --redact-pattern "ATTESTATION_KEY=[A-Fa-f0-9]{64}"
+```
+
+---
+
+## PII masking
+
+Masks personally identifiable information (PII) before it hits disk. Masking is separate from secret redaction; enable both for maximum coverage.
+
+```bash
+agent-strace record --mask -- npx -y @modelcontextprotocol/server-filesystem /tmp
+agent-strace record-http https://mcp.example.com --mask
+```
+
+Masked by default: email addresses, phone numbers, credit card numbers, US Social Security Numbers, AWS ARNs.
+
+Call `mask_event_data()` directly to sanitise events from an existing session before sharing or exporting:
+
+```python
+from agent_trace.masking import mask_event_data
+sanitised = mask_event_data(event)
+```
+
+---
+
+## Trace anonymization
+
+Strip identifying information from traces at export time. Original session data is never modified.
+
+```bash
+# Preview what would be anonymized
+agent-strace export SESSION_ID --anonymize --dry-run
+
+# Export with anonymization applied
+agent-strace export SESSION_ID --anonymize --output trace-anon.json
+```
+
+Anonymized by default:
+- Home directory paths → `~/relative/path`
+- Hostnames → `<hostname>`
+- OS usernames → `<user>`
+- Email addresses → `<email>`
+
+Add custom patterns via `.agent-strace/anonymize.yaml`:
+
+```yaml
+rules:
+  - pattern: "ACME Corp"
+    replacement: "<company>"
+  - pattern: "192\\.168\\.\\d+\\.\\d+"
+    replacement: "<internal-ip>"
+```
+
+---
+
+## Policy files
+
+Audit and restrict what the agent is allowed to do. `agent-strace audit` exits with status 1 on violations, so it can gate CI.
+
+```bash
+agent-strace audit                          # latest session, no policy required
+agent-strace audit abc123 --policy .agent-scope.json
+
+# CI gate
+agent-strace audit --policy .agent-scope.json || exit 1
+```
+
+**Policy file** (`.agent-scope.json`):
+
+```json
+{
+  "files": {
+    "read":  { "allow": ["src/**", "tests/**"], "deny": [".env"] },
+    "write": { "allow": ["src/**"], "deny": [".github/**"] }
+  },
+  "commands": {
+    "allow": ["pytest", "uv run", "cat"],
+    "deny":  ["curl", "wget", "rm -rf"]
+  },
+  "network": { "deny_all": true, "allow": ["localhost"] }
+}
+```
+
+Glob patterns support `**` as a recursive wildcard. File read policy applies to `Read`, `View`, `Grep`, and `Glob` tool calls. Network policy checks URLs embedded in `Bash` commands.
+
+### Auto-generate a policy from traces
+
+```bash
+# Dry-run: print the suggested policy
+agent-strace policy
+
+# Write it to disk
+agent-strace policy --output .agent-scope.json
+
+# Base the policy on the last 20 sessions
+agent-strace policy --last 20 --output .agent-scope.json
+```
+
+---
+
+## Shadow AI detection
+
+Detect undeclared agent activity and Shadow MCP servers in any repo. No network calls, no API keys.
+
+```bash
+agent-strace audit-tools
+agent-strace audit-tools --repo . --since "90 days ago" --approved cursor,copilot
+```
+
+Detected tools: Claude Code, Cursor, GitHub Copilot, Codex/ChatGPT, Windsurf, Aider, Gemini CLI. Identified via file signals (`.cursorrules`, `CLAUDE.md`, `.github/copilot-instructions.md`, etc.) and commit message patterns.
+
+---
+
+## Recommended setup for sensitive repos
+
+Commit `.claude/settings.json` to the repo root so every developer gets the same instrumentation:
+
+```json
+{
+  "hooks": {
+    "PreToolUse": [{
+      "matcher": ".*",
+      "hooks": [{ "type": "command", "command": "agent-strace hook pre-tool" }]
+    }],
+    "PostToolUse": [{
+      "matcher": ".*",
+      "hooks": [{ "type": "command", "command": "agent-strace hook post-tool" }]
+    }]
+  }
+}
+```
+
+Or use the setup command:
+
+```bash
+cd your-sensitive-repo
+agent-strace setup --redact
+```
+
+Combine with [agentic-authz](https://github.com/Siddhant-K-code/agentic-authz) to block agents from security-critical components entirely, and use agent-strace to audit everything they do access.
diff --git a/docs/server.md b/docs/server.md
new file mode 100644
index 0000000..4b34d62
--- /dev/null
+++ b/docs/server.md
@@ -0,0 +1,79 @@
+# Server-side event collector
+
+Run a central collector so agents in containers, CI, and serverless functions can send traces over the network — no local disk required.
+
+See [ADR-0012](../ADRs/0012-server-side-event-collector.md) for design rationale.
+
+---
+
+## Quick start
+
+```bash
+# Start the collector
+agent-strace server --port 4317 --storage ./traces
+
+# Agents point to it via environment variable — no code changes required
+AGENT_STRACE_ENDPOINT=http://collector:4317 python my_agent.py
+```
+
+The server writes traces in the same `.agent-traces/` format as local mode. All existing CLI commands work against its storage directory.
+
+---
+
+## Docker
+
+```dockerfile
+FROM python:3.12-slim
+RUN pip install agent-strace
+ENV AGENT_STRACE_STORAGE=/data
+VOLUME /data
+EXPOSE 4317
+CMD ["agent-strace", "server", "--port", "4317"]
+```
+
+```bash
+docker build -t agent-strace-server .
+docker run -p 4317:4317 -v "$(pwd)/traces:/data" agent-strace-server
+```
+
+---
+
+## API reference
+
+| Method | Path | Description |
+|---|---|---|
+| `POST` | `/events` | Receive a batch of NDJSON events |
+| `POST` | `/sessions` | Create or update session metadata |
+| `GET` | `/sessions` | List all sessions |
+| `GET` | `/sessions/<id>/events` | Stream events for a session |
+| `GET` | `/health` | Liveness check |
+
+Events are accepted as NDJSON (`application/x-ndjson`), one event per line.
+
+---
+
+## Multi-agent correlation
+
+When multiple agents send to the same collector, sessions are linked via `parent_session_id` and `parent_event_id` in session metadata. Use `agent-strace replay --tree` or `agent-strace a2a-tree` to visualise the full call graph.
+
+---
+
+## Security note
+
+The v1 collector has no authentication and is intended for internal or private networks only. Put it behind a reverse proxy (nginx, Caddy) that handles authentication and TLS.
+
+---
+
+## Live streaming from watch
+
+Stream events to the collector in real time during a watched session:
+
+```bash
+agent-strace watch \
+  --stream-to http://collector:4317/events \
+  --stream-batch-size 20 \
+  --stream-flush-interval 5.0 \
+  SESSION_ID
+```
+
+HTTP failures are logged to stderr but never interrupt the watch loop.
diff --git a/docs/setup.md b/docs/setup.md
new file mode 100644
index 0000000..5fc0ac3
--- /dev/null
+++ b/docs/setup.md
@@ -0,0 +1,167 @@
+# Setup
+
+Three ways to capture agent sessions. Pick the one that matches your agent.
+
+---
+
+## Option 1: Claude Code hooks (recommended)
+
+Captures everything: user prompts, assistant responses, and every tool call (Bash, Edit, Write, Read, Agent, Grep, Glob, WebFetch, WebSearch, all MCP tools).
+
+```bash
+# Generate the hooks config (printed as JSON)
+agent-strace setup
+
+# For all projects (global config)
+agent-strace setup --global
+
+# With secret redaction enabled
+agent-strace setup --redact
+```
+
+`agent-strace setup` prints the hooks JSON. Add it to `.claude/settings.json`:
+
+```json
+{
+  "hooks": {
+    "UserPromptSubmit": [{ "hooks": [{ "type": "command", "command": "agent-strace hook user-prompt" }] }],
+    "PreToolUse": [{ "matcher": "", "hooks": [{ "type": "command", "command": "agent-strace hook pre-tool" }] }],
+    "PostToolUse": [{ "matcher": "", "hooks": [{ "type": "command", "command": "agent-strace hook post-tool" }] }],
+    "PostToolUseFailure": [{ "matcher": "", "hooks": [{ "type": "command", "command": "agent-strace hook post-tool-failure" }] }],
+    "Stop": [{ "hooks": [{ "type": "command", "command": "agent-strace hook stop" }] }],
+    "SessionStart": [{ "hooks": [{ "type": "command", "command": "agent-strace hook session-start" }] }],
+    "SessionEnd": [{ "hooks": [{ "type": "command", "command": "agent-strace hook session-end" }] }]
+  }
+}
+```
+
+Then use Claude Code normally. Sessions appear in `.agent-traces/`.
+
+```bash
+agent-strace list     # list sessions
+agent-strace replay   # replay the latest
+agent-strace explain  # plain-English summary
+```
+
+### Import existing sessions
+
+Already ran sessions without hooks? Import from Claude Code's native JSONL logs:
+
+```bash
+# Discover available sessions
+agent-strace import --discover
+
+# Import a specific session
+agent-strace import ~/.claude/projects/<project>/<session-id>.jsonl
+```
+
+---
+
+## Option 2: MCP proxy (any MCP client)
+
+Wraps any MCP server. Works with Cursor, Windsurf, or any MCP client that uses stdio transport.
+
+```bash
+# Wrap any MCP server
+agent-strace record -- npx -y @modelcontextprotocol/server-filesystem /tmp
+agent-strace replay
+```
+
+### Cursor
+
+Edit `~/.cursor/mcp.json` (global) or `.cursor/mcp.json` (per-project):
+
+```json
+{
+  "mcpServers": {
+    "filesystem": {
+      "command": "agent-strace",
+      "args": ["record", "--name", "filesystem", "--", "npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
+    }
+  }
+}
+```
+
+### Windsurf
+
+Edit `~/.codeium/windsurf/mcp_config.json`:
+
+```json
+{
+  "mcpServers": {
+    "filesystem": {
+      "command": "agent-strace",
+      "args": ["record", "--name", "filesystem", "--", "npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
+    }
+  }
+}
+```
+
+### Any MCP client (general pattern)
+
+1. Replace the server `command` with `agent-strace`
+2. Prepend `record --name <label> --` to the original args
+3. Use the tool normally
+4. Run `agent-strace replay` to see what happened
+
+### HTTP/SSE proxy
+
+For MCP servers that use HTTP transport:
+
+```bash
+agent-strace record-http https://mcp.example.com --port 3100
+# Your agent connects to http://127.0.0.1:3100
+```
+
+---
+
+## Option 3: Python decorator
+
+Wraps your tool functions directly. No MCP required.
+
+```python
+from agent_trace import trace_tool, trace_llm_call, start_session, end_session, log_decision
+
+start_session(name="my-agent")  # add redact=True to strip secrets
+
+@trace_tool
+def search_codebase(query: str) -> str:
+    return search(query)  # search() stands in for your own implementation
+
+@trace_llm_call
+def call_llm(messages: list, model: str = "claude-4") -> str:
+    return client.chat(messages=messages, model=model)  # client is your own LLM client
+
+log_decision(
+    choice="read_file_first",
+    reason="Need to understand current implementation before making changes",
+    alternatives=["read_file_first", "search_codebase", "write_fix_directly"],
+)
+
+search_codebase("authenticate")
+call_llm([{"role": "user", "content": "Fix the bug"}])
+
+meta = end_session()
+print(f"Replay with: agent-strace replay {meta.session_id}")
+```
+
+---
+
+## Security-sensitive repos
+
+For repos that handle secrets, attestation logic, or cryptographic material:
+
+```bash
+cd your-sensitive-repo
+agent-strace setup --redact
+```
+
+This enables automatic redaction of API keys, tokens, and credentials before they hit disk. Detected patterns: OpenAI (`sk-*`), GitHub (`ghp_*`, `github_pat_*`), AWS (`AKIA*`), Anthropic (`sk-ant-*`), Slack (`xox*`), JWTs, Bearer tokens, connection strings, and any value under keys like `password`, `secret`, `token`, `api_key`, `authorization`.
+
+Add custom patterns:
+
+```bash
+agent-strace setup --redact --redact-pattern "ATTESTATION_KEY=[A-Fa-f0-9]{64}"
+```
+
+See [security.md](security.md) for the full security guide.
diff --git a/docs/vscode.md b/docs/vscode.md
new file mode 100644
index 0000000..a0964d4
--- /dev/null
+++ b/docs/vscode.md
@@ -0,0 +1,105 @@
+# VS Code extension
+
+The **agent-strace** extension shows live session activity without leaving the editor. Works in VS Code, Cursor, and any Open VSX-compatible editor.
+
+- [Open VSX](https://open-vsx.org/extension/Siddhant-K-code/agent-strace)
+- [VS Marketplace](https://marketplace.visualstudio.com/items?itemName=Siddhant-K-code.agent-strace)
+
+---
+
+## Features
+
+| Feature | Description |
+|---|---|
+| Status bar | Live cost, tool call count, and active tool name. Click to open the event stream. |
+| Gutter annotations | Blue border on files the agent read, amber on files it modified. Inline label shows read/write counts. |
+| Event stream panel | Live feed in the Explorer sidebar: every tool call, file op, LLM request, and error. |
+| Pause button | Stops the agent mid-session via SIGSTOP. Requires `agent-strace watch` running in a terminal. |
+| Watchdog status bar | Polls the active session for cost and tool count; updates every 5 seconds (configurable). |
+| Post-mortem viewer | Auto-opens when a session is killed by the watchdog. Shows kill reason, cost at death, and a copyable recovery context. |
+| Session browser | Explorer sidebar tree listing all sessions with timestamp, duration, tool calls, and error count. |
+
+---
+
+## Setup
+
+```bash
+# 1. Install agent-strace
+pip install agent-strace
+
+# 2. Add hooks to Claude Code (one-time)
+agent-strace setup
+
+# 3. Open your project in VS Code / Cursor
+# The extension activates automatically when .agent-traces/ exists
+
+# 4. Start Claude Code — the status bar item appears immediately
+```
+
+The extension activates automatically when a `.agent-traces/` directory exists in the workspace root. No configuration required.
+
+---
+
+## Commands
+
+All commands are available from the Command Palette (`Cmd/Ctrl+Shift+P`):
+
+| Command | Description |
+|---|---|
+| `agent-trace: Open Live Stream` | Open the event stream panel |
+| `agent-trace: Open Post-Mortem` | View the watchdog post-mortem for the latest killed session |
+| `agent-trace: Refresh Session Browser` | Reload the session list in the Explorer sidebar |
+| `agent-trace: Reveal Session` | Jump to a session in the browser |
+| `agent-trace: Pause Agent` | Send SIGSTOP to the agent process (requires `watch` running) |
+| `agent-trace: Resume Agent` | Send SIGCONT to resume a paused agent |
+| `agent-trace: Open Panel` | Open the main agent-strace panel |
+| `agent-trace: Clear Decorations` | Remove all gutter annotations from the editor |
+
+---
+
+## Pause / resume
+
+The pause button requires `agent-strace watch` running in a separate terminal:
+
+```bash
+# In a separate terminal, start the watcher
+agent-strace watch
+
+# Then use the Pause button in the event stream panel,
+# or run: agent-trace: Pause Agent from the command palette
+```
+
+Pausing sends SIGSTOP to the agent process, which freezes it. Resume with the Resume command or by sending SIGCONT.
+
+---
+
+## Settings
+
+All settings are under `agentTrace.*` in VS Code settings:
+
+| Setting | Default | Description |
+|---|---|---|
+| `agentTrace.traceDir` | `.agent-traces` | Path to trace directory, relative to workspace root |
+| `agentTrace.collectorEndpoint` | `""` | Remote collector URL (leave empty for local mode) |
+| `agentTrace.watchdogPollIntervalSeconds` | `5` | How often (in seconds) the status bar polls for cost/tool updates |
+| `agentTrace.sessionBrowserRefreshInterval` | `5` | How often (in seconds) the session browser refreshes |
+| `agentTrace.showGutterAnnotations` | `true` | Show gutter icons on files the agent read or modified |
+| `agentTrace.showInlineText` | `true` | Show inline read/write counts at the top of agent-touched files |
+
+---
+
+## Post-mortem viewer
+
+When `agent-strace watch` kills a session, it writes a `watchdog-postmortem.json` file to the session directory. The extension detects this file and offers to open the post-mortem viewer automatically.
+
+The viewer shows:
+- Kill reason (timeout, budget, rule)
+- Elapsed time and cost at death
+- Last tool call and LLM response
+- Recovery context (copyable, for pasting into a new session)
+
+---
+
+## Session browser
+
+The session browser appears in the Explorer sidebar when `.agent-traces/` exists. It lists all sessions with timestamp, duration, tool calls, and error count. Click any session to open a summary. Use `Reveal Session` to jump to a specific session by ID prefix.