From d3783ae917825c8bcb04aba9943af8e57afe8f20 Mon Sep 17 00:00:00 2001
From: Siddhant Khare
Date: Sat, 23 May 2026 08:56:40 +0000
Subject: [PATCH] =?UTF-8?q?docs:=20overhaul=20README=20=E2=80=94=20landing?=
 =?UTF-8?q?=20page=20structure,=20move=20detail=20to=20docs/?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

README was 1911 lines with 93 sections. Rewrote as a landing page:
install, quick start, command table, production setup.
Moved detailed docs to docs/ directory (setup, commands, production,
server, integrations, vscode, security).

Co-authored-by: Ona
---
 AGENTS.md            |   31 +-
 README.md            | 1899 ++----------------------------------------
 docs/commands.md     |  384 +++++++++
 docs/integrations.md |  136 +++
 docs/production.md   |  136 +++
 docs/security.md     |  180 ++++
 docs/server.md       |   79 ++
 docs/setup.md        |  167 ++++
 docs/vscode.md       |  105 +++
 9 files changed, 1301 insertions(+), 1816 deletions(-)
 create mode 100644 docs/commands.md
 create mode 100644 docs/integrations.md
 create mode 100644 docs/production.md
 create mode 100644 docs/security.md
 create mode 100644 docs/server.md
 create mode 100644 docs/setup.md
 create mode 100644 docs/vscode.md

diff --git a/AGENTS.md b/AGENTS.md
index 361571e..7252246 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -8,7 +8,7 @@ This file tells AI coding agents how to work with the agent-strace repository.
 src/agent_trace/   Core library — one module per feature
 tests/             One test file per module (test_.py)
 ADRs/              Architecture Decision Records — read before adding dependencies
-docs/              Integration guides
+docs/              User-facing documentation (setup, commands, production, integrations, security)
 examples/          Usage examples for each integration
 pyproject.toml     Package config and optional extras
 ```
@@ -41,15 +41,36 @@ python -m pytest tests/test_watch.py -v
 3. Import and register it in `cli.py`
 4. Add new `EventType` values to `models.py` if needed
 5. Write tests in `tests/test_.py`
-6. Update README.md with the new command and an example
+6. Add the command to the appropriate table in `README.md` (one line, linked to docs/)
+7. Add full flag reference to `docs/commands.md`
+8. If the feature involves production export, update `docs/production.md`
+9. If the feature involves a new integration, update `docs/integrations.md`

 ## Version bumping

-- New feature (new CLI command, new integration, new flag): bump minor (`0.38.1` → `0.39.0`)
-- Bug fix or small improvement: bump patch (`0.38.1` → `0.38.2`)
+Current version: `0.51.0` in `src/agent_trace/__init__.py`.
+
+- New feature (new command, new flag, new integration): bump minor (`0.51.0` → `0.52.0`)
+- Bug fix or small improvement: bump patch (`0.51.0` → `0.51.1`)
 - Breaking change to CLI or storage format: bump major — check with maintainer first

-Version is in `src/agent_trace/__init__.py`.
+## docs/ structure
+
+The `docs/` directory contains user-facing documentation. Keep these files current when adding features:
+
+| File | Contents |
+|---|---|
+| `docs/setup.md` | Full setup for all 3 integration paths, complete JSON configs |
+| `docs/commands.md` | Full flag reference for every command |
+| `docs/production.md` | Per-backend OTLP setup (Datadog, Honeycomb, Grafana, etc.) |
+| `docs/server.md` | Server-side collector setup, Docker, API reference |
+| `docs/integrations.md` | Auto-instrumentation for each framework |
+| `docs/vscode.md` | VS Code extension docs (setup, commands, settings) |
+| `docs/security.md` | Secret redaction, PII anonymization, policy files |
+
+## README policy
+
+`README.md` is a landing page, not documentation. It must stay under 300 lines. Detailed content goes in `docs/`. Do not add detailed flag descriptions, long examples, or configuration dumps to `README.md`.

 ## ADRs to read before making architectural decisions

diff --git a/README.md b/README.md
index b1e134d..d70ea74 100644
--- a/README.md
+++ b/README.md
@@ -3,12 +3,12 @@
 [![Run in Ona](https://ona.com/run-in-ona.svg)](https://app.ona.com/#https://github.com/Siddhant-K-code/agent-trace)
 [![PyPI](https://img.shields.io/pypi/v/agent-strace)](https://pypi.org/project/agent-strace/)
 [![Python](https://img.shields.io/pypi/pyversions/agent-strace)](https://pypi.org/project/agent-strace/)
-[![License](https://img.shields.io/github/license/Siddhant-K-code/agent-trace)](LICENSE)
 [![CI](https://github.com/Siddhant-K-code/agent-trace/actions/workflows/test.yml/badge.svg)](https://github.com/Siddhant-K-code/agent-trace/actions/workflows/test.yml)
 [![Open VSX](https://img.shields.io/open-vsx/v/Siddhant-K-code/agent-strace)](https://open-vsx.org/extension/Siddhant-K-code/agent-strace)
-[![VS Marketplace](https://img.shields.io/badge/VS%20Marketplace-v0.1.2-blue?logo=visual-studio-code)](https://marketplace.visualstudio.com/items?itemName=Siddhant-K-code.agent-strace)
+[![VS Marketplace](https://img.shields.io/badge/VS%20Marketplace-v0.2.0-blue?logo=visual-studio-code)](https://marketplace.visualstudio.com/items?itemName=Siddhant-K-code.agent-strace)
+[![License](https://img.shields.io/github/license/Siddhant-K-code/agent-trace)](LICENSE)

-`strace` for AI agents. Capture and replay every tool call, prompt, and response from Claude Code, Cursor, Gemini CLI, or any MCP client. Analyse, diff, audit, and share what happened.
+`strace` for AI agents.

 ![demo](assets/demo.svg)

@@ -16,9 +16,7 @@

 A coding agent rewrites 20 files in a background session. You get a pull request. You do not get the story. Which files did it read first? Why did it call the same tool three times? What failed before it found the fix?

-Most tools trace LLM calls. That is one layer. The gap is everything around it: tool calls, file operations, decision points, error recovery, the actual commands the agent ran.
`agent-strace` captures the full session and lets you replay it later. Export to Datadog, Honeycomb, New Relic, or Splunk for production observability. - -Set rules to stop the agent: cost ceiling, wrong file touched, too many tool calls. The agent stops. No prompt, no retry, no damage. +Most tools trace LLM calls. That is one layer. The gap is everything around it: tool calls, file operations, decision points, error recovery, the actual commands the agent ran. `agent-strace` captures the full session and lets you replay it later. Export to Datadog, Honeycomb, New Relic, or Splunk for production observability. Set rules to stop the agent: cost ceiling, wrong file touched, too many tool calls. The agent stops. No prompt, no retry, no damage. ## Install @@ -35,1877 +33,156 @@ uvx agent-strace replay **Zero dependencies.** Python 3.10+ standard library only. -## VS Code / Cursor extension - -Install the **agent-strace** extension to see live session activity without leaving the editor. - -**Install:** -- Search `agent-strace` in the Extensions panel (VS Code, Cursor, or any Open VSX-compatible editor) -- Or install from [open-vsx.org/extension/Siddhant-K-code/agent-strace](https://open-vsx.org/extension/Siddhant-K-code/agent-strace) - -**What you get:** - -| Feature | Description | -|---|---| -| Status bar | Live cost, tool call count, and active tool name. Click to open the event stream. | -| Gutter annotations | Blue border on files the agent read, amber on files it modified. Inline label shows read/write counts. | -| Event stream panel | Live feed in the Explorer sidebar: every tool call, file op, LLM request, and error. | -| Pause button | Stops the agent mid-session via SIGSTOP. Requires `agent-strace watch` running in a terminal. | - -**Setup:** - -```bash -# 1. Install agent-strace -pip install agent-strace - -# 2. Add hooks to Claude Code (one-time) -agent-strace setup - -# 3. 
Open your project in VS Code / Cursor -# The extension activates automatically when .agent-traces/ exists - -# 4. Start Claude Code — the status bar item appears immediately -``` - -The extension activates automatically when a `.agent-traces/` directory exists in the workspace root. No configuration required. - -**Pause / resume** (optional, requires watch running): - -```bash -# In a separate terminal, start the watcher -agent-strace watch - -# Then use the Pause button in the event stream panel, -# or run: agent-trace: Pause Agent from the command palette -``` - ## Quick start -### Option 1: Claude Code hooks (full session capture) - -Captures everything: user prompts, assistant responses, and every tool call (Bash, Edit, Write, Read, Agent, Grep, Glob, WebFetch, WebSearch, all MCP tools). - -```bash -agent-strace setup # prints hooks config JSON -agent-strace setup --global # for all projects -``` - -Add the output to `.claude/settings.json`. Or paste it manually: - -```json -{ - "hooks": { - "UserPromptSubmit": [{ "hooks": [{ "type": "command", "command": "agent-strace hook user-prompt" }] }], - "PreToolUse": [{ "matcher": "", "hooks": [{ "type": "command", "command": "agent-strace hook pre-tool" }] }], - "PostToolUse": [{ "matcher": "", "hooks": [{ "type": "command", "command": "agent-strace hook post-tool" }] }], - "PostToolUseFailure": [{ "matcher": "", "hooks": [{ "type": "command", "command": "agent-strace hook post-tool-failure" }] }], - "Stop": [{ "hooks": [{ "type": "command", "command": "agent-strace hook stop" }] }], - "SessionStart": [{ "hooks": [{ "type": "command", "command": "agent-strace hook session-start" }] }], - "SessionEnd": [{ "hooks": [{ "type": "command", "command": "agent-strace hook session-end" }] }] - } -} -``` - -Then use Claude Code normally. 
+**Option 1: Claude Code hooks** — captures everything (prompts, responses, every tool call) ```bash -agent-strace list # list sessions -agent-strace replay # replay the latest -agent-strace explain # plain-English summary of what the agent did -agent-strace stats # tool call frequency and timing +agent-strace setup # prints hooks config — add to .claude/settings.json +agent-strace list # list sessions +agent-strace replay # replay the latest ``` -### Option 2: MCP proxy (any MCP client) +Full config and JSON: [docs/setup.md](docs/setup.md) -Wraps any MCP server. Works with Cursor, Windsurf, or any MCP client. +**Option 2: MCP proxy** — wraps any MCP server, works with Cursor and Windsurf ```bash agent-strace record -- npx -y @modelcontextprotocol/server-filesystem /tmp agent-strace replay ``` -### Option 3: Python decorator - -Wraps your tool functions directly. No MCP required. +**Option 3: Python decorator** — no MCP required ```python -from agent_trace import trace_tool, trace_llm_call, start_session, end_session, log_decision +from agent_trace import trace_tool, start_session, end_session -start_session(name="my-agent") # add redact=True to strip secrets +start_session(name="my-agent") @trace_tool def search_codebase(query: str) -> str: return search(query) -@trace_llm_call -def call_llm(messages: list, model: str = "claude-4") -> str: - return client.chat(messages=messages, model=model) - -# Log decision points explicitly -log_decision( - choice="read_file_first", - reason="Need to understand current implementation before making changes", - alternatives=["read_file_first", "search_codebase", "write_fix_directly"], -) +end_session() +``` -search_codebase("authenticate") -call_llm([{"role": "user", "content": "Fix the bug"}]) +Full setup guide: [docs/setup.md](docs/setup.md) -meta = end_session() -print(f"Replay with: agent-strace replay {meta.session_id}") -``` +## What you can do -## CLI commands +### Understand a session | Command | What it does | |---|---| 
-| `record` | Capture an MCP stdio session | -| `record-http` | Capture an MCP HTTP/SSE session | -| `replay` | Replay a session in the terminal or as HTML | -| `inspect` | Show raw events for a session | -| `stats` | Summary stats for a session | -| `eval` | Score a session against configurable criteria | -| `eval ci` | CI gate, exits non-zero if any scorer fails | -| `eval compare` | Compare two sessions side by side | -| `drift` | Detect behavioral drift across sessions | -| `optimize` | Propose AGENTS.md improvements from trace failures | -| `dashboard` | Aggregate view across sessions | -| `dashboard --trend` | Eval quality and behavioral metrics over time (HTML) | -| `export` | Export a session (JSON, CSV, OTLP, Langfuse) | -| `diff` | Semantic diff between two sessions | -| `why` | Causal chain for a tool call | -| `explain` | Plain-English session summary | -| `cost` | Estimate session cost | -| `standup` | Structured standup report from a session | -| `oncall` | On-call readiness report for agent-modified files | -| `freshness` | Context freshness check vs last session | -| `watch` | Live session monitor with kill-switch rules | -| `annotate` | Add annotations to session events | -| `audit-tools` | Shadow AI / MCP detection | -| `inflation` | Token inflation across model versions | -| `curve` | Personal cost-efficiency curve | -| `a2a-tree` | Cross-agent trace correlation (A2A protocol) | -| `mcp` | MCP server: expose traces as queryable tools for a debugging agent | -| `timeline` | Structured phase-by-phase view of a session with costs, retries, and wasted spend | -| `config-watch` | Snapshot and diff AGENTS.md and other config files; find affected sessions | - -``` -agent-strace setup [--redact] [--global] Generate Claude Code hooks config -agent-strace hook Handle a Claude Code hook event (internal) -agent-strace record -- Record an MCP stdio server session -agent-strace record-http [--port N] Record an MCP HTTP/SSE server session -agent-strace replay 
[session-id] Replay a session (default: latest) -agent-strace replay [session-id] --limit N Cap output at N events (fast inspection of large sessions) -agent-strace replay --format html [-o file] Export a self-contained HTML replay viewer -agent-strace replay --expand-subagents Inline subagent sessions under parent tool_call -agent-strace replay --tree Show session hierarchy without full replay -agent-strace list List all sessions -agent-strace explain [session-id] Explain a session in plain English -agent-strace stats [session-id] Show tool call frequency and timing -agent-strace stats --include-subagents Roll up stats across the full subagent tree -agent-strace inspect Dump full session as JSON -agent-strace export Export as JSON, CSV, NDJSON, or OTLP -agent-strace import Import a Claude Code JSONL session log -agent-strace cost [session-id] Estimate token cost for a session -agent-strace diff Compare two sessions structurally -agent-strace diff --compare Side-by-side table with verdict -agent-strace diff --semantic Compare sessions by outcome, not event order -agent-strace why [session-id] Trace the causal chain for an event -agent-strace audit [session-id] [--policy] Check tool calls against a policy file -agent-strace audit-tools [--repo .] 
[--approved] Detect Shadow MCP servers and undeclared agent activity in any repo -agent-strace policy [--output file] Generate .agent-scope.json from observed traces -agent-strace dashboard [--last N] [--html file] Aggregate stats and trends across sessions -agent-strace annotate Add notes, labels, or bookmarks to events -agent-strace token-budget Check token usage against model context limit -agent-strace replay [session-id] [--limit N] Replay a session (--limit caps events shown) -agent-strace retention status Show session count, size, and what policy would delete -agent-strace retention clean [--dry-run] Delete sessions that exceed retention limits -agent-strace sample --strategy worst --n 20 Export worst/diverse/random/recent sessions as JSONL -agent-strace export --format otlp-genai Export with OTel GenAI semantic conventions -agent-strace server [--port 4317] [--storage DIR] Start a server-side event collector -agent-strace auto [--framework NAME] -- Run a command with auto-instrumentation -agent-strace watch [--timeout DURATION] [--budget $] [--on-death CMD] [--rules file] - Watch a live session; kill/pause on rule breach -agent-strace lint [session-id] [--strict] [--all] [--since DURATION] - Analyse a session for bad behaviour patterns -agent-strace compare [--tag TAG] [--format json] - Session-to-session regression report -agent-strace budget-report [--since DATE] [--format text|markdown|json] - Weekly spend digest across sessions -agent-strace share [-o file] Export a self-contained HTML report -agent-strace standup [--session id] Standup report from session trace (no LLM) -agent-strace freshness [--scope glob] Context freshness check vs last session -agent-strace oncall --rotation-start DATE On-call readiness for agent-modified files -agent-strace curve [--export csv] Personal agent cost-efficiency curve -agent-strace inflation [--compare m1,m2] Token inflation calculator across model versions -agent-strace a2a-tree [session-id] Visualise A2A agent call 
graph -agent-strace timeline [session-id] [--format text|json] [--model MODEL] - Structured phase-by-phase session view with costs and retries -agent-strace config-watch snapshot [--label TEXT] [--watch PATH] - Snapshot current config file state -agent-strace config-watch check [--format text|json] [--watch PATH] - Diff current state vs last snapshot (exit 1 if changed) -agent-strace config-watch history [--format text|json] - List all snapshots -agent-strace config-watch affected [--since DURATION] [--format text|json] - Sessions that ran after a config change -``` - -### Import existing Claude Code sessions - -Already ran a session without hooks? Import it directly from Claude Code's native JSONL logs: - -```bash -# Discover available sessions -agent-strace import --discover - -# Import a specific session -agent-strace import ~/.claude/projects//.jsonl - -# Then use it like any captured session -agent-strace replay -agent-strace explain -agent-strace stats -``` - -Claude Code stores session logs in `~/.claude/projects/`. The import captures tool calls, token usage, subagent invocations, and session metadata. - -### Explain a session - -Plain-English breakdown of what the agent did, organized by phase, with retry and wasted-time detection: - -```bash -agent-strace explain # latest session -agent-strace explain abc123 # specific session -``` - -``` -Session: abc123 (2m 05s, 47 events) - -Phase 1: fix the auth module (0:00–0:05, 5 events) - Read: AGENTS.md, src/auth.py - -Phase 2: run tests — FAILED (0:05–1:20, 12 events) - Ran: python -m pytest - Ran: python -m pytest ← retry - -Phase 3: run tests (1:20–2:05, 8 events) - Ran: uv run pytest - -Files touched: 3 read, 0 written -Retries: 1 (wasted 1m 15s, 60% of session) -``` - -### Estimate cost - -Token usage and dollar cost by phase. Flags wasted spend on failed phases. 
- -```bash -agent-strace cost # latest session, sonnet pricing -agent-strace cost abc123 --model opus # specific session and model -agent-strace cost abc123 --input-price 3.0 --output-price 15.0 # custom pricing -``` - -``` -Session: abc123 — Estimated cost: $0.0042 -Model: sonnet | 8,200 input tokens, 3,100 output tokens - - Phase 1: fix the auth module $0.0008 (19%) ... - Phase 2: run tests — FAILED $0.0021 (50%) ... ← wasted - Phase 3: run tests $0.0013 (31%) ... - -Wasted on failed phases: $0.0021 (50%) -``` - -Supported models: `sonnet` (default), `opus`, `haiku`, `gpt4`, `gpt4o`. Token counts are estimated from payload size (`len / 4`); see [ADR-0008](ADRs/0008-token-cost-estimation-heuristic.md) for details. - -See [examples/session_analysis.md](examples/session_analysis.md) for a full walkthrough combining `import`, `explain`, and `cost`. - -### Session regression testing (compare) - -Compare two sessions structurally and get a verdict on whether agent behaviour improved or regressed. Useful when changing models, prompts, or tool implementations. - -```bash -# Compare two existing sessions -agent-strace compare - -# Compare the last 2 sessions with a given task tag -agent-strace compare --tag refactor-auth - -# Machine-readable output -agent-strace compare --format json -``` - -Example output: - -``` -Session Comparison -───────────────────────────────────────────────────────────────── - a84664242afa bf1207728ee6 change -───────────────────────────────────────────────────────────────── - Duration 18m 00s 12m 00s -33% - Total cost $4.2300 $2.8700 -32% - Tool calls 14 11 -21% - Files modified 2 2 (same) - Errors 0 0 -───────────────────────────────────────────────────────────────── -Verdict: bf1207728ee6 was 32% cheaper, 33% faster -Decision divergence: 2 point(s) -``` - -`decision divergence` is the edit distance between the two sessions' decision event sequences — no LLM call required. 
`--tag` compares the last N sessions whose `agent_name` or `command` contains the tag string. - -### Weekly spend digest (budget-report) - -Aggregate cost across sessions for a configurable time window. Shows total spend, top sessions, cost by tool, and savings from watchdog budget ceilings. - -```bash -# Last 7 days (default) -agent-strace budget-report - -# Custom window -agent-strace budget-report --since 2026-05-01 --until 2026-05-23 - -# Markdown output (paste into Slack or email) -agent-strace budget-report --format markdown - -# Machine-readable JSON -agent-strace budget-report --format json -``` - -Example output: - -``` -Budget Report — May 16 to May 23, 2026 - -Total spend: $47.23 (↑ 12% vs prior period) -Sessions: 34 (↑ 3 vs prior period) -Avg cost/session: $1.39 - -Top 5 most expensive sessions: - 1. a84664242afa $8.43 refactor-auth 2026-05-21 - 2. bf1207728ee6 $6.21 add-test-coverage 2026-05-22 - 3. c91ab3312fde $4.87 fix-login-bug ⚠ watchdog 2026-05-20 - -Cost by tool (estimated): - Bash $18.43 (39%) - Read $12.11 (26%) - Write $9.87 (21%) - -Sessions terminated by watchdog: 3 ($14.21 saved by budget ceiling) -``` - -Week-over-week delta is shown when prior-period data exists. The `--format markdown` output is designed to paste directly into Slack without editing. - -### Static behaviour analysis (lint) - -Analyse a session for known bad patterns — tool loops, reasoning spirals, budget proximity, context saturation, redundant reads, error-retry loops, and sessions that produced no output. - -```bash -# Lint the latest session -agent-strace lint - -# Lint a specific session -agent-strace lint - -# Lint all sessions from the last 7 days -agent-strace lint --all --since 7d - -# Machine-readable output for CI -agent-strace lint --format json - -# Exit code 1 on any WARN or ERROR (CI gate) -agent-strace lint --strict -``` - -Example output: - -``` -WARN tool-loop "Bash" called 7 times consecutively (events 34–41). Possible loop. 
-WARN reasoning-spiral 4 consecutive LLM calls with no tool call (events 12–15). Agent may be over-reasoning. -ERROR budget-proximity Session reached 94% of a $5.00 budget ceiling. Consider raising or splitting the task. -INFO context-saturation Input tokens exceeded 80% of model context window at event 28. -INFO redundant-read "README.md" read 3 times in this session. Consider caching. - -2 error(s), 2 warning(s), 2 info(s). Use --strict for non-zero exit on warnings. -``` - -Rules are configurable via `.agent-strace-lint.json`: - -```json -{ - "tool-loop": { "threshold": 7 }, - "reasoning-spiral": { "enabled": false } -} -``` - -| Rule | Level | Trigger | -|---|---|---| -| `tool-loop` | WARN | Same tool called 5+ times consecutively | -| `reasoning-spiral` | WARN | 3+ consecutive LLM calls with no tool call | -| `budget-proximity` | ERROR | Session cost exceeded 90% of watchdog budget ceiling | -| `context-saturation` | INFO | Input tokens exceeded 80% of model context window | -| `redundant-read` | INFO | Same file read 3+ times in a session | -| `error-retry-loop` | WARN | Same tool errored and was retried 3+ times | -| `no-output` | WARN | Session completed with no write or file-modifying tool calls | - -### Session timeline - -A structured, phase-by-phase view of what happened in a session — tool calls, file operations, LLM requests, errors, retries, and a wasted-spend callout for failed phases. 
- -```bash -# Timeline for the latest session -agent-strace timeline - -# Timeline for a specific session -agent-strace timeline - -# Machine-readable output -agent-strace timeline --format json - -# Use a different model for cost estimates -agent-strace timeline --model opus -``` - -Example output: +| [`agent-strace replay `](docs/commands.md#replay) | Replay a session in the terminal or as HTML | +| [`agent-strace explain `](docs/commands.md#explain) | Plain-English phase summary, no LLM required | +| [`agent-strace timeline `](docs/commands.md#timeline) | Phase-by-phase view with costs and retries | +| [`agent-strace why `](docs/commands.md#why) | Causal chain for a specific decision | +| [`agent-strace diff `](docs/commands.md#diff) | Structural or semantic session comparison | +| [`agent-strace compare `](docs/commands.md#compare) | Regression report with verdict | -``` -Session: abc123def456 | 2026-05-19 14:32 | 4m 12s | $0.0043 | 3 errors - -Phase 1: Setup (0:00–0:45) $0.0008 - ✓ Read src/auth/handler.go - ✓ Read src/auth/middleware.go - -Phase 2: Implementation (0:45–2:10) $0.0031 - · Run Bash (1.2s) - pytest tests/test_auth.py - ✗ Error: Bash - FAILED tests/test_auth.py::TestAuthHandler - · Run Bash (attempt 2) (1.1s) - pytest tests/test_auth.py - ✗ Error: Bash - FAILED tests/test_auth.py::TestAuthHandler - ✓ Write src/auth/handler.go +3 lines - ✓ Run Bash (0.9s) - - ⚠ 2 retries in this phase - -⚠ Wasted spend: 2 retries on failed phases = ~$0.0008 (19% of session cost) -``` - -| Flag | Default | Description | -|---|---|---| -| `--format` | `text` | `text` or `json` | -| `--model` | `sonnet` | Pricing model for cost estimates: `sonnet`, `opus`, `haiku`, `gpt4`, `gpt4o` | - -### Config change detector - -Track changes to AGENTS.md and other agent configuration files. Snapshot the current state before a change, then check what drifted and which sessions ran after it. 
- -```bash -# Record a snapshot of all watched config files -agent-strace config-watch snapshot - -# Add a label to identify the snapshot -agent-strace config-watch snapshot --label "before-prompt-refactor" - -# Check whether anything changed since the last snapshot (exit 1 if yes) -agent-strace config-watch check - -# Machine-readable diff -agent-strace config-watch check --format json - -# List all snapshots -agent-strace config-watch history - -# Find sessions that ran after a config change -agent-strace config-watch affected - -# Limit to sessions from the last 7 days -agent-strace config-watch affected --since 7d -``` - -Example output: - -``` -$ agent-strace config-watch check -CHANGED AGENTS.md (sha256: a1b2c3d4 → e5f6a7b8) -ADDED .claude/settings.json -No changes to: CLAUDE.md, system_prompt.md - -$ agent-strace config-watch affected -2 session(s) ran after a config change: - - abc123def456 2026-05-20T14:32:01 (after change to: AGENTS.md) - 789xyz012abc 2026-05-20T15:10:44 (after change to: AGENTS.md) - -Run `agent-strace drift` to compare behaviour before and after the change. -``` - -Watched files by default: `AGENTS.md`, `CLAUDE.md`, `system_prompt.md`, `system_prompt.txt`, `.cursorrules`, `.github/copilot-instructions.md`. Add extra paths with `--watch`: - -```bash -agent-strace config-watch snapshot --watch .claude/settings.json --watch custom_prompt.txt -``` - -Snapshots are stored in `.agent-traces/config-snapshots.json`. Use `check` as a CI gate — it exits 1 when config has changed since the last snapshot. - -### Data retention - -Enforce configurable retention policies to automatically delete old session data — required for GDPR, SOC 2, and internal data policies. 
+### Control and protect -```bash -# Check current status and what policy would delete -agent-strace retention status - -# Preview what would be deleted (no changes made) -agent-strace retention clean --dry-run - -# Delete sessions older than 30 days -agent-strace retention clean --max-age-days 30 - -# Keep only the 1000 most recent sessions -agent-strace retention clean --max-sessions 1000 - -# Delete oldest sessions when storage exceeds 500 MB -agent-strace retention clean --max-size-mb 500 -``` - -Configure via `.agent-strace.yaml`: - -```yaml -retention: - max_age_days: 30 - max_sessions: 1000 - max_size_mb: 500 - on_delete: log # log deletions to .agent-traces/retention.log -``` - -Policies are applied in order: age → count → size. Deletions are logged with session ID and timestamp (not content). - -### Trace anonymization - -Strip identifying information from traces at export time — original session data is never modified. Complements secret redaction (which strips secrets at capture time). 
- -```bash -# Preview what would be anonymized -agent-strace export SESSION_ID --anonymize --dry-run - -# Export with anonymization applied -agent-strace export SESSION_ID --anonymize --output trace-anon.json -``` - -Anonymized by default: -- Home directory paths → `~/relative/path` -- Hostnames → `` -- OS usernames → `` -- Email addresses → `` - -Add custom patterns via `.agent-strace/anonymize.yaml`: +| Command | What it does | +|---|---| +| [`agent-strace watch`](docs/commands.md#watch) | Live monitor with kill-switch rules | +| [`agent-strace watch --timeout 30m --budget $5`](docs/commands.md#watch) | Watchdog mode — kills on limit, writes post-mortem | +| [`agent-strace audit `](docs/commands.md#audit) | Audit tool calls against a policy file | +| [`agent-strace record --redact`](docs/commands.md#record) | Strip secrets from traces before storage | +| [`agent-strace export --anonymize`](docs/commands.md#export) | Remove PII at export time | -```yaml -rules: - - pattern: "ACME Corp" - replacement: "" - - pattern: "192\.168\.\d+\.\d+" - replacement: "" -``` +### Analyse across sessions -### Secret redaction +| Command | What it does | +|---|---| +| [`agent-strace dashboard`](docs/commands.md#dashboard) | Multi-session overview | +| [`agent-strace budget-report`](docs/commands.md#budget-report) | Weekly spend digest | +| [`agent-strace lint `](docs/commands.md#lint) | Flag bad behaviour patterns (loops, spirals, waste) | +| [`agent-strace drift`](docs/commands.md#drift) | Detect behavioural drift over time | +| [`agent-strace standup`](docs/commands.md#standup) | Plain-English summary of yesterday's sessions | +| [`agent-strace eval `](docs/commands.md#eval) | Score a session against behavioural baselines | +| [`agent-strace eval ci`](docs/commands.md#eval) | Fail CI on behavioural regression | -Strip API keys, tokens, and credentials from traces before they hit disk. 
+### Export and integrate
-```bash
-# Stdio proxy with redaction
-agent-strace record --redact -- npx -y @modelcontextprotocol/server-filesystem /tmp
+| Command | What it does |
+|---|---|
+| [`agent-strace export --format otlp-genai`](docs/production.md) | Export to Datadog, Honeycomb, Grafana, Jaeger |
+| [`agent-strace server`](docs/server.md) | Server-side collector for multi-agent, multi-machine |
+| [`agent-strace share `](docs/commands.md#share) | Generate a shareable HTML replay |
+| [`agent-strace sample`](docs/commands.md#sample) | Export worst sessions as JSONL for eval datasets |
-# HTTP proxy with redaction
-agent-strace record-http https://mcp.example.com --redact
-```
+Full flag reference: [docs/commands.md](docs/commands.md)
-Detected patterns: OpenAI (`sk-*`), GitHub (`ghp_*`, `github_pat_*`), AWS (`AKIA*`), Anthropic (`sk-ant-*`), Slack (`xox*`), JWTs, Bearer tokens, connection strings (`postgres://`, `mysql://`), and any value under keys like `password`, `secret`, `token`, `api_key`, `authorization`.
+## VS Code extension
-### HTTP/SSE proxy
+Install **agent-strace** from the [Extensions panel](https://open-vsx.org/extension/Siddhant-K-code/agent-strace) to see live session activity without leaving the editor.
-For MCP servers that use HTTP transport:
+| Feature | Description |
+|---|---|
+| Status bar | Live cost, tool call count, and active tool name. Click to open the event stream. |
+| Gutter annotations | Blue border on files the agent read, amber on files it modified. |
+| Event stream panel | Live feed: every tool call, file op, LLM request, and error. |
+| Pause button | Stops the agent mid-session via SIGSTOP. |
 ```bash
-# Proxy a remote MCP server
-agent-strace record-http https://mcp.example.com --port 3100
-
-# Your agent connects to http://127.0.0.1:3100 instead of the remote server
-# All JSON-RPC messages are captured, tool call latency is measured
-```
-
-The proxy forwards POST `/message` and GET `/sse` to the remote server, capturing every JSON-RPC message in both directions.
-
-### Replay output
-
-A real Claude Code session captured with hooks:
-
-
Session Summary -

-
-```
-Session Summary
-──────────────────────────────────────────────────
- Session: 201da364-edd6-49
- Command: claude-code (startup)
- Agent: claude-code
- Duration: 112.54s
- Tool calls: 8
- Errors: 3
-──────────────────────────────────────────────────
-
-+ 0.00s ▶ session_start
-+ 0.07s 👤 user_prompt
- "how many tests does this project have? run them and tell me the results"
-+ 3.55s → tool_call Glob
- **/*.test.*
-+ 3.55s → tool_call Glob
- **/test_*.*
-+ 3.60s ← tool_result Glob (51ms)
-+ 6.06s → tool_call Bash
- $ python -m pytest tests/ -v 2>&1
-+ 27.65s ✗ error Bash
- Command failed with exit code 1
-+ 29.89s → tool_call Bash
- $ python3 -m pytest tests/ -v 2>&1
-+ 40.56s ✗ error Bash
- No module named pytest
-+ 45.96s → tool_call Bash
- $ which pytest || ls /Users/siddhant/Desktop/test-agent-trace/ 2>&1
-+ 46.01s ← tool_result Bash (51ms)
-+ 48.18s → tool_call Read
- /Users/siddhant/Desktop/test-agent-trace/pyproject.toml
-+ 48.23s ← tool_result Read (43ms)
-+ 51.43s → tool_call Bash
- $ uv run --with pytest pytest tests/ -v 2>&1
-+1m43.67s ← tool_result Bash (5.88s)
- 75 tests, all passing in 3.60s
-+1m52.54s 🤖 assistant_response
- "75 tests, all passing in 3.60s. Breakdown by file: ..."
+pip install agent-strace # 1. install
+agent-strace setup # 2. add hooks to Claude Code
+# 3. open project in VS Code — extension activates when .agent-traces/ exists
+# 4. start Claude Code — status bar appears immediately
 ```
-Tool calls show actual values: commands, file paths, glob patterns. Errors show what failed. Assistant responses are stripped of markdown.
-
-

-
+Full docs: [docs/vscode.md](docs/vscode.md)
-### Filtering
+## Production
-```bash
-# Show only tool calls and errors
-agent-strace replay --filter tool_call,error
-
-# Replay with timing (watch it unfold)
-agent-strace replay --live --speed 2
-```
-
-### Export
+**OTLP export** — sessions become traces, tool calls become spans:
 ```bash
-# JSON array
-agent-strace export a84664 --format json
-
-# CSV (for spreadsheets)
-agent-strace export a84664 --format csv
-
-# NDJSON (for streaming pipelines)
-agent-strace export a84664 --format ndjson
-```
-
-## Trace format
-
-Traces are stored as directories in `.agent-traces/`:
-
-```
-.agent-traces/
-  a84664242afa4516/
-    meta.json # session metadata
-    events.ndjson # newline-delimited JSON events
-```
-
-Each event is a single JSON line:
-
-```json
-{
-  "event_type": "tool_call",
-  "timestamp": 1773562735.09,
-  "event_id": "bf1207728ee6",
-  "session_id": "a84664242afa4516",
-  "data": {
-    "tool_name": "read_file",
-    "arguments": {"path": "src/auth.py"}
-  }
-}
+agent-strace export --format otlp-genai \
+  --endpoint http://localhost:4318
 ```
-### Event types
+Per-backend setup (Datadog, Honeycomb, Grafana, New Relic, Splunk, Langfuse): [docs/production.md](docs/production.md)
-| Type | Description |
-|------|-------------|
-| `session_start` | Trace session began |
-| `session_end` | Trace session ended |
-| `user_prompt` | User submitted a prompt to the agent |
-| `assistant_response` | Agent produced a text response |
-| `tool_call` | Agent invoked a tool |
-| `tool_result` | Tool returned a result |
-| `llm_request` | Agent sent a prompt to an LLM |
-| `llm_response` | LLM returned a completion |
-| `file_read` | Agent read a file |
-| `file_write` | Agent wrote a file |
-| `decision` | Agent chose between alternatives |
-| `error` | Something failed |
-
-Events link to each other. A `tool_result` has a `parent_id` pointing to its `tool_call`. This lets you measure latency per tool and trace the full call chain.
-
-## Use with Claude Code, Cursor, Windsurf
-
-### Claude Code (hooks, recommended)
-
-Captures the full session: prompts, responses, and every tool call. See [examples/claude_code_config.md](examples/claude_code_config.md) for the full config.
+**Server-side collector** — for containers, CI, and multi-machine setups:
 ```bash
-agent-strace setup # per-project config
-agent-strace setup --redact --global # all projects, with secret redaction
-```
-
-### Cursor
-
-Edit `~/.cursor/mcp.json` (global) or `.cursor/mcp.json` (per-project):
-
-```json
-{
-  "mcpServers": {
-    "filesystem": {
-      "command": "agent-strace",
-      "args": ["record", "--name", "filesystem", "--", "npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
-    }
-  }
-}
-```
-
-### Windsurf
-
-Edit `~/.codeium/windsurf/mcp_config.json`:
-
-```json
-{
-  "mcpServers": {
-    "filesystem": {
-      "command": "agent-strace",
-      "args": ["record", "--name", "filesystem", "--", "npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
-    }
-  }
-}
+agent-strace server --port 4317 --storage ./traces
+AGENT_STRACE_ENDPOINT=http://collector:4317 python my_agent.py
 ```
-### Any MCP client
-
-The pattern is the same for any tool that uses MCP over stdio:
-
-1. Replace the server `command` with `agent-strace`
-2. Prepend `record --name