Add diff and why subcommands — session comparison and causal chain tracing by Siddhant-K-code · Pull Request #12 · Siddhant-K-code/agent-trace

Siddhant-K-code · 2026-03-25T06:16:13Z

Closes #4

What

Two new subcommands for understanding why a session behaved the way it did and how it differed from another session.

Changes

diff.py (new):

diff_sessions(store, session_a, session_b) — aligns phases from both sessions via LCS (longest common subsequence) on phase labels, then reports per-phase differences: files only in A, files only in B, commands only in A/B, failed vs passed outcomes
format_diff() — shows divergence point and per-phase delta

why.py (new):

build_causal_chain(events, target_index) — walks backwards from a target event using: parent_id links (tool_result → tool_call), error→retry detection, path-reference matching (tool_result text containing a path referenced by a later tool_call), and user_prompt as the root cause terminator
format_why() — renders the chain root → target with reasons at each link

CLI additions:

agent-strace diff <session-a> <session-b>
agent-strace why [session-id] <event-number> (1-based, matches replay output)

Example output

Comparing: abc123 vs def456

Diverged at phase 2:

  Phase 2: run tests
    abc123 only:  $ python -m pytest
    def456 only:  $ uv run pytest

  abc123: 4m 12s, 47 events, 8 tools, 2 retries
  def456: 2m 05s, 31 events, 5 tools, 0 retries

Why did event #4 happen?

  #  4  tool_call: Bash  $ pytest tests/

Causal chain (root → target):

    #  1  user_prompt: "run the test suite"
       (prompt at #1 triggered this)
  ←  #  3  error: exit 1
       (retry after error at #3)
  ←  #  4  tool_call: Bash  $ pytest tests/

Tests

20 new tests (192 total passing)

- diff <session-a> <session-b>: structural comparison of two sessions via LCS-based phase alignment; reports divergence point, per-phase file/command differences, and failed vs passed outcomes - why [session-id] <event-number>: traces the causal chain backwards from a target event using parent_id links, error→retry detection, path-reference matching, and user_prompt as root cause - 20 new tests (192 total passing) Co-authored-by: Ona <no-reply@ona.com>

… fix import order Co-authored-by: Ona <no-reply@ona.com>

Siddhant-K-code · 2026-03-25T06:28:42Z

Review: diff and why subcommands

Two dead variables found; both fixed upstream in 2ee71c8 before this review landed.

Bug 1 — `_event_paths`: dead `result` variable (fixed in `2ee71c8`)

result = data.get("result", "") or data.get("content_preview", "")
return paths  # result never added to paths

The variable was computed but never used. Removed.

Bug 2 — `build_causal_chain`: dead `target_tool` variable (fixed in `2ee71c8`)

target_tool = event.data.get("tool_name", "").lower()
# target_tool never referenced again in the function

Also removed.

Design note — `_lcs_indices` O(m×n) space

The DP table allocates (m+1) × (n+1) cells. For typical session phase counts (< 50) this is negligible. Worth a comment if phase counts are expected to grow significantly.

Design note — causal chain completeness

_walk terminates silently when idx is already in visited. The fallback _walk(0, "session start") at the end of the backwards scan can hit this if index 0 was already visited, producing an incomplete chain with no indication. This is acceptable for the current use case but worth documenting.

No remaining issues. ✅ Ready to merge.

Siddhant-K-code · 2026-03-28T15:52:09Z

src/agent_trace/why.py

+        for prev_idx in range(idx - 1, -1, -1):
+            prev = events[prev_idx]
+
+            # Error → next tool_call is a retry


False positive: error→retry rule fires on any ERROR before any TOOL_CALL, not just adjacent ones.

The backwards scan hits the first ERROR it finds and immediately attributes the target TOOL_CALL to it, regardless of how many unrelated events are between them. A session like:

#1 user_prompt: "run tests" #2 tool_call: Bash $ pytest ← fails #3 error: exit 1 #4 tool_call: Bash $ git status ← unrelated, but scan hits #3 first #5 tool_call: Bash $ pytest ← actual retry

why #5 correctly links to #3. But why #4 also links to #3 even though git status is not a retry — it's the next command after the error. The rule should only fire if the error is the immediately preceding event (or at most separated by a tool_result).

Siddhant-K-code · 2026-03-28T15:52:10Z

src/agent_trace/why.py

+                    return
+
+            # file_read → file_write of same file
+            if (prev.event_type in (EventType.TOOL_CALL, EventType.FILE_READ)


Overly broad match: any TOOL_CALL with a shared path is attributed to any prior FILE_READ.

The condition event.event_type in (EventType.TOOL_CALL, EventType.FILE_WRITE) means a TOOL_CALL (e.g. Bash $ pytest) that happens to have no paths will never match here — fine. But a TOOL_CALL like Write src/foo.py will match any prior TOOL_CALL that also references src/foo.py, even if that prior call was a Bash command that merely mentioned the path in its output. The intent is read→write causality; tightening the target condition to EventType.FILE_WRITE (or tool_name in ("write", "edit")) would reduce false positives.

Siddhant-K-code · 2026-03-28T15:52:10Z

src/agent_trace/diff.py

+    session_a: str,
+    session_b: str,
+) -> SessionDiff:
+    result_a = explain_session(store, session_a)


explain_session is called twice (once per session) and each call calls store.load_events internally. For large sessions this is fine, but store.load_meta is then called again on lines 169–170 to get tool_calls_a/b. The meta is already available inside ExplainResult (via explain_session → store.load_meta). Consider exposing it on ExplainResult to avoid the extra store reads, or at minimum document why the double load is acceptable.

Siddhant-K-code · 2026-03-28T15:52:26Z

Review: PR #12 — diff + why

Both modules are well-structured. The LCS alignment for diff is the right algorithm choice — it handles insertions/deletions gracefully and the implementation is correct. The why causal chain walker is a good heuristic approach with proper cycle protection via visited.

Two bugs in why.py (inline comments on lines 108 and 122):

The error→retry rule fires on any preceding ERROR, not just adjacent ones — produces false positives for unrelated tool calls after an error
The read→write path match is too broad — any TOOL_CALL with a shared path qualifies as a "write", not just actual write operations

One nit in diff.py (line 98): double store.load_meta call — minor but worth noting.

Test gaps:

test_error_causes_retry only tests the happy path (error immediately before retry). No test for the false-positive case described above — a tool call after an error that is not a retry
No test for diff_sessions where one session has more phases than the other (unaligned phases)
No test for why on a session where the target event has parent_id pointing to a non-existent event ID (should fall through to heuristic scan, not crash)
_lcs_indices is tested well — 5 cases covering empty, identical, no-common, partial, and insertion

What's good:

_lcs_indices uses bottom-up DP with correct backtracking — O(mn) time, O(mn) space, no recursion limit issues
_walk uses a visited set so cycles in parent_id links (malformed data) can't cause infinite recursion
format_why handles the empty chain case explicitly — no crash on chain.links[-1] when links is empty
cmd_why validates event number bounds before calling build_causal_chain
build_phases is reused from explain.py rather than reimplemented — good

why.py: - error→retry: only fire when no substantive event sits between the error and the tool_call (tool_result events are allowed as separators). Prevents unrelated tool calls after an error from being misattributed as retries. - read→write: restrict the path-match rule to actual write operations (FILE_WRITE event type, or tool_name in write/edit/create). Previously any TOOL_CALL sharing a path with a prior read would match, including Bash commands that merely mentioned the path. tests: - error→retry false positive: unrelated call between error and retry - write tool_call correctly linked to prior read of same path - Bash call with shared path not linked via read→write rule - dangling parent_id falls through to heuristic without crashing - diff_sessions with unaligned phase counts Co-authored-by: Ona <no-reply@ona.com>

The previous adjacency check was insufficient — git status immediately after a failed pytest is adjacent to the error but is not a retry. Now we look up the tool_call that caused the error and only attribute the retry if the target tool_call uses the same tool name. Co-authored-by: Ona <no-reply@ona.com>

Tool name alone is insufficient — git status and pytest are both Bash calls. For Bash tool calls, require the command string to match the failing call. For other tools, tool name match is sufficient. Co-authored-by: Ona <no-reply@ona.com>

Co-authored-by: Ona <no-reply@ona.com>

Siddhant-K-code and others added 2 commits March 25, 2026 06:12

Remove dead target_tool variable, drop unused result in _event_paths,…

2ee71c8

… fix import order Co-authored-by: Ona <no-reply@ona.com>

Siddhant-K-code marked this pull request as ready for review March 28, 2026 15:49

Siddhant-K-code commented Mar 28, 2026

View reviewed changes

Siddhant-K-code and others added 4 commits March 28, 2026 16:05

Resolve merge conflict: combine subagent and diff/why imports in cli.py

d762721

Co-authored-by: Ona <no-reply@ona.com>

Siddhant-K-code merged commit a724556 into main Mar 28, 2026
4 checks passed

Siddhant-K-code deleted the feature/diff-why branch March 28, 2026 16:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add diff and why subcommands — session comparison and causal chain tracing#12

Add diff and why subcommands — session comparison and causal chain tracing#12
Siddhant-K-code merged 6 commits intomainfrom
feature/diff-why

Siddhant-K-code commented Mar 25, 2026

Uh oh!

Siddhant-K-code commented Mar 25, 2026

Uh oh!

Siddhant-K-code Mar 28, 2026

Uh oh!

Siddhant-K-code Mar 28, 2026

Uh oh!

Siddhant-K-code Mar 28, 2026

Uh oh!

Siddhant-K-code commented Mar 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Siddhant-K-code commented Mar 25, 2026

What

Changes

Example output

Tests

Uh oh!

Siddhant-K-code commented Mar 25, 2026

Review: diff and why subcommands

Bug 1 — _event_paths: dead result variable (fixed in 2ee71c8)

Bug 2 — build_causal_chain: dead target_tool variable (fixed in 2ee71c8)

Design note — _lcs_indices O(m×n) space

Design note — causal chain completeness

Uh oh!

Siddhant-K-code Mar 28, 2026

Choose a reason for hiding this comment

Uh oh!

Siddhant-K-code Mar 28, 2026

Choose a reason for hiding this comment

Uh oh!

Siddhant-K-code Mar 28, 2026

Choose a reason for hiding this comment

Uh oh!

Siddhant-K-code commented Mar 28, 2026

Review: PR #12 — diff + why

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Bug 1 — `_event_paths`: dead `result` variable (fixed in `2ee71c8`)

Bug 2 — `build_causal_chain`: dead `target_tool` variable (fixed in `2ee71c8`)

Design note — `_lcs_indices` O(m×n) space