Workflow run visibility: live agent tree, script view, per-agent transcripts by t3dotgg · Pull Request #3650 · pingdotgg/t3code

t3dotgg · 2026-07-02T11:58:30Z

I had Fable try to implement better UI/UX for Claude Code's Workflows. Not sure if this will be trumped by Julius's work on orchestration but I wanted to give it a go.

The rest of this description is generated by Claude

AI generated from here down

Surfaces Claude Agent SDK workflow runs (the Workflow orchestration tool — phases, subagents, narration) end to end in the desktop app.

What you get

Inline workflow card in the chat timeline: name, status, phase-grouped agent rows with live per-agent status (queued/running/done/error), token + duration rollup, stop button, and a details affordance. Remote (remote: true) runs render a link to their cloud session instead.
Workflow right-panel surface with three tabs:
- Run — full phase/agent tree; agent rows expand into their live transcript (cursor-paged, polled every 2s while running)
- Script — the persisted workflow script, syntax-highlighted
- Logs — log() narration plus per-agent results from the run journal
Stop (via a new thread.task.stop orchestration command → query.stopTask) and a copy-resume-command affordance (Workflow({ scriptPath, resumeFromRunId })) for terminal runs.

How it works

The SDK streams a cumulative snapshot on every task_progress message via the undocumented workflow_progress field (deliberate wire surface — the CLI's own /workflows view renders it — but absent from the published .d.ts, verified against the 0.3.170 bundled runtime we ship). The read is confined to one cast helper in ClaudeAdapter which normalizes defensively: malformed entries dropped, previews clipped, entry counts capped, unknown agent states rendered as "running". An SDK change degrades to less detail, never a crash.
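The defensive normalization described above can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: the `WorkflowAgentEntry` shape, the cap values, and the `normalizeAgents` name are assumptions made for the example. It also folds in the dedupe-by-index step that a later commit in this PR adds before the cap is applied.

```typescript
// Hypothetical shapes and limits; the real contract and caps may differ.
type AgentState = "queued" | "running" | "done" | "error";

interface WorkflowAgentEntry {
  index: number;
  label: string;
  state: AgentState;
  preview?: string;
}

const MAX_AGENTS = 3; // illustrative cap, kept tiny so the behavior is easy to see
const MAX_PREVIEW_CHARS = 20; // illustrative clip length
const KNOWN_STATES = ["queued", "running", "done", "error"];

function clip(text: string, max: number): string {
  return text.length > max ? text.slice(0, max) : text;
}

// Accepts whatever arrived on the wire (typed as unknown) and returns only
// well-formed agents, deduped by index, sorted, and capped.
function normalizeAgents(raw: unknown): WorkflowAgentEntry[] {
  if (!Array.isArray(raw)) return [];
  const byIndex = new Map<number, WorkflowAgentEntry>();
  for (const entry of raw) {
    if (typeof entry !== "object" || entry === null) continue; // drop malformed entries
    const fields = entry as Record<string, unknown>;
    const index = fields["index"];
    const label = fields["label"];
    const state = fields["state"];
    const preview = fields["preview"];
    if (typeof index !== "number" || !Number.isInteger(index)) continue;
    if (typeof label !== "string") continue;
    const agent: WorkflowAgentEntry = {
      index,
      label,
      // Unknown or missing states are rendered as "running" rather than rejected.
      state:
        typeof state === "string" && KNOWN_STATES.indexOf(state) >= 0
          ? (state as AgentState)
          : "running",
    };
    if (typeof preview === "string") agent.preview = clip(preview, MAX_PREVIEW_CHARS);
    byIndex.set(index, agent); // last write wins, so repeated slots cannot exhaust the cap
  }
  return Array.from(byIndex.values())
    .sort((a, b) => a.index - b.index)
    .slice(0, MAX_AGENTS);
}
```

Because the input is typed as `unknown` and every field is checked before use, a change in the SDK payload would at worst shrink the returned list.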

Because every activity is persisted (SQLite orchestration_events + projections) and re-shipped on reconnect, workflow snapshots are upserted under a stable per-(thread, task) activity id instead of appended per tick — reconnect payloads stay one row per run. Run handles (script path, transcript dir, run id) arrive once via a new task.workflowMeta runtime event emitted from the Workflow tool result.
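A rough sketch of the upsert idea, assuming a simple in-memory projection: `WorkflowActivity`, `ActivityProjection`, and `stableWorkflowActivityId` are hypothetical names for illustration, and the real projection is SQLite-backed. The id JSON-encodes the (threadId, taskId) tuple, which is how a later commit in this PR avoids delimiter collisions.

```typescript
// Hypothetical activity shape; the persisted schema is not shown in this PR description.
interface WorkflowActivity {
  id: string;
  kind: string;
  payload: unknown;
}

// JSON-encoding the tuple keeps ("a:b", "c") and ("a", "b:c") distinct,
// which a plain ":"-joined id would not.
function stableWorkflowActivityId(kind: string, threadId: string, taskId: string): string {
  return `${kind}:${JSON.stringify([threadId, taskId])}`;
}

// In-memory stand-in for the projection table: one row per stable id.
class ActivityProjection {
  private readonly rows = new Map<string, WorkflowActivity>();

  upsert(activity: WorkflowActivity): void {
    this.rows.set(activity.id, activity); // replaces the previous snapshot for this run
  }

  list(): WorkflowActivity[] {
    return Array.from(this.rows.values());
  }
}

// Usage: three progress ticks for the same run leave a single row behind.
const projection = new ActivityProjection();
for (const tick of [1, 2, 3]) {
  projection.upsert({
    id: stableWorkflowActivityId("task.workflow-updated", "thread-1", "task-1"),
    kind: "task.workflow-updated",
    payload: { tick },
  });
}
```

With append-per-tick semantics the same loop would leave three rows, and a reconnect payload would grow with the length of the run.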

Disk artifacts (script, journal.jsonl, agent-<id>.jsonl) are served by a new WorkflowInspectionService with realpath containment under ~/.claude/projects (including re-containment of joined leaf files against symlink escapes), fixed leaf filenames, an agent-id allowlist pattern, and size caps — the RPCs cannot be used as an arbitrary-file-read oracle.
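The containment checks can be illustrated with a small sketch. Everything named here is an assumption for the example: the `Realpath` resolver is injected so the sketch runs without touching disk (in practice it would be something like Node's `fs.realpathSync`), the agent-id pattern is a guess at what an allowlist might look like, and POSIX-style paths are assumed.

```typescript
// Stand-in for a real resolver such as fs.realpathSync: returns the
// symlink-free absolute path, and may throw when the path does not exist.
type Realpath = (path: string) => string;

// Hypothetical allowlist; the actual pattern used by the service is not shown in this PR.
const AGENT_ID_PATTERN = /^[A-Za-z0-9_-]{1,64}$/;

// True when candidate is the root itself or sits strictly underneath it.
// Appending "/" prevents "/a/bc" from passing as a child of "/a/b".
function isInside(root: string, candidate: string): boolean {
  const prefix = root.charAt(root.length - 1) === "/" ? root : root + "/";
  return candidate === root || candidate.indexOf(prefix) === 0;
}

function resolveAgentTranscript(
  projectsRoot: string,
  transcriptDir: string,
  agentId: string,
  realpath: Realpath,
): string | null {
  if (!AGENT_ID_PATTERN.test(agentId)) return null; // rejects "../x", slashes, and so on
  try {
    const root = realpath(projectsRoot);
    const dir = realpath(transcriptDir);
    if (!isInside(root, dir)) return null;
    // Fixed leaf filename: the caller supplies only an id, never a filename.
    const leaf = realpath(`${dir}/agent-${agentId}.jsonl`);
    // Re-contain the joined leaf, since the file itself may be a symlink
    // that points outside the projects root.
    return isInside(root, leaf) ? leaf : null;
  } catch (_error) {
    return null; // missing paths are treated the same as rejected ones
  }
}
```

Resolving first and comparing afterwards is what makes the check meaningful: a prefix test on the unresolved path would accept a symlinked leaf.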

Validation

vp run typecheck, vp check, and the full vp run -r test suite pass (1428 tests; one pre-existing flake in CursorAdapter.test.ts passes in isolation).
43 new tests: contracts decode, adapter normalization + tool round-trip, ingestion upsert semantics, reactor stop path, decider, inspection-service security cases (incl. leaf-symlink escapes), workflow-logic derivation, session-logic suppression.
Reviewed by a multi-agent pass (3 Claude lenses + an independent gpt-5.5 review, findings adversarially verified); the 7 confirmed findings are fixed in the second commit.

Known v1 tradeoffs

Transcript reads load the whole file per page server-side (capped response); fine at current transcript sizes, revisit with a seek-based reader if needed.
Per-tick snapshot activities still append to the (already append-only) orchestration_events log; only projections/reconnect are deduped. The SDK's own progress throttling bounds the rate.
No live elapsed-time ticker on agent rows; durations come from the usage rollup.
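For the first tradeoff, the whole-file, cursor-paged read might look something like this sketch. The cursor is assumed here to be a line index and `readTranscriptPage` is a hypothetical name; the actual RPC may use a different cursor encoding and applies its own response cap.

```typescript
interface TranscriptPage {
  lines: string[];
  nextCursor: number; // line index to request next
  complete: boolean; // true when the reader has caught up with the file as it is now
}

// Takes the full file content on every call (the v1 tradeoff) and returns
// at most `limit` non-empty lines starting at `cursor`.
function readTranscriptPage(fileContent: string, cursor: number, limit: number): TranscriptPage {
  const all = fileContent.split("\n").filter((line) => line.length > 0);
  const start = Math.max(0, Math.min(cursor, all.length));
  const lines = all.slice(start, start + limit);
  const nextCursor = start + lines.length;
  return { lines, nextCursor, complete: nextCursor >= all.length };
}

// Usage: "complete" only means caught up. If the agent appends more lines,
// a later poll from the same cursor returns them.
const firstPage = readTranscriptPage("a\nb\nc\n", 0, 2);
const secondPage = readTranscriptPage("a\nb\nc\n", firstPage.nextCursor, 2);
const afterAppend = readTranscriptPage("a\nb\nc\nd\n", secondPage.nextCursor, 2);
```

A seek-based reader would replace the `split` over the whole file with a byte offset cursor, at the cost of handling partially written trailing lines.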

🤖 Generated with Claude Code

Note

Medium Risk
Touches orchestration commands, provider session control, and new path-based RPCs with containment rules; scope is large but bounded by tests and defensive normalization.

Overview
Adds end-to-end visibility for Claude Agent SDK workflow runs in the web app (and quieter mobile work logs), plus a way to stop a background workflow task.

Server / provider: The Claude adapter now reads undocumented workflow_progress on task_progress, normalizes it, and forwards workflow metadata on task.started and from Workflow tool results as task.workflowMeta. Runtime ingestion emits a per-tick task.progress row and upserts stable task.workflow-updated / task.workflow-meta activities per (threadId, taskId). A new thread.task.stop command flows through the decider and ProviderCommandReactor to ProviderService.stopTask → Claude query.stopTask, with failure activities when no session is bound. WorkflowInspectionService serves script, journal, and agent transcript reads under a contained projects root via new WebSocket RPCs.

Web UI: workflow-logic derives WorkflowRun view models; the chat timeline shows workflow cards and suppresses duplicate workflow noise in the work log. A Workflow right-panel tab (WorkflowPanel) exposes Run / Script / Logs, expandable agent transcripts, stop, and resume-command copy.

Mobile: Work-log derivation skips workflow snapshot/meta and workflow-owned progress rows (no workflow card yet).

Reviewed by Cursor Bugbot for commit 0f16c5e. Bugbot is set up for automated code reviews on this repo.

Note

Add live workflow run visibility with agent tree, script view, and per-agent transcripts

Introduces a workflow-logic module that derives structured WorkflowRun models (phases, agents, logs, usage, handles, status) from raw thread activities, and surfaces them as timeline entries in the chat view
Adds a WorkflowPanel right-panel component with Run/Script/Logs tabs, and a WorkflowRunCard inline timeline card with prioritized agent rows, rollup stats, and status chips
Adds WebSocket RPCs workflow.readScript, workflow.readJournal, and workflow.readAgentTranscript backed by a new WorkflowInspectionService that validates paths are contained within the projects root before reading
Emits a new task.workflowMeta provider runtime event when a Workflow tool result returns run handles, and attaches normalized workflowProgress snapshots to task.progress events with string clipping and entry capping
Adds a thread.task.stop client command and thread.task-stop-requested domain event, wired through the decider, ProviderCommandReactor, and adapter stopTask method to stop a background workflow task
Workflow snapshot (task.workflow-updated) and meta (task.workflow-meta) activities are suppressed from work log entries on both web and mobile to reduce noise

Macroscope summarized 0f16c5e.

Surface Claude Agent SDK workflow runs (the Workflow orchestration tool) end to end in the desktop app:

- Contracts: WorkflowProgressEntry schemas, run handles, workflow inspection RPCs, thread.task.stop command + task-stop-requested event
- ClaudeAdapter: normalize the SDK's undocumented workflow_progress snapshot (size-capped, tolerant), forward workflow identity on task events, emit task.workflowMeta from Workflow tool results, stopTask
- Ingestion: upsert workflow snapshots under a stable per-task activity id so projections and reconnect payloads stay one row per run
- WorkflowInspectionService: path-validated, size-capped reads of the run's script, journal, and per-agent transcripts (realpath containment under ~/.claude/projects)
- Web: WorkflowRunCard inline in the chat timeline, a workflow right-panel surface with Run/Script/Logs tabs, cursor-paged transcript polling, stop + resume affordances; remote runs link to their cloud session

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

- WorkflowInspectionService: re-contain joined leaf files so a symlink named journal.jsonl / agent-<id>.jsonl cannot escape the projects root (readScript already did this); add regression tests for both
- Stop serializing raw fs error causes in WorkflowInspectionError
- ProviderCommandReactor: surface stopTask failures as a provider.task.stop.failed activity instead of a silent log warning
- ProviderService.stopTask: don't resurrect a stopped session via recovery just to stop a task
- Ingestion: scope stable workflow activity ids by threadId so SDK task id reuse across threads cannot collide in the projection table
- WorkflowPanel: keep polling transcripts past prior EOF ("complete" means caught-up, not finished); reuse useCopyToClipboard for the resume-command button
- workflow-logic: terminalize runs left "running" when the provider session is gone; add a revision counter so timeline rows re-render on content changes even when timestamps collide

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

coderabbitai · 2026-07-02T11:58:51Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a40b1552-71a6-4a3a-b6c4-4dbe39c051c4

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


- workflow-logic: settle in-flight agents to "error" when terminalizing a run whose session died, so a stopped chip never sits above pulsing "running" agents; drop the dead hasStartedActivity field (snapshot-only runs render intentionally after history trims)
- ChatView: gate workflow liveness on derivePhase so interrupted sessions also terminalize runs, matching disconnected-session UX
- WorkflowPanel: fetch the transcript once more when a run leaves "running" so lines appended after the last poll tick are not lost
- mobile: suppress workflow-owned task.progress/task.completed rows in the work log, mirroring desktop (mobile has no workflow card yet)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

t3dotgg · 2026-07-02T12:07:45Z

Addressed all five cursor bot findings in 1610d39:

Mobile workflow work log noise — mobile's deriveWorkLogEntries now suppresses workflow-owned task.progress/task.completed rows (mirrors desktop's collectWorkflowTaskIds gate; mobile has no workflow card yet, so all workflow rows are hidden rather than rendered as per-tick noise).
Stopped chip, running agents — terminalizing a run now also settles its in-flight agents to error ("Interrupted before completion") and recomputes the rollup, so a stopped chip never sits above pulsing agents. Transcript polling stopping there is intentional: the session owning the writer is gone.
Interrupted session still active — the liveness gate now uses derivePhase(...) !== "disconnected", which covers interrupted along with stopped/error.
Transcript polling stops at completion — the polling effect now issues one final fetch when the run leaves running (and when a row opens on an already-terminal run), so lines appended after the last poll tick are captured.
Phantom runs without start activity — the dead hasStartedActivity field is removed; rendering runs derived only from snapshot/meta activities is intentional resilience (e.g. a checkpoint revert can trim the task.started row while the snapshot survives), now documented at the derivation site.
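The terminalization behavior from the second item above can be sketched as a pure function. The types and the `terminalizeRun` name are hypothetical, and treating queued agents as in-flight alongside running ones is an assumption of this sketch; only the "Interrupted before completion" text comes from the comment above.

```typescript
type AgentRowStatus = "queued" | "running" | "done" | "error";
type WorkflowRunStatus = "running" | "completed" | "failed" | "stopped";

interface AgentRow {
  label: string;
  status: AgentRowStatus;
  error?: string;
}

interface WorkflowRunView {
  status: WorkflowRunStatus;
  agents: AgentRow[];
  settledCount: number; // rollup: agents that are done or errored
}

// When the owning session is gone, a run still marked "running" is settled,
// and so are its in-flight agents, so the chip and the rows agree.
function terminalizeRun(run: WorkflowRunView, sessionAlive: boolean): WorkflowRunView {
  if (run.status !== "running" || sessionAlive) return run;
  const agents = run.agents.map((agent): AgentRow =>
    agent.status === "queued" || agent.status === "running"
      ? { label: agent.label, status: "error", error: "Interrupted before completion" }
      : agent,
  );
  const settledCount = agents.filter(
    (agent) => agent.status === "done" || agent.status === "error",
  ).length;
  return { status: "stopped", agents, settledCount }; // rollup recomputed from settled rows
}
```

Returning the input unchanged for live or already-terminal runs keeps the function safe to call on every derivation pass.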

Two new unit tests cover the terminalization behavior. Typecheck, lint, and the affected suites pass.

macroscopeapp · 2026-07-02T12:10:58Z

Approvability

Verdict: Needs human review

This PR introduces a significant new feature for workflow run visibility including new RPC endpoints, a file-reading inspection service with security-sensitive path handling, new orchestration commands, and multiple new UI components. The scope and nature of these changes warrant human review.


- ClaudeAdapter: dedupe workflow agents/phases by index (last write wins) before applying entry caps, so repeated slot updates cannot exhaust the cap and freeze later agents stale; restrict sessionUrl to http(s) so a hostile tool result cannot smuggle a javascript: href
- workflowUi: guard sessionUrl scheme again at both anchor render sites (defense in depth for payloads persisted before the adapter filter)
- Ingestion: JSON-encode the (threadId, taskId) tuple in stable workflow activity ids to remove delimiter-ambiguity collisions
- workflow-logic: order same-timestamp activities by provider sequence then lifecycle rank so task.completed can never apply before its task.started
- WorkflowPanel/Card: scope phase and agent row keys by taskId so switching runs remounts rows instead of leaking expanded transcript state and cursors across runs
- workflowUi: phase progress counter counts settled (done + error) agents instead of reporting 0/1 for an errored terminal phase
- client-runtime: readJournal query polls every 4s while mounted so the Logs tab picks up new results during a live run
- mobile: keep workflow task.completed rows (only per-tick progress is suppressed); with no workflow card on mobile, it is the only signal a workflow finished or failed

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

t3dotgg · 2026-07-02T12:15:13Z

All seven macroscope findings plus cursor's follow-up are addressed in 0eaeb94:

XSS via sessionUrl (High) — the adapter now only forwards http(s) URLs from tool results, and both anchor render sites re-validate the scheme (safeWorkflowSessionUrl) as defense in depth for payloads persisted before the filter. I kept the contract schema permissive on purpose: activity payloads aren't schema-validated on the read path, so enforcement lives at the producer and consumer boundaries where it actually executes.
Cap counts duplicates — normalizeWorkflowProgress now dedupes agents and phases by index (last write wins) before enforcing caps, so repeated slot updates can't exhaust the cap and freeze later agents stale.
Stable-id delimiter ambiguity — workflow activity ids now JSON-encode the (threadId, taskId) tuple, eliminating :-collision between distinct pairs.
Row state leaking across runs — phase/agent row keys in both the panel and the card are scoped by taskId, so switching runs remounts rows instead of carrying over expanded transcripts and cursors.
Phase counter vs errored agents — the x/y counter now counts settled (done + error) agents.
Frozen journal view — readJournal polls every 4s while mounted (it only mounts while the Logs tab is open, so the poll is bounded).
Same-millisecond ordering — deriveWorkflowRuns now orders by provider sequence when present, then lifecycle rank (started < updated/meta < completed), so a completion can never be applied before the start that creates its run.
Mobile workflow completion (cursor) — mobile keeps task.completed rows for workflow tasks (only per-tick task.progress is suppressed), so the finished/failed/stopped signal survives on a surface with no workflow card.
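The scheme guard from the first item above could look roughly like this. The function name `safeWorkflowSessionUrl` appears in the comment, but its signature and body here are assumptions. The sketch relies on the standard `URL` constructor, which is available as a global in browsers and in Node.

```typescript
// Returns a normalized URL string only for http(s) links; everything else,
// including javascript: and data: URLs and unparseable input, yields undefined.
function safeWorkflowSessionUrl(raw: unknown): string | undefined {
  if (typeof raw !== "string") return undefined;
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch (_error) {
    return undefined; // not an absolute URL
  }
  return parsed.protocol === "http:" || parsed.protocol === "https:"
    ? parsed.href
    : undefined;
}
```

Checking the parsed `protocol` against an allowlist is safer than searching for `javascript:` in the raw string, because an allowlist does not have to anticipate every casing or whitespace trick. Calling the same helper at the producer and again at each render site is what the comment describes as defense in depth.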

Typecheck, lint, and the affected suites (106 server + 1298 web tests) pass.

macroscopeapp

One Effect service convention issue: the new WorkflowInspectionError drops the underlying failure instead of preserving it as cause. See inline comments.

Posted via Macroscope (Effect Service Conventions)

Real filesystem failures (read-failed / not-found wraps) forward their underlying error as an optional cause, matching the GitCommandError convention; pure validation reasons stay cause-less. The paths a cause can carry are ones the client supplied in the request, so this does not reintroduce the earlier information-exposure concern. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

t3dotgg · 2026-07-02T12:17:35Z

Addressed the error-convention findings in the latest commit: WorkflowInspectionError carries an optional cause again, and all five real-failure wrap sites (realpath resolution + the three file reads) forward the underlying error, matching the GitCommandError convention. Pure validation failures (invalid-path) intentionally stay cause-less.

For the record on the apparent tension with the earlier information-exposure comment: the causes here can only reference paths the client itself supplied in the request, so preserving the chain doesn't leak anything the caller doesn't already know — and it keeps parity with how the other inspection-style errors in the codebase behave.

t3dotgg · 2026-07-02T12:20:10Z

The latest macroscope pass re-posted the six findings that were already fixed in 0eaeb94 — the comments quote pre-fix code (the :-joined activity id, the missing refreshIntervalMs, the done-only phase counter, etc.), all of which are visibly changed on the current head cb5a39a68. Resolved the duplicate threads; happy to revisit any of them if the next pass still flags the current code.

A task.completed for a known workflow task now creates the run when its task.started has not been applied yet — adopted runs can carry reset provider sequences across CLI restarts, letting the completion sort first. The later-applied started only fills metadata and can never resurrect a terminal status, so derivation correctness no longer depends on the comparator at all. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

t3dotgg · 2026-07-02T12:29:06Z

Good catch on the sequence-inversion path — rather than special-casing the comparator, the derivation is now order-robust: a task.completed for a known workflow task creates the run itself when its task.started hasn't been applied yet (the adopted-run scenario where provider sequences reset across CLI restarts), and a later-applied started only fills metadata, never resurrecting a terminal status. Covered by a new test with an inverted-sequence fixture.
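A minimal sketch of what order-robust derivation could mean in practice, under the assumption of a much simpler event model than the real one: `TaskEvent`, `DerivedRun`, and `deriveRuns` are hypothetical names for the illustration.

```typescript
type TerminalStatus = "completed" | "failed" | "stopped";

type TaskEvent =
  | { type: "task.started"; taskId: string; workflowName: string }
  | { type: "task.completed"; taskId: string; status: TerminalStatus };

interface DerivedRun {
  taskId: string;
  workflowName?: string;
  status: "running" | TerminalStatus;
}

// Applies events in whatever order they arrive. A completion creates the run
// if needed; a start only fills metadata and never downgrades a terminal status.
function deriveRuns(events: readonly TaskEvent[]): Map<string, DerivedRun> {
  const runs = new Map<string, DerivedRun>();
  for (const event of events) {
    const existing = runs.get(event.taskId);
    if (event.type === "task.started") {
      if (existing) {
        existing.workflowName = event.workflowName; // metadata only
      } else {
        runs.set(event.taskId, {
          taskId: event.taskId,
          workflowName: event.workflowName,
          status: "running",
        });
      }
    } else if (existing) {
      existing.status = event.status;
    } else {
      runs.set(event.taskId, { taskId: event.taskId, status: event.status });
    }
  }
  return runs;
}
```

Both event orders converge on the same run, which is why the result no longer depends on how the comparator sorts a reset provider sequence.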

DavidIlie · 2026-07-02T16:03:33Z

pls merge

- Forward the SDK snapshot's per-agent tokens / toolCalls / durationMs through the contract, adapter normalization, and view model; agent rows now show model plus "94.2k tok · 47 tools · 7m 03s" (duration once the agent settles), mirroring the Claude Code TUI
- Rewrite the transcript renderer: only assistant text and tool calls render (tool rows as "→ Name input-preview"); user turns, tool results, attachments, and thinking are skipped instead of printing raw type names ("user attachment attachment")
- Bound transcript memory: retain a 600-line tail with an "earlier activity trimmed" notice so million-token agent threads cannot grow client memory unbounded

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
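The transcript memory bound can be sketched as a small pure helper. The 600-line figure comes from the commit message; the `TranscriptTail` shape and the `appendToTail` name are assumptions for the illustration.

```typescript
const MAX_TRANSCRIPT_LINES = 600; // tail size named in the commit message

interface TranscriptTail {
  lines: string[];
  trimmed: boolean; // drives an "earlier activity trimmed" notice in the UI
}

// Appends newly fetched lines and keeps only the most recent `max` of them.
function appendToTail(
  tail: TranscriptTail,
  incoming: readonly string[],
  max: number = MAX_TRANSCRIPT_LINES,
): TranscriptTail {
  const combined = tail.lines.concat(incoming);
  if (combined.length <= max) return { lines: combined, trimmed: tail.trimmed };
  return { lines: combined.slice(combined.length - max), trimmed: true };
}
```

Once `trimmed` is set it stays set, so the notice does not flicker if a later page happens to be small.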

The SDK emits tokens/toolCalls on every progress tick (verified against the bundled runtime), so agent rows now show them live; elapsed time for a running agent derives from lastProgressAt - startedAt, which advances tick-by-tick without a client timer. Settled agents keep the reported total duration. Agent rows no longer render the routine inline preview text (result and last-tool snippets) — the expandable transcript owns that detail. Error text stays inline since it explains a red row at a glance. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
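As a rough illustration of the stats line and the timer-free elapsed time, here is a sketch with hypothetical names (`AgentStats`, `formatAgentStats`). The field names follow the commit messages, but the exact rounding and formatting rules of the real UI are assumptions.

```typescript
interface AgentStats {
  tokens: number;
  toolCalls: number;
  startedAt: number; // epoch milliseconds
  lastProgressAt: number; // epoch milliseconds, advances with each progress tick
  durationMs?: number; // reported total, present once the agent settles
}

function formatTokens(tokens: number): string {
  return tokens >= 1000 ? `${(tokens / 1000).toFixed(1)}k` : String(tokens);
}

function formatDuration(ms: number): string {
  const totalSeconds = Math.max(0, Math.floor(ms / 1000));
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}m ${seconds < 10 ? "0" : ""}${seconds}s`;
}

// Settled agents use the reported duration; running agents derive elapsed
// time from the last progress tick, so no client-side timer is needed.
function formatAgentStats(stats: AgentStats): string {
  const elapsedMs =
    stats.durationMs !== undefined ? stats.durationMs : stats.lastProgressAt - stats.startedAt;
  return `${formatTokens(stats.tokens)} tok · ${stats.toolCalls} tools · ${formatDuration(elapsedMs)}`;
}
```

The tradeoff of deriving elapsed time from ticks is that the readout only moves when the SDK emits progress, so a quiet agent shows a frozen value until its next tick.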

cursor

Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).


Reviewed by Cursor Bugbot for commit bc9ab83.

Long agent labels were shrink-0 and pushed the model/stats readout off the clipped right edge. Rows now stack: the label owns the first line and wraps freely, and model · stats · badges · error text sit on a muted meta line under it — nothing competes for horizontal space, so nothing clips regardless of label length. The chevron and status dot center against the first text line via fixed line-height boxes. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

- WorkflowPanel: track a render-visible caught-up flag so the transcript shows "Loading transcript…" until the first read reaches EOF; an absence of parsed entries mid-drain means still paging, not that the agent produced no output
- workflowUi: error rows fall back to resultPreview when the snapshot carries no error field, so red rows always explain the failure inline

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The test asserted the session/cancel request against the log snapshot returned by the *second* waitForJsonLogMatch, but only the first wait guaranteed that line — the two log appends are independent, so under CI load the second snapshot can miss (or tear) the cancel line and the assertion fails spuriously. Each assertion now checks the snapshot its own wait resolved on. This flake predates the branch (zero Cursor files are touched here) but kept failing this PR's Test job. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

t3dotgg · 2026-07-02T20:18:26Z

The Test job failure was the pre-existing CursorAdapter.test.ts flake (this branch touches zero Cursor files): the test asserted session/cancel presence against the log snapshot from the second waitForJsonLogMatch, while only the first wait guaranteed that line — under CI load the second snapshot can miss or tear it. Deflaked in the latest commit by asserting each condition against the snapshot its own wait resolved on. 5/5 isolated runs and a full vp run -r test pass locally; CI is re-running on the push.

t3dotgg and others added 2 commits July 2, 2026 03:26

github-actions Bot added the vouch:trusted PR author is trusted by repo permissions or the VOUCHED list. label Jul 2, 2026

github-actions Bot added the size:XXL 1,000+ changed lines (additions + deletions). label Jul 2, 2026

cursor Bot reviewed Jul 2, 2026

Comment thread apps/mobile/src/lib/threadActivity.ts

Comment thread apps/web/src/workflow-logic.ts

Comment thread apps/web/src/components/ChatView.tsx Outdated

Comment thread apps/web/src/components/workflow/WorkflowPanel.tsx

Comment thread apps/web/src/workflow-logic.ts

macroscopeapp Bot reviewed Jul 2, 2026

cursor Bot reviewed Jul 2, 2026

Comment thread apps/mobile/src/lib/threadActivity.ts

macroscopeapp Bot reviewed Jul 2, 2026

Comment thread apps/server/src/workflow/WorkflowInspectionService.ts

Comment thread packages/contracts/src/workflow.ts
