Workflow run visibility: live agent tree, script view, per-agent transcripts#3650

Open
t3dotgg wants to merge 11 commits into main from feat/workflow-run-visibility

Conversation

@t3dotgg

t3dotgg commented Jul 2, 2026

Member

I had Fable try to implement better UI/UX for Claude Code's Workflows. Not sure if this will be trumped by Julius's work on orchestration but I wanted to give it a go.

The rest of this description is generated by Claude

AI generated from here down

Surfaces Claude Agent SDK workflow runs (the Workflow orchestration tool — phases, subagents, narration) end to end in the desktop app.

What you get

  • Inline workflow card in the chat timeline: name, status, phase-grouped agent rows with live per-agent status (queued/running/done/error), token + duration rollup, stop button, and a details affordance. Remote (remote: true) runs render a link to their cloud session instead.
  • Workflow right-panel surface with three tabs:
    • Run — full phase/agent tree; agent rows expand into their live transcript (cursor-paged, polled every 2s while running)
    • Script — the persisted workflow script, syntax-highlighted
    • Logs: log() narration plus per-agent results from the run journal
  • Stop (via a new thread.task.stop orchestration command → query.stopTask) and a copy-resume-command affordance (Workflow({ scriptPath, resumeFromRunId })) for terminal runs.
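
As a rough illustration of the cursor-paged transcript read mentioned in the Run bullet, the sketch below pages through an in-memory list of lines. The names `readTranscriptPage`, `TranscriptPage`, and `drain` are hypothetical and chosen for this example only; the actual `workflow.readAgentTranscript` contract may look different.

```typescript
// Hypothetical page shape; not the PR's actual RPC contract.
interface TranscriptPage {
  lines: string[];
  nextCursor: number; // index of the first line not yet returned
  complete: boolean; // "caught up to EOF", not "the agent has finished"
}

// Return up to `limit` lines starting at `cursor`.
function readTranscriptPage(
  all: readonly string[],
  cursor: number,
  limit: number,
): TranscriptPage {
  const size = Math.max(1, limit); // guard against a zero-size page
  const start = Math.max(0, Math.min(cursor, all.length));
  const lines = all.slice(start, start + size);
  const nextCursor = start + lines.length;
  return { lines, nextCursor, complete: nextCursor >= all.length };
}

// One poll tick: drain every page currently available and keep the
// cursor so the next tick resumes where this one stopped.
function drain(
  all: readonly string[],
  cursor: number,
  limit: number,
): { lines: string[]; cursor: number } {
  const out: string[] = [];
  let page = readTranscriptPage(all, cursor, limit);
  out.push(...page.lines);
  while (!page.complete) {
    page = readTranscriptPage(all, page.nextCursor, limit);
    out.push(...page.lines);
  }
  return { lines: out, cursor: page.nextCursor };
}
```

Because `complete` only means the reader reached the current end of the file, a caller polling a live run would keep calling `drain` with the saved cursor on each tick.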

How it works

The SDK streams a cumulative snapshot on every task_progress message via the undocumented workflow_progress field (deliberate wire surface — the CLI's own /workflows view renders it — but absent from the published .d.ts, verified against the 0.3.170 bundled runtime we ship). The read is confined to one cast helper in ClaudeAdapter which normalizes defensively: malformed entries dropped, previews clipped, entry counts capped, unknown agent states rendered as "running". An SDK change degrades to less detail, never a crash.
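
A minimal sketch of what such defensive normalization could look like, assuming a simplified entry shape. The field names (`index`, `label`, `state`, `preview`), the cap values, and the function names are assumptions for illustration, not the adapter's real code. Deduplicating by index before capping follows the review fix described later in this thread.

```typescript
type AgentState = "queued" | "running" | "done" | "error";

interface NormalizedAgent {
  index: number;
  label: string;
  state: AgentState;
  preview?: string;
}

const MAX_AGENTS = 200; // assumed cap
const MAX_PREVIEW_CHARS = 280; // assumed clip length
const KNOWN_STATES: readonly string[] = ["queued", "running", "done", "error"];

// Clip long strings so one entry cannot bloat the persisted payload.
function clip(text: string, max: number): string {
  return text.length > max ? text.slice(0, max - 1) + "…" : text;
}

function normalizeAgents(raw: unknown): NormalizedAgent[] {
  if (!Array.isArray(raw)) return []; // unknown shape: degrade to no detail
  const byIndex = new Map<number, NormalizedAgent>();
  for (const entry of raw) {
    if (typeof entry !== "object" || entry === null) continue; // drop malformed
    const e = entry as Record<string, unknown>;
    if (typeof e.index !== "number" || !Number.isInteger(e.index) || e.index < 0) continue;
    if (typeof e.label !== "string") continue;
    // Unknown states render as "running" rather than failing.
    const state: AgentState =
      typeof e.state === "string" && KNOWN_STATES.indexOf(e.state) !== -1
        ? (e.state as AgentState)
        : "running";
    const agent: NormalizedAgent = {
      index: e.index,
      label: clip(e.label, MAX_PREVIEW_CHARS),
      state,
    };
    if (typeof e.preview === "string") agent.preview = clip(e.preview, MAX_PREVIEW_CHARS);
    byIndex.set(e.index, agent); // last write wins per slot
  }
  // Cap after deduplication so repeated slot updates cannot exhaust the cap.
  return Array.from(byIndex.values())
    .sort((a, b) => a.index - b.index)
    .slice(0, MAX_AGENTS);
}
```

The function never throws on unexpected input, which is the property the description relies on: a changed wire shape yields fewer rows instead of a crash.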

Because every activity is persisted (SQLite orchestration_events + projections) and re-shipped on reconnect, workflow snapshots are upserted under a stable per-(thread, task) activity id instead of appended per tick — reconnect payloads stay one row per run. Run handles (script path, transcript dir, run id) arrive once via a new task.workflowMeta runtime event emitted from the Workflow tool result.
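
The upsert idea can be sketched as follows, assuming an in-memory projection in place of the SQLite tables. `stableWorkflowActivityId` and `ActivityProjection` are hypothetical names for this example. Encoding the (threadId, taskId) pair with `JSON.stringify` mirrors the delimiter fix described later in this thread.

```typescript
interface Activity {
  id: string;
  kind: string;
  payload: unknown;
}

// One id per (kind, thread, task). JSON-encoding the tuple avoids
// collisions that a plain ":" join would allow.
function stableWorkflowActivityId(kind: string, threadId: string, taskId: string): string {
  return `${kind}:${JSON.stringify([threadId, taskId])}`;
}

// Stand-in for the projection table: keyed by activity id, so a
// repeated id replaces the row instead of appending a new one.
class ActivityProjection {
  private readonly rows = new Map<string, Activity>();

  upsert(activity: Activity): void {
    this.rows.set(activity.id, activity);
  }

  list(): Activity[] {
    return Array.from(this.rows.values());
  }
}
```

With this shape, any number of progress ticks for the same run leaves a single row holding the latest snapshot, which is what keeps reconnect payloads small.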

Disk artifacts (script, journal.jsonl, agent-<id>.jsonl) are served by a new WorkflowInspectionService with realpath containment under ~/.claude/projects (including re-containment of joined leaf files against symlink escapes), fixed leaf filenames, an agent-id allowlist pattern, and size caps — the RPCs cannot be used as an arbitrary-file-read oracle.
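
A hedged sketch of the containment check, written against an injected `realpath` function so the example stays self-contained. In a real service one might pass Node's `fs.realpathSync`. The function names, the POSIX-style path handling, and the exact allowlist pattern are assumptions, not the service's actual code.

```typescript
// Resolves symlinks; injected so the sketch does not depend on the filesystem.
type Realpath = (path: string) => string;

// Assumed allowlist: short ids made of letters, digits, "_" and "-".
const AGENT_ID_PATTERN = /^[A-Za-z0-9_-]{1,64}$/;

// True when `target` is `root` itself or lies beneath it (POSIX-style paths).
function isWithin(root: string, target: string): boolean {
  const base = root.length > 1 && root.endsWith("/") ? root.slice(0, -1) : root;
  const prefix = base === "/" ? "/" : base + "/";
  return target === base || target.startsWith(prefix);
}

function resolveAgentTranscript(
  root: string,
  transcriptDir: string,
  agentId: string,
  realpath: Realpath,
): string | null {
  if (!AGENT_ID_PATTERN.test(agentId)) return null; // rejects "../" and similar
  const realRoot = realpath(root);
  const realDir = realpath(transcriptDir);
  if (!isWithin(realRoot, realDir)) return null;
  // Fixed leaf filename, then resolve again: the leaf itself may be a
  // symlink that points outside the root.
  const realLeaf = realpath(`${realDir}/agent-${agentId}.jsonl`);
  return isWithin(realRoot, realLeaf) ? realLeaf : null;
}
```

Comparing against `root + "/"` rather than the bare root matters: a plain prefix check would accept a sibling directory such as `/rootkit` when the root is `/root`.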

Validation

  • vp run typecheck, vp check, and the full vp run -r test suite pass (1428 tests; one pre-existing flake in CursorAdapter.test.ts passes in isolation).
  • 43 new tests: contracts decode, adapter normalization + tool round-trip, ingestion upsert semantics, reactor stop path, decider, inspection-service security cases (incl. leaf-symlink escapes), workflow-logic derivation, session-logic suppression.
  • Reviewed by a multi-agent pass (3 Claude lenses + an independent gpt-5.5 review, findings adversarially verified); the 7 confirmed findings are fixed in the second commit.

Known v1 tradeoffs

  • Transcript reads load the whole file per page server-side (capped response); fine at current transcript sizes, revisit with a seek-based reader if needed.
  • Per-tick snapshot activities still append to the (already append-only) orchestration_events log; only projections/reconnect are deduped. The SDK's own progress throttling bounds the rate.
  • No live elapsed-time ticker on agent rows; durations come from the usage rollup.

🤖 Generated with Claude Code


Note

Medium Risk
Touches orchestration commands, provider session control, and new path-based RPCs with containment rules; scope is large but bounded by tests and defensive normalization.

Overview
Adds end-to-end visibility for Claude Agent SDK workflow runs in the web app (and quieter mobile work logs), plus a way to stop a background workflow task.

Server / provider: The Claude adapter now reads undocumented workflow_progress on task_progress, normalizes it, and forwards workflow metadata on task.started and from Workflow tool results as task.workflowMeta. Runtime ingestion emits a per-tick task.progress row and upserts stable task.workflow-updated / task.workflow-meta activities per (threadId, taskId). A new thread.task.stop command flows through the decider and ProviderCommandReactor to ProviderService.stopTask → Claude query.stopTask, with failure activities when no session is bound. WorkflowInspectionService serves script, journal, and agent transcript reads under a contained projects root via new WebSocket RPCs.

Web UI: workflow-logic derives WorkflowRun view models; the chat timeline shows workflow cards and suppresses duplicate workflow noise in the work log. A Workflow right-panel tab (WorkflowPanel) exposes Run / Script / Logs, expandable agent transcripts, stop, and resume-command copy.

Mobile: Work-log derivation skips workflow snapshot/meta and workflow-owned progress rows (no workflow card yet).

Reviewed by Cursor Bugbot for commit 0f16c5e. Bugbot is set up for automated code reviews on this repo.

Note

Add live workflow run visibility with agent tree, script view, and per-agent transcripts

  • Introduces a workflow-logic module that derives structured WorkflowRun models (phases, agents, logs, usage, handles, status) from raw thread activities, and surfaces them as timeline entries in the chat view
  • Adds a WorkflowPanel right-panel component with Run/Script/Logs tabs, and a WorkflowRunCard inline timeline card with prioritized agent rows, rollup stats, and status chips
  • Adds WebSocket RPCs workflow.readScript, workflow.readJournal, and workflow.readAgentTranscript backed by a new WorkflowInspectionService that validates paths are contained within the projects root before reading
  • Emits a new task.workflowMeta provider runtime event when a Workflow tool result returns run handles, and attaches normalized workflowProgress snapshots to task.progress events with string clipping and entry capping
  • Adds a thread.task.stop client command and thread.task-stop-requested domain event, wired through the decider, ProviderCommandReactor, and adapter stopTask method to stop a background workflow task
  • Workflow snapshot (task.workflow-updated) and meta (task.workflow-meta) activities are suppressed from work log entries on both web and mobile to reduce noise

Macroscope summarized 0f16c5e.

t3dotgg and others added 2 commits July 2, 2026 03:26

Surface Claude Agent SDK workflow runs (the Workflow orchestration tool)
end to end in the desktop app:

- Contracts: WorkflowProgressEntry schemas, run handles, workflow
  inspection RPCs, thread.task.stop command + task-stop-requested event
- ClaudeAdapter: normalize the SDK's undocumented workflow_progress
  snapshot (size-capped, tolerant), forward workflow identity on task
  events, emit task.workflowMeta from Workflow tool results, stopTask
- Ingestion: upsert workflow snapshots under a stable per-task activity
  id so projections and reconnect payloads stay one row per run
- WorkflowInspectionService: path-validated, size-capped reads of the
  run's script, journal, and per-agent transcripts (realpath containment
  under ~/.claude/projects)
- Web: WorkflowRunCard inline in the chat timeline, a workflow right-
  panel surface with Run/Script/Logs tabs, cursor-paged transcript
  polling, stop + resume affordances; remote runs link to their cloud
  session

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

- WorkflowInspectionService: re-contain joined leaf files so a symlink
  named journal.jsonl / agent-<id>.jsonl cannot escape the projects root
  (readScript already did this); add regression tests for both
- Stop serializing raw fs error causes in WorkflowInspectionError
- ProviderCommandReactor: surface stopTask failures as a
  provider.task.stop.failed activity instead of a silent log warning
- ProviderService.stopTask: don't resurrect a stopped session via
  recovery just to stop a task
- Ingestion: scope stable workflow activity ids by threadId so SDK task
  id reuse across threads cannot collide in the projection table
- WorkflowPanel: keep polling transcripts past prior EOF ("complete"
  means caught-up, not finished); reuse useCopyToClipboard for the
  resume-command button
- workflow-logic: terminalize runs left "running" when the provider
  session is gone; add a revision counter so timeline rows re-render on
  content changes even when timestamps collide

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions bot added the vouch:trusted label (PR author is trusted by repo permissions or the VOUCHED list) Jul 2, 2026
@coderabbitai

coderabbitai Bot commented Jul 2, 2026


Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


github-actions bot added the size:XXL label (1,000+ changed lines, additions + deletions) Jul 2, 2026
Comment thread apps/mobile/src/lib/threadActivity.ts
Comment thread apps/web/src/workflow-logic.ts
Comment thread apps/web/src/components/ChatView.tsx Outdated
Comment thread apps/web/src/components/workflow/WorkflowPanel.tsx
Comment thread apps/web/src/workflow-logic.ts

- workflow-logic: settle in-flight agents to "error" when terminalizing
  a run whose session died, so a stopped chip never sits above pulsing
  "running" agents; drop the dead hasStartedActivity field (snapshot-only
  runs render intentionally after history trims)
- ChatView: gate workflow liveness on derivePhase so interrupted
  sessions also terminalize runs, matching disconnected-session UX
- WorkflowPanel: fetch the transcript once more when a run leaves
  "running" so lines appended after the last poll tick are not lost
- mobile: suppress workflow-owned task.progress/task.completed rows in
  the work log, mirroring desktop (mobile has no workflow card yet)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@t3dotgg

t3dotgg commented Jul 2, 2026

Member Author

Addressed all five cursor bot findings in 1610d39:

  • Mobile workflow work log noise — mobile's deriveWorkLogEntries now suppresses workflow-owned task.progress/task.completed rows (mirrors desktop's collectWorkflowTaskIds gate; mobile has no workflow card yet, so all workflow rows are hidden rather than rendered as per-tick noise).
  • Stopped chip, running agents — terminalizing a run now also settles its in-flight agents to error ("Interrupted before completion") and recomputes the rollup, so a stopped chip never sits above pulsing agents. Transcript polling stopping there is intentional: the session owning the writer is gone.
  • Interrupted session still active — the liveness gate now uses derivePhase(...) !== "disconnected", which covers interrupted along with stopped/error.
  • Transcript polling stops at completion — the polling effect now issues one final fetch when the run leaves running (and when a row opens on an already-terminal run), so lines appended after the last poll tick are captured.
  • Phantom runs without start activity — the dead hasStartedActivity field is removed; rendering runs derived only from snapshot/meta activities is intentional resilience (e.g. a checkpoint revert can trim the task.started row while the snapshot survives), now documented at the derivation site.

Two new unit tests cover the terminalization behavior. Typecheck, lint, and the affected suites pass.
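
The terminalization rule above could be sketched like this, assuming a simplified view model. The `Run` and `Agent` shapes and the function names are hypothetical. Only running agents are settled here, since the comment does not say how queued agents are treated.

```typescript
type AgentStatus = "queued" | "running" | "done" | "error";
type RunStatus = "running" | "completed" | "failed" | "stopped";

interface Agent {
  label: string;
  status: AgentStatus;
  error?: string;
}

interface Run {
  status: RunStatus;
  agents: Agent[];
}

// If the run still claims "running" but its provider session is gone,
// mark it stopped and settle every in-flight agent to "error".
function terminalizeRun(run: Run, sessionAlive: boolean): Run {
  if (run.status !== "running" || sessionAlive) return run;
  const agents: Agent[] = run.agents.map((agent) =>
    agent.status === "running"
      ? { ...agent, status: "error" as const, error: "Interrupted before completion" }
      : agent,
  );
  return { status: "stopped", agents };
}

// Rollup input: agents that reached a final state, successful or not.
function settledCount(run: Run): number {
  return run.agents.filter((a) => a.status === "done" || a.status === "error").length;
}
```

Returning a new object instead of mutating the input keeps the derivation safe to call on every render.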

Comment thread apps/server/src/provider/Layers/ClaudeAdapter.ts Outdated
Comment thread apps/server/src/orchestration/Layers/ProviderRuntimeIngestion.ts
Comment thread apps/web/src/components/workflow/WorkflowPanel.tsx Outdated
Comment thread apps/web/src/components/workflow/workflowUi.tsx Outdated
Comment thread packages/client-runtime/src/state/workflow.ts
Comment thread apps/web/src/workflow-logic.ts Outdated
Comment thread packages/contracts/src/workflow.ts
@macroscopeapp

macroscopeapp Bot commented Jul 2, 2026

Contributor

Approvability

Verdict: Needs human review

This PR introduces a significant new feature for workflow run visibility including new RPC endpoints, a file-reading inspection service with security-sensitive path handling, new orchestration commands, and multiple new UI components. The scope and nature of these changes warrant human review.

Comment thread apps/mobile/src/lib/threadActivity.ts

- ClaudeAdapter: dedupe workflow agents/phases by index (last write wins)
  before applying entry caps, so repeated slot updates cannot exhaust the
  cap and freeze later agents stale; restrict sessionUrl to http(s) so a
  hostile tool result cannot smuggle a javascript: href
- workflowUi: guard sessionUrl scheme again at both anchor render sites
  (defense in depth for payloads persisted before the adapter filter)
- Ingestion: JSON-encode the (threadId, taskId) tuple in stable workflow
  activity ids to remove delimiter-ambiguity collisions
- workflow-logic: order same-timestamp activities by provider sequence
  then lifecycle rank so task.completed can never apply before its
  task.started
- WorkflowPanel/Card: scope phase and agent row keys by taskId so
  switching runs remounts rows instead of leaking expanded transcript
  state and cursors across runs
- workflowUi: phase progress counter counts settled (done + error)
  agents instead of reporting 0/1 for an errored terminal phase
- client-runtime: readJournal query polls every 4s while mounted so the
  Logs tab picks up new results during a live run
- mobile: keep workflow task.completed rows (only per-tick progress is
  suppressed) — with no workflow card on mobile it is the only signal a
  workflow finished or failed

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@t3dotgg

t3dotgg commented Jul 2, 2026

Member Author

All seven macroscope findings plus cursor's follow-up are addressed in 0eaeb94:

  • XSS via sessionUrl (High) — the adapter now only forwards http(s) URLs from tool results, and both anchor render sites re-validate the scheme (safeWorkflowSessionUrl) as defense in depth for payloads persisted before the filter. I kept the contract schema permissive on purpose: activity payloads aren't schema-validated on the read path, so enforcement lives at the producer and consumer boundaries where it actually executes.
  • Cap counts duplicates: normalizeWorkflowProgress now dedupes agents and phases by index (last write wins) before enforcing caps, so repeated slot updates can't exhaust the cap and freeze later agents stale.
  • Stable-id delimiter ambiguity — workflow activity ids now JSON-encode the (threadId, taskId) tuple, eliminating :-collision between distinct pairs.
  • Row state leaking across runs — phase/agent row keys in both the panel and the card are scoped by taskId, so switching runs remounts rows instead of carrying over expanded transcripts and cursors.
  • Phase counter vs errored agents — the x/y counter now counts settled (done + error) agents.
  • Frozen journal view: readJournal polls every 4s while mounted (it only mounts while the Logs tab is open, so the poll is bounded).
  • Same-millisecond ordering: deriveWorkflowRuns now orders by provider sequence when present, then lifecycle rank (started < updated/meta < completed), so a completion can never be applied before the start that creates its run.
  • Mobile workflow completion (cursor) — mobile keeps task.completed rows for workflow tasks (only per-tick task.progress is suppressed), so the finished/failed/stopped signal survives on a surface with no workflow card.

Typecheck, lint, and the affected suites (106 server + 1298 web tests) pass.
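
For the sessionUrl fix, a scheme guard along these lines would be enough to keep a `javascript:` href out of an anchor. The helper name `safeWorkflowSessionUrl` comes from the comment above, but this body is an assumed implementation, not the one in the PR.

```typescript
// Accept only absolute http(s) URLs; everything else becomes undefined,
// so the caller can skip rendering the link entirely.
function safeWorkflowSessionUrl(value: unknown): string | undefined {
  if (typeof value !== "string") return undefined;
  const trimmed = value.trim();
  return /^https?:\/\//i.test(trimmed) ? trimmed : undefined;
}
```

Running the same guard at the producer (adapter) and at each render site is cheap, and it covers payloads that were persisted before the producer-side filter existed.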

@macroscopeapp macroscopeapp Bot left a comment

Contributor

One Effect service convention issue: the new WorkflowInspectionError drops the underlying failure instead of preserving it as cause. See inline comments.

Posted via Macroscope — Effect Service Conventions

Comment thread apps/server/src/workflow/WorkflowInspectionService.ts
Comment thread packages/contracts/src/workflow.ts

Real filesystem failures (read-failed / not-found wraps) forward their
underlying error as an optional cause, matching the GitCommandError
convention; pure validation reasons stay cause-less. The paths a cause
can carry are ones the client supplied in the request, so this does not
reintroduce the earlier information-exposure concern.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@t3dotgg

t3dotgg commented Jul 2, 2026

Member Author

Addressed the error-convention findings in the latest commit: WorkflowInspectionError carries an optional cause again, and all five real-failure wrap sites (realpath resolution + the three file reads) forward the underlying error, matching the GitCommandError convention. Pure validation failures (invalid-path) intentionally stay cause-less.

For the record on the apparent tension with the earlier information-exposure comment: the causes here can only reference paths the client itself supplied in the request, so preserving the chain doesn't leak anything the caller doesn't already know — and it keeps parity with how the other inspection-style errors in the codebase behave.

@t3dotgg

t3dotgg commented Jul 2, 2026

Member Author

The latest macroscope pass re-posted the six findings that were already fixed in 0eaeb94 — the comments quote pre-fix code (the :-joined activity id, the missing refreshIntervalMs, the done-only phase counter, etc.), all of which are visibly changed on the current head cb5a39a68. Resolved the duplicate threads; happy to revisit any of them if the next pass still flags the current code.

Comment thread apps/web/src/workflow-logic.ts

A task.completed for a known workflow task now creates the run when its
task.started has not been applied yet — adopted runs can carry reset
provider sequences across CLI restarts, letting the completion sort
first. The later-applied started only fills metadata and can never
resurrect a terminal status, so derivation correctness no longer
depends on the comparator at all.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@t3dotgg

t3dotgg commented Jul 2, 2026

Member Author

Good catch on the sequence-inversion path — rather than special-casing the comparator, the derivation is now order-robust: a task.completed for a known workflow task creates the run itself when its task.started hasn't been applied yet (the adopted-run scenario where provider sequences reset across CLI restarts), and a later-applied started only fills metadata, never resurrecting a terminal status. Covered by a new test with an inverted-sequence fixture.
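
To make the order-robust derivation concrete, here is a small sketch under simplifying assumptions: events carry only a kind, a task id, and optional name and status, and every event passed in already belongs to a known workflow task. The types and `applyEvent` are hypothetical, not the real `deriveWorkflowRuns`.

```typescript
type Status = "running" | "completed" | "failed" | "stopped";

interface Evt {
  kind: "task.started" | "task.completed";
  taskId: string;
  name?: string;
  status?: Status;
}

interface DerivedRun {
  taskId: string;
  name: string | undefined;
  status: Status;
}

function applyEvent(runs: Map<string, DerivedRun>, evt: Evt): void {
  const existing = runs.get(evt.taskId);
  if (evt.kind === "task.completed") {
    const status = evt.status ?? "completed";
    if (existing) {
      existing.status = status;
    } else {
      // Completion arrived first: create the run in its terminal state.
      runs.set(evt.taskId, { taskId: evt.taskId, name: undefined, status });
    }
    return;
  }
  // task.started
  if (!existing) {
    runs.set(evt.taskId, { taskId: evt.taskId, name: evt.name, status: "running" });
    return;
  }
  // Run already exists: fill metadata only, never touch the status.
  if (existing.name === undefined) existing.name = evt.name;
}
```

Applying `task.completed` then `task.started` yields the same final run as the normal order, so the result no longer depends on how the events were sorted.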

@DavidIlie


pls merge

- Forward the SDK snapshot's per-agent tokens / toolCalls / durationMs
  through the contract, adapter normalization, and view model; agent
  rows now show model plus "94.2k tok · 47 tools · 7m 03s" (duration
  once the agent settles), mirroring the Claude Code TUI
- Rewrite the transcript renderer: only assistant text and tool calls
  render (tool rows as "→ Name input-preview"); user turns, tool
  results, attachments, and thinking are skipped instead of printing
  raw type names ("user attachment attachment")
- Bound transcript memory: retain a 600-line tail with an
  "earlier activity trimmed" notice so million-token agent threads
  cannot grow client memory unbounded

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
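
The bounded transcript tail from the last bullet could look roughly like this. The 600-line limit is from the commit message; `TranscriptTail` and `appendBounded` are hypothetical names, and the `trimmed` flag is an assumed way to drive the "earlier activity trimmed" notice.

```typescript
const MAX_TRANSCRIPT_LINES = 600;

interface TranscriptTail {
  lines: string[];
  trimmed: boolean; // true once older lines have been dropped
}

// Append newly fetched lines and keep only the most recent `max`.
function appendBounded(
  tail: TranscriptTail,
  incoming: readonly string[],
  max: number = MAX_TRANSCRIPT_LINES,
): TranscriptTail {
  const combined = tail.lines.concat(incoming);
  if (combined.length <= max) return { lines: combined, trimmed: tail.trimmed };
  return { lines: combined.slice(combined.length - max), trimmed: true };
}
```

Memory then stays proportional to the cap regardless of how long the agent runs, at the cost of losing scrollback beyond the retained tail.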
Comment thread apps/web/src/components/workflow/WorkflowPanel.tsx
Comment thread apps/web/src/components/workflow/WorkflowPanel.tsx

The SDK emits tokens/toolCalls on every progress tick (verified against
the bundled runtime), so agent rows now show them live; elapsed time for
a running agent derives from lastProgressAt - startedAt, which advances
tick-by-tick without a client timer. Settled agents keep the reported
total duration.

Agent rows no longer render the routine inline preview text (result and
last-tool snippets) — the expandable transcript owns that detail. Error
text stays inline since it explains a red row at a glance.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
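
A small sketch of the timer-free elapsed-time rule described in this commit message, assuming millisecond timestamps. The `AgentTiming` shape and both function names are hypothetical. The output format follows the "7m 03s" style quoted earlier in the thread.

```typescript
interface AgentTiming {
  state: "queued" | "running" | "done" | "error";
  startedAt?: number; // epoch milliseconds (assumed unit)
  lastProgressAt?: number; // epoch milliseconds (assumed unit)
  durationMs?: number; // total reported once the agent settles
}

// Settled agents use the reported total; running agents derive elapsed
// time from the last progress tick, so no client-side timer is needed.
function displayDurationMs(agent: AgentTiming): number | undefined {
  if (agent.state === "done" || agent.state === "error") return agent.durationMs;
  if (
    agent.state === "running" &&
    agent.startedAt !== undefined &&
    agent.lastProgressAt !== undefined
  ) {
    return Math.max(0, agent.lastProgressAt - agent.startedAt);
  }
  return undefined;
}

function formatDuration(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  const padded = (seconds < 10 ? "0" : "") + seconds;
  return minutes > 0 ? `${minutes}m ${padded}s` : `${seconds}s`;
}
```

The displayed value only advances when a new snapshot arrives, which matches the tick-by-tick behavior the message describes.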

@cursor cursor Bot left a comment

Contributor

Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit bc9ab83.

Comment thread apps/web/src/components/workflow/workflowUi.tsx Outdated

t3dotgg and others added 3 commits July 2, 2026 12:47

Long agent labels were shrink-0 and pushed the model/stats readout off
the clipped right edge. Rows now stack: the label owns the first line
and wraps freely, and model · stats · badges · error text sit on a
muted meta line under it — nothing competes for horizontal space, so
nothing clips regardless of label length. The chevron and status dot
center against the first text line via fixed line-height boxes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

- WorkflowPanel: track a render-visible caught-up flag so the transcript
  shows "Loading transcript…" until the first read reaches EOF — an
  absence of parsed entries mid-drain means still paging, not that the
  agent produced no output
- workflowUi: error rows fall back to resultPreview when the snapshot
  carries no error field, so red rows always explain the failure inline

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The test asserted the session/cancel request against the log snapshot
returned by the *second* waitForJsonLogMatch, but only the first wait
guaranteed that line — the two log appends are independent, so under
CI load the second snapshot can miss (or tear) the cancel line and the
assertion fails spuriously. Each assertion now checks the snapshot its
own wait resolved on. This flake predates the branch (zero Cursor
files are touched here) but kept failing this PR's Test job.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@t3dotgg

t3dotgg commented Jul 2, 2026

Member Author

The Test job failure was the pre-existing CursorAdapter.test.ts flake (this branch touches zero Cursor files): the test asserted session/cancel presence against the log snapshot from the second waitForJsonLogMatch, while only the first wait guaranteed that line — under CI load the second snapshot can miss or tear it. Deflaked in the latest commit by asserting each condition against the snapshot its own wait resolved on. 5/5 isolated runs and a full vp run -r test pass locally; CI is re-running on the push.
