Workflow run visibility: live agent tree, script view, per-agent transcripts #3650
t3dotgg wants to merge 11 commits into
Surface Claude Agent SDK workflow runs (the Workflow orchestration tool) end to end in the desktop app:
- Contracts: WorkflowProgressEntry schemas, run handles, workflow inspection RPCs, thread.task.stop command + task-stop-requested event
- ClaudeAdapter: normalize the SDK's undocumented workflow_progress snapshot (size-capped, tolerant), forward workflow identity on task events, emit task.workflowMeta from Workflow tool results, stopTask
- Ingestion: upsert workflow snapshots under a stable per-task activity id so projections and reconnect payloads stay one row per run
- WorkflowInspectionService: path-validated, size-capped reads of the run's script, journal, and per-agent transcripts (realpath containment under ~/.claude/projects)
- Web: WorkflowRunCard inline in the chat timeline, a workflow right-panel surface with Run/Script/Logs tabs, cursor-paged transcript polling, stop + resume affordances; remote runs link to their cloud session
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- WorkflowInspectionService: re-contain joined leaf files so a symlink
named journal.jsonl / agent-<id>.jsonl cannot escape the projects root
(readScript already did this); add regression tests for both
- Stop serializing raw fs error causes in WorkflowInspectionError
- ProviderCommandReactor: surface stopTask failures as a
provider.task.stop.failed activity instead of a silent log warning
- ProviderService.stopTask: don't resurrect a stopped session via
recovery just to stop a task
- Ingestion: scope stable workflow activity ids by threadId so SDK task
id reuse across threads cannot collide in the projection table
- WorkflowPanel: keep polling transcripts past prior EOF ("complete"
means caught-up, not finished); reuse useCopyToClipboard for the
resume-command button
- workflow-logic: terminalize runs left "running" when the provider
session is gone; add a revision counter so timeline rows re-render on
content changes even when timestamps collide
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
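The WorkflowPanel polling fix in the commit above can be illustrated with a minimal sketch. It assumes a cursor-based page shape; `TranscriptPage`, `readPage`, and `pollOnce` are hypothetical names for illustration, not the PR's actual RPC contract. The point it shows is that a "complete" page only means the reader caught up with the file as it was at that moment, so the next tick must still read from the stored cursor.

```typescript
// Hypothetical page shape for illustration; the real RPC contract may differ.
interface TranscriptPage {
  lines: string[];
  nextCursor: number; // index of the first line not yet returned
  complete: boolean; // true when this read reached the current end of file
}

// Stand-in for the transcript RPC: returns up to `limit` lines from `cursor`.
function readPage(file: readonly string[], cursor: number, limit: number): TranscriptPage {
  const lines = file.slice(cursor, cursor + limit);
  const nextCursor = cursor + lines.length;
  return { lines, nextCursor, complete: nextCursor >= file.length };
}

// One poll tick. It always reads from the stored cursor, even when the
// previous page was "complete", because the agent may have appended since.
function pollOnce(
  file: readonly string[],
  state: { cursor: number; seen: string[] },
  limit = 2,
): boolean {
  const page = readPage(file, state.cursor, limit);
  state.seen.push(...page.lines);
  state.cursor = page.nextCursor;
  return page.complete;
}

const file = ["a", "b", "c"];
const state = { cursor: 0, seen: [] as string[] };
const first = pollOnce(file, state); // reads "a", "b"; more remains
const second = pollOnce(file, state); // reads "c"; caught up for now
file.push("d"); // the agent keeps writing after the reader caught up
const third = pollOnce(file, state); // reads "d" because polling continued
```

Stopping at the first `complete: true` would have dropped line "d" in this scenario.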
- workflow-logic: settle in-flight agents to "error" when terminalizing a run whose session died, so a stopped chip never sits above pulsing "running" agents; drop the dead hasStartedActivity field (snapshot-only runs render intentionally after history trims)
- ChatView: gate workflow liveness on derivePhase so interrupted sessions also terminalize runs, matching disconnected-session UX
- WorkflowPanel: fetch the transcript once more when a run leaves "running" so lines appended after the last poll tick are not lost
- mobile: suppress workflow-owned task.progress/task.completed rows in the work log, mirroring desktop (mobile has no workflow card yet)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
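A minimal sketch of the terminalization rule described in the commit above. The type names, the status vocabulary, and the `terminalizeIfSessionGone` helper are assumptions made for illustration; the actual workflow-logic module may model runs differently.

```typescript
// Hypothetical view-model types; the real WorkflowRun model may differ.
type AgentState = "running" | "done" | "error";
type RunStatus = "running" | "completed" | "failed" | "stopped";

interface Agent {
  id: string;
  state: AgentState;
}

interface Run {
  status: RunStatus;
  agents: Agent[];
}

// If the provider session is gone while the run still claims to be running,
// mark the run terminal and settle every in-flight agent to "error" so no
// agent row keeps pulsing under a stopped run.
function terminalizeIfSessionGone(run: Run, sessionAlive: boolean): Run {
  if (sessionAlive || run.status !== "running") return run;
  return {
    status: "stopped",
    agents: run.agents.map((agent) =>
      agent.state === "running" ? { ...agent, state: "error" as const } : agent,
    ),
  };
}

const before: Run = {
  status: "running",
  agents: [
    { id: "a1", state: "done" },
    { id: "a2", state: "running" },
  ],
};
const after = terminalizeIfSessionGone(before, false);
const untouched = terminalizeIfSessionGone(before, true);
```

Agents that already settled keep their state; only in-flight ones are rewritten.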
Addressed all five cursor bot findings in 1610d39:
Two new unit tests cover the terminalization behavior. Typecheck, lint, and the affected suites pass.
Approvability
Verdict: Needs human review
This PR introduces a significant new feature for workflow run visibility including new RPC endpoints, a file-reading inspection service with security-sensitive path handling, new orchestration commands, and multiple new UI components. The scope and nature of these changes warrant human review.
- ClaudeAdapter: dedupe workflow agents/phases by index (last write wins) before applying entry caps, so repeated slot updates cannot exhaust the cap and freeze later agents stale; restrict sessionUrl to http(s) so a hostile tool result cannot smuggle a javascript: href
- workflowUi: guard sessionUrl scheme again at both anchor render sites (defense in depth for payloads persisted before the adapter filter)
- Ingestion: JSON-encode the (threadId, taskId) tuple in stable workflow activity ids to remove delimiter-ambiguity collisions
- workflow-logic: order same-timestamp activities by provider sequence then lifecycle rank so task.completed can never apply before its task.started
- WorkflowPanel/Card: scope phase and agent row keys by taskId so switching runs remounts rows instead of leaking expanded transcript state and cursors across runs
- workflowUi: phase progress counter counts settled (done + error) agents instead of reporting 0/1 for an errored terminal phase
- client-runtime: readJournal query polls every 4s while mounted so the Logs tab picks up new results during a live run
- mobile: keep workflow task.completed rows (only per-tick progress is suppressed); with no workflow card on mobile it is the only signal a workflow finished or failed
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
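The ClaudeAdapter item in the commit above (dedupe by index, then cap) is easier to see with a small sketch. `AgentEntry`, `capThenDedupe`, and `dedupeThenCap` are hypothetical names and the cap of 2 is chosen only to keep the example short; the adapter's real shapes and limits are not shown in this thread.

```typescript
// Hypothetical entry shape; the SDK snapshot carries more fields than this.
interface AgentEntry {
  index: number; // slot of the agent within the workflow
  label: string;
}

// Keep only the latest entry per slot (last write wins).
function dedupeByIndex(entries: readonly AgentEntry[]): AgentEntry[] {
  const byIndex = new Map<number, AgentEntry>();
  for (const entry of entries) byIndex.set(entry.index, entry);
  return Array.from(byIndex.values()).sort((a, b) => a.index - b.index);
}

// Problematic order: capping first lets repeated updates to one slot
// consume the whole budget, so later agents never make it through.
function capThenDedupe(entries: readonly AgentEntry[], cap: number): AgentEntry[] {
  return dedupeByIndex(entries.slice(0, cap));
}

// Order described in the commit: dedupe first, then apply the cap.
function dedupeThenCap(entries: readonly AgentEntry[], cap: number): AgentEntry[] {
  return dedupeByIndex(entries).slice(0, cap);
}

const updates: AgentEntry[] = [
  { index: 0, label: "agent 0, update 1" },
  { index: 0, label: "agent 0, update 2" },
  { index: 0, label: "agent 0, update 3" },
  { index: 1, label: "agent 1, update 1" },
];

const starved = capThenDedupe(updates, 2); // only slot 0 survives, and it is stale
const healthy = dedupeThenCap(updates, 2); // both slots, each at its latest update
```

With the cap applied first, slot 0 also freezes at "update 2" because "update 3" falls outside the budget.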
All seven macroscope findings plus cursor's follow-up are addressed in 0eaeb94:
Typecheck, lint, and the affected suites (106 server + 1298 web tests) pass.
One Effect service convention issue: the new WorkflowInspectionError drops the underlying failure instead of preserving it as cause. See inline comments.
Posted via Macroscope — Effect Service Conventions
Real filesystem failures (read-failed / not-found wraps) forward their underlying error as an optional cause, matching the GitCommandError convention; pure validation reasons stay cause-less. The paths a cause can carry are ones the client supplied in the request, so this does not reintroduce the earlier information-exposure concern. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Addressed the error-convention findings in the latest commit: For the record on the apparent tension with the earlier information-exposure comment: the causes here can only reference paths the client itself supplied in the request, so preserving the chain doesn't leak anything the caller doesn't already know, and it keeps parity with how the other inspection-style errors in the codebase behave.
The latest macroscope pass re-posted the six findings that were already fixed in 0eaeb94; the comments quote pre-fix code (the
A task.completed for a known workflow task now creates the run when its task.started has not been applied yet — adopted runs can carry reset provider sequences across CLI restarts, letting the completion sort first. The later-applied started only fills metadata and can never resurrect a terminal status, so derivation correctness no longer depends on the comparator at all. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
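A minimal sketch of the order-robust derivation described in the commit above, assuming a simplified activity shape. `Activity`, `Run`, and `applyActivity` are hypothetical names for illustration; the real workflow-logic derivation handles more activity kinds and fields.

```typescript
// Hypothetical, simplified model of the two lifecycle activities involved.
type Status = "running" | "completed" | "failed";

interface Run {
  taskId: string;
  status: Status;
  description?: string;
}

type Activity =
  | { kind: "task.started"; taskId: string; description: string }
  | { kind: "task.completed"; taskId: string; status: "completed" | "failed" };

function applyActivity(runs: Map<string, Run>, activity: Activity): void {
  const existing = runs.get(activity.taskId);
  if (activity.kind === "task.completed") {
    // Creates the run if its task.started has not been applied yet.
    runs.set(activity.taskId, { ...existing, taskId: activity.taskId, status: activity.status });
    return;
  }
  // task.started only fills metadata; an existing status is never overwritten,
  // so a late start cannot resurrect a terminal run.
  runs.set(activity.taskId, {
    taskId: activity.taskId,
    status: existing?.status ?? "running",
    description: activity.description,
  });
}

const started: Activity = { kind: "task.started", taskId: "t1", description: "build docs" };
const completed: Activity = { kind: "task.completed", taskId: "t1", status: "completed" };

// Normal order.
const inOrder = new Map<string, Run>();
applyActivity(inOrder, started);
applyActivity(inOrder, completed);

// Inverted order, as can happen when provider sequences reset.
const inverted = new Map<string, Run>();
applyActivity(inverted, completed);
applyActivity(inverted, started);
```

Both maps end in the same state, which is why the result no longer depends on how the comparator sorts the two activities.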
Good catch on the sequence-inversion path; rather than special-casing the comparator, the derivation is now order-robust: a
pls merge
- Forward the SDK snapshot's per-agent tokens / toolCalls / durationMs
through the contract, adapter normalization, and view model; agent
rows now show model plus "94.2k tok · 47 tools · 7m 03s" (duration
once the agent settles), mirroring the Claude Code TUI
- Rewrite the transcript renderer: only assistant text and tool calls
render (tool rows as "→ Name input-preview"); user turns, tool
results, attachments, and thinking are skipped instead of printing
raw type names ("user attachment attachment")
- Bound transcript memory: retain a 600-line tail with an
"earlier activity trimmed" notice so million-token agent threads
cannot grow client memory unbounded
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
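The bounded-transcript item in the commit above can be sketched as a tail buffer. The 600-line limit comes from the commit message; `TranscriptBuffer` and `appendBounded` are hypothetical names, and the real renderer presumably stores parsed entries rather than raw strings.

```typescript
const MAX_LINES = 600; // retention limit named in the commit message

// Hypothetical buffer shape: retained tail plus a flag that drives the
// "earlier activity trimmed" notice.
interface TranscriptBuffer {
  lines: string[];
  trimmed: boolean;
}

// Append new lines and keep only the newest `max` of them.
function appendBounded(
  buffer: TranscriptBuffer,
  incoming: readonly string[],
  max: number = MAX_LINES,
): TranscriptBuffer {
  const combined = buffer.lines.concat(incoming);
  const overflow = combined.length - max;
  if (overflow <= 0) return { lines: combined, trimmed: buffer.trimmed };
  return { lines: combined.slice(overflow), trimmed: true };
}

const firstBatch = Array.from({ length: 500 }, (_, i) => `line ${i}`);
const secondBatch = Array.from({ length: 200 }, (_, i) => `line ${500 + i}`);

let buffer: TranscriptBuffer = { lines: [], trimmed: false };
buffer = appendBounded(buffer, firstBatch); // 500 lines, nothing trimmed yet
buffer = appendBounded(buffer, secondBatch); // 700 arrive in total, 600 are kept
```

Once `trimmed` is set it stays set, so the notice does not disappear when a later batch happens to fit.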
The SDK emits tokens/toolCalls on every progress tick (verified against the bundled runtime), so agent rows now show them live; elapsed time for a running agent derives from lastProgressAt - startedAt, which advances tick-by-tick without a client timer. Settled agents keep the reported total duration. Agent rows no longer render the routine inline preview text (result and last-tool snippets) — the expandable transcript owns that detail. Error text stays inline since it explains a red row at a glance. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
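A minimal sketch of the agent-row readout described in the two commits above, assuming millisecond timestamps. `AgentStats`, `elapsedMs`, and the formatting helpers are hypothetical; only the field names tokens, toolCalls, durationMs, startedAt, and lastProgressAt and the target string "94.2k tok · 47 tools · 7m 03s" come from the commit messages, and the rounding rules here are a guess.

```typescript
// Hypothetical subset of the per-agent snapshot fields named in the commits.
interface AgentStats {
  tokens: number;
  toolCalls: number;
  startedAt: number; // epoch milliseconds (assumed unit)
  lastProgressAt: number; // epoch milliseconds (assumed unit)
  durationMs?: number; // reported total, present once the agent settles
}

function formatTokens(tokens: number): string {
  return tokens >= 1000 ? `${(tokens / 1000).toFixed(1)}k tok` : `${tokens} tok`;
}

function formatDuration(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  const padded = seconds < 10 ? `0${seconds}` : `${seconds}`;
  return `${minutes}m ${padded}s`;
}

// Settled agents use the reported duration; running agents derive elapsed
// time from the last progress tick, so no client-side timer is needed.
function elapsedMs(stats: AgentStats): number {
  return stats.durationMs ?? Math.max(0, stats.lastProgressAt - stats.startedAt);
}

function formatStats(stats: AgentStats): string {
  return [
    formatTokens(stats.tokens),
    `${stats.toolCalls} tools`,
    formatDuration(elapsedMs(stats)),
  ].join(" · ");
}

const running: AgentStats = { tokens: 94200, toolCalls: 47, startedAt: 1000, lastProgressAt: 424000 };
const settled: AgentStats = { ...running, durationMs: 65000 };

const runningLabel = formatStats(running); // "94.2k tok · 47 tools · 7m 03s"
const settledLabel = formatStats(settled); // "94.2k tok · 47 tools · 1m 05s"
```

Because the elapsed value only changes when a new snapshot arrives, the row re-renders on ticks rather than on a timer.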
Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit bc9ab83.
Long agent labels were shrink-0 and pushed the model/stats readout off the clipped right edge. Rows now stack: the label owns the first line and wraps freely, and model · stats · badges · error text sit on a muted meta line under it — nothing competes for horizontal space, so nothing clips regardless of label length. The chevron and status dot center against the first text line via fixed line-height boxes. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- WorkflowPanel: track a render-visible caught-up flag so the transcript shows "Loading transcript…" until the first read reaches EOF; an absence of parsed entries mid-drain means still paging, not that the agent produced no output
- workflowUi: error rows fall back to resultPreview when the snapshot carries no error field, so red rows always explain the failure inline
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The test asserted the session/cancel request against the log snapshot returned by the *second* waitForJsonLogMatch, but only the first wait guaranteed that line — the two log appends are independent, so under CI load the second snapshot can miss (or tear) the cancel line and the assertion fails spuriously. Each assertion now checks the snapshot its own wait resolved on. This flake predates the branch (zero Cursor files are touched here) but kept failing this PR's Test job. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The Test job failure was the pre-existing

I had Fable try to implement better UI/UX for Claude Code's Workflows. Not sure if this will be trumped by Julius's work on orchestration but I wanted to give it a go.
The rest of this description is generated by Claude
AI generated from here down
Surfaces Claude Agent SDK workflow runs (the Workflow orchestration tool: phases, subagents, narration) end to end in the desktop app.
What you get
- remote: true) runs render a link to their cloud session instead.
- log() narration plus per-agent results from the run journal
- thread.task.stop orchestration command → query.stopTask) and a copy-resume-command affordance (Workflow({ scriptPath, resumeFromRunId })) for terminal runs.
How it works
The SDK streams a cumulative snapshot on every task_progress message via the undocumented workflow_progress field (a deliberate wire surface, since the CLI's own /workflows view renders it, but absent from the published .d.ts; verified against the 0.3.170 bundled runtime we ship). The read is confined to one cast helper in ClaudeAdapter, which normalizes defensively: malformed entries dropped, previews clipped, entry counts capped, unknown agent states rendered as "running". An SDK change degrades to less detail, never a crash.
Because every activity is persisted (SQLite orchestration_events + projections) and re-shipped on reconnect, workflow snapshots are upserted under a stable per-(thread, task) activity id instead of appended per tick, so reconnect payloads stay one row per run. Run handles (script path, transcript dir, run id) arrive once via a new task.workflowMeta runtime event emitted from the Workflow tool result.
Disk artifacts (script, journal.jsonl, agent-<id>.jsonl) are served by a new WorkflowInspectionService with realpath containment under ~/.claude/projects (including re-containment of joined leaf files against symlink escapes), fixed leaf filenames, an agent-id allowlist pattern, and size caps, so the RPCs cannot be used as an arbitrary-file-read oracle.
Validation
vp run typecheck, vp check, and the full vp run -r test suite pass (1428 tests; one pre-existing flake in CursorAdapter.test.ts passes in isolation).
Known v1 tradeoffs
- orchestration_events log; only projections/reconnect are deduped. The SDK's own progress throttling bounds the rate.
🤖 Generated with Claude Code
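The upsert-under-a-stable-id idea from "How it works" can be sketched as follows. The JSON-encoded (threadId, taskId) tuple matches what a commit in this thread describes; the `workflow:` prefix, the `Snapshot` shape, and the in-memory `Map` standing in for the projection table are assumptions for illustration only.

```typescript
// Stable id built from a JSON-encoded tuple, so no delimiter inside either
// value can make two different pairs produce the same key.
function stableWorkflowActivityId(threadId: string, taskId: string): string {
  return `workflow:${JSON.stringify([threadId, taskId])}`; // prefix is hypothetical
}

// A delimiter-joined id, shown only to illustrate the collision being avoided.
function delimiterJoinedId(threadId: string, taskId: string): string {
  return `${threadId}:${taskId}`;
}

// Hypothetical snapshot row; the real projection stores far more.
interface Snapshot {
  threadId: string;
  taskId: string;
  tick: number;
}

// Upsert: every tick for the same (thread, task) overwrites one row.
function upsertSnapshot(table: Map<string, Snapshot>, snapshot: Snapshot): void {
  table.set(stableWorkflowActivityId(snapshot.threadId, snapshot.taskId), snapshot);
}

const table = new Map<string, Snapshot>();
upsertSnapshot(table, { threadId: "thread-1", taskId: "task-9", tick: 1 });
upsertSnapshot(table, { threadId: "thread-1", taskId: "task-9", tick: 2 });
upsertSnapshot(table, { threadId: "thread-2", taskId: "task-9", tick: 1 });

const joinedCollides = delimiterJoinedId("a:b", "c") === delimiterJoinedId("a", "b:c");
const stableCollides =
  stableWorkflowActivityId("a:b", "c") === stableWorkflowActivityId("a", "b:c");
```

After the three upserts the table holds two rows: one per (thread, task) pair, each at its latest tick.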
Note
Medium Risk
Touches orchestration commands, provider session control, and new path-based RPCs with containment rules; scope is large but bounded by tests and defensive normalization.
Overview
Adds end-to-end visibility for Claude Agent SDK workflow runs in the web app (and quieter mobile work logs), plus a way to stop a background workflow task.
Server / provider: The Claude adapter now reads undocumented workflow_progress on task_progress, normalizes it, and forwards workflow metadata on task.started and from Workflow tool results as task.workflowMeta. Runtime ingestion emits a per-tick task.progress row and upserts stable task.workflow-updated / task.workflow-meta activities per (threadId, taskId). A new thread.task.stop command flows through the decider and ProviderCommandReactor to ProviderService.stopTask → Claude query.stopTask, with failure activities when no session is bound. WorkflowInspectionService serves script, journal, and agent transcript reads under a contained projects root via new WebSocket RPCs.
Web UI: workflow-logic derives WorkflowRun view models; the chat timeline shows workflow cards and suppresses duplicate workflow noise in the work log. A Workflow right-panel tab (WorkflowPanel) exposes Run / Script / Logs, expandable agent transcripts, stop, and resume-command copy.
Mobile: Work-log derivation skips workflow snapshot/meta and workflow-owned progress rows (no workflow card yet).
Reviewed by Cursor Bugbot for commit 0f16c5e. Bugbot is set up for automated code reviews on this repo.
Note
Add live workflow run visibility with agent tree, script view, and per-agent transcripts
- workflow-logic module that derives structured WorkflowRun models (phases, agents, logs, usage, handles, status) from raw thread activities, and surfaces them as timeline entries in the chat view
- WorkflowPanel right-panel component with Run/Script/Logs tabs, and a WorkflowRunCard inline timeline card with prioritized agent rows, rollup stats, and status chips
- workflow.readScript, workflow.readJournal, and workflow.readAgentTranscript backed by a new WorkflowInspectionService that validates paths are contained within the projects root before reading
- task.workflowMeta provider runtime event when a Workflow tool result returns run handles, and attaches normalized workflowProgress snapshots to task.progress events with string clipping and entry capping
- thread.task.stop client command and thread.task-stop-requested domain event, wired through the decider, ProviderCommandReactor, and adapter stopTask method to stop a background workflow task
- task.workflow-updated) and meta (task.workflow-meta) activities are suppressed from work log entries on both web and mobile to reduce noise
Macroscope summarized 0f16c5e.