From f3bf9206f1cb8f61ccde91e1fe30b9ed8ede6b98 Mon Sep 17 00:00:00 2001 From: Dan Cieslak Date: Sun, 7 Jun 2026 09:52:55 -0500 Subject: [PATCH] docs(rfc): add feedback on dynamic workflows RFC Companion review of the dynamic workflows RFC (#3431). Endorses the narrow v1 scope and proposes four low-cost extension seams -- external/human-input steps, addressable run identity + event bus, an explicit determinism boundary, and a pluggable AgentExecutor -- so human-in-the-loop approval, after-action reviews, bounded incident investigation, and non-mux execution backends stay open without expanding v1. Co-Authored-By: Claude Opus 4.8 --- rfc/20260529_dynamic-workflows.feedback.md | 74 ++++++++++++++++++++++ 1 file changed, 74 insertions(+) create mode 100644 rfc/20260529_dynamic-workflows.feedback.md diff --git a/rfc/20260529_dynamic-workflows.feedback.md b/rfc/20260529_dynamic-workflows.feedback.md new file mode 100644 index 0000000000..620ae3d3f4 --- /dev/null +++ b/rfc/20260529_dynamic-workflows.feedback.md @@ -0,0 +1,74 @@ +--- +title: Feedback — Dynamic Workflows RFC +re: rfc/20260529_dynamic-workflows.md (PR #3431) +date: 2026-06-07 +--- + +# Feedback: Dynamic Workflows RFC + +## TL;DR + +The v1 shape is right: a **bounded, single-player, deterministic agent-orchestration** engine — conductor-only, durable replay, lightweight run card. Nothing here argues for adding features to v1. + +This feedback is about **preserving a few extension seams** so three later use-case classes can ride this engine without a rewrite: **human-in-the-loop approval**, **after-action reviews (AARs)**, and **incident investigation** ("investigate and resolve a production incident"). All three are *bounded* runs that terminate. A fourth seam keeps the engine **backend-agnostic** — able to drive non-mux agent runtimes, not only mux's own task system. None of this belongs in v1. The asks below are about *not narrowing an abstraction prematurely* — near-zero v1 build cost — plus writing down one boundary. + +## What's right and should not change + +- **Conductor-only; side-effects live in tasks.** Correct, and more capable than it looks: "publish the report," "file a ticket," "take a mitigation action" are all just tasks the workflow spawns with the right tools. Do not relax this for any use case below. +- **Deterministic replay + durable journal.** The right correctness model, and the crash-resilience is exactly what longer-running coordination needs. The asks below are about staying *inside* this model, not loosening it. +- **Structured task output + report-time validation** and the **adversarial-verification lane** are the most valuable parts and generalize well beyond deep research (verifiable artifacts; contributing-factor analysis that is refuted/qualified rather than asserted). +- JS v1, project-trust gating, scratch→promote, discoverable-like-skills: all good. + +## The future use cases (all bounded) + +1. **Human-in-the-loop approval** — a run that pauses for a human decision/sign-off, then continues. +2. **After-action review (AAR)** — agent reconstructs a timeline from logs/telemetry, drafts contributing factors, incorporates human input and sign-off, compiles action items, publishes the writeup. Retrospective. +3. **Incident investigation** — declared → investigate → mitigate → verify → resolved (AAR afterward). Bounded, but **event-reactive while open**: alerts fire, metrics cross thresholds, and a human posts an update ("rolled back service X") at unpredictable times *during* the run. + +What unites all three: a run sometimes needs a **step whose result comes from outside the agent fan-out** — a human decision/input, or an external event. v1 only has agent-produced step results. Generalizing the *source* of a step result, without touching the journal/replay machinery, is the one seam that unlocks all three. + +> Out of scope even for us: a *genuinely never-ending* run (e.g., a standing monitor that never resolves). That would break completion-assuming machinery (unbounded journal growth, blocked awaiters, ever-costlier replay) and is a different primitive. All three use cases above terminate, so this is not what we're asking for. + +## Seams to preserve (low / zero v1 cost) + +### 1. (Headline) Let a durable step be resolved by an external input — human *or* event — not only by an agent task + +The replay model already short-circuits completed steps by stable ID + recorded result. A human approval/input and an external event (alert / metric / webhook) are *the same shape*: a durable replay-boundary step whose result is supplied from outside the agent fan-out and recorded in the journal exactly like an agent task's result. The run-status enum already lists **`waiting`**. + +This is the unifying insight: **human input and external events are one mechanism, not two.** v1 only needs agent-produced results; the seam is to keep the step/journal abstraction general about *where a step's result comes from*, rather than assuming every replay-boundary primitive is an agent-task spawn. + +Ask: model the durable step as "a boundary that resolves to a recorded result," with agent-task as the only v1 *source*, but no assumption baked in that it's the only possible source. This single seam unlocks approval, AAR, and incident investigation later, with no v1 feature work. + +### 2. Keep run identity and the event bus addressable independent of the launching chat + +The RFC already calls for background runs that "remain observable from the launching chat" and a run store that "should support future dashboard/list views." Both imply a run that can be inspected outside the immediate chat turn that started it. + +Ask: ensure `WorkflowRunStore` run identity and `WorkflowEventBus` events are addressable on their own (a stable run id + a subscribable event stream), so the dashboard/background-observability surfaces the RFC already anticipates can attach without modifying the runtime. Just don't couple the event stream exclusively to one chat session. (Nothing new to build in v1 — only a coupling to avoid.) + +### 3. Affirm the determinism boundary — and note that journaling external inputs is what keeps reactive-but-bounded runs inside it + +Deterministic replay (rerun the script against the journal; short-circuit by step id + input hash) is the right v1 choice, and it does **not** conflict with event-reactivity — *as long as external inputs are recorded as journal entries* (seam 1). Then a crash-resume reproduces "alert fired at T1, human input at T2, agent concluded Y at T3" deterministically, even though those inputs arrived live. + +Ask: state the boundary explicitly in the RFC — "the runtime assumes a run is a deterministic function of its inputs and **recorded** step results; reactive inputs are supported only when journaled; unbounded non-deterministic streams are out of scope." That makes reactive-but-bounded runs (incident investigation) a clean future on the same model, and unbounded reactive orchestration a deliberate non-goal rather than an accidental lock-in. + +### 4. Keep the agent executor behind `agent(spec)` pluggable — don't assume mux's TaskService is the only backend + +Just as mux abstracts *where code runs* behind a runtime backend (local, SSH, Docker, devcontainer, Coder), the workflow should abstract *what executes an agent step*. Today `agent(spec)` resolves to a mux sub-agent task. But the conductor doesn't need to know *who* ran the agent — it needs a spec to come back as a validated result. + +The RFC is already most of the way there: it lists a **"TaskService adapter,"** and the structured-output report contract (`spec → { reportMarkdown, structuredOutput }`, validated at report time) *is* the executor-agnostic boundary. The async primitives (`backgroundAgent` → handle, `awaitAgents`) are also the right shape for an executor that runs elsewhere and is awaited/polled (return an external handle, resolve it later). + +Ask: define the execution path as an `AgentExecutor` interface — roughly `(spec) → durable handle → validated result` — with the mux TaskService as the only v1 implementation. Don't thread `TaskService` directly through the conductor runtime. + +Why it's worth it even if mux never adds another backend: it's a clean decoupling/testability win (a mockable executor) on its own merits. As a bonus, it leaves the door open for external agent runtimes to fulfill steps later — e.g., **Anthropic's Managed Agents** (a natural fit, since mux already integrates Anthropic models via `@ai-sdk/anthropic`) and other emerging agent orchestrators — making the workflow engine a backend-agnostic conductor rather than one welded to mux's own task system. + +A backend may even bring its own *execution model* through this seam — e.g., the rubric-graded, iterate-until-satisfied loop in Anthropic's "Define Outcomes" — which mux would **inherit rather than build into v1**. (A conductor-side grade-and-retry, by contrast, is already expressible by composing existing primitives, as the deep-research adversarial-verification lane shows — so neither flavor needs a new v1 primitive.) Whether to *ship* any external executor is a separate product call; this is only about not foreclosing it. + +## What this is explicitly NOT asking for + +- No human-wait or event-source primitive in v1. +- No dashboard or cross-run observability surface in v1 (only: don't couple the event stream to a single chat). +- No never-ending / unbounded-reactive run kind — ever, on this model. +- No external/non-mux agent executor in v1 (only: an `AgentExecutor` interface with mux TaskService as the sole implementation). +- No relaxation of the conductor-only / side-effects-in-tasks boundary. + +Net: keep the **step/journal** abstraction one notch more general about *where a step result comes from*, put agent execution behind an **`AgentExecutor`** interface, keep run identity/events addressable on their own, and write down the determinism boundary (seam 3). That's the whole ask — roughly free now, and it keeps human-in-the-loop approval, AARs, bounded incident investigation, and non-mux execution backends open as future work on the same engine.