feat: Phase 1 of typed on_error routing (#227) by PolyphonyRequiem · Pull Request #229 · microsoft/conductor

PolyphonyRequiem · 2026-05-21T23:37:14Z

Phase 1 of the typed on_error routing design from #227 (RFC).

Companion PR to #227 (brainstorm spec). Spec excerpts inline below;
full design and open questions live in the RFC.

What ships in this PR (Phase 1 scope from the RFC)

- Script env-var-file contract: CONDUCTOR_ERROR_OUT points at a temp file the script writes a typed envelope into; the engine reads it on exit. Works uniformly for pwsh, bash, python, node, and dotnet (dotnet run -- style).
- Agent JSON discriminator: {conductor_error: true, kind, message, details} in agent output is treated as a raise rather than a regular response.
- RouteDef.on_error: bool | str | list[str]: routes can match on the failing node's typed error kind.
- {{ failing_node.error }} template scope: handler nodes get the envelope of the node that raised, alongside its (possibly partial) output.
- Halt-on-unhandled at root: if no error route matches, the workflow halts; the engine writes errors.jsonl (TMPDIR pattern, alongside the event log) and emits a typed workflow_failed event. The CLI maps the exception to the new exit code 3.
- Optional AgentDef.raises: list[str]: declared kinds are linted against route on_error declarations at validation time and checked at runtime; undeclared kinds are wrapped as internal.undeclared_kind.
- Engine-agnostic helpers under src/conductor/helpers/error/ for pwsh / bash / python / node / dotnet (ship in the wheel).
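As an illustration of the script-side contract described above, here is a minimal sketch of a script writing the envelope by hand, without the shipped helper. The function name `write_envelope` is hypothetical; only the CONDUCTOR_ERROR_OUT variable and the envelope keys come from this PR.

```python
import json
import os
from typing import Any, Optional


def write_envelope(
    kind: str, message: str, details: Optional[dict[str, Any]] = None
) -> bool:
    """Write a typed error envelope to the file named by CONDUCTOR_ERROR_OUT.

    Returns False and writes nothing when the variable is unset, for
    example when the script runs outside the engine. (Hypothetical helper.)
    """
    path = os.environ.get("CONDUCTOR_ERROR_OUT")
    if not path:
        return False
    envelope = {
        "conductor_error": True,  # on-the-wire discriminator
        "kind": kind,
        "message": message,
        "details": details or {},
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(envelope, fh)
    return True
```

A script would call something like `write_envelope("external.git.drift", "branch diverged")` and then choose its own exit code; the sketch deliberately does not exit.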

What's reserved for Phase 2/3 (out of scope here)

- Sub-workflow envelope propagation across the boundary.
- retry / halt / propagate route actions.
- Provider-exhaustion synthetic kind.
- on_error on routes from type: workflow, human_gate, notification, and parallel/for_each groups is currently a hard validation error (avoids silent "handler that never fires" footguns). These will become valid in Phase 2.

Reserved kinds emitted in Phase 1

- internal.script_error: script exited non-zero AND wrote no envelope (opt-in: only synthesized when the node has raises or any on_error route, so legacy exit_code-routing workflows are unaffected).
- internal.schema_violation: agent output failed its output: schema.
- internal.undeclared_kind: node with raises: raised a kind not in its list; original kind preserved under details.original_kind.

Reserved kind prefixes (validator forbids users declaring these):
internal., provider., subworkflow., retry..

How to read this PR

The 14 commits map 1:1 to the implementation steps from the plan and are
designed to be reviewed in order. Each is independently testable and
green:

| # | Commit | Touch points |
|---|--------|--------------|
| 1 | `feat(schema)` | `config/schema.py` |
| 2 | `feat(engine)` ErrorEnvelope types | `engine/errors.py`, `exceptions.py` |
| 3 | `feat(router)` success vs. error buckets | `engine/router.py` |
| 4 | `feat(context)` store_error + .error access | `engine/context.py` |
| 5 | `feat(validator)` cross-check | `config/validator.py` |
| 6 | `feat(agent-exec)` envelope path | `executor/agent.py` |
| 7 | `feat(script-exec)` `CONDUCTOR_ERROR_OUT` | `executor/script.py` |
| 8 | `feat(engine-wire)` leaf-error path | `engine/workflow.py` |
| 9 | `feat(halt-jsonl)` errors.jsonl + event | `engine/workflow.py` |
| 10 | `feat(cli-exit)` exit code 3 | `cli/app.py` |
| 11 | `feat(helpers)` 5 language helpers | `helpers/error/*` |
| 12 | `feat(examples)` | `examples/error-routing*.yaml` |
| 13 | `test(xeng)` cross-engine envelope contract | `tests/test_integration/` |
| 14 | `phase-1(checks)` final ty/lint/test sweep | minor casts |

Examples

- examples/error-routing.yaml: script-based; uses the CONDUCTOR_ERROR_OUT contract directly with no helper. Workflow input simulated_failure toggles ok / drift / rate_limited.
- examples/error-routing-helpers.yaml: same shape using the shipped Python helper.

Both validate, both run on Windows and POSIX, both render
{{ failing_node.error.kind }} / .message / .details in the
handler's prompt.

Test posture

- 2943 tests pass; 12 baseline failures (perf flake, event_log non-serializable, registry/integration ×10) are pre-existing and unchanged.
- New tests: 8 engine error-routing tests, 6 helper tests, 2 CLI exit-code tests, 3 cross-engine integration tests (Python + pwsh run on Windows; bash is skipped on Windows by design because the WSL relay shim is unreliable in CI envs).
- Lint clean (ruff check), format clean (ruff format --check).
- ty check src is back to the 12-diagnostic baseline (all pre-existing Windows termios/tty noise).
- The equivalent of make validate-examples is green across all 17 bundled examples, including the two new ones.

Open Phase 1 micro-decisions (resolved in this PR)

- Exit code value: 3 (1 = generic, 2 = reserved misuse, 3 = workflow halted on unhandled error).
- errors.jsonl path: same $TMPDIR/conductor/ convention as the event log; printed at end-of-run.
- Frame trail in errors.jsonl: single-element today, structured to accept multi-frame in Phase 2 without a shape change.
- Convenience {{ workflow.last_error }} not added (RFC says require {{ failing_node.error }}); can revisit in Phase 2.

cc @jasonrobertfox (companion to the RFC at #227).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ting

Phase 1, step 1 of on_error routing (per design brief in docs/projects/error-routing/on-error-routing.brainstorm.md). Adds two opt-in schema fields and a small shared module for the constants both schema validation and the engine error path will use:

- src/conductor/error_kinds.py
  - KIND_PATTERN: dotted lowercase identifier (at least one dot).
  - RESERVED_KIND_PREFIXES: internal., provider., subworkflow., retry. (the runtime owns these namespaces; workflow authors cannot declare kinds under them).
  - RESERVED_ON_ERROR_ALLOWLIST: the closed set of runtime-synthesized kinds that ARE legal to match in on_error even though they're not legal to declare in raises (internal.script_error, internal.schema_violation, internal.undeclared_kind).
  - is_reserved_prefix(kind) helper.
- RouteDef.on_error: bool | str | list[str] | None
  - None = success route (existing behavior).
  - True = catch-all error route.
  - str = single-kind error route.
  - list[str] = multi-kind error route.
  - False is rejected (no semantic meaning).
  - Kind format enforced via KIND_PATTERN.
  - 'before'-mode validator so Pydantic's bool/str coercion doesn't swallow the discriminator.
- AgentDef.raises: list[str] | None
  - Optional declaration of kinds the node may raise.
  - Powers a load-time lint (cross-checked against routes' on_error in the validator, landing in a follow-up commit) and a runtime undeclared-kind check (will land with the engine-wiring commit).
  - Reserved prefixes rejected so authors can't claim runtime namespaces; duplicates rejected; format enforced via KIND_PATTERN.

Tests:
- tests/test_error_kinds.py: 24 cases covering pattern + prefix + allowlist invariants (allowlist entries must themselves be reserved).
- tests/test_config/test_schema.py::TestRouteDefOnError: 14 cases.
- tests/test_config/test_schema.py::TestAgentDefRaises: 10 cases.

No semantics change for existing workflows: both fields default to None and the engine doesn't observe them yet (wiring lands in subsequent commits).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
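To make the constants in this commit concrete, here is a rough sketch of what such a module could look like. The exact regex is an assumption (the commit only says "dotted lowercase identifier (at least one dot)"), and `validate_raises` is a hypothetical function standing in for the schema-level checks on AgentDef.raises.

```python
import re

# Assumed regex: lowercase dotted identifier with at least one dot.
KIND_PATTERN = re.compile(r"^[a-z][a-z0-9_]*(?:\.[a-z][a-z0-9_]*)+$")

# Namespaces owned by the runtime, as listed in the commit message.
RESERVED_KIND_PREFIXES = ("internal.", "provider.", "subworkflow.", "retry.")

# Runtime-synthesized kinds that may be matched in on_error but not declared.
RESERVED_ON_ERROR_ALLOWLIST = frozenset(
    {
        "internal.script_error",
        "internal.schema_violation",
        "internal.undeclared_kind",
    }
)


def is_reserved_prefix(kind: str) -> bool:
    """True when the kind lives under a runtime-owned namespace."""
    return kind.startswith(RESERVED_KIND_PREFIXES)


def validate_raises(kinds: list[str]) -> list[str]:
    """Return problems found in a raises list (hypothetical helper).

    An empty result means the list is acceptable.
    """
    problems: list[str] = []
    seen: set[str] = set()
    for kind in kinds:
        if not KIND_PATTERN.match(kind):
            problems.append(f"{kind!r}: not a dotted lowercase kind")
        elif is_reserved_prefix(kind):
            problems.append(f"{kind!r}: reserved prefix")
        if kind in seen:
            problems.append(f"{kind!r}: duplicate")
        seen.add(kind)
    return problems
```

Under these assumptions, `external.api.rate_limited` is a valid kind, a bare `drift` is rejected for lacking a dot, and `internal.boom` is rejected in raises because of its prefix.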

…exceptions

Phase 1, step 2 of on_error routing.

Adds src/conductor/engine/errors.py:
- ErrorEnvelope TypedDict: the internal {kind, message, details} shape. Strips the on-the-wire conductor_error: true discriminator so callers don't see it in {{ failing_node.error.* }} templates.
- EnvelopeValidationError: distinct from ValidationError so the engine can catch and translate malformed envelopes into synthetic internal.* kinds rather than halting with a generic config error.
- coerce_envelope(raw): validates on-the-wire input, normalizes details to {} when absent.
- make_script_error(exit_code, stderr_tail, command): synthesizes the internal.script_error envelope.
- make_schema_violation(node_name, source, original_message, failed_field?): synthesizes the internal.schema_violation envelope with rich details for the swallowed-by-catch-all diagnostics case.
- wrap_undeclared_kind(original, declared): wraps an envelope whose kind isn't in the node's declared raises list. Preserves the original kind/message/details under details.original_* so an author handling internal.undeclared_kind can still recover the intent.

Adds two exceptions to src/conductor/exceptions.py:
- UnhandledNodeError: internal signal raised by the router when an error envelope reaches no on_error route at the current level. The engine catches this at the per-node dispatch site and re-raises as UnhandledWorkflowError. Not intended to surface to end users.
- UnhandledWorkflowError: workflow halted on a typed error envelope. Carries the envelope and a frame trail (single frame in Phase 1; Phase 2 will accumulate frames across sub-workflow boundaries). CLI maps this to a distinct exit code so callers can distinguish 'workflow ran and halted on typed error' from generic failures.

Tests: tests/test_engine/test_errors.py, 18 cases covering envelope coercion (including discriminator stripping and details normalization), the three synthetic-kind constructors, and the exception classes including the empty-frames defensive path.

Nothing yet emits these envelopes or exceptions; the next commits wire them through the router, executors, and engine dispatch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
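The coercion and wrapping behavior described in this commit can be sketched roughly as follows. This is not the PR's code: the function bodies, error messages, and the extra `declared` key in the wrapped details are assumptions; only the function names, the envelope shape, and the `details.original_*` convention come from the commit message.

```python
from typing import Any, TypedDict


class ErrorEnvelope(TypedDict):
    kind: str
    message: str
    details: dict[str, Any]


class EnvelopeValidationError(Exception):
    """Raised when on-the-wire input is not a well-formed envelope."""


def coerce_envelope(raw: Any) -> ErrorEnvelope:
    """Validate on-the-wire input and strip the discriminator."""
    if not isinstance(raw, dict) or raw.get("conductor_error") is not True:
        raise EnvelopeValidationError("missing conductor_error: true")
    kind, message = raw.get("kind"), raw.get("message")
    if not isinstance(kind, str) or not isinstance(message, str):
        raise EnvelopeValidationError("kind and message must be strings")
    details = raw.get("details")
    if details is None:
        details = {}  # normalize absent details to an empty dict
    if not isinstance(details, dict):
        raise EnvelopeValidationError("details must be an object")
    return ErrorEnvelope(kind=kind, message=message, details=details)


def wrap_undeclared_kind(
    original: ErrorEnvelope, declared: list[str]
) -> ErrorEnvelope:
    """Wrap an envelope whose kind is not in the node's raises list."""
    return ErrorEnvelope(
        kind="internal.undeclared_kind",
        message=f"node raised undeclared kind {original['kind']!r}",
        details={
            "original_kind": original["kind"],
            "original_message": original["message"],
            "original_details": original["details"],
            "declared": list(declared),  # assumed extra field
        },
    )
```

Keeping the discriminator out of the returned dict is what lets templates read `error.kind` without ever seeing `conductor_error`.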

Route evaluation now partitions by RouteDef.on_error:
- on_error is None -> success bucket, evaluated when error=None
- on_error is set -> error bucket, evaluated when an envelope is passed; the route's on_error matcher (True | str | list[str]) must match envelope[kind]

Behavior preserved on the success path: first matching when: wins, no catch-all raises the existing ValueError.

New: error-path exhaustion raises UnhandledNodeError carrying the envelope so the engine can translate it into UnhandledWorkflowError at the call site.

Error-route eval context exposes the envelope as `error` for both Jinja2 ({{ error.kind }}) and simpleeval (kind == 'x.y' via flatten).

Adds 12 tests in TestRouterErrorBucket covering bucket isolation, all three on_error matcher shapes, when: combined with on_error, output: transforms, ordering within the bucket, the new UnhandledNodeError path, and the legacy ValueError path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
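The bucket partition above can be illustrated with a small sketch. Routes are modeled here as plain dicts rather than RouteDef objects, `when:` conditions are ignored, and a None return stands in for the exhaustion paths (ValueError / UnhandledNodeError); the function names are hypothetical.

```python
from typing import Any, Optional, Union

Matcher = Union[bool, str, list]


def matches_on_error(matcher: Matcher, kind: str) -> bool:
    """Apply one on_error matcher (True | str | list[str]) to a kind."""
    if matcher is True:
        return True  # catch-all error route
    if isinstance(matcher, str):
        return matcher == kind
    if isinstance(matcher, list):
        return kind in matcher
    return False


def select_route(
    routes: list[dict[str, Any]], error: Optional[dict[str, Any]]
) -> Optional[dict[str, Any]]:
    """Pick the first route from the bucket that applies.

    With no envelope only success routes (on_error absent) are eligible;
    with an envelope only error routes whose matcher accepts the kind are.
    """
    for route in routes:
        matcher = route.get("on_error")
        if error is None:
            if matcher is None:
                return route
        elif matcher is not None and matches_on_error(matcher, error["kind"]):
            return route
    return None
```

Because the two buckets never mix, a success route can not accidentally catch an envelope, and an error route is never considered on the success path.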

Adds WorkflowContext.store_error(agent_name, envelope) that co-locates error envelopes with their producing node's slot. The rendered context shape for a node is now `{node: {output?, error?}}`.

All three context modes surface errors:
- accumulate: each failing node gets `{agent: {error: envelope}}`
- last_only: failing last agent surfaces with `{error: ...}`
- explicit: declarations of the form `agent.error[.field]` copy the whole envelope into ctx[agent]['error']. Envelopes are bounded and templates commonly need `error.details.*`, so the runtime never field-slices them.

Validator updates so existing semantic checks cover the new path:
- INPUT_REF_PATTERN gains an `error_agent`/`error_field` branch matching `<agent>.error[.field]`.
- _OUTPUT_ATTRS includes the singular `error` so Jinja AST analysis treats `{{ failing.error.kind }}` as a real output-class ref.
- TemplateRefs gains `agent_error_refs: set[str]` and the AST walker populates it.
- The per-agent template walk emits the same explicit-mode `undeclared input` warning for `.error` refs that `.output` and group `.errors` already get.
- Unknown-agent checks cover the `.error` ref path.
- Parallel-group internal-dependency check rejects intra-group `.error` refs too.

Checkpoint round-trip via to_dict/from_dict serializes `agent_errors`; older checkpoints without the key restore as empty (backwards-compat).

Adds 14 tests in TestWorkflowContextStoreError, 5 INPUT_REF_PATTERN shape tests, and 3 TemplateRefs error-extraction tests. Fixes the test_empty_context dict-equality fixture to include the new field.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…type

Adds two new helpers in conductor.config.validator:
- _validate_on_error_routes(agent): hard-errors on_error routes on node types that don't raise envelopes in Phase 1 (human_gate, workflow); validates each kind matches KIND_PATTERN; if agent.raises is declared, every concrete kind in on_error must be in raises or RESERVED_ON_ERROR_ALLOWLIST (catch-all True is always legal).
- _validate_group_routes_no_on_error(): rejects on_error routes on parallel and for_each groups (group-level envelopes are Phase 2).

Both helpers are wired into validate_workflow_config().

11 new tests cover plain agent + script (legal), human_gate + workflow + parallel + for_each (rejected), bad kind format, undeclared-kind cross-check, catch-all, reserved allowlist, and no-raises = no constraint.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
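A simplified sketch of the cross-check rules listed in this commit follows. It takes plain values instead of the real AgentDef, skips the KIND_PATTERN format check, and uses a hypothetical function name and message wording.

```python
from typing import Optional, Union

RESERVED_ON_ERROR_ALLOWLIST = frozenset(
    {
        "internal.script_error",
        "internal.schema_violation",
        "internal.undeclared_kind",
    }
)

# Node types that do not raise envelopes in Phase 1, per the commit message.
UNSUPPORTED_NODE_TYPES = frozenset({"human_gate", "workflow"})

Matcher = Union[None, bool, str, list]


def lint_on_error_routes(
    node_type: str, raises: Optional[list[str]], matchers: list[Matcher]
) -> list[str]:
    """Return validation errors for one node's route matchers."""
    errors: list[str] = []
    for matcher in matchers:
        if matcher is None:
            continue  # success route, nothing to check
        if node_type in UNSUPPORTED_NODE_TYPES:
            errors.append(f"on_error is not valid on type: {node_type}")
            continue
        if matcher is True or raises is None:
            continue  # catch-all is always legal; no raises, no constraint
        kinds = [matcher] if isinstance(matcher, str) else list(matcher)
        for kind in kinds:
            if kind not in raises and kind not in RESERVED_ON_ERROR_ALLOWLIST:
                errors.append(f"on_error kind {kind!r} is not in raises")
    return errors
```

The point of the cross-check is to catch, at load time, a handler that could never fire because the node never declares the kind it matches on.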

…ions

AgentOutput grows an optional `error: dict | None` field that carries an ErrorEnvelope when the agent failed. AgentExecutor.execute() now, after the provider call returns:

1. If the response is a dict with `conductor_error: true`, coerces the envelope (or synthesizes `internal.schema_violation` when the envelope itself is malformed) and attaches it to output.error WITHOUT running validate_output (the declared output schema doesn't apply to error envelopes).
2. Otherwise runs validate_output, and on ValidationError synthesizes an `internal.schema_violation` envelope on output.error instead of raising.

Partial outputs (from mid-agent interrupts) bypass both checks.

The error module is imported lazily inside execute() to avoid a circular import via conductor.engine.__init__.

Updated two existing tests to assert the new envelope contract instead of expecting ValidationError. Added three new tests covering well-formed envelopes, malformed envelopes, and the partial-output bypass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…nthesize internal.script_error

ScriptOutput grows an optional `error: dict | None` field. Each script run now allocates a tempfile via tempfile.mkstemp(), closes the fd immediately, and exposes the path in the env as CONDUCTOR_ERROR_OUT, set AFTER the agent.env merge so users cannot accidentally redirect or override it.

After process.communicate() the executor reads the file:
- empty / missing → no envelope
- valid JSON envelope → coerce_envelope, attach to output.error
- valid JSON but malformed envelope → internal.schema_violation
- invalid JSON → internal.schema_violation

If no envelope was written AND exit_code != 0 AND the node has opted into error routing (raises declared OR any on_error route present), the executor synthesizes an `internal.script_error` envelope. Legacy workflows that route on exit_code (no opt-in) keep their existing behavior.

Temp file is always removed in finally, even on timeout/command-not-found.

Added 7 tests covering: no envelope on success, well-formed envelope surfaces, user env cannot override, synthesized internal.script_error on opt-in, legacy non-zero with no opt-in keeps error=None, malformed envelope downgrades to schema_violation, temp file cleanup.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
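The read-back decision table in this commit can be sketched as a single function. The function name, the message strings, and the contents of `details` are assumptions made for illustration; the four file outcomes and the opt-in rule for internal.script_error follow the commit message.

```python
import json
from typing import Any, Optional


def _schema_violation(reason: str) -> dict[str, Any]:
    """Build a synthetic internal.schema_violation envelope (assumed details)."""
    return {
        "kind": "internal.schema_violation",
        "message": reason,
        "details": {"source": "CONDUCTOR_ERROR_OUT"},
    }


def read_script_envelope(
    path: str, exit_code: int, opted_in: bool
) -> Optional[dict[str, Any]]:
    """Decide which envelope, if any, a finished script run produced."""
    try:
        with open(path, encoding="utf-8") as fh:
            text = fh.read()
    except FileNotFoundError:
        text = ""
    if text.strip():
        try:
            raw = json.loads(text)
        except json.JSONDecodeError:
            return _schema_violation("error file is not valid JSON")
        well_formed = (
            isinstance(raw, dict)
            and raw.get("conductor_error") is True
            and isinstance(raw.get("kind"), str)
            and isinstance(raw.get("message"), str)
        )
        if not well_formed:
            return _schema_violation("error file is not a well-formed envelope")
        return {
            "kind": raw["kind"],
            "message": raw["message"],
            "details": raw.get("details") or {},
        }
    # No envelope written: synthesize only for nodes that opted in.
    if exit_code != 0 and opted_in:
        return {
            "kind": "internal.script_error",
            "message": f"script exited with code {exit_code}",
            "details": {"exit_code": exit_code},
        }
    return None
```

Returning None for a non-zero exit without opt-in is what keeps legacy exit_code routing untouched.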

…ary, group failure carry envelope

Wires the on_error contract through the workflow engine end-to-end:

- _evaluate_routes() now accepts an optional error envelope; empty routes plus an unhandled envelope raises UnhandledNodeError instead of silently routing to `$end`.
- New _normalize_envelope_for_node() applies undeclared-kind wrapping (raises declared + kind not in raises + not in allowlist → internal.undeclared_kind with original kind preserved under details.original_kind).
- New _handle_leaf_error() centralizes the leaf path: normalize, store_error, evaluate error routes, raise UnhandledWorkflowError with a single-leaf frame trail on no match.
- Agent call site (~2583) and script call site (~2359) both branch on output.error BEFORE storage and BEFORE schema validation. Script's success path runs full output-schema validation as before; error path skips it.
- Sub-workflow call sites catch UnhandledWorkflowError from child engines and re-raise as ExecutionError. Phase 1 invariant: envelopes do not propagate across the sub-workflow boundary.
- ParallelAgentError and ForEachError grow an optional envelope field. The parallel/for_each child execution helpers detect output.error, normalize it, raise an ExecutionError tagged with ._envelope, and the existing failure_mode machinery records it. Downstream group consumers can inspect the typed envelope.
- Tests: 6 new tests in tests/test_engine/test_error_routing.py covering agent envelope routing, unhandled-envelope halt, undeclared-kind normalization (with the rescue agent reading the original kind from context), success-path regression, script envelope routing, and legacy exit_code routing regression.

Full suite (2887 tests) green; no regressions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…on UnhandledWorkflowError

Phase 1 Step 9. When the engine raises UnhandledWorkflowError (a leaf node returned an error envelope that no on_error route matched), the new `except UnhandledWorkflowError` arm in `_execute_loop`:

* writes a single-line `errors.jsonl` record under `$TMPDIR/conductor/` using the same naming convention as the event log (`conductor-<workflow>-<ts>-<run_id>.errors.jsonl`), carrying the envelope, frame trail, and leaf node name;
* emits a typed `workflow_failed` event with `error_type='UnhandledWorkflowError'` plus the envelope, frames and errors-jsonl path so dashboard/log subscribers can render the typed halt distinctly from a generic ConductorError;
* runs the `on_error` lifecycle hook and saves a checkpoint, for parity with the other failure paths;
* re-raises so the CLI (Step 10) can map it to its distinct exit code.

The new arm is placed *before* the existing `except ConductorError` so the typed halt is caught first. Generic ConductorError handling is unchanged.

Adds two integration tests covering the jsonl artefact and the typed event. Also tightens a docstring that was 101 chars (pre-existing from Step 8, caught now by ruff format check).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
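For readers who want a feel for the artefact, here is a rough sketch of writing one errors.jsonl record. The record field names, the frame shape, the timestamp format, and the `base_dir` parameter are all assumptions; the commit only states that the record carries the envelope, frame trail, and leaf node name and that the file lives under the temp directory's conductor/ folder.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Any, Optional


def write_errors_jsonl(
    workflow: str,
    run_id: str,
    envelope: dict[str, Any],
    frames: list[dict[str, Any]],
    base_dir: Optional[str] = None,
) -> Path:
    """Append one JSON line describing an unhandled typed error."""
    root = Path(base_dir) if base_dir else Path(tempfile.gettempdir()) / "conductor"
    root.mkdir(parents=True, exist_ok=True)
    ts = time.strftime("%Y%m%dT%H%M%S")  # assumed timestamp format
    path = root / f"conductor-{workflow}-{ts}-{run_id}.errors.jsonl"
    record = {
        # Assumed frame shape: each frame names the node it came from.
        "leaf_node": frames[-1]["node"] if frames else None,
        "envelope": envelope,
        "frames": frames,
    }
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return path
```

Keeping `frames` a list even when it holds a single element matches the stated goal of accepting multi-frame trails later without a shape change.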

…stderr summary

Phase 1 Step 10. Both `run` and `resume` now catch `UnhandledWorkflowError` specifically before the generic `except Exception`, render a typed panel to stderr via the new `print_unhandled_workflow_error` helper, and exit with code 3 so callers (CI, polyphony, shell scripts) can distinguish 'workflow ran to completion and halted on a typed error' from a generic failure (code 1).

The summary panel surfaces the leaf node, kind, message, optional `details`, and the path to the `errors.jsonl` artefact. To make the path reachable from the CLI without holding a reference to the engine, `_execute_loop` now also attaches the path to the exception instance (`exc.errors_jsonl_path`) before re-raising.

Adds two CLI tests:
* unhandled typed halt exits 3,
* generic `RuntimeError` still exits 1 (regression guard).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
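The ordering of the two except arms is the important part of this step, and can be sketched as below. The exception class here is a stand-in, and `run_cli` and the constant names are hypothetical; only the exit code values (3 for a typed halt, 1 for a generic failure) come from the PR.

```python
import sys
from typing import Any, Callable, Optional

EXIT_OK = 0
EXIT_GENERIC = 1
EXIT_UNHANDLED_WORKFLOW_ERROR = 3


class UnhandledWorkflowError(Exception):
    """Stand-in for the engine exception; carries the typed envelope."""

    def __init__(
        self, envelope: dict[str, Any], errors_jsonl_path: Optional[str] = None
    ) -> None:
        super().__init__(envelope.get("message", ""))
        self.envelope = envelope
        self.errors_jsonl_path = errors_jsonl_path


def run_cli(run_workflow: Callable[[], None]) -> int:
    """Run a workflow callable and translate failures into exit codes."""
    try:
        run_workflow()
    except UnhandledWorkflowError as exc:
        # Typed halt: must be caught before the generic arm below.
        env = exc.envelope
        print(f"unhandled {env['kind']}: {env['message']}", file=sys.stderr)
        if exc.errors_jsonl_path:
            print(f"details: {exc.errors_jsonl_path}", file=sys.stderr)
        return EXIT_UNHANDLED_WORKFLOW_ERROR
    except Exception as exc:
        print(f"error: {exc}", file=sys.stderr)
        return EXIT_GENERIC
    return EXIT_OK
```

If the arms were swapped, the generic handler would swallow the typed halt and every failure would exit 1.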

…n/node/dotnet

Phase 1 Step 11. New `src/conductor/helpers/error/` directory with one-file convenience modules for raising typed Conductor error envelopes from script-type nodes. All helpers implement the same contract: read `CONDUCTOR_ERROR_OUT`, write the `{conductor_error: true, kind, message, details?}` JSON envelope to that path, and return, leaving exit-code management to the caller.

* `Conductor.Error.psm1`: PowerShell, `Write-ConductorError` cmdlet
* `conductor-error.sh`: Bash/sh, sourced `conductor_error` function
* `conductor_error.py`: Python, `raise_kind` function
* `conductor-error.mjs`: Node, exported `raiseError`
* `ConductorError.cs`: .NET, static `ConductorError.Raise`
* `README.md`: quick reference + usage examples per engine

Helpers ship under `src/conductor/` so hatchling rolls them into the wheel; verified by inspecting the built artefact. Nothing is auto-loaded, on PATH, or on PYTHONPATH: script authors must explicitly Import-Module / source / import to use them, and authors who don't want them write the JSON themselves (it's three lines per engine).

Adds `tests/test_helpers/test_error_helpers.py` with 6 cases covering the Python helper's envelope shape, env-var contract, no-sys.exit guarantee, and round-trip through `coerce_envelope`. The non-Python helpers are exercised by the cross-engine integration test landing in Step 13.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Phase 1 Step 12. Two new example workflows under `examples/` plus an Error Routing section in `examples/README.md`.

`error-routing.yaml`:
* A `type: script` probe that writes `{conductor_error: true, kind, message, details}` to `$CONDUCTOR_ERROR_OUT` and exits 0 via the language-neutral contract (no helper required).
* The probe declares `raises: [external.git.drift, external.api.rate_limited]` so undeclared kinds get normalised to `internal.undeclared_kind`.
* Routes select by `on_error: <kind>` to demonstrate that an envelope picks the typed arm over a generic exit_code fallback.
* Handlers read `{{ probe.error.kind }}`, `{{ probe.error.message }}`, and `{{ probe.error.details.* }}`.
* A `simulated_failure` workflow input toggles between `ok` / `drift` / `rate_limited` so the same YAML exercises all three arms.

`error-routing-helpers.yaml`:
* Same flow, but raises via the shipped Python helper (`conductor.helpers.error.conductor_error.raise_kind`) instead of hand-rolled JSON, so authors see the ergonomic version.

Both examples validate with `conductor validate` and the full `make validate-examples` sweep (17/17 pass). Caught one writing issue along the way: the input schema field is `input:` (singular), not `inputs:`; fixed before committing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Phase 1 Step 13 (acceptance #1). New `tests/test_integration/test_error_routing_cross_engine.py` exercises the `CONDUCTOR_ERROR_OUT` contract end-to-end through the real `WorkflowEngine` (no executor mocking) with three writer scripts in different languages: Python, PowerShell (`pwsh`), and bash. All three share the same workflow YAML shape and the same expected outcome: the probe writes a typed envelope, the engine routes by `on_error: external.git.drift`, and a rescue agent reads the kind back from the routed-to scope.

Each test is skipped when the corresponding interpreter is missing from PATH. Bash is also skipped on Windows, where `shutil.which` typically returns the WSL relay shim that fails with an opaque `execvpe(/bin/bash) failed: No such file or directory`; that failure is outside the scope of this contract test, and the brief calls for bash-on-Linux specifically.

Locally on Windows: 2 pass (python + pwsh), 1 skipped (bash). On a Linux CI runner with pwsh installed, all three execute. The contract is the same string of bytes in every engine, so identical assertions hold across them, which is the whole point.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ty's TypedDict to dict() conversion picks the wrong overload for our ErrorEnvelope, surfacing 8 spurious diagnostics. Use cast() at the leaf sites where TypedDict envelopes cross into APIs typed as dict[str, Any] (script/agent executor returns; workflow.py store_error and router.evaluate call sites). Brings ty count back to the 12-diagnostic baseline (all pre-existing Windows termios/tty noise). Lint, format, examples-validation, and full test suite (2943 pass, 12 baseline failures) all green. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

codecov-commenter · 2026-05-21T23:46:43Z

Codecov Report

❌ Patch coverage is 86.85832% with 64 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@efa520f). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| src/conductor/engine/workflow.py | 75.47% | 26 Missing ⚠️ |
| src/conductor/config/validator.py | 83.82% | 11 Missing ⚠️ |
| src/conductor/engine/context.py | 80.00% | 9 Missing ⚠️ |
| src/conductor/executor/script.py | 87.87% | 8 Missing ⚠️ |
| src/conductor/cli/app.py | 81.81% | 6 Missing ⚠️ |
| src/conductor/config/schema.py | 93.87% | 3 Missing ⚠️ |
| src/conductor/engine/router.py | 96.42% | 1 Missing ⚠️ |

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #229   +/-   ##
=======================================
  Coverage        ?   88.20%           
=======================================
  Files           ?       65           
  Lines           ?    10115           
  Branches        ?        0           
=======================================
  Hits            ?     8922           
  Misses          ?     1193           
  Partials        ?        0


Daniel Green and others added 14 commits May 21, 2026 14:13

PolyphonyRequiem wants to merge 14 commits into microsoft:main from PolyphonyRequiem:feature/error-routing
