feat: Phase 1 of typed on_error routing (#227) #229
Draft
PolyphonyRequiem wants to merge 14 commits into
…ting
Phase 1, step 1 of on_error routing (per design brief in
docs/projects/error-routing/on-error-routing.brainstorm.md).
Adds two opt-in schema fields and a small shared module for the
constants both schema validation and the engine error path will use:
- src/conductor/error_kinds.py
  - KIND_PATTERN: dotted lowercase identifier (at least one dot).
  - RESERVED_KIND_PREFIXES: internal., provider., subworkflow., retry.
    (the runtime owns these namespaces; workflow authors cannot declare
    kinds under them).
  - RESERVED_ON_ERROR_ALLOWLIST: the closed set of runtime-synthesized
    kinds that ARE legal to match in on_error even though they're not
    legal to declare in raises (internal.script_error,
    internal.schema_violation, internal.undeclared_kind).
  - is_reserved_prefix(kind) helper.
- RouteDef.on_error: bool | str | list[str] | None
  - None = success route (existing behavior).
  - True = catch-all error route.
  - str = single-kind error route.
  - list[str] = multi-kind error route.
  - False is rejected (no semantic meaning).
  - Kind format enforced via KIND_PATTERN.
  - 'before'-mode validator so Pydantic's bool/str coercion doesn't
    swallow the discriminator.
- AgentDef.raises: list[str] | None
  - Optional declaration of kinds the node may raise.
  - Powers a load-time lint (cross-checked against routes' on_error in
    the validator, landing in a follow-up commit) and a runtime
    undeclared-kind check (will land with the engine-wiring commit).
  - Reserved prefixes rejected so authors can't claim runtime
    namespaces; duplicates rejected; format enforced via KIND_PATTERN.
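A minimal sketch of the constants and checks listed above, for reviewers who want the rules in one place. The regex body and the `check_raises` / `check_on_error` function names are assumptions for illustration; only `KIND_PATTERN`, `RESERVED_KIND_PREFIXES`, `RESERVED_ON_ERROR_ALLOWLIST`, and `is_reserved_prefix` are named in this commit, and the real module may differ.

```python
import re

# Assumed regex: lowercase dotted identifier with at least one dot.
KIND_PATTERN = re.compile(r"^[a-z][a-z0-9_]*(?:\.[a-z][a-z0-9_]*)+$")

RESERVED_KIND_PREFIXES = ("internal.", "provider.", "subworkflow.", "retry.")

RESERVED_ON_ERROR_ALLOWLIST = frozenset(
    {
        "internal.script_error",
        "internal.schema_violation",
        "internal.undeclared_kind",
    }
)


def is_reserved_prefix(kind: str) -> bool:
    """True when the kind lives in a runtime-owned namespace."""
    return kind.startswith(RESERVED_KIND_PREFIXES)


def check_raises(kinds):
    """Hypothetical raises check: format, reserved prefixes, duplicates."""
    seen = set()
    for kind in kinds:
        if not KIND_PATTERN.match(kind):
            raise ValueError(f"invalid kind format: {kind!r}")
        if is_reserved_prefix(kind):
            raise ValueError(f"reserved namespace: {kind!r}")
        if kind in seen:
            raise ValueError(f"duplicate kind: {kind!r}")
        seen.add(kind)
    return list(kinds)


def check_on_error(value):
    """Hypothetical on_error check that inspects the raw value first,
    so True/False are never confused with strings."""
    if value is None or value is True:
        return value  # success route / catch-all error route
    if value is False:
        raise ValueError("on_error: false has no semantic meaning")
    kinds = [value] if isinstance(value, str) else value
    if not isinstance(kinds, list) or not all(isinstance(k, str) for k in kinds):
        raise TypeError("on_error must be true, a kind, or a list of kinds")
    for kind in kinds:
        if not KIND_PATTERN.match(kind):
            raise ValueError(f"invalid kind format: {kind!r}")
    return value
```

Note that `check_on_error` does not reject reserved kinds: matching `internal.*` kinds in on_error is legal for allowlisted entries, whereas declaring them in raises is not.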
Tests:
- tests/test_error_kinds.py — 24 cases covering pattern + prefix +
allowlist invariants (allowlist entries must themselves be reserved).
- tests/test_config/test_schema.py::TestRouteDefOnError — 14 cases.
- tests/test_config/test_schema.py::TestAgentDefRaises — 10 cases.
No semantics change for existing workflows: both fields default to None
and the engine doesn't observe them yet (wiring lands in subsequent
commits).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…exceptions
Phase 1, step 2 of on_error routing.
Adds src/conductor/engine/errors.py:
- ErrorEnvelope TypedDict — the internal {kind, message, details}
shape. Strips the on-the-wire conductor_error: true discriminator so
callers don't see it in {{ failing_node.error.* }} templates.
- EnvelopeValidationError — distinct from ValidationError so the
engine can catch and translate malformed envelopes into synthetic
internal.* kinds rather than halting with a generic config error.
- coerce_envelope(raw) — validates on-the-wire input, normalizes
details to {} when absent.
- make_script_error(exit_code, stderr_tail, command) — synthesizes
the internal.script_error envelope.
- make_schema_violation(node_name, source, original_message,
failed_field?) — synthesizes the internal.schema_violation envelope
with rich details for the swallowed-by-catch-all diagnostics case.
- wrap_undeclared_kind(original, declared) — wraps an envelope whose
kind isn't in the node's declared raises list. Preserves the
original kind/message/details under details.original_* so an author
handling internal.undeclared_kind can still recover the intent.
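The coercion and script-error helpers above could look roughly like the following. This is a sketch under assumptions: the exact validation rules, the message text, and the `details` field names for `internal.script_error` are not specified in this commit message and are chosen here for illustration.

```python
from typing import Any, TypedDict


class ErrorEnvelope(TypedDict):
    kind: str
    message: str
    details: dict[str, Any]


class EnvelopeValidationError(Exception):
    """Raised when on-the-wire input is not a well-formed envelope."""


def coerce_envelope(raw: Any) -> ErrorEnvelope:
    """Validate on-the-wire input and drop the conductor_error discriminator."""
    if not isinstance(raw, dict) or raw.get("conductor_error") is not True:
        raise EnvelopeValidationError("missing conductor_error: true")
    kind = raw.get("kind")
    message = raw.get("message")
    if not isinstance(kind, str) or not kind:
        raise EnvelopeValidationError("kind must be a non-empty string")
    if not isinstance(message, str):
        raise EnvelopeValidationError("message must be a string")
    details = raw.get("details")
    if details is None:
        details = {}  # normalize absent details
    if not isinstance(details, dict):
        raise EnvelopeValidationError("details must be a mapping")
    return ErrorEnvelope(kind=kind, message=message, details=dict(details))


def make_script_error(exit_code: int, stderr_tail: str, command: str) -> ErrorEnvelope:
    """Synthesize internal.script_error (details keys are assumed names)."""
    return ErrorEnvelope(
        kind="internal.script_error",
        message=f"script exited with code {exit_code}",
        details={
            "exit_code": exit_code,
            "stderr_tail": stderr_tail,
            "command": command,
        },
    )
```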
Adds two exceptions to src/conductor/exceptions.py:
- UnhandledNodeError — internal signal raised by the router when an
error envelope reaches no on_error route at the current level. The
engine catches this at the per-node dispatch site and re-raises as
UnhandledWorkflowError. Not intended to surface to end users.
- UnhandledWorkflowError — workflow halted on a typed error envelope.
Carries the envelope and a frame trail (single frame in Phase 1;
Phase 2 will accumulate frames across sub-workflow boundaries).
CLI maps this to a distinct exit code so callers can distinguish
'workflow ran and halted on typed error' from generic failures.
Tests: tests/test_engine/test_errors.py — 18 cases covering envelope
coercion (including discriminator stripping and details normalization),
the three synthetic-kind constructors, and the exception classes
including the empty-frames defensive path.
Nothing yet emits these envelopes or exceptions; the next commits wire
them through the router, executors, and engine dispatch.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Route evaluation now partitions by RouteDef.on_error:
- on_error is None -> success bucket, evaluated when error=None
- on_error is set -> error bucket, evaluated when an envelope is
passed; the route's on_error matcher (True | str | list[str])
must match envelope[kind]
Behavior preserved on the success path: first matching when: wins, no
catch-all raises the existing ValueError. New: error-path exhaustion
raises UnhandledNodeError carrying the envelope so the engine can
translate it into UnhandledWorkflowError at the call site.
Error-route eval context exposes the envelope as `error` for both
Jinja2 ({{ error.kind }}) and simpleeval (kind == 'x.y' via flatten).
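A simplified sketch of the bucket partitioning described above. Routes are modelled as plain dicts and `when` as a Python callable standing in for the real Jinja2/simpleeval expressions; both are assumptions made to keep the example self-contained, not the router's actual data model.

```python
from typing import Any, Callable, Optional


class UnhandledNodeError(Exception):
    """Internal signal: an envelope reached no on_error route."""

    def __init__(self, envelope: dict):
        super().__init__(envelope["kind"])
        self.envelope = envelope


def _matches(on_error: Any, kind: str) -> bool:
    """Apply the True | str | list[str] matcher to an envelope kind."""
    if on_error is True:
        return True
    if isinstance(on_error, str):
        return on_error == kind
    return kind in on_error


def _always(_ctx: dict) -> bool:
    return True


def evaluate(routes: list, error: Optional[dict] = None) -> str:
    """Return the target of the first matching route in the active bucket."""
    if error is None:
        # Success bucket: only routes without on_error are considered.
        for route in routes:
            when: Callable[[dict], bool] = route.get("when", _always)
            if route.get("on_error") is None and when({}):
                return route["to"]
        raise ValueError("no matching route")
    # Error bucket: only routes with on_error set are considered.
    for route in routes:
        on_error = route.get("on_error")
        if on_error is None:
            continue
        when = route.get("when", _always)
        if _matches(on_error, error["kind"]) and when({"error": error}):
            return route["to"]
    raise UnhandledNodeError(error)
```

Ordering matters within each bucket: a catch-all `on_error: true` route placed first would shadow any kind-specific routes after it.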
Adds 12 tests in TestRouterErrorBucket covering bucket isolation,
all three on_error matcher shapes, when: combined with on_error,
output: transforms, ordering within the bucket, the new
UnhandledNodeError path, and the legacy ValueError path.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds WorkflowContext.store_error(agent_name, envelope) that co-locates
error envelopes with their producing node's slot. The rendered context
shape for a node is now `{node: {output?, error?}}`.
All three context modes surface errors:
- accumulate: each failing node gets `{agent: {error: envelope}}`
- last_only: failing last agent surfaces with `{error: ...}`
- explicit: declarations of the form `agent.error[.field]` copy
the whole envelope into ctx[agent]['error']. Envelopes are bounded
and templates commonly need `error.details.*`, so the runtime
never field-slices them.
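To make the `{node: {output?, error?}}` shape concrete, here is a small sketch of an accumulate-style context. The class name, `store_output`, and `render` are hypothetical stand-ins; only `store_error(agent_name, envelope)` is named in this commit.

```python
from typing import Any


class ContextSketch:
    """Hypothetical stand-in for WorkflowContext in accumulate mode."""

    def __init__(self) -> None:
        self.agent_outputs: dict[str, Any] = {}
        self.agent_errors: dict[str, dict] = {}

    def store_output(self, agent_name: str, output: Any) -> None:
        self.agent_outputs[agent_name] = output

    def store_error(self, agent_name: str, envelope: dict) -> None:
        # Co-locate the envelope with the producing node's slot.
        self.agent_errors[agent_name] = envelope

    def render(self) -> dict[str, dict]:
        """Build {node: {output?, error?}}; either key may be absent."""
        rendered: dict[str, dict] = {}
        for name, output in self.agent_outputs.items():
            rendered.setdefault(name, {})["output"] = output
        for name, envelope in self.agent_errors.items():
            # The whole envelope is copied; it is never field-sliced.
            rendered.setdefault(name, {})["error"] = envelope
        return rendered
```

With this shape a handler template can read `{{ probe.error.kind }}` without caring whether `probe` also produced a partial output.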
Validator updates so existing semantic checks cover the new path:
- INPUT_REF_PATTERN gains an `error_agent`/`error_field` branch
matching `<agent>.error[.field]`.
- _OUTPUT_ATTRS includes the singular `error` so Jinja AST analysis
treats `{{ failing.error.kind }}` as a real output-class ref.
- TemplateRefs gains `agent_error_refs: set[str]` and the AST
walker populates it.
- The per-agent template walk emits the same explicit-mode
`undeclared input` warning for `.error` refs that `.output`
and group `.errors` already get.
- Unknown-agent checks cover the `.error` ref path.
- Parallel-group internal-dependency check rejects intra-group
`.error` refs too.
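For illustration, the `<agent>.error[.field]` branch could be expressed as a standalone regex like the one below. This is only a sketch: the real INPUT_REF_PATTERN has other branches that are not shown in this PR text, and the identifier character classes here are assumptions.

```python
import re

# Assumed standalone pattern for the `<agent>.error[.field]` branch only.
ERROR_REF_PATTERN = re.compile(
    r"^(?P<error_agent>[A-Za-z_][A-Za-z0-9_]*)"
    r"\.error"
    r"(?:\.(?P<error_field>[A-Za-z_][A-Za-z0-9_.]*))?$"
)


def parse_error_ref(ref: str):
    """Return (agent, field_or_None) for an error ref, else None."""
    match = ERROR_REF_PATTERN.match(ref)
    if match is None:
        return None
    return match.group("error_agent"), match.group("error_field")
```

The singular `error` is deliberately distinct from the group-level `.errors` refs mentioned above, so `probe.errors` does not match this branch.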
Checkpoint round-trip via to_dict/from_dict serializes `agent_errors`;
older checkpoints without the key restore as empty (backwards-compat).
Adds 14 tests in TestWorkflowContextStoreError, 5 INPUT_REF_PATTERN
shape tests, and 3 TemplateRefs error-extraction tests. Fixes the
test_empty_context dict-equality fixture to include the new field.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…type
Adds two new helpers in conductor.config.validator:
- _validate_on_error_routes(agent): hard-errors on_error routes on node
  types that don't raise envelopes in Phase 1 (human_gate, workflow);
  validates each kind matches KIND_PATTERN; if agent.raises is declared,
  every concrete kind in on_error must be in raises or
  RESERVED_ON_ERROR_ALLOWLIST (catch-all True always legal).
- _validate_group_routes_no_on_error(): rejects on_error routes on
  parallel and for_each groups (group-level envelopes are Phase 2).
Both helpers are wired into validate_workflow_config().
11 new tests cover plain agent + script (legal), human_gate + workflow +
parallel + for_each (rejected), bad kind format, undeclared-kind
cross-check, catch-all, reserved allowlist, and no-raises = no constraint.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
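The cross-check rule can be sketched as a pure function over a node's type, its routes' `on_error` values, and its `raises` list. The function name and the error-message wording are hypothetical; the real helpers operate on the config models.

```python
from typing import Optional, Union

RESERVED_ON_ERROR_ALLOWLIST = frozenset(
    {
        "internal.script_error",
        "internal.schema_violation",
        "internal.undeclared_kind",
    }
)

# Node types that cannot raise envelopes in Phase 1.
NON_RAISING_NODE_TYPES = frozenset({"human_gate", "workflow"})

OnError = Union[bool, str, list, None]


def lint_on_error(node_type: str, on_error: OnError, raises: Optional[list]) -> list:
    """Return a list of human-readable problems (empty when the route is fine)."""
    if on_error is None:
        return []  # success route: nothing to check
    if node_type in NON_RAISING_NODE_TYPES:
        return [f"on_error is not supported on {node_type} nodes in Phase 1"]
    if on_error is True or raises is None:
        return []  # catch-all is always legal; no raises means no constraint
    kinds = [on_error] if isinstance(on_error, str) else list(on_error)
    allowed = set(raises) | RESERVED_ON_ERROR_ALLOWLIST
    return [f"on_error kind {k!r} is not declared in raises" for k in kinds if k not in allowed]
```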
…ions
AgentOutput grows an optional `error: dict | None` field that carries an
ErrorEnvelope when the agent failed.
AgentExecutor.execute() now, after the provider call returns:
1. If the response is a dict with `conductor_error: true`, coerces the
   envelope (or synthesizes `internal.schema_violation` when the envelope
   itself is malformed) and attaches it to output.error WITHOUT running
   validate_output (the declared output schema doesn't apply to error
   envelopes).
2. Otherwise runs validate_output, and on ValidationError synthesizes an
   `internal.schema_violation` envelope on output.error instead of raising.
Partial outputs (from mid-agent interrupts) bypass both checks.
The error module is imported lazily inside execute() to avoid a circular
import via conductor.engine.__init__.
Updated two existing tests to assert the new envelope contract instead of
expecting ValidationError. Added three new tests covering well-formed
envelopes, malformed envelopes, and the partial-output bypass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
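The two-step decision above can be sketched as a single function. Everything here is illustrative: `resolve_agent_error` is a hypothetical name, `ValueError` stands in for the project's ValidationError, and the `details` keys of the synthesized envelope are assumptions.

```python
from typing import Any, Callable, Optional


def _schema_violation(node_name: str, source: str, original_message: str) -> dict:
    """Synthesize internal.schema_violation (details keys are assumed)."""
    return {
        "kind": "internal.schema_violation",
        "message": f"{node_name}: {original_message}",
        "details": {"source": source, "original_message": original_message},
    }


def resolve_agent_error(
    node_name: str,
    response: Any,
    validate_output: Callable[[Any], None],
    partial: bool = False,
) -> Optional[dict]:
    """Return the envelope to attach to output.error, or None on success."""
    if partial:
        return None  # partial outputs bypass both checks
    if isinstance(response, dict) and response.get("conductor_error") is True:
        kind, message = response.get("kind"), response.get("message")
        if not isinstance(kind, str) or not isinstance(message, str):
            return _schema_violation(node_name, "envelope", "malformed error envelope")
        # Well-formed raise: the output schema is deliberately not applied.
        return {"kind": kind, "message": message, "details": response.get("details") or {}}
    try:
        validate_output(response)
    except ValueError as exc:  # stand-in for the real ValidationError
        return _schema_violation(node_name, "output", str(exc))
    return None
```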
…nthesize internal.script_error
ScriptOutput grows an optional `error: dict | None` field. Each script run
now allocates a tempfile via tempfile.mkstemp(), closes the fd immediately,
and exposes the path in the env as CONDUCTOR_ERROR_OUT, set AFTER the
agent.env merge so users cannot accidentally redirect or override it.
After process.communicate() the executor reads the file:
- empty / missing → no envelope
- valid JSON envelope → coerce_envelope, attach to output.error
- valid JSON but malformed envelope → internal.schema_violation
- invalid JSON → internal.schema_violation
If no envelope was written AND exit_code != 0 AND the node has opted into
error routing (raises declared OR any on_error route present), the executor
synthesizes an `internal.script_error` envelope. Legacy workflows that
route on exit_code (no opt-in) keep their existing behavior.
Temp file is always removed in finally, even on timeout/command-not-found.
Added 7 tests covering: no envelope on success, well-formed envelope
surfaces, user env cannot override, synthesized internal.script_error on
opt-in, legacy non-zero with no opt-in keeps error=None, malformed envelope
downgrades to schema_violation, temp file cleanup.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
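A synchronous sketch of the tempfile contract described above. It uses `subprocess.run` as a stand-in for the executor's own process handling, and `run_script`, `opted_in`, and the synthesized message/details are hypothetical; the real executor also handles timeouts, which are omitted here.

```python
import json
import os
import subprocess
import sys
import tempfile
from typing import Optional


def _violation(message: str) -> dict:
    return {"kind": "internal.schema_violation", "message": message, "details": {}}


def run_script(argv: list, user_env: Optional[dict] = None, opted_in: bool = False):
    """Run a script and return (exit_code, envelope_or_None)."""
    fd, path = tempfile.mkstemp(prefix="conductor-error-", suffix=".json")
    os.close(fd)  # the script opens the path itself
    try:
        env = {**os.environ, **(user_env or {})}
        env["CONDUCTOR_ERROR_OUT"] = path  # set last so user env cannot override
        proc = subprocess.run(argv, env=env, capture_output=True, text=True)
        try:
            with open(path, encoding="utf-8") as fh:
                raw = fh.read()
        except FileNotFoundError:
            raw = ""  # missing file is treated like an empty one
        if raw.strip():
            try:
                data = json.loads(raw)
            except json.JSONDecodeError:
                return proc.returncode, _violation("error file is not valid JSON")
            well_formed = (
                isinstance(data, dict)
                and data.get("conductor_error") is True
                and isinstance(data.get("kind"), str)
                and isinstance(data.get("message"), str)
            )
            if not well_formed:
                return proc.returncode, _violation("malformed error envelope")
            envelope = {
                "kind": data["kind"],
                "message": data["message"],
                "details": data.get("details") or {},
            }
            return proc.returncode, envelope
        if proc.returncode != 0 and opted_in:
            envelope = {
                "kind": "internal.script_error",
                "message": f"script exited with code {proc.returncode}",
                "details": {"exit_code": proc.returncode, "stderr_tail": proc.stderr[-2000:]},
            }
            return proc.returncode, envelope
        return proc.returncode, None
    finally:
        if os.path.exists(path):
            os.remove(path)  # always clean up the temp file
```

A quick way to try it is `run_script([sys.executable, "-c", "import sys; sys.exit(2)"], opted_in=True)`, which should yield an `internal.script_error` envelope, while the same call without `opted_in` yields no envelope.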
…ary, group failure carry envelope
Wires the on_error contract through the workflow engine end-to-end:
- _evaluate_routes() now accepts an optional error envelope; empty routes
  plus an unhandled envelope raises UnhandledNodeError instead of silently
  routing to \.
- New _normalize_envelope_for_node() applies undeclared-kind wrapping
  (raises declared + kind not in raises + not in allowlist →
  internal.undeclared_kind with original kind preserved under
  details.original_kind).
- New _handle_leaf_error() centralizes the leaf path: normalize,
  store_error, evaluate error routes, raise UnhandledWorkflowError with a
  single-leaf frame trail on no match.
- Agent call site (~2583) and script call site (~2359) both branch on
  output.error BEFORE storage and BEFORE schema validation. Script's
  success path runs full output-schema validation as before; error path
  skips it.
- Sub-workflow call sites catch UnhandledWorkflowError from child engines
  and re-raise as ExecutionError. Phase 1 invariant: envelopes do not
  propagate across the sub-workflow boundary.
- ParallelAgentError and ForEachError grow an optional envelope field. The
  parallel/for_each child execution helpers detect output.error, normalize
  it, raise an ExecutionError tagged with ._envelope, and the existing
  failure_mode machinery records it. Downstream group consumers can inspect
  the typed envelope.
- Tests: 6 new tests in tests/test_engine/test_error_routing.py covering
  agent envelope routing, unhandled-envelope halt, undeclared-kind
  normalization (with the rescue agent reading the original kind from
  context), success-path regression, script envelope routing, and legacy
  exit_code routing regression.
Full suite (2887 tests) green; no regressions.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
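The undeclared-kind rule reads as three conditions that must all hold before wrapping. A minimal sketch, assuming a free function rather than the engine method, and assuming the wrapper's message text and the `declared` details key (only the `details.original_*` fields are described in this PR):

```python
from typing import Optional

RESERVED_ON_ERROR_ALLOWLIST = frozenset(
    {
        "internal.script_error",
        "internal.schema_violation",
        "internal.undeclared_kind",
    }
)


def normalize_envelope_for_node(envelope: dict, raises: Optional[list]) -> dict:
    """Wrap the envelope as internal.undeclared_kind when the node declared
    raises and the kind is neither declared nor runtime-synthesized."""
    if raises is None:
        return envelope  # no declaration means no constraint
    kind = envelope["kind"]
    if kind in raises or kind in RESERVED_ON_ERROR_ALLOWLIST:
        return envelope
    return {
        "kind": "internal.undeclared_kind",
        "message": f"node raised undeclared kind {kind!r}",
        "details": {
            "original_kind": kind,
            "original_message": envelope["message"],
            "original_details": envelope.get("details", {}),
            "declared": list(raises),  # assumed key, for diagnostics
        },
    }
```

Because the original kind survives under `details.original_kind`, a handler routed on `internal.undeclared_kind` can still branch on what the node meant to raise.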
…on UnhandledWorkflowError
Phase 1 Step 9. When the engine raises UnhandledWorkflowError (a leaf node
returned an error envelope that no on_error route matched), the new
`except UnhandledWorkflowError` arm in `_execute_loop`:
* writes a single-line `errors.jsonl` record under `$TMPDIR/conductor/`
  using the same naming convention as the event log
  (`conductor-<workflow>-<ts>-<run_id>.errors.jsonl`), carrying the
  envelope, frame trail, and leaf node name;
* emits a typed `workflow_failed` event with
  `error_type='UnhandledWorkflowError'` plus the envelope, frames and
  errors-jsonl path so dashboard/log subscribers can render the typed halt
  distinctly from a generic ConductorError;
* runs the `on_error` lifecycle hook and saves a checkpoint, for parity
  with the other failure paths;
* re-raises so the CLI (Step 10) can map it to its distinct exit code.
The new arm is placed *before* the existing `except ConductorError` so the
typed halt is caught first. Generic ConductorError handling is unchanged.
Adds two integration tests covering the jsonl artefact and the typed event.
Also tightens a docstring that was 101 chars (pre-existing from Step 8,
caught now by ruff format check).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
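Writing the single-line record might look like the sketch below. The function name, the timestamp format, and the record's key names are assumptions; only the `conductor-<workflow>-<ts>-<run_id>.errors.jsonl` naming convention and the three carried items (envelope, frame trail, leaf node name) come from this commit.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Optional


def write_errors_jsonl(
    workflow: str,
    run_id: str,
    envelope: dict,
    frames: list,
    leaf_node: str,
    base_dir: Optional[str] = None,
) -> Path:
    """Append one JSON line describing the typed halt and return its path."""
    root = Path(base_dir or tempfile.gettempdir()) / "conductor"
    root.mkdir(parents=True, exist_ok=True)
    ts = time.strftime("%Y%m%d-%H%M%S")  # assumed timestamp format
    path = root / f"conductor-{workflow}-{ts}-{run_id}.errors.jsonl"
    record = {"leaf_node": leaf_node, "envelope": envelope, "frames": frames}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return path
```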
…stderr summary
Phase 1 Step 10. Both `run` and `resume` now catch
`UnhandledWorkflowError` specifically before the generic
`except Exception`, render a typed panel to stderr via the new
`print_unhandled_workflow_error` helper, and exit with code 3 so callers
(CI, polyphony, shell scripts) can distinguish 'workflow ran to completion
and halted on a typed error' from a generic failure (code 1).
The summary panel surfaces the leaf node, kind, message, optional
`details`, and the path to the `errors.jsonl` artefact. To make the path
reachable from the CLI without holding a reference to the engine,
`_execute_loop` now also attaches the path to the exception instance
(`exc.errors_jsonl_path`) before re-raising.
Adds two CLI tests:
* unhandled typed halt exits 3,
* generic `RuntimeError` still exits 1 (regression guard).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
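The exit-code mapping can be sketched without any CLI framework. The exception class below is a simplified stand-in, `run_cli` is a hypothetical wrapper, and the plain-text summary replaces the real panel rendering; the parts taken from this commit are the exit codes (3 vs. 1), the handler ordering, and the `errors_jsonl_path` attribute.

```python
import sys
from typing import Callable, Optional, TextIO


class UnhandledWorkflowError(Exception):
    """Simplified stand-in: carries the envelope and a frame trail."""

    def __init__(self, envelope: dict, frames: list):
        super().__init__(envelope["message"])
        self.envelope = envelope
        self.frames = frames
        self.errors_jsonl_path: Optional[str] = None


def run_cli(run_workflow: Callable[[], None], stderr: Optional[TextIO] = None) -> int:
    """Return the process exit code for one workflow invocation."""
    out = stderr if stderr is not None else sys.stderr
    try:
        run_workflow()
    except UnhandledWorkflowError as exc:  # must precede the generic handler
        leaf = exc.frames[-1] if exc.frames else "<unknown>"
        print(f"unhandled error in {leaf}: {exc.envelope['kind']}", file=out)
        print(f"  message: {exc.envelope['message']}", file=out)
        if exc.errors_jsonl_path:
            print(f"  errors.jsonl: {exc.errors_jsonl_path}", file=out)
        return 3  # workflow ran and halted on a typed error
    except Exception as exc:
        print(f"error: {exc}", file=out)
        return 1  # generic failure
    return 0
```

A caller such as a CI script can then branch on `$?` being 3 to decide whether to inspect the `errors.jsonl` artefact.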
…n/node/dotnet
Phase 1 Step 11. New `src/conductor/helpers/error/` directory with
one-file convenience modules for raising typed Conductor error
envelopes from script-type nodes. All helpers implement the same
contract: read `CONDUCTOR_ERROR_OUT`, write the
`{conductor_error: true, kind, message, details?}` JSON envelope to
that path, and return, leaving exit-code management to the caller.
* `Conductor.Error.psm1`: PowerShell, `Write-ConductorError` cmdlet
* `conductor-error.sh`: Bash/sh, sourced `conductor_error` function
* `conductor_error.py`: Python, `raise_kind` function
* `conductor-error.mjs`: Node, exported `raiseError`
* `ConductorError.cs`: .NET, static `ConductorError.Raise`
* `README.md`: quick reference + usage examples per engine
Helpers ship under `src/conductor/` so hatchling rolls them into
the wheel; verified by inspecting the built artefact. Nothing is
auto-loaded, on PATH, or on PYTHONPATH: script authors must
explicitly Import-Module / source / import to use them, and authors
who don't want them write the JSON themselves (it's three lines per
engine).
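For authors writing the JSON themselves, the script-side half of the contract is small. The sketch below mirrors what a helper like `raise_kind` is described as doing; its exact signature in the shipped module is not shown in this PR, so the parameters here are an assumption.

```python
import json
import os
from typing import Any, Optional


def raise_kind(kind: str, message: str, details: Optional[dict] = None) -> None:
    """Write a typed envelope to CONDUCTOR_ERROR_OUT and return.

    Exit-code management is left to the caller; this never calls sys.exit.
    """
    path = os.environ["CONDUCTOR_ERROR_OUT"]
    envelope: dict[str, Any] = {
        "conductor_error": True,
        "kind": kind,
        "message": message,
    }
    if details is not None:
        envelope["details"] = details
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(envelope, fh)
```

Usage inside a script node would be a call such as `raise_kind("external.git.drift", "working tree differs from origin", {"branch": "main"})` followed by whatever exit code the script chooses.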
Adds `tests/test_helpers/test_error_helpers.py` with 6 cases
covering the Python helper's envelope shape, env-var contract,
no-sys.exit guarantee, and round-trip through `coerce_envelope`. The
non-Python helpers are exercised by the cross-engine integration test
landing in Step 13.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase 1 Step 12. Two new example workflows under `examples/` plus
an Error Routing section in `examples/README.md`.
`error-routing.yaml`:
* A `type: script` probe that writes `{conductor_error: true,
  kind, message, details}` to `$CONDUCTOR_ERROR_OUT` and exits 0
  via the language-neutral contract (no helper required).
* The probe declares `raises: [external.git.drift,
  external.api.rate_limited]` so undeclared kinds get normalised to
  `internal.undeclared_kind`.
* Routes select by `on_error: <kind>` to demonstrate that an
  envelope picks the typed arm over a generic exit_code fallback.
* Handlers read `{{ probe.error.kind }}`,
  `{{ probe.error.message }}`, and `{{ probe.error.details.* }}`.
* A `simulated_failure` workflow input toggles between
  `ok` / `drift` / `rate_limited` so the same YAML exercises
  all three arms.
`error-routing-helpers.yaml`:
* Same flow, but raises via the shipped Python helper
  (`conductor.helpers.error.conductor_error.raise_kind`) instead of
  hand-rolled JSON, so authors see the ergonomic version.
Both examples validate with `conductor validate` and the full
`make validate-examples` sweep (17/17 pass). Caught one writing
issue along the way: the input schema field is `input:` (singular),
not `inputs:`; fixed before committing.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase 1 Step 13 (acceptance #1). New
`tests/test_integration/test_error_routing_cross_engine.py` exercises the
`CONDUCTOR_ERROR_OUT` contract end-to-end through the real `WorkflowEngine`
(no executor mocking) with three writer scripts in different languages:
Python, PowerShell (`pwsh`), and bash.
All three share the same workflow YAML shape and the same expected outcome:
the probe writes a typed envelope, the engine routes by
`on_error: external.git.drift`, and a rescue agent reads the kind back from
the routed-to scope.
Each test is skipped when the corresponding interpreter is missing from
PATH. Bash is also skipped on Windows, where `shutil.which` typically
returns the WSL relay shim that fails with an opaque
`execvpe(/bin/bash) failed: No such file or directory`; that failure is
outside the scope of this contract test, and the brief calls for
bash-on-Linux specifically.
Locally on Windows: 2 pass (python + pwsh), 1 skipped (bash). On a Linux CI
runner with pwsh installed, all three execute. The contract is the same
string of bytes in every engine, so identical assertions hold across them,
which is the whole point.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ty's TypedDict to dict() conversion picks the wrong overload for our
ErrorEnvelope, surfacing 8 spurious diagnostics. Use cast() at the leaf
sites where TypedDict envelopes cross into APIs typed as dict[str, Any]
(script/agent executor returns; workflow.py store_error and
router.evaluate call sites).
Brings ty count back to the 12-diagnostic baseline (all pre-existing
Windows termios/tty noise). Lint, format, examples-validation, and full
test suite (2943 pass, 12 baseline failures) all green.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
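The `cast()` workaround has no runtime effect; it only changes what the type checker sees at the boundary. A small sketch, with a hypothetical `store_error` function standing in for the real call sites:

```python
from typing import Any, TypedDict, cast


class ErrorEnvelope(TypedDict):
    kind: str
    message: str
    details: dict[str, Any]


def store_error(agent_name: str, envelope: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical API typed against plain dicts rather than the TypedDict."""
    return {agent_name: {"error": envelope}}


def store_typed(agent_name: str, envelope: ErrorEnvelope) -> dict[str, Any]:
    # cast() returns its argument unchanged at runtime; it only tells the
    # type checker to treat the TypedDict as dict[str, Any] at this call.
    return store_error(agent_name, cast("dict[str, Any]", envelope))
```

Keeping the casts at the leaf call sites, as the commit describes, limits how far the loosened type can spread.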
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff            @@
##             main     #229   +/-   ##
=======================================
  Coverage        ?   88.20%
=======================================
  Files           ?       65
  Lines           ?    10115
  Branches        ?        0
=======================================
  Hits            ?     8922
  Misses          ?     1193
  Partials        ?        0
Phase 1 of the typed `on_error` routing design from #227 (RFC).
Companion PR to #227 (brainstorm spec). Spec excerpts inline below;
full design and open questions live in the RFC.
What ships in this PR (Phase 1 scope from the RFC)
- `CONDUCTOR_ERROR_OUT` points at a tempfile the script writes a typed envelope into; engine reads on exit. Works uniformly for `pwsh`, `bash`, `python`, `node`, `dotnet` (dotnet run --style).
- `{conductor_error: true, kind, message, details}` in agent output is treated as a raise rather than a regular response.
- `RouteDef.on_error: bool | str | list[str]`: routes can match on the failing node's typed error kind.
- `{{ failing_node.error }}` template scope: handler nodes get the envelope of the node that raised, alongside its (possibly partial) output.
- halts; engine writes `errors.jsonl` (TMPDIR pattern, alongside the event log) and emits a typed `workflow_failed` event. CLI maps the exception to new exit code 3.
- `AgentDef.raises: list[str]`: declared kinds are linted against route `on_error` declarations at validation time and checked at runtime; undeclared kinds are wrapped as `internal.undeclared_kind`.
- `src/conductor/helpers/error/` for pwsh / bash / python / node / dotnet (ship in the wheel).
What's reserved for Phase 2/3 (out of scope here)
- `retry`/`halt`/`propagate` route actions.
- `on_error` on routes from `type: workflow`, `human_gate`, `notification`, and parallel/for_each groups is currently a hard validation error (avoids silent "handler that never fires" footguns). These will become valid in Phase 2.
Reserved kinds emitted in Phase 1
- `internal.script_error`: script exited non-zero AND wrote no envelope (opt-in: only synthesized when the node has `raises` or any `on_error` route, so legacy `exit_code`-routing workflows are unaffected).
- `internal.schema_violation`: agent output failed its `output:` schema.
- `internal.undeclared_kind`: node with `raises:` raised a kind not in its list; original kind preserved under `details.original_kind`.
Reserved kind prefixes (validator forbids users declaring these): `internal.`, `provider.`, `subworkflow.`, `retry.`.
How to read this PR
The 14 commits map 1:1 to the implementation steps from the plan and are
designed to be reviewed in order. Each is independently testable and
green:
| # | Commit | Summary | Files |
|---|--------|---------|-------|
| 1 | `feat(schema)` | | `config/schema.py` |
| 2 | `feat(engine)` | ErrorEnvelope types | `engine/errors.py`, `exceptions.py` |
| 3 | `feat(router)` | success vs. error buckets | `engine/router.py` |
| 4 | `feat(context)` | store_error + .error access | `engine/context.py` |
| 5 | `feat(validator)` | cross-check | `config/validator.py` |
| 6 | `feat(agent-exec)` | envelope path | `executor/agent.py` |
| 7 | `feat(script-exec)` | CONDUCTOR_ERROR_OUT | `executor/script.py` |
| 8 | `feat(engine-wire)` | leaf-error path | `engine/workflow.py` |
| 9 | `feat(halt-jsonl)` | errors.jsonl + event | `engine/workflow.py` |
| 10 | `feat(cli-exit)` | exit code 3 | `cli/app.py` |
| 11 | `feat(helpers)` | 5 language helpers | `helpers/error/*` |
| 12 | `feat(examples)` | | `examples/error-routing*.yaml` |
| 13 | `test(xeng)` | cross-engine envelope contract | `tests/test_integration/` |
| 14 | `phase-1(checks)` | final ty/lint/test sweep | |

Examples
- `examples/error-routing.yaml`: script-based; uses the `CONDUCTOR_ERROR_OUT` contract directly with no helper. Workflow input `simulated_failure` toggles ok / drift / rate_limited.
- `examples/error-routing-helpers.yaml`: same shape using the shipped Python helper.
Both validate, both run on Windows and POSIX, both render
`{{ failing_node.error.kind }}` / `.message` / `.details` in the handler's prompt.
Test posture
- non-serializable, registry/integration ×10) are pre-existing and unchanged.
- exit-code tests, 3 cross-engine integration tests (Python + pwsh run on Windows; bash skipped on Windows by design, WSL relay shim unreliable in CI envs).
- `ruff check`), format clean (`ruff format --check`).
- `ty check src` back to the 12-diagnostic baseline (all pre-existing Windows `termios`/`tty` noise).
- `make validate-examples` equivalent green across all 17 bundled examples including the two new ones.
Open Phase 1 micro-decisions (resolved in this PR)
- halted on unhandled error).
- `errors.jsonl` path: same `$TMPDIR/conductor/` convention as the event log; printed at end-of-run.
- `errors.jsonl`: single-element today, structured to accept multi-frame in Phase 2 without a shape change.
- `{{ workflow.last_error }}` not added (RFC says require `{{ failing_node.error }}`); can revisit in Phase 2.
cc @jasonrobertfox, companion to the RFC at #227.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>