Publish workflow YAML JSON Schema as CLI verb + release artifact #230

@PolyphonyRequiem

Problem

The conductor workflow YAML contract today lives only inside the Pydantic models in src/conductor/config/schema.py. Downstream consumers (IDEs, CI in repos that author workflow YAMLs, custom linters) have no portable way to validate a YAML file against that contract without running conductor itself. Today the feedback loop on a typo'd field name is:

  1. Edit YAML in editor — no feedback.
  2. Push.
  3. conductor validate runs in CI (or worse, conductor run at workflow launch) and fails with a parse error.

In a repo with ~15 workflow YAMLs and ~50 active authors of nested step bodies (see PolyphonyRequiem/polyphony), this is a meaningful friction tax. Hand-rolled Pester lints in polyphony partially compensate by re-encoding structural rules, but they drift from conductor's own type definitions and can never be authoritative.

Proposal

Expose conductor's existing typed schema as a public JSON Schema artifact:

CLI verb

conductor schema [--component workflow|agent|route|all] [--output json|yaml] [--out <path>]

Defaults to --component workflow --output json, written to stdout. The body is the output of WorkflowConfig.model_json_schema() (Pydantic gives this for free; I verified locally that it produces a valid schema today).
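A minimal sketch of how the verb could be wired, assuming a plain argparse entry point (conductor's real CLI framework isn't stated in this issue). The provider functions return stand-in dicts where the real code would call WorkflowConfig.model_json_schema(); the AgentConfig/RouteConfig names are hypothetical. To stay stdlib-only, the sketch leaves out --output yaml.

```python
import argparse
import json
import sys
from pathlib import Path
from typing import Callable, Dict, Optional, Sequence


# Stand-in providers. In conductor the workflow provider would return
# WorkflowConfig.model_json_schema(); the agent/route model names are
# hypothetical and only illustrate the --component switch.
def _workflow_schema() -> dict:
    return {"title": "WorkflowConfig", "type": "object"}


def _agent_schema() -> dict:
    return {"title": "AgentConfig", "type": "object"}


def _route_schema() -> dict:
    return {"title": "RouteConfig", "type": "object"}


SCHEMA_PROVIDERS: Dict[str, Callable[[], dict]] = {
    "workflow": _workflow_schema,
    "agent": _agent_schema,
    "route": _route_schema,
}


def render_schema(component: str) -> str:
    """Serialize one component's schema (or all of them, keyed by name) as JSON."""
    if component == "all":
        schema = {name: provider() for name, provider in SCHEMA_PROVIDERS.items()}
    else:
        schema = SCHEMA_PROVIDERS[component]()
    # sort_keys + a fixed indent keep the output byte-stable across runs.
    return json.dumps(schema, indent=2, sort_keys=True) + "\n"


def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(prog="conductor schema")
    parser.add_argument(
        "--component", choices=[*SCHEMA_PROVIDERS, "all"], default="workflow"
    )
    parser.add_argument(
        "--out", type=Path, default=None, help="write here instead of stdout"
    )
    args = parser.parse_args(argv)

    text = render_schema(args.component)
    if args.out is None:
        sys.stdout.write(text)
    else:
        args.out.write_text(text, encoding="utf-8")
    return 0
```

With this shape, `conductor schema --component all --out schema.json` would write every component into one file keyed by component name. Deterministic serialization matters because the same renderer would produce the checked-in artifact, so regeneration yields clean diffs.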

Release artifact

Publish the generated schema as a GitHub release asset alongside the conductor binary:

https://github.com/microsoft/conductor/releases/download/v<X.Y.Z>/workflow.schema.json

This lets consumers reference the schema via a stable, versioned URL without parsing release notes.
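A small sketch of how a consumer could pin the schema to the conductor release it runs. The URL layout is the one proposed above and is not a published asset yet; `schema_url` and `fetch_schema` are hypothetical helper names.

```python
import re
import urllib.request

# Asset URL layout as proposed in this issue (assumption until the release pipeline ships it).
RELEASE_ASSET_URL = (
    "https://github.com/microsoft/conductor/releases/download/"
    "v{version}/workflow.schema.json"
)
_SEMVER = re.compile(r"\d+\.\d+\.\d+")


def schema_url(version: str) -> str:
    """Build the asset URL for a release given as '1.2.3' or 'v1.2.3'."""
    bare = version[1:] if version.startswith("v") else version
    if not _SEMVER.fullmatch(bare):
        raise ValueError(f"expected X.Y.Z, got {version!r}")
    return RELEASE_ASSET_URL.format(version=bare)


def fetch_schema(version: str, timeout: float = 10.0) -> bytes:
    """Download the schema for a pinned release (requires network access)."""
    with urllib.request.urlopen(schema_url(version), timeout=timeout) as resp:
        return resp.read()
```

Rejecting anything that isn't X.Y.Z keeps callers from accidentally tracking a moving target like "latest", which would defeat the point of versioning the schema with the binary.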

Drift protection

Embed src/conductor/schemas/workflow.schema.json as a regenerated build artifact; a unit test asserts that WorkflowConfig.model_json_schema() equals the parsed contents of that file, so the runtime types and the published schema can never drift.
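A sketch of the drift check, assuming the test compares parsed JSON rather than raw file text so that whitespace or key order can't cause false failures. The helper names are hypothetical; reporting top-level keys (for Pydantic v2 output, things like `properties` or `$defs`) gives a failing test a rough pointer to where the drift is.

```python
import json
from typing import Any, Dict, List

_MISSING = object()  # sentinel so a key absent on one side always counts as drift


def render_schema_file(schema: Dict[str, Any]) -> str:
    """Serialize deterministically so a regenerated file produces a clean diff."""
    return json.dumps(schema, indent=2, sort_keys=True) + "\n"


def drifted_keys(generated: Dict[str, Any], committed_text: str) -> List[str]:
    """Top-level keys whose values differ between the live schema and the committed file."""
    committed = json.loads(committed_text)
    keys = set(generated) | set(committed)
    return sorted(
        k for k in keys if generated.get(k, _MISSING) != committed.get(k, _MISSING)
    )
```

In the test suite this would read src/conductor/schemas/workflow.schema.json and assert that `drifted_keys(WorkflowConfig.model_json_schema(), text)` is empty; the regeneration step would write `render_schema_file(WorkflowConfig.model_json_schema())` to the same path.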

Downstream value

  1. IDE feedback. Workflow YAMLs add a header # yaml-language-server: $schema=https://.../workflow.schema.json and get red squiggles in VS Code for unknown fields, wrong types, missing required keys, and illegal enum values, plus autocomplete on Ctrl+Space. Today these errors surface only at parse time, when the workflow launches.

  2. CI step. Repos that author workflow YAMLs can run check-jsonschema --schemafile <url> workflows/*.yaml as one deterministic step. No conductor install and no project Python environment required (check-jsonschema is a single pipx-installable tool), and it runs in seconds.

  3. Replaces structural lint clauses in downstream repos. Polyphony today carries ~28 lint files under .conductor/registry/tests/lint-*.ps1 to validate workflow shape; ~30-40% of the clauses inside those files are structural checks (required field, enum value, type) that a schema would absorb. The semantic checks (M4 routing rules, M10 loop safety, vocab) stay because they need runtime reachability analysis that JSON Schema can't express — but the structural baseline becomes authoritative rather than hand-maintained.
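To make "structural" concrete, here is a deliberately tiny validator for the subset of JSON Schema the list above relies on (`type`, `required`, `enum`, `properties`, `items`, `additionalProperties: false`). It is an illustration, not a replacement for check-jsonschema: it ignores `anyOf`, `$ref`, and list-valued `type`, all of which Pydantic-generated schemas use. Note also that unknown-field errors only appear if the schema sets `additionalProperties: false`, which Pydantic emits for models configured with `extra="forbid"`; this sketch assumes conductor's models are (or would be) configured that way.

```python
from typing import Any, Dict, List

# JSON Schema type names mapped to the Python types a JSON/YAML loader produces.
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "null": type(None),
}


def structural_errors(instance: Any, schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Collect type / enum / required / unknown-field errors for a small JSON Schema subset."""
    errors: List[str] = []
    expected = schema.get("type")
    if isinstance(expected, str):
        # bool is a subclass of int in Python, so reject it explicitly for numeric types.
        is_bool_as_number = isinstance(instance, bool) and expected in ("integer", "number")
        if not isinstance(instance, _TYPES[expected]) or is_bool_as_number:
            return [f"{path}: expected {expected}"]
    if "enum" in schema and instance not in schema["enum"]:
        errors.append(f"{path}: {instance!r} not in {schema['enum']}")
    if isinstance(instance, dict):
        props = schema.get("properties", {})
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required key {key!r}")
        for key, value in instance.items():
            if key in props:
                errors.extend(structural_errors(value, props[key], f"{path}.{key}"))
            elif schema.get("additionalProperties") is False:
                errors.append(f"{path}: unknown field {key!r}")
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors.extend(structural_errors(item, schema["items"], f"{path}[{i}]"))
    return errors
```

Every check here looks at one node and its declared shape in isolation, which is exactly why these clauses can move out of hand-written lints: none of them needs to know what other nodes in the file say.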

Scope estimate

~6-10 hours in this repo:

  • CLI verb (~50 LoC).
  • Generated artifact + drift test (~30 LoC).
  • Release pipeline update to publish the asset.
  • Docs in references/yaml-schema.md pointing to the schema URL.

Non-goals

  • This does not replace conductor validate. The validate verb does semantic checks (route reachability, agent name references, etc.) that aren't in scope for the schema. The schema is the structural baseline; validate stays the semantic layer.
  • This does not propose freezing the YAML contract. The schema is versioned with the conductor release; breaking changes follow whatever versioning policy conductor already has.
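For contrast, a sketch of the kind of semantic check that stays in conductor validate: route reachability and dangling agent references both depend on relationships between nodes, which JSON Schema cannot express. The workflow shape used here (`entry_point`, `agents[].name`, `agents[].routes[].to`) and the `terminals` parameter are hypothetical simplifications, not conductor's confirmed YAML layout.

```python
from collections import deque
from typing import Any, Dict, FrozenSet, List, Set


def unreachable_agents(workflow: Dict[str, Any]) -> List[str]:
    """Agents that no route path from the entry point reaches (breadth-first search)."""
    graph: Dict[str, List[str]] = {
        agent["name"]: [route["to"] for route in agent.get("routes", [])]
        for agent in workflow["agents"]
    }
    seen: Set[str] = set()
    queue = deque([workflow["entry_point"]])
    while queue:
        node = queue.popleft()
        if node in seen or node not in graph:  # skip visited nodes and non-agent targets
            continue
        seen.add(node)
        queue.extend(graph[node])
    return sorted(set(graph) - seen)


def dangling_targets(
    workflow: Dict[str, Any], terminals: FrozenSet[str] = frozenset()
) -> List[str]:
    """Route targets that name no agent; `terminals` lists allowed end markers (hypothetical)."""
    names = {agent["name"] for agent in workflow["agents"]}
    targets = {route["to"] for agent in workflow["agents"] for route in agent.get("routes", [])}
    return sorted(targets - names - set(terminals))
```

A schema can say that `to` must be a string; only a check like this can say it must be the name of an agent that actually exists and is reachable.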

Open questions

  1. Is the conductor team comfortable owning a public structural contract for the workflow YAML, or would you rather downstream repos vendor their own derived schema?
  2. Are sub-components (agent, route, parallel/for_each nodes, MCP nodes) worth separate --component flags, or just emit the whole tree from workflow?
  3. Any preference on the artifact path inside the source tree (src/conductor/schemas/ vs schemas/ vs artifacts/)?

Happy to draft the PR if there's appetite — wanted to scope it as an issue first.
