Summary
Add an optional validator: block to any provider-backed agent that runs a second LLM call to check the primary output against a user-defined rubric. If validation fails, re-run the primary agent once, with the validator's feedback appended to the prompt.
Motivation
We already have two correctness mechanisms, but neither catches the most common real failure mode:
retry: — handles transient failures (provider errors, timeouts). It retries the same prompt unchanged.
output: schema — handles shape failures (missing field, wrong type). Pass/fail on structure, nothing about whether the content is good.
The missing case: the agent returns a structurally valid output that is semantically wrong, incomplete, or off-rubric. Examples:
- Code-review agent returns "looks good!" but missed an obvious null-pointer bug
- Summarizer drops the most important point
- Plan agent skips a required step
- Research agent hallucinates a source
Today the only fix is to make the prompt longer / sterner, which doesn't actually solve the underlying problem.
Proposed shape
agents:
- name: code_reviewer
model: claude-sonnet-4-5
prompt: |
Review the diff for bugs and suggest fixes.
Diff:
{{ workflow.input.diff }}
output:
summary: { type: string }
issues: { type: array }
validator:
model: claude-sonnet-4-5 # can be a cheaper/different model
criteria: |
Verify the review:
1. Identifies all null-safety issues in the diff
2. Each `issues[*].suggestion` is actionable (not just "fix it")
3. No fabricated function names
max_retries: 1 # default 1; only this loop counts, not retry: count
Mechanics
- Primary agent runs as usual, produces output.
- If
validator: is present, run a second LLM call. The validator receives:
- The primary agent's prompt (rendered)
- The primary agent's output
- The
criteria: text
- A required JSON output shape:
{ passed: bool, issues: list[str] }
- If
passed: true, the primary output flows downstream unchanged.
- If
passed: false and max_retries > 0, re-run the primary agent once, appending a ## Validation feedback section to the prompt with the issues. Take the second output as final (whether or not it would pass a second validator check — we don't loop forever).
- Emit
agent_validator_start / agent_validator_complete events for dashboard visibility.
Why now
- This is the single highest-leverage correctness pattern we don't support.
- Composes cleanly with everything we already have. Distinct from
retry: (transient) and output: (shape).
- The implementation is a small wrapper around the existing
AgentExecutor: validator is just another agent execution whose output triggers conditional re-execution of the primary.
- Works across providers (Copilot + Claude) via the existing
AgentProvider interface.
Open questions
- Should the validator receive prior context (other agents' outputs), or only the primary's prompt + output? Default to just the latter — keeps validation focused and cheap.
- Should retries be capped at 1, or allow
max_retries: 2? Strong preference for hard cap at 1 — beyond that you're fighting prompt design, not output noise.
- Cost reporting: validator tokens should be counted as part of the primary agent's row in cost output, not a separate row. Or a separate row? Probably separate so users see the validator cost explicitly.
- Should validator failures appear in the JSONL event log as a distinct event type? Yes —
agent_validation_failed so the dashboard can surface them.
Acceptance criteria
Summary
Add an optional
validator:block to any provider-backed agent that runs a second LLM call to check the primary output against a user-defined rubric. If validation fails, re-run the primary agent once, with the validator's feedback appended to the prompt.Motivation
We already have two correctness mechanisms, but neither catches the most common real failure mode:
retry:— handles transient failures (provider errors, timeouts). It retries the same prompt unchanged.output:schema — handles shape failures (missing field, wrong type). Pass/fail on structure, nothing about whether the content is good.The missing case: the agent returns a structurally valid output that is semantically wrong, incomplete, or off-rubric. Examples:
Today the only fix is to make the prompt longer / sterner, which doesn't actually solve the underlying problem.
Proposed shape
Mechanics
validator:is present, run a second LLM call. The validator receives:criteria:text{ passed: bool, issues: list[str] }passed: true, the primary output flows downstream unchanged.passed: falseandmax_retries > 0, re-run the primary agent once, appending a## Validation feedbacksection to the prompt with the issues. Take the second output as final (whether or not it would pass a second validator check — we don't loop forever).agent_validator_start/agent_validator_completeevents for dashboard visibility.Why now
retry:(transient) andoutput:(shape).AgentExecutor: validator is just another agent execution whose output triggers conditional re-execution of the primary.AgentProviderinterface.Open questions
max_retries: 2? Strong preference for hard cap at 1 — beyond that you're fighting prompt design, not output noise.agent_validation_failedso the dashboard can surface them.Acceptance criteria
validator:block accepted onAgentDef(provider-backed agents only)validator:onscript/human_gate/workflowtypesexamples/(e.g. code review with hallucination check)retry:policy