Summary
Add an SSVC domain skill (evaluate-decision-point) and reusable library tooling that enables an LLM agent to evaluate a single SSVC decision point given caller-assembled evidence, producing a validated Selection object as structured output.
Motivation
SSVC decision point evaluation is not a free-form prose task — it produces structured semantic objects (Selection, SelectionList) that must conform to the existing Pydantic schema in src/ssvc/selection.py. An LLM-based evaluator needs a typed validation boundary between probabilistic inference and deterministic orchestration. Without this, evaluation results are unverified prose that cannot be consumed programmatically.
Core idea
The key abstraction is:
- Evidence in (caller-assembled: text, structured JSON, CVE records, advisories, links, etc.)
- Validated Selection out (conforming to ssvc.selection.Selection)
The evaluator should live in src/ssvc/ as a reusable library component. An SSVC domain skill (skills/ssvc/evaluate-decision-point/) would invoke the library component and report results. The skill and library should be designed so the caller is responsible for assembling and formatting input evidence; the library provides structure and output validation, not input parsing.
The initial implementation is expected to use pydantic-ai as the agent framework, since it provides schema validation, typed outputs, retry semantics, and parsing enforcement for structured LLM outputs. The design itself should state its intent independently of that implementation choice.
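As a rough illustration of the "evidence in, validated Selection out" boundary, the sketch below uses only the standard library. Everything in it is an assumption made for illustration: SelectionSketch and DecisionPointSketch are stand-ins rather than the real models in src/ssvc/selection.py (whose fields may differ), and validate_selection, evaluate_decision_point, and the infer callable are hypothetical names. A real implementation would presumably validate against ssvc.selection.Selection directly and let the agent framework drive the model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DecisionPointSketch:
    """Hypothetical stand-in for an SSVC decision point definition."""
    namespace: str
    key: str
    version: str
    allowed_values: frozenset


@dataclass(frozen=True)
class SelectionSketch:
    """Hypothetical stand-in for ssvc.selection.Selection; real fields may differ."""
    namespace: str
    key: str
    version: str
    values: tuple


def validate_selection(raw: object, dp: DecisionPointSketch) -> SelectionSketch:
    """Convert untrusted model output into a checked selection, or raise ValueError."""
    if not isinstance(raw, dict):
        raise ValueError("model output must be a JSON object")
    expected = {"namespace", "key", "version", "values"}
    if set(raw) != expected:
        # Rejects both missing fields and fields the model made up.
        raise ValueError(f"fields must be exactly {sorted(expected)}")
    identity = (raw["namespace"], raw["key"], raw["version"])
    if identity != (dp.namespace, dp.key, dp.version):
        raise ValueError(f"selection does not target decision point {dp.key}")
    values = raw["values"]
    if not isinstance(values, list) or not values:
        raise ValueError("values must be a non-empty list")
    unsupported = [
        v for v in values if not (isinstance(v, str) and v in dp.allowed_values)
    ]
    if unsupported:
        raise ValueError(f"unsupported values: {unsupported}")
    return SelectionSketch(dp.namespace, dp.key, dp.version, tuple(values))


def evaluate_decision_point(
    evidence: str,
    dp: DecisionPointSketch,
    infer: Callable[[str], object],
) -> SelectionSketch:
    """Evaluate one decision point; the caller has already assembled the evidence."""
    prompt = (
        f"Decision point {dp.namespace}:{dp.key}:{dp.version}\n"
        f"Allowed values: {sorted(dp.allowed_values)}\n"
        f"Evidence:\n{evidence}"
    )
    # infer is the probabilistic step; nothing it returns is trusted until validated.
    return validate_selection(infer(prompt), dp)
```

The library side never parses the evidence: it only forwards the caller's text and checks what comes back, which is the division of responsibility described above.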
Scope
- One decision point evaluated per invocation
- Caller-assembled evidence (no mandatory input format)
- Library provides flexible structure for callers to build on
- Skill orchestrates the library call and reports back a Selection
- Failure modes (invalid schema emission, hallucinated fields, unsupported values, malformed selections) should be recognized and handled via retry semantics; the exact mechanisms are deferred to implementation
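One possible shape for the retry semantics in the last bullet is sketched below, using only the standard library. It is not the intended mechanism, which the proposal defers to implementation. The names SelectionRejected, check_candidate, evaluate_with_retries, and propose are hypothetical, the dict shape is a simplified stand-in for Selection, and ALLOWED_VALUES is an invented value set for a single decision point.

```python
from typing import Callable, Optional

# Hypothetical value keys for one decision point; not taken from the SSVC registry.
ALLOWED_VALUES = {"none", "poc", "active"}


class SelectionRejected(ValueError):
    """Raised when a candidate selection fails validation."""


def check_candidate(candidate: object) -> dict:
    """Accept a candidate only if it has the expected shape and supported values."""
    if not isinstance(candidate, dict):
        raise SelectionRejected("candidate must be an object")
    unknown_fields = set(candidate) - {"key", "values"}
    if unknown_fields:
        # Catches hallucinated fields.
        raise SelectionRejected(f"unexpected fields: {sorted(map(str, unknown_fields))}")
    values = candidate.get("values")
    if not isinstance(values, list) or not values:
        raise SelectionRejected("values must be a non-empty list")
    bad = [v for v in values if not (isinstance(v, str) and v in ALLOWED_VALUES)]
    if bad:
        raise SelectionRejected(f"unsupported values: {bad}")
    return candidate


def evaluate_with_retries(
    propose: Callable[[str, Optional[str]], object],
    evidence: str,
    max_attempts: int = 3,
) -> dict:
    """Ask for a selection, feeding each validation error back into the next attempt."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        candidate = propose(evidence, feedback)
        try:
            return check_candidate(candidate)
        except SelectionRejected as err:
            feedback = str(err)  # the next attempt sees why the last one failed
    raise SelectionRejected(
        f"no valid selection after {max_attempts} attempts; last error: {feedback}"
    )
```

Bounding the number of attempts keeps the orchestration deterministic: the caller receives either a validated selection or an explicit failure, never unverified prose.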
Out of scope (for now)
- Multi-decision-point batch evaluation (full SelectionList in one pass)
- Schema extensions to Selection/SelectionList for rationale, uncertainty, or provenance (TBD; it is unclear whether a wrapper or extension is needed)
- Model provider selection/configuration
- Input parsing (caller is responsible for assembling evidence)
Dependencies
Blocked by #1151 (skills infrastructure: two-tier directory structure and canonical SKILL.md format must exist before this skill can be added properly).
Notes
An earlier exploratory sketch (docs/exploratory/llm_evaluator.md on feature/pydantic.ai) describes the high-level flow; that branch is defunct but the flowchart captures intent. The existing Selection / SelectionList Pydantic models (src/ssvc/selection.py) are already the natural output type for this evaluator.