
Add LLM-based decision point evaluator skill and supporting library tooling #1152

@ahouseholder

Description

Summary

Add an SSVC domain skill (evaluate-decision-point) and reusable library tooling that enables an LLM agent to evaluate a single SSVC decision point given caller-assembled evidence, producing a validated Selection object as structured output.

Motivation

SSVC decision point evaluation is not a free-form prose task — it produces structured semantic objects (Selection, SelectionList) that must conform to the existing Pydantic schema in src/ssvc/selection.py. An LLM-based evaluator needs a typed validation boundary between probabilistic inference and deterministic orchestration. Without this, evaluation results are unverified prose that cannot be consumed programmatically.

Core idea

The key abstraction is:

  • Evidence in (caller-assembled: text, structured JSON, CVE records, advisories, links, etc.)
  • Validated Selection out (conforming to ssvc.selection.Selection)
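The "validated Selection out" half of this boundary can be illustrated without any LLM: a validator accepts raw model text and either returns a selection or rejects it. This is a dependency-free sketch. `SelectionSketch`, `parse_selection`, and `InvalidSelection` are hypothetical names, and the plain dataclass stands in for the Pydantic `ssvc.selection.Selection` model, whose actual fields may differ.

```python
import json
from dataclasses import dataclass


# Hypothetical stand-in for ssvc.selection.Selection; the real model's fields may differ.
@dataclass(frozen=True)
class SelectionSketch:
    namespace: str
    key: str
    version: str
    values: tuple[str, ...]  # keys of the chosen decision point values


class InvalidSelection(ValueError):
    """Raised when model output does not form a valid selection."""


def parse_selection(raw: str, namespace: str, key: str, version: str,
                    allowed: set[str]) -> SelectionSketch:
    """Validate raw model output (JSON text) against a single decision point."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidSelection(f"not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise InvalidSelection("expected a JSON object")
    # Reject fields the schema does not define (e.g. hallucinated "confidence").
    extra = set(data) - {"namespace", "key", "version", "values"}
    if extra:
        raise InvalidSelection(f"unexpected fields: {sorted(extra)}")
    # The selection must refer to the decision point that was asked about.
    if (data.get("namespace"), data.get("key"), data.get("version")) != (namespace, key, version):
        raise InvalidSelection("selection refers to a different decision point")
    values = data.get("values")
    if (not isinstance(values, list) or not values
            or not all(isinstance(v, str) for v in values)):
        raise InvalidSelection("values must be a non-empty list of strings")
    unknown = [v for v in values if v not in allowed]
    if unknown:
        raise InvalidSelection(f"unsupported values: {unknown}")
    return SelectionSketch(namespace, key, version, tuple(values))
```

For example, with illustrative value keys `{"N", "P", "A"}` (mirroring SSVC's Exploitation values None, Public PoC, Active), a reply selecting `"P"` passes, while a reply selecting `"X"` raises `InvalidSelection`.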

The evaluator should live in src/ssvc/ as a reusable library component. An SSVC domain skill (skills/ssvc/evaluate-decision-point/) would invoke the library component and report results. The skill and library should be designed so the caller is responsible for assembling and formatting input evidence; the library provides structure and output validation, not input parsing.
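The division of labor described above (caller formats evidence; library adds structure and validates output) might look like the following sketch. `DecisionPointSpec`, `build_prompt`, and `evaluate_decision_point` are hypothetical names, not the ssvc API, and the model is injected as a plain `prompt -> reply` callable so the sketch runs without any LLM framework.

```python
import json
from dataclasses import dataclass
from typing import Callable


# Hypothetical types for illustration only; not part of the ssvc library.
@dataclass(frozen=True)
class DecisionPointSpec:
    namespace: str
    key: str
    version: str
    name: str
    values: dict[str, str]  # value key -> value name


@dataclass(frozen=True)
class SelectionSketch:
    namespace: str
    key: str
    version: str
    values: tuple[str, ...]


def build_prompt(dp: DecisionPointSpec, evidence: str) -> str:
    # The evidence string is inserted verbatim: the caller has already
    # assembled and formatted it, so the library does no input parsing.
    options = "\n".join(f"- {k}: {name}" for k, name in dp.values.items())
    return (
        f"Evaluate the SSVC decision point '{dp.name}' "
        f"({dp.namespace}:{dp.key} v{dp.version}).\n"
        f"Allowed value keys:\n{options}\n\n"
        f"Evidence:\n{evidence}\n\n"
        'Reply with JSON: {"values": [<value key>, ...]}'
    )


def evaluate_decision_point(dp: DecisionPointSpec, evidence: str,
                            complete: Callable[[str], str]) -> SelectionSketch:
    reply = json.loads(complete(build_prompt(dp, evidence)))
    chosen = reply.get("values") if isinstance(reply, dict) else None
    if (not isinstance(chosen, list) or not chosen
            or not all(isinstance(v, str) and v in dp.values for v in chosen)):
        raise ValueError(f"invalid selection for {dp.key}: {chosen!r}")
    # Identity fields come from the decision point, never from the model.
    return SelectionSketch(dp.namespace, dp.key, dp.version, tuple(chosen))
```

A skill could then wrap a call such as `evaluate_decision_point(exploitation, caller_evidence, model_call)` and report the resulting selection. Taking the namespace, key, and version from the decision point rather than from the model is one design choice that removes a whole class of hallucinated-field errors.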

The initial implementation is expected to use pydantic-ai as the agent framework (it provides schema validation, typed outputs, retry semantics, and parsing enforcement for structured LLM outputs), but the design should keep its intent separate from this implementation choice.
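One way to keep design intent separate from the framework choice is to have orchestration code depend on a small interface, with pydantic-ai wrapped behind one implementation of it. The sketch assumes a hypothetical `SelectionBackend` protocol and a `ScriptedBackend` test double; no pydantic-ai adapter is shown, and none of these names come from the ssvc codebase.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SelectionBackend(Protocol):
    """Hypothetical interface capturing the design intent, independent of any framework."""

    def select(self, prompt: str, allowed: list[str]) -> list[str]:
        """Return the chosen value keys for one decision point."""
        ...


class ScriptedBackend:
    """Deterministic stand-in for tests; a pydantic-ai adapter could be another implementation."""

    def __init__(self, answers: list[list[str]]):
        self._answers = list(answers)

    def select(self, prompt: str, allowed: list[str]) -> list[str]:
        return self._answers.pop(0)  # replay canned answers in order


def choose_values(backend: SelectionBackend, prompt: str, allowed: list[str]) -> list[str]:
    # Orchestration depends only on the protocol, not on the agent framework.
    chosen = backend.select(prompt, allowed)
    if not chosen or not set(chosen) <= set(allowed):
        raise ValueError(f"unsupported values: {chosen!r}")
    return chosen
```

Swapping frameworks would then mean writing a new backend class rather than changing the library's callers.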

Scope

  • One decision point evaluated per invocation
  • Caller-assembled evidence (no mandatory input format)
  • Library provides flexible structure for callers to build on
  • Skill orchestrates the library call and reports back a Selection
  • Failure modes (invalid schema emission, hallucinated fields, unsupported values, malformed selections) should be recognized and handled via retry semantics, but the exact mechanisms are deferred to implementation
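The failure modes listed above map naturally onto a validate-and-retry loop. pydantic-ai has its own retry handling; this framework-neutral sketch only illustrates the idea, and `check_reply`, `evaluate_with_retries`, `SelectionError`, and the `{"values": [...]}` reply shape are hypothetical assumptions. Each rejected reply is fed back to the model with the validation error before the next attempt.

```python
import json
from typing import Callable, Optional


class SelectionError(ValueError):
    """One of the failure modes named in the scope list."""


def check_reply(raw: str, allowed: set[str]) -> list[str]:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SelectionError(f"invalid schema emission: {exc}") from exc
    if not isinstance(data, dict) or "values" not in data:
        raise SelectionError('invalid schema emission: expected {"values": [...]}')
    extra = sorted(set(data) - {"values"})
    if extra:
        raise SelectionError(f"hallucinated fields: {extra}")
    values = data["values"]
    if (not isinstance(values, list) or not values
            or not all(isinstance(v, str) for v in values)
            or len(set(values)) != len(values)):
        raise SelectionError("malformed selection: need a non-empty list of distinct value keys")
    unsupported = [v for v in values if v not in allowed]
    if unsupported:
        raise SelectionError(f"unsupported values: {unsupported}")
    return values


def evaluate_with_retries(complete: Callable[[str], str], prompt: str,
                          allowed: set[str], max_retries: int = 2) -> list[str]:
    """Call the model, feeding validation errors back until success or retries run out."""
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    attempt_prompt = prompt
    last_error: Optional[SelectionError] = None
    for _ in range(max_retries + 1):
        try:
            return check_reply(complete(attempt_prompt), allowed)
        except SelectionError as err:
            last_error = err
            attempt_prompt = f"{prompt}\n\nYour previous reply was rejected ({err}). Try again."
    raise SelectionError(f"gave up after {max_retries + 1} attempts: {last_error}") from last_error
```

With a scripted model that first replies `not json`, then adds an unrequested `confidence` field, and finally returns `{"values": ["A"]}`, the loop succeeds on the third attempt; a model that never produces a valid selection exhausts its retries and raises `SelectionError`.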

Out of scope (for now)

  • Multi-decision-point batch evaluation (full SelectionList in one pass)
  • Schema extensions to Selection/SelectionList for rationale, uncertainty, or provenance (TBD: these may need a wrapper or an extension; the approach is still unclear)
  • Model provider selection/configuration
  • Input parsing (caller is responsible for assembling evidence)

Dependencies

Blocked by #1151 (skills infrastructure: two-tier directory structure and canonical SKILL.md format must exist before this skill can be added properly).

Notes

An earlier exploratory sketch (docs/exploratory/llm_evaluator.md on feature/pydantic.ai) describes the high-level flow; that branch is defunct but the flowchart captures intent. The existing Selection / SelectionList Pydantic models (src/ssvc/selection.py) are already the natural output type for this evaluator.
