Agent Skill: Single Decision-Point SSVC Evaluator (Elimination-Based Selection Generation) #1156

@ahouseholder

Note

This issue is intentionally less prescriptive than #1152. Implementers have latitude in how they approach the problem. #1152 exists as a technical reference with additional design hints (e.g., specific libraries, library placement) but should not be treated as a requirements document for this task.

Problem Context

We already have a structured SSVC domain model in Python (Pydantic-based), including DecisionPoint, DecisionPointValue, and Selection. What is missing is a reusable evaluation skill that can take a single decision point plus arbitrary evidence and produce a valid Selection in a consistent, repeatable way.

This is the first vertical slice toward agent-assisted SSVC classification. It intentionally does not include full SSVC tree evaluation or multi-decision-point orchestration.


Objective

Implement a reusable "decision-point evaluation skill" that enables an agent (or tool-using LLM) to evaluate exactly one SSVC DecisionPoint against a bounded evidence set and produce a Selection.

The evaluation model is explicitly elimination-based (via negativa):

  • Start from the full set of allowed DecisionPoint.values
  • Use provided evidence to eliminate values that the evidence contradicts
  • Return the remaining viable values as the Selection
  • If no values can be eliminated with confidence, return the full set with an explanation that evidence is insufficient to disambiguate

This is not a "best answer selection" problem. It is a constraint reduction problem over a closed value set.
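
The reduction described above can be sketched as a set operation over value keys. This is only an illustration: `eliminate` is a hypothetical helper, it works on plain string keys rather than the project's DecisionPointValue objects, and the keys in the example are made up.

```python
from typing import Iterable


def eliminate(allowed: Iterable[str], contradicted: Iterable[str]) -> list[str]:
    """Return the keys from `allowed` that survive elimination, in original order.

    Hypothetical helper for illustration; not part of the ssvc package.
    """
    allowed_keys = list(allowed)
    # Keys that are not in the closed value set are ignored.
    ruled_out = set(contradicted) & set(allowed_keys)
    remaining = [key for key in allowed_keys if key not in ruled_out]
    # Never return an empty set: if everything was ruled out, treat the
    # evidence as insufficient and fall back to the full value set.
    return remaining or allowed_keys


# Illustrative keys only; real keys come from DecisionPoint.values.
values = ["N", "P", "A"]
print(eliminate(values, ["N"]))            # one value ruled out
print(eliminate(values, []))               # nothing eliminated
print(eliminate(values, ["N", "P", "A"]))  # everything ruled out
```

The function never ranks or picks a "best" value; it only narrows the closed set, which is the distinction the paragraph above draws.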


Inputs

  1. DecisionPoint (src/ssvc/decision_points/base.py)

    • Fully defined Pydantic object
    • Provides: name, description, version, namespace, key, and a tuple of valid DecisionPointValue objects
    • Each DecisionPointValue has a name, key, and description — these descriptions are the primary semantic source for reasoning about which values apply
  2. DecisionPoint documentation (optional, supplemental)

    • Additional context beyond what is embedded in the Pydantic object: extended definitions, worked examples, usage notes
    • May be absent; the embedded DecisionPointValue.description fields are the minimum required semantics
  3. Evidence bundle

    • Arbitrary user-provided text
    • May include: incident reports, vulnerability descriptions, email threads, documentation excerpts, unstructured notes
    • No requirement for preprocessing or normalization upstream
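
One possible way to combine the three inputs into a single block of text for an LLM is sketched below. The dataclasses are stand-ins that mirror only the fields listed above; they are not the project's Pydantic models. `render_inputs` and the example decision point are hypothetical and not taken from the SSVC registry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ValueStub:
    """Stand-in for DecisionPointValue (name, key, description)."""
    name: str
    key: str
    description: str


@dataclass(frozen=True)
class DecisionPointStub:
    """Stand-in for DecisionPoint with the fields listed above."""
    name: str
    description: str
    version: str
    namespace: str
    key: str
    values: tuple[ValueStub, ...]


def render_inputs(dp: DecisionPointStub, evidence: str,
                  docs: Optional[str] = None) -> str:
    """Lay out the decision point, optional docs, and raw evidence as text."""
    lines = [
        f"Decision point: {dp.name} ({dp.namespace}:{dp.key}:{dp.version})",
        dp.description,
        "Allowed values:",
    ]
    # Value descriptions are the primary semantic source, so include them all.
    lines += [f"- [{v.key}] {v.name}: {v.description}" for v in dp.values]
    if docs:  # supplemental documentation may be absent
        lines += ["Supplemental documentation:", docs]
    # Evidence is passed through as-is; no preprocessing is assumed.
    lines += ["Evidence (unprocessed):", evidence]
    return "\n".join(lines)


# Made-up decision point for illustration only.
example_dp = DecisionPointStub(
    name="Example Status", description="Illustrative decision point.",
    version="1.0.0", namespace="example", key="EX",
    values=(ValueStub("None", "N", "No sign of the condition."),
            ValueStub("Active", "A", "The condition is observed.")),
)
print(render_inputs(example_dp, "Vendor advisory mentions observed attacks."))
```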

Core Behavior

The evaluator must:

  1. Treat the DecisionPoint.values as the complete hypothesis space

  2. Analyze evidence to determine which values are:

    • contradicted — directly ruled out by evidence → eliminate
    • weakly supported — mentioned or partially consistent with evidence but not conclusively confirmed → remain possible
    • not addressed — evidence says nothing about the value → remain possible
  3. Produce a reduced candidate set of values

    • Note: eliminating all values would be an error — if evidence would eliminate all candidates, treat this as insufficient evidence and return the full set
  4. If ambiguity remains, explicitly surface:

    • why multiple values remain viable
    • what evidence would disambiguate them
    • (in interactive mode) optionally ask the user for that disambiguating information
  5. If no evidence is sufficient to eliminate any values:

    • return all values as viable
    • explicitly indicate insufficiency of evidence
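
The behavior listed above could be represented along these lines. This is a sketch under assumptions: `Verdict`, `Reduction`, and `reduce_candidates` are hypothetical names, and how each verdict is obtained (LLM call, heuristics, human input) is deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    CONTRADICTED = "contradicted"          # directly ruled out: eliminate
    WEAKLY_SUPPORTED = "weakly_supported"  # partially consistent: keep
    NOT_ADDRESSED = "not_addressed"        # evidence is silent: keep


@dataclass
class Reduction:
    remaining: list[str]         # value keys that are still viable
    insufficient_evidence: bool  # True when the full set is returned
    notes: list[str]             # explanation to surface to the caller


def reduce_candidates(verdicts: dict[str, Verdict]) -> Reduction:
    """`verdicts` maps every value key of the decision point to a Verdict."""
    all_keys = list(verdicts)
    remaining = [k for k, v in verdicts.items() if v is not Verdict.CONTRADICTED]
    notes: list[str] = []
    if not remaining:
        # Eliminating everything is treated as insufficient evidence.
        notes.append("Evidence contradicted every value; returning the full set.")
        return Reduction(all_keys, True, notes)
    if len(remaining) == len(all_keys):
        notes.append("No value could be eliminated; evidence is insufficient.")
        return Reduction(remaining, True, notes)
    if len(remaining) > 1:
        # Ambiguity remains: say which values are still viable.
        notes.append("Multiple values remain viable: " + ", ".join(remaining))
    return Reduction(remaining, False, notes)
```

A fuller version would also record, per remaining value, what evidence would disambiguate it; the `notes` field is where that explanation would go.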

Output Contract

Return a valid Selection object (src/ssvc/selection.py):

  • Must reference the original DecisionPoint (via matching namespace, key, and version)
  • values must be a non-empty list of MinimalDecisionPointValue objects (key only) drawn from the original DecisionPoint.values
  • Must pass Pydantic validation. The JSON schema is already generated at data/schema/v2/SelectionList_2_0_0.schema.json and can be regenerated via make regenerate_json.

Note on SelectionList: Selection is the appropriate output for a single decision-point evaluation. If the calling context needs a timestamped, multi-selection record, wrapping in a SelectionList is the caller's responsibility — it is out of scope here.
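
The contract checks above can be illustrated with plain dicts standing in for the Pydantic models, plus a bounded retry loop for non-conformant candidates. In the real project, validation would presumably be delegated to the Selection model itself. `selection_errors` and `first_valid` are hypothetical helpers, and the dict layout is an assumption made for the example.

```python
from typing import Callable


def selection_errors(dp: dict, candidate: dict) -> list[str]:
    """Return contract violations for a candidate selection (empty list = OK).

    Both arguments are plain dicts standing in for the Pydantic models.
    """
    errors = []
    # The selection must reference the original decision point.
    for field in ("namespace", "key", "version"):
        if candidate.get(field) != dp[field]:
            errors.append(f"{field} does not match the decision point")
    allowed = {v["key"] for v in dp["values"]}
    chosen = [v.get("key") for v in candidate.get("values", [])]
    if not chosen:
        errors.append("values must be non-empty")
    # Every selected key must come from the original value set.
    errors += [f"unknown value key: {k!r}" for k in chosen if k not in allowed]
    return errors


def first_valid(generate: Callable[[], dict], dp: dict, attempts: int = 3) -> dict:
    """Call `generate` until it yields a conformant candidate, or give up."""
    problems: list[str] = []
    for _ in range(attempts):
        candidate = generate()
        problems = selection_errors(dp, candidate)
        if not problems:
            return candidate
    raise ValueError(f"no valid selection after {attempts} attempts: {problems}")
```

Here `generate` stands for whatever produces a candidate (for example, one LLM call); the error list from a failed attempt could be fed back into the next one.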

"Valid output" means schema-conformant: a Selection that passes Pydantic validation. LLM inference variability is acceptable; outputs that fail schema validation should be retried or corrected before being returned.


Interaction Modes

The skill should support two execution modes:

1. Pipeline mode
  • Fully automated
  • Returns Selection only
  • No external interaction; evidence provided is all there is
2. Interactive mode
  • May request additional evidence if necessary
  • May ask clarifying questions when evidence is insufficient to distinguish between remaining values
  • Must still converge to a valid Selection — if the user cannot or does not provide disambiguating information, fall back to returning all remaining viable values
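
A rough sketch of how one entry point might serve both modes is shown below. The `run` function, the `Evaluator` and `Asker` callables, and the `max_questions` cap are all hypothetical; the evaluator is assumed to return the keys of the values that remain viable.

```python
from typing import Callable, Optional

# An evaluator maps evidence text to the list of value keys that remain viable.
Evaluator = Callable[[str], list[str]]
# An asker poses a question and returns the user's answer, or None if declined.
Asker = Callable[[str], Optional[str]]


def run(evaluate: Evaluator, evidence: str, ask: Optional[Asker] = None,
        max_questions: int = 2) -> list[str]:
    """Pipeline mode when `ask` is None; interactive mode otherwise."""
    remaining = evaluate(evidence)
    if ask is None:
        return remaining  # pipeline: the provided evidence is all there is
    for _ in range(max_questions):
        if len(remaining) <= 1:
            break  # already disambiguated
        answer = ask("Still viable: " + ", ".join(remaining)
                     + ". Is there evidence that distinguishes them?")
        if not answer:
            break  # user declined: fall back to the remaining values
        # Treat the answer as additional evidence and re-evaluate.
        evidence = evidence + "\n" + answer
        remaining = evaluate(evidence)
    return remaining
```

Capping the number of questions is one way to guarantee convergence: whatever the user does, the function returns the values that are still viable.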

What is an "Agent Skill"?

A skill in this project is a SKILL.md file (plus any supporting code or resources) placed under .agents/skills/. A stub already exists at .agents/skills/ssvc/evaluate-decision-point/SKILL.md. This issue is to implement that stub.

See .agents/skills/README.md for the skill format. A first working version can be purely a well-structured SKILL.md with supporting Python tooling; sophistication can be added iteratively.


Key Design Constraints

  • The evaluator is generic across all SSVC decision points
  • No per-decision-point custom logic should be required
  • DecisionPoint definition (name, description, value names/descriptions) is the primary source of semantics
  • Evidence is treated as unstructured input with no schema guarantees
  • Output must always be schema-valid (see Output Contract above)

Non-Goals (Explicit Out of Scope)

  • Full SSVC decision tree evaluation
  • Multi-decision-point orchestration or batching
  • External retrieval / search / indexing systems
  • Designing new SSVC ontology or modifying existing decision points
  • Building a general-purpose autonomous agent framework

Implementation Notes (Guidance, Not Requirements)

Implementers are expected to decide how to structure:

  • prompt construction strategy (if using LLMs)
  • evidence selection/filtering heuristics
  • intermediate reasoning representations (if any)
  • orchestration between Python and LLM components

However, the following are required:

  • reuse existing Pydantic models for all structured IO
  • enforce schema validation on outputs
  • ensure the evaluator operates purely within the constraints of the provided DecisionPoint

Suggestions:

  • A first version could consist solely of a well-written SKILL.md and a small Python helper that loads and validates objects. Get the full workflow working before refining the quality of each step.
  • See #1152 ("Add LLM-based decision point evaluator skill and supporting library tooling") for additional design hints (library placement, framework options, retry semantics). Treat it as an optional reference, not as requirements.

Success Criteria

A minimal successful implementation:

  • Accepts any valid DecisionPoint
  • Accepts arbitrary evidence text
  • Produces a schema-valid Selection
  • Correctly reduces candidate values when evidence supports elimination
  • Preserves all values when evidence is insufficient
  • Can run in both pipeline and interactive modes

Future Extension Path (Context Only)

This work is expected to become the first component in a larger system that:

  • composes multiple decision-point evaluations into full SSVC evaluation trees
  • exposes the evaluator as a CLI tool
  • is eventually wrapped as a service and/or agent "skill" interface

That future scope is explicitly not part of this task.
