fix: guard baseline deltas by eval input fingerprint by Osraka · Pull Request #1 · NousResearch/hermes-compression-eval

Osraka · 2026-05-17T07:36:28Z

Summary

record SHA-256 fingerprints for each fixture and probe bank in per-run outputs
refuse to summarize mixed-input runs and skip baseline deltas when fixture/probe contents no longer match
document the comparison guardrail and cover both changed-input and legacy-baseline cases with tests
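A minimal sketch of the fingerprinting and the mixed-input refusal described above, in Python to match the repo's pytest setup. The PR does not show its output schema, so the names here (fingerprint_bytes, fingerprint_file, run_record, assert_single_input_set) and the "inputs" / "fixture_sha256" / "probe_bank_sha256" keys are hypothetical. It also assumes that "mixed-input" means one fixture name appearing with more than one set of input fingerprints.

```python
import hashlib
from pathlib import Path


def fingerprint_bytes(data: bytes) -> str:
    """Hex SHA-256 digest of raw input bytes."""
    return hashlib.sha256(data).hexdigest()


def fingerprint_file(path: Path) -> str:
    # Hash the file exactly as stored on disk, so any edit (even whitespace)
    # changes the fingerprint.
    return fingerprint_bytes(Path(path).read_bytes())


def run_record(fixture: Path, probe_bank: Path, scores: dict) -> dict:
    """Hypothetical per-run output record with input fingerprints attached."""
    return {
        "fixture": Path(fixture).stem,
        "inputs": {
            "fixture_sha256": fingerprint_file(fixture),
            "probe_bank_sha256": fingerprint_file(probe_bank),
        },
        "scores": scores,
    }


def assert_single_input_set(records: list) -> None:
    """Refuse to summarize if one fixture name appears with different inputs."""
    seen = {}
    for rec in records:
        # The first record for a fixture sets the expected fingerprints.
        prev = seen.setdefault(rec["fixture"], rec["inputs"])
        if prev != rec["inputs"]:
            raise ValueError(
                f"mixed inputs for fixture {rec['fixture']!r}; refusing to summarize"
            )
```

Hashing raw bytes means even a whitespace-only edit to a fixture or probe bank counts as a different input. That is the conservative choice for a guardrail.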

Why

--compare-to currently matches baseline results to current results by fixture name only. If a fixture or probe bank is edited between the baseline run and the current run, the report can show score deltas between two different eval targets. That can make a prompt change look better or worse for reasons unrelated to the compressor itself. Fingerprinting the inputs keeps those comparisons honest while preserving the current report flow for compatible runs.
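A minimal sketch of the baseline guard, assuming each per-fixture result carries the input fingerprints recorded for its run. FixtureResult, baseline_delta, and compare_runs are hypothetical names, not the repo's API. The PR does not say how legacy baselines (written before fingerprints existed) are reported. This sketch assumes they are treated as not comparable, so no delta is emitted for them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FixtureResult:
    fixture: str
    score: float
    # Input fingerprints for this run; None for legacy baseline files
    # written before fingerprints were recorded (assumption).
    inputs: Optional[Dict[str, str]] = None


def baseline_delta(
    current: FixtureResult, baseline: Optional[FixtureResult]
) -> Optional[float]:
    """Score delta vs. baseline, or None when the comparison is not like-for-like."""
    if baseline is None:
        return None  # fixture absent from the baseline run
    if baseline.inputs is None or current.inputs is None:
        return None  # legacy baseline: cannot prove the inputs match
    if baseline.inputs != current.inputs:
        return None  # fixture or probe bank changed between runs
    return current.score - baseline.score


def compare_runs(
    current: List[FixtureResult], baseline: List[FixtureResult]
) -> Dict[str, Optional[float]]:
    # Still keyed by fixture name, but a name match alone no longer yields a delta.
    by_name = {r.fixture: r for r in baseline}
    return {r.fixture: baseline_delta(r, by_name.get(r.fixture)) for r in current}
```

Returning None rather than raising keeps the report flow intact for compatible fixtures. Only the mismatched rows lose their delta.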

Test plan

python3 -m pytest tests/ -q

4c284e3 · fix: guard compression eval baselines by input fingerprint

Osraka wants to merge 1 commit into NousResearch:main from Osraka:fix/guard-baseline-inputs

