fix: reject empty holdout datasets by steezkelly · Pull Request #59 · NousResearch/hermes-agent-self-evolution

steezkelly · 2026-05-09T02:48:19Z

Summary

Partially addresses #33 M2 by failing fast when an evaluation dataset has no holdout examples:

adds _require_non_empty_holdout(...) to reject empty holdout splits before optimization starts
raises an actionable click.ClickException instead of silently producing 0.000/0.000 holdout scores
adds regression tests for both valid holdout and empty-holdout rejection

Root cause

The holdout scoring code uses sum(scores) / max(1, len(scores)). When dataset.holdout is empty, both baseline and evolved score lists are empty, so the report silently shows 0.000 scores rather than warning that no independent evaluation occurred.

Opposite-perspective review notes

I checked the skeptical cases:

Could this block intentional train/val-only runs? Yes, but the current CLI always reports holdout improvement, so allowing a zero-example holdout is more misleading than useful.
Should this happen before or after optimization? Before. If the independent evaluation split is invalid, running GEPA first just wastes model calls and creates a fake-looking report.
Is this only a warning? No. It hard-fails with a ClickException so the operator must fix the dataset/split.

Test Plan

RED first: pytest tests/skills/test_evolve_skill_dataset_gates.py -q failed because _require_non_empty_holdout did not exist
pytest tests/skills/test_evolve_skill_dataset_gates.py -q
pytest -q
static added-line security scan
git diff --check

Result: 141 passed, 11 warnings (DSPy deprecation warnings only).

Partially addresses #33 (M2: empty holdout set silently produces 0.000 scores).

steezkelly · 2026-05-09T03:39:46Z

Closing this PR in favor of consolidated PR #68. Local integration found real helper-block overlap in evolution/skills/evolve_skill.py across the stack, and #68 preserves local test evidence: targeted stack tests 41 passed; full suite 164 passed; GitHub checks were absent on the split PRs. Review #68 instead.

fix: reject empty holdout datasets

98b7835

steezkelly mentioned this pull request May 9, 2026

feat: consolidate evolution ingestion and safety gates #68

Closed

steezkelly closed this May 9, 2026

This was referenced May 9, 2026

Implement all-agent session ingestion and promotion gates #54

Open

feat: consolidate evolution gates stack (#55-#61) #70

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: reject empty holdout datasets#59

fix: reject empty holdout datasets#59
steezkelly wants to merge 1 commit into
NousResearch:mainfrom
steezkelly:fix/33-empty-holdout-gate

steezkelly commented May 9, 2026

Uh oh!

steezkelly commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

steezkelly commented May 9, 2026

Summary

Root cause

Opposite-perspective review notes

Test Plan

Uh oh!

steezkelly commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant