fix: enforce run-tests evolution gate by steezkelly · Pull Request #58 · NousResearch/hermes-agent-self-evolution

steezkelly · 2026-05-09T02:46:19Z

Summary

Fixes the #33 C2 audit finding that --run-tests was a no-op:

wires EvolutionConfig.run_pytest into the evolution pipeline
runs ConstraintValidator.run_test_suite(...) after evolved skill constraints pass and before holdout/output promotion
blocks the run when pytest fails and saves output/<skill>/evolved_FAILED_TESTS.md for inspection
keeps the gate skipped when run_pytest=False
changes the pytest subprocess from bare python to sys.executable so the gate uses the active interpreter instead of depending on a shell python alias
adds regression tests for skip/run/fail gate behavior and pytest command construction

Root cause

evolve_skill.evolve(...) stored run_pytest=run_tests in EvolutionConfig, and ConstraintValidator.run_test_suite(...) existed, but nothing called it. The CLI advertised a test gate that never executed.

Opposite-perspective review notes

I specifically checked the skeptical cases:

Is the flag actually consumed? Yes: _run_pytest_gate_if_requested(...) calls the validator only when config.run_pytest is true.
Does failure block promotion? Yes: a failed test-suite result returns before holdout scoring and timestamped output/metrics creation.
Is subprocess injection introduced? No: command is a fixed argv list with no shell.
Could sys.executable ever be the wrong interpreter for a separate Hermes repo? Possibly in a future multi-venv setup; this is still more deterministic than bare python, and a configurable pytest command would be a good later enhancement.

Test Plan

RED first: focused gate test failed because _run_pytest_gate_if_requested did not exist
pytest tests/skills/test_evolve_skill_gates.py tests/core/test_constraints.py::TestRunTestSuite::test_uses_current_python_interpreter -q
pytest -q
static added-line security scan
git diff --check

Result: 141 passed, 11 warnings (DSPy deprecation warnings only).

Partially addresses #33 (C2: dead --run-tests config).

steezkelly · 2026-05-09T03:39:44Z

Closing this PR in favor of consolidated PR #68. Local integration found real helper-block overlap in evolution/skills/evolve_skill.py across the stack, and #68 preserves local test evidence: targeted stack tests 41 passed; full suite 164 passed; GitHub checks were absent on the split PRs. Review #68 instead.

fix: enforce run-tests evolution gate

e35598c

steezkelly mentioned this pull request May 9, 2026

feat: consolidate evolution ingestion and safety gates #68

Closed

steezkelly closed this May 9, 2026

This was referenced May 9, 2026

Implement all-agent session ingestion and promotion gates #54

Open

feat: consolidate evolution gates stack (#55-#61) #70

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: enforce run-tests evolution gate#58

fix: enforce run-tests evolution gate#58
steezkelly wants to merge 1 commit into
NousResearch:mainfrom
steezkelly:fix/33-run-tests-gate

steezkelly commented May 9, 2026

Uh oh!

steezkelly commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

steezkelly commented May 9, 2026

Summary

Root cause

Opposite-perspective review notes

Test Plan

Uh oh!

steezkelly commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant