Skip to content

fix: enforce run-tests evolution gate#58

Closed
steezkelly wants to merge 1 commit into
NousResearch:mainfrom
steezkelly:fix/33-run-tests-gate
Closed

fix: enforce run-tests evolution gate#58
steezkelly wants to merge 1 commit into
NousResearch:mainfrom
steezkelly:fix/33-run-tests-gate

Conversation

@steezkelly
Copy link
Copy Markdown

Summary

Fixes the #33 C2 audit finding that --run-tests was a no-op:

  • wires EvolutionConfig.run_pytest into the evolution pipeline
  • runs ConstraintValidator.run_test_suite(...) after evolved skill constraints pass and before holdout/output promotion
  • blocks the run when pytest fails and saves output/<skill>/evolved_FAILED_TESTS.md for inspection
  • keeps the gate skipped when run_pytest=False
  • changes the pytest subprocess from bare python to sys.executable so the gate uses the active interpreter instead of depending on a shell python alias
  • adds regression tests for skip/run/fail gate behavior and pytest command construction

Root cause

evolve_skill.evolve(...) stored run_pytest=run_tests in EvolutionConfig, and ConstraintValidator.run_test_suite(...) existed, but nothing called it. The CLI advertised a test gate that never executed.

Opposite-perspective review notes

I specifically checked the skeptical cases:

  • Is the flag actually consumed? Yes: _run_pytest_gate_if_requested(...) calls the validator only when config.run_pytest is true.
  • Does failure block promotion? Yes: a failed test-suite result returns before holdout scoring and timestamped output/metrics creation.
  • Is subprocess injection introduced? No: command is a fixed argv list with no shell.
  • Could sys.executable ever be the wrong interpreter for a separate Hermes repo? Possibly in a future multi-venv setup; this is still more deterministic than bare python, and a configurable pytest command would be a good later enhancement.

Test Plan

  • RED first: focused gate test failed because _run_pytest_gate_if_requested did not exist
  • pytest tests/skills/test_evolve_skill_gates.py tests/core/test_constraints.py::TestRunTestSuite::test_uses_current_python_interpreter -q
  • pytest -q
  • static added-line security scan
  • git diff --check

Result: 141 passed, 11 warnings (DSPy deprecation warnings only).

Partially addresses #33 (C2: dead --run-tests config).

@steezkelly
Copy link
Copy Markdown
Author

Closing this PR in favor of consolidated PR #68. Local integration found real helper-block overlap in evolution/skills/evolve_skill.py across the stack, and #68 preserves local test evidence: targeted stack tests 41 passed; full suite 164 passed; GitHub checks were absent on the split PRs. Review #68 instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant