fix: harden phase 1 skill evolution path by ohkyuetaek · Pull Request #75 · NousResearch/hermes-agent-self-evolution

ohkyuetaek · 2026-05-13T23:22:22Z

Summary

Store skill text in DSPy signature instructions so GEPA can actually optimize the skill body
Update the GEPA construction path for DSPy 3.2+ with reflection_lm and max_full_evals
Validate full SKILL.md artifacts, including YAML frontmatter, before accepting candidates
Add configurable output_dir support and API-free golden dataset regression coverage for emitted artifacts

Test Plan

.venv/bin/python -m pytest -q
Static added-line security scan: no hardcoded secrets, shell=True/os.system, eval/exec, pickle.loads, or obvious SQL format injection matches
Independent pre-commit reviewer passed; one non-blocking baseline_body suggestion was applied

Notes

Direct push to NousResearch/hermes-agent-self-evolution was denied for this account, so this PR is opened from the ohkyuetaek fork.

- Store skill text in DSPy signature instructions so GEPA can evolve it - Use DSPy 3.2 GEPA constructor with reflection LM and max_full_evals - Validate full SKILL.md artifacts and honor configurable output_dir - Add API-free golden dataset regression coverage for emitted artifacts

ohkyuetaek · 2026-05-13T23:27:49Z

Ready for review.

Validation completed locally:

.venv/bin/python -m pytest -q → 145 passed, 11 DSPy deprecation warnings
Added-line static security scan found no hardcoded secrets, shell=True/os.system, eval/exec, pickle.loads, or obvious SQL string-format injection patterns
Independent pre-commit review passed; one non-blocking suggestion around baseline_body handling was applied and the full suite was re-run

This PR is intentionally scoped to hardening the Phase 1 minimal skill-evolution path: DSPy 3.2+ GEPA construction, optimizable skill instructions, full SKILL.md validation, configurable artifact output, and API-free regression coverage for emitted baseline/evolved/metrics artifacts.

jarrettj

Hermes Agent Code Review

Checked out pr-75, ran 139 tests (all pass), reviewed all 4 changed files. Two issues need to be addressed before merge, two minor suggestions included.

Critical

None

Warnings

evolve_skill.py:300 — Variable name output_dir is reused: the function parameter (line 77, Optional[str]) is consumed at line 90, then shadowed at line 300 with a Path object of different semantics. This confuses readers and static analysis. Rename to run_dir.
evolve_skill.py:264 — Holdout baseline may be scored with the evolved skill. optimizer.compile(baseline_module, ...) may mutate baseline_module in-place. If baseline_module is optimized_module, baseline holdout scores use the evolved skill text, making the improvement delta meaningless. Fix: create evolved_module = SkillModule(evolved_body) and use it for evolved-path scoring instead of optimized_module.

Suggestions

skill_module.py:8 — import re is unused (leftover after TaskWithSkill was removed). Delete it.
skill_module.py:104 — self.predictor.predict.signature is an undocumented DSPy internal. Add a comment: # DSPy 3.x: ChainOfThought stores the inner Predict at .predict.

Looks Good

Storing skill text in signature instructions (not as an InputField) is the correct GEPA target — sound design.
build_gepa_optimizer properly uses max_full_evals + reflection_lm per DSPy 3.2+ API.
validate_skill_candidate correctly assembles full SKILL.md (frontmatter + body) before constraint checks.
output_dir configurability is clean; the Path(output_dir) conversion is properly guarded.
Test coverage for all four new behaviors is solid.

Reviewed by Hermes Agent

jarrettj · 2026-05-16T19:03:03Z

    # ── 10. Save output ─────────────────────────────────────────────────
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
-    output_dir = Path("output") / skill_name / timestamp
+    output_dir = config.output_dir / skill_name / timestamp


Warning: output_dir here shadows the function parameter of the same name (line 77, Optional[str]). At this point it becomes a Path — different type, different semantics. Rename this local variable to run_dir to avoid confusion.

jarrettj · 2026-05-16T19:03:03Z

+    @property
+    def skill_text(self) -> str:
+        """Return the current/evolved skill instructions from the predictor."""
+        return self.predictor.predict.signature.instructions


Suggestion: self.predictor.predict.signature accesses an undocumented DSPy internal (ChainOfThought.predict). This works today (all tests pass), but could silently break on a DSPy version bump. Add a comment noting the DSPy 3.x dependency.

jarrettj · 2026-05-16T19:03:19Z

Code Review Summary — PR #75: `fix: harden phase 1 skill evolution path`

Verdict: Request Changes (0 critical, 2 warnings, 2 suggestions)

Scope: 4 files · +235 / -30 lines · 139 tests pass

⚠️ Warnings

evolution/skills/evolve_skill.py:300 — Variable name output_dir is reused in the same function scope. The parameter output_dir: Optional[str] (line 77) is consumed at line 90, then reassigned at line 300 as Path with different semantics. Rename the local to run_dir to eliminate the shadow.
evolution/skills/evolve_skill.py:264 — Holdout baseline may be inadvertently scored with the evolved skill. optimizer.compile(baseline_module, ...) may mutate baseline_module in-place and return it. If so, baseline_module is optimized_module and the comparison is invalid. Fix: after extracting evolved_body, create evolved_module = SkillModule(evolved_body) and score the evolved path from that instead of from optimized_module. (The existing test doesn't catch this because FakeSkillModule always returns identical output.)

💡 Suggestions

evolution/skills/skill_module.py:8 — import re is a dead import — leftover after TaskWithSkill was removed. Delete it.
evolution/skills/skill_module.py:104 — self.predictor.predict.signature accesses an undocumented DSPy internal. Works now, but fragile across DSPy versions. Add a brief comment: # DSPy 3.x: ChainOfThought stores the inner Predict at .predict.

✅ Looks Good

Storing skill text in signature instructions (not as an InputField) is the right GEPA target — this is the core correctness fix.
build_gepa_optimizer correctly uses max_full_evals + reflection_lm (DSPy 3.2+ API).
validate_skill_candidate correctly reassembles full SKILL.md before validation — the old body-only check was the bug this PR set out to fix.
output_dir configurability is clean and properly guarded with Path(output_dir).
Test file covers all four new behaviors (signature placement, property wiring, GEPA constructor shape, end-to-end artifact emission) without making live API calls.

Reviewed by Hermes Agent | 139 tests passed locally

ohkyuetaek mentioned this pull request May 13, 2026

feat: add optional live skill evolution smoke #76

Draft

jarrettj suggested changes May 16, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: harden phase 1 skill evolution path#75

fix: harden phase 1 skill evolution path#75
ohkyuetaek wants to merge 1 commit into
NousResearch:mainfrom
ohkyuetaek:fix/phase1-minimal-e2e

ohkyuetaek commented May 13, 2026

Uh oh!

ohkyuetaek commented May 13, 2026

Uh oh!

jarrettj left a comment

Uh oh!

jarrettj May 16, 2026

Uh oh!

jarrettj May 16, 2026

Uh oh!

jarrettj commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ohkyuetaek commented May 13, 2026

Summary

Test Plan

Notes

Uh oh!

ohkyuetaek commented May 13, 2026

Uh oh!

jarrettj left a comment

Choose a reason for hiding this comment

Hermes Agent Code Review

Critical

Warnings

Suggestions

Looks Good

Uh oh!

jarrettj May 16, 2026

Choose a reason for hiding this comment

Uh oh!

jarrettj May 16, 2026

Choose a reason for hiding this comment

Uh oh!

jarrettj commented May 16, 2026

Code Review Summary — PR #75: fix: harden phase 1 skill evolution path

⚠️ Warnings

💡 Suggestions

✅ Looks Good

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Code Review Summary — PR #75: `fix: harden phase 1 skill evolution path`