fix: harden phase 1 skill evolution path #75
- Store skill text in DSPy signature instructions so GEPA can evolve it
- Use DSPy 3.2 GEPA constructor with reflection LM and max_full_evals
- Validate full SKILL.md artifacts and honor configurable output_dir
- Add API-free golden dataset regression coverage for emitted artifacts
Ready for review. Validation completed locally:
This PR is intentionally scoped to hardening the Phase 1 minimal skill-evolution path: DSPy 3.2+ GEPA construction, optimizable skill instructions, full SKILL.md validation, configurable artifact output, and API-free regression coverage for emitted baseline/evolved/metrics artifacts.
jarrettj
left a comment
Hermes Agent Code Review
Checked out pr-75, ran 139 tests (all pass), and reviewed all 4 changed files. Two issues need to be addressed before merge; two minor suggestions are included.
Critical
None
Warnings
- `evolve_skill.py:300`: Variable name `output_dir` is reused: the function parameter (line 77, `Optional[str]`) is consumed at line 90, then shadowed at line 300 with a `Path` object of different semantics. This confuses readers and static analysis. Rename to `run_dir`.
- `evolve_skill.py:264`: Holdout baseline may be scored with the evolved skill. `optimizer.compile(baseline_module, ...)` may mutate `baseline_module` in-place. If `baseline_module is optimized_module`, baseline holdout scores use the evolved skill text, making the improvement delta meaningless. Fix: create `evolved_module = SkillModule(evolved_body)` and use it for evolved-path scoring instead of `optimized_module`.
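To make the aliasing concern concrete, here is a minimal sketch that does not import DSPy. `SkillModule` is reduced to a stand-in that only holds text, and `InPlaceOptimizer` and `score` are hypothetical; whether the real `optimizer.compile` mutates its argument is exactly the open question, so the sketch simply assumes the worst case. It also rebuilds the baseline from the original text, which goes slightly beyond the fix suggested above.

```python
class SkillModule:
    """Stand-in for the PR's SkillModule: it only holds the skill text."""

    def __init__(self, skill_text: str) -> None:
        self.skill_text = skill_text


class InPlaceOptimizer:
    """Hypothetical optimizer that mutates and returns the module it receives."""

    def compile(self, module: SkillModule) -> SkillModule:
        module.skill_text += "\n(evolved)"
        return module


def score(module: SkillModule) -> int:
    """Toy holdout metric (assumption): longer skill text scores higher."""
    return len(module.skill_text)


baseline_body = "Original skill body"

# Risky pattern: the baseline object itself is handed to the optimizer.
baseline_module = SkillModule(baseline_body)
optimized_module = InPlaceOptimizer().compile(baseline_module)
aliased = baseline_module is optimized_module
naive_delta = score(optimized_module) - score(baseline_module)

# Safer pattern: extract the evolved text, then score two fresh modules
# built from plain strings so neither can share state with the other.
evolved_body = optimized_module.skill_text
fresh_baseline = SkillModule(baseline_body)
evolved_module = SkillModule(evolved_body)
safe_delta = score(evolved_module) - score(fresh_baseline)
```

With an in-place optimizer the naive delta collapses to zero because both names point at the same object, while the rebuilt modules recover the real difference.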
Suggestions
- `skill_module.py:8`: `import re` is unused (leftover after `TaskWithSkill` was removed). Delete it.
- `skill_module.py:104`: `self.predictor.predict.signature` is an undocumented DSPy internal. Add a comment: `# DSPy 3.x: ChainOfThought stores the inner Predict at .predict`.
Looks Good
- Storing skill text in signature instructions (not as an InputField) is the correct GEPA target; sound design.
- `build_gepa_optimizer` properly uses `max_full_evals` + `reflection_lm` per DSPy 3.2+ API.
- `validate_skill_candidate` correctly assembles full SKILL.md (frontmatter + body) before constraint checks.
- `output_dir` configurability is clean; the `Path(output_dir)` conversion is properly guarded.
- Test coverage for all four new behaviors is solid.
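For readers who have not opened the diff, the "assemble first, then check" idea behind `validate_skill_candidate` can be sketched roughly as follows. This is not the PR's implementation: the function names, the required frontmatter keys (`name`, `description`), and the character limit are all assumptions made for illustration.

```python
from dataclasses import dataclass

MAX_SKILL_CHARS = 15_000  # hypothetical size limit, not taken from the PR


@dataclass
class ValidationResult:
    ok: bool
    errors: list[str]


def assemble_skill_md(frontmatter: str, body: str) -> str:
    """Join frontmatter and body into one SKILL.md-style document."""
    return f"---\n{frontmatter.strip()}\n---\n\n{body.strip()}\n"


def validate_full_skill(
    frontmatter: str, body: str, max_chars: int = MAX_SKILL_CHARS
) -> ValidationResult:
    """Run constraint checks against the assembled artifact, not the body alone."""
    full_text = assemble_skill_md(frontmatter, body)
    errors: list[str] = []
    if not body.strip():
        errors.append("body is empty")
    # Collect top-level "key: value" names from the frontmatter block.
    keys = {
        line.split(":", 1)[0].strip()
        for line in frontmatter.splitlines()
        if ":" in line
    }
    for required in ("name", "description"):  # assumed required keys
        if required not in keys:
            errors.append(f"frontmatter missing '{required}'")
    # The size check uses the full document, so frontmatter counts too.
    if len(full_text) > max_chars:
        errors.append(f"artifact is {len(full_text)} chars, limit is {max_chars}")
    return ValidationResult(ok=not errors, errors=errors)
```

Checking the assembled text matters because a body that fits a size limit on its own can exceed it once the frontmatter is prepended.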
Reviewed by Hermes Agent
# ── 10. Save output ─────────────────────────────────────────────────
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_dir = Path("output") / skill_name / timestamp
output_dir = config.output_dir / skill_name / timestamp
Warning: output_dir here shadows the function parameter of the same name (line 77, Optional[str]). At this point it becomes a Path — different type, different semantics. Rename this local variable to run_dir to avoid confusion.
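One possible shape for the rename, as a standalone sketch: the `Optional[str]` parameter keeps the name `output_dir`, and the per-run `Path` gets its own name. The helper `resolve_run_dir`, the `"output"` fallback, and the injectable `now` argument are assumptions for illustration, not code from the PR.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def resolve_run_dir(
    output_dir: Optional[str],
    skill_name: str,
    now: Optional[datetime] = None,
) -> Path:
    """Build the timestamped directory for one evolution run."""
    # The parameter stays a string-or-None; it is converted exactly once.
    base_dir = Path(output_dir) if output_dir else Path("output")
    # `now` is injectable so tests can pin the timestamp.
    timestamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    # Distinct name: `run_dir` is always a Path, `output_dir` never is.
    run_dir = base_dir / skill_name / timestamp
    return run_dir
```

Keeping the two names separate means a type checker sees one type per variable, and the timestamp can be fixed in regression tests.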
@property
def skill_text(self) -> str:
    """Return the current/evolved skill instructions from the predictor."""
    return self.predictor.predict.signature.instructions
Suggestion: self.predictor.predict.signature accesses an undocumented DSPy internal (ChainOfThought.predict). This works today (all tests pass), but could silently break on a DSPy version bump. Add a comment noting the DSPy 3.x dependency.
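If a comment alone feels too weak, a defensive accessor is another option. The sketch below is hypothetical (the helper name and fallback order are assumptions) and uses only `getattr`, so it runs without DSPy; the idea is to fail loudly rather than silently if the attribute layout changes.

```python
from types import SimpleNamespace


def read_skill_instructions(predictor: object) -> str:
    """Return signature instructions, raising if the expected layout is missing."""
    # DSPy 3.x assumption: ChainOfThought keeps its inner Predict at `.predict`.
    inner = getattr(predictor, "predict", None)
    signature = getattr(inner, "signature", None)
    if signature is None:
        # Fallback for objects that expose `.signature` directly.
        signature = getattr(predictor, "signature", None)
    instructions = getattr(signature, "instructions", None)
    if not isinstance(instructions, str):
        raise AttributeError(
            "could not locate signature.instructions; "
            "the DSPy predictor layout may have changed"
        )
    return instructions


# Stub objects standing in for a wrapped and an unwrapped predictor.
wrapped = SimpleNamespace(
    predict=SimpleNamespace(signature=SimpleNamespace(instructions="Follow the skill."))
)
direct = SimpleNamespace(signature=SimpleNamespace(instructions="Direct layout."))
```

Calling `read_skill_instructions(wrapped)` returns the stored string, and an object with neither layout raises `AttributeError` instead of returning stale or empty text.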
Code Review Summary — PR #75:
Summary
Test Plan
Notes