fix: correct GEPA arg, constraint input, and rate limit handling in evolve_skill #35
Open
MwC-Trexx wants to merge 1 commit into
…volve_skill

Three bugs fixed in evolution/skills/evolve_skill.py:

1. GEPA wrong keyword argument
   `max_steps` does not exist in DSPy 3.1.3 GEPA. Replace with `max_metric_calls`, which is the correct parameter name. Previously caused TypeError on every run, silently falling back to MIPROv2.

2. Constraint validator received body-only text
   `_check_skill_structure` looks for YAML frontmatter (`---`, `name:`, `description:`). Both call sites passed `skill["body"]` (text after frontmatter is stripped), so the check always failed. Fix: pass `skill["raw"]` for the baseline check and `evolved_full` (the reassembled skill with frontmatter) for the evolved check.

3. OpenAI free-tier rate limits corrupted MIPROv2 trials
   DSPy's parallelizer fires all evaluation calls concurrently, hitting the 3 RPM / 60 K TPM free-tier cap. Failed requests score 0.0, poisoning the Bayesian optimizer (trials 4-6 all scored 0.0 in the observed run). Fix: add `num_retries=8` to the LM so litellm retries transient limit errors, and `num_threads=1` to MIPROv2 to serialize evaluation calls.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
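To make the second bug concrete, here is a minimal sketch of a frontmatter check and the three inputs discussed above. The function `check_skill_structure`, the loader `split_skill`, and the sample skill text are hypothetical stand-ins written for illustration; they are not the repository's `_check_skill_structure` or its loader, and the real implementations may differ.

```python
def check_skill_structure(text: str) -> bool:
    """Hypothetical stand-in: require YAML frontmatter with name: and description:."""
    if not text.startswith("---"):
        return False
    lines = text.splitlines()
    try:
        # The frontmatter ends at the next line that is exactly '---'.
        end = lines.index("---", 1)
    except ValueError:
        return False
    header = lines[1:end]
    has_name = any(line.startswith("name:") for line in header)
    has_description = any(line.startswith("description:") for line in header)
    return has_name and has_description


def split_skill(raw: str) -> dict:
    """Hypothetical loader: keep the raw text and split frontmatter from body."""
    lines = raw.splitlines()
    end = lines.index("---", 1)
    return {
        "raw": raw,
        "frontmatter": "\n".join(lines[1:end]),
        "body": "\n".join(lines[end + 1:]).strip(),
    }


RAW = (
    "---\n"
    "name: github-code-review\n"
    "description: Review pull requests\n"
    "---\n"
    "\n"
    "# Steps\n"
    "\n"
    "1. Read the diff.\n"
)

skill = split_skill(RAW)

# Pretend the optimizer rewrote only the body; reassemble before validating.
evolved_body = "# Steps\n\n1. Read the diff carefully.\n"
evolved_full = f"---\n{skill['frontmatter']}\n---\n\n{evolved_body}"

baseline_ok = check_skill_structure(skill["raw"])     # full text, has frontmatter
body_only_ok = check_skill_structure(skill["body"])   # the buggy input
evolved_ok = check_skill_structure(evolved_full)      # reassembled text
```

Under these assumptions the body-only input can never pass, because the frontmatter it is checked for was removed before the check ran.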
steezkelly added a commit to steezkelly/hermes-agent-self-evolution that referenced this pull request on Apr 25, 2026
…sResearch#24, NousResearch#26, NousResearch#35)

- PR NousResearch#24: skill_module.py stores skill body as InputField → signature.instructions
- _load_skill_body() splits frontmatter from body, body becomes instruction
- _extract_evolved_instructions() extracts from signature.instructions (not wrapper)
- constraint_validator.py: body/frontmatter separation, validate body has substance
- dataset_builder.py: robust JSON parsing with 6 fallback strategies
- PR NousResearch#26: GEPA wiring fix, reflection_lm passed to GEPA
- PR NousResearch#35: constraint validator for GEPA args, max_metric_calls not mixed with auto

Note: GEPA still falls back to MIPROv2 due to DSPy 3.2.0 API: max_metric_calls conflicts with auto='light'. Use max_metric_calls alone (fixed).
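The note about `max_metric_calls` conflicting with `auto` suggests validating the budget arguments before constructing the optimizer, so that a bad combination fails loudly instead of triggering a silent fallback. The sketch below assumes the two options are mutually exclusive, as the note describes; `gepa_budget_kwargs` is a hypothetical helper and is not part of DSPy or of this repository.

```python
from typing import Any, Dict, Optional


def gepa_budget_kwargs(
    max_metric_calls: Optional[int] = None,
    auto: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: return exactly one budget kwarg, or raise.

    Assumes the optimizer rejects calls that set both options (or neither),
    which is how the commit note above describes the conflict.
    """
    candidates = {"max_metric_calls": max_metric_calls, "auto": auto}
    provided = {key: value for key, value in candidates.items() if value is not None}
    if len(provided) != 1:
        raise ValueError(
            "set exactly one of max_metric_calls or auto, got: "
            + (", ".join(sorted(provided)) or "none")
        )
    return provided


# Intended use (not executed here): dspy.GEPA(metric=..., **budget)
budget = gepa_budget_kwargs(max_metric_calls=150)

try:
    gepa_budget_kwargs(max_metric_calls=150, auto="light")
    conflict_detected = False
except ValueError:
    conflict_detected = True
```

Raising before the optimizer is built keeps the failure visible in the log rather than hiding it behind the MIPROv2 fallback path.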
steezkelly added a commit to steezkelly/hermes-agent-self-evolution that referenced this pull request on Apr 25, 2026
…traint validator, JSON parsing robustness

Combined patch applying upstream PRs NousResearch#24/NousResearch#26/NousResearch#35:

- skill_module.py: embed skill body in signature instructions via HTML sentinel
- evolve_skill.py: HTML sentinel extraction with fallback, GEPA max_metric_calls fix, improved messaging
- constraints.py: validate YAML frontmatter + substantive body content separately
- dataset_builder.py: 6-strategy JSON parser for LLM output resilience
- sentinel collision: replaced \n\n---\n\n (appears in skill bodies) with <!-- ___SKILL_EVOLUTION_SENTINEL___ -->
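The sentinel collision can be illustrated with a small sketch. It assumes the body is embedded after a wrapper preamble and recovered by splitting on the separator; the `embed` and `extract` helpers, the preamble text, and the fall-back-to-whole-text behaviour are assumptions made for illustration, not the code in `skill_module.py` or `evolve_skill.py`.

```python
OLD_SEPARATOR = "\n\n---\n\n"
SENTINEL = "<!-- ___SKILL_EVOLUTION_SENTINEL___ -->"


def embed(preamble: str, body: str, separator: str) -> str:
    """Hypothetical: place the skill body after a wrapper preamble."""
    return f"{preamble}{separator}{body}"


def extract(text: str, separator: str) -> str:
    """Hypothetical: take the segment after the separator.

    Falls back to the whole text when the separator is missing
    (an assumed fallback; the commit does not say what the real one is).
    """
    parts = text.split(separator)
    return parts[1].strip() if len(parts) > 1 else text.strip()


PREAMBLE = "Follow this skill:"
# A markdown horizontal rule inside the body matches the old separator.
BODY = "Step one.\n\n---\n\nStep two."

old_result = extract(embed(PREAMBLE, BODY, OLD_SEPARATOR), OLD_SEPARATOR)
new_result = extract(embed(PREAMBLE, BODY, f"\n{SENTINEL}\n"), SENTINEL)
```

With the old separator, the body is cut at its own horizontal rule; an HTML comment is far less likely to appear in a skill body, so the round trip preserves the text.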
Summary

Three bugs in `evolution/skills/evolve_skill.py` that together cause every optimization run to (a) never actually use GEPA, (b) always reject the evolved skill as invalid, and (c) accumulate 0.0-scored trials from rate limit errors.

Bug 1: GEPA called with wrong keyword argument

`dspy.GEPA(metric=..., max_steps=N)` raises `TypeError: GEPA.__init__() got an unexpected keyword argument 'max_steps'` on every run in DSPy 3.1.3. The optimizer silently falls back to MIPROv2. The correct parameter is `max_metric_calls`.

Bug 2: Constraint validator receives body-only text

`_check_skill_structure` checks that text starts with `---` and contains `name:` / `description:`. Both call sites passed `skill["body"]` (the markdown after frontmatter is stripped), which never has frontmatter. The constraint therefore always fails and rejects every evolved skill, even valid ones, writing to `evolved_FAILED.md` instead of deploying.

Fix: pass `skill["raw"]` for the baseline check and `evolved_full` (the reassembled skill with frontmatter, already computed at line 185) for the evolved check.

Bug 3: Concurrent calls saturate OpenAI free-tier rate limits

DSPy's parallelizer fires all evaluation calls simultaneously. On a free-tier OpenAI account (3 RPM / 60 K TPM for `gpt-4.1-mini`), the majority of calls in each trial fail with `RateLimitError`. DSPy records those trials as 0.0, poisoning MIPROv2's Bayesian optimizer (trials 4, 5, and 6 all scored 0.0 in the observed run despite the underlying prompt being reasonable).

Fix: add `num_retries=8` to the LM so litellm retries transient limit hits, and `num_threads=1` to MIPROv2 to serialize evaluation and eliminate concurrent burst pressure.

Test plan

- `python -m evolution.skills.evolve_skill --skill github-code-review --dry-run`: setup validates without errors
- `python -m evolution.skills.evolve_skill --skill github-code-review --iterations 3 --eval-source synthetic`: baseline constraint check shows all ✓ including `skill_structure`; evolved skill saves to `output/<skill>/<timestamp>/` instead of `evolved_FAILED.md`; no 0.0-scored MIPROv2 trials from rate limit errors

🤖 Generated with Claude Code
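To illustrate why retries matter for the third bug, the sketch below simulates an evaluation call that hits a rate limit a few times before succeeding. Everything here is a stand-in: `RateLimitError`, `call_with_retries`, `score_example`, and `make_flaky` are hypothetical names, and the exponential backoff schedule is an assumption rather than a description of how litellm or DSPy actually retry.

```python
import time
from typing import Callable


class RateLimitError(Exception):
    """Stand-in for the provider's rate limit exception (hypothetical)."""


def call_with_retries(
    fn: Callable[[], float],
    num_retries: int,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Call fn serially, backing off exponentially after each rate limit hit."""
    for attempt in range(num_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == num_retries:
                raise  # retries exhausted, let the caller decide
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")


def score_example(
    fn: Callable[[], float],
    num_retries: int,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Mimic an evaluator that records 0.0 when the call ultimately fails."""
    try:
        return call_with_retries(fn, num_retries, sleep=sleep)
    except RateLimitError:
        return 0.0


def make_flaky(failures: int, score: float) -> Callable[[], float]:
    """Build a call that is rate limited `failures` times, then returns `score`."""
    remaining = {"count": failures}

    def fn() -> float:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise RateLimitError("429 Too Many Requests")
        return score

    return fn


delays = []  # record requested sleeps instead of actually waiting
without_retries = score_example(make_flaky(3, 0.8), num_retries=0, sleep=delays.append)
with_retries = score_example(make_flaky(3, 0.8), num_retries=8, sleep=delays.append)
```

In this simulation the same underlying prompt scores 0.0 without retries and 0.8 with them, which is the distortion the optimizer would otherwise learn from. At 3 RPM the sustainable rate is one request every 20 seconds, so serializing calls removes the burst but still relies on the backoff stretching long enough for the window to clear.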