fix: correct GEPA arg, constraint input, and rate limit handling in evolve_skill by MwC-Trexx · Pull Request #35 · NousResearch/hermes-agent-self-evolution

MwC-Trexx · 2026-04-23T12:40:55Z

Summary

Three bugs in evolution/skills/evolve_skill.py that together cause every optimization run to (a) never actually use GEPA, (b) always reject the evolved skill as invalid, and (c) accumulate 0.0-scored trials from rate limit errors.

Bug 1 — GEPA called with wrong keyword argument

dspy.GEPA(metric=..., max_steps=N) raises TypeError: GEPA.__init__() got an unexpected keyword argument 'max_steps' on every run in DSPy 3.1.3. The optimizer silently falls back to MIPROv2. The correct parameter is max_metric_calls.

# Before
optimizer = dspy.GEPA(metric=skill_fitness_metric, max_steps=iterations)
# After
optimizer = dspy.GEPA(metric=skill_fitness_metric, max_metric_calls=iterations)
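
A keyword mismatch like this is easy to miss when the resulting TypeError is absorbed by a fallback path. One possible guard is to inspect the constructor signature before calling it. The sketch below is illustrative only: `accepts_kwarg` and the stand-in class `FakeOptimizer` are hypothetical names, not part of DSPy or this repository. The stand-in merely mimics a constructor that takes `max_metric_calls` but not `max_steps`.

```python
import inspect


def accepts_kwarg(cls, name):
    """Return True if cls.__init__ can be called with `name` as a keyword."""
    params = inspect.signature(cls.__init__).parameters
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs catch-all accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


class FakeOptimizer:
    """Hypothetical stand-in for an optimizer class; NOT the DSPy API."""

    def __init__(self, metric, max_metric_calls=None):
        self.metric = metric
        self.max_metric_calls = max_metric_calls


has_new_name = accepts_kwarg(FakeOptimizer, "max_metric_calls")
has_old_name = accepts_kwarg(FakeOptimizer, "max_steps")
```

A check of this kind could run once at startup and fail loudly, rather than letting the run continue on a different optimizer.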

Bug 2 — Constraint validator receives body-only text

_check_skill_structure checks that text starts with --- and contains name:/description:. Both call sites passed body-only text (skill["body"] for the baseline check, evolved_body for the evolved check), which is the markdown left after the frontmatter is stripped and so never contains frontmatter. The constraint therefore always fails and rejects every evolved skill, including valid ones, writing to evolved_FAILED.md instead of deploying.

Fix: pass skill["raw"] for the baseline check and evolved_full (the reassembled skill with frontmatter, already computed at line 185) for the evolved check.

# Before
validator.validate_all(skill["body"], "skill")
validator.validate_all(evolved_body, "skill", baseline_text=skill["body"])
# After
validator.validate_all(skill["raw"], "skill")
validator.validate_all(evolved_full, "skill", baseline_text=skill["raw"])
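
To make the failure mode concrete, the sketch below re-creates the check as it is described above (text starts with `---` and contains `name:` and `description:`). The function `check_skill_structure` and the sample skill text are assumptions written for illustration; the real `_check_skill_structure` may differ in detail.

```python
def check_skill_structure(text):
    """Hypothetical re-creation of the structure check described in this PR."""
    return (
        text.startswith("---")
        and "name:" in text
        and "description:" in text
    )


# Made-up skill file: YAML frontmatter followed by a markdown body.
raw = (
    "---\n"
    "name: github-code-review\n"
    "description: Review pull requests\n"
    "---\n"
    "\n"
    "# Steps\n"
    "Read the diff.\n"
)

# Body-only text: everything after the closing frontmatter delimiter.
body = raw.split("---", 2)[2].lstrip("\n")

raw_ok = check_skill_structure(raw)    # full file, frontmatter included
body_ok = check_skill_structure(body)  # body only, frontmatter stripped
```

With this input the full text passes and the body-only text fails, which matches the behavior reported for the two call sites.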

Bug 3 — Concurrent calls saturate OpenAI free-tier rate limits

DSPy's parallelizer fires all evaluation calls simultaneously. On a free-tier OpenAI account (3 RPM / 60 K TPM for gpt-4.1-mini), the majority of calls in each trial fail with RateLimitError. DSPy records those trials as 0.0, poisoning MIPROv2's Bayesian optimizer (trials 4, 5, and 6 all scored 0.0 in the observed run despite the underlying prompt being reasonable).

Fix: add num_retries=8 to the LM so litellm retries transient limit hits, and num_threads=1 to MIPROv2 to serialize evaluation and eliminate concurrent burst pressure.

# Before
lm = dspy.LM(eval_model)
optimizer = dspy.MIPROv2(metric=skill_fitness_metric, auto="light")
# After
lm = dspy.LM(eval_model, num_retries=8)
optimizer = dspy.MIPROv2(metric=skill_fitness_metric, auto="light", num_threads=1)
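
The effect of retrying on a rate limit error can be sketched in isolation. The code below is not how litellm implements `num_retries`; it is a minimal exponential-backoff loop under stated assumptions, with a stand-in `RateLimitError` class and a hypothetical `call_with_retries` helper. The `sleep` parameter is injectable so the example runs without waiting.

```python
import time


class RateLimitError(Exception):
    """Stand-in for a provider's rate limit exception (hypothetical)."""


def call_with_retries(fn, num_retries=8, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff."""
    for attempt in range(num_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == num_retries:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


# Simulated evaluation call that is rate limited twice, then succeeds.
attempts = {"n": 0}
delays = []


def flaky_eval():
    attempts["n"] += 1
    if attempts["n"] <= 2:
        raise RateLimitError("simulated rate limit hit")
    return 0.8


# delays.append replaces time.sleep, recording the waits instead of sleeping.
score = call_with_retries(flaky_eval, sleep=delays.append)
```

Here the call that would otherwise have failed returns its real score after two waits. Running one call at a time, as `num_threads=1` does, keeps several such retry loops from competing for the same per-minute budget.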

Test plan

python -m evolution.skills.evolve_skill --skill github-code-review --dry-run — setup validates without errors
python -m evolution.skills.evolve_skill --skill github-code-review --iterations 3 --eval-source synthetic — baseline constraint check shows all ✓ including skill_structure; evolved skill saves to output/<skill>/<timestamp>/ instead of evolved_FAILED.md; no 0.0-scored MIPROv2 trials from rate limit errors
With a valid DSPy GEPA install, the GEPA fallback message no longer appears

🤖 Generated with Claude Code

…volve_skill

Three bugs fixed in evolution/skills/evolve_skill.py:

1. GEPA wrong keyword argument
   `max_steps` does not exist in DSPy 3.1.3 GEPA. Replace with `max_metric_calls`, which is the correct parameter name. Previously caused TypeError on every run, silently falling back to MIPROv2.

2. Constraint validator received body-only text
   `_check_skill_structure` looks for YAML frontmatter (`---`, `name:`, `description:`). Both call sites passed `skill["body"]` (text after frontmatter is stripped), so the check always failed. Fix: pass `skill["raw"]` for the baseline check and `evolved_full` (the reassembled skill with frontmatter) for the evolved check.

3. OpenAI free-tier rate limits corrupted MIPROv2 trials
   DSPy's parallelizer fires all evaluation calls concurrently, hitting the 3 RPM / 60 K TPM free-tier cap. Failed requests score 0.0, poisoning the Bayesian optimizer (trials 4-6 all scored 0.0 in the observed run). Fix: add `num_retries=8` to the LM so litellm retries transient limit errors, and `num_threads=1` to MIPROv2 to serialize evaluation calls.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…sResearch#24, NousResearch#26, NousResearch#35)

- PR NousResearch#24: skill_module.py stores skill body as InputField → signature.instructions
- _load_skill_body() splits frontmatter from body, body becomes instruction
- _extract_evolved_instructions() extracts from signature.instructions (not wrapper)
- constraint_validator.py: body/frontmatter separation, validate body has substance
- dataset_builder.py: robust JSON parsing with 6 fallback strategies
- PR NousResearch#26: GEPA wiring fix, reflection_lm passed to GEPA
- PR NousResearch#35: constraint validator for GEPA args, max_metric_calls not mixed with auto

Note: GEPA still falls back to MIPROv2 due to DSPy 3.2.0 API: max_metric_calls conflicts with auto='light'. Use max_metric_calls alone (fixed).

…traint validator, JSON parsing robustness

Combined patch applying upstream PRs NousResearch#24/NousResearch#26/NousResearch#35:

- skill_module.py: embed skill body in signature instructions via HTML sentinel
- evolve_skill.py: HTML sentinel extraction with fallback, GEPA max_metric_calls fix, improved messaging
- constraints.py: validate YAML frontmatter + substantive body content separately
- dataset_builder.py: 6-strategy JSON parser for LLM output resilience
- sentinel collision: replaced \n\n---\n\n (appears in skill bodies) with

steezkelly mentioned this pull request Apr 25, 2026

fix: ghost-improvement extraction bug + GEPA API + constraint validator + JSON robustness #39

Closed

seilk mentioned this pull request Apr 26, 2026

fix: install missing optuna dep and add LLM request timeout #41

Open

MwC-Trexx wants to merge 1 commit into NousResearch:main from MwC-Trexx:fix/evolve-skill-gepa-constraints-ratelimit
