
fix: correct GEPA arg, constraint input, and rate limit handling in evolve_skill #35

Open
MwC-Trexx wants to merge 1 commit into NousResearch:main from MwC-Trexx:fix/evolve-skill-gepa-constraints-ratelimit


@MwC-Trexx

Summary

Three bugs in evolution/skills/evolve_skill.py that together cause every optimization run to (a) never actually use GEPA, (b) always reject the evolved skill as invalid, and (c) accumulate 0.0-scored trials from rate limit errors.

Bug 1 — GEPA called with wrong keyword argument

`dspy.GEPA(metric=..., max_steps=N)` raises `TypeError: GEPA.__init__() got an unexpected keyword argument 'max_steps'` on every run under DSPy 3.1.3, so the optimizer silently falls back to MIPROv2. The correct parameter is `max_metric_calls`.

```python
# Before
optimizer = dspy.GEPA(metric=skill_fitness_metric, max_steps=iterations)
# After
optimizer = dspy.GEPA(metric=skill_fitness_metric, max_metric_calls=iterations)
```
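The real damage comes from the fallback swallowing a keyword typo. Below is a minimal sketch of failing loudly instead. `StubGEPA`, `StubMIPROv2`, `unknown_kwargs`, and `build_optimizer` are hypothetical stand-ins, not DSPy classes or code from this repository. The sketch checks keywords against the constructor with `inspect.signature` and falls back only when GEPA is genuinely unavailable. The stub's "exactly one budget" rule is an assumption modeled on the later commit note that `max_metric_calls` conflicts with `auto`.

```python
import inspect
import logging

log = logging.getLogger("evolve_skill_sketch")


class StubGEPA:
    """Hypothetical stand-in for dspy.GEPA; only the budget keywords are modeled."""

    def __init__(self, metric, *, auto=None, max_full_evals=None, max_metric_calls=None):
        # Assumption modeled on the later commit note: exactly one budget may be set.
        budgets = (auto, max_full_evals, max_metric_calls)
        if sum(b is not None for b in budgets) != 1:
            raise ValueError("set exactly one of auto, max_full_evals, max_metric_calls")
        self.metric = metric
        self.max_metric_calls = max_metric_calls


class StubMIPROv2:
    """Hypothetical stand-in for the MIPROv2 fallback."""

    def __init__(self, metric, auto="light"):
        self.metric = metric
        self.auto = auto


def unknown_kwargs(cls, kwargs):
    """Return the keyword names that cls.__init__ would reject."""
    params = inspect.signature(cls.__init__).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []  # a **kwargs catch-all accepts any name, so nothing can be flagged
    return sorted(k for k in kwargs if k not in params)


def build_optimizer(metric, gepa_cls, fallback_cls, **gepa_kwargs):
    """Hypothetical helper: fall back only when GEPA is genuinely unavailable."""
    if gepa_cls is None:
        log.warning("GEPA unavailable, falling back to %s", fallback_cls.__name__)
        return fallback_cls(metric=metric)
    bad = unknown_kwargs(gepa_cls, gepa_kwargs)
    if bad:
        # A misspelled keyword is a code bug, not a missing feature: fail loudly.
        raise TypeError(f"{gepa_cls.__name__} does not accept: {', '.join(bad)}")
    return gepa_cls(metric=metric, **gepa_kwargs)


def fitness(example, pred, trace=None):
    return 1.0  # placeholder metric for the demo


opt = build_optimizer(fitness, StubGEPA, StubMIPROv2, max_metric_calls=3)
print(type(opt).__name__)  # StubGEPA
try:
    build_optimizer(fitness, StubGEPA, StubMIPROv2, max_steps=3)
except TypeError as err:
    print(err)  # StubGEPA does not accept: max_steps
```

The real `dspy.GEPA` takes more arguments than this stub, such as the `reflection_lm` mentioned in the referencing commit. Any signature check should therefore run against the installed DSPy version rather than a hard-coded list.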

Bug 2 — Constraint validator receives body-only text

`_check_skill_structure` checks that the text starts with `---` and contains `name:` and `description:`. Both call sites passed `skill["body"]`, which holds only the markdown left after the frontmatter is stripped. That text never has frontmatter, so the constraint always fails. It rejects every evolved skill, even valid ones, and writes it to `evolved_FAILED.md` instead of deploying it.

Fix: pass skill["raw"] for the baseline check and evolved_full (the reassembled skill with frontmatter, already computed at line 185) for the evolved check.

```python
# Before
validator.validate_all(skill["body"], "skill")
validator.validate_all(evolved_body, "skill", baseline_text=skill["body"])
# After
validator.validate_all(skill["raw"], "skill")
validator.validate_all(evolved_full, "skill", baseline_text=skill["raw"])
```
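To see why body-only input can never pass, here is a minimal re-creation of the structure rule as described above: the text must start with `---` and contain `name:` and `description:`. `split_frontmatter` and `check_skill_structure` are hypothetical stand-ins, not the repository's loader or `_check_skill_structure`. The `evolved_full` reassembly format shown here is likewise an assumption.

```python
SKILL = """---
name: github-code-review
description: Review pull requests for correctness and style.
---
# GitHub Code Review

Read the diff before commenting.
"""


def split_frontmatter(raw: str) -> tuple[str, str]:
    """Split raw skill text into (frontmatter, body) at the closing '---' line."""
    if not raw.startswith("---"):
        return "", raw
    _, _, rest = raw.partition("---\n")  # drop the opening fence
    front, sep, body = rest.partition("\n---\n")  # split at the closing fence
    if not sep:
        return "", raw
    return front, body


def check_skill_structure(text: str) -> bool:
    """Hypothetical re-creation of the structure rule described in the PR."""
    return text.startswith("---") and "name:" in text and "description:" in text


front, body = split_frontmatter(SKILL)
print(check_skill_structure(body))  # False: frontmatter already stripped
print(check_skill_structure(SKILL))  # True: raw text keeps the frontmatter

# Hypothetical reassembly of an evolved body with the original frontmatter.
evolved_body = body + "\nCheck that tests pass before approving.\n"
evolved_full = f"---\n{front}\n---\n{evolved_body}"
print(check_skill_structure(evolved_full))  # True
```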

Bug 3 — Concurrent calls saturate OpenAI free-tier rate limits

DSPy's parallelizer fires all evaluation calls simultaneously. On a free-tier OpenAI account (3 RPM / 60K TPM for gpt-4.1-mini), most calls in each trial fail with `RateLimitError`. DSPy scores those trials as 0.0, which poisons MIPROv2's Bayesian optimizer: trials 4, 5, and 6 all scored 0.0 in the observed run even though the underlying prompt was reasonable.

Fix: add `num_retries=8` to the LM so litellm retries transient rate-limit errors, and add `num_threads=1` to MIPROv2 to serialize evaluation and eliminate bursts of concurrent requests.

```python
# Before
lm = dspy.LM(eval_model)
optimizer = dspy.MIPROv2(metric=skill_fitness_metric, auto="light")
# After
lm = dspy.LM(eval_model, num_retries=8)
optimizer = dspy.MIPROv2(metric=skill_fitness_metric, auto="light", num_threads=1)
```
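The retry half of this fix delegates to litellm, whose exact backoff policy is not described here. As a generic illustration only, and not litellm's implementation, the sketch below shows a serialized evaluation loop with capped, jittered exponential backoff. `RateLimitError`, `call_with_retries`, and `evaluate_serially` are hypothetical names. `sleep` is injectable so the demo runs instantly.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 error."""


def call_with_retries(fn, *, num_retries=8, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fn(); on RateLimitError, back off exponentially (capped, jittered) and retry."""
    for attempt in range(num_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == num_retries:
                raise  # retries exhausted: re-raise instead of returning a score
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter de-synchronizes retries


def evaluate_serially(examples, score_fn, **retry_kwargs):
    """Score one example at a time, similar in effect to num_threads=1."""
    return [call_with_retries(lambda ex=ex: score_fn(ex), **retry_kwargs) for ex in examples]


remaining_429s = [2]


def flaky_score(example):
    """Demo scorer that is rate-limited twice before succeeding."""
    if remaining_429s[0] > 0:
        remaining_429s[0] -= 1
        raise RateLimitError("429 Too Many Requests")
    return float(len(example))


waits = []
scores = evaluate_serially(["ab", "abc"], flaky_score, sleep=waits.append)
print(scores)  # [2.0, 3.0]
print(len(waits))  # 2
```

Serializing has a cost. At 3 RPM, N evaluation calls take at least N/3 minutes, which is the price of not recording rate-limit failures as scores.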

Test plan

  • python -m evolution.skills.evolve_skill --skill github-code-review --dry-run — setup validates without errors
  • python -m evolution.skills.evolve_skill --skill github-code-review --iterations 3 --eval-source synthetic — baseline constraint check passes every constraint, including skill_structure; evolved skill saves to output/<skill>/<timestamp>/ instead of evolved_FAILED.md; no MIPROv2 trials score 0.0 because of rate limit errors
  • With a valid DSPy GEPA install, the GEPA fallback message no longer appears

🤖 Generated with Claude Code

…volve_skill

Three bugs fixed in evolution/skills/evolve_skill.py:

1. GEPA wrong keyword argument
   `max_steps` does not exist in DSPy 3.1.3 GEPA. Replace with
   `max_metric_calls`, which is the correct parameter name. It previously
   raised a TypeError on every run, so the optimizer silently fell back
   to MIPROv2.

2. Constraint validator received body-only text
   `_check_skill_structure` looks for YAML frontmatter (`---`, `name:`,
   `description:`). Both call sites passed `skill["body"]` (text after
   frontmatter is stripped), so the check always failed. Fix: pass
   `skill["raw"]` for the baseline check and `evolved_full` (the
   reassembled skill with frontmatter) for the evolved check.

3. OpenAI free-tier rate limits corrupted MIPROv2 trials
   DSPy's parallelizer fires all evaluation calls concurrently, hitting
   the 3 RPM / 60 K TPM free-tier cap. Failed requests score 0.0,
   poisoning the Bayesian optimizer (trials 4-6 all scored 0.0 in the
   observed run). Fix: add `num_retries=8` to the LM so litellm retries
   transient limit errors, and `num_threads=1` to MIPROv2 to serialize
   evaluation calls.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
steezkelly added a commit to steezkelly/hermes-agent-self-evolution that referenced this pull request Apr 25, 2026
…sResearch#24, NousResearch#26, NousResearch#35)

- PR NousResearch#24: skill_module.py stores skill body as InputField → signature.instructions
  - _load_skill_body() splits frontmatter from body, body becomes instruction
  - _extract_evolved_instructions() extracts from signature.instructions (not wrapper)
  - constraint_validator.py: body/frontmatter separation — validate body has substance
  - dataset_builder.py: robust JSON parsing with 6 fallback strategies

- PR NousResearch#26: GEPA wiring fix — reflection_lm passed to GEPA

- PR NousResearch#35: constraint validator for GEPA args, max_metric_calls not mixed with auto

Note: GEPA still falls back to MIPROv2 due to DSPy 3.2.0 API — max_metric_calls
conflicts with auto='light'. Use max_metric_calls alone (fixed).
steezkelly added a commit to steezkelly/hermes-agent-self-evolution that referenced this pull request Apr 25, 2026
…traint validator, JSON parsing robustness

Combined patch applying upstream PRs NousResearch#24/NousResearch#26/NousResearch#35:
- skill_module.py: embed skill body in signature instructions via HTML sentinel
- evolve_skill.py: HTML sentinel extraction with fallback, GEPA max_metric_calls fix, improved messaging
- constraints.py: validate YAML frontmatter + substantive body content separately
- dataset_builder.py: 6-strategy JSON parser for LLM output resilience
- sentinel collision: replaced \n\n---\n\n (appears in skill bodies) with <!-- ___SKILL_EVOLUTION_SENTINEL___ -->
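The sentinel change is easy to motivate. `\n\n---\n\n` is also a Markdown thematic break, so a skill body containing a horizontal rule makes the split ambiguous. Below is a minimal sketch, assuming the sentinel separates a wrapper preamble from the embedded skill body. `embed`, `extract`, and the sample strings are hypothetical, not the fork's code.

```python
OLD_SEP = "\n\n---\n\n"
SENTINEL = "<!-- ___SKILL_EVOLUTION_SENTINEL___ -->"

WRAPPER = "Follow the skill below."  # hypothetical preamble
BODY = "# Review steps\n\nRead the diff.\n\n---\n\nThen leave comments."  # contains a Markdown rule


def embed(wrapper: str, body: str, sep: str) -> str:
    """Hypothetical: join a wrapper and the skill body with a separator."""
    return f"{wrapper}{sep}{body}"


def extract(instructions: str, sep: str) -> str:
    """Hypothetical: recover the body, requiring the separator to occur exactly once."""
    parts = instructions.split(sep)
    if len(parts) != 2:
        raise ValueError(f"separator found {len(parts) - 1} times; split is ambiguous")
    return parts[1]


print(extract(embed(WRAPPER, BODY, SENTINEL), SENTINEL) == BODY)  # True
try:
    extract(embed(WRAPPER, BODY, OLD_SEP), OLD_SEP)
except ValueError as err:
    print(err)  # separator found 2 times; split is ambiguous
```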