fix(dsp): correctly extract instruction from signature in GEPA optimizer #466

monotykamary · 2025-12-08T19:27:51Z

What kind of change does this PR introduce? Bug fix
What is the current behavior? (You can also link to an open issue here)

The GEPA optimizer accesses sig.instruction, which doesn't exist on AxSignature, so it always falls back to the default instruction 'Follow the task precisely. Be concise, correct, and consistent.' instead of using the user's signature description or custom instruction.

Fixes #463

What is the new behavior (if this is a feature change)?

The getBaseInstruction() method now correctly:

1. First checks for a custom instruction set via program.getInstruction()
2. Falls back to the signature description via sig.getDescription()
3. Only uses the default fallback if neither is available
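
The fallback order above can be sketched roughly as follows. This is a minimal illustration, not the code in gepa.ts: the `SignatureLike` and `ProgramLike` interfaces and the `resolveBaseInstruction` helper are hypothetical stand-ins, and treating a blank string as "not set" is an assumption. Only `getInstruction()`, `getDescription()`, and the default string are taken from this PR description.

```typescript
// Hypothetical minimal shapes for illustration; the real AxGen and
// AxSignature types are richer than this.
interface SignatureLike {
  getDescription(): string | undefined;
}

interface ProgramLike {
  // Optional, since not every program is assumed to expose it.
  getInstruction?(): string | undefined;
  getSignature(): SignatureLike;
}

const DEFAULT_INSTRUCTION =
  'Follow the task precisely. Be concise, correct, and consistent.';

// Returns the first non-blank candidate in priority order:
// custom instruction, then signature description, then the default.
function resolveBaseInstruction(program: ProgramLike): string {
  const custom = program.getInstruction?.()?.trim();
  if (custom) return custom;

  const description = program.getSignature().getDescription()?.trim();
  if (description) return description;

  return DEFAULT_INSTRUCTION;
}
```

With only a signature description set, this helper returns the description; the default is reached only when both sources are missing or blank.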

Other information:

Changes made:

- Added customInstruction field and getInstruction() method to AxPromptTemplate
- Added getInstruction() method to AxGen
- Fixed getBaseInstruction() in both gepa.ts and gepaFlow.ts
- Added unit tests to verify instruction extraction works correctly

Tested with real OpenAI API calls to confirm the fix works end-to-end.

The GEPA optimizer was accessing `sig.instruction`, which doesn't exist on AxSignature, causing it to always fall back to the default instruction.

Changes:
- Add `customInstruction` field and `getInstruction()` to AxPromptTemplate
- Add `getInstruction()` method to AxGen
- Fix `getBaseInstruction()` in gepa.ts and gepaFlow.ts to:
  1. First check for custom instruction via `program.getInstruction()`
  2. Fall back to signature description via `sig.getDescription()`
  3. Only use default fallback if neither is available
- Add tests to verify instruction extraction works correctly

Fixes ax-llm#463

MrSpreadsheet · 2025-12-09T09:07:37Z

> 2. Falls back to signature description via sig.getDescription()

That doesn't seem right. The description is only one field of the signature, whereas signature.toString() may return the full instruction with inputs and outputs. I'll be able to check the PR behavior later, but I fear this moves things in the right direction without being fully correct according to GEPA.

monotykamary · 2025-12-09T10:09:45Z

Thanks for the review! I looked into the original GEPA implementation and DSPy's integration to verify our approach is correct.

How DSPy's GEPA extracts instructions

In DSPy's _build_seed_candidate method (gepa.py#L380):

seed_candidate[name] = pred.signature.instructions

DSPy uses signature.instructions which returns only the instruction text (stored in __doc__), NOT the full signature with field definitions.

How DSPy's GEPA applies evolved instructions

In the DSPy adapter (gepa_utils.py#L103-L106):

def build_program(self, candidate: dict[str, str]):
    new_prog = self.student.deepcopy()
    for name, pred in new_prog.named_predictors():
        if name in candidate:
            pred.signature = pred.signature.with_instructions(candidate[name])
    return new_prog

The with_instructions() method creates a new signature with the same fields but updated instructions only.

DSPy's Signature model

From DSPy's signature implementation (signature.py):

signature.instructions → task description text (stored in __doc__)
signature.fields → input/output field definitions (separate, never evolved)
with_instructions(new_text) → creates new signature with same fields but new instructions
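
In TypeScript terms, that separation could look roughly like the sketch below. `SketchSignature`, `FieldDef`, and `withInstructions` are hypothetical names made up for illustration; they are neither DSPy's nor ax's API. The point is only that the instruction text is replaced while the field definitions are carried over untouched.

```typescript
// Hypothetical model of a signature: instruction text and field
// definitions are stored separately.
interface FieldDef {
  name: string;
  type: string;
}

interface SketchSignature {
  readonly instructions: string;
  readonly inputs: readonly FieldDef[];
  readonly outputs: readonly FieldDef[];
}

// Returns a new signature with the same fields and new instruction text.
// The original signature object is not mutated.
function withInstructions(
  sig: SketchSignature,
  newText: string
): SketchSignature {
  return { ...sig, instructions: newText };
}

const baseSignature: SketchSignature = {
  instructions: 'Classify emails by urgency',
  inputs: [{ name: 'emailText', type: 'string' }],
  outputs: [{ name: 'priority', type: 'class' }],
};

const evolvedSignature = withInstructions(
  baseSignature,
  'Classify emails carefully.'
);
```

Here `evolvedSignature` shares the `inputs` and `outputs` arrays of `baseSignature`, so an optimizer that only ever calls `withInstructions` cannot alter the field schema.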

Mapping to ax

| DSPy | ax | What it returns |
| --- | --- | --- |
| `signature.instructions` | `sig.getDescription()` | Task description only: `"Classify emails by urgency"` |
| `str(signature)` | `sig.toString()` | Full signature: `"Classify emails..." emailText:string -> priority:class` |

Why `sig.toString()` would be incorrect

If we used sig.toString(), GEPA would try to evolve field type definitions like emailText:string -> priority:class "high, normal, low". The LLM might output:

"Classify emails carefully" emailText:string -> priority:class "high..."

This is nonsensical - field type definitions are structural metadata, not semantic text to evolve.

What actually happens with our fix

1. getBaseInstruction() extracts "Classify emails by urgency level" via sig.getDescription()
2. GEPA evolves it to "Classify emails carefully. Look for URGENT, CRITICAL keywords for high priority."
3. setInstruction(evolved) applies only the evolved text

Field definitions remain intact in AxPromptTemplate.buildLegacyPrompt() which renders them separately as:

## Input Fields
- emailText (string): Email content

## Output Fields
- priority (class): high | normal | low

Our implementation correctly mirrors the original DSPy GEPA behavior where only instruction text is evolved, not field schemas.
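
As a rough sketch of that separation, the snippet below renders a prompt from an instruction plus field lists and swaps only the instruction. `PromptParts`, `renderFields`, and `renderPrompt` are hypothetical helpers written for this comment and are not how AxPromptTemplate.buildLegacyPrompt() is implemented; the field layout simply imitates the example output above.

```typescript
// Hypothetical prompt model: the instruction is the only evolvable part.
interface PromptField {
  name: string;
  type: string;
  detail: string;
}

interface PromptParts {
  instruction: string;
  inputs: PromptField[];
  outputs: PromptField[];
}

// Renders one field section, for example "## Input Fields".
function renderFields(title: string, fields: PromptField[]): string {
  const lines = fields.map((f) => `- ${f.name} (${f.type}): ${f.detail}`);
  return [`## ${title}`, ...lines].join('\n');
}

// Instruction first, then field sections built from the schema alone.
function renderPrompt(parts: PromptParts): string {
  return [
    parts.instruction,
    renderFields('Input Fields', parts.inputs),
    renderFields('Output Fields', parts.outputs),
  ].join('\n\n');
}

const seedParts: PromptParts = {
  instruction: 'Classify emails by urgency level',
  inputs: [{ name: 'emailText', type: 'string', detail: 'Email content' }],
  outputs: [{ name: 'priority', type: 'class', detail: 'high | normal | low' }],
};

// Only the instruction changes; the field arrays are reused as-is.
const evolvedParts: PromptParts = {
  ...seedParts,
  instruction:
    'Classify emails carefully. Look for URGENT, CRITICAL keywords for high priority.',
};
```

Rendering `seedParts` and `evolvedParts` gives prompts that differ only in the first paragraph, which is the property the fix relies on.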

monotykamary mentioned this pull request Dec 8, 2025

GEPA optimizer broken and produces default prompt as instruction #463

Open
