Merged
116 commits
6412a5d
feat(gepa): add tool description optimization for multi-agent systems
Ju-usc Oct 10, 2025
cf0be4f
style: fix ruff formatting (trailing whitespace)
Ju-usc Oct 10, 2025
aa53fe2
style: apply ruff formatting fixes
Ju-usc Oct 10, 2025
045c6cf
feat(gepa): implement tool-specific proposer for tool descriptions
Ju-usc Oct 10, 2025
c4f2041
docs(gepa): clean up multi-agent example code
Ju-usc Oct 10, 2025
260ca80
refactor(gepa): simplify tool reflective dataset with ReAct context r…
Ju-usc Oct 11, 2025
04f7e3d
fix(gepa): unify custom proposer routing for tools
Ju-usc Oct 12, 2025
f92e184
docs(gepa): clarify tool reflection prompt
Ju-usc Oct 12, 2025
7178869
test: streamline GEPA tool optimization tests
Ju-usc Oct 12, 2025
e34703b
fix(gepa): streamline tool proposer formatting
Ju-usc Oct 12, 2025
3f05311
test(gepa): drop legacy dummy tool fixture
Ju-usc Oct 12, 2025
4df9ce5
docs(gepa): add tool-specific reflection prompt and metric example
Ju-usc Oct 12, 2025
4296ccf
docs(gepa): fix implementation details with accurate code flow
Ju-usc Oct 13, 2025
ea1204a
docs(gepa): remove backward compatibility note
Ju-usc Oct 13, 2025
48d5cd6
docs(gepa): improve usage examples with optimization visualization
Ju-usc Oct 13, 2025
548d9b6
docs(gepa): add design rationale comments for tool context sharing
Ju-usc Oct 13, 2025
e61d0a1
docs(gepa): add tool optimization links to overview and parameter docs
Ju-usc Oct 13, 2025
5c95412
docs(gepa): refine tool optimization scenarios and remove implementat…
Ju-usc Oct 13, 2025
19d7717
docs(gepa): clarify future work section in code comments
Ju-usc Oct 13, 2025
9ce5fe4
refactor(gepa): unify ReAct optimization as single module
Ju-usc Oct 24, 2025
91331d0
test(gepa): add end-to-end ReAct module optimization test
Ju-usc Oct 24, 2025
3418b59
fix(gepa): enable arg description optimization for ReAct tools
Ju-usc Oct 24, 2025
b26d39a
chore: remove legacy test_gepa_tool_optimization.py
Ju-usc Oct 24, 2025
2791b5c
fix: restore accidentally removed score mismatch warning
Ju-usc Oct 24, 2025
8e63c62
test: update fixture after arg description optimization fix
Ju-usc Oct 25, 2025
7a9d2f3
fix(test): use JSON-based hashing for cross-version fixture stability
Ju-usc Oct 25, 2025
cd0de57
refactor(gepa): rename optimize_tool_descriptions to optimize_react_c…
Ju-usc Oct 26, 2025
67bb739
docs(gepa): improve 'What is optimize_react_components?' section
Ju-usc Oct 26, 2025
b3026a7
docs(gepa): replace outdated tool-specific prompt with actual ReAct o…
Ju-usc Oct 26, 2025
4e107aa
docs(gepa): simplify 'How It Works' section with accurate routing beh…
Ju-usc Oct 26, 2025
78547e7
docs(gepa): remove outdated Implementation Details section
Ju-usc Oct 26, 2025
7fa829b
docs(gepa): replace theoretical scenarios with real user pain points
Ju-usc Oct 26, 2025
da0e7bc
docs(gepa): fix usage examples reference to match updated scenarios
Ju-usc Oct 26, 2025
e51158d
docs(gepa): update inspect section to show all 4 ReAct components wit…
Ju-usc Oct 26, 2025
776ab9b
docs(gepa): rewrite Section 8 with accurate custom proposer behavior …
Ju-usc Oct 26, 2025
ec6bb7b
fix(gepa): fix top-level ReAct module lookup and remove tool name san…
Ju-usc Oct 27, 2025
b6cc67b
refactor(gepa): unify ReAct module key handling and use constant
Ju-usc Oct 28, 2025
1206f38
test(gepa): add ReAct module detection tests for nested structures
Ju-usc Oct 28, 2025
333cbbf
test(gepa): add comprehensive ReAct detection and reconstruction tests
Ju-usc Oct 28, 2025
a50552a
test(gepa): add reflective dataset tests for multi-agent trajectory v…
Ju-usc Oct 28, 2025
965b157
test(gepa): verify tool arg descriptions propagate to args schema
Ju-usc Oct 29, 2025
5ddc6d3
fix(gepa): propagate arg_desc updates to tool.args for prompt rendering
Ju-usc Oct 29, 2025
2269de5
test(gepa): remove fixture-based test and unused dependencies
Ju-usc Oct 29, 2025
17456f0
test(gepa): remove unused fixture file
Ju-usc Oct 29, 2025
c884c18
style: fix ruff linting issues (import formatting, whitespace, bare e…
Ju-usc Oct 31, 2025
82dee25
refactor(test): rename setup_spy_for_base_program to setup_capture_fo…
Ju-usc Oct 31, 2025
ca84b9d
docs(gepa): clarify why Tool.func uses placeholder lambda in proposer
Ju-usc Oct 31, 2025
2eb8986
refactor(gepa): make all ReAct components optional with None default …
Ju-usc Oct 31, 2025
9f37ac1
docs(gepa): clarify 'LM' as 'reflection LM' in comments for precision
Ju-usc Oct 31, 2025
bd4cdac
refactor(gepa): refine reflection prompt to guide concise, focused Re…
Ju-usc Oct 31, 2025
0ad4077
docs(gepa): revise ReAct metric example to be general and extensible
Ju-usc Oct 31, 2025
ef5563e
docs(gepa): replace custom proposer example with reference to ReActMo…
Ju-usc Oct 31, 2025
1b10b65
docs(gepa): make custom proposer section more approachable and clear
Ju-usc Oct 31, 2025
675a0cd
docs(gepa): update ReAct reflection prompt to match current implement…
Ju-usc Nov 1, 2025
4a4d209
feat(gepa): warn when ReAct modules detected but optimization disabled
Ju-usc Nov 3, 2025
d84842f
test(gepa): fix DummyLM configuration and remove exception swallowing
Ju-usc Nov 9, 2025
bb28f5f
test(gepa): add failing tests for generic tool optimization
Ju-usc Nov 9, 2025
a590e46
refactor(gepa): rename optimize_react_components to enable_tool_optim…
Ju-usc Nov 9, 2025
6aceaf5
refactor(gepa): extract nested function to private method
Ju-usc Nov 9, 2025
7a5bf05
feat(gepa): detect tool-using predictors via type checking
Ju-usc Nov 9, 2025
12b01ed
test(gepa): update ReAct tests for predictor-name-based keys
Ju-usc Nov 10, 2025
265896c
test(gepa): use explicit predictor keys in tool optimization tests
Ju-usc Nov 10, 2025
fe19dac
feat(gepa): extract tools from runtime traces
Ju-usc Nov 10, 2025
38dd7cb
feat(gepa): detect tool-using predictors at compile time
Ju-usc Nov 10, 2025
7f05a73
refactor(gepa): use predictor identity for ReAct detection
Ju-usc Nov 10, 2025
0a6016d
test(gepa): refactor ReAct tests to use dynamic predictor names
Ju-usc Nov 10, 2025
a635768
refactor(gepa): generalize proposer to support both ReAct and tool mo…
Ju-usc Nov 10, 2025
e35603a
refactor(gepa): eliminate create-delete pattern in base_program build
Ju-usc Nov 10, 2025
ecb3726
refactor(gepa): eliminate ReAct coupling in build_program
Ju-usc Nov 11, 2025
d3693c9
refactor(gepa): apply code cleanup principles consistently
Ju-usc Nov 11, 2025
a086646
refactor(gepa): unify config extraction patterns
Ju-usc Nov 11, 2025
0cecb75
refactor(gepa): remove verbose logs and consolidate comments
Ju-usc Nov 11, 2025
9592c50
docs(gepa): clarify ReAct trace workaround with TODO
Ju-usc Nov 12, 2025
76d7af5
test(gepa): remove deprecated ReAct-specific tests and refactor tool …
Ju-usc Nov 13, 2025
ac66e05
feat(gepa): add assertion for ReAct two-predictor design
Ju-usc Nov 13, 2025
3ec4ada
test(gepa): add DSPy ReAct design docs and improve test consistency
Ju-usc Nov 13, 2025
b679ba2
fix(test): remove trailing whitespace and extra blank lines
Ju-usc Nov 13, 2025
02aa151
refactor(gepa): clarify tool proposer output field descriptions
Ju-usc Nov 14, 2025
d37e433
Merge branch 'main' into feature/tool-description-optimization
Ju-usc Nov 14, 2025
d8b7c66
refactor(gepa): treat args as canonical for tool arg descriptions
Ju-usc Nov 14, 2025
f62a68e
refactor(gepa): tolerate missing arg descriptions when applying tool …
Ju-usc Nov 14, 2025
e031409
refactor(gepa): use args as sole source of tool arg descriptions
Ju-usc Nov 14, 2025
a133545
test(gepa): drop arg_desc expectations from tool optimization tests
Ju-usc Nov 14, 2025
b1e4f3d
refactor(gepa): refine reflection prompts for tool optimization
Ju-usc Nov 19, 2025
7f81e88
refactor(gepa): improve tool extraction robustness and observability
Ju-usc Nov 19, 2025
f267ccc
refactor(gepa): simplify initialization logic
Ju-usc Nov 19, 2025
28ceb70
refactor(gepa): remove ReAct trace workaround
Ju-usc Nov 19, 2025
d8275ef
chore(gepa): clean up whitespace and style changes from tool optimiza…
Ju-usc Nov 19, 2025
deeb010
chore(gepa): clean up whitespace and style changes from tool optimiza…
Ju-usc Nov 19, 2025
4bcc714
chore: restore .gitignore to match main
Ju-usc Nov 19, 2025
4b872d7
docs(gepa): document tool optimization flag in overview
Ju-usc Nov 19, 2025
5129586
docs(gepa): clarify enable_tool_optimization and custom proposers
Ju-usc Nov 19, 2025
ebe4221
docs(gepa): update tool module optimization prompt to match actual code
Ju-usc Nov 20, 2025
2133b0b
docs(gepa): update How Tool Optimization Works section
Ju-usc Nov 20, 2025
9c05b6a
docs(gepa): update When to Use Tool Optimization section
Ju-usc Nov 20, 2025
ec9241b
docs(gepa): update custom proposers section for tool optimization
Ju-usc Nov 20, 2025
46d8f5e
docs(gepa): update usage examples with correct tool patterns and inte…
Ju-usc Nov 20, 2025
5d33fc6
docs(gepa): remove redundant metrics section
Ju-usc Nov 20, 2025
b564029
refactor(gepa): use absolute import for ToolModuleProposer
Ju-usc Nov 20, 2025
13209f5
docs(gepa): update tool optimization doc link
Ju-usc Nov 20, 2025
09990a6
docs(gepa): replace eval() example with get_weather tool
Ju-usc Nov 29, 2025
33fc771
fix(gepa): change ReAct detection log from warning to info
Ju-usc Dec 2, 2025
fa72fc0
refactor(gepa): extract _propose_component_texts as private method
Ju-usc Dec 2, 2025
2a15e56
refactor(gepa): TODO out generic tool module optimization, keep ReAct…
Ju-usc Dec 2, 2025
59f23e5
refactor(gepa): remove generic tool module detection, keep ReAct only
Ju-usc Dec 2, 2025
68d7021
refactor(gepa): improve naming and extract tool update methods
Ju-usc Dec 2, 2025
d99ba1d
refactor(gepa): remove unused TOOL_MODULE_PREFIX and rename to tool_c…
Ju-usc Dec 2, 2025
3fd9a0a
refactor(gepa): rename ToolModuleProposer to ToolProposer
Ju-usc Dec 2, 2025
7d64e7a
docs(gepa): update tool optimization docs for ReAct-only support
Ju-usc Dec 2, 2025
4b3ee18
refactor(gepa): unify prefix to TOOL_MODULE_PREFIX for all tool-using…
Ju-usc Dec 2, 2025
3a5fb7f
docs(gepa): remove CustomAgent example, keep ReAct only
Ju-usc Dec 2, 2025
0e75d8c
docs(gepa): update enable_tool_optimization docstring for ReAct-only …
Ju-usc Dec 2, 2025
734fbdf
test(gepa): remove generic tool tests, keep ReAct-only tests
Ju-usc Dec 2, 2025
1fb15ba
refactor(gepa): use local ToolProposer variable, update docs for ReAc…
Ju-usc Dec 2, 2025
da2f6d0
docs(gepa): update tool optimization docs for ReAct-only support
Ju-usc Dec 2, 2025
a942246
some fixes
chenmoneygithub Dec 5, 2025
143 changes: 143 additions & 0 deletions docs/docs/api/optimizers/GEPA/GEPA_Advanced.md
@@ -443,3 +443,146 @@ gepa = dspy.GEPA(
    auto="medium"
)
```

## Tool Optimization

### What is enable_tool_optimization?

When `enable_tool_optimization=True`, GEPA jointly optimizes `dspy.ReAct` modules: predictor instructions, tool descriptions, and tool argument descriptions are updated together rather than tuned in isolation. This lets GEPA learn better patterns for when to call a tool and how to use it, from the same execution traces and feedback that drive core GEPA.

### Usage and constraints

- **Expose tools as `dspy.Tool` in signatures and examples.** GEPA only optimizes tools that are represented as `dspy.Tool` and actually passed as `dspy.Tool` objects into your modules.
- **Treat `Tool.name` as a stable identifier.** GEPA uses `Tool.name` as the key for attaching improved descriptions and argument descriptions. If you reuse the same `Tool.name` for different tools, they will share the same text updates.
- **Avoid custom tools named `"finish"`.** The built-in ReAct `"finish"` tool is reserved and excluded from optimization. Custom tools with the name `"finish"` are also not optimized.
- **Custom instruction proposers handle all modules and tool updates.** When you provide an `instruction_proposer`, GEPA routes every optimized module through your proposer instead of the built-in instruction proposer. If `enable_tool_optimization=True`, modules that call tools are still included, and your proposer is also responsible for updating their tool descriptions and argument descriptions.
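
The naming constraints above can be illustrated with a small sketch. It does not use GEPA or `dspy`: `FakeTool` and `apply_tool_updates` are hypothetical stand-ins that only model the behavior described in the list (updates keyed by name, `"finish"` skipped), not the library's actual implementation.

```python
from dataclasses import dataclass, field

RESERVED_TOOL_NAME = "finish"  # reserved ReAct tool name, per the constraints above


@dataclass
class FakeTool:
    """Hypothetical stand-in for dspy.Tool with only the fields this sketch needs."""
    name: str
    desc: str
    args: dict = field(default_factory=dict)


def apply_tool_updates(tools, improved_descs):
    """Attach improved descriptions by tool name, skipping the reserved name."""
    updated = []
    for tool in tools:
        if tool.name == RESERVED_TOOL_NAME:
            continue  # tools named "finish" are never optimized
        new_desc = improved_descs.get(tool.name)
        if new_desc is not None:
            tool.desc = new_desc
            updated.append(tool.name)
    return updated


tools = [
    FakeTool("search", "Search tool"),
    FakeTool("search", "Another search tool"),  # same name: receives the same update
    FakeTool("finish", "Custom finish tool"),   # reserved name: left untouched
]
updated = apply_tool_updates(
    tools, {"search": "Search the web for facts.", "finish": "ignored"}
)
```

Both tools named `search` end up with identical text, which is why distinct tools should carry distinct names.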

### Tool Module Optimization Prompt

GEPA uses `ToolProposer` to optimize ReAct modules when `enable_tool_optimization=True`. For each module, the proposer builds a dynamic signature from the base `GenerateImprovedToolModuleDescriptionsFromFeedback` signature shown below, then appends output fields for each tool description and each tool argument description in that module. For ReAct modules, the proposer also appends input and output fields for the extract instruction.

```python
class GenerateImprovedToolModuleDescriptionsFromFeedback(dspy.Signature):
    """I provided an assistant with predictor instructions and tool descriptions,
    but its performance needs improvement based on the examples_with_feedback below.

    Your task is to propose better predictor instructions, tool descriptions, and
    tool argument descriptions that address the issues shown in these examples.
    Focus on reinforcing patterns that clearly improve the assistant's performance
    on similar tasks, rather than rewriting everything from scratch unless necessary.
    These components are progressively optimized - refine only what needs to change.

    Analyze the examples_with_feedback to identify success and failure patterns,
    and write improved instructions and descriptions at their appropriate level
    of abstraction and/or specificity, so that each layer plays a clear,
    complementary role without unnecessary repetition or verbosity unless
    redundancy clearly helps the assistant's performance.
    """

    current_predictor_instruction = dspy.InputField(
        desc="Current instruction guiding the predictor"
    )
    current_tools = dspy.InputField(
        annotation=list[dspy.Tool],
        desc="Available tools with their complete schemas"
    )
    examples_with_feedback = dspy.InputField(
        desc="Execution examples with feedback showing successes and failures"
    )

    improved_predictor_instruction: str | None = dspy.OutputField(
        desc="Improved instruction for the predictor",
        default=None
    )

    # GEPA appends output fields dynamically for each tool and argument:
    # - improved_tool_{name}_desc with desc="Improved description of tool '{name}'"
    # - improved_tool_{name}_arg_{param}_desc with desc="Improved description of the argument '{param}' of tool '{name}'"
    # For ReAct modules, GEPA also appends:
    # - current_extract_instruction (input) with desc="Current instruction for extraction predictor"
    # - improved_extract_instruction (output) with desc="Improved instruction for extraction"
```

The reflection LM uses this dynamically-built signature to jointly propose updates across predictor instructions, tool descriptions, and argument descriptions based on execution feedback. Updates are coordinated rather than made in isolation: the LM sees all current components together and can selectively update any subset by returning new text, or return `None` to keep a component unchanged.
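
As a rough illustration of the naming scheme and the `None` convention described above, the sketch below derives the dynamic output field names for a set of tools and merges a proposal into the current text. The helpers `improved_field_names` and `merge_proposal`, and the dict shapes they take, are hypothetical; only the field-name patterns come from the signature comments.

```python
def improved_field_names(tools):
    """Return output field names for each tool and each of its arguments.

    `tools` maps a tool name to a list of argument names (hypothetical shape).
    """
    names = []
    for tool_name, arg_names in tools.items():
        names.append(f"improved_tool_{tool_name}_desc")
        for arg in arg_names:
            names.append(f"improved_tool_{tool_name}_arg_{arg}_desc")
    return names


def merge_proposal(current, proposal):
    """Keep the current text wherever the proposal returned None."""
    merged = dict(current)
    for key, new_text in proposal.items():
        if key in merged and new_text is not None:
            merged[key] = new_text
    return merged


fields = improved_field_names({"search_web": ["query"], "get_weather": ["city"]})

current = {
    "predictor_instruction": "Answer the question.",
    "tool_search_web_desc": "Search tool",
}
proposal = {
    "predictor_instruction": None,  # None: keep the component unchanged
    "tool_search_web_desc": "Search the web for up-to-date facts.",
}
merged = merge_proposal(current, proposal)
```

Here only the tool description changes; the predictor instruction is kept because the proposal returned `None` for it.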

### How Tool Optimization Works

When `enable_tool_optimization=True`, GEPA:

1. **Discovers ReAct modules** - Identifies `dspy.ReAct` modules and their associated tools
2. **Treats them as joint optimization units** - Instead of only optimizing predictor instructions, GEPA optimizes predictor instructions and tool descriptions together as a coordinated set; for ReAct this includes both the react and extract instructions
3. **Routes to specialized proposer** - Separates components by type and routes them appropriately:
    - **With custom `instruction_proposer`**: Your custom proposer receives both ReAct modules and plain predictors, and is responsible for updating all components
    - **With default proposer**: Plain predictors use the default instruction proposer; ReAct modules use `ToolProposer`, which employs the dynamic signature mechanism described above
4. **Optimizes jointly** - `ToolProposer` improves predictor instructions and tool descriptions together based on execution feedback, coordinating updates across all components rather than tuning them in isolation
5. **Applies updates** - Improved instructions update predictor signatures; improved tool descriptions and argument descriptions update all `dspy.Tool` objects with matching tool names throughout the program

Modules without tools (like `dspy.Predict` or `dspy.ChainOfThought`) continue using standard GEPA instruction-only optimization.
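
The routing rule in step 3 can be summarized in a few lines. This is a minimal sketch of the decision only: `route_components`, the `"react"`/`"predict"` kind labels, and the returned proposer labels are hypothetical and do not mirror GEPA's internal data structures.

```python
def route_components(components, custom_proposer=None):
    """Map each component name to the proposer that would handle it.

    `components` maps a component name to its kind: "react" or "predict".
    """
    routes = {}
    for name, kind in components.items():
        if custom_proposer is not None:
            routes[name] = "custom"        # a custom proposer handles everything
        elif kind == "react":
            routes[name] = "ToolProposer"  # joint instruction + tool optimization
        else:
            routes[name] = "default"       # instruction-only optimization
    return routes


components = {"agent": "react", "summarizer": "predict"}
default_routes = route_components(components)
custom_routes = route_components(components, custom_proposer=object())
```

With a custom proposer, every component is routed to it, which is why that proposer must also produce tool and argument description updates.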

### When to Use Tool Optimization

Enable `enable_tool_optimization=True` when tools are central to your program's behavior and you want GEPA to jointly optimize predictor instructions and tool descriptions together. Common scenarios:

1. **Wrong tool selection** - A predictor with `search` and `weather` tools keeps searching when it should check the weather, or vice versa. GEPA refines predictor instructions and tool descriptions to clarify when to use each tool.

2. **Underused tools** - A predictor responds "I don't know" without using available tools that could answer the question. GEPA improves predictor instructions to be more proactive about tool usage.

3. **Tool call loops** - An agent repeatedly calls `web_search` with similar queries instead of synthesizing information. GEPA improves instructions to encourage synthesis and tool descriptions to clarify when searches are sufficient.

4. **Extraction failures (ReAct)** - An agent executes tools correctly but fails to extract the final answer from the trajectory. GEPA improves the extract instruction to better identify and format answers from tool outputs.

5. **Multi-agent delegation** - A parent agent has delegation tools for specialized sub-agents but doesn't understand when to use each. GEPA optimizes instructions and tool descriptions across both parent and sub-agent modules for coherent delegation.

See the usage example below for tool-using programs.

### Usage Example

```python
import dspy

def search_web(query: str) -> str:
    return f"Search results for: {query}"

def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"The weather in {city} is sunny and 75°F"

# Create tools with basic descriptions
search_tool = dspy.Tool(search_web, name="search_web", desc="Search tool")
weather_tool = dspy.Tool(get_weather, name="get_weather", desc="Weather tool")

program = dspy.ReAct("question -> answer", tools=[search_tool, weather_tool])

# Enable tool optimization
# (my_metric, train_examples, and val_examples are assumed to be defined elsewhere)
gepa = dspy.GEPA(
    metric=my_metric,
    reflection_lm=dspy.LM(model="gpt-5-mini"),
    enable_tool_optimization=True,
    auto="medium"
)

optimized_program = gepa.compile(program, trainset=train_examples, valset=val_examples)
```
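
The example leaves `my_metric` undefined. A minimal sketch of one possible metric follows, assuming the five-argument metric signature that GEPA documents for its `metric` parameter and an `answer` field on both the example and the prediction. It returns a plain float; a feedback-returning metric would give the reflection LM more to work with. The `SimpleNamespace` objects are stand-ins for real examples and predictions.

```python
from types import SimpleNamespace


def my_metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    """Score 1.0 when the expected answer appears in the prediction, else 0.0."""
    expected = str(gold.answer).strip().lower()
    actual = str(getattr(pred, "answer", "")).strip().lower()
    return 1.0 if expected and expected in actual else 0.0


# Stand-in objects (assumption): real code would pass dspy examples and predictions.
gold = SimpleNamespace(answer="sunny")
pred = SimpleNamespace(answer="The weather in Paris is sunny and 75°F")
score = my_metric(gold, pred)
miss = my_metric(gold, SimpleNamespace(answer="I don't know"))
```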

### Inspecting Optimized Programs

View optimization results and metadata (requires `track_stats=True`):

```python
# High-level optimization metadata
optimized_program.detailed_results
```

Access optimized instructions and tool descriptions directly:

```python
# Predictor instructions
for name, predictor in optimized_program.named_predictors():
    print(f"{name}: {predictor.signature.instructions}")

# Tool descriptions and argument descriptions
for tool_name, tool in optimized_program.tools.items():
    print(f"{tool_name}: {tool.desc}")
    for arg_name, arg_schema in tool.args.items():
        print(f"  {arg_name}: {arg_schema.get('description', 'N/A')}")
```
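
To see what changed, it can help to snapshot the tool text before and after compilation and compare the two. The helpers below are a hypothetical sketch: they rely only on the `desc` and `args` attributes used in the loop above, and the `SimpleNamespace` objects stand in for real tools.

```python
from types import SimpleNamespace


def snapshot_tool_text(tools):
    """Flatten tool and argument descriptions into a name -> text dict."""
    snapshot = {}
    for tool_name, tool in tools.items():
        snapshot[tool_name] = tool.desc
        for arg_name, arg_schema in tool.args.items():
            snapshot[f"{tool_name}.{arg_name}"] = arg_schema.get("description", "")
    return snapshot


def changed_entries(before, after):
    """Return {key: (old_text, new_text)} for entries whose text differs."""
    return {k: (before.get(k), v) for k, v in after.items() if before.get(k) != v}


# Stand-ins (assumption) for program.tools before and after optimization.
before_tools = {
    "get_weather": SimpleNamespace(desc="Weather tool", args={"city": {"type": "string"}})
}
after_tools = {
    "get_weather": SimpleNamespace(
        desc="Get current weather for a named city.",
        args={"city": {"type": "string", "description": "City name, e.g. 'Paris'"}},
    )
}

changes = changed_entries(snapshot_tool_text(before_tools), snapshot_tool_text(after_tools))
```

Because the snapshot stores plain strings, taking it before calling `compile` keeps the original text available even if tool objects are later updated.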
6 changes: 6 additions & 0 deletions docs/docs/api/optimizers/GEPA/overview.md
@@ -117,6 +117,12 @@ Practical Recipe for GEPA-Friendly Feedback:
- **Multi-Objective Tasks** (e.g., PUPA): Decompose aggregate scores to reveal contributions from each objective, highlighting tradeoffs (e.g., quality vs. privacy).
- **Stacked Pipelines** (e.g., code generation: parse → compile → run → profile → evaluate): Expose stage-specific failures; natural-language traces often suffice for LLM self-correction.

## Tool Optimization with GEPA

When `enable_tool_optimization=True`, GEPA jointly optimizes `dspy.ReAct` modules together with their tools: it updates predictor instructions, tool descriptions, and tool argument descriptions together, based on execution traces and feedback, instead of keeping the tool descriptions fixed.

For details, examples, and the underlying design (tool discovery, naming requirements, and interaction with custom instruction proposers), see [Tool Optimization](GEPA_Advanced.md#tool-optimization).

## Custom Instruction Proposal

For advanced customization of GEPA's instruction proposal mechanism, including custom instruction proposers and component selectors, see [Advanced Features](GEPA_Advanced.md).