docs: restructure README and update evaluator cookbook for v2 protocol
- README: add How It Works section, move protocol/intake to Reference, add intake context, polish flow and transitions
- Cookbook: update contract to Protocol v2, add dataset-aware and composite evaluator recipes, add score-range and task-model sections
CLI stdout returns a JSON summary — see [Result Contract](#result-contract) for the full shape.
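For orientation, a sketch of what that summary might look like. The field names below are hypothetical placeholders, not the documented contract; the [Result Contract](#result-contract) section is authoritative:

```json
{"best_score": 0.92, "best_candidate": "optimized_artifact.txt", "iterations": 12}
```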

## How It Works

optimize-anything runs a GEPA (Guided Evolutionary Prompt Algorithm) loop: propose → evaluate → reflect, repeating until the budget is exhausted or early stopping kicks in.
```
seed.txt ──► [Propose] ──► candidates
                 ▲              │
                 │         [Evaluate]
             [Reflect] ◄──── scores + diagnostics
```
1. **Propose** — The optimizer generates candidate artifacts from your seed (or from scratch in seedless mode).
2. **Evaluate** — Each candidate is scored by your evaluator. Three evaluator types are supported: a **command evaluator** (any executable that reads JSON on stdin and writes a score on stdout; a minimal sketch appears after this list), an **HTTP evaluator** (a service that accepts POST requests), or a built-in **LLM judge** (no evaluator script required — just pass `--judge-model`).
3. **Reflect** — Scores and diagnostics feed back into the next proposal round. The loop continues, progressively improving the artifact toward your objective.
The evaluator is the only thing you bring. Everything else — proposal strategy, reflection, early stopping, caching, parallelism — is handled by the optimizer.
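To make that concrete, here is a minimal command evaluator in Python. It is an illustrative sketch, not code shipped with this repo: it assumes the input payload carries the artifact text under the `candidate` key (per the Evaluator Contract below) and uses a deliberately trivial length heuristic as a stand-in metric.

```python
#!/usr/bin/env python3
"""Minimal command-evaluator sketch (illustrative only).

Reads the Protocol v2 input payload as JSON on stdin and writes the
output payload as JSON on stdout.
"""
import json
import sys


def main() -> None:
    payload = json.load(sys.stdin)
    # `candidate` holds the artifact text; `example` and `task_model`
    # are optional v2 keys and may be absent.
    candidate = payload["candidate"]

    # Placeholder metric: reward brevity. Swap in your real check
    # (run tests, diff against a reference, call a linter, ...).
    score = max(0.0, 1.0 - len(candidate) / 10_000)

    json.dump({"score": score, "notes": f"{len(candidate)} chars"}, sys.stdout)


if __name__ == "__main__":
    main()
```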

## Evaluator Contract (Protocol v2)

Evaluator input payload (stdin JSON for command mode, POST JSON for HTTP mode):

- `_protocol_version`, `example`, and `task_model` are optional/additive
- legacy evaluators that only read `candidate` remain compatible

Evaluator output payload:

```json
{"score": 0.75, "notes": "optional diagnostics"}
```

- `score` is required
- additional keys are treated as side-info
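
The same contract applies over HTTP. Below is a standard-library Python sketch of an HTTP evaluator; the bind address and port are arbitrary placeholders, and the scoring logic is the same stand-in heuristic as above.

```python
"""Minimal HTTP-evaluator sketch (illustrative only)."""
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class Evaluator(BaseHTTPRequestHandler):
    def do_POST(self):
        # The optimizer POSTs the same input payload as command mode.
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        candidate = payload["candidate"]

        # Placeholder metric; replace with your real scoring logic.
        score = max(0.0, 1.0 - len(candidate) / 10_000)
        body = json.dumps({"score": score, "notes": "length heuristic"}).encode()

        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), Evaluator).serve_forever()
```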
### Intake
Intake is optional structured guidance you pass to the optimizer to shape how evaluation works. Instead of relying solely on an `--objective` string, intake lets you declare quality dimensions, hard constraints, evaluation patterns, and execution preferences — giving you finer control over what "better" means for your artifact.
Use intake when your evaluation criteria are multi-dimensional, when you need to enforce hard constraints, or when you want consistent evaluation behavior across runs.
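
For a feel of the shape, here is a hypothetical intake file. Every key name below is an invented placeholder mapping onto the four kinds of guidance described above (quality dimensions, hard constraints, evaluation patterns, execution preferences); consult the intake schema reference for the actual field names:

```json
{
  "quality_dimensions": ["clarity", "factual accuracy", "brevity"],
  "hard_constraints": ["must be valid JSON", "under 500 words"],
  "evaluation_patterns": ["compare against the golden output"],
  "execution_preferences": {"parallelism": 4}
}
```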