WALKTHROUGH

Step-by-step v2 workflow for optimize-anything.

Prerequisites

Python 3.10+
uv
API key(s) for any LLM providers you plan to use

Step 1: Install

git clone https://github.com/ASRagab/optimize-anything.git
cd optimize-anything
uv sync

Step 2: Create a seed artifact

cat > seed.txt << 'EOF'
You are a support assistant. Help users quickly.
EOF

Step 3: Generate an evaluator (default is judge/Python)

generate-evaluator now defaults to --evaluator-type judge (Python template), not bash command mode.

uv run optimize-anything generate-evaluator \
  seed.txt \
  --objective "Score clarity, actionability, and specificity" \
  > evaluators/eval.py

If you want a bash scaffold explicitly:

uv run optimize-anything generate-evaluator \
  seed.txt \
  --objective "Score clarity, actionability, and specificity" \
  --evaluator-type command \
  > evaluators/eval.sh
chmod +x evaluators/eval.sh

Step 4: Test evaluator contract

echo '{"_protocol_version":2,"candidate":"test"}' | python evaluators/eval.py
# expected: JSON with required "score"

Step 5: Baseline score

uv run optimize-anything score seed.txt \
  --judge-model openai/gpt-4o-mini \
  --objective "Score clarity and constraints"

Step 6: Optimize

uv run optimize-anything optimize seed.txt \
  --judge-model openai/gpt-4o-mini \
  --objective "Improve clarity and specificity" \
  --model openai/gpt-4o-mini \
  --budget 40 \
  --cache --run-dir runs --diff

Notes:

--early-stop can be set manually.
It is auto-enabled when budget > 30.

Step 7: Validate optimized output (single judge)

uv run optimize-anything score runs/run-<TIMESTAMP>/best_artifact.txt \
  --judge-model anthropic/claude-sonnet-4-5 \
  --objective "Score clarity, constraints, and usefulness"

Step 8: Validate with multiple providers (new `validate` flow)

uv run optimize-anything validate runs/run-<TIMESTAMP>/best_artifact.txt \
  --providers openai/gpt-4o-mini anthropic/claude-sonnet-4-5 google/gemini-2.0-flash \
  --objective "Score clarity, constraints, and robustness"

Step 9: Iterate (dataset/generalization optional)

For multi-task optimization:

uv run optimize-anything optimize seed.txt \
  --judge-model openai/gpt-4o-mini \
  --objective "Generalize across support scenarios" \
  --dataset data/train.jsonl --valset data/val.jsonl \
  --model openai/gpt-4o-mini \
  --budget 120 --cache --run-dir runs

Step 9.5: Speed up with parallel workers

For expensive evaluators, add parallel execution:

uv run optimize-anything optimize seed.txt \
  --evaluator-command bash evaluators/eval.sh \
  --objective "Improve quality" \
  --model openai/gpt-4o-mini \
  --budget 100 \
  --parallel --workers 8

Step 10: Accept or rerun

If improved, copy best artifact back into source. Otherwise adjust objective/intake and rerun.

Common issues

Problem	Fix
Missing API key when using judge evaluator	Set provider API key env vars (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`)
`--no-seed` failed	`--no-seed` requires both `--objective` and `--model`
Evaluator preflight failed	Ensure evaluator prints valid JSON with `score`; logs go to stderr
`--valset` rejected	`--valset` requires `--dataset`
Cache reuse failed	`--cache-from` requires `--cache` and `--run-dir`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WALKTHROUGH

Prerequisites

Step 1: Install

Step 2: Create a seed artifact

Step 3: Generate an evaluator (default is judge/Python)

Step 4: Test evaluator contract

Step 5: Baseline score

Step 6: Optimize

Step 7: Validate optimized output (single judge)

Step 8: Validate with multiple providers (new `validate` flow)

Step 9: Iterate (dataset/generalization optional)

Step 9.5: Speed up with parallel workers

Step 10: Accept or rerun

Common issues

FilesExpand file tree

WALKTHROUGH.md

Latest commit

History

WALKTHROUGH.md

File metadata and controls

WALKTHROUGH

Prerequisites

Step 1: Install

Step 2: Create a seed artifact

Step 3: Generate an evaluator (default is judge/Python)

Step 4: Test evaluator contract

Step 5: Baseline score

Step 6: Optimize

Step 7: Validate optimized output (single judge)

Step 8: Validate with multiple providers (new validate flow)

Step 9: Iterate (dataset/generalization optional)

Step 9.5: Speed up with parallel workers

Step 10: Accept or rerun

Common issues

Step 8: Validate with multiple providers (new `validate` flow)