A configurable, open-source self-optimization loop for AI agents. AutoPrompt autonomously tests an AI agent, scores responses against a rubric, and iteratively improves the agent's configuration (system prompt, memory strategy, tool config).
AutoPrompt runs a meta-optimization loop: it generates test cases, sends them to your agent, scores the responses against a rubric using an LLM-as-judge, identifies weak areas, and mutates your agent's config files to improve them. Only mutations that improve the baseline score are kept — all others are rolled back.
```
generate baseline tests → score → identify weak dims → mutate config
          ↑                                                   |
          └────────── keep if improved, else rollback ────────┘
```
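The loop above can be sketched as simple hill climbing in Python. This is an illustrative sketch only, not AutoPrompt's real API: `generate_tests`, `score`, and `mutate` are hypothetical stand-ins for the corresponding pipeline stages.

```python
def optimize(config, generate_tests, score, mutate, max_iterations=10):
    """Keep-if-improved loop over agent configs (illustrative sketch only)."""
    tests = generate_tests(config)
    best_score = score(config, tests)           # establish the baseline
    for _ in range(max_iterations):
        candidate = mutate(config)              # e.g. rewrite the system prompt
        candidate_score = score(candidate, tests)
        if candidate_score > best_score:        # keep only improving mutations
            config, best_score = candidate, candidate_score
        # otherwise the candidate is discarded (rollback)
    return config, best_score
```

The key property is that the accepted config is monotonically non-decreasing in score, which is what makes it safe to let the mutation model propose aggressive rewrites.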
- Config-Driven: Set up the target agent, rubric, max iterations, budget, and API URLs via a single `autoprompt.yaml`.
- Pluggable Agent Adapters: Communicate over HTTP, directly with Python functions, or via a CLI subprocess.
- LLM-as-a-Judge: Custom per-dimension scoring with configurable weights.
- Budget Limits: Tracks estimated LLM costs and stops when the threshold is hit.
- Multi-Turn Support: Test agents across multi-turn conversations.
- Reporting: Stores data in SQLite, Supabase, or JSON lines. Includes leaderboard, diff, and report commands.
```bash
pip install -e .

# for development and tests
pip install -e ".[dev]"
```

Copy the env example and fill it out:

```bash
cp .env.example .env
```

The fastest way to try AutoPrompt is the multi-turn example, which runs entirely offline (no API key required for the agent itself):

```bash
autoprompt run examples/multi_turn/autoprompt.yaml --dry-run
autoprompt run examples/multi_turn/autoprompt.yaml
```

For a real LLM agent (requires `OPENROUTER_API_KEY`):

```bash
autoprompt run examples/simple/autoprompt.yaml
```

AutoPrompt connects to your agent via one of three built-in adapters:
| Adapter | When to use | Config field |
|---|---|---|
| `python_callable` | Your agent is a Python function in the same repo | `import_path: "my_module:my_function"` |
| `http` | Your agent runs as an HTTP service | `endpoint: "http://localhost:8000/chat"` |
| `cli` | Your agent is a CLI tool that reads from stdin | `command: "python my_agent.py"` |
You need a custom adapter only when the built-in ones don't fit — for example, if your HTTP endpoint requires JWT authentication or a non-standard request format. Custom adapters subclass `AgentAdapter` from `autoprompt/adapters/base.py` and implement two methods: `send()` and `health_check()`.
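A custom adapter for the JWT case might look like the sketch below. The exact `AgentAdapter` method signatures are assumptions here, not copied from `autoprompt/adapters/base.py`; check the real base class and adjust.

```python
# Hypothetical custom adapter for a JWT-protected HTTP endpoint.
# In a real project this would subclass AgentAdapter; the send()/
# health_check() signatures below are assumed for illustration.
import json
import urllib.request


class JWTHTTPAdapter:  # in practice: class JWTHTTPAdapter(AgentAdapter)
    def __init__(self, endpoint: str, token: str):
        self.endpoint = endpoint
        self.token = token

    def send(self, message: str, context: dict) -> dict:
        req = urllib.request.Request(
            self.endpoint,
            data=json.dumps({"message": message}).encode(),
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {self.token}",  # the custom part
            },
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())  # expected shape: {"content": "..."}

    def health_check(self) -> bool:
        try:
            self.send("ping", {})
            return True
        except OSError:
            return False
```

The only difference from the built-in `http` adapter is the `Authorization` header; everything else follows the same request/response contract.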
For `python_callable`, AutoPrompt calls your function with either:

- `handle_message(message: str, context: dict)` (single-turn)
- `chat(messages: list[dict], context: dict)` (multi-turn)

Return `{"content": "..."}` or just a plain `str`.
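A minimal agent exposing both entry points could look like this (an illustrative echo agent, not one of the shipped examples; you would point `import_path` at your own module):

```python
def handle_message(message: str, context: dict) -> dict:
    """Single-turn entry point: returns the dict shape AutoPrompt expects."""
    return {"content": f"You said: {message}"}


def chat(messages: list[dict], context: dict) -> str:
    """Multi-turn entry point: returning a plain str is also accepted."""
    last_user = [m for m in messages if m.get("role") == "user"][-1]
    return f"Replying to: {last_user['content']}"
```

With this module saved as `my_agent.py`, the config field would be `import_path: "my_agent:handle_message"` (or `"my_agent:chat"` for multi-turn suites).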
For `http`, AutoPrompt POSTs `{"message": "..."}` (or `{"messages": [...]}` for multi-turn) and expects `{"content": "..."}` in the response.
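The wire contract can be sketched as a pure function over the request body, independent of any web framework (`echo_agent` is a made-up name for illustration; plug the equivalent logic into whatever server you run):

```python
import json


def echo_agent(raw_body: bytes) -> bytes:
    """Handle both AutoPrompt request shapes and return the expected response."""
    payload = json.loads(raw_body)
    if "messages" in payload:                 # multi-turn: {"messages": [...]}
        text = payload["messages"][-1]["content"]
    else:                                     # single-turn: {"message": "..."}
        text = payload["message"]
    return json.dumps({"content": f"echo: {text}"}).encode()
```

Whatever HTTP server wraps this function, the important part is that the JSON response always carries a top-level `"content"` key.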
```yaml
agent:
  adapter: "python_callable"        # or "http" or "cli"
  name: "My Agent"
  import_path: "my_agent.core:handle_message"
  optimizable:
    - type: system_prompt
      path: "system.md"             # file AutoPrompt will mutate

rubric:
  path: "rubric.md"
  scoring_model: "deepseek/deepseek-v3.2"
  score_range: [1.0, 10.0]
  dimensions:
    - name: "helpfulness"
      weight: 0.6
    - name: "accuracy"
      weight: 0.4

tests:
  mode: "mix"                       # "static", "dynamic", or "mix"
  static_suite: "tests.yaml"
  generator_model: "deepseek/deepseek-v3.2"
  tests_per_iteration: 6

loop:
  max_iterations: 10
  budget_limit_usd: 2.00
  mutation_model: "deepseek/deepseek-v3.2"
  improvement_threshold: 0.05

logging:
  backend: "sqlite"                 # or "supabase" or "jsonl"
```

All LLM calls go through OpenRouter. Set `OPENROUTER_API_KEY` in `.env`.
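As an illustration of how the rubric's dimension weights combine (the exact aggregation is an assumption here; the real logic lives in AutoPrompt's scorer), a weighted average over per-dimension judge scores on the 1.0–10.0 range:

```python
def overall_score(dimension_scores: dict, weights: dict) -> float:
    """Weighted average of per-dimension LLM-judge scores."""
    total_weight = sum(weights.values())
    return sum(dimension_scores[d] * w for d, w in weights.items()) / total_weight


# With the rubric above: helpfulness weighted 0.6, accuracy weighted 0.4
print(overall_score({"helpfulness": 8.0, "accuracy": 9.0},
                    {"helpfulness": 0.6, "accuracy": 0.4}))  # → 8.4
```

Dividing by the total weight means the weights do not have to sum to exactly 1.0.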
```bash
autoprompt leaderboard autoprompt.yaml --top 5
autoprompt report autoprompt.yaml
autoprompt diff autoprompt.yaml
```

| Example | Description |
|---|---|
| `examples/multi_turn/` | Complete multi-turn agent; runs fully offline, no API key needed for the agent |
| `examples/simple/` | Minimal single-turn LLM agent using `python_callable` |
Run the test suite:

```bash
pytest
```

Project contribution and governance docs: