This is a general-purpose autonomous agent loop. The agent iterates on a single mutable file, runs it via a harness, measures a metric, and uses git to keep or discard changes.
To set up a new experiment, work with the user to:
- Agree on a run tag: propose a tag based on today's date (e.g. mar5). The branch autoloop/<tag> must not already exist; this is a fresh run.
- Create the branch: git checkout -b autoloop/<tag> from the current master.
- Read the in-scope files:
  - README.md: repository context and overview.
  - runner.py: the harness. Read-only. Understand what it measures and how.
  - The mutable target file (listed below under Mutable file).
- Initialize results.tsv: create results.tsv with just the header row (do not git-add it).
- Confirm and go: confirm that the setup looks good, then begin the loop.
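The setup steps above can be scripted. What follows is a minimal Python sketch. It assumes git is on PATH and that the main branch is named master. The helper names (propose_tag, branch_exists, setup_run) are hypothetical. The tag format, a lowercase month abbreviation followed by the day, is inferred from the mar5 example.

```python
import datetime
import subprocess
from pathlib import Path

# Locale-independent month abbreviations, so the tag never depends on system locale.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

# Header row only; results.tsv is tab-separated and stays untracked.
RESULTS_HEADER = "commit\tmetric\tstatus\tdescription\n"


def propose_tag(day: datetime.date) -> str:
    """Build a run tag like 'mar5': month abbreviation plus day, no zero padding."""
    return f"{MONTHS[day.month - 1]}{day.day}"


def branch_exists(branch: str) -> bool:
    """True if a local branch with this name already exists."""
    result = subprocess.run(
        ["git", "rev-parse", "--verify", "--quiet", f"refs/heads/{branch}"],
        capture_output=True,
    )
    return result.returncode == 0


def setup_run(tag: str, results_path: Path = Path("results.tsv")) -> str:
    """Create a fresh autoloop/<tag> branch from master and write the results header."""
    branch = f"autoloop/{tag}"
    if branch_exists(branch):
        raise RuntimeError(f"{branch} already exists; pick a new tag for a fresh run")
    subprocess.run(["git", "checkout", "-b", branch, "master"], check=True)
    results_path.write_text(RESULTS_HEADER)  # never git-add this file
    return branch
```

Typical use is setup_run(propose_tag(datetime.date.today())). The tag should still be agreed with the user before this runs.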
Mutable file: <FILL IN — e.g. main.py or main.c>
Primary metric: the value on the metric: line emitted by runner.py
Metric direction: <FILL IN — lower is better / higher is better>
Runner command: uv run python runner.py > run.log 2>&1
You CAN:
- Modify only the designated mutable file listed above.
- Restructure, rewrite, or completely replace it — as long as it still runs.
You CANNOT:
- Modify runner.py or any other harness file.
- Install new packages beyond the dependencies declared in pyproject.toml.
- Modify reference outputs or correctness checks.
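One way to enforce these limits mechanically is to list the tracked files that differ from HEAD before committing, then reject anything other than the mutable file. Below is a sketch that assumes git is available; changed_files and scope_violations are hypothetical helpers. git diff --name-only reports only tracked files. That means the untracked results.tsv and run.log never trigger the check, while an edit to runner.py or pyproject.toml does. Brand-new untracked files are not caught either.

```python
import subprocess


def changed_files(base: str = "HEAD") -> list[str]:
    """Tracked paths whose working-tree contents differ from `base`."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def scope_violations(paths: list[str], mutable_file: str) -> list[str]:
    """Every changed path other than the single designated mutable file."""
    return [p for p in paths if p != mutable_file]
```

For example, with main.py as the (placeholder) mutable file, a non-empty scope_violations(changed_files(), "main.py") means the change touches something off-limits. In that case, revert those files before committing.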
Simplicity criterion: All else being equal, simpler is better. A tiny improvement that adds ugly complexity is not worth it. Removing code while achieving an equal-or-better metric is a win.
LOOP FOREVER:
- Check git state: what branch/commit are you on?
- Modify the mutable file with your experimental idea.
- Commit the change: git commit with a short message describing the idea.
- Run the experiment: uv run python runner.py > run.log 2>&1
- Read the results: grep "^metric:\|^status:" run.log
- If the grep output is empty OR shows status: crash → tail -n 50 run.log to read the error. Attempt a fix, or skip the idea: record it as a crash and discard it with git reset --hard HEAD~1.
- Record the results in results.tsv (see format below). Do NOT git-add results.tsv.
- If the metric improved (per the direction above) → KEEP: stay at the current HEAD.
- If the metric is equal or worse → git reset --hard HEAD~1 to discard.
- Repeat. Never pause to ask.
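The keep/discard rule reduces to a small decision function. This sketch assumes `best` is the metric of the last kept commit, with None before the baseline; the baseline run is always kept. One further assumption goes beyond what the loop spells out: the sketch treats status: fail as a discard rather than a keep. decide, is_improvement, and discard_experiment are hypothetical names.

```python
import subprocess
from typing import Optional


def is_improvement(new: float, best: float, lower_is_better: bool) -> bool:
    """Strict improvement in the configured direction; a tie is not an improvement."""
    return new < best if lower_is_better else new > best


def decide(metric: Optional[float], status: Optional[str],
           best: Optional[float], lower_is_better: bool) -> str:
    """Return the results.tsv status for one run: 'keep', 'discard', or 'crash'."""
    if metric is None or status == "crash":
        return "crash"  # empty grep output or an explicit crash
    if status != "ok":
        return "discard"  # assumption: a 'fail' run is never kept
    if best is None:
        return "keep"  # nothing kept yet: this run is the baseline
    return "keep" if is_improvement(metric, best, lower_is_better) else "discard"


def discard_experiment() -> None:
    """Drop the experiment commit; the untracked results.tsv and run.log survive."""
    subprocess.run(["git", "reset", "--hard", "HEAD~1"], check=True)
```

Log the row first, then call discard_experiment() whenever decide(...) is not "keep". The recorded commit hash then still refers to the experiment that was just tried.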
Crashes: If a run crashes, use judgment: fix obvious bugs and retry. For fundamentally broken ideas, log a crash in results.tsv and move on.
Timeouts: If a run exceeds 2× the expected runtime, kill it and treat it as a crash.
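A minimal Python sketch of the timeout rule follows. It assumes a POSIX system. It also assumes expected_s comes from a source you trust; taking it from the baseline run's wall_s is one option, though that choice is an assumption too. The child is started in its own session, so the kill reaches its whole process group. Killing only the top-level process could leave child processes running. run_with_timeout is a hypothetical helper.

```python
import os
import signal
import subprocess


def run_with_timeout(argv: list[str], expected_s: float, log_path: str = "run.log") -> bool:
    """Run argv with stdout and stderr sent to log_path.

    Returns True if the run finished, False if it was killed for exceeding
    2x expected_s (the caller should then record it as a crash).
    """
    with open(log_path, "w") as log:
        # start_new_session=True gives the child its own process group (POSIX),
        # so a single killpg also reaches children that stay in that group.
        proc = subprocess.Popen(argv, stdout=log, stderr=subprocess.STDOUT,
                                start_new_session=True)
        try:
            proc.wait(timeout=2 * expected_s)
            return True
        except subprocess.TimeoutExpired:
            os.killpg(proc.pid, signal.SIGKILL)
            proc.wait()  # reap the killed process
            return False
```

For example, run_with_timeout(["uv", "run", "python", "runner.py"], expected_s=60.0) writes the same run.log as the shell command above. The 60.0 is only a placeholder.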
NEVER STOP: Once the loop begins, do NOT ask the human if you should continue. You are fully autonomous. Keep going until manually interrupted.
After every run, runner.py prints a block like:
---
metric: <value>
status: ok | fail | crash
wall_s: <seconds>
notes: <optional text>
Extract results with:
grep "^metric:\|^status:" run.log
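The same extraction in Python, as a sketch. parse_run_log is a hypothetical helper that uses the same line anchors as the grep. It reports a crash when either the metric: or the status: line is missing, which is a slightly stricter reading of "empty grep output". It also reports a crash when the metric is not a number.

```python
import re
from typing import Optional, Tuple

# Same line anchors as: grep "^metric:\|^status:" run.log
METRIC_RE = re.compile(r"^metric:[ \t]*(\S+)", re.MULTILINE)
STATUS_RE = re.compile(r"^status:[ \t]*(\S+)", re.MULTILINE)


def parse_run_log(text: str) -> Tuple[Optional[float], str]:
    """Return (metric, status) from runner.py output, or (None, 'crash')."""
    metric_match = METRIC_RE.search(text)
    status_match = STATUS_RE.search(text)
    if metric_match is None or status_match is None:
        return None, "crash"  # missing lines: treat like empty grep output
    try:
        metric = float(metric_match.group(1))
    except ValueError:
        return None, "crash"  # metric line present but not numeric
    return metric, status_match.group(1)
```

Typical use: with open("run.log") as f: metric, status = parse_run_log(f.read()).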
Log every experiment to results.tsv (tab-separated, NOT comma-separated).
Header and columns:
commit metric status description
- commit: short git commit hash (7 chars)
- metric: metric value achieved (use 0 for crashes; the crash status, not the value, marks the row as failed)
- status: keep, discard, or crash
- description: short description of what this experiment tried
Example:
commit metric status description
a1b2c3d 1523000 keep baseline
b2c3d4e 1489000 keep loop unrolling
c3d4e5f 1600000 discard added bounds checking (slower)
d4e5f6a 0 crash rewrote in SIMD (compile error)
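You can append a row with Python's csv module using a tab delimiter. In this sketch, short_hash, format_metric, and log_result are hypothetical helpers. Whole-number metrics are written without a trailing .0, to match the example above. Whitespace in the description is collapsed, so a stray tab or newline cannot shift the columns.

```python
import csv
import subprocess
from pathlib import Path
from typing import Optional

VALID_STATUSES = {"keep", "discard", "crash"}


def short_hash() -> str:
    """Short (7-char) hash of the current HEAD commit."""
    return subprocess.run(
        ["git", "rev-parse", "--short=7", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()


def format_metric(metric: Optional[float]) -> str:
    """'0' for a missing metric; drop the '.0' from whole numbers (1523000.0 -> '1523000')."""
    if metric is None:
        return "0"
    return str(int(metric)) if float(metric).is_integer() else str(metric)


def log_result(path: Path, commit: str, metric: Optional[float],
               status: str, description: str) -> None:
    """Append one tab-separated row to results.tsv (which is never git-added)."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    value = "0" if status == "crash" else format_metric(metric)
    # Collapse tabs/newlines so the description stays in its own column.
    clean = " ".join(description.split())
    with path.open("a", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow([commit[:7], value, status, clean])
```

Typical use is log_result(Path("results.tsv"), short_hash(), metric, status, "loop unrolling"), called before any git reset so the hash still names the experiment commit.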