autoloop

This is a general-purpose autonomous agent loop. The agent iterates on a single mutable file, runs it via a harness, measures a metric, and uses git to keep or discard changes.

Setup

To set up a new experiment, work with the user to:

1. Agree on a run tag: propose a tag based on today's date (e.g. mar5). The branch autoloop/<tag> must not already exist, since this is a fresh run.
2. Create the branch: git checkout -b autoloop/<tag> from current master.
3. Read the in-scope files:
   - README.md: repository context and overview.
   - runner.py: the harness. Read-only. Understand what it measures and how.
   - The mutable target file (listed below under Mutable file).
4. Initialize results.tsv: create results.tsv with just the header row (do not git-add it).
5. Confirm and go: confirm setup looks good, then begin the loop.
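
The setup steps can be sketched in Python as follows. This is a minimal sketch, not part of the harness: the helper names propose_tag, branch_exists and init_results are hypothetical, and it assumes git is on PATH and that the header matches the columns described under Logging results.

```python
import subprocess
from datetime import date
from pathlib import Path

# Hard-coded so the tag does not depend on the system locale.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

HEADER = "commit\tmetric\tstatus\tdescription\n"


def propose_tag(today: date) -> str:
    """Build a run tag such as 'mar5' from a date (hypothetical helper)."""
    return f"{MONTHS[today.month - 1]}{today.day}"


def branch_exists(tag: str) -> bool:
    """True if the local branch autoloop/<tag> already exists."""
    result = subprocess.run(
        ["git", "rev-parse", "--verify", "--quiet",
         f"refs/heads/autoloop/{tag}"],
        capture_output=True,
    )
    return result.returncode == 0


def init_results(path: Path) -> None:
    """Create results.tsv containing only the header row."""
    path.write_text(HEADER)
```

If branch_exists returns True for the proposed tag, pick a different tag rather than reusing the branch.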

Configuration

Mutable file: <FILL IN, e.g. main.py or main.c>
Primary metric: the metric: line emitted by runner.py
Metric direction: <FILL IN: lower is better / higher is better>
Runner command: uv run python runner.py > run.log 2>&1

Constraints

You CAN:

Modify only the designated mutable file listed above.
Restructure, rewrite, or completely replace it — as long as it still runs.

You CANNOT:

Modify runner.py or any harness file.
Install new packages outside the declared dependencies in pyproject.toml.
Modify reference outputs or correctness checks.
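
One way to guard the first constraint is to compare the paths touched by the latest commit against the mutable file. The sketch below is an assumption about how such a check could look; disallowed_changes is a hypothetical helper, and the list of changed paths would come from something like git diff --name-only HEAD~1.

```python
def disallowed_changes(changed_paths, mutable_file):
    """Return the changed paths that are not the mutable file, sorted.

    An empty result means the commit touched only the mutable file.
    """
    return sorted(p for p in set(changed_paths) if p != mutable_file)
```

A non-empty result is a signal to discard the commit before running the harness.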

Simplicity criterion: All else being equal, simpler is better. A tiny improvement that adds ugly complexity is not worth it. Removing code and achieving equal-or-better metric is a win.

The experiment loop

LOOP FOREVER:

1. Check git state: what branch/commit are you on?
2. Modify the mutable file with your experimental idea.
3. git commit (short message describing the idea).
4. uv run python runner.py > run.log 2>&1
5. grep "^metric:\|^status:" run.log
6. If grep output is empty OR status: crash → tail -n 50 run.log to read the error. Attempt a fix or skip this idea.
7. Record results in results.tsv (see format below). Do NOT git-add results.tsv.
8. If status is ok and the metric improved (per direction above) → KEEP: stay at current HEAD.
9. If the metric is equal or worse, or the run crashed → git reset --hard HEAD~1 to discard. The 0 logged for a crash is a placeholder and never counts as an improvement, even when lower is better.
10. Repeat. Never pause to ask.
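
The keep-or-discard decision in the loop can be sketched as a small pure function. This is an illustration under assumptions: the names is_improvement and decide and the direction strings "lower" and "higher" are hypothetical, the first successful run is kept as the baseline, and a run whose status is neither ok nor crash (for example fail) is assumed to be a discard.

```python
from typing import Optional


def is_improvement(new: float, best: float, direction: str) -> bool:
    """Strict improvement only; a tie counts as 'equal or worse'."""
    if direction == "lower":
        return new < best
    if direction == "higher":
        return new > best
    raise ValueError(f"unknown direction: {direction!r}")


def decide(status: Optional[str], new: Optional[float],
           best: Optional[float], direction: str) -> str:
    """Map a run outcome to the results.tsv status: keep, discard, or crash."""
    # Empty grep output (no status or no metric) is treated like a crash.
    if status is None or status == "crash" or new is None:
        return "crash"
    # Assumption: any non-ok status such as 'fail' is discarded.
    if status != "ok":
        return "discard"
    # No previous best means this run is the baseline.
    if best is None or is_improvement(new, best, direction):
        return "keep"
    return "discard"
```

Anything other than keep is followed by git reset --hard HEAD~1.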

Crashes: If a run crashes, use judgment — fix obvious bugs and retry. For fundamentally broken ideas, log crash in the tsv and move on.

Timeouts: If a run exceeds 2× the expected runtime, kill it and treat it as a crash.
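
A minimal sketch of the timeout rule, assuming the runner is launched from Python rather than from the shell. run_with_timeout and its return values are hypothetical; subprocess.run kills the direct child process when the timeout expires, so a process spawned by that child (such as the interpreter started by uv run) may need separate cleanup.

```python
import subprocess


def run_with_timeout(cmd, expected_s, log_path="run.log"):
    """Run cmd with stdout and stderr redirected to log_path.

    Returns 'timeout' if the run exceeds 2x expected_s, else 'done'.
    """
    with open(log_path, "w") as log:
        try:
            subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT,
                           timeout=2 * expected_s)
        except subprocess.TimeoutExpired:
            return "timeout"
    return "done"
```

For example, run_with_timeout(["uv", "run", "python", "runner.py"], expected_s=60) would give up after 120 seconds; a 'timeout' result is then logged as a crash.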

NEVER STOP: Once the loop begins, do NOT ask the human if you should continue. You are fully autonomous. Keep going until manually interrupted.

Output format

After every run, runner.py prints a block like:

---
metric:    <value>
status:    ok | fail | crash
wall_s:    <seconds>
notes:     <optional text>

Extract results with:

grep "^metric:\|^status:" run.log
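
The same extraction can be done in Python when the values are needed as data. This is a sketch under the assumption that run.log contains lines in exactly the format shown above; parse_run_log is a hypothetical helper, and a missing or non-numeric metric is returned as None.

```python
import re

# Match only spaces or tabs after the colon so a value never spans lines.
FIELD = re.compile(r"^(metric|status):[ \t]*(.*)$", re.MULTILINE)


def parse_run_log(text):
    """Return (metric, status) from the runner output.

    metric is a float or None; status is a string or None.
    (None, None) corresponds to empty grep output.
    """
    fields = {}
    for key, value in FIELD.findall(text):
        fields[key] = value.strip()  # the last occurrence wins
    try:
        metric = float(fields["metric"])
    except (KeyError, ValueError):
        metric = None
    return metric, fields.get("status")
```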

Logging results

Log every experiment to results.tsv (tab-separated, NOT comma-separated).

Header and columns:

commit	metric	status	description

- commit: short git commit hash (7 chars)
- metric: metric value achieved (use 0 for crashes)
- status: keep, discard, or crash
- description: short description of what this experiment tried

Example:

commit	metric	status	description
a1b2c3d	1523000	keep	baseline
b2c3d4e	1489000	keep	loop unrolling
c3d4e5f	1600000	discard	added bounds checking (slower)
d4e5f6a	0	crash	rewrote in SIMD (compile error)
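
Rows in this format can be appended with Python's csv module using a tab delimiter. This is a minimal sketch; append_result is a hypothetical helper, and it assumes results.tsv already exists with its header row.

```python
import csv


def append_result(path, commit, metric, status, description):
    """Append one tab-separated row to results.tsv."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        # Truncate the hash to the 7 characters used in the log.
        writer.writerow([commit[:7], metric, status, description])
```

For example, append_result("results.tsv", "a1b2c3d", 1523000, "keep", "baseline") writes the first data row of the example above.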
