autoloop

This is a general-purpose autonomous agent loop. The agent iterates on a single mutable file, runs it via a harness, measures a metric, and uses git to keep or discard changes.

Setup

To set up a new experiment, work with the user to:

  1. Agree on a run tag: propose a tag based on today's date (e.g. mar5). The branch autoloop/<tag> must not already exist — this is a fresh run.
  2. Create the branch: git checkout -b autoloop/<tag> from current master.
  3. Read the in-scope files:
    • README.md — repository context and overview.
    • runner.py — the harness. Read-only. Understand what it measures and how.
    • The mutable target file (listed below under Mutable file).
  4. Initialize results.tsv: Create results.tsv with just the header row (do not git-add it).
  5. Confirm and go: Confirm setup looks good, then begin the loop.
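Steps 1 and 4 above can be sketched in Python. This is a minimal sketch, not part of the harness: the helper names propose_tag and init_results are hypothetical, and the branch itself is still created with the git checkout -b command from step 2.

```python
import datetime
from pathlib import Path

# Column names taken from the "Logging results" section of this document.
HEADER = ["commit", "metric", "status", "description"]

# Spelled out explicitly so the tag does not depend on the system locale.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def propose_tag(today: datetime.date) -> str:
    """Build a run tag such as 'mar5' from a date (hypothetical helper)."""
    return f"{MONTHS[today.month - 1]}{today.day}"


def init_results(path: Path) -> None:
    """Create results.tsv with only the header row; refuse to overwrite."""
    if path.exists():
        raise FileExistsError(f"{path} already exists; expected a fresh run")
    with path.open("w", newline="\n") as f:
        f.write("\t".join(HEADER) + "\n")
```

Refusing to overwrite an existing file mirrors the rule that the branch must not already exist: both guard against mixing two runs.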

Configuration

Mutable file: <FILL IN: e.g. main.py or main.c>
Primary metric: the metric: line emitted by runner.py
Metric direction: <FILL IN: lower is better / higher is better>
Runner command: uv run python runner.py > run.log 2>&1


Constraints

You CAN:

  • Modify only the designated mutable file listed above.
  • Restructure, rewrite, or completely replace it — as long as it still runs.

You CANNOT:

  • Modify runner.py or any harness file.
  • Install new packages outside the declared dependencies in pyproject.toml.
  • Modify reference outputs or correctness checks.

Simplicity criterion: All else being equal, simpler is better. A tiny improvement that adds ugly complexity is not worth it. Removing code and achieving equal-or-better metric is a win.


The experiment loop

LOOP FOREVER:

  1. Check git state: what branch/commit are you on?
  2. Modify the mutable file with your experimental idea.
  3. git commit (short message describing the idea).
  4. uv run python runner.py > run.log 2>&1
  5. grep "^metric:\|^status:" run.log
  6. If grep output is empty OR status: crash → tail -n 50 run.log to read the error. Attempt a fix or skip this idea.
  7. Record results in results.tsv (see format below). Do NOT git-add results.tsv.
  8. If metric improved over the last kept commit (per direction above) → KEEP: stay at current HEAD.
  9. If metric is equal or worse → DISCARD: git reset --hard HEAD~1 to drop the experiment commit.
  10. Repeat. Never pause to ask.
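The keep-or-discard rule in the loop above can be written as a small comparison. This is a sketch under assumptions: the function names and the "lower" / "higher" direction strings are illustrative, and how to treat a run that reports status: fail is not specified in this document, so it is left out.

```python
def is_improvement(new: float, best: float, direction: str) -> bool:
    """Return True only for a strict improvement; a tie is not an improvement."""
    if direction == "lower":
        return new < best
    if direction == "higher":
        return new > best
    raise ValueError(f"unknown direction: {direction!r}")


def decide(new: float, best: float, direction: str) -> str:
    """Map a finished run onto the 'keep' / 'discard' labels used in results.tsv."""
    return "keep" if is_improvement(new, best, direction) else "discard"
```

Here best is the metric of the last kept commit. Because discarded commits are reset away, that is always the commit directly below the experiment commit.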

Crashes: If a run crashes, use judgment — fix obvious bugs and retry. For fundamentally broken ideas, log crash in the tsv and move on.

Timeouts: If a run exceeds 2× the expected runtime, kill it, treat as crash.
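One way to enforce the 2× rule is to launch the runner from Python with a timeout. This is a rough sketch: run_with_timeout and its parameters are hypothetical, and expected_s has to come from an earlier measurement such as the baseline run. subprocess.run kills the direct child when the timeout expires; processes spawned by that child may need separate cleanup.

```python
import subprocess
import sys


def run_with_timeout(cmd: list[str], expected_s: float, log_path: str = "run.log") -> str:
    """Run cmd with stdout and stderr sent to log_path.

    Returns 'done' if the process exited within 2x expected_s (with any
    exit code), or 'timeout' if it was killed for running too long.
    """
    with open(log_path, "w") as log:
        try:
            subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT,
                           timeout=2 * expected_s)
        except subprocess.TimeoutExpired:
            return "timeout"
    return "done"
```

A 'timeout' result would then be logged as a crash, as described above. An example call for the runner command in this document might be run_with_timeout(["uv", "run", "python", "runner.py"], expected_s=60), where 60 is a placeholder value.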

NEVER STOP: Once the loop begins, do NOT ask the human if you should continue. You are fully autonomous. Keep going until manually interrupted.


Output format

After every run, runner.py prints a block like:

---
metric:    <value>
status:    ok | fail | crash
wall_s:    <seconds>
notes:     <optional text>

Extract results with:

grep "^metric:\|^status:" run.log
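The same extraction can be done in Python when the values are needed as data rather than as text on the terminal. This is a minimal sketch that assumes the block format shown above; parse_run_log and needs_crash_handling are hypothetical names.

```python
import re

# Match only lines that start with "metric:" or "status:", like the grep pattern.
# [ \t]* (not \s*) keeps an empty value from swallowing the following line.
LINE_RE = re.compile(r"^(metric|status):[ \t]*(.*)$", re.MULTILINE)


def parse_run_log(text: str) -> dict[str, str]:
    """Return the 'metric' and 'status' values for whichever lines are present."""
    fields: dict[str, str] = {}
    for key, value in LINE_RE.findall(text):
        fields[key] = value.strip()  # if a key repeats, the last occurrence wins
    return fields


def needs_crash_handling(fields: dict[str, str]) -> bool:
    """True when the grep would be empty or the run reported status: crash."""
    return not fields or fields.get("status") == "crash"
```

The metric is kept as a string here because the document does not say whether it is an integer or a float; convert it once the harness output is known.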

Logging results

Log every experiment to results.tsv (tab-separated, NOT comma-separated).

Header and columns:

commit	metric	status	description
  1. Short git commit hash (7 chars)
  2. Metric value achieved (use 0 for crashes)
  3. Status: keep, discard, or crash
  4. Short description of what this experiment tried

Example:

commit	metric	status	description
a1b2c3d	1523000	keep	baseline
b2c3d4e	1489000	keep	loop unrolling
c3d4e5f	1600000	discard	added bounds checking (slower)
d4e5f6a	0	crash	rewrote in SIMD (compile error)
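Appending a row in this format could look like the following. This is a sketch, not the harness's own logger: log_result is a hypothetical helper, and collapsing whitespace in the description is an added safeguard so that a stray tab or newline cannot shift the columns.

```python
from pathlib import Path

VALID_STATUSES = {"keep", "discard", "crash"}


def log_result(path: Path, commit: str, metric: float, status: str, description: str) -> None:
    """Append one tab-separated row to results.tsv."""
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {sorted(VALID_STATUSES)}")
    # Collapse tabs, newlines, and repeated spaces into single spaces.
    clean = " ".join(description.split())
    row = [commit[:7], str(metric), status, clean]
    with path.open("a", newline="\n") as f:
        f.write("\t".join(row) + "\n")
```

For example, log_result(Path("results.tsv"), "b2c3d4e", 1489000, "keep", "loop unrolling") would append the second data row of the example above.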