A system that runs Karpathy's autoresearch in parallel on SageMaker Spot Training.
Unlike the original autoresearch, which runs experiments sequentially, this system uses a generation-based parallel evolution approach:
- Candidate Generation: Automatically generate N variants of train.py
- Parallel Execution: Run simultaneously across N SageMaker Spot instances
- Selection: The candidate with the lowest val_bpb becomes the new baseline
- Iteration: Repeat for M generations
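The loop above can be sketched as a toy simulation. Here `mutate` and `evaluate` are illustrative stand-ins for candidate generation and a Spot training job returning `val_bpb`; they are not the real pipeline API.

```python
import random

def mutate(params: dict) -> dict:
    """Toy stand-in for candidate generation: jitter the learning rate."""
    p = dict(params)
    p["lr"] *= random.uniform(0.7, 1.3)
    return p

def evaluate(params: dict) -> float:
    """Toy stand-in for a Spot training job; returns a fake val_bpb
    that is lowest at an arbitrary 'optimal' learning rate."""
    return abs(params["lr"] - 3e-3)

def evolve(baseline: dict, generations: int, population: int) -> dict:
    for _ in range(generations):
        candidates = [mutate(baseline) for _ in range(population)]
        best = min(candidates, key=evaluate)     # selection: lowest val_bpb
        if evaluate(best) < evaluate(baseline):  # winner becomes new baseline
            baseline = best
    return baseline
```

Because the baseline only changes when a candidate beats it, the best score never regresses across generations.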
- Prepare data: `python scripts/prepare_s3.py`
- IAM role: `./infrastructure/setup_iam.sh` → enter `role_arn` in `config.yaml`
- Docker image: `./infrastructure/build_and_push.sh` → enter `image_uri` in `config.yaml`
- Validate config: `python -m pipeline.orchestrator --dry-run`
- Single generation: `python -m pipeline.orchestrator --single --population 10`
- Full pipeline: `python -m pipeline.orchestrator --generations 10 --population 10`
- Single experiment: `python scripts/run_single.py`

- Only `train.py` may be modified (model, optimizer, hyperparameters)
- `prepare.py` must not be modified (evaluation functions, data loaders, constants)
- No new dependencies may be added
- Training time: fixed at 5 minutes (TIME_BUDGET=300 seconds)
- Goal: achieve the lowest val_bpb
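A minimal sketch of how the fixed time budget could be enforced inside `train.py`; `step_fn` is a hypothetical callable performing one optimizer step, and the harness's actual enforcement mechanism may differ.

```python
import time

TIME_BUDGET = 300  # seconds, fixed by the rules above

def train_with_budget(step_fn, budget_s: float = TIME_BUDGET) -> int:
    """Run training steps until the wall-clock budget is spent.

    step_fn: hypothetical callable doing one optimizer step.
    Returns the number of steps completed."""
    start = time.monotonic()
    steps = 0
    while time.monotonic() - start < budget_s:
        step_fn()
        steps += 1
    return steps
```

Using `time.monotonic()` rather than `time.time()` keeps the budget immune to system clock adjustments.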
Each generation produces candidates with varied strategies:
| Type | Count | Strategy |
|---|---|---|
| Conservative | 3 | Fine-tune LR ±10-30% |
| Moderate | 4 | Change DEPTH, ASPECT_RATIO, BATCH_SIZE, WINDOW |
| Aggressive | 2 | Radical combinations (deep-narrow, wide-shallow, high-LR) |
| Crossover | 1 | Combine top-2 ideas from previous generation |
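One way the table's 3/4/2/1 mix could be turned into a per-generation plan is sketched below; the strategy labels are illustrative, not the orchestrator's actual identifiers.

```python
import random

# Candidate mix from the strategy table above.
STRATEGY_MIX = [
    ("conservative", 3),  # fine-tune LR +/- 10-30%
    ("moderate", 4),      # DEPTH / ASPECT_RATIO / BATCH_SIZE / WINDOW
    ("aggressive", 2),    # radical combinations
    ("crossover", 1),     # combine top-2 ideas from previous generation
]

def plan_generation() -> list[str]:
    """Expand the mix into 10 candidate slots, shuffled to avoid
    ordering bias in job submission."""
    plan = [name for name, count in STRATEGY_MIX for _ in range(count)]
    random.shuffle(plan)
    return plan
```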
- Instance: ml.g5.xlarge (A10G 24GB, Ampere)
- Spot price: ~$0.30/hr
- Per experiment: ~$0.04 (8 minutes)
- Per generation (10 candidates): ~$0.40
- Full pipeline (10 gen × 10 pop): ~$4.00
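The figures above follow from simple arithmetic. Spot prices fluctuate, so ~$0.30/hr for ml.g5.xlarge is an assumption, and the 8 minutes per experiment includes startup/teardown overhead around the 5-minute training budget.

```python
# Back-of-the-envelope cost check for the numbers listed above.
SPOT_PRICE_PER_HR = 0.30    # assumed ml.g5.xlarge Spot price
MINUTES_PER_EXPERIMENT = 8  # 5 min training + startup/teardown

cost_per_experiment = SPOT_PRICE_PER_HR * MINUTES_PER_EXPERIMENT / 60
cost_per_generation = cost_per_experiment * 10  # 10 candidates
full_pipeline_cost = cost_per_generation * 10   # 10 generations
```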
- `results.tsv`: full experiment log (TSV format)
- `generations/gen_NNN/`: per-generation candidate code + results
- Git tags: `gen-NNN-best` marks the best state for each generation
Run the full loop autonomously with OMC autopilot:
/autopilot
Read program.md and run the full pipeline:
python -m pipeline.orchestrator --generations 10 --population 10
After completion, analyze results.tsv and summarize:
1. Best val_bpb achieved
2. Most impactful changes
3. Cost summary
4. Recommendations for next run
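Step 1 of the analysis can be sketched as a small TSV scan. The exact schema of `results.tsv` is not documented here, so the `val_bpb` and `candidate` column names are assumptions.

```python
import csv

def best_result(path: str = "results.tsv") -> dict:
    """Return the row with the lowest val_bpb.

    Assumes the TSV has a header row including a val_bpb column."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f, delimiter="\t"))
    return min(rows, key=lambda r: float(r["val_bpb"]))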
After experiments complete, prepare the following:
- `results.tsv` + `generations/` → visualize the experiment process
- Git log → trace the evolutionary progression
- Cost report → demonstrate cloud efficiency
- Comparison against baseline: 8-hour sequential vs 100-minute parallel
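The headline comparison follows from counting experiments: sequential execution pays 100 × 5 minutes of training time, while the parallel pipeline pays only 10 generations of wall time. The ~10 minutes of wall time per generation is an assumption covering job startup overhead.

```python
# Arithmetic behind the sequential-vs-parallel comparison.
experiments = 10 * 10             # generations x population
sequential_min = experiments * 5  # 500 min, roughly 8 hours
parallel_min = 10 * 10            # 10 generations x ~10 min each
speedup = sequential_min / parallel_min
```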