feat: Godspeed Lite — SOTA benchmark infrastructure + minimal coding agent by t-timms · Pull Request #169 · t-timms/godspeed-coding-agent

t-timms · 2026-05-12T05:03:27Z

Summary

Adds comprehensive benchmark infrastructure and Godspeed Lite, a minimal ~480-line coding agent optimized for SWE-bench performance. Synthesizes architecture learnings from Amp, Factory, PI, Claude Code, and mini-SWE-agent v2.

What's included

Benchmark Infrastructure

SOTA Audit — 12-benchmark competitive analysis vs 6 competitor harnesses (Amp, Factory, PI, Claude Code, Cursor, mini-SWE-agent). Maps Godspeed position across security, cost, and score dimensions.
Benchmark Plan — 6 SOTA targets, prioritized run order, cost estimates, reliability architecture, 14 research references.
NIM Key Rotation — 4-key manager with RPM tracking, 429 cooldown with exponential backoff, env var integration. 17 tests passing.
Pre-flight Checker — Validates NIM auth, Docker, WSL, disk space, Python env before benchmark runs.
SWE-bench Runner — Predictions generator with heartbeat logging, crash-resume checkpointing, per-instance timeout enforcement, failure forensics.

Godspeed Lite Agent

Bash-only agent loop — 480 lines, subprocess.run, linear history, no permissions, no async dispatch
3 modes — smart (Opus-class), rush (15-step fast), deep (60-step thorough + model roulette)
Model roulette — Random driver model swap per step (free 3-8% benchmark boost per mini-SWE-agent blog)
AGENTS.md loading — Amp-style project context auto-detection
Auto test detection — Infers pytest/jest/cargo/go test commands from project files
Single-shot fallback — 1-call mode for rate-constrained free tier
29 tests passing, 0 lint errors

Verification

uv run pytest tests/test_nim_key_rotation.py tests/test_godspeed_lite.py  # 46 passed
uv run ruff check src/godspeed/lite/ src/godspeed/benchmarks/            # 0 errors

Commands

godspeed-lite "fix bug"                          # smart mode (default)
godspeed-lite --mode rush "add docstring"        # fast mode
godspeed-lite --mode deep "debug race condition" # thorough mode
godspeed-lite --single-shot "fix the IndexError" # 1-call mode

…n manager - benchmarks/SOTA_AUDIT.md: 12-benchmark competitive analysis vs Amp, Factory, PI, Claude Code - benchmarks/BENCHMARK_PLAN.md: 6 SOTA targets, run strategy, cost estimates, 14 research refs - benchmarks/nim_key_rotation.py: 4-key NIM rotation with RPM tracking, 429 cooldown, env var integration - benchmarks/preflight.py: pre-flight checks (NIM auth, Docker, WSL, disk, Python env) - benchmarks/swebench_runner.py: SWE-bench Verified/Lite runner with resume, heartbeat, failure forensics - tests/test_nim_key_rotation.py: 17 tests (all passing)

- agent.py: bash-only agent loop with model roulette, AGENTS.md loading, auto test detection - cli.py: CLI entrypoint registered as godspeed-lite command - pyproject.toml: added godspeed-lite = godspeed.lite.cli:main entrypoint - tests/test_godspeed_lite.py: 29 tests (command extraction, AGENTS.md, agent loop, async mock)

t-timms added 3 commits May 12, 2026 00:01

feat(scripts): add comprehensive smoke test for Godspeed Lite pipeline

3b11d5a

t-timms merged commit 3b11d5a into main May 12, 2026
5 of 6 checks passed

t-timms deleted the feat/godspeed-lite branch May 12, 2026 05:04

github-advanced-security AI found potential problems May 12, 2026

View reviewed changes

Comment thread scripts/smoke_test.py Dismissed

Comment thread scripts/smoke_test.py Dismissed

Comment thread src/godspeed/benchmarks/swebench_runner.py Dismissed

Comment thread src/godspeed/lite/agent.py Dismissed

Comment thread tests/test_godspeed_lite.py Dismissed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Godspeed Lite — SOTA benchmark infrastructure + minimal coding agent#169

feat: Godspeed Lite — SOTA benchmark infrastructure + minimal coding agent#169
t-timms merged 3 commits into
mainfrom
feat/godspeed-lite

t-timms commented May 12, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

t-timms commented May 12, 2026

Summary

What's included

Benchmark Infrastructure

Godspeed Lite Agent

Verification

Commands

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants