Skip to content

feat: Godspeed Lite — SOTA benchmark infrastructure + minimal coding agent#169

Merged
t-timms merged 3 commits into
mainfrom
feat/godspeed-lite
May 12, 2026
Merged

feat: Godspeed Lite — SOTA benchmark infrastructure + minimal coding agent#169
t-timms merged 3 commits into
mainfrom
feat/godspeed-lite

Conversation

@t-timms
Copy link
Copy Markdown
Owner

@t-timms t-timms commented May 12, 2026

Summary

Adds comprehensive benchmark infrastructure and Godspeed Lite, a minimal ~480-line coding agent optimized for SWE-bench performance. Synthesizes architecture learnings from Amp, Factory, PI, Claude Code, and mini-SWE-agent v2.

What's included

Benchmark Infrastructure

  • SOTA Audit — 12-benchmark competitive analysis vs 6 competitor harnesses (Amp, Factory, PI, Claude Code, Cursor, mini-SWE-agent). Maps Godspeed position across security, cost, and score dimensions.
  • Benchmark Plan — 6 SOTA targets, prioritized run order, cost estimates, reliability architecture, 14 research references.
  • NIM Key Rotation — 4-key manager with RPM tracking, 429 cooldown with exponential backoff, env var integration. 17 tests passing.
  • Pre-flight Checker — Validates NIM auth, Docker, WSL, disk space, Python env before benchmark runs.
  • SWE-bench Runner — Predictions generator with heartbeat logging, crash-resume checkpointing, per-instance timeout enforcement, failure forensics.

Godspeed Lite Agent

  • Bash-only agent loop — 480 lines, subprocess.run, linear history, no permissions, no async dispatch
  • 3 modes — smart (Opus-class), rush (15-step fast), deep (60-step thorough + model roulette)
  • Model roulette — Random driver model swap per step (free 3-8% benchmark boost per mini-SWE-agent blog)
  • AGENTS.md loading — Amp-style project context auto-detection
  • Auto test detection — Infers pytest/jest/cargo/go test commands from project files
  • Single-shot fallback — 1-call mode for rate-constrained free tier
  • 29 tests passing, 0 lint errors

Verification

uv run pytest tests/test_nim_key_rotation.py tests/test_godspeed_lite.py  # 46 passed
uv run ruff check src/godspeed/lite/ src/godspeed/benchmarks/            # 0 errors

Commands

godspeed-lite "fix bug"                          # smart mode (default)
godspeed-lite --mode rush "add docstring"        # fast mode
godspeed-lite --mode deep "debug race condition" # thorough mode
godspeed-lite --single-shot "fix the IndexError" # 1-call mode

t-timms added 3 commits May 12, 2026 00:01
…n manager

- benchmarks/SOTA_AUDIT.md: 12-benchmark competitive analysis vs Amp, Factory, PI, Claude Code
- benchmarks/BENCHMARK_PLAN.md: 6 SOTA targets, run strategy, cost estimates, 14 research refs
- benchmarks/nim_key_rotation.py: 4-key NIM rotation with RPM tracking, 429 cooldown, env var integration
- benchmarks/preflight.py: pre-flight checks (NIM auth, Docker, WSL, disk, Python env)
- benchmarks/swebench_runner.py: SWE-bench Verified/Lite runner with resume, heartbeat, failure forensics
- tests/test_nim_key_rotation.py: 17 tests (all passing)
- agent.py: bash-only agent loop with model roulette, AGENTS.md loading, auto test detection
- cli.py: CLI entrypoint registered as godspeed-lite command
- pyproject.toml: added godspeed-lite = godspeed.lite.cli:main entrypoint
- tests/test_godspeed_lite.py: 29 tests (command extraction, AGENTS.md, agent loop, async mock)
@t-timms t-timms merged commit 3b11d5a into main May 12, 2026
5 of 6 checks passed
@t-timms t-timms deleted the feat/godspeed-lite branch May 12, 2026 05:04
Comment thread scripts/smoke_test.py Dismissed
Comment thread scripts/smoke_test.py Dismissed
Comment thread src/godspeed/benchmarks/swebench_runner.py Dismissed
Comment thread src/godspeed/lite/agent.py Dismissed
Comment thread tests/test_godspeed_lite.py Dismissed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants