lance-autoresearch

A multi-target workspace for evolving Lance hot-path kernels via LLM coding agents (Claude Code, Codex, Cursor), in the style of Andrej Karpathy's autoresearch single-agent loop.

What autoresearch is, and why it works

Karpathy's autoresearch (early 2026): give an LLM agent one mutable file, a fixed bench, and a program.md of priors. The agent loops (edit, build, run, keep-or-revert, commit) overnight, until you stop it. Karpathy's framing: "You wake up to a log of experiments and (hopefully) a better model."

This repo adapts that shape for Lance kernel optimization: per-trial ~30s, one mutable file (crates/<target>/src/kernels.rs), correctness oracle is upstream Lance code itself (vendored verbatim in crates/lance-snapshots/). Any kept commit is bit-equivalent to what Lance ships; wins port upstream as Apache-2.0 PRs.

Why the shape works: fixed-cost trials bound the per-iteration budget (~100/hour); one mutable file keeps diffs reviewable and prevents scope creep; a deterministic oracle kills failed trials without spiraling; the loop self-orchestrates so the human can leave; findings compound across sessions via gitignored lessons.md per target.

Each landed target is an independent Rust crate under crates/. The candidates below are listed as a roadmap. They have no code yet, only a docs/targets/<name>.md capsule when one exists. Spinning up a candidate follows the docs/adding-a-target.md workflow.

Target	Status	Lance source area	What's optimized	Best result
`crates/pq-l2`	landed	`lance-linalg::distance::l2`, PQ probe	PQ L2 distance: distance_table + per-vector distances	−43% geomean vs upstream (M1 Max, aarch64; bit-equivalent output; x86 untested)
`crates/pq-cosine`	candidate	`lance-linalg::distance::cosine`	PQ cosine distance	pending
`crates/pq-dot`	candidate	`lance-linalg::distance::dot`	PQ dot-product distance	pending
`crates/ivf-partition`	candidate	`lance-index::vector::ivf` partition select	IVF partition selection (centroid scan)	pending
`crates/fts-bm25`	candidate (Step 0 ✓)	`lance-index::scalar::inverted::scorer` `Scorer::doc_weight`	FTS BM25 scoring inner loop	pending; clean call site at `wand.rs:252` via the `Scorer` trait — ready to scaffold
`crates/bitpack`	candidate	`lance-encoding::encodings::bitpack`	Bitpack integer decode	pending
`crates/dictionary`	candidate	`lance-encoding::encodings::dictionary`	Dictionary decode	pending
`crates/fsst`	candidate	`lance-encoding::encodings::fsst`	FSST string decode	pending
`crates/take`	candidate	`lance-core::utils::take`	Take / gather kernel	pending
`crates/predicate`	candidate	`lance-datafusion` filter eval	Predicate evaluation kernels	pending
`crates/posting-intersect`	landed (off-path; see capsule)	`lance-index::scalar::inverted` (no direct call site)	Sorted u32 posting-list AND intersect	−81% geomean vs scalar K-way merge (M1 Max, aarch64; bit-equivalent output; x86 fallback intact). Kernel surface not in current Lance hot path; see `posting-seek` for the Lance-aligned shape.
`crates/posting-seek`	kernel landed; REJECTED as upstream PR	`lance-index::scalar::inverted::wand` (`next`, `shallow_next`)	Block-aware seek over compressed posting list	Microbench −97% worst-case / −58% geomean. Upstream integration: 1M no change (p > 0.05); 10M REGRESSES OR queries +12.7% (p=0.03). WAND's score-skip preempts deep `next()` calls; gallop's overhead loses on shallow skips that actually dominate. The microbench was self-fulfilling. See capsule for full empirical breakdown
`crates/topk-merge`	candidate	scan-merge	Top-K k-way merge	pending

The candidate targets are documented in docs/targets/ and can be added by following docs/adding-a-target.md. pq-l2, posting-intersect, and posting-seek are landed; the rest wait for an agent to spin them up. pq-l2 carries a −43% geomean win on M1 Max. posting-intersect lands at −81% geomean via three trials (branchless merge → galloping at ratio>16× → NEON cross-product SIMD merge), but a retroactive Step 0 trace (see docs/adding-a-target.md) showed its kernel surface is not in Lance's current WAND hot path — the trial wins are clean kernel engineering on a primitive Lance would need a refactor to use. posting-seek is the Lance-aligned follow-up: a hybrid linear-budget + McIlroy gallop change in wand.rs::next that drops the worst-case seek (Large × Skip-deep) from 3011 ns → 74 ns, ~30 LOC, no unsafe, no SIMD. Step 0 of the workflow was added in response to posting-intersect's mis-scope; future targets won't ship without their "Lance call site" capsule section filed first.

The contract every target follows

Karpathy's three-file shape, applied per target:

File (per target crate)	Mutability	Edited by
`src/kernels.rs`	mutable	the agent
`src/reference.rs`, `src/inputs.rs`, `src/lib.rs`, `src/bin/run_experiment.rs`, `benches/*.rs`	immutable	nobody
`program.md`	human-iterated	the human, between runs
`results.tsv`	append-only	the agent, per trial (gitignored)
`lessons.md`	append-only	the agent, on load-bearing findings (gitignored)

The shared utilities (deterministic PRNG, geomean, bootstrap CI, PMC counters, peak-RSS readback, tolerance constants, time-budget) live in crates/harness-common and are consumed by every target. There is intentionally no Target trait: decode-kernel signatures and distance-kernel signatures are different enough that a unifying trait would either bloat or require erased boxing. Each target is its own natural shape; the shared crate is plumbing only.

The shared loop conventions every target's program.md inherits live in HARNESS.md. Per-target priors and API specifics live in each target's own program.md.

Dataset-independent by design

Every other ANN benchmark you've seen is "compete on this fixed dataset" (SIFT1M, GIST1M, DEEP1B). That conflates two things: kernel correctness (the math) and kernel speed under one specific data distribution. An LLM agent given recall@K as the oracle has incentive to overfit to the dataset's quirks.

We split them, every target:

Correctness = bit-equivalent (max_abs_err ≤ 1e-4 for floats; bitwise for integer/byte kernels) match to a scalar reference, on diverse generated inputs. Mathematical equivalence; no dataset to overfit. Lossy techniques fail this gate.
Speed = geomean ns/operation across multiple shape × distribution combinations, with worst-case guard. A kernel that wins on one distribution and regresses on another fails to keep.

Fixtures generate from a seeded PRNG in each target's inputs.rs. Nothing to download. Reproducible across machines and across runs from the same SHA.

Quick start

# Run the landed PQ L2 target's baseline (3-pass for tight CI).
cargo run --release --bin run_experiment -p pq-l2 -- --mode baseline

# Or per-trial mode (1-pass, faster iteration):
cargo run --release --bin run_experiment -p pq-l2

# With Claude Code / Codex, working on one target:
cd crates/pq-l2
# Open in your agent of choice and prompt:
#   Hi, have a look at program.md and let's kick off a new experiment.

# Add a new target (see docs/adding-a-target.md):
./scripts/scaffold-target.sh pq-cosine
# Then rewrite kernels.rs / reference.rs / inputs.rs / program.md for the
# new kernel's math.

# Check whether our vendored upstream code has drifted:
./scripts/check-lance-drift.sh

Repo layout

lance-autoresearch/
├── Cargo.toml                         # workspace root
├── README.md                          # you are here
├── HARNESS.md                         # shared loop contract every target inherits
├── LICENSE                            # Apache-2.0 (matches upstream Lance)
├── scripts/
│   ├── scaffold-target.sh             # cp -r pq-l2 + rename for a new target
│   └── check-lance-drift.sh           # report upstream-snapshot drift
├── crates/
│   ├── harness-common/                # SplitMix64, geomean, bootstrap CI, PMC counters, tolerance, time budget
│   │   └── src/{lib,prng,stats,sysinfo,tolerance,perf}.rs
│   ├── lance-snapshots/               # verbatim Apache-2.0 vendored Lance hot-path kernels (pinned SHA)
│   │   ├── RESYNC.md
│   │   └── src/{lib,assume,l2,pq}.rs
│   └── pq-l2/                         # landed target
│       ├── Cargo.toml
│       ├── program.md                 # this target's agent skill
│       ├── src/
│       │   ├── lib.rs                 # PqShape + module wiring (immutable)
│       │   ├── kernels.rs             # MUTABLE; agent's playground (starts as upstream clone)
│       │   ├── reference.rs           # IMMUTABLE; thin wrapper over lance-snapshots (oracle IS upstream code)
│       │   ├── inputs.rs              # IMMUTABLE; diverse test-data generators
│       │   └── bin/run_experiment.rs  # IMMUTABLE; per-trial entry point
│       └── benches/pq_l2.rs           # criterion benchmark (immutable)
└── docs/
    ├── design.md                      # rationale for the workspace shape
    ├── robustness.md                  # why each measurement feature exists
    ├── adding-a-target.md             # workflow for spinning up a new target
    └── targets/
        └── pq-l2.md                   # capsule: upstream Lance pointers, oracle, status

Upstream contribution path

When a commit on any target clears the keep bar by a meaningful margin (≥10% geomean speedup with worst-case guard intact), the human reviews the diff, ports the technique against lance-format/lance HEAD, runs Lance's own test suite, and opens a PR. The harness is Apache-2.0 licensed to match Lance; the upstream PR inherits Apache-2.0 cleanly. The correctness gate (MAX_ABS_ERR ≤ 1e-4 against the vendored upstream code in crates/lance-snapshots) means any kept commit is bit-equivalent to what Lance ships today. Recall is preserved by construction, not just empirically.

License

Licensed under the Apache License, Version 2.0 (LICENSE).

Vendored upstream code in crates/lance-snapshots/ carries the same license and is attributed to The Lance Authors in each file's SPDX header. See crates/lance-snapshots/RESYNC.md for the re-sync ritual.

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
crates		crates
docs		docs
scripts		scripts
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
HARNESS.md		HARNESS.md
LICENSE		LICENSE
README.md		README.md
rust-toolchain.toml		rust-toolchain.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

lance-autoresearch

What autoresearch is, and why it works

The contract every target follows

Dataset-independent by design

Quick start

Repo layout

Upstream contribution path

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

lance-autoresearch

What autoresearch is, and why it works

The contract every target follows

Dataset-independent by design

Quick start

Repo layout

Upstream contribution path

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages