BenchDirectory

AI benchmarks from curious people, in one place.

The most interesting AI benchmarks right now aren't the academic ones. They're the personal ones. SnitchBench measures whether a model will report you to the FBI. BullshitBench measures whether it pushes back on a nonsense question. SlopBench measures how much AI slop a model writes. Each one lives on its own site, in its own format. I kept losing track of them, so I put them in one place.

Every score comes from the person who built the benchmark. They run it and publish the numbers. BenchDirectory pulls those numbers in and links straight back to them, so credit and clicks go to the people doing the work.

Current benchmarks

Benchmark	Built by	Where the scores come from
SnitchBench	Theo	`snitching-analysis.json` in T3-Content/SnitchBench
BullshitBench	Peter Gostev	`data/v2/latest/leaderboard.csv` in petergpt/bullshit-benchmark (`data/latest/` is the stale v1 export)
SlopBench	Dan Cleary	public leaderboard query on the production SlopBench backend
SkateBench	Theo	server-rendered leaderboard at skatebench.t3.gg (the committed JSON is a stale v1 run; the live v2 data was never pushed)
ScreenshotBench	Dan Cleary	public matrix query on the production screenshotbench.com backend
DeepSWE	Datacurve	server-rendered leaderboard at deepswe.datacurve.ai. No structured export, so the adapter parses the page and fails loudly if the markup changes
Planning Benchmark	bladnman	per-run branches in bladnman/planning_benchmark, each branch's `results/PLAN_EVAL.md`
Senior Engineer Bench	Every / Dan Shipper	hand-curated from the published Vibe Check articles (scores live in prose only)
CursorBench	Cursor	hand-curated from the published leaderboard page (the benchmark itself is closed)

Hand-curated tables are marked in the UI and carry the creator's link. They're the fallback for benches whose owners publish numbers but not data files.
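
For the page-scraped rows above (DeepSWE, for example), "fails loudly" could look roughly like the sketch below. This is not the real adapter: the row markup, the regex, and the names `ParsedRow` and `parseLeaderboard` are all assumptions made for illustration. The point is only that a parse which finds nothing, or finds a non-numeric score, throws instead of writing an empty snapshot.

```typescript
// Hypothetical: assumes each leaderboard row renders as
// <tr><td>model name</td><td>score</td></tr>. The real markup may differ.
interface ParsedRow {
  model: string;
  score: number;
}

function parseLeaderboard(html: string, minRows = 1): ParsedRow[] {
  const rowPattern = /<tr>\s*<td>([^<]+)<\/td>\s*<td>([^<]+)<\/td>\s*<\/tr>/g;
  const rows: ParsedRow[] = [];
  let match: RegExpExecArray | null;

  while ((match = rowPattern.exec(html)) !== null) {
    const model = (match[1] ?? "").trim();
    const scoreText = (match[2] ?? "").trim();
    const score = Number(scoreText);
    // Number("") is 0, so an empty cell is rejected explicitly.
    if (model === "" || scoreText === "" || !Number.isFinite(score)) {
      throw new Error(`Unparseable leaderboard row: ${match[0]}`);
    }
    rows.push({ model, score });
  }

  // Zero matches almost always means the page structure changed.
  if (rows.length < minRows) {
    throw new Error(
      `Expected at least ${minRows} row(s), found ${rows.length}; markup may have changed`,
    );
  }
  return rows;
}
```

Throwing here means the scheduled ingest run fails visibly rather than committing a silently empty table.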

How it works

adapters/*.ts  ──ingest──▶  src/data/*.json  ──build──▶  static site

adapters/ holds one file per benchmark. Each adapter grabs the creator's published numbers (raw JSON/CSV from their repo, a public API, or the rendered leaderboard page when there is no structured export) and turns them into a single shape (adapters/types.ts).
src/data/ holds the saved snapshots: plain JSON you can diff and review, no backend required.
src/ is the React/Vite site that renders them, grouped by benchmark.
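
To make the "single shape" idea concrete, here is a minimal sketch. The real definitions live in adapters/types.ts and may look different. `BenchmarkMeta` is the name the repo uses, but every field name below, along with `ScoreRow`, `BenchmarkSnapshot`, and `normalizeRows`, is an assumption for this example.

```typescript
// Hypothetical shapes; the real ones are defined in adapters/types.ts.
type Direction = "higher" | "lower";

interface BenchmarkMeta {
  name: string;
  creatorUrl: string; // link back to the creator (assumed field name)
  scoreLabel: string; // what the score means
  better: Direction;  // which direction is better
}

interface ScoreRow {
  model: string;
  score: number;
}

interface BenchmarkSnapshot {
  meta: BenchmarkMeta;
  generatedAt: string | null; // when the creator generated the data, if known
  fetchedAt: string;          // when the adapter pulled it
  rows: ScoreRow[];
}

// Turn loosely typed upstream records into the shared row shape, best score first.
function normalizeRows(
  raw: Array<{ model?: unknown; score?: unknown }>,
  better: Direction,
): ScoreRow[] {
  const rows: ScoreRow[] = [];
  for (const r of raw) {
    if (typeof r.model !== "string" || typeof r.score !== "number" || !Number.isFinite(r.score)) {
      continue; // this sketch skips malformed records; a real adapter might throw instead
    }
    rows.push({ model: r.model, score: r.score });
  }
  rows.sort((a, b) => (better === "higher" ? b.score - a.score : a.score - b.score));
  return rows;
}
```

Because every adapter would emit the same shape, the site only needs one renderer, and a snapshot diff in review shows exactly which scores moved.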

npm install
npm run ingest                   # refresh every snapshot from its source
npm run ingest -- snitchbench    # refresh just one
npm run dev                      # local dev server
npm run build                    # type-check + production build
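
Everything after `--` is passed by npm to the underlying script as arguments. How adapters/run.ts uses them is not shown here, so the following is only a sketch of one plausible approach: a registry keyed by adapter id, with no arguments meaning "run everything". The names `Adapter`, `exampleRegistry`, and `selectAdapters` are hypothetical, and the stub adapters stand in for real fetch logic.

```typescript
// Hypothetical adapter signature: each adapter resolves to a snapshot.
type Adapter = () => Promise<unknown>;

// Stubs for illustration only; real adapters would fetch and normalize data.
const exampleRegistry: Record<string, Adapter> = {
  snitchbench: () => Promise.resolve({ rows: [] }),
  slopbench: () => Promise.resolve({ rows: [] }),
};

// Decide which adapters to run from the CLI arguments.
function selectAdapters(
  registry: Record<string, Adapter>,
  requested: string[],
): string[] {
  const known = Object.keys(registry);
  if (requested.length === 0) {
    return known; // no arguments: refresh every snapshot
  }
  const unknown = requested.filter((id) => known.indexOf(id) === -1);
  if (unknown.length > 0) {
    throw new Error(
      `Unknown benchmark(s): ${unknown.join(", ")}. Known: ${known.join(", ")}`,
    );
  }
  return requested;
}
```

In a Node script this would typically be called as `selectAdapters(exampleRegistry, process.argv.slice(2))`, so a typo in the benchmark name errors out instead of quietly refreshing nothing.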

Add your benchmark

Two ways.

Easiest: hit "Submit a bench" on the site and drop a link to your repo. That opens a pre-filled issue and we take it from there.

Or open a PR:

1. Publish your results somewhere structured. A JSON or CSV in your repo is perfect.
2. Copy any file in adapters/, point it at your data, and fill in the BenchmarkMeta (name, links, what the score means, which direction is better).
3. Register it in adapters/run.ts, run npm run ingest, and commit the snapshot.
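
For a CSV source, the core of a new adapter might look something like the sketch below. Treat it as illustration rather than a template: the `model` and `score` column names, the meta field names, the example.com URLs, and the `parseCsvScores` helper are all made up. Copy a real file from adapters/ for the actual conventions.

```typescript
// Hypothetical meta fields mirroring the list above: name, links,
// what the score means, which direction is better.
const exampleMeta = {
  name: "ExampleBench",
  links: { site: "https://example.com", repo: "https://example.com/repo" },
  scoreDescription: "Share of tasks solved, in percent",
  higherIsBetter: true,
};

// Minimal parsing for a simple CSV with a header row containing
// "model" and "score". Assumes no quoted fields or embedded commas;
// reach for a CSV library if your file has either.
function parseCsvScores(csv: string): Array<{ model: string; score: number }> {
  const lines = csv
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "");
  const header = (lines[0] ?? "").split(",").map((h) => h.trim().toLowerCase());
  const modelIdx = header.indexOf("model");
  const scoreIdx = header.indexOf("score");
  if (modelIdx === -1 || scoreIdx === -1) {
    throw new Error('CSV needs "model" and "score" columns');
  }

  return lines.slice(1).map((line, i) => {
    const cells = line.split(",");
    const model = (cells[modelIdx] ?? "").trim();
    const scoreText = (cells[scoreIdx] ?? "").trim();
    const score = Number(scoreText);
    if (model === "" || scoreText === "" || !Number.isFinite(score)) {
      throw new Error(`Bad data row ${i + 1}: ${line}`);
    }
    return { model, score };
  });
}
```

A real adapter would fetch the raw file from your repo, pass the text through a parser like this, and return the rows together with the meta.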

The UI picks up new snapshots on its own.

Freshness

A GitHub Action re-checks every benchmark daily and commits only when a creator has actually published something new. People update on their own schedule. New models trickle in for days after a launch, so a daily check catches everyone without anyone needing to coordinate. Each section shows when the creator generated the data and when we pulled it.
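
One way to get "commit only when something new was published" is to compare the fresh snapshot against the saved one while ignoring the pull timestamp, which changes on every run. The sketch below assumes hypothetical field names (`fetchedAt`, `generatedAt`, `rows`) and a hypothetical `hasNewData` helper. The actual workflow may detect changes differently.

```typescript
// Hypothetical snapshot shape for this example.
interface Snapshot {
  fetchedAt: string;          // when we pulled the data
  generatedAt: string | null; // when the creator generated it, if known
  rows: Array<{ model: string; score: number }>;
}

// True when the creator's data differs from what is already saved.
function hasNewData(previous: Snapshot | null, next: Snapshot): boolean {
  if (previous === null) {
    return true; // first ingest for this benchmark
  }
  // Compare only creator-controlled content. Rows become [model, score]
  // tuples so object key order cannot affect the comparison.
  const fingerprint = (s: Snapshot): string =>
    JSON.stringify({
      generatedAt: s.generatedAt,
      rows: s.rows.map((r) => [r.model, r.score]),
    });
  return fingerprint(previous) !== fingerprint(next);
}
```

With a check like this, a daily run that finds nothing new leaves the snapshot file untouched, so the commit history only records real updates.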

Roadmap

More benches (got one? open an issue or use the submit button)
Cross-bench model report cards: one model, every indie bench
Bar charts per benchmark
A benchmark.json manifest so creators can push instead of being pulled

License

MIT
