bench: add spec for automatic canary benchmarks by JakeHillion · Pull Request #1483 · exo-explore/exo

JakeHillion · 2026-02-16T13:07:24Z

Adds all the models that can fit onto a single M3 Ultra for single-machine benchmarks. Pins the macOS version, GPU spec, and chip type for maximum reproducibility. Specifies the minimum memory for each type of model accordingly, using the smallest machine available (the smallest M3 Ultra is 96GiB).

Test plan:

Running this with some code that makes machines of this spec available and stores the results. It works.

This will become part of a larger testing/stability strategy once we've collected more of the data.
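The memory-sizing rule in the description (pick the smallest machine that a model fits on) can be sketched roughly as follows. This is an illustration under stated assumptions, not the logic used in this PR: the tier list beyond 96 GiB, the bytes-per-parameter table, and the 1.2x headroom factor are all hypothetical, and the estimate covers weights only (no KV cache or runtime overhead).

```python
GIB = 1024 ** 3

# Assumed memory tiers in GiB. Only the 96 figure comes from the PR
# description; the larger tiers are illustrative.
MEMORY_TIERS_GIB = [96, 256, 512]

# Approximate storage cost per parameter for each precision (assumption:
# ignores quantization scales and other per-group metadata).
BYTES_PER_PARAM = {"4bit": 0.5, "8bit": 1.0, "bf16": 2.0}


def weight_gib(params_billion, precision):
    """Estimate the size of the model weights in GiB."""
    return params_billion * 1e9 * BYTES_PER_PARAM[precision] / GIB


def min_memory_gib(params_billion, precision, headroom=1.2):
    """Return the smallest assumed tier that fits the weights plus headroom.

    Returns None when the model does not fit on any tier.
    """
    needed = weight_gib(params_billion, precision) * headroom
    for tier in MEMORY_TIERS_GIB:
        if needed <= tier:
            return tier
    return None


if __name__ == "__main__":
    for size, precision in [(1, "bf16"), (70, "4bit"), (70, "bf16")]:
        print(size, precision, min_memory_gib(size, precision))
```

Under these assumptions a 70B model at 4-bit lands on the 96 GiB tier, while the same model in bf16 needs the next tier up.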

AlexCheema · 2026-02-16T22:27:29Z

Review summary (CI: all passing)

Adds benchmark specification files for automated canary benchmarks (+196, new files only):

- bench/bench.toml: manifest listing the suite files to include
- bench/single-m3-ultra.toml: 16+ model benchmarks on a single M3 Ultra (80 GPU cores, 96GiB+ RAM, macOS build 25D125)

Models cover a good range: Llama 3.1/3.2 (1B–70B), Qwen3 variants, GLM-4.7, and GPT-OSS, in 4-bit, 8-bit, and bf16 precisions. Default args: pp=[512, 2048, 8192, 16384], tg=128. Constraints pin the exact OS build and GPU spec for reproducibility.

Additive-only change, no existing code modified.

rltakashige · 2026-02-17T10:54:24Z

Good bot.

JakeHillion requested a review from rltakashige February 16, 2026 18:28

JakeHillion force-pushed the JakeHillion/wsxotlsxmutq branch from eb65a15 to 574bf32 Compare February 16, 2026 18:40

rltakashige approved these changes Feb 16, 2026

JakeHillion force-pushed the JakeHillion/wsxotlsxmutq branch from 574bf32 to 8d64a56 Compare February 16, 2026 19:29

rltakashige merged commit 8392e78 into main Feb 17, 2026
6 checks passed

rltakashige deleted the JakeHillion/wsxotlsxmutq branch February 17, 2026 10:52

rltakashige merged 1 commit into main from JakeHillion/wsxotlsxmutq