
bench: add spec for automatic canary benchmarks #1483

Merged
rltakashige merged 1 commit into main from JakeHillion/wsxotlsxmutq
Feb 17, 2026

@JakeHillion
Contributor

Adds all the models that can fit onto a single M3 Ultra for single-machine benchmarks. Pins the macOS version, GPU spec, and chip type for maximum reproducibility. Specifies the minimum memory for each type of model, using the smallest machine available (the smallest M3 Ultra has 96 GiB).

Test plan:

  • Ran this with some code that makes machines of this spec available and stores the results. It works.

This will become part of a larger testing/stability strategy once we've collected more of the data.
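
To illustrate the constraints described above, here is a minimal sketch of how a runner might check whether a machine satisfies a suite's pinned spec. This is not the code used in the test plan: the `Machine` and `Constraints` types, their field names, and `satisfies` are all hypothetical. Only the values (M3 Ultra, 80 GPU cores, 96 GiB minimum, build 25D125) come from this PR.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes in one GiB


@dataclass(frozen=True)
class Machine:
    # Hypothetical description of an available machine.
    chip: str
    gpu_cores: int
    memory_bytes: int
    os_build: str


@dataclass(frozen=True)
class Constraints:
    # Hypothetical constraint set; field names are assumptions,
    # not the schema of bench/single-m3-ultra.toml.
    chip: str
    gpu_cores: int
    os_build: str
    min_memory_bytes: int


def satisfies(machine: Machine, c: Constraints) -> bool:
    """Chip, GPU core count, and OS build must match exactly;
    memory only has to meet the minimum."""
    return (
        machine.chip == c.chip
        and machine.gpu_cores == c.gpu_cores
        and machine.os_build == c.os_build
        and machine.memory_bytes >= c.min_memory_bytes
    )


# Values taken from this PR; the structure around them is illustrative.
CANARY = Constraints(
    chip="M3 Ultra",
    gpu_cores=80,
    os_build="25D125",
    min_memory_bytes=96 * GIB,
)
```

Exact matching on chip, GPU cores, and OS build is what keeps results comparable between runs, while the memory floor lets larger machines run the smaller models as well.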

@JakeHillion force-pushed the JakeHillion/wsxotlsxmutq branch from eb65a15 to 574bf32 on February 16, 2026 18:40
@JakeHillion force-pushed the JakeHillion/wsxotlsxmutq branch from 574bf32 to 8d64a56 on February 16, 2026 19:29
@AlexCheema
Contributor

Review summary (CI: all passing)

Adds benchmark specification files for automated canary benchmarks (+196, new files only):

  • bench/bench.toml: manifest listing suite files to include
  • bench/single-m3-ultra.toml: 16+ model benchmarks on a single M3 Ultra (80 GPU cores, 96 GiB+ RAM, macOS build 25D125)

Models cover a good range: Llama 3.1/3.2 (1B–70B), Qwen3 variants, GLM-4.7, GPT-OSS in 4-bit/8-bit/bf16 quantizations. Default args: pp=[512, 2048, 8192, 16384], tg=128. Constraints pin exact OS build and GPU spec for reproducibility.
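
As a rough sketch of what the default args imply per model, the snippet below expands them into individual runs. It assumes `pp` lists prompt lengths and `tg` is the number of generated tokens, and that each `pp` value becomes one run; neither reading is confirmed by the spec files. `expand_runs` and the `"example-model"` name are hypothetical.

```python
# Default args quoted in the review summary.
DEFAULT_PP = [512, 2048, 8192, 16384]
DEFAULT_TG = 128


def expand_runs(model, pp=DEFAULT_PP, tg=DEFAULT_TG):
    """Return one run description per prompt length (assumed semantics)."""
    return [{"model": model, "pp": p, "tg": tg} for p in pp]


# "example-model" is a placeholder, not a model from the suite.
runs = expand_runs("example-model")
for run in runs:
    print(run)
```

Under these assumptions each model contributes four runs, so a suite of 16 models would produce at least 64 measurements per pass.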

Additive-only change, no existing code modified.

@rltakashige merged commit 8392e78 into main on Feb 17, 2026
6 checks passed
@rltakashige deleted the JakeHillion/wsxotlsxmutq branch on February 17, 2026 10:52
@rltakashige
Collaborator

Good bot.
