feat: tok/s regression guards + README factual update by dexwritescode · Pull Request #23 · dexwritescode/neurons

dexwritescode · 2026-05-07T01:25:30Z

Summary

- Add a GenerateThroughput test to the Mistral 7B, Llama-3.1 8B, Gemma 3 1B, and Qwen3 MoE integration test suites. Each test runs an 8-token warmup pass to trigger mx::compile, then times a 128-token greedy generation and asserts a floor of ~50% of the measured baseline, enough to catch any regression back to pre-pipelining speeds (~10-12 tok/s).
- Rewrite the README to be factually accurate against the current codebase: Mermaid architecture diagram, correct model class names, Qwen3 dense + Qwen3 MoE in supported models, cuBLAS removed from intro, Performance section with real release-build numbers, updated roadmap.
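The warmup-then-time pattern described for GenerateThroughput could look roughly like the sketch below. It is not the code from this PR: `GenerateFn`, `measure_tok_per_sec`, and `meets_floor` are hypothetical names, and the generation call is abstracted as a callback so the timing logic stands on its own.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical stand-in for a model's greedy generation call: takes the number
// of tokens to produce and returns how many were actually generated.
using GenerateFn = std::function<int(int max_tokens)>;

// Runs a short untimed warmup, then times a longer run and returns tokens/sec.
double measure_tok_per_sec(const GenerateFn& generate,
                           int warmup_tokens = 8,
                           int timed_tokens = 128) {
    assert(timed_tokens > 0);

    // Warmup pass: per the PR description, this is what triggers mx::compile
    // in the real tests, so compilation cost stays out of the measurement.
    generate(warmup_tokens);

    const auto start = std::chrono::steady_clock::now();
    const int produced = generate(timed_tokens);
    const auto stop = std::chrono::steady_clock::now();

    const double seconds = std::chrono::duration<double>(stop - start).count();
    return seconds > 0.0 ? produced / seconds : 0.0;
}

// True when measured throughput is at or above the regression floor.
bool meets_floor(double tok_per_sec, double floor_tok_per_sec) {
    return tok_per_sec >= floor_tok_per_sec;
}
```

A test would then wrap the model's generate call in a `GenerateFn`, call `measure_tok_per_sec`, and assert `meets_floor(result, floor)` with the per-model floor.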

Throughput floors (debug build, post-warmup):

| Model | Baseline | Floor |
| --- | --- | --- |
| Mistral 7B 8-bit | ~44 tok/s | 22 tok/s |
| Llama-3.1 8B 4-bit | ~68 tok/s | 33 tok/s |
| Gemma 3 1B 4-bit | ~60 tok/s | 30 tok/s |
| Qwen3 MoE 30B-A3B 4-bit | ~23 tok/s | 11 tok/s |
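As a quick sanity check of the "~50% of baseline" rule, the floors in the table can be expressed as fractions of their baselines. The sketch below uses only the table values; the struct and function names are illustrative, and the 0.45 to 0.50 tolerance band is an assumption chosen for this check, not something stated in the PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ThroughputFloor {
    std::string model;
    double baseline_tok_s;  // approximate debug-build baseline from the table
    double floor_tok_s;     // asserted minimum from the table
};

// Values copied from the throughput floors table.
const std::vector<ThroughputFloor> kFloors = {
    {"Mistral 7B 8-bit", 44.0, 22.0},
    {"Llama-3.1 8B 4-bit", 68.0, 33.0},
    {"Gemma 3 1B 4-bit", 60.0, 30.0},
    {"Qwen3 MoE 30B-A3B 4-bit", 23.0, 11.0},
};

// Fraction of the baseline that the floor represents.
double floor_ratio(const ThroughputFloor& f) {
    assert(f.baseline_tok_s > 0.0);
    return f.floor_tok_s / f.baseline_tok_s;
}

// True when every floor falls inside [lo, hi] as a fraction of its baseline.
// The default band is an assumed tolerance for "roughly half".
bool all_floors_near_half(const std::vector<ThroughputFloor>& floors,
                          double lo = 0.45, double hi = 0.50) {
    for (const auto& f : floors) {
        const double r = floor_ratio(f);
        if (r < lo || r > hi) return false;
    }
    return true;
}
```

The ratios land between roughly 0.48 and 0.50, so each floor sits at or slightly below half of its baseline, which leaves some slack for run-to-run noise.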

Performance numbers in README (release build, greedy):
TinyLlama 1.1B ~265 tok/s · Gemma 3 1B ~190 tok/s · Llama-3.1 8B ~61 tok/s · Mistral 7B ~57 tok/s · Qwen3.6 35B-A3B ~77 tok/s

Integration tests:
- Add GenerateThroughput test to Mistral, Llama3, Gemma, Qwen3 MoE
- 128-token greedy run with mx::compile warmup pass before timing
- Floors set to ~50% of measured debug-build baseline to catch regressions back to pre-pipelining speeds

README:
- Replace ASCII architecture diagram with Mermaid
- Add Qwen3 (dense) and Qwen3 MoE to supported models table
- Fix model class names (GemmaModelMLX, Qwen3MoeModelMLX)
- Simplify backend section: MLX is the only backend today
- Remove cuBLAS from intro (not yet implemented)
- Add Performance section with release-build tok/s measurements
- Update roadmap: mark Phase O (MLX perf) done, Qwen3 MoE in Phase F

dexwritescode added the release:minor label (bumps minor version on merge) on May 7, 2026

fix: replace \n with dashes in Mermaid node labels

d7146c8

dexwritescode merged commit 12441d3 into main May 7, 2026
3 checks passed

dexwritescode deleted the feat-throughput-tests-readme branch May 7, 2026 01:30

dexwritescode merged 2 commits into main from feat-throughput-tests-readme
