
feat: tok/s regression guards + README factual update #23

Merged: dexwritescode merged 2 commits into main from feat-throughput-tests-readme on May 7, 2026

@dexwritescode
Owner

Summary

  • Add a GenerateThroughput test to the Mistral 7B, Llama-3.1 8B, Gemma 3 1B, and Qwen3 MoE integration test suites. Each test first runs an 8-token warmup pass to trigger mx::compile, then times a 128-token greedy generation and asserts a floor of ~50% of the measured baseline. That margin is enough to catch any regression back to pre-pipelining speeds (~10-12 tok/s).
  • Rewrite README to be factually accurate against the current codebase: Mermaid architecture diagram, correct model class names, Qwen3 dense + Qwen3 MoE in supported models, cuBLAS removed from intro, Performance section with real release-build numbers, updated roadmap
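The warmup-then-time pattern described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code added in this PR: `GenerateFn` and `measure_tok_per_sec` are hypothetical names, the callback stands in for a model's greedy decode loop, and the use of `std::chrono::steady_clock` is an assumption.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical stand-in for a model's greedy decode loop: asked to generate
// `max_tokens` tokens, it returns how many tokens it actually produced.
using GenerateFn = std::function<std::size_t(std::size_t max_tokens)>;

// Runs an untimed warmup pass, then times a second pass and returns tok/s.
double measure_tok_per_sec(const GenerateFn& generate,
                           std::size_t warmup_tokens,
                           std::size_t timed_tokens) {
    assert(timed_tokens > 0);

    // Warmup: absorbs one-time costs (such as graph compilation) so that
    // they do not count against the timed run.
    generate(warmup_tokens);

    const auto start = std::chrono::steady_clock::now();
    const std::size_t produced = generate(timed_tokens);
    const auto stop = std::chrono::steady_clock::now();

    // Elapsed wall-clock time in seconds as a double.
    const std::chrono::duration<double> elapsed = stop - start;
    return static_cast<double>(produced) / elapsed.count();
}
```

A test built on this sketch would compare the result with the model's floor, for example checking that `measure_tok_per_sec(generate, 8, 128) >= 22.0` for Mistral 7B 8-bit.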

Throughput floors (debug build, post-warmup):

| Model | Baseline | Floor |
| --- | --- | --- |
| Mistral 7B 8-bit | ~44 tok/s | 22 tok/s |
| Llama-3.1 8B 4-bit | ~68 tok/s | 33 tok/s |
| Gemma 3 1B 4-bit | ~60 tok/s | 30 tok/s |
| Qwen3 MoE 30B-A3B 4-bit | ~23 tok/s | 11 tok/s |
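As a quick arithmetic check on the "~50%" rule, the sketch below recomputes floor divided by baseline for each row of the table above. The struct and function names are hypothetical and exist only for this illustration. The baselines are the approximate figures quoted in the table.

```cpp
#include <cassert>
#include <cstdio>

struct ThroughputRow {
    const char* model;
    double baseline_tok_s;  // approximate measured baseline from the table
    double floor_tok_s;     // asserted minimum from the table
};

// Values copied from the table above (debug build, post-warmup).
constexpr ThroughputRow kRows[] = {
    {"Mistral 7B 8-bit", 44.0, 22.0},
    {"Llama-3.1 8B 4-bit", 68.0, 33.0},
    {"Gemma 3 1B 4-bit", 60.0, 30.0},
    {"Qwen3 MoE 30B-A3B 4-bit", 23.0, 11.0},
};

// Fraction of the baseline that the floor represents.
inline double floor_ratio(const ThroughputRow& row) {
    return row.floor_tok_s / row.baseline_tok_s;
}

// Prints one line per model with its floor/baseline ratio.
void print_floor_ratios() {
    for (const ThroughputRow& row : kRows) {
        assert(row.baseline_tok_s > 0.0);
        std::printf("%-26s floor/baseline = %.2f\n", row.model, floor_ratio(row));
    }
}
```

The ratios come out between roughly 0.48 and 0.50, so each floor sits at or slightly below half of its quoted baseline.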

Performance numbers in README (release build, greedy):
TinyLlama 1.1B ~265 tok/s · Gemma 3 1B ~190 tok/s · Llama-3.1 8B ~61 tok/s · Mistral 7B ~57 tok/s · Qwen3.6 35B-A3B ~77 tok/s

Integration tests:
- Add GenerateThroughput test to Mistral, Llama3, Gemma, Qwen3 MoE
- 128-token greedy run with mx::compile warmup pass before timing
- Floors set to ~50% of measured debug-build baseline to catch
  regressions back to pre-pipelining speeds

README:
- Replace ASCII architecture diagram with Mermaid
- Add Qwen3 (dense) and Qwen3 MoE to supported models table
- Fix model class names (GemmaModelMLX, Qwen3MoeModelMLX)
- Simplify backend section — MLX is the only backend today
- Remove cuBLAS from intro (not yet implemented)
- Add Performance section with release-build tok/s measurements
- Update roadmap: mark Phase O (MLX perf) done, Qwen3 MoE in Phase F

dexwritescode added the release:minor label (Bumps minor version on merge) on May 7, 2026
dexwritescode merged commit 12441d3 into main on May 7, 2026
3 checks passed
dexwritescode deleted the feat-throughput-tests-readme branch on May 7, 2026 01:30