Inference Speed Tests on Local LLMs

Inference speed tests for local large language models (LLMs) across a range of devices. Feel free to contribute your results.

Note: None of the following results are verified.

Automated Benchmarking (mlx-lm)

Run reproducible benchmarks on your Apple Silicon Mac using the included script.

Requirements: macOS with Apple Silicon, uv

# Clone and install dependencies
git clone https://github.com/itsmostafa/inference-speed-tests
cd inference-speed-tests
uv sync

# Benchmark a single model (1 iteration)
# Default prompt: 1 sample from tatsu-lab/alpaca (instruction field)
uv run src/main.py mlx-community/Qwen3-8B-4bit -n 1

# Benchmark multiple models with 3 iterations each (no -n flag, so the default count applies)
uv run src/main.py mlx-community/Qwen3-8B-4bit mlx-community/Qwen3-14B-4bit

# Custom inline prompt, output file, and iteration count
uv run src/main.py mlx-community/Qwen3-8B-4bit \
  --prompt "Explain quantum computing" \
  --iterations 5 \
  --output my_results.md

# Multiple inline prompts in one run (results grouped by prompt)
uv run src/main.py mlx-community/Qwen3-8B-4bit \
  --prompt "Write a 500 word story" \
  --prompt "Summarize the history of Rome"

# Prompts from files
uv run src/main.py mlx-community/Qwen3-8B-4bit \
  --prompt-files prompts/500_word_story.md prompts/summarize-turbo-quant.md

# Sample prompts from a HuggingFace dataset
uv run src/main.py mlx-community/Qwen3-8B-4bit \
  --dataset EdinburghNLP/xsum --dataset-field document --dataset-samples 3

# Long-form prompts (CNN/DailyMail articles, ~600–1200 tokens)
uv run src/main.py mlx-community/Qwen3-8B-4bit \
  --dataset cnn_dailymail --dataset-config 3.0.0 --dataset-field article

# Reasoning prompts (GSM8K math questions)
uv run src/main.py mlx-community/Qwen3-8B-4bit \
  --dataset gsm8k --dataset-config main --dataset-field question

Dataset flags (all optional except `--dataset-field`, which is required when `--dataset` is set):

| Flag | Default | Description |
| --- | --- | --- |
| `--dataset DATASET_ID` | none | HuggingFace dataset to sample from |
| `--dataset-field FIELD` | none | Column to use as prompt text (required with `--dataset`) |
| `--dataset-config CONFIG` | none | Dataset config/subset (e.g. `3.0.0` for `cnn_dailymail`) |
| `--dataset-split SPLIT` | `train` | Dataset split to sample from |
| `--dataset-samples N` | `1` | Number of samples to draw |
| `--dataset-seed SEED` | `42` | Random seed for reproducible sampling |
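
As an illustration of how the flags and defaults in this table fit together, here is a minimal argparse sketch. It is not the parser from src/main.py: the function names `build_parser` and `parse_args` and the positional argument name `models` are assumptions made for this example. Only the flag names, defaults, and the "required with `--dataset`" rule come from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors the documented dataset flags."""
    parser = argparse.ArgumentParser(description="Dataset flag sketch")
    # Assumption: one or more model IDs are passed positionally.
    parser.add_argument("models", nargs="+")
    parser.add_argument("--dataset", metavar="DATASET_ID")
    parser.add_argument("--dataset-field", metavar="FIELD")
    parser.add_argument("--dataset-config", metavar="CONFIG")
    parser.add_argument("--dataset-split", metavar="SPLIT", default="train")
    parser.add_argument("--dataset-samples", metavar="N", type=int, default=1)
    parser.add_argument("--dataset-seed", metavar="SEED", type=int, default=42)
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse argv and enforce that --dataset-field accompanies --dataset."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.dataset and not args.dataset_field:
        # parser.error prints a usage message and exits with status 2.
        parser.error("--dataset-field is required with --dataset")
    return args


if __name__ == "__main__":
    args = parse_args(
        ["mlx-community/Qwen3-8B-4bit",
         "--dataset", "gsm8k",
         "--dataset-config", "main",
         "--dataset-field", "question"]
    )
    print(args.dataset_split, args.dataset_samples, args.dataset_seed)
```

Validating the field after parsing, rather than marking it `required=True`, keeps `--dataset-field` optional for runs that do not use `--dataset` at all.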

Recommended datasets for varying prompt lengths:

| Length | Dataset | Config | Field | Approx. tokens |
| --- | --- | --- | --- | --- |
| Short (default) | `tatsu-lab/alpaca` | none | `instruction` | ~30–80 |
| Medium | `EdinburghNLP/xsum` | none | `document` | ~200–400 |
| Long | `cnn_dailymail` | `3.0.0` | `article` | ~600–1200 |
| Reasoning | `gsm8k` | `main` | `question` | ~50–200 |

Only a small shuffle buffer (~1000 rows) is downloaded, not the full dataset. The downloaded samples are cached locally by the datasets library, so repeated runs with the same seed skip re-downloading.
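
The sampling behaviour described above can be pictured as a shuffle buffer over a stream of rows. The sketch below is a pure-Python illustration under stated assumptions, not the script's implementation: `sample_prompts` and its parameters are hypothetical names, and the script itself relies on the datasets library, whose exact calls are not shown in this README.

```python
import random
from itertools import islice
from typing import Iterable

_EXHAUSTED = object()  # sentinel marking the end of the stream


def sample_prompts(
    rows: Iterable[dict],
    field: str,
    n: int = 1,
    seed: int = 42,
    buffer_size: int = 1000,
) -> list[str]:
    """Draw n prompts from a row stream using a fixed-size shuffle buffer.

    Only about buffer_size + n rows are ever read from the stream, and the
    same seed always yields the same prompts for the same stream.
    """
    rng = random.Random(seed)
    stream = iter(rows)
    # Fill the buffer with the first buffer_size rows only.
    buffer = list(islice(stream, buffer_size))
    prompts: list[str] = []
    while buffer and len(prompts) < n:
        idx = rng.randrange(len(buffer))
        prompts.append(buffer[idx][field])
        # Refill the used slot from the stream, or shrink the buffer.
        replacement = next(stream, _EXHAUSTED)
        if replacement is _EXHAUSTED:
            buffer.pop(idx)
        else:
            buffer[idx] = replacement
    return prompts


if __name__ == "__main__":
    # Stand-in for a remote dataset: a lazy generator of rows.
    fake_rows = ({"question": f"question {i}"} for i in range(100_000))
    print(sample_prompts(fake_rows, "question", n=3))
```

Because the buffer is filled from the start of the stream, samples come from roughly the first `buffer_size` rows rather than from the whole dataset. That is the trade-off for not downloading everything.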

Results are written to a Markdown file, grouped by prompt. Each group has a summary table (mean ± stdev across iterations) and per-iteration details: model context size, prompt and generation throughput in tokens per second (tps), time to first token, peak memory, and total time.
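
To make the summary columns concrete, the following sketch shows one way tokens-per-second and mean ± stdev values could be computed from per-iteration measurements. The function names are hypothetical, and the choice of sample (rather than population) standard deviation is an assumption. The script may compute these differently.

```python
import statistics


def tokens_per_second(token_count: int, seconds: float) -> float:
    """Throughput for one iteration: tokens processed divided by elapsed time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return token_count / seconds


def summarize(values: list[float]) -> tuple[float, float]:
    """Return (mean, sample stdev); stdev is 0.0 for a single iteration."""
    mean = statistics.mean(values)
    # statistics.stdev needs at least two data points.
    stdev = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, stdev


def format_summary(values: list[float], unit: str = "tps") -> str:
    """Render a summary cell such as '12.00 ± 2.00 tps'."""
    mean, stdev = summarize(values)
    return f"{mean:.2f} ± {stdev:.2f} {unit}"


if __name__ == "__main__":
    # Made-up timings for three iterations that each generated 256 tokens.
    runs = [tokens_per_second(256, t) for t in (4.0, 4.2, 3.9)]
    print(format_summary(runs))
```

With a single iteration (`-n 1`) there is no spread to report, which is why the sketch falls back to 0.0 instead of raising.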

Results are saved automatically into a device-specific folder under results/. The folder name is derived from your Mac model, chip, RAM, and GPU core count, for example:

results/macbook-pro-m5-max-128gb-40-core-gpu/
results/mac-mini-m4-pro-64gb-20-core-gpu/

This keeps results from different machines organized without any manual effort. To override the output location, pass -o/--output a path that includes a directory (e.g. --output my-folder/results.md).
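
The folder naming and the override rule can be sketched as follows. This is an assumption-laden illustration that reproduces the two example folder names above; `slugify`, `device_folder`, and `resolve_output` are hypothetical helpers, and how the script actually detects the hardware is not covered here.

```python
import re
from pathlib import Path


def slugify(text: str) -> str:
    """Lowercase text and collapse runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def device_folder(model: str, chip: str, ram_gb: int, gpu_cores: int) -> str:
    """Build a folder name such as 'mac-mini-m4-pro-64gb-20-core-gpu'."""
    parts = [slugify(model), slugify(chip), f"{ram_gb}gb", f"{gpu_cores}-core-gpu"]
    return "-".join(parts)


def resolve_output(output: str, folder: str, root: str = "results") -> Path:
    """Use the given path as-is if it names a directory, else nest it by device."""
    path = Path(output)
    if path.parent != Path("."):
        # Assumption: any directory component means the user is overriding.
        return path
    return Path(root) / folder / path.name


if __name__ == "__main__":
    folder = device_folder("Mac mini", "M4 Pro", 64, 20)
    print(resolve_output("my_results.md", folder))
    print(resolve_output("my-folder/results.md", folder))
```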

Contributing

Everyone is welcome and encouraged to contribute their benchmark results. The more devices represented, the more useful this becomes for the community.

To submit your results:

1. Fork this repo and clone it locally.
2. Run the benchmark script on your machine (see above). Results are saved into a device-specific folder automatically.
3. Open a pull request with your new folder added to the repo.

That's it. No special format required — just run the script and submit the output as-is.

If you're running models manually or on non-Apple-Silicon hardware, feel free to add results in whatever format makes sense. Open a PR or an issue and we'll figure it out.
