M4 Max Mac Studio

Thanks for your blog post. I didn't know I could more than double generation throughput (tokens per second) by switching from Ollama to oMLX.


| Metric | Ollama | oMLX |
| --- | --- | --- |
| Total time | 82.4s | 35.5s |
| Gen tok/s | 40.4 | 84.0 |
| Prefill | 14.6s | 7.6s |


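**oMLX is 132% faster** end to end (82.4s vs 35.5s, about 2.3x) on sustained multi-turn workloads. Generation throughput alone is about 2.1x (84.0 vs 40.4 tok/s), and prefill is about 1.9x faster.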
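
For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the ratios from the table above. The dictionary keys and the `ratio` helper are names I made up for illustration; only the numbers come from the benchmark runs.

```python
# Figures copied from the table above (seconds, or tokens per second).
# The key names are illustrative, not taken from either tool's output.
ollama = {"total_s": 82.4, "gen_tok_s": 40.4, "prefill_s": 14.6}
omlx = {"total_s": 35.5, "gen_tok_s": 84.0, "prefill_s": 7.6}


def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator as a speedup factor."""
    return numerator / denominator


# For durations, lower is better: divide Ollama's time by oMLX's.
total_x = ratio(ollama["total_s"], omlx["total_s"])
prefill_x = ratio(ollama["prefill_s"], omlx["prefill_s"])
# For throughput, higher is better: divide oMLX's rate by Ollama's.
gen_x = ratio(omlx["gen_tok_s"], ollama["gen_tok_s"])

for name, x in [("total time", total_x), ("prefill", prefill_x), ("generation", gen_x)]:
    print(f"{name}: {x:.2f}x ({(x - 1) * 100:.0f}% faster)")
```

The "132% faster" headline follows from the total-time ratio; the per-token generation speedup is somewhat smaller, so which figure to quote depends on whether end-to-end latency or raw tok/s matters more for the workload.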

[m4-max-128gb-40gpu_ollama_fa-kvq80_responses.md](https://github.com/user-attachments/files/26803396/m4-max-128gb-40gpu_ollama_fa-kvq80_responses.md)
[m4-max-128gb-40gpu_ollama_fa-kvq80.md](https://github.com/user-attachments/files/26803395/m4-max-128gb-40gpu_ollama_fa-kvq80.md)
[m4-max-128gb-40gpu_omlx_fa-kvq80_responses.md](https://github.com/user-attachments/files/26803397/m4-max-128gb-40gpu_omlx_fa-kvq80_responses.md)
[m4-max-128gb-40gpu_omlx_fa-kvq80.md](https://github.com/user-attachments/files/26803394/m4-max-128gb-40gpu_omlx_fa-kvq80.md)


