Thanks for your blog post. I didn't know I could more than double my token throughput by switching from Ollama to oMLX.
| Metric | Ollama | oMLX |
|---|---|---|
| Total time | 82.4s | 35.5s |
| Gen tok/s | 40.4 | 84.0 |
| Prefill | 14.6s | 7.6s |
oMLX is 132% faster end to end (2.3x by total time) on sustained multi-turn workloads.
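For anyone checking the math, here is a quick sketch of where the 132% and 2.3x figures seem to come from. It only uses the numbers in the table; the variable names are mine, and reading "132% faster" as the total-time ratio is my assumption about how the summary line was computed.

```python
# Figures copied from the table in this comment (M4 Max, 128 GB, 40 GPU cores).
ollama = {"total_s": 82.4, "gen_tok_s": 40.4, "prefill_s": 14.6}
omlx = {"total_s": 35.5, "gen_tok_s": 84.0, "prefill_s": 7.6}

# For times, lower is better, so the speedup is slow / fast.
total_speedup = ollama["total_s"] / omlx["total_s"]
prefill_speedup = ollama["prefill_s"] / omlx["prefill_s"]

# For tokens per second, higher is better, so the ratio is fast / slow.
gen_speedup = omlx["gen_tok_s"] / ollama["gen_tok_s"]

# "N% faster" expressed as the speedup above 1x (assumed interpretation).
percent_faster = (total_speedup - 1) * 100

print(f"total time:  {total_speedup:.2f}x ({percent_faster:.0f}% faster)")
print(f"generation:  {gen_speedup:.2f}x")
print(f"prefill:     {prefill_speedup:.2f}x")
```

By this reading, the 2.3x is the end-to-end ratio, while generation speed alone comes out a little over 2x and prefill a little under 2x.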
m4-max-128gb-40gpu_ollama_fa-kvq80_responses.md
m4-max-128gb-40gpu_ollama_fa-kvq80.md
m4-max-128gb-40gpu_omlx_fa-kvq80_responses.md
m4-max-128gb-40gpu_omlx_fa-kvq80.md