# Comparing AI Models: Timings
This analysis focuses on the relationship between cost and average response time across various AI models. The data reflects a cost-driven approach, emphasizing affordability and practicality for a privately funded project. The table is not intended to reflect model comprehensiveness or capability; it solely compares average response time against usage frequency and associated cost.
| Provider | Model Name | Usage Count | Average Response Time (s) |
|---|---|---|---|
| OpenAI | gpt-4o-mini | 564 | 6.196728 |
| TogetherAI | meta-llama/Llama-Vision-Free | 548 | 2.947308 |
| Cohere | command-r-plus-08-2024 | 238 | 35.785773 |
| OpenRouter | meta-llama/llama-3.1-405b-instruct:free | 55 | 5.174699 |
| OpenRouter | meta-llama/llama-3.2-3b-instruct:free | 28 | 1.257336 |
| Anthropic | claude-3-haiku-20240307 | 28 | 0.567675 |
| Perplexity | llama-3.1-sonar-small-128k-online | 18 | 5.427681 |
| OpenRouter | nousresearch/hermes-3-llama-3.1-405b:free | 12 | 6.468109 |
| Ollama | tinyllama | 11 | 34.074528 |
| Anthropic | claude-3-5-haiku-20241022 | 11 | 5.215944 |
| OpenRouter | liquid/lfm-40b:free | 10 | 21.880264 |
| HuggingFace | Qwen/Qwen2-VL-7B-Instruct | 10 | 1.861794 |
| TogetherAI | mistralai/Mistral-7B-Instruct-v0.3 | 5 | 3.967040 |
| OpenRouter | gryphe/mythomax-l2-13b:free | 2 | 5.825755 |
| HuggingFace | AIDC-AI/Ovis1.6-Gemma2-9B | 2 | 3.204573 |
| Cohere | command-r-08-2024 | 2 | 1.859857 |
| OpenRouter | meta-llama/llama-3.1-8b-instruct:free | 1 | 2.686100 |
| OpenRouter | meta-llama/llama-3.1-70b-instruct:free | 1 | 0.835187 |
| HuggingFace | meta-llama/Llama-3.2-11B-Vision | 1 | 0.360707 |
| Anthropic | claude-3.5-haiku-20241022 | 1 | 0.199897 |
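Averages like those above can be reproduced from raw timing logs with a simple aggregation. The sketch below is illustrative only: the `records` list is a hypothetical log format of (provider, model, elapsed-seconds) tuples with made-up sample values, not the project's real data.

```python
from collections import defaultdict

# Hypothetical log records: (provider, model, response_time_seconds)
records = [
    ("OpenAI", "gpt-4o-mini", 5.9),
    ("OpenAI", "gpt-4o-mini", 6.4),
    ("Anthropic", "claude-3-haiku-20240307", 0.55),
]

# Accumulate usage count and total elapsed time per (provider, model) pair
stats = defaultdict(lambda: {"count": 0, "total": 0.0})
for provider, model, elapsed in records:
    key = (provider, model)
    stats[key]["count"] += 1
    stats[key]["total"] += elapsed

# Sort by usage count, most-used first, as in the table above
rows = sorted(
    ((p, m, s["count"], s["total"] / s["count"]) for (p, m), s in stats.items()),
    key=lambda r: r[2],
    reverse=True,
)
for provider, model, count, avg in rows:
    print(f"| {provider} | {model} | {count} | {avg:.6f} |")
```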
Again, this comparison is purely about cost and response time relative to usage frequency, not model capability. For instance, OpenAI's gpt-4o-mini offers a 128K-token context window and Anthropic's Claude 3 Haiku offers 200K tokens, far exceeding a small self-hosted model like TinyLlama, whose context window is 2K tokens.
Usage trends reveal that more frequently used models, like OpenAI's GPT-4o-mini, tend to be more cost-effective. However, certain models, such as Ollama’s TinyLlama, incur additional expenses due to self-hosting factors like electricity, maintenance, and infrastructure.
Rate limits, and how much functionality fits within them, were also a consideration. Some models have strict usage caps or throttling that affect how efficiently they can be used over time. Models with lower rate limits require more careful management to avoid disruptions, while models with higher limits allow more continuous use without hitting performance bottlenecks.
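As a rough illustration of managing the lower-limit case, a client can retry throttled calls with exponential backoff and jitter. This is a minimal sketch, not any provider's actual SDK: `call_with_backoff` and the use of `RuntimeError` to signal a rate-limit response are assumptions for illustration.

```python
import random
import time

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Retry a callable, backing off exponentially when it raises
    RuntimeError (standing in here for a 429 rate-limit response)."""
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RuntimeError:
            # Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
    raise RuntimeError("rate limit: retries exhausted")

# Usage with a fake endpoint that is throttled twice before succeeding
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"

result = call_with_backoff(flaky, base_delay=0.01)
```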
Cohere models are among the more expensive per response. However, extended usage was made possible through a credits grant, allowing broader experimentation within the constraints of this project.
This project is entirely self-funded, requiring careful consideration of cost efficiency. The use of certain models is influenced by specific needs, such as smaller context windows for lightweight applications or larger models for extended-context tasks. This framework reflects a focus on balancing affordability, accessibility, and the practical application of AI models within personal budgetary constraints. If you would like more models tested, please consider sponsoring this project.
ScrapingAnt is a web page retrieval service. This is an affiliate link: if you purchase services from this company using the link provided on this page, I will receive a small amount of compensation. All compensation received goes strictly toward covering the expenses of continued development of this software, not personal profit.
Please consider sponsoring this project, as it helps cover the expenses of continued development. Thank you.