Comparing AI Models: Timings

Rose Heart edited this page Dec 29, 2024 · 1 revision

Overview

This analysis focuses on the relationship between cost and average response time across various AI models. The data reflects a cost-driven approach, emphasizing affordability and practicality for a privately funded project. The table is not intended to reflect model comprehensiveness or capability but solely to compare average response time based on usage frequency and associated costs.

Table of AI Models

| Provider | Model Name | Usage Count | Average Response Time (s) |
|----------|------------|------------:|--------------------------:|
| OpenAI | gpt-4o-mini | 564 | 6.196728 |
| TogetherAI | meta-llama/Llama-Vision-Free | 548 | 2.947308 |
| Cohere | command-r-plus-08-2024 | 238 | 35.785773 |
| OpenRouter | meta-llama/llama-3.1-405b-instruct:free | 55 | 5.174699 |
| OpenRouter | meta-llama/llama-3.2-3b-instruct:free | 28 | 1.257336 |
| Anthropic | claude-3-haiku-20240307 | 28 | 0.567675 |
| Perplexity | llama-3.1-sonar-small-128k-online | 18 | 5.427681 |
| OpenRouter | nousresearch/hermes-3-llama-3.1-405b:free | 12 | 6.468109 |
| Ollama | tinyllama | 11 | 34.074528 |
| Anthropic | claude-3-5-haiku-20241022 | 11 | 5.215944 |
| OpenRouter | liquid/lfm-40b:free | 10 | 21.880264 |
| HuggingFace | Qwen/Qwen2-VL-7B-Instruct | 10 | 1.861794 |
| TogetherAI | mistralai/Mistral-7B-Instruct-v0.3 | 5 | 3.967040 |
| OpenRouter | gryphe/mythomax-l2-13b:free | 2 | 5.825755 |
| HuggingFace | AIDC-AI/Ovis1.6-Gemma2-9B | 2 | 3.204573 |
| Cohere | command-r-08-2024 | 2 | 1.859857 |
| OpenRouter | meta-llama/llama-3.1-8b-instruct:free | 1 | 2.686100 |
| OpenRouter | meta-llama/llama-3.1-70b-instruct:free | 1 | 0.835187 |
| HuggingFace | meta-llama/Llama-3.2-11B-Vision | 1 | 0.360707 |
| Anthropic | claude-3.5-haiku-20241022 | 1 | 0.199897 |

Note: the response times are reported here in seconds; sub-millisecond values would be impossible for remote API calls, so the original "(ms)" label appears to have been a typo.
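Because usage counts vary so widely, a simple average across models would be misleading; a usage-weighted mean better reflects the latency actually experienced. Below is a minimal sketch using three sample rows from the table above (the raw values suggest seconds rather than milliseconds):

```python
# Usage-weighted vs. simple mean response time, computed from three
# sample rows of the table above. Times are treated as seconds.
rows = [
    ("gpt-4o-mini", 564, 6.196728),
    ("meta-llama/Llama-Vision-Free", 548, 2.947308),
    ("command-r-plus-08-2024", 238, 35.785773),
]

total_calls = sum(count for _, count, _ in rows)
# Weight each model's latency by how often it was actually called.
weighted_mean = sum(count * t for _, count, t in rows) / total_calls
simple_mean = sum(t for _, _, t in rows) / len(rows)

print(f"weighted mean: {weighted_mean:.2f}s, simple mean: {simple_mean:.2f}s")
```

Here the weighted mean is lower than the simple mean because the slowest model (command-r-plus) was called far less often than the two fastest.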

Context and Comprehensiveness

This table is not intended to reflect the comprehensiveness of the models or their capabilities. For instance, OpenAI's GPT-4o-mini offers a 128K-token context window and Anthropic's Claude 3 models offer 200K, far exceeding smaller models like Ollama's TinyLlama, which is limited to roughly 2K tokens. This comparison is purely based on cost and response time relative to usage frequency.

Cost vs. Usage

Usage trends reveal that the most frequently used models, such as OpenAI's gpt-4o-mini, tend to be the most cost-effective. Self-hosted models such as Ollama's TinyLlama, by contrast, incur additional expenses for electricity, maintenance, and hardware.
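The trade-off between hosted APIs and self-hosting can be made concrete by amortizing fixed costs over request volume. The sketch below uses invented figures purely for illustration (the $15/month fixed cost and $0.0002/request price are hypothetical, not measured from this project):

```python
# Hypothetical cost-per-response comparison: a hosted API charges a small
# per-request fee, while a self-hosted model pays fixed monthly costs
# (electricity, hardware amortization) regardless of volume.
def cost_per_response(fixed_monthly: float, per_request: float, requests: int) -> float:
    """Amortize fixed monthly costs over the month's requests, plus any per-request fee."""
    return fixed_monthly / requests + per_request

# Hosted API at gpt-4o-mini's usage level (564 calls), hypothetical pricing.
api_cost = cost_per_response(fixed_monthly=0.0, per_request=0.0002, requests=564)
# Self-hosted TinyLlama at its usage level (11 calls), hypothetical fixed cost.
selfhosted_cost = cost_per_response(fixed_monthly=15.0, per_request=0.0, requests=11)

print(f"API: ${api_cost:.4f}/response, self-hosted: ${selfhosted_cost:.4f}/response")
```

At low volume the fixed costs dominate, which is why a lightly used self-hosted model can end up more expensive per response than a heavily used hosted one.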

Rate limits, and the functionality available within them, were also considered. Some providers impose strict usage caps or throttling that limit how efficiently a model can be used over time: models with low rate limits require careful request management to avoid disruptions, while higher limits allow continuous use without hitting throttling bottlenecks.
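One common way to manage low rate limits is retrying with exponential backoff. This is a generic sketch, not code from this project; the `RateLimitError` name stands in for whatever rate-limit exception a given provider's client raises:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for a provider's HTTP 429 "too many requests" error (hypothetical name)."""

def call_with_backoff(request, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry `request` on rate-limit errors, waiting exponentially longer each attempt."""
    for attempt in range(max_retries):
        try:
            return request()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; let the caller handle it
            # Exponential backoff with a little jitter so retries don't synchronize.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

The injectable `sleep` parameter makes the helper easy to test without real delays.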

Cohere models are among the more expensive per response. However, a credits grant has made extended usage possible, allowing broader experimentation within the constraints of this project.

Project Context

This project is entirely self-funded, requiring careful consideration of cost efficiency. The use of certain models is influenced by specific needs, such as smaller context windows for lightweight applications or larger models for extended-context tasks. This framework reflects a focus on balancing affordability, accessibility, and the practical application of AI models within personal budgetary constraints. If you would like more models tested, please consider sponsoring this project.
