# ModelPulse

**Real-time LLM benchmarking tool** — compare model speed, cost, and quality side by side.

[Python 3.11+](https://www.python.org/downloads/) · [MIT License](LICENSE) · [PySide6](https://doc.qt.io/qtforpython-6/)

ModelPulse is a desktop application that benchmarks Large Language Model providers head-to-head with real-time streaming. Select models, fire the same prompt at each, and instantly see which one is faster, cheaper, and better.

## Features

- **Multi-provider support** — OpenRouter, Groq, and OpenAI in one tool
- **Real-time streaming** — Watch responses arrive token by token with TTFT (time-to-first-token) tracking
- **Side-by-side comparison** — Benchmark two models simultaneously on the same prompt
- **Cost tracking** — Per-request USD cost calculated from provider pricing
- **Performance metrics** — TTFT, total latency, tokens/second, input/output token counts
- **History** — Browse and restore previous benchmark runs with full state
- **Smart caching** — 30-minute TTL cache for model listings (no redundant API calls)
- **Persistent config** — API keys and settings saved locally in TOML format
- **Dark UI** — Professional navy-black theme with purple-violet accents

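The 30-minute TTL cache mentioned above can be sketched as a small thread-safe class. This is an illustrative sketch, not the project's actual implementation; the names (`TTLCache`, `get`, `set`) are assumptions:

```python
import threading
import time

class TTLCache:
    """Thread-safe cache whose entries expire after a fixed time-to-live."""

    def __init__(self, ttl_seconds: float = 1800):  # 30 minutes, as used for model listings
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._store: dict = {}  # key -> (expires_at, value)

    def get(self, key):
        with self._lock:
            entry = self._store.get(key)
            if entry is None:
                return None
            expires_at, value = entry
            if time.monotonic() > expires_at:
                del self._store[key]  # lazily evict stale entries on read
                return None
            return value

    def set(self, key, value):
        with self._lock:
            self._store[key] = (time.monotonic() + self._ttl, value)
```

With this shape, a model-listing call first checks `cache.get("models:openrouter")` and only hits the network on a miss or after expiry.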
## Quick Start

### Prerequisites

- Python 3.11+
- At least one API key: [OpenRouter](https://openrouter.ai/settings/keys), [Groq](https://console.groq.com/keys), or [OpenAI](https://platform.openai.com/api-keys)

### Install

```bash
git clone https://github.com/DevStrategist/ModelPulse.git
cd ModelPulse/llm-benchmark
pip install -r requirements.txt
```

### Run

```bash
python main.py
```

On first launch, click **Settings** to enter your API key(s). Select models in each panel, type a prompt, and hit **Run Benchmark** (or `Ctrl+Enter`).

## Architecture

```
llm-benchmark/
├── main.py                      # Entry point
├── src/
│   ├── benchmark_runner.py      # Orchestrates concurrent benchmark runs
│   ├── clients/                 # API client implementations
│   │   ├── base_client.py       # Abstract base with streaming logic
│   │   ├── openrouter_client.py
│   │   ├── groq_client.py
│   │   └── openai_client.py
│   ├── gui/                     # PySide6 user interface
│   │   ├── main_window_clean_dark.py  # Main application window
│   │   ├── settings_dialog.py   # API key management
│   │   ├── history_widget.py    # Run history sidebar
│   │   ├── dark_design_system.py  # Colors, typography, spacing
│   │   └── styles/              # Qt stylesheets
│   ├── models/                  # Data classes (RunResult, ModelInfo, etc.)
│   └── utils/                   # Config (TOML), cache (TTL), logger (JSONL)
└── tests/                       # Unit and integration tests
```

### How It Works

1. User selects models and enters a prompt
2. `BenchmarkRunner` fires concurrent async requests via `httpx`
3. Each client streams the response, tracking TTFT and latency with `time.monotonic()`
4. Results are displayed in real time, with the fastest model highlighted
5. Run data is logged to JSONL and stored in history for later comparison

### Key Design Decisions

- **Async streaming** — `httpx.AsyncClient.stream()` for true streaming with accurate TTFT measurement
- **Thread isolation** — Async event loops run in `QThread` workers to keep the GUI responsive
- **Provider abstraction** — `BaseClient` handles all streaming/timing logic; subclasses only define endpoints and headers
- **TTL cache** — Thread-safe cache prevents redundant model-listing API calls within 30 minutes

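The provider abstraction can be illustrated roughly like this. The method names and the endpoint URL are assumptions for the sketch, not the project's exact interfaces:

```python
from abc import ABC, abstractmethod

class BaseClient(ABC):
    """Shared logic lives in the base; subclasses stay tiny."""

    def __init__(self, api_key: str):
        self.api_key = api_key

    @property
    @abstractmethod
    def endpoint(self) -> str:
        """Provider-specific chat-completions URL."""

    def headers(self) -> dict:
        # Most OpenAI-compatible providers accept a plain bearer token
        return {"Authorization": f"Bearer {self.api_key}"}

class GroqClient(BaseClient):
    @property
    def endpoint(self) -> str:
        return "https://api.groq.com/openai/v1/chat/completions"
```

Because all providers here speak the OpenAI-compatible chat API, a new provider needs only an endpoint (and, if its auth differs, a `headers` override).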
## Configuration

Settings are saved to `~/.openrouter-bench/config.toml`:

```toml
[api_keys]
openrouter = "sk-or-..."
groq = "gsk_..."
openai = "sk-..."

[settings]
temperature = 0.7
max_tokens = 1000
```

Benchmark logs are appended to `~/.openrouter-bench/benchmark.jsonl`.
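Appending one JSON object per line keeps the log greppable and easy to parse incrementally. A sketch of what such a logger might look like (the function and field names are illustrative assumptions):

```python
import json
import time
from pathlib import Path

LOG_PATH = Path.home() / ".openrouter-bench" / "benchmark.jsonl"

def log_run(result: dict, path: Path = LOG_PATH) -> None:
    """Append one benchmark run as a single JSON line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"ts": time.time(), **result}) + "\n")
```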

## Testing

```bash
# Unit tests
pytest tests/ -v

# Integration tests (require API keys as environment variables)
export OPENROUTER_API_KEY=your_key
export GROQ_API_KEY=your_key
pytest tests/test_integration.py -v -m integration
```

## Adding a New Provider

See [CONTRIBUTING.md](CONTRIBUTING.md) for a guide on extending ModelPulse with additional LLM providers.

## Tech Stack

- **Python 3.11+** with async/await
- **PySide6** for the desktop GUI
- **httpx** for async HTTP streaming
- **Pydantic** for data validation
- **TOML** for configuration persistence

## License

[MIT](LICENSE)