
Commit b6de01d

DevStrategist and claude committed
Prepare repo for public release as portfolio project
- Remove ~3000 lines of dead code (unused main window implementations, duplicate design systems, broken entry point, unused widgets/animations)
- Remove __pycache__ from git tracking
- Remove debug/test scripts from project root
- Remove .claude/ dev-specific config
- Fix broken import in gui/__init__.py (referenced non-existent FluidMainWindow)
- Rewrite README with architecture docs, badges, and usage guide
- Add MIT LICENSE
- Add CONTRIBUTING.md with provider extension guide
- Add GitHub Actions CI workflow
- Update .gitignore for Python/IDE/OS artifacts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 3b0ec26 commit b6de01d

48 files changed

Lines changed: 241 additions & 5381 deletions


.claude/settings.local.json

Lines changed: 0 additions & 19 deletions
This file was deleted.

.github/workflows/test.yml

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+name: Tests
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.11", "3.12"]
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -r llm-benchmark/requirements.txt
+          pip install pytest pytest-asyncio pytest-mock
+
+      - name: Run unit tests
+        working-directory: llm-benchmark
+        run: pytest tests/ -v --ignore=tests/test_integration.py

.gitignore

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+*.egg-info/
+dist/
+build/
+
+# Virtual environments
+env/
+venv/
+.env
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Project specific
+*.log
+.cache/
+temp/
+.pytest_cache/
+.claude/

CONTRIBUTING.md

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
+# Contributing to ModelPulse
+
+Thanks for your interest in contributing! Here's how to get started.
+
+## Development Setup
+
+1. Clone the repo and install dependencies:
+
+```bash
+git clone https://github.com/DevStrategist/ModelPulse.git
+cd ModelPulse/llm-benchmark
+pip install -r requirements.txt
+pip install -e ".[dev]"
+```
+
+2. Run the app:
+
+```bash
+python main.py
+```
+
+3. Run the tests:
+
+```bash
+pytest tests/ -v
+```
+
+## Adding a New Provider
+
+The client architecture makes it straightforward to add new LLM providers:
+
+1. Create a new file in `src/clients/` (e.g., `anthropic_client.py`)
+2. Extend `BaseClient` and implement `get_models()` and `chat()`
+3. Register it in `src/clients/__init__.py`
+4. Add the source option in `src/gui/main_window_clean_dark.py`
+
+See `src/clients/groq_client.py` for a clean example.
+
+## Code Style
+
+- Format with **Black** (`black --line-length 100`)
+- Lint with **Ruff** (`ruff check .`)
+- Use type hints where practical
+- Keep async/await patterns consistent with existing code
+
+## Pull Requests
+
+1. Fork the repo and create a feature branch
+2. Make your changes with clear commit messages
+3. Ensure tests pass (`pytest tests/ -v`)
+4. Open a PR with a description of what changed and why
+
+## Reporting Issues
+
+Open an issue with:
+- Steps to reproduce
+- Expected vs actual behavior
+- Python version and OS
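CONTRIBUTING.md says a new provider extends `BaseClient`, implements `get_models()` and `chat()`, and is registered in `src/clients/__init__.py`. The following is a minimal sketch of that shape, under loud assumptions: `BaseClient` here is a stand-in rather than the project's real class (this commit does not show its constructor or signatures), `EchoClient` is a hypothetical offline provider, and the `CLIENTS` dictionary is only one possible way registration might look. It uses the standard library alone, so it runs without API keys.

```python
# Hypothetical sketch of the provider pattern in CONTRIBUTING.md.
# BaseClient is a stand-in, NOT ModelPulse's real class; only the method
# names get_models() and chat() come from the guide. Everything else is assumed.
import asyncio
from abc import ABC, abstractmethod
from typing import AsyncIterator


class BaseClient(ABC):
    """Stand-in for src/clients/base_client.py (real signature not shown here)."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    @abstractmethod
    async def get_models(self) -> list[str]:
        """Return the model IDs this provider offers."""

    @abstractmethod
    def chat(self, model: str, prompt: str) -> AsyncIterator[str]:
        """Stream the reply as text chunks."""


class EchoClient(BaseClient):
    """Hypothetical offline provider that echoes the prompt word by word."""

    async def get_models(self) -> list[str]:
        return ["echo-small", "echo-large"]

    async def chat(self, model: str, prompt: str) -> AsyncIterator[str]:
        for word in prompt.split():
            await asyncio.sleep(0)  # hand control back to the loop, like a network read
            yield word + " "


# Hypothetical registry; the real src/clients/__init__.py may differ.
CLIENTS: dict[str, type[BaseClient]] = {"echo": EchoClient}


async def demo() -> str:
    client = CLIENTS["echo"](api_key="unused")
    models = await client.get_models()
    chunks = [chunk async for chunk in client.chat(models[0], "hello pulse")]
    return "".join(chunks)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because `BaseClient` declares both methods abstract, forgetting to implement either one in a subclass fails at instantiation time with a `TypeError` instead of surfacing later during a benchmark run.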

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2024 DevStrategist
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md

Lines changed: 92 additions & 67 deletions
@@ -1,102 +1,127 @@
-# LLM Benchmark Tool
+# ModelPulse
 
-A powerful benchmarking tool for comparing performance across multiple Large Language Model providers including OpenRouter, Groq, and OpenAI.
+**Real-time LLM benchmarking tool** — compare model speed, cost, and quality side-by-side.
+
+[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
+[![PySide6](https://img.shields.io/badge/GUI-PySide6-41CD52.svg)](https://doc.qt.io/qtforpython-6/)
+
+ModelPulse is a desktop application that benchmarks Large Language Model providers head-to-head with real-time streaming. Select models, fire the same prompt at each, and instantly see which one is faster, cheaper, and better.
 
 ## Features
 
-- **Multi-Provider Support**: Compare models from OpenRouter, Groq Direct, and OpenAI Direct
-- **Real-time Performance Metrics**: Track time to first token (TTFT) and total latency
-- **Dark Theme UI**: Professional dark-mode interface with clean, efficient design
-- **History Tracking**: Navigate through previous benchmark runs with full state restoration
-- **Side-by-Side Comparison**: Run multiple models simultaneously for direct comparison
-- **Customizable Parameters**: Adjust temperature, max tokens, and system prompts
+- **Multi-provider support** — OpenRouter, Groq, and OpenAI in one tool
+- **Real-time streaming** — Watch responses arrive token-by-token with TTFT (time-to-first-token) tracking
+- **Side-by-side comparison** — Benchmark 2 models simultaneously on the same prompt
+- **Cost tracking** — Per-request USD cost calculated from provider pricing
+- **Performance metrics** — TTFT, total latency, tokens/second, input/output token counts
+- **History** — Browse and restore previous benchmark runs with full state
+- **Smart caching** — 30-minute TTL cache for model listings (no redundant API calls)
+- **Persistent config** — API keys and settings saved locally in TOML format
+- **Dark UI** — Professional navy-black theme with purple-violet accents
 
-## Installation
+## Quick Start
 
-1. Clone the repository:
-```bash
-git clone https://github.com/DevStrategist/LLMCompareV2.git
-cd LLMCompareV2/llm-benchmark
-```
+### Prerequisites
+
+- Python 3.11+
+- At least one API key: [OpenRouter](https://openrouter.ai/settings/keys), [Groq](https://console.groq.com/keys), or [OpenAI](https://platform.openai.com/api-keys)
+
+### Install
 
-2. Install dependencies:
 ```bash
+git clone https://github.com/DevStrategist/ModelPulse.git
+cd ModelPulse/llm-benchmark
 pip install -r requirements.txt
 ```
 
-3. Run the application:
+### Run
+
 ```bash
 python main.py
 ```
 
-## Configuration
+On first launch, click **Settings** to enter your API key(s). Select models in each panel, type a prompt, and hit **Run Benchmark** (or `Ctrl+Enter`).
+
+## Architecture
 
-### API Keys
+```
+llm-benchmark/
+├── main.py # Entry point
+├── src/
+│   ├── benchmark_runner.py # Orchestrates concurrent benchmark runs
+│   ├── clients/ # API client implementations
+│   │   ├── base_client.py # Abstract base with streaming logic
+│   │   ├── openrouter_client.py
+│   │   ├── groq_client.py
+│   │   └── openai_client.py
+│   ├── gui/ # PySide6 user interface
+│   │   ├── main_window_clean_dark.py # Main application window
+│   │   ├── settings_dialog.py # API key management
+│   │   ├── history_widget.py # Run history sidebar
+│   │   ├── dark_design_system.py # Colors, typography, spacing
+│   │   └── styles/ # Qt stylesheets
+│   ├── models/ # Data classes (RunResult, ModelInfo, etc.)
+│   └── utils/ # Config (TOML), cache (TTL), logger (JSONL)
+└── tests/ # Unit and integration tests
+```
 
-Set your API keys through the Settings dialog in the application or via environment variables:
+### How It Works
 
-- `OPENROUTER_API_KEY`: Your OpenRouter API key
-- `GROQ_API_KEY`: Your Groq API key
-- `OPENAI_API_KEY`: Your OpenAI API key
+1. User selects models and enters a prompt
+2. `BenchmarkRunner` fires concurrent async requests via `httpx`
+3. Each `Client` streams the response, tracking TTFT and latency with `time.monotonic()`
+4. Results are displayed in real-time, with the fastest model highlighted
+5. Run data is logged to JSONL and stored in history for later comparison
 
-### Settings
+### Key Design Decisions
 
-Access settings through the gear icon in the application to configure:
-- API keys for each provider
-- Default temperature and max tokens
-- UI preferences
+- **Async streaming** — `httpx.AsyncClient.stream()` for true streaming with accurate TTFT measurement
+- **Thread isolation** — Async event loops run in `QThread` workers to keep the GUI responsive
+- **Provider abstraction** — `BaseClient` handles all streaming/timing logic; subclasses only define endpoints and headers
+- **TTL cache** — Thread-safe cache prevents redundant model-listing API calls within 30 minutes
 
-## Usage
+## Configuration
 
-1. **Select Models**: Choose up to 4 models from the dropdown menus
-2. **Configure Prompts**: Set your system prompt and user prompt
-3. **Run Benchmark**: Click "Run Benchmark" to compare models
-4. **View Results**: Results show total time, tokens/second, and response text
-5. **History**: Click the history button to view and navigate previous runs
+Settings are saved to `~/.openrouter-bench/config.toml`:
 
-## Architecture
+```toml
+[api_keys]
+openrouter = "sk-or-..."
+groq = "gsk_..."
+openai = "sk-..."
 
+[settings]
+temperature = 0.7
+max_tokens = 1000
 ```
-llm-benchmark/
-├── src/
-│   ├── clients/ # API client implementations
-│   ├── gui/ # User interface components
-│   │   ├── main_window_clean_dark.py # Main application window
-│   │   ├── history_widget.py # History tracking
-│   │   └── dark_design_system.py # Design system
-│   ├── models/ # Data models
-│   └── utils/ # Utilities and helpers
-├── main.py # Application entry point
-└── requirements.txt # Python dependencies
-```
 
-## Development
+Benchmark logs are appended to `~/.openrouter-bench/benchmark.jsonl`.
+
+## Testing
 
-### Running Tests
 ```bash
-pytest tests/
-```
+# Unit tests
+pytest tests/ -v
 
-### Code Style
-The project uses:
-- Black for code formatting
-- Type hints for better code clarity
-- Async/await for concurrent API calls
+# Integration tests (requires API keys as env vars)
+export OPENROUTER_API_KEY=your_key
+export GROQ_API_KEY=your_key
+pytest tests/test_integration.py -v -m integration
+```
 
-## Contributing
+## Adding a New Provider
 
-1. Fork the repository
-2. Create a feature branch (`git checkout -b feature/amazing-feature`)
-3. Commit your changes (`git commit -m 'Add amazing feature'`)
-4. Push to the branch (`git push origin feature/amazing-feature`)
-5. Open a Pull Request
+See [CONTRIBUTING.md](CONTRIBUTING.md) for a guide on extending ModelPulse with additional LLM providers.
 
-## License
+## Tech Stack
 
-This project is licensed under the MIT License - see the LICENSE file for details.
+- **Python 3.11+** with async/await
+- **PySide6** for the desktop GUI
+- **httpx** for async HTTP streaming
+- **Pydantic** for data validation
+- **TOML** for configuration persistence
 
-## Acknowledgments
+## License
 
-- Built with PySide6 for the GUI
-- Uses httpx for async HTTP requests
-- Design inspired by modern dark theme principles
+[MIT](LICENSE)
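The README's "How It Works" steps describe concurrent requests whose streamed responses are timed with `time.monotonic()`. What follows is a minimal, offline sketch of that timing logic, not ModelPulse's implementation. It assumes each provider exposes an async iterator of text chunks; `fake_stream()` is a hypothetical simulator that uses `asyncio.sleep()` in place of `httpx` streaming, `RunResult` is a simplified stand-in for the project's data class, and counting chunks as tokens is only a rough proxy.

```python
# Hypothetical, offline sketch of the timing flow in "How It Works".
# fake_stream() stands in for httpx streaming; RunResult is a simplified
# stand-in, and chunks are counted as tokens only as a rough proxy.
import asyncio
import time
from dataclasses import dataclass
from typing import AsyncIterator, Optional


@dataclass
class RunResult:
    model: str
    ttft: float           # seconds until the first chunk arrived
    total_latency: float  # seconds until the stream finished
    tokens: int           # chunk count, used as a token proxy
    text: str

    @property
    def tokens_per_second(self) -> float:
        return self.tokens / self.total_latency if self.total_latency > 0 else 0.0


async def fake_stream(first_delay: float, chunks: list[str]) -> AsyncIterator[str]:
    """Simulated provider: a pause before the first chunk, then a steady trickle."""
    await asyncio.sleep(first_delay)
    for chunk in chunks:
        yield chunk
        await asyncio.sleep(0.001)


async def benchmark(model: str, stream: AsyncIterator[str]) -> RunResult:
    start = time.monotonic()  # monotonic: unaffected by wall-clock adjustments
    ttft: Optional[float] = None
    parts: list[str] = []
    async for chunk in stream:
        if ttft is None:
            ttft = time.monotonic() - start  # time to first token
        parts.append(chunk)
    total = time.monotonic() - start
    return RunResult(model, ttft if ttft is not None else total, total,
                     len(parts), "".join(parts))


async def run_all() -> list[RunResult]:
    # Same prompt "sent" to two simulated models at once.
    results = await asyncio.gather(
        benchmark("model-a", fake_stream(0.10, ["Hi", " there"])),
        benchmark("model-b", fake_stream(0.01, ["Hello", " world", "!"])),
    )
    return list(results)


if __name__ == "__main__":
    results = asyncio.run(run_all())
    fastest = min(results, key=lambda r: r.total_latency)
    for r in results:
        flag = "*" if r is fastest else " "  # highlight the fastest model
        print(f"{flag} {r.model}: TTFT {r.ttft:.3f}s, total {r.total_latency:.3f}s, "
              f"{r.tokens_per_second:.1f} tok/s")
```

`asyncio.gather()` runs both benchmarks on one event loop, so the wall time is roughly that of the slower model rather than the sum. Per the README, ModelPulse runs its event loops inside `QThread` workers to keep the GUI responsive; that threading layer is left out of this sketch.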

llm-benchmark/.gitignore

Lines changed: 6 additions & 1 deletion
@@ -7,6 +7,9 @@ __pycache__/
 env/
 venv/
 .env
+*.egg-info/
+dist/
+build/
 
 # IDE
 .vscode/
@@ -21,4 +24,6 @@ Thumbs.db
 # Project specific
 *.log
 .cache/
-temp/
+temp/
+.pytest_cache/
+.claude/
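The README added in this commit describes a thread-safe cache that keeps model listings for 30 minutes. One way to build that is sketched here under stated assumptions: `TTLCache` and `get_models_cached()` are hypothetical names rather than ModelPulse's API, stale entries are evicted lazily on read, and a single `threading.Lock` guards the dictionary. The clock is injectable so expiry can be tested without waiting half an hour.

```python
# Hypothetical sketch of a thread-safe TTL cache for model listings.
# TTLCache and get_models_cached() are illustrative names, not ModelPulse's API.
import threading
import time
from typing import Any, Callable, Hashable, Optional

THIRTY_MINUTES = 30 * 60  # seconds, the TTL the README states


class TTLCache:
    def __init__(self, ttl: float = THIRTY_MINUTES,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock  # injectable so tests can fake the passage of time
        self._lock = threading.Lock()  # one lock guards every read and write
        self._data: dict[Hashable, tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        with self._lock:
            entry = self._data.get(key)
            if entry is None:
                return None
            expires_at, value = entry
            if self._clock() >= expires_at:
                del self._data[key]  # lazy eviction: stale entries go on read
                return None
            return value

    def set(self, key: Hashable, value: Any) -> None:
        with self._lock:
            self._data[key] = (self._clock() + self._ttl, value)


def get_models_cached(cache: TTLCache, provider: str,
                      fetch: Callable[[], list[str]]) -> list[str]:
    """Return cached model IDs; call fetch() only on a miss or after expiry."""
    models = cache.get(provider)
    if models is None:
        models = fetch()
        cache.set(provider, models)
    return models
```

The lock is released between the miss and the refill, so two threads that miss at the same moment may both call `fetch()`. For model listings that duplicate request is harmless, and it avoids holding the lock across a slow network call that would block every other reader.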
