# ModelPulse

**Real-time LLM benchmarking tool** — compare model speed, cost, and quality side by side.

[Python 3.11+](https://www.python.org/downloads/) · [MIT License](LICENSE) · [PySide6](https://doc.qt.io/qtforpython-6/)

ModelPulse is a desktop application that benchmarks Large Language Model providers head-to-head with real-time streaming. Select models, fire the same prompt at each, and instantly see which one is faster, cheaper, and better.

## Features

- **Multi-provider support** — OpenRouter, Groq, and OpenAI in one tool
- **Real-time streaming** — Watch responses arrive token by token with TTFT (time-to-first-token) tracking
- **Side-by-side comparison** — Benchmark two models simultaneously on the same prompt
- **Cost tracking** — Per-request USD cost calculated from provider pricing
- **Performance metrics** — TTFT, total latency, tokens/second, input/output token counts
- **History** — Browse and restore previous benchmark runs with full state
- **Smart caching** — 30-minute TTL cache for model listings (no redundant API calls)
- **Persistent config** — API keys and settings saved locally in TOML format
- **Dark UI** — Professional navy-black theme with purple-violet accents

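The 30-minute TTL cache mentioned above can be sketched as a small thread-safe class. This is an illustrative sketch, not the project's actual implementation; the names (`TTLCache`, `get`, `set`) are assumptions:

```python
import threading
import time

class TTLCache:
    """Thread-safe cache whose entries expire after a fixed time-to-live."""

    def __init__(self, ttl_seconds: float = 1800):  # 30 minutes, as used for model listings
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._store: dict = {}  # key -> (expires_at, value)

    def get(self, key):
        with self._lock:
            entry = self._store.get(key)
            if entry is None:
                return None
            expires_at, value = entry
            if time.monotonic() > expires_at:
                del self._store[key]  # lazily evict stale entries on read
                return None
            return value

    def set(self, key, value):
        with self._lock:
            self._store[key] = (time.monotonic() + self._ttl, value)
```

With this shape, a model-listing call first checks `cache.get("models:openrouter")` and only hits the network on a miss or after expiry.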
## Quick Start

### Prerequisites

- Python 3.11+
- At least one API key: [OpenRouter](https://openrouter.ai/settings/keys), [Groq](https://console.groq.com/keys), or [OpenAI](https://platform.openai.com/api-keys)

### Install

```bash
git clone https://github.com/DevStrategist/ModelPulse.git
cd ModelPulse/llm-benchmark
pip install -r requirements.txt
```

### Run

```bash
python main.py
```

On first launch, click **Settings** to enter your API key(s). Select models in each panel, type a prompt, and hit **Run Benchmark** (or `Ctrl+Enter`).

## Architecture

```
llm-benchmark/
├── main.py                      # Entry point
├── src/
│   ├── benchmark_runner.py      # Orchestrates concurrent benchmark runs
│   ├── clients/                 # API client implementations
│   │   ├── base_client.py       # Abstract base with streaming logic
│   │   ├── openrouter_client.py
│   │   ├── groq_client.py
│   │   └── openai_client.py
│   ├── gui/                     # PySide6 user interface
│   │   ├── main_window_clean_dark.py  # Main application window
│   │   ├── settings_dialog.py   # API key management
│   │   ├── history_widget.py    # Run history sidebar
│   │   ├── dark_design_system.py  # Colors, typography, spacing
│   │   └── styles/              # Qt stylesheets
│   ├── models/                  # Data classes (RunResult, ModelInfo, etc.)
│   └── utils/                   # Config (TOML), cache (TTL), logger (JSONL)
└── tests/                       # Unit and integration tests
```

### How It Works

1. User selects models and enters a prompt
2. `BenchmarkRunner` fires concurrent async requests via `httpx`
3. Each client streams the response, tracking TTFT and latency with `time.monotonic()`
4. Results are displayed in real time, with the fastest model highlighted
5. Run data is logged to JSONL and stored in history for later comparison

### Key Design Decisions

- **Async streaming** — `httpx.AsyncClient.stream()` for true streaming with accurate TTFT measurement
- **Thread isolation** — Async event loops run in `QThread` workers to keep the GUI responsive
- **Provider abstraction** — `BaseClient` handles all streaming/timing logic; subclasses only define endpoints and headers
- **TTL cache** — Thread-safe cache prevents redundant model-listing API calls within 30 minutes

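The provider abstraction can be illustrated roughly like this. The method names and the endpoint URL are assumptions for the sketch, not the project's exact interfaces:

```python
from abc import ABC, abstractmethod

class BaseClient(ABC):
    """Shared logic lives in the base; subclasses stay tiny."""

    def __init__(self, api_key: str):
        self.api_key = api_key

    @property
    @abstractmethod
    def endpoint(self) -> str:
        """Provider-specific chat-completions URL."""

    def headers(self) -> dict:
        # Most OpenAI-compatible providers accept a plain bearer token
        return {"Authorization": f"Bearer {self.api_key}"}

class GroqClient(BaseClient):
    @property
    def endpoint(self) -> str:
        return "https://api.groq.com/openai/v1/chat/completions"
```

Because all providers here speak the OpenAI-compatible chat API, a new provider needs only an endpoint (and, if its auth differs, a `headers` override).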
## Configuration

Settings are saved to `~/.openrouter-bench/config.toml`:

```toml
[api_keys]
openrouter = "sk-or-..."
groq = "gsk_..."
openai = "sk-..."

[settings]
temperature = 0.7
max_tokens = 1000
```

Benchmark logs are appended to `~/.openrouter-bench/benchmark.jsonl`.
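Appending one JSON object per line keeps the log greppable and easy to parse incrementally. A sketch of what such a logger might look like (the function and field names are illustrative assumptions):

```python
import json
import time
from pathlib import Path

LOG_PATH = Path.home() / ".openrouter-bench" / "benchmark.jsonl"

def log_run(result: dict, path: Path = LOG_PATH) -> None:
    """Append one benchmark run as a single JSON line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"ts": time.time(), **result}) + "\n")
```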

## Testing

```bash
# Unit tests
pytest tests/ -v

# Integration tests (require API keys as environment variables)
export OPENROUTER_API_KEY=your_key
export GROQ_API_KEY=your_key
pytest tests/test_integration.py -v -m integration
```

## Adding a New Provider

See [CONTRIBUTING.md](CONTRIBUTING.md) for a guide on extending ModelPulse with additional LLM providers.

## Tech Stack

- **Python 3.11+** with async/await
- **PySide6** for the desktop GUI
- **httpx** for async HTTP streaming
- **Pydantic** for data validation
- **TOML** for configuration persistence

## License

[MIT](LICENSE)