fix(benchmark): wire measured profiles into recommender by Mog9 · Pull Request #3 · tensormux/Tensorpath

Mog9 · 2026-06-13T07:44:23Z

Problem

The recommender was running entirely on estimated data: all 28 benchmark profiles were marked "source": "estimated". Meanwhile, 4 measured profiles (real benchmark data from actual RTX 4070 runs) sat in benchmarks/profiles/measured/ and were never loaded into the system.

Three checkpoints were blocking the measured data:

1. BenchmarkStore used glob("*.json"), which only finds files directly in benchmarks/profiles/, not in subdirectories like measured/.
2. The GpuTier enum had no RTX_4070 entry, so even if loaded, the profiles would fail validation.
3. RuntimeRegistry didn't list rtx4070 as a supported GPU tier for vLLM, so the candidate generator would filter them out.

Changes

`app/services/benchmark_store/store.py`

Changed glob("*.json") to rglob("*.json") so the store recursively scans subdirectories and finds the measured profiles.
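The difference between the two calls is easy to reproduce with `pathlib`. The following is a minimal sketch, assuming a throwaway directory that mirrors the benchmarks/profiles/ layout; the file names and the `build_demo_tree` helper are hypothetical and not taken from the repository.

```python
import json
import tempfile
from pathlib import Path


def build_demo_tree(root: Path) -> Path:
    """Create a profiles directory with one top-level and one nested JSON file."""
    profiles = root / "benchmarks" / "profiles"
    (profiles / "measured").mkdir(parents=True)
    # Hypothetical file names, used only for this illustration.
    (profiles / "estimated_a100.json").write_text(json.dumps({"source": "estimated"}))
    (profiles / "measured" / "rtx4070_qwen.json").write_text(json.dumps({"source": "measured"}))
    return profiles


with tempfile.TemporaryDirectory() as tmp:
    profiles_dir = build_demo_tree(Path(tmp))
    # glob("*.json") matches only direct children of profiles_dir.
    flat = sorted(p.name for p in profiles_dir.glob("*.json"))
    # rglob("*.json") also descends into measured/.
    recursive = sorted(p.name for p in profiles_dir.rglob("*.json"))
```

Note that `rglob` will pick up every JSON file under the directory, so any non-profile JSON placed there later would also be loaded.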

`app/schemas/hardware.py`

Added RTX_4070 = "rtx4070" to the GpuTier enum and added a GpuSpec entry to GPU_CATALOG:

VRAM: 12 GB
FP16 TFLOPS: 44
Memory bandwidth: 432 GB/s
Hourly cost: $0.00 (local GPU, matching the measured profiles)
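For orientation, the new entries might look roughly like the sketch below. Only `GpuTier`, `RTX_4070`, `"rtx4070"`, `GpuSpec`, `GPU_CATALOG`, and the four values come from this PR; the `GpuSpec` field names and the `parse_tier` helper are assumptions made for illustration and may not match the real schema.

```python
from dataclasses import dataclass
from enum import Enum


class GpuTier(str, Enum):
    # Other tiers omitted; only the entry added by this PR is shown.
    RTX_4070 = "rtx4070"


@dataclass(frozen=True)
class GpuSpec:
    # Field names are hypothetical; the values are the ones listed in this PR.
    vram_gb: int
    fp16_tflops: float
    memory_bandwidth_gb_s: float
    hourly_cost_usd: float


GPU_CATALOG = {
    GpuTier.RTX_4070: GpuSpec(
        vram_gb=12,
        fp16_tflops=44,
        memory_bandwidth_gb_s=432,
        hourly_cost_usd=0.0,  # local GPU, matching the measured profiles
    ),
}


def parse_tier(raw: str) -> GpuTier:
    """Look up a tier by value; an unknown value raises ValueError."""
    return GpuTier(raw)
```

Without the enum member, `GpuTier("rtx4070")` raises `ValueError`, which is how a profile naming that tier would fail on load.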

`app/services/runtime_registry/registry.py`

Added "rtx4070" to vLLM's supported_gpu_tiers list. Not added to TensorRT-LLM (no TRT-LLM data for RTX 4070).
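The effect of the tier list on candidate generation can be sketched as follows. This is an illustration under assumptions, not the registry's actual code: `RuntimeInfo`, `REGISTRY`, `runtimes_for`, and the `"a100"` placeholder tier are hypothetical; only `supported_gpu_tiers` and `"rtx4070"` come from this PR.

```python
from dataclasses import dataclass, field


@dataclass
class RuntimeInfo:
    name: str
    supported_gpu_tiers: list[str] = field(default_factory=list)


# Hypothetical registry contents; "a100" stands in for the existing tiers.
REGISTRY = {
    "vllm": RuntimeInfo("vllm", ["a100", "rtx4070"]),
    "tensorrt_llm": RuntimeInfo("tensorrt_llm", ["a100"]),  # no RTX 4070 data
}


def runtimes_for(gpu_tier: str) -> list[str]:
    """Return the runtimes that list the given GPU tier as supported."""
    return sorted(
        name for name, info in REGISTRY.items()
        if gpu_tier in info.supported_gpu_tiers
    )
```

With this shape, `runtimes_for("rtx4070")` yields only vLLM, so a measured profile for that tier survives filtering for vLLM and nothing is offered for TensorRT-LLM.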

`tests/test_engine.py`

Updated 2 tests that broke because RTX 4070 at $0/hr is now a valid candidate:

`test_cost_optimized_recommendation`: now accepts "rtx4070" as a valid budget GPU
`test_recommendation_response_shape`: changed `> 0` to `>= 0` for hourly cost (local GPU is free)
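The relaxed assertion amounts to something like the check below. It is a sketch only: the `check_hourly_cost` helper and the `hourly_cost` key are assumed names, not the test's actual code.

```python
def check_hourly_cost(response: dict) -> None:
    """Accept any non-negative hourly cost, including 0 for a local GPU."""
    cost = response["hourly_cost"]  # hypothetical response key
    assert isinstance(cost, (int, float))
    # Previously `cost > 0`, which rejected the $0/hr RTX 4070.
    assert cost >= 0
```

A response such as `{"gpu": "rtx4070", "hourly_cost": 0.0}` passes this check, while a negative cost still fails.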

Benchmarks Now Active

The 4 measured profiles are now flowing through the recommender:

| Model | GPU | Backend | Quantization | Tokens/sec | Source |
| --- | --- | --- | --- | --- | --- |
| `qwen2.5-7b` | RTX 4070 | vLLM | AWQ | 79.4 | measured |
| `qwen2.5-3b` | RTX 4070 | vLLM | FP16 | 66.4 | measured |
| `llama3.1-8b` | RTX 4070 | vLLM | AWQ | 81.0 | measured |
| `llama3.2-3b` | RTX 4070 | vLLM | AWQ | 113.0 | measured |

Total profiles: 32 (was 28)
Measured profiles: 4 (was 0)
Estimated profiles: 28 (unchanged)
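One way to sanity-check these totals is to tally loaded profiles by their "source" field. The sketch below uses stand-in dictionaries instead of the real profile files, and `count_by_source` is a hypothetical helper rather than part of BenchmarkStore.

```python
from collections import Counter


def count_by_source(profiles: list[dict]) -> Counter:
    """Tally profiles by their "source" field."""
    return Counter(p.get("source", "unknown") for p in profiles)


# Stand-in data reproducing the counts reported above.
profiles = [{"source": "estimated"}] * 28 + [{"source": "measured"}] * 4
counts = count_by_source(profiles)
total = sum(counts.values())
```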

Test Results

149 passed, 1 skipped

All existing tests pass with the updated assertions.

Impact

The recommender now has real benchmark data for RTX 4070, not just estimates
Users can get recommendations for local GPU deployments (RTX 4070 at $0/hr)
Cost-optimized workloads may now pick RTX 4070 over cloud GPUs
No breaking changes to existing profiles or scoring logic

Three checkpoints were blocking measured benchmark data from reaching the recommender:

1. BenchmarkStore used glob() instead of rglob(), so profiles in benchmarks/profiles/measured/ were never loaded
2. GpuTier enum had no RTX_4070 entry, causing ValueError on load
3. RuntimeRegistry didn't list rtx4070 as a vLLM-supported GPU tier

All three are now fixed. The 4 measured RTX 4070 profiles (qwen2.5-7b, qwen2.5-3b, llama3.1-8b, llama3.2-3b) are now active in the recommender. Total profiles: 32 (was 28). Measured: 4 (was 0). Tests updated: 2 (cost optimization and response shape now account for $0/hr local GPU). All 149 tests pass.

Mog9 force-pushed the fix-orphaned-measured-benchmarks branch from 98c7dbc to 06ee988 on June 13, 2026 07:54

Mog9 wants to merge 1 commit into tensormux:main from Mog9:fix-orphaned-measured-benchmarks
