fix(benchmark): wire measured profiles into recommender #3

Open
Mog9 wants to merge 1 commit into tensormux:main from Mog9:fix-orphaned-measured-benchmarks
Conversation

Mog9 commented Jun 13, 2026

Collaborator

Problem

The recommender was using only estimated data: all 28 benchmark profiles were marked "source": "estimated". Meanwhile, 4 measured profiles (real benchmark data from actual RTX 4070 runs) sat in benchmarks/profiles/measured/ and were never loaded into the system.

Three checkpoints were blocking the measured data:

  1. BenchmarkStore used glob("*.json"), which only matches files directly in benchmarks/profiles/, not in subdirectories such as measured/
  2. GpuTier enum had no RTX_4070 entry, so even if loaded, the profiles would fail validation
  3. RuntimeRegistry didn't list rtx4070 as a supported GPU tier for vLLM, so the candidate generator would filter them out

Changes

app/services/benchmark_store/store.py

Changed glob("*.json") to rglob("*.json") so the store recursively scans subdirectories and finds the measured profiles.
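The difference between the two calls can be reproduced in isolation. The following is a minimal sketch using only pathlib from the standard library; the file names and the find_profiles helper are hypothetical and do not come from store.py.

```python
import json
import tempfile
from pathlib import Path


def find_profiles(root: Path, recursive: bool) -> list[str]:
    """Return matching profile paths relative to root, sorted for stable output."""
    pattern = "*.json"
    matches = root.rglob(pattern) if recursive else root.glob(pattern)
    return sorted(p.relative_to(root).as_posix() for p in matches)


def build_demo_tree(root: Path) -> None:
    """Create one top-level profile and one under measured/ (hypothetical names)."""
    (root / "measured").mkdir()
    (root / "a100_estimated.json").write_text(json.dumps({"source": "estimated"}))
    (root / "measured" / "rtx4070_measured.json").write_text(
        json.dumps({"source": "measured"})
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_demo_tree(root)
    flat = find_profiles(root, recursive=False)  # glob: top level only
    deep = find_profiles(root, recursive=True)   # rglob: includes measured/
    print(flat)
    print(deep)
```

With glob, only the top-level file is returned; with rglob, the file under measured/ is returned as well.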

app/schemas/hardware.py

Added RTX_4070 = "rtx4070" to the GpuTier enum and added a GpuSpec entry to GPU_CATALOG:

  • VRAM: 12 GB
  • FP16 TFLOPS: 44
  • Memory bandwidth: 432 GB/s
  • Hourly cost: $0.00 (local GPU, matching the measured profiles)
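As a rough illustration of what this addition might look like, here is a sketch that assumes GpuTier is a string-valued Enum and GpuSpec is a dataclass. The GpuSpec field names are assumptions; only the enum member, its value, and the four numbers come from this PR. Other tiers in the real catalog are omitted.

```python
from dataclasses import dataclass
from enum import Enum


class GpuTier(str, Enum):
    # Only this member is known from the PR; the real enum has other tiers.
    RTX_4070 = "rtx4070"


@dataclass(frozen=True)
class GpuSpec:
    # Field names are assumed for illustration.
    vram_gb: int
    fp16_tflops: float
    memory_bandwidth_gb_s: float
    hourly_cost_usd: float


GPU_CATALOG = {
    GpuTier.RTX_4070: GpuSpec(
        vram_gb=12,
        fp16_tflops=44,
        memory_bandwidth_gb_s=432,
        hourly_cost_usd=0.0,  # local GPU, matching the measured profiles
    ),
}


def parse_tier(raw: str) -> GpuTier:
    """Look up a tier by value; Enum raises ValueError for unknown values."""
    return GpuTier(raw)


print(parse_tier("rtx4070"))
try:
    parse_tier("not-a-tier")
except ValueError as exc:
    print(f"rejected: {exc}")
```

Looking up an enum by a value it does not define raises ValueError, which is the kind of failure a profile with an unlisted GPU tier would hit before this change.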

app/services/runtime_registry/registry.py

Added "rtx4070" to vLLM's supported_gpu_tiers list. Not added to TensorRT-LLM (no TRT-LLM data for RTX 4070).
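The filtering effect of supported_gpu_tiers can be sketched as follows. The RuntimeInfo class, the REGISTRY dict, and filter_candidates are hypothetical stand-ins for the real registry and candidate generator; only the supported_gpu_tiers name and the "rtx4070" value are taken from this PR.

```python
from dataclasses import dataclass, field


@dataclass
class RuntimeInfo:
    # Hypothetical shape; only the supported_gpu_tiers name comes from the PR.
    name: str
    supported_gpu_tiers: list[str] = field(default_factory=list)


# Other tiers are omitted; only "rtx4070" is known from the PR.
REGISTRY = {
    "vllm": RuntimeInfo("vllm", ["rtx4070"]),
    "tensorrt-llm": RuntimeInfo("tensorrt-llm", []),  # no RTX 4070 data
}


def filter_candidates(profiles: list[dict]) -> list[dict]:
    """Keep only profiles whose backend lists the profile's GPU tier."""
    kept = []
    for profile in profiles:
        runtime = REGISTRY.get(profile["backend"])
        if runtime is not None and profile["gpu"] in runtime.supported_gpu_tiers:
            kept.append(profile)
    return kept


profiles = [
    {"model": "qwen2.5-7b", "gpu": "rtx4070", "backend": "vllm"},
    # Hypothetical profile, included only to show the filter rejecting it.
    {"model": "qwen2.5-7b", "gpu": "rtx4070", "backend": "tensorrt-llm"},
]
kept = filter_candidates(profiles)
print([p["backend"] for p in kept])
```

Under these assumptions, a profile survives only when its backend explicitly lists the GPU tier, which is why the vLLM entry needed updating and the TensorRT-LLM entry was left alone.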

tests/test_engine.py

Updated 2 tests that broke because RTX 4070 at $0/hr is now a valid candidate:

  • test_cost_optimized_recommendation — now accepts "rtx4070" as a valid budget GPU
  • test_recommendation_response_shape — changed > 0 to >= 0 for hourly cost (local GPU is free)
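The relaxed cost check amounts to the following. This is only a sketch: the has_valid_hourly_cost helper and the "hourly_cost" key are assumed names, not the actual assertions in tests/test_engine.py.

```python
def has_valid_hourly_cost(recommendation: dict) -> bool:
    """Relaxed check: a local GPU may legitimately cost $0/hr."""
    return recommendation["hourly_cost"] >= 0


# A local RTX 4070 recommendation; the previous `> 0` check would reject it.
local = {"gpu": "rtx4070", "hourly_cost": 0.0}
print(has_valid_hourly_cost(local))
```

Negative costs are still rejected, so the check keeps guarding against malformed responses.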

Benchmarks Now Active

The 4 measured profiles are now flowing through the recommender:

| Model | GPU | Backend | Quantization | Tokens/sec | Source |
| --- | --- | --- | --- | --- | --- |
| qwen2.5-7b | RTX 4070 | vLLM | AWQ | 79.4 | measured |
| qwen2.5-3b | RTX 4070 | vLLM | FP16 | 66.4 | measured |
| llama3.1-8b | RTX 4070 | vLLM | AWQ | 81.0 | measured |
| llama3.2-3b | RTX 4070 | vLLM | AWQ | 113.0 | measured |

Total profiles: 32 (was 28)
Measured profiles: 4 (was 0)
Estimated profiles: 28 (unchanged)

Test Results

149 passed, 1 skipped

All existing tests pass with the updated assertions.

Impact

  • The recommender now has real benchmark data for RTX 4070, not just estimates
  • Users can get recommendations for local GPU deployments (RTX 4070 at $0/hr)
  • Cost-optimized workloads may now pick RTX 4070 over cloud GPUs
  • No breaking changes to existing profiles or scoring logic

Three checkpoints were blocking measured benchmark data from reaching
the recommender:

1. BenchmarkStore used glob() instead of rglob(), so profiles in
   benchmarks/profiles/measured/ were never loaded
2. GpuTier enum had no RTX_4070 entry, causing ValueError on load
3. RuntimeRegistry didn't list rtx4070 as a vLLM-supported GPU tier

All three are now fixed. The 4 measured RTX 4070 profiles (qwen2.5-7b,
qwen2.5-3b, llama3.1-8b, llama3.2-3b) are now active in the recommender.

Total profiles: 32 (was 28). Measured: 4 (was 0).
Tests updated: 2 (cost optimization and response shape now account for
$0/hr local GPU).

All 149 tests pass.
Mog9 force-pushed the fix-orphaned-measured-benchmarks branch from 98c7dbc to 06ee988 on June 13, 2026 07:54