fix(benchmark): wire measured profiles into recommender #3
Open
Mog9 wants to merge 1 commit into
Three checkpoints were blocking measured benchmark data from reaching the recommender:

1. BenchmarkStore used glob() instead of rglob(), so profiles in benchmarks/profiles/measured/ were never loaded
2. GpuTier enum had no RTX_4070 entry, causing ValueError on load
3. RuntimeRegistry didn't list rtx4070 as a vLLM-supported GPU tier

All three are now fixed. The 4 measured RTX 4070 profiles (qwen2.5-7b, qwen2.5-3b, llama3.1-8b, llama3.2-3b) are now active in the recommender. Total profiles: 32 (was 28). Measured: 4 (was 0). Tests updated: 2 (cost optimization and response shape now account for $0/hr local GPU). All 149 tests pass.
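The `ValueError` in the second checkpoint is standard `enum` behaviour: looking up a member by a value the enum does not define raises `ValueError`. A minimal sketch, assuming `GpuTier` is a string-valued `Enum` (the `RTX_4070 = "rtx4070"` line in this PR suggests so). `GpuTierBefore`, `GpuTierAfter`, `parse_tier`, and the `A100` member are hypothetical stand-ins, not the real schema:

```python
from enum import Enum


class GpuTierBefore(str, Enum):
    # Hypothetical stand-in for GpuTier before this PR.
    # A100 is an illustrative member, not taken from the real enum.
    A100 = "a100"


class GpuTierAfter(str, Enum):
    # The same stand-in with the member this PR adds.
    A100 = "a100"
    RTX_4070 = "rtx4070"


def parse_tier(enum_cls, raw: str):
    """Look up an enum member by its value; return None instead of raising."""
    try:
        return enum_cls(raw)
    except ValueError:
        # Lookup by an undefined value raises ValueError, which is what a
        # measured profile tagged "rtx4070" would hit on load.
        return None


before = parse_tier(GpuTierBefore, "rtx4070")
after = parse_tier(GpuTierAfter, "rtx4070")
print(before)       # None
print(after.value)  # rtx4070
```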
98c7dbc to 06ee988
Problem
The recommender was using 100% estimated data: 28 benchmark profiles, all marked `"source": "estimated"`. Meanwhile, 4 measured profiles (real benchmark data from actual RTX 4070 runs) were sitting in `benchmarks/profiles/measured/` but were never loaded into the system. Three checkpoints were blocking the measured data:
1. `BenchmarkStore` used `glob("*.json")`, which only finds files directly in `benchmarks/profiles/`, not in subdirectories like `measured/`.
2. The `GpuTier` enum had no `RTX_4070` entry, so even if loaded, the profiles would fail validation.
3. `RuntimeRegistry` didn't list `rtx4070` as a supported GPU tier for vLLM, so the candidate generator would filter them out.

Changes
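The first checkpoint, and the `store.py` fix for it, come down to ordinary `pathlib` behaviour. Assuming `BenchmarkStore` iterates over a `pathlib.Path` (the `rglob` call suggests it does, since `rglob` is a `Path` method), the difference can be reproduced in a throwaway directory. The file names are invented stand-ins for the real profiles:

```python
import json
import tempfile
from pathlib import Path


def make_layout(root: Path) -> None:
    """Recreate the directory shape described in this PR: one top-level
    profile and one under measured/. Names and contents are made up."""
    (root / "measured").mkdir(parents=True)
    (root / "example-estimated.json").write_text(json.dumps({"source": "estimated"}))
    (root / "measured" / "example-measured.json").write_text(json.dumps({"source": "measured"}))


with tempfile.TemporaryDirectory() as tmp:
    profiles_dir = Path(tmp)
    make_layout(profiles_dir)

    # Path.glob("*.json") matches only direct children of profiles_dir.
    flat = sorted(p.relative_to(profiles_dir).as_posix() for p in profiles_dir.glob("*.json"))
    # Path.rglob("*.json") behaves like glob("**/*.json") and descends into subdirectories.
    deep = sorted(p.relative_to(profiles_dir).as_posix() for p in profiles_dir.rglob("*.json"))

print(flat)  # ['example-estimated.json']
print(deep)  # ['example-estimated.json', 'measured/example-measured.json']
```

Switching to `rglob` also means any future subdirectory under `benchmarks/profiles/` is picked up automatically, which is worth keeping in mind if non-profile JSON files ever land there.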
`app/services/benchmark_store/store.py`: Changed `glob("*.json")` to `rglob("*.json")` so the store recursively scans subdirectories and finds the measured profiles.

`app/schemas/hardware.py`: Added `RTX_4070 = "rtx4070"` to the `GpuTier` enum and added a `GpuSpec` entry to `GPU_CATALOG`.

`app/services/runtime_registry/registry.py`: Added `"rtx4070"` to vLLM's `supported_gpu_tiers` list. It is not added to TensorRT-LLM, since there is no TRT-LLM data for the RTX 4070.

`tests/test_engine.py`: Updated 2 tests that broke because the RTX 4070 at $0/hr is now a valid candidate:
- `test_cost_optimized_recommendation`: now accepts `"rtx4070"` as a valid budget GPU
- `test_recommendation_response_shape`: changed `> 0` to `>= 0` for hourly cost (a local GPU is free)

Benchmarks Now Active
The 4 measured profiles are now flowing through the recommender:
- qwen2.5-7b
- qwen2.5-3b
- llama3.1-8b
- llama3.2-3b

Total profiles: 32 (was 28)
Measured profiles: 4 (was 0)
Estimated profiles: 28 (unchanged)
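One way to double-check these counts is to tally profiles by their `source` field. This is a sketch, not a script from the repository: it assumes each profile is a JSON file with a top-level `"source"` key (as the Problem section describes for the estimated ones), and the fixture files it writes are synthetic, mirroring only the 28/4 split reported above. `count_sources` is a hypothetical helper:

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_sources(profiles_dir: Path) -> Counter:
    """Tally profiles by their "source" field, scanning subdirectories too.

    Assumes one JSON object per file with a top-level "source" key;
    files without that key are counted as "unknown".
    """
    counts: Counter = Counter()
    for path in profiles_dir.rglob("*.json"):
        data = json.loads(path.read_text())
        counts[data.get("source", "unknown")] += 1
    return counts


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "measured").mkdir()
    # Synthetic stand-ins: 28 estimated profiles at the top level,
    # 4 measured ones under measured/ (file names are illustrative).
    for i in range(28):
        (root / f"estimated-{i}.json").write_text(json.dumps({"source": "estimated"}))
    for name in ["qwen2.5-7b", "qwen2.5-3b", "llama3.1-8b", "llama3.2-3b"]:
        (root / "measured" / f"{name}.json").write_text(json.dumps({"source": "measured"}))
    totals = count_sources(root)

print(sum(totals.values()), totals["measured"], totals["estimated"])  # 32 4 28
```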
Test Results
149 passed, 1 skipped
All existing tests pass with the updated assertions.
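The two assertion changes follow from how the third fix interacts with cost optimization. The toy pipeline below is a hypothetical sketch, not the repository's code: `Candidate`, `SUPPORTED_GPU_TIERS`, `generate_candidates`, `cheapest`, the runtime keys, the `a100` tier, and its price are all invented for illustration. Only two facts come from this PR: `rtx4070` is registered for vLLM alone, and it costs $0/hr.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    gpu_tier: str
    runtime: str
    hourly_cost_usd: float


# Hypothetical registry mirroring the registry.py change: "rtx4070" is listed
# for vLLM only. The "a100" tier and the runtime keys are illustrative.
SUPPORTED_GPU_TIERS = {
    "vllm": ["a100", "rtx4070"],
    "tensorrt-llm": ["a100"],
}


def generate_candidates(profiles):
    """Keep only (runtime, tier) pairs the registry lists as supported."""
    return [c for c in profiles if c.gpu_tier in SUPPORTED_GPU_TIERS.get(c.runtime, [])]


def cheapest(candidates):
    """Cost-optimized pick: the lowest hourly cost wins."""
    return min(candidates, key=lambda c: c.hourly_cost_usd)


profiles = [
    Candidate("a100", "vllm", 2.50),            # made-up cloud price
    Candidate("rtx4070", "vllm", 0.0),          # local GPU at $0/hr, per this PR
    Candidate("rtx4070", "tensorrt-llm", 0.0),  # filtered out: no TRT-LLM support
]

pool = generate_candidates(profiles)
best = cheapest(pool)

old_check = best.hourly_cost_usd > 0   # the previous assertion: now False
new_check = best.hourly_cost_usd >= 0  # the relaxed assertion: True
print(best, old_check, new_check)
```

Once the $0/hr tier survives the runtime filter, it wins any cost-optimized comparison, so a budget-GPU test has to accept `rtx4070` and a strict `> 0` check on hourly cost fails.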
Impact