🚀 The feature, motivation and pitch
The version string at
https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/bench/benchmark/utils/general.py#L172
is hardcoded as 1.2, so the TensorRT-LLM bench report shows version 1.2 no matter which version is actually installed.
Example output:
TensorRT LLM Version: 1.2
Dtype: bfloat16
KV Cache Dtype: FP8
Quantization: NVFP4
Alternatives
Make the reported version match the installed TensorRT-LLM version instead of using a hardcoded value.
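
One possible way to do this, as a minimal sketch rather than the actual fix: look up the installed package version at runtime. The helper name `get_reported_version`, the distribution name `tensorrt_llm`, and the `"unknown"` fallback are assumptions for illustration and are not taken from the TensorRT-LLM codebase.

```python
from importlib.metadata import PackageNotFoundError, version


def get_reported_version(package: str = "tensorrt_llm", fallback: str = "unknown") -> str:
    """Return the installed version of `package`, or `fallback` if it is not installed.

    Hypothetical helper: the package name and fallback value are assumptions.
    """
    try:
        # Reads the version from the installed distribution's metadata.
        return version(package)
    except PackageNotFoundError:
        return fallback


if __name__ == "__main__":
    # The report line would then be built from the looked-up value
    # instead of a hardcoded "1.2".
    print(f"TensorRT LLM Version: {get_reported_version()}")
```

With something like this, the version line in the report would follow whatever release is installed, and no manual bump would be needed per release.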
Additional context
No response