System Info
B300
Who can help?
for q in 8 16 32 64 128; do
  n=$(ls cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/*QkvE4m3OE2m1*ForGen*cubin.cpp 2>/dev/null | grep -E "VarSeqQ${q}Kv128" | wc -l)
  echo "Q${q}: ${n}"
done
Result:
Q8: 24
Q16: 24
Q32: 0
Q64: 0
Q128: 0
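For anyone re-checking these counts, here is a minimal sketch of the same query using `find` instead of parsing `ls` output. The helper name `count_cubins` is hypothetical, and the wildcard patterns are an assumption based on the file-name fragments in the command above (`QkvE4m3OE2m1`, `ForGen`, `cubin.cpp`, `VarSeqQ<q>Kv128`); adjust them if the cubin naming differs.

```shell
# Hypothetical helper (not part of TensorRT-LLM): count cubin source files.
#   $1 = directory to search (non-recursive)
#   $2 = dtype tag expected in the file name, e.g. QkvE4m3OE2m1
#   $3 = Q tile size, matched as VarSeqQ<size>Kv128
# The two -name tests are ANDed, so the tile tag may appear anywhere in the name.
count_cubins() {
    find "$1" -maxdepth 1 -type f \
        -name "*${2}*ForGen*cubin.cpp" \
        -name "*VarSeqQ${3}Kv128*" 2>/dev/null | wc -l | tr -d ' '
}

# Run from the repository root; a missing directory simply reports 0.
for q in 8 16 32 64 128; do
    echo "Q${q}: $(count_cubins cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin QkvE4m3OE2m1 "$q")"
done
```

Passing the dtype tag as an argument makes it easy to compare against another output type in the same directory.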
vllm-project/vllm#34988
Reproduction
.
Expected behavior
.
Actual behavior
.
Additional notes
vllm-project/vllm#34988