[Bug]: [trtllm-gen fmha] Missing OE2m1 ForGen VarSeqQ32/64/128 kernels + E2M1 hard-gated selector causes high-throughput decode regression

### System Info

B300

### Who can help?

```shell
for q in 8 16 32 64 128; do
  n=$(ls cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/*QkvE4m3*OE2m1*ForGen*cubin.cpp 2>/dev/null | grep -E "VarSeqQ${q}Kv128" | wc -l)
  echo "Q${q}: ${n}"
done
```


Result:

```
Q8: 24
Q16: 24
Q32: 0
Q64: 0
Q128: 0
```
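
The count loop above can be wrapped into a small check that reports only the Q tile sizes with no matching cubin. This is a minimal sketch, not part of TensorRT-LLM: the helper names `count_forgen_cubins` and `missing_q_tiles` and the `CUBIN_DIR` variable are hypothetical, and the directory and filename pattern are taken from the loop above.

```shell
# Sketch only. CUBIN_DIR defaults to the path used in the report;
# the function names below are hypothetical helpers, not repo tooling.
CUBIN_DIR="${CUBIN_DIR:-cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin}"

# Count QkvE4m3 + OE2m1 ForGen cubin sources for one VarSeqQ<q>Kv128 tile.
count_forgen_cubins() {
  local q="$1" f n=0
  for f in "$CUBIN_DIR"/*QkvE4m3*OE2m1*ForGen*cubin.cpp; do
    [ -e "$f" ] || continue   # an unmatched glob stays literal; skip it
    case "$f" in
      *"VarSeqQ${q}Kv128"*) n=$((n + 1)) ;;
    esac
  done
  echo "$n"
}

# Print the Q tile sizes (space separated) that have no matching cubin.
missing_q_tiles() {
  local q missing=""
  for q in 8 16 32 64 128; do
    if [ "$(count_forgen_cubins "$q")" -eq 0 ]; then
      missing="${missing:+$missing }$q"
    fi
  done
  echo "$missing"
}

echo "Missing VarSeqQ tiles: $(missing_q_tiles)"
```

Run from the repository root, or with `CUBIN_DIR` pointing at the cubin directory. For a checkout matching the counts reported above, the last line would list 32, 64 and 128.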

Related vLLM issue: https://github.com/vllm-project/vllm/issues/34988

### Information

- [x] The official example scripts
- [x] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

See https://github.com/vllm-project/vllm/issues/34988

### Expected behavior

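`QkvE4m3`/`OE2m1` `ForGen` kernels are available for `VarSeqQ32`, `VarSeqQ64`, and `VarSeqQ128`, as they are for Q8 and Q16, and high-throughput decode does not regress.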

### actual behavior

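The cubin directory contains no `QkvE4m3`/`OE2m1` `ForGen` kernels for `VarSeqQ32Kv128`, `VarSeqQ64Kv128`, or `VarSeqQ128Kv128`; only Q8 and Q16 are present (24 each). Combined with the hard-gated E2M1 selector, this causes a high-throughput decode regression.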

### additional notes

https://github.com/vllm-project/vllm/issues/34988

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and checked the [documentation](https://nvidia.github.io/TensorRT-LLM/) and [examples](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples) for answers to frequently asked questions.

[Bug]: [trtllm-gen fmha] Missing OE2m1 ForGen VarSeqQ32/64/128 kernels + E2M1 hard-gated selector causes high-throughput decode regression #11620
