Conversation

@ltqin (Collaborator) commented Dec 1, 2025

Proposed changes

Implement fp8 block scale quantization for fmha fwd
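
For context, below is a minimal CPU-side sketch of what per-block (block-scale) fp8 quantization generally does: each block of values gets its own scale, chosen so the block's max magnitude maps into the fp8 e4m3 range. The block size, function names, and the use of plain float as a stand-in for the fp8 storage type are illustrative assumptions only and are not taken from the composable_kernel implementation.

```cpp
// Illustrative sketch of block-scale fp8 quantization (not the CK kernel code).
#include <algorithm>
#include <cmath>
#include <vector>

constexpr float kFp8E4M3Max = 448.0f; // largest finite value representable in e4m3

// Quantize `data` in blocks of `block_size`: each block gets one float scale so
// that its max magnitude maps to the fp8 e4m3 range. Dequantization multiplies
// each quantized value by its block's scale.
void quantize_block_scale(const std::vector<float>& data,
                          int block_size,
                          std::vector<float>& quantized, // stand-in for fp8 storage
                          std::vector<float>& scales)    // one scale per block
{
    const int n          = static_cast<int>(data.size());
    const int num_blocks = (n + block_size - 1) / block_size;
    quantized.resize(data.size());
    scales.resize(num_blocks);

    for(int b = 0; b < num_blocks; ++b)
    {
        const int begin = b * block_size;
        const int end   = std::min(begin + block_size, n);

        // Find the block's absolute maximum.
        float amax = 0.0f;
        for(int i = begin; i < end; ++i)
            amax = std::max(amax, std::fabs(data[i]));

        // Per-block scale; guard against an all-zero block.
        const float scale = amax > 0.0f ? amax / kFp8E4M3Max : 1.0f;
        scales[b]         = scale;

        for(int i = begin; i < end; ++i)
        {
            // Round to nearest and saturate to the e4m3 range; a real kernel
            // would convert to an 8-bit fp8 type here instead of float.
            const float q = std::nearbyint(data[i] / scale);
            quantized[i]  = std::clamp(q, -kFp8E4M3Max, kFp8E4M3Max);
        }
    }
}
```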

Checklist

./build/bin/tile_example_fmha_fwd -init=3 -b=16 -s=256 -s_k=1024 -h=16 -h_k=1 -d=128 -prec=fp8bf16 -vlayout=r -qscale=2 -kname=1 -mode=0 -v=1 -operm=1 -iperm=1

@ltqin requested a review from rocking5566 on December 3, 2025 00:26
@poyenc (Contributor) commented Jan 19, 2026

LGTM. Please make sure this PR doesn't introduce a performance regression before merging it.

poyenc previously approved these changes Jan 19, 2026
@ltqin enabled auto-merge (squash) January 20, 2026 06:21
@ltqin disabled auto-merge January 20, 2026 06:22
@ltqin enabled auto-merge (squash) January 20, 2026 14:09
@ltqin disabled auto-merge January 20, 2026 14:09
@illsilin (Collaborator) commented:

Please make sure we don't break AITER with these changes!

@poyenc (Contributor) commented Jan 21, 2026

@ltqin could you update the CHANGELOG.md as well?
