update attention in fused_forward, head blocking and add prefillonly transform#857

Open
quic-mamta wants to merge 2 commits into mla_fusion from mla_fusion1
Conversation

@quic-mamta
Contributor

  1. update attention in `fused_forward`
  2. use head blocking
  3. add prefillonly transform
  4. update `min_masked_attention_value`
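For readers outside the PR, head blocking and the masked-score fill value can be sketched roughly as follows. This is a hypothetical NumPy illustration, not the PR's actual implementation: the function name `blocked_attention`, the block size, and the use of float32-min as the `min_masked_attention_value` stand-in are all assumptions.

```python
import numpy as np

# Assumed stand-in for the PR's min_masked_attention_value:
# a very negative score so softmax assigns ~0 weight to masked positions.
MIN_MASKED_ATTENTION_VALUE = np.finfo(np.float32).min

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def blocked_attention(q, k, v, mask, head_block_size=2):
    """q, k, v: (heads, seq, dim); mask: (seq, seq) bool, True = attend.

    Heads are processed in blocks so the (seq, seq) score matrix is only
    materialized for head_block_size heads at a time, bounding peak memory.
    """
    heads, seq, dim = q.shape
    scale = dim ** -0.5
    out = np.empty_like(q)
    for start in range(0, heads, head_block_size):
        end = min(start + head_block_size, heads)
        scores = q[start:end] @ k[start:end].transpose(0, 2, 1) * scale
        # Fill masked positions with the large negative value.
        scores = np.where(mask, scores, MIN_MASKED_ATTENTION_VALUE)
        out[start:end] = softmax(scores) @ v[start:end]
    return out
```

Because each head's attention is independent, the blocked loop should produce the same result as computing all heads at once; only peak memory changes.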

…rd, prefillonly transform

Signed-off-by: Mamta Singh <mamtsing@qti.qualcomm.com>
Signed-off-by: Mamta Singh <mamtsing@qti.qualcomm.com>
