8 changes: 8 additions & 0 deletions ggml/src/ggml-cuda/fattn.cu
@@ -519,6 +519,14 @@ static best_fattn_kernel ggml_cuda_get_best_fattn_kernel(const int device, const
            return BEST_FATTN_KERNEL_VEC;
        }
    }
    // MMA template instances for (DKQ=512, DV=512) only exist with ncols2 in {4, 8}
    // (see template-instances/generate_cu_files.py). When gqa_ratio < 3 the MMA
    // dispatcher in switch_ncols2<512,512> falls through to GGML_ABORT (fattn.cu:109).
    // fattn-tile has DKQ=DV=512 with ncols2 fallback to {2,1} (commit 425db5b),
    // so route DKQ=512 + low gqa_ratio to TILE (e.g. Gemma 4 E4B with gqa_ratio=2).
    if (Q->ne[0] == 512 && gqa_ratio < 3) {
        return BEST_FATTN_KERNEL_TILE;
    }
    return BEST_FATTN_KERNEL_MMA_F16;
}
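
The routing rule added in this hunk can be sketched in isolation as a small standalone C++ function. This is only an illustration of the guard, not the actual selection logic in `ggml_cuda_get_best_fattn_kernel`: the names `SketchKernel` and `pick_kernel_sketch` are hypothetical, the head size and `gqa_ratio` are passed in directly instead of being read from the tensors, and every other branch of the real function is omitted.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two kernel choices relevant to this hunk.
enum class SketchKernel { Tile, MmaF16 };

// Hypothetical helper mirroring only the new guard:
// head size 512 combined with gqa_ratio < 3 is routed to the tile kernel,
// everything else reaching this point keeps using the MMA F16 kernel.
constexpr SketchKernel pick_kernel_sketch(int64_t head_size_kq, int gqa_ratio) {
    if (head_size_kq == 512 && gqa_ratio < 3) {
        return SketchKernel::Tile;
    }
    return SketchKernel::MmaF16;
}
```

Under these assumptions, `pick_kernel_sketch(512, 2)` (the Gemma 4 E4B case named in the comment) selects the tile path, while `pick_kernel_sketch(512, 4)` and `pick_kernel_sketch(128, 2)` stay on MMA.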
