Tune a8w8_blockscale_bpreshuffle_tuned_gemm for gfx942 (MI308X) #2172
chuanbowang2026 wants to merge 24 commits into ROCm:main from
chuanbowang2026 commented Mar 4, 2026
- Tune a8w8_blockscale_bpreshuffle_tuned_gemm for gfx942 (MI308X)
- Update tuned/untuned CSV configs
- Update tuner splitK behavior for ASM
- Ran tuner and checked output CSV format.
|
Addressed: removed redundant local definitions and imported from aiter.jit.core directly. |
|
@chuanbowang2026 please resolve the failed CI. |
|
Addressed CI lint failures. No functional tuning logic was changed; this is a lint/format cleanup to pass CI. |
|
Thank you, it has been completed: |
|
Hi @chuanbowang2026, I committed "add_mi355_tuned". |
11cff67 to 7a64d88
|
Let's split the config into per-model CSV files. |
- Add a8w8_blockscale_bpreshuffle_tuned_gemm_dsv3.csv (MI308/MI355 tuned configs)
- Add a8w8_blockscale_bpreshuffle_untuned_gemm_dsv3.csv (M,N,K shapes for tuning)
- Update a8w8_blockscale_bpreshuffle_tuned/untuned_gemm.csv
- Keep a8w8_blockscale_bpreshuffle_tuned_gemm_dsv3.csv (MI308/MI355 DSv3 results)
- Keep tuned/untuned_gemm.csv with headers for config merge compatibility
- Add headers to root tuned/untuned for model_configs merge to work
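To illustrate why the root tuned/untuned files need a header row for the merge to work, here is a minimal sketch of a header-checked CSV merge. It is not the repository's actual model_configs merge logic. The function name `merge_csv_texts`, the column names, and the sample rows are all hypothetical.

```python
import csv
import io


def merge_csv_texts(texts):
    """Merge CSV documents that share one header row.

    Returns (header, rows). A header-only document contributes no rows,
    but its header is still checked against the others.
    """
    header = None
    rows = []
    for text in texts:
        reader = csv.reader(io.StringIO(text))
        file_header = next(reader, None)
        if file_header is None:
            # A completely empty file has no header to validate against.
            raise ValueError("CSV input has no header row")
        if header is None:
            header = file_header
        elif file_header != header:
            raise ValueError(f"header mismatch: {file_header} != {header}")
        # Skip blank lines; keep every data row.
        rows.extend(row for row in reader if row)
    return header, rows


# Hypothetical contents: a header-only root file plus a per-model file.
root_csv = "cu_num,M,N,K,kernel\n"
dsv3_csv = "cu_num,M,N,K,kernel\n80,16,7168,2048,k0\n256,16,7168,2048,k1\n"

header, rows = merge_csv_texts([root_csv, dsv3_csv])
```

Under these assumptions, a root file holding only the header merges cleanly with the per-model file, while a fully empty root file is rejected because there is no schema to compare.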
Auto-select cu80/cu256 tuned and untuned files for blockscale bpreshuffle tuning and codegen so each machine only consumes its own config set.
|
Split the DSv3 blockscale bpreshuffle tuned/untuned configs into cu80 and cu256 variants, and updated tuning/codegen to auto-select the machine-specific config by CU count. Because the previously tuned results had been concatenated, the untuned shapes were run separately, which resulted in a 1-hour timeout. |
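The CU-count-based selection described in this comment could look roughly like the sketch below. It is an assumption-laden illustration, not the code from this PR: `select_config_path`, the `_cu<N>.csv` filename suffix, and the `configs` directory are hypothetical, and the CU count is taken as a plain argument rather than queried from the device.

```python
from pathlib import Path

# CU counts that have their own config file in this sketch (from the
# cu80/cu256 split mentioned in the comment above).
SUPPORTED_CU_COUNTS = (80, 256)


def select_config_path(config_dir, stem, cu_count):
    """Return the per-machine config file for the given CU count.

    The "<stem>_cu<N>.csv" naming scheme is hypothetical.
    """
    if cu_count not in SUPPORTED_CU_COUNTS:
        raise ValueError(
            f"no config for cu_count={cu_count}; expected one of {SUPPORTED_CU_COUNTS}"
        )
    return Path(config_dir) / f"{stem}_cu{cu_count}.csv"


tuned_80 = select_config_path(
    "configs", "a8w8_blockscale_bpreshuffle_tuned_gemm_dsv3", 80
)
tuned_256 = select_config_path(
    "configs", "a8w8_blockscale_bpreshuffle_tuned_gemm_dsv3", 256
)
```

Failing loudly on an unknown CU count, rather than falling back to another machine's file, keeps each machine from silently consuming a config set tuned for different hardware.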
Drop the temporary cu80/cu256 split config flow and restore the merged blockscale bpreshuffle tuning/codegen paths so CI keeps using the original non-retuning config layout.
|
This CI timeout issue has also been observed by others. One suspicion is that MAX_JOBS is set too low for forked PRs, which may cause longer runtimes on some GPUs. |
The merge-base changed after approval.
|
This PR is no longer needed due to history issues. Please submit it now and transfer it to #2366. To prevent data loss, this PR will be closed after the new PR is completed. |
