Tune a8w8_blockscale_bpreshuffle_tuned_gemm for gfx942 (MI308X) #2172

Closed
chuanbowang2026 wants to merge 24 commits into ROCm:main from chuanbowang2026:tune/a8w8-blockscale-bpreshuffle-gfx942-mi308x
Conversation

@chuanbowang2026
Contributor

  • Tune a8w8_blockscale_bpreshuffle_tuned_gemm for gfx942 (MI308X)
  • Update tuned/untuned CSV configs
  • Update tuner splitK behavior for ASM
  • Ran the tuner and checked the output CSV format.

@chuanbowang2026 chuanbowang2026 requested a review from a team March 4, 2026 10:06
@chuanbowang2026
Contributor Author

Addressed, removed redundant local definitions and imported from aiter.jit.core directly.

@valarLip
Collaborator

valarLip commented Mar 4, 2026

@chuanbowang2026 please resolve the failed CI.

@chuanbowang2026
Contributor Author

Addressed CI lint failures from black and ruff in gemm_a8w8_blockscale_bpreshuffle_tune.py.

Changes in this update:

  • Added file-level # ruff: noqa: E402 to allow intentional import ordering (we modify sys.path before module imports).
  • Removed unused local variables reported by Ruff (F841) in tune().
  • Cleaned up comment style and removed stale commented-out lines.
  • Applied Black formatting only.

No functional tuning logic was changed; this is a lint/format cleanup to pass CI.

@junxiaguo junxiaguo requested review from DDEle and yzhou103 and removed request for DDEle March 5, 2026 07:58
@yzhou103
Contributor

yzhou103 commented Mar 5, 2026

aiter/csrc/ck_gemm_a8w8_blockscale_bpreshuffle/gen_instances.py should be updated; you can refer to gen_instances.py in ck_gemm_a88_bpreshuffle.
We should filter ASM solutions out when generating the CK lookup file.

@chuanbowang2026
Contributor Author

Thank you, this has been completed:

  • Set asm_kernel_id to start from 0 in gemm_a8w8_blockscale_bpreshuffle_tune.py.
  • In gen_instances.py, filter tuned results with libtype == "ck" before building the CK lookup table, so ASM solutions are excluded from CK lookup generation.
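The filtering step described above can be sketched as follows. The row fields (`libtype`, `kernel_id`, and the M/N/K key) are assumptions based on this discussion, not the exact schema of the tuned CSV:

```python
# Hypothetical tuned-result rows, as they might be parsed from the tuned CSV.
tuned_rows = [
    {"M": 1,  "N": 7168, "K": 2048, "kernel_id": 3,   "libtype": "ck"},
    {"M": 16, "N": 7168, "K": 2048, "kernel_id": 101, "libtype": "asm"},
    {"M": 32, "N": 7168, "K": 2048, "kernel_id": 7,   "libtype": "ck"},
]

# Keep only CK-tuned rows before building the CK lookup table, so ASM
# solutions never leak into the generated CK instance list.
ck_rows = [r for r in tuned_rows if r["libtype"] == "ck"]
ck_lookup = {(r["M"], r["N"], r["K"]): r["kernel_id"] for r in ck_rows}
```

The key point is that the filter runs before lookup generation, so an ASM `kernel_id` (which indexes a different kernel table entirely) can never be misread as a CK instance index.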

@amd-ruitang3
Contributor

Hi @chuanbowang2026, I committed "add_mi355_tuned".

@amd-ruitang3 amd-ruitang3 force-pushed the tune/a8w8-blockscale-bpreshuffle-gfx942-mi308x branch from 11cff67 to 7a64d88 Compare March 6, 2026 12:36
@valarLip
Collaborator

valarLip commented Mar 7, 2026

let's split the config into per-model CSVs

amd-ruitang3 and others added 8 commits March 7, 2026 09:47
- Add a8w8_blockscale_bpreshuffle_tuned_gemm_dsv3.csv (MI308/MI355 tuned configs)
- Add a8w8_blockscale_bpreshuffle_untuned_gemm_dsv3.csv (M,N,K shapes for tuning)
- Update a8w8_blockscale_bpreshuffle_tuned/untuned_gemm.csv
- Keep a8w8_blockscale_bpreshuffle_tuned_gemm_dsv3.csv (MI308/MI355 DSv3 results)
- Keep tuned/untuned_gemm.csv with headers for config merge compatibility
- Add headers to root tuned/untuned for model_configs merge to work
Auto-select cu80/cu256 tuned and untuned files for blockscale bpreshuffle tuning and codegen so each machine only consumes its own config set.
@chuanbowang2026
Contributor Author

Split DSV3 blockscale bpreshuffle tuned/untuned configs into cu80 and cu256 variants, and updated tuning/codegen to auto-select the machine-specific config by CU count. Because the previously tuned results were already concatenated, the untuned shapes had to be run separately, which caused the 1-hour CI timeout.
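The CU-count-based auto-selection might look like the following sketch. `get_cu_count`, `select_config`, and the fallback default are illustrative names (not the actual functions in this PR); the CU counts come from the cu80/cu256 file naming above, and `multi_processor_count` is PyTorch's device property for CU/SM count on ROCm:

```python
import os


def get_cu_count(default: int = 80) -> int:
    """Query the CU count of device 0, falling back to a default (an
    assumption for this sketch) when no GPU or torch is available."""
    try:
        import torch
        return torch.cuda.get_device_properties(0).multi_processor_count
    except Exception:
        return default


def select_config(base_dir: str, kind: str = "tuned") -> str:
    """Pick the per-machine CSV (e.g. cu80 for MI308X-class parts,
    cu256 for MI355-class), falling back to the merged file."""
    cu = get_cu_count()
    candidate = os.path.join(
        base_dir, f"a8w8_blockscale_bpreshuffle_{kind}_gemm_cu{cu}.csv"
    )
    if os.path.exists(candidate):
        return candidate
    return os.path.join(base_dir, f"a8w8_blockscale_bpreshuffle_{kind}_gemm.csv")
```

Falling back to the merged CSV keeps machines with an unexpected CU count working, which matters since this split flow was later reverted in favor of the merged layout.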

Drop the temporary cu80/cu256 split config flow and restore the merged blockscale bpreshuffle tuning/codegen paths so CI keeps using the original non-retuning config layout.
@chuanbowang2026
Contributor Author

This CI timeout issue has also been observed by others. One suspicion is that MAX_JOBS is set too low for forked PRs, which may cause longer runtimes on some GPUs.
Huang, Xin is currently verifying this.

yzhou103
yzhou103 previously approved these changes Mar 17, 2026
@chuanbowang2026
Contributor Author

Due to commit-history issues, this PR is no longer needed; the work has been transferred to #2366. To prevent data loss, this PR will be closed after the new PR is completed.
