Add benchmark capabilities for ops. by neoblizz · Pull Request #346 · ROCm/iris

neoblizz · 2026-02-03T17:39:22Z

Motivation

Add benchmarking capabilities for iris.ops.

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

…yaswann/iris_xops_perf

…e temp files

- Restore the optional `hint` parameter in `__translate` and all public iris API functions (load, store, get, put, copy, atomic_*) to match the pattern on the main branch. The previous hardcoded `tl.multiple_of(ptr, (32, 32))` assumed 2D pointers and broke all scalar-pointer atomic operations.
- Align the tritonBLAS commit across pyproject.toml, run_tests.sh, apptainer/iris.def, and docker/Dockerfile to cd119279f.
- Remove tracked backup files (iris.py.backup, all_gather_matmul.py.with_chunked) and add gitignore patterns.
- Remove the unimplemented "chunked" variant from the test_all_gather_matmul parametrization.
- Fix test_matmul_all_reduce_via_shmem_ops dimensions (N=128->256) to match the new default block_size_n=256.
- Remove the phantom "matmul" entry from `__all__` in iris/ops/__init__.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
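Why a hardcoded `(32, 32)` hint breaks scalar atomics can be illustrated with a small pure-Python model. It assumes, based on our reading of Triton, that `tl.multiple_of` requires one hint value per pointer dimension and treats a scalar as rank 1. `check_hint` and `translate` are hypothetical stand-ins written for this sketch; they are not iris's actual `__translate`.

```python
from typing import Optional, Sequence, Tuple


def check_hint(ptr_shape: Tuple[int, ...], hint: Sequence[int]) -> None:
    """Hypothetical model of the rank check assumed for tl.multiple_of."""
    # Assumption: a scalar pointer (shape ()) counts as rank 1.
    rank = max(1, len(ptr_shape))
    if len(hint) != rank:
        raise ValueError(
            f"hint {tuple(hint)} has {len(hint)} entries but pointer rank is {rank}"
        )


def translate(ptr_shape: Tuple[int, ...], hint: Optional[Sequence[int]] = None) -> str:
    """Hypothetical translate helper where the alignment hint is optional."""
    if hint is not None:
        # Only apply the hint when the caller supplies one that fits the pointer.
        check_hint(ptr_shape, hint)
        return f"translated with hint {tuple(hint)}"
    return "translated without hint"


# A 2-D tile of pointers accepts a (32, 32) hint.
print(translate((32, 32), hint=(32, 32)))
# A scalar pointer (e.g. an atomic on a single element) works when no hint is passed.
print(translate(()))
# Forcing the 2-D hint on every call breaks the scalar case.
try:
    translate((), hint=(32, 32))
except ValueError as err:
    print("error:", err)
```

Making the hint optional lets tiled callers keep the alignment information while scalar callers simply omit it.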

Resolve conflicts by adopting main's approach:

- Remove the manual tritonBLAS installation from containers and run_tests.sh (handled via the pyproject.toml dependency)
- Use torchrun for test execution (from main)
- Keep main's shorter docstring for the hint parameter in iris.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Examples 28 (matmul_all_reduce) and 30 (matmul_all_gather) used N=128 as the default, which is smaller than the new FusedConfig default block_size_n=256. This triggers assertion failures (N >= block_size_n) in CI, crashing all ranks and causing the 8-rank test to hang for 179 minutes waiting for the dead rank. Increase both examples' default N from 128 to 256 to match the new config defaults.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
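The hang follows from how multi-rank jobs synchronize: a rank that dies before reaching a collective leaves its peers blocked. The sketch below is a simplified analogy, not how iris or torchrun actually synchronize. Threads stand in for ranks, `threading.Barrier` stands in for a collective, and the timeout exists only so the demo terminates; without one, the surviving threads would wait indefinitely, as the CI job did.

```python
import threading

NUM_RANKS = 4
TIMEOUT_S = 0.5  # demo-only timeout; the CI job effectively had none and hung


def run_ranks(n: int, block_size_n: int) -> dict:
    barrier = threading.Barrier(NUM_RANKS)
    results = {}

    def rank_main(rank: int) -> None:
        try:
            if rank == 0:
                # Model: one rank validates the shape and aborts on failure.
                assert n >= block_size_n, f"N={n} < block_size_n={block_size_n}"
            # Stand-in for a collective that every rank must reach.
            barrier.wait(timeout=TIMEOUT_S)
            results[rank] = "ok"
        except AssertionError:
            results[rank] = "crashed"
        except threading.BrokenBarrierError:
            # Raised once a waiter times out and the barrier is marked broken.
            results[rank] = "timed out waiting"

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(NUM_RANKS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_ranks(n=256, block_size_n=256))  # every rank reaches the barrier
print(run_ranks(n=128, block_size_n=256))  # rank 0 crashes, the others time out
```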

…256" This reverts commit 477b472.

The Triton kernels already handle block_size > dimension via:

- `tl.cdiv(N, BLOCK_SIZE_N)` for grid sizing
- `mask=(rn < N)` on loads/stores
- tritonblas `GemmContext.reduce_axis`, which handles K masking

The assertions were preventing valid configurations (e.g., block_size_n=256 with N=128) that the kernels handle correctly. The for_problem() clamping was removed too; it is unnecessary when the kernels already mask.

Fixes CI failures on examples 28 and 30, which use N=128 with the default FusedConfig block_size_n=256.
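The masking argument can be checked with a pure-Python emulation of a tiled GEMM. This is only a model: `cdiv` mirrors `tl.cdiv`, the `m < M` and `n < N` checks mirror `mask=(rn < N)`, and the zero-fill in the K loop stands in for the K masking done by tritonBLAS's `GemmContext.reduce_axis` (its exact internals are an assumption here). Tile sizes are shrunk so that `block_n=8` exceeds `N=5`, the same situation as `block_size_n=256` with `N=128`.

```python
def cdiv(a: int, b: int) -> int:
    # Ceiling division, the quantity tl.cdiv computes for grid sizing.
    return (a + b - 1) // b


def tiled_matmul(A, B, block_m: int, block_n: int, block_k: int):
    """Pure-Python emulation of a tiled GEMM with masked loads and stores."""
    M, K = len(A), len(A[0])
    N = len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for pid_m in range(cdiv(M, block_m)):
        for pid_n in range(cdiv(N, block_n)):
            rm = [pid_m * block_m + i for i in range(block_m)]
            rn = [pid_n * block_n + j for j in range(block_n)]
            acc = [[0.0] * block_n for _ in range(block_m)]
            for k0 in range(0, K, block_k):
                rk = [k0 + t for t in range(block_k)]
                for i, m in enumerate(rm):
                    for j, n in enumerate(rn):
                        for k in rk:
                            # Masked loads: out-of-range elements read as 0.0,
                            # so they add nothing to the accumulator.
                            a = A[m][k] if m < M and k < K else 0.0
                            b = B[k][n] if k < K and n < N else 0.0
                            acc[i][j] += a * b
            # Masked store: only lanes with rm < M and rn < N are written.
            for i, m in enumerate(rm):
                for j, n in enumerate(rn):
                    if m < M and n < N:
                        C[m][n] = acc[i][j]
    return C


M, N, K = 3, 5, 7
A = [[float(i + k) for k in range(K)] for i in range(M)]
B = [[float(k * n) for n in range(N)] for k in range(K)]
ref = [[sum(A[i][k] * B[k][n] for k in range(K)) for n in range(N)] for i in range(M)]

print(cdiv(128, 256))  # one tile along N covers all 128 columns
print(tiled_matmul(A, B, block_m=4, block_n=8, block_k=4) == ref)
```

Because every out-of-bounds lane is either zero-filled on load or skipped on store, a tile larger than the problem only wastes some lanes; it does not change the result, which is why the `N >= block_size_n` assertion was unnecessary.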

Add benchmark capabilities for ops.

595423d

github-actions bot added the in-progress (We are working on it) and iris (Iris project issue) labels Feb 3, 2026

neoblizz and others added 27 commits February 7, 2026 11:03

Merge branch 'main' into neoblizz/iris-xops-perf

8c965a1

Merge conflicts.

ef227b0

Up the tritonBLAS commit.

f132ceb

...

1628a61

Apply Ruff auto-fixes

c26e872

Fix load vectorization and transpose config

3d4c7d7

Apply Ruff auto-fixes

5b02211

Add HBM buffered version

4c3b3f4

Merge branch 'ryaswann/iris_xops_perf' of github.com:ROCm/iris into r…

a301392

…yaswann/iris_xops_perf

Apply Ruff auto-fixes

1f3b9ef

Use workgroup specialized variant

45288ff

Apply Ruff auto-fixes

b2aadcd

Update hbm buffered all gather matmul

7b2321e

Merge branch 'ryaswann/iris_xops_perf' of github.com:ROCm/iris into r…

a4d845f

…yaswann/iris_xops_perf

Apply Ruff auto-fixes

9692222

Add tracing

44ebc97

Merge branch 'ryaswann/iris_xops_perf' of github.com:ROCm/iris into r…

0c2842e

…yaswann/iris_xops_perf

Apply Ruff auto-fixes

11d017a

Add stages to all_gather_matmul_hbm_buffer

ace40d0

Merge branch 'ryaswann/iris_xops_perf' of github.com:ROCm/iris into r…

950c3a0

…yaswann/iris_xops_perf

Apply Ruff auto-fixes

f7612bd

Updates to benchmark and kernel

51bccb5

Merge branch 'ryaswann/iris_xops_perf' of github.com:ROCm/iris into r…

9b71523

…yaswann/iris_xops_perf

Apply Ruff auto-fixes

cbe2aff

Add predictive params, fix pointer overflows, fix race conditions

11d9001

Apply Ruff auto-fixes

3c4cb4d

Merge branch 'neoblizz/iris-xops-perf' into ryaswann/iris_xops_perf

f2f755a

Ryan Swann and others added 10 commits March 3, 2026 16:32

Reverse 2D block translate

77eff5b

Properly use iris tracing APIs

dcafd2a

Apply Ruff auto-fixes

6fdad6d

Remove test.sh

08755b7

All gather matmul with improved performance. (#415)

88f7767

Revert "Fix CI: increase default N to match FusedConfig block_size_n=…

76cc30d

…256" This reverts commit 477b472.

neoblizz wants to merge 38 commits into main from neoblizz/iris-xops-perf

2 participants