Add benchmark capabilities for ops. #346

Draft
neoblizz wants to merge 38 commits into main from neoblizz/iris-xops-perf

Conversation

neoblizz (Member) commented Feb 3, 2026

Motivation

Add benchmarking capabilities for iris.ops.

github-actions bot added the labels in-progress (We are working on it) and iris (Iris project issue) on Feb 3, 2026
Ryan Swann and others added 10 commits March 3, 2026 16:32
…e temp files

- Restore optional `hint` parameter in `__translate` and all public
  iris API functions (load, store, get, put, copy, atomic_*) to match
  main branch pattern. The previous hardcoded `tl.multiple_of(ptr, (32, 32))`
  assumed 2D pointers and broke all scalar-pointer atomic operations.
- Align tritonBLAS commit across pyproject.toml, run_tests.sh,
  apptainer/iris.def, and docker/Dockerfile to cd119279f.
- Remove tracked backup files (iris.py.backup, all_gather_matmul.py.with_chunked)
  and add gitignore patterns.
- Remove unimplemented "chunked" variant from test_all_gather_matmul parametrization.
- Fix test_matmul_all_reduce_via_shmem_ops dimensions (N=128->256) to match
  new default block_size_n=256.
- Remove phantom "matmul" from iris/ops/__init__.py __all__.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolve conflicts by adopting main's approach:
- Remove manual tritonBLAS from containers and run_tests.sh (handled
  via pyproject.toml dependency)
- Use torchrun for test execution (from main)
- Keep main's shorter docstring for hint parameter in iris.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Examples 28 (matmul_all_reduce) and 30 (matmul_all_gather) used N=128
as default, which is smaller than the new FusedConfig default
block_size_n=256. This triggers assertion failures (N >= block_size_n)
in CI, crashing all ranks and causing the 8-rank test to hang for 179
minutes waiting for the dead rank.

Increase both examples' default N from 128 to 256 to match the new
config defaults.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Triton kernels already handle block_size > dimension via:
- tl.cdiv(N, BLOCK_SIZE_N) for grid sizing
- mask=(rn < N) on loads/stores
- tritonblas GemmContext.reduce_axis handles K masking

The assertions were preventing valid configurations (e.g., block_size_n=256
with N=128) that the kernels handle correctly. Removed for_problem()
clamping too — it's unnecessary when the kernels already mask.

Fixes CI failures on examples 28 and 30 which use N=128 with default
FusedConfig block_size_n=256.
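
The grid-sizing and masking argument in the commit message above can be checked with a small pure-Python emulation. This is only a sketch of the arithmetic, not the Triton kernel: `cdiv` mirrors what `tl.cdiv` computes, and `masked_block_copy` is a hypothetical helper that stands in for a load/store pair guarded by `mask=(rn < N)`.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division: the same arithmetic tl.cdiv performs for grid sizing."""
    return (a + b - 1) // b


def masked_block_copy(src, n, block_size_n):
    """Hypothetical stand-in for a tiled kernel: copy n elements tile by tile.

    Each tile covers block_size_n column indices; indices at or beyond n are
    skipped, which emulates mask=(rn < N) on loads and stores.
    """
    dst = [0.0] * n
    num_tiles = cdiv(n, block_size_n)  # number of programs along N
    for pid in range(num_tiles):
        for lane in range(block_size_n):
            rn = pid * block_size_n + lane
            if rn < n:  # masked-off lanes never touch memory
                dst[rn] = src[rn]
    return dst, num_tiles


# The configuration from examples 28 and 30: N=128 with block_size_n=256.
src = [float(i) for i in range(128)]
dst, num_tiles = masked_block_copy(src, n=128, block_size_n=256)
```

With N=128 and a block size of 256, a single tile is launched and its upper 128 lanes are masked off, so the copy stays in bounds without any `N >= block_size_n` assertion.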
Labels

in-progress (We are working on it), iris (Iris project issue)

2 participants