Don't unconditionally set the sub group size. #436

Merged: maleadt merged 1 commit into main from tb/sub_group_size on May 28, 2026

@maleadt (Member) commented May 28, 2026

`intel_reqd_sub_group_size` is currently unconditionally set to the device's
reported subgroup size (or a heuristic default). However, the spec mentions:

> Note that there is no guarantee for the value of get_sub_group_size()
> even when this attribute is present, particularly when the work-group size
> is not evenly divisible by the required sub-group size.

Specifically, PoCL reports a subgroup count of 0 when using a work-group size
that's smaller than the chosen subgroup size:

    julia> @opencl kernel();
    get_num_sub_groups() = 1

    julia> @opencl sub_group_size=32 kernel();
    get_num_sub_groups() = 0

The output above was produced with the fix from this PR already applied; the
fix only sets the attribute when a subgroup size is explicitly requested.
Normally, PoCL determines an appropriate subgroup size per launch, so revert
to that behavior by not setting the attribute by default.
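
As a rough illustration of "only set the attribute on explicit request", here is a minimal sketch. The helper `kernel_attributes` and its string output are hypothetical and are not the actual code changed in `src/compiler/compilation.jl`; only the attribute name `intel_reqd_sub_group_size` and the `sub_group_size` keyword come from the description above.

```julia
# Hypothetical helper (not the real OpenCL.jl implementation): collect kernel
# attributes, adding the subgroup-size requirement only when it was requested.
function kernel_attributes(; sub_group_size::Union{Nothing,Integer}=nothing)
    attrs = String[]
    if sub_group_size !== nothing
        # Explicit request: pin the subgroup size via the Intel attribute.
        push!(attrs, "intel_reqd_sub_group_size($(sub_group_size))")
    end
    # No request: emit nothing and leave the choice to the runtime (e.g. PoCL).
    return attrs
end

default_attrs = kernel_attributes()                    # no attribute set
pinned_attrs  = kernel_attributes(sub_group_size=32)   # attribute set once
```

Using `nothing` as the default keeps "not requested" distinct from any concrete size, so the default path never constrains the runtime.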

This bug broke the RNG, which queries the sub group count. FWIW, this only
surfaced on JuliaGPU/GPUCompiler.jl#812, because previously PoCL simply
removed the exception trap, so the subsequent memory access went ahead as if
there were no out-of-bounds (OOB) access.
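
To make the zero subgroup count concrete, the sketch below shows one arithmetic that yields 0 when the work-group is smaller than the requested subgroup size. It assumes the count is derived by truncating division, which is only a plausible explanation and not confirmed PoCL behavior; the sizes are hypothetical.

```julia
# Hypothetical sizes, chosen only to illustrate the arithmetic.
work_group_size = 16
requested_sub_group_size = 32

# Truncating division drops the partially filled subgroup.
truncated = div(work_group_size, requested_sub_group_size)
# Ceiling division counts the partially filled subgroup.
rounded_up = cld(work_group_size, requested_sub_group_size)

println("truncated = ", truncated, ", rounded up = ", rounded_up)
```

Code that sizes or indexes per-subgroup state from such a count would need to handle the zero case, which is consistent with the RNG breakage described above.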

x-ref #413 (cc @christiangnrd)
x-ref https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_required_subgroup_size.html
x-ref pocl/pocl#2181

codecov Bot commented May 28, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 79.03%. Comparing base (67580be) to head (7f1fed8).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/compiler/compilation.jl | 80.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #436      +/-   ##
==========================================
- Coverage   79.19%   79.03%   -0.17%     
==========================================
  Files          12       12              
  Lines         745      744       -1     
==========================================
- Hits          590      588       -2     
- Misses        155      156       +1     

@maleadt maleadt merged commit 73c4014 into main May 28, 2026
34 of 38 checks passed
@maleadt maleadt deleted the tb/sub_group_size branch May 28, 2026 07:52