Don't unconditionally set the sub group size.#436
Merged
Conversation
is currently unconditionally set to the device's reported subgroup size (or a heuristic default). However, the spec mentions: > Note that there is no guarantee for the value of get_sub_group_size() > even when this attribute is present, particularly when the work-group size > is not evenly divisible by the required sub-group size. Specifically, PoCL reports a subgroup count of 0 when using a work-group size that's smaller than the chosen subgroup size: The above is with the fix from this PR already, which only sets the attribute when explicitly requesting a subgroup size. Normally, PoCL determines an appropriate subgroup size per launch, so revert to that by not setting the attribute by default. This bug broke the RNG, which queries the sub group count. FWIW, this only surfaced on JuliaGPU/GPUCompiler.jl#812, because previously the exception trap was simply removed by PoCL resulting in the subsequent memory access simply happening as if there was no OOB.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #436 +/- ##
==========================================
- Coverage 79.19% 79.03% -0.17%
==========================================
Files 12 12
Lines 745 744 -1
==========================================
- Hits 590 588 -2
- Misses 155 156 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
intel_reqd_sub_group_sizeis currently unconditionally set to the device'sreported subgroup size (or a heuristic default). However, the spec mentions:
Specifically, PoCL reports a subgroup count of 0 when using a work-group size
that's smaller than the chosen subgroup size:
The above is with the fix from this PR already, which only sets the
attribute when explicitly requesting a subgroup size. Normally, PoCL
determines an appropriate subgroup size per launch, so revert to that
by not setting the attribute by default.
This bug broke the RNG, which queries the sub group count. FWIW, this only
surfaced on JuliaGPU/GPUCompiler.jl#812, because
previously the exception trap was simply removed by PoCL resulting in
the subsequent memory access simply happening as if there was no OOB.
x-ref #413 (cc @christiangnrd)
x-ref https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_required_subgroup_size.html
x-ref pocl/pocl#2181