@xinjin01 (Contributor) commented Aug 7, 2025

This PR fixes issues in the reduce sample where the chosen work-group size (WGS)
could exceed the device’s per-dimension maximum (CL_DEVICE_MAX_WORK_ITEM_SIZES[0]),
causing clEnqueueNDRangeKernel to fail with CL_INVALID_WORK_GROUP_SIZE.

The reduce sample has two implementations (C and C++), and both required fixes:

  • C version: The first patch ensures the kernel is launched with a valid WGS by
    clamping against the device’s maximum.
  • C++ version: The second patch applies the same clamping logic to the C++ reduce.cpp
    application, ensuring consistent behavior.
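For readers following along, the clamping described in the two bullets can be sketched roughly as below. This is a minimal illustration, not the code from the patches: `clamp_wgs_x` is a hypothetical helper name, and `max_item_sizes` stands in for the three-entry array that a `clGetDeviceInfo` query for `CL_DEVICE_MAX_WORK_ITEM_SIZES` would fill in (the sketch assumes the device reports at least three dimensions).

```c
#include <stddef.h>

/* Hypothetical helper (not taken from the patches): clamp a requested
 * 1-D work-group size to the device's x-dimension limit.
 *
 * max_item_sizes is assumed to hold the values returned by
 * clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_ITEM_SIZES, ...),
 * so index 0 is the x-dimension maximum. */
size_t clamp_wgs_x(size_t requested, const size_t max_item_sizes[3])
{
    size_t limit = max_item_sizes[0];

    /* Keep the requested size when it already fits, otherwise
     * fall back to the largest size the device accepts in x. */
    return requested > limit ? limit : requested;
}
```

With limits of `{64, 64, 64}`, a requested WGS of 256 would come back as 64; with limits of `{1024, 1024, 64}` it would be left at 256.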

Together, these patches make both variants of the reduce sample more robust and
portable across a wider range of OpenCL devices.

Signed-off-by: Xin Jin <xin.jin@arm.com>

@bashbaug (Contributor) left a comment

Interesting. Are you able to disclose which device needs this change?

Regardless, could you please make the same change to the C++ version of this sample also, in main.cpp? I think it would have the same problem. Thanks!

The work-group size (WGS) may exceed the maximum allowed per-dimension
limit reported by CL_DEVICE_MAX_WORK_ITEM_SIZES[0], leading to
clEnqueueNDRangeKernel failing with CL_INVALID_WORK_GROUP_SIZE.

This patch queries the device's max work-item sizes and clamps WGS to
the valid maximum for the x-dimension. This ensures correct behavior on
devices with smaller limits and improves portability.

Signed-off-by: Xin Jin <xin.jin@arm.com>
The work-group size (WGS) may exceed the maximum allowed per-dimension
limit reported by CL_DEVICE_MAX_WORK_ITEM_SIZES[0], leading to
clEnqueueNDRangeKernel failing with CL_INVALID_WORK_GROUP_SIZE.

The reduce sample launches kernels with a 1-D NDRange over a flat array
of integers, so only the x-dimension limit is relevant. This patch
queries the device's max work-item sizes and clamps WGS to the valid
maximum for the x-dimension. This ensures correct behavior on devices
with smaller limits and improves portability.
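As a rough illustration of why only the x-dimension limit matters for a 1-D launch, the sketch below derives the local and global sizes from a flat element count. It is an assumption-laden example, not the sample's actual sizing logic: `launch_1d` and `make_launch_1d` are hypothetical names, one work-item per element is assumed, and the real reduce kernel may process several elements per work-item.

```c
#include <stddef.h>

/* Hypothetical launch description for a 1-D NDRange. */
typedef struct {
    size_t local;  /* work-group size after clamping */
    size_t global; /* total work-items, a multiple of local */
} launch_1d;

/* Sketch only: max_item_sizes is assumed to come from
 * CL_DEVICE_MAX_WORK_ITEM_SIZES. For a 1-D launch only index 0
 * is consulted; the y and z entries are never read. */
launch_1d make_launch_1d(size_t n_items, size_t requested_wgs,
                         const size_t max_item_sizes[3])
{
    launch_1d l;

    l.local = requested_wgs > max_item_sizes[0] ? max_item_sizes[0]
                                                : requested_wgs;
    if (l.local == 0) {
        l.local = 1; /* guard against a zero request or limit */
    }

    /* Round the global size up to a whole number of work-groups. */
    l.global = (n_items + l.local - 1) / l.local * l.local;
    return l;
}
```

For 1000 items and a requested WGS of 256 on a device whose limits are `{64, 1024, 1024}`, this gives a local size of 64 and a global size of 1024. The larger y and z limits do not change the result.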

Signed-off-by: Xin Jin <xin.jin@arm.com>

@xinjin01 (Contributor Author) commented

> Interesting. Are you able to disclose which device needs this change?
>
> Regardless, could you please make the same change to the C++ version of this sample also, in main.cpp? I think it would have the same problem. Thanks!

Thanks for the review, and sorry for the delay.

This issue was observed on a device where the requested WGS exceeded the per-dimension limit (CL_DEVICE_MAX_WORK_ITEM_SIZES[0]). To improve portability across all devices, the fix clamps the WGS to the valid maximum, independent of the specific device.

I’ve also applied the same change to the C++ version (reduce.cpp), as suggested. This is included in the second commit of this PR.
