
Fix oversized dispatch #4

Merged
sebcrozet merged 1 commit into dimforge:main from cutlerjake:fix/dispatch-workgroup-limit on Jun 24, 2026

@cutlerjake

Contributor

Problem

Several per-particle compute kernels dispatch ceil(n_particles / workgroup_size)
workgroups along a single (X) dimension. Once the particle count exceeds
4,194,240 (65535 * 64, about 4.19M), the workgroup count exceeds 65535, the
default WebGPU limit on workgroups per dimension (this is also the D3D12 limit
and the minimum that Vulkan guarantees). The dispatch then fails validation and
the simulation aborts:

In a dispatch command, indirect:false
  Each current dispatch group size dimension ([103849, 1, 1]) must be less or equal to 65535
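
As a quick check of the arithmetic, the sketch below recomputes the workgroup count on the CPU. The function names are hypothetical and not part of slosh; the workgroup size of 64 is taken from the "65535 * 64" figure above, and MAX_NUM_WORKGROUPS mirrors the constant named in the Fix section.

```rust
/// Per-dimension workgroup limit quoted in the validation error.
const MAX_NUM_WORKGROUPS: u32 = 65_535;

/// Workgroups needed to give every particle its own thread:
/// ceil(n_particles / workgroup_size). Hypothetical helper.
fn workgroups_needed(n_particles: u32, workgroup_size: u32) -> u32 {
    n_particles.div_ceil(workgroup_size)
}

/// Largest particle count that still fits in one uncapped 1D dispatch.
/// Hypothetical helper.
fn max_particles_uncapped(workgroup_size: u32) -> u32 {
    MAX_NUM_WORKGROUPS * workgroup_size
}
```

With the figures from this PR, workgroups_needed(6_646_275, 64) evaluates to 103849, the X dimension reported in the error, and max_particles_uncapped(64) is 4,194,240.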

Fix

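Clamp the dispatch to at most MAX_NUM_WORKGROUPS (65535) workgroups using the
existing GpuFunction::launch_capped, and grid-stride over the particles inside
each affected kernel so every particle is still processed. Each thread starts at
its global invocation index and advances by the total number of launched threads: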

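// Total number of threads actually launched after capping the workgroup count.
let total = min((n + WG - 1) / WG, 65535u) * WG;
// Grid-stride loop: each thread handles id, id + total, id + 2 * total, ...
for (var id = invocation_id.x; id < n; id += total) {
    ...
}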
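
To see why the loop above still covers every particle, here is a minimal CPU-side sketch in Rust that replays the same index arithmetic. It is an illustration only, not code from this PR: both function names are hypothetical, and the threads are simulated sequentially rather than run on a GPU.

```rust
/// Simulates the grid-stride loop on the CPU and returns how many times each
/// of the `n` elements is visited. `wg` is the workgroup size and
/// `max_workgroups` is the dispatch cap. Hypothetical helper.
fn grid_stride_visits(n: u32, wg: u32, max_workgroups: u32) -> Vec<u32> {
    // Same arithmetic as the kernel: capped workgroup count times workgroup size.
    let total = n.div_ceil(wg).min(max_workgroups) * wg;
    let mut visits = vec![0u32; n as usize];
    // Each simulated thread starts at its own index and strides by `total`.
    for thread in 0..total {
        let mut id = thread;
        while id < n {
            visits[id as usize] += 1;
            id += total;
        }
    }
    visits
}

/// Upper bound on loop iterations for any single thread: ceil(n / total).
/// Hypothetical helper.
fn max_iterations_per_thread(n: u32, wg: u32, max_workgroups: u32) -> u32 {
    let total = n.div_ceil(wg).min(max_workgroups) * wg;
    if total == 0 { 0 } else { n.div_ceil(total) }
}
```

For example, grid_stride_visits(1000, 64, 4) uses a deliberately small cap of 4 workgroups (256 threads) and still visits each of the 1000 elements exactly once. With the real cap, max_iterations_per_thread(6_646_275, 64, 65_535) is 2, while any particle count below the limit gives 1.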

Affected kernels:

  • grid sort — touch_particle_blocks, update_block_particle_count, finalize_particles_sort
  • solver — particle_update, estimate_timestep_bound

Testing

  • cargo check -p slosh3d --features dim3,runtime passes.
  • Validated end-to-end in a downstream MPM simulation with 6,646,275 particles:
    previously panicked ~3s in with the error above; now runs to completion and
    produces correct output.
  • At sub-limit particle counts the dispatch is not clamped, so the stride is at
    least n and the grid-stride loop body runs at most once per thread; results
    are unchanged (verified output matches the pre-fix build within run-to-run
    convergence noise).

@sebcrozet sebcrozet merged commit 362072a into dimforge:main Jun 24, 2026
5 checks passed
@sebcrozet

Member

Thanks!
