CUDA Unbound (CUB) is a great library when performance is critical.
A CUB example where I:
- Use thrust::sequence to generate indices.
- Sort an array together with the generated indices, so the outputs are the sorted array and the matching sorted indices.
- Reorder additional arrays in a kernel using the new sorted indices.
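
The three steps above can be sketched roughly as follows. This is a minimal illustration and not necessarily the code in this example: the array names, the sample data, the `gather` kernel, and the choice of `cub::DeviceRadixSort::SortPairs` as the sort routine are all assumptions made for the sketch.

```cuda
#include <cstdio>
#include <vector>
#include <cub/cub.cuh>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>

// Hypothetical gather kernel: reorders an extra array using the sorted indices.
__global__ void gather(const float* in, const int* sorted_idx, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[sorted_idx[i]];
}

int main()
{
    // Illustrative data (assumed): keys to sort and one extra array to reorder.
    std::vector<int>   h_keys  = {5, 2, 7, 1, 6, 0, 4, 3};
    std::vector<float> h_extra = {50.f, 20.f, 70.f, 10.f, 60.f, 0.f, 40.f, 30.f};
    const int n = static_cast<int>(h_keys.size());

    thrust::device_vector<int>   keys_in(h_keys), keys_out(n);
    thrust::device_vector<int>   idx_in(n), idx_out(n);
    thrust::device_vector<float> extra_in(h_extra), extra_out(n);

    // Step 1: generate indices 0, 1, ..., n-1.
    thrust::sequence(idx_in.begin(), idx_in.end());

    const int* d_keys_in  = thrust::raw_pointer_cast(keys_in.data());
    int*       d_keys_out = thrust::raw_pointer_cast(keys_out.data());
    const int* d_idx_in   = thrust::raw_pointer_cast(idx_in.data());
    int*       d_idx_out  = thrust::raw_pointer_cast(idx_out.data());

    // Step 2: sort keys and carry the indices along as values.
    // The first call only queries the temporary storage size.
    void*  d_temp = nullptr;
    size_t temp_bytes = 0;
    cub::DeviceRadixSort::SortPairs(d_temp, temp_bytes,
                                    d_keys_in, d_keys_out,
                                    d_idx_in, d_idx_out, n);
    cudaMalloc(&d_temp, temp_bytes);
    cub::DeviceRadixSort::SortPairs(d_temp, temp_bytes,
                                    d_keys_in, d_keys_out,
                                    d_idx_in, d_idx_out, n);

    // Step 3: reorder the extra array with the sorted indices.
    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;
    gather<<<blocks, threads>>>(thrust::raw_pointer_cast(extra_in.data()),
                                d_idx_out,
                                thrust::raw_pointer_cast(extra_out.data()), n);
    cudaDeviceSynchronize();
    cudaFree(d_temp);

    // Copy back and print the reordered extra array.
    thrust::copy(extra_out.begin(), extra_out.end(), h_extra.begin());
    for (int i = 0; i < n; ++i) printf("%g ", h_extra[i]);
    printf("\n");
    return 0;
}
```

Sorting the keys once and reusing the sorted indices means any number of additional arrays can be reordered with a cheap gather instead of a separate sort for each. In this sketch the extra values are chosen as ten times their key, so the printed array should come out in ascending order.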