Improve Primitives a Little Bit #2002

BBC-Esq · 2026-02-01T05:44:19Z

CUDA: Replace thrust::reduce with CUB DeviceReduce::Sum in primitives::sum

Summary

Inspired by the discussion in NVIDIA/cccl#520.

Replaces the CUDA implementation of primitives<Device::CUDA>::sum with cub::DeviceReduce::Sum using cudaMallocAsync for temp storage. Interface and behavior are unchanged.

Thrust and CUB use the same kernel backend, so the reduction itself does not change. What this removes is the hidden synchronization cost of cudaMalloc in Thrust's default allocator path.
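For readers who have not used the CUB call directly, here is a minimal sketch of the pattern described above. It is not the code from this PR: the helper name `device_sum_sketch` is hypothetical, error checking is omitted for brevity, and it assumes CUDA 11.2 or newer (where `cudaMallocAsync` is available) and a caller-supplied stream.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper illustrating the pattern; not the PR's implementation.
// Sums n elements of d_in (device memory) on the given stream.
template <typename T>
T device_sum_sketch(const T* d_in, int n, cudaStream_t stream) {
  // Device-side slot for the scalar result, allocated in stream order.
  T* d_out = nullptr;
  cudaMallocAsync(reinterpret_cast<void**>(&d_out), sizeof(T), stream);

  // First call: with a null temp-storage pointer, CUB only reports
  // how many bytes of scratch space it needs.
  void* d_temp = nullptr;
  std::size_t temp_bytes = 0;
  cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, n, stream);

  // Stream-ordered allocation of the scratch space.
  cudaMallocAsync(&d_temp, temp_bytes, stream);

  // Second call: performs the actual reduction.
  cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, n, stream);

  // Copy the scalar back, release both buffers in stream order,
  // then wait once for everything queued on the stream.
  T result{};
  cudaMemcpyAsync(&result, d_out, sizeof(T), cudaMemcpyDeviceToHost, stream);
  cudaFreeAsync(d_temp, stream);
  cudaFreeAsync(d_out, stream);
  cudaStreamSynchronize(stream);
  return result;
}
```

The only blocking call in this sketch is the final `cudaStreamSynchronize`, which is needed because the result is returned by value to the host.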

This PR updates only sum. Similar changes could be made for max, max_element, logsumexp, etc.

Happy to share benchmarks if helpful!

jordimas · 2026-02-01T08:38:08Z

Close it. We will implement it manually with somebody who understands the context.

BBC-Esq · 2026-02-01T12:11:34Z

??? Why was this closed? Was it incorrect, or are you now closing PRs whenever you think I don't understand, even when there's nothing wrong with them? I put a fair amount of time into this and would like to know for future pull requests. Thanks.

Improve Primitives a Little Bit

114d903

jordimas closed this Feb 1, 2026
