Implement pre-sorting, caching and contiguous warp processing in group_index_select #144

Open
avbokovoy wants to merge 8 commits into abokovoi/upstream from abokovoi/group-index-sort-and-cache-opt



@avbokovoy

Follow-up of #139

The differences are:

  1. Reduced #ifdef USE_ROCM usage in favor of if constexpr (OPT_BOOL).
  2. Added a compile-time, host-side codegen guard for the kernel (CUDA vs. ROCm).
  3. Fixed an issue with the trailing row cache flush.
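
For readers unfamiliar with the pattern in items 1 and 2, here is a minimal host-only sketch of how a single preprocessor check can feed a compile-time boolean that then selects a code path via if constexpr. It is not the code from this PR: `kUseRocmPath`, `gather_sum`, and `gather_sum_dispatch` are hypothetical names, and the "row cache" branch is only an illustrative stand-in for a path that assumes pre-sorted indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flag: the only place where the preprocessor is consulted.
#ifdef USE_ROCM
inline constexpr bool kUseRocmPath = true;
#else
inline constexpr bool kUseRocmPath = false;
#endif

// Hypothetical stand-in for a kernel body: sums the table entries selected
// by `indices`. Both branches are parsed in every build, but only the branch
// chosen by the template argument is instantiated.
template <bool UseRowCache>
long gather_sum(const std::vector<int>& table,
                const std::vector<int>& indices) {
  long total = 0;
  if constexpr (UseRowCache) {
    // Assumes `indices` is pre-sorted, so equal rows are adjacent and a
    // row only needs to be loaded when the index changes.
    int cached_row = -1;
    int cached_value = 0;
    for (int idx : indices) {
      assert(idx >= 0 && static_cast<std::size_t>(idx) < table.size());
      if (idx != cached_row) {
        cached_row = idx;
        cached_value = table[idx];
      }
      total += cached_value;
    }
  } else {
    // Plain path: load the row on every access.
    for (int idx : indices) {
      assert(idx >= 0 && static_cast<std::size_t>(idx) < table.size());
      total += table[idx];
    }
  }
  return total;
}

// Host-side dispatch: the platform choice is resolved at compile time,
// so callers never see an #ifdef.
inline long gather_sum_dispatch(const std::vector<int>& table,
                                const std::vector<int>& indices) {
  return gather_sum<kUseRocmPath>(table, indices);
}
```

One advantage of this shape is that both branches are still syntax-checked on every platform, whereas code inside an inactive #ifdef block is never seen by the compiler.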

@avbokovoy avbokovoy self-assigned this Mar 3, 2026

@aryaman-gupta aryaman-gupta left a comment

The PR introduces crucial optimizations for the group_index_select_or_add_2d_kernel. The majority of the code is clean and the separation of ROCm and CUDA codepaths has been done well.

Most of these changes were already reviewed in #139. I have left a few comments that I think should be looked at before merging. Some of these are design choices, and the PR could proceed with merging even if the code is not modified.

@avbokovoy avbokovoy force-pushed the abokovoi/group-index-sort-and-cache-opt branch from 3754ab4 to 1248c83 Compare March 11, 2026 12:21
@avbokovoy avbokovoy requested a review from aryaman-gupta March 11, 2026 12:35

@aryaman-gupta aryaman-gupta left a comment

The points raised in #144 (review) have been addressed, and the PR is ready for merging.


2 participants