Pull requests: ROCm/FBGEMM

Pull requests list

Fix test_cache_int32_overflow test failure on ROCm
#149 opened Mar 25, 2026 by avbokovoy
Fix UVM FP16 performance regression on MI300
#145 opened Mar 10, 2026 by avbokovoy
Implement cached member_id upper bound search (label: enhancement)
#141 opened Feb 2, 2026 by avbokovoy
sync pytorch/FBGEMM to rocm/FBGEMM
#140 opened Jan 18, 2026 by keerthana-bidar
1 task
Implement asynchronous LDS loads for MI350 (label: enhancement)
#138 opened Dec 19, 2025 by avbokovoy
Optimizations for index_select_scalar_cumsum_kernel
#137 opened Dec 16, 2025 by amd-wsung102
1 task
group_index_select_or_add_2d_kernel optimization
#131 opened Nov 11, 2025 by shbiswas834
Yandai/temp opt codegen
#129 opened Oct 28, 2025 by yadaish
1 task
fwd optimizations
#125 opened Sep 23, 2025 by shbiswas834
Remove fwd and warmup from benchmark profiling
#124 opened Sep 17, 2025 by huizzhan Draft
1 task
added malloc pitch on merged pool embedding
#123 opened Sep 11, 2025 by kudomcho
added optimized merged pool embedding script
#122 opened Sep 10, 2025 by kudomcho
apply unroll and prefetch optimization
#119 opened Aug 27, 2025 by zhiding512
apply Vec4T on vbe forward
#118 opened Aug 27, 2025 by JaxChen29
1 task
tuned grid size by reducing num_warps_per_threadblock to 4
#117 opened Aug 26, 2025 by kudomcho
1 task
apply Vec4T on vbe forward
#115 opened Aug 21, 2025 by JaxChen29
1 task