// Transpose tile.
smem_buffer_tr[col * BLOCK_SIZE + row_swizzle] =
    smem_buffer[row * BLOCK_SIZE + col_swizzle];
Hi, it seems that a bank conflict still exists when writing to smem_buffer_tr. For example, when row=0 and col=0~31, row_swizzle will be 0, 4, 8, 12, 16, 20, 24, 28, 0, 4, 8, 12, 16, 20, 24, 28, ..., which will cause a 4-way bank conflict. Maybe my understanding is wrong; any help would be appreciated.
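To make the arithmetic concrete, here is a small host-side sketch that counts how many threads of one warp land in the same bank. It rests on several assumptions that are not in the snippet: 32 banks of 4 bytes each, 4-byte elements, BLOCK_SIZE = 32, and a warp whose 32 threads share row = 0 and take col = 0..31. The helper names are hypothetical, and the `(col * 4) % 32` expression only reproduces the row_swizzle sequence quoted above; it is not claimed to be the actual swizzle formula. The XOR variant is included purely for contrast.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: 32 banks, 4 bytes wide, 4-byte elements, so bank = index % 32.
constexpr std::size_t NUM_BANKS = 32;
constexpr std::size_t BLOCK_SIZE = 32;  // assumption, not stated in the snippet

// Largest number of distinct words that one warp touches in a single bank.
// 1 means conflict-free; N means an N-way bank conflict.
std::size_t max_conflict_degree(std::vector<std::size_t> element_indices) {
    assert(element_indices.size() <= 32);  // one index per thread of a warp
    // Threads that touch the very same word do not conflict, so deduplicate.
    std::sort(element_indices.begin(), element_indices.end());
    element_indices.erase(
        std::unique(element_indices.begin(), element_indices.end()),
        element_indices.end());
    std::array<std::size_t, NUM_BANKS> hits{};
    for (std::size_t idx : element_indices) {
        ++hits[idx % NUM_BANKS];
    }
    return *std::max_element(hits.begin(), hits.end());
}

// Write indices for row = 0, col = 0..31 with the row_swizzle sequence
// quoted in the question (0, 4, ..., 28, repeating).
std::vector<std::size_t> write_indices_from_question() {
    std::vector<std::size_t> indices;
    for (std::size_t col = 0; col < 32; ++col) {
        std::size_t row_swizzle = (col * 4) % 32;  // reproduces the sequence only
        indices.push_back(col * BLOCK_SIZE + row_swizzle);
    }
    return indices;
}

// Hypothetical contrast: an XOR swizzle (row ^ col) for the same warp.
std::vector<std::size_t> write_indices_xor(std::size_t row) {
    std::vector<std::size_t> indices;
    for (std::size_t col = 0; col < 32; ++col) {
        std::size_t row_swizzle = (row ^ col) % BLOCK_SIZE;
        indices.push_back(col * BLOCK_SIZE + row_swizzle);
    }
    return indices;
}
```

Under these assumptions, `max_conflict_degree(write_indices_from_question())` gives 4, because `col * BLOCK_SIZE` is a multiple of 32 and the bank is decided by row_swizzle alone, which takes only 8 distinct values. The XOR variant gives 1. Whether the real kernel sees this conflict depends on how threads in a warp map to row and col, which the snippet does not show.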