-
Notifications
You must be signed in to change notification settings - Fork 267
WMMA grouped conv fwd large tensor bias bnorm clamp #3595
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: streamhpc/conv_fwd_wmma_bias_bnorm_clamp
Are you sure you want to change the base?
WMMA grouped conv fwd large tensor bias bnorm clamp #3595
Conversation
Following operations are added for FP16/BF16 data type and NHWGCxGKYXC layout. - grouped_conv2d_fwd_bias_bnorm_clamp - grouped_conv3d_fwd_bias_bnorm_clamp
7b0341d to
b5c541f
Compare
|
|
||
| gemm_desc_kernel_args_.At(valid_gemms_count_) = new_args; | ||
| auto* gemm_args = &gemm_desc_kernel_args_.At(valid_gemms_count_); | ||
| new(gemm_args) GemmArgs{p_as_grid, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice use of placement new here. The basic idea is good but this is difficult to maintain, as operator new with placement-args comes with a caveat: you have to manually call the dtor.
In our case (I think that) the implicitly defaulted dtor doesn't do anything useful. -I haven't checked that- But the implicitly deleted copy assignment operator that made you use this approach is concerning.
My point is basically that: if a future dev changes something that requires the GemmArg dtor to be called, they could very easily oversee this.
My proposal is to define an Emplace function in the Array struct to handle this case :
template<typename... Args>
auto Emplace(ck::index_t i, Args&&... args) -> std::enable_if_t<std::is_nothrow_constructible_v<TData, Args...>>
{
if (i >= ck::index_t{0} && i < NSize)
{
mData[i].~TData();
new(mData + i) TData(std::forward<Args>(args)...);
}
}
On another note... I see this way of initializing structs quite often and I can boldly say that I really don't like it. In C++17 we can already use designated initializers given that they are all included. This makes code more explicit, more robust and safer. So I'd change this section of the code to :
gemm_desc_kernel_args_.Emplace(valid_gemms_count_, GemmArgs{.a_ptrs_ = p_as_grid,
.b_ptrs_ = p_bs_grid,
.ds_ptrs_ = p_ds_grid,
.e_ptr_ = p_e_grid,
.a_element_op_ = a_element_op_,
.b_element_op_ = b_element_op_,
.cde_element_op_ = cde_element_op_,
.M_ = gemm_m,
.N_ = gemm_n,
.a_grid_desc_ = a_grid_desc,
.b_grid_desc_ = b_grid_desc,
.ds_grid_desc_mblock_mperblock_nblock_nperblock_ =
ds_desc_mblock_mperblock_nblock_nperblock,
.e_grid_desc_mblock_mperblock_nblock_nperblock_ =
e_desc_mblock_mperblock_nblock_nperblock,
.BlockStart_ = BlockStart,
.BlockEnd_ = BlockEnd});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the useful suggestion and explanation, Chris! I like this approach to handle gemmargs array and explicit initialization. I added it to the PR with a slight modification of having a runtime bound check.
|
Looks good overall! I have some comments:
|
Proposed changes
Added bias bnorm clamp operation for WMMA conv fwd large tensor (FP16/BF16 data type and NHWGCxGKYXC layout).
Checklist
Please put an
xinto the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.clang-formaton all changed filesDiscussion
If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered