Add amdgpu intrinsics #1976
|
I think it can be "tested" the same way we do |
e429208 to
d6043df
Compare
Add intrinsics for the amdgpu architecture.
d6043df to
e102811
Compare
|
Thanks, I tried to replicate what’s there for nvptx + adding The diff to my first push from when opening this PR is here: https://github.com/rust-lang/stdarch/compare/b3f5bdae0efbdd5f7297d0225623bd31c7fe895b..e1028110e77561574bfb7ea349154d46b5ea7b86 |
sayantn
left a comment
Thanks, I have left some comments.
| if [ "$CI" != "" ]; then | ||
| rustup component add rust-src | ||
| fi | ||
| export CARGO_UNSTABLE_BUILD_STD=core,alloc |
I don't think we need to build alloc; core_arch is no_std and does not use alloc.
| pub unsafe fn update_dpp( | ||
| old: u32, | ||
| src: u32, | ||
| dpp_ctrl: u32, | ||
| row_mask: u32, | ||
| bank_mask: u32, | ||
| bound_control: bool, | ||
| ) -> u32 { |
llvm.amdgcn.update.dpp requires its last 4 arguments to be immediates, i.e. the wrapper should take them as const generics.
| pub unsafe fn update_dpp( | |
| old: u32, | |
| src: u32, | |
| dpp_ctrl: u32, | |
| row_mask: u32, | |
| bank_mask: u32, | |
| bound_control: bool, | |
| ) -> u32 { | |
| pub unsafe fn update_dpp< | |
| const DPP_CTRL: u32, | |
| const ROW_MASK: u32, | |
| const BANK_MASK: u32, | |
| const BOUND_CONTROL: bool | |
| >(old: u32, src: u32) -> u32 { |
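A minimal host-runnable sketch of the const-generic wrapper pattern suggested above. `update_dpp_raw` is a hypothetical stand-in for the raw LLVM intrinsic binding, and its body is a placeholder that does not model real DPP behavior. The wrapper is written as a safe function here only to keep the sketch short.

```rust
/// Hypothetical stand-in for the raw `llvm.amdgcn.update.dpp` binding.
/// The body is a placeholder so the sketch runs on a host machine;
/// it does NOT model the hardware's DPP semantics.
fn update_dpp_raw(
    old: u32,
    src: u32,
    _dpp_ctrl: u32,
    row_mask: u32,
    bank_mask: u32,
    _bound_control: bool,
) -> u32 {
    if row_mask != 0 && bank_mask != 0 { src } else { old }
}

/// Wrapper that takes the four immediate operands as const generics,
/// so every call site must supply compile-time constants.
pub fn update_dpp<
    const DPP_CTRL: u32,
    const ROW_MASK: u32,
    const BANK_MASK: u32,
    const BOUND_CONTROL: bool,
>(old: u32, src: u32) -> u32 {
    update_dpp_raw(old, src, DPP_CTRL, ROW_MASK, BANK_MASK, BOUND_CONTROL)
}

fn main() {
    // The immediates are passed in the turbofish, not as runtime values.
    let r = update_dpp::<0, 0xF, 0xF, false>(1, 2);
    println!("{r}");
}
```

Because the operands are part of the function's type parameters, passing a runtime value for any of them is a compile error, which matches the immediate-argument requirement of the underlying intrinsic.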
I have commented on several intrinsics that should use const generics, since the corresponding LLVM intrinsics mark those arguments as ImmArg. There are a few more; just ensure that all immediate arguments use const generics. I checked by temporarily marking all the intrinsics #[inline(never)] and running ci/run.sh (this ensures that code is actually generated for the functions).
- update.dpp
- wave.reduce.{or,and,max,min,xor,umax,umin}
- permlane{x}16{.var}
- permlane{16,32}.swap
- sched.barrier
- sched.group.barrier
- s.barrier.wait
- s.barrier.signal
- s.barrier.signal.isfirst
- s.sleep
- s.sethalt
Also, if an immediate argument has some kind of restriction (e.g. it needs to be in the range 0..10), you can check that using the static_assert macros in
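As an illustration of a compile-time range check on an immediate, here is a sketch that uses a plain inline `const` block (stable since Rust 1.79) rather than the crate's own static_assert macros. The function name `sleep_cycles` and the range 0..10 are hypothetical and chosen only for the example.

```rust
/// Hypothetical wrapper whose immediate `N` must lie in 0..10.
/// The inline const block is evaluated when the function is
/// instantiated, so an out-of-range `N` fails the build instead of
/// panicking at runtime.
pub fn sleep_cycles<const N: u32>() -> u32 {
    const { assert!(N < 10, "N must be in the range 0..10") };
    // Placeholder body: a real wrapper would pass `N` to the intrinsic.
    N
}

fn main() {
    println!("{}", sleep_cycles::<3>());
    // Uncommenting the next line makes compilation fail:
    // sleep_cycles::<10>();
}
```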
|
Also I have noticed that Clang provides a different set of intrinsics than these ( |
|
Thanks for the review! Interesting, I thought I do plan to write a common
mod amdgpu {
mod gpu {
use super::*;
pub fn block_id_x() -> u32 {
workitem_id_x()
}
}
}
// Same for nvptx
mod gpu {
#[cfg(target_arch = "amdgpu")]
pub use amdgpu::gpu::*;
#[cfg(target_arch = "nvptx64")]
pub use nvptx::gpu::*;
// + more intrinsics as in gpuintrin.h
} |
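A host-compilable variant of the facade idea above, as a rough sketch. The `backend` module, the `thread_id_x` name, and the constant return values are all hypothetical placeholders; real backends would call the target's intrinsics. A fallback backend is added so the example also builds on a non-GPU host.

```rust
// Each backend is compiled only for its own target.
// Bodies are placeholders, not real intrinsic calls.
#[cfg(target_arch = "amdgpu")]
mod backend {
    pub fn thread_id_x() -> u32 {
        0 // would call the amdgpu work-item id intrinsic
    }
}

#[cfg(target_arch = "nvptx64")]
mod backend {
    pub fn thread_id_x() -> u32 {
        0 // would call the nvptx thread id intrinsic
    }
}

// Host fallback (an assumption of this sketch) so it runs anywhere.
#[cfg(not(any(target_arch = "amdgpu", target_arch = "nvptx64")))]
mod backend {
    pub fn thread_id_x() -> u32 {
        0
    }
}

/// Target-independent facade: callers use `gpu::thread_id_x()` and
/// never name a specific architecture.
pub mod gpu {
    pub use super::backend::*;
}

fn main() {
    println!("{}", gpu::thread_id_x());
}
```

Selecting the backend with `cfg` keeps the public `gpu` module identical across targets, so shared code compiles unchanged for either architecture.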
|
The analogue to If there are some interesting platform-specific intrinsics, we can add them, but generally we follow GCC and Clang in regards to intrinsics. Yea, a common |
Add intrinsics for the amdgpu architecture.
I’m not sure how to add/run CI (ci/run.sh fails for me, e.g. for nvptx, because core cannot be found), but I checked that it compiles without warnings with
Tracking issue: rust-lang/rust#149988