Add support for newer instructions should these become available. These are currently:
f32x2 instructions:
add{.rnd}{.ftz}.f32x2
sub{.rnd}{.ftz}.f32x2
etc.
Mixed-precision instructions, which require sm >= 100:
https://docs.nvidia.com/cuda/parallel-thread-execution/#mixed-precision-floating-point-instructions-add
bf16 variants of the half-precision instructions, which require sm >= 90:
https://docs.nvidia.com/cuda/parallel-thread-execution/#half-precision-floating-point-instructions-add
Instructions with the .oob qualifier, for example the half-precision fma:
https://docs.nvidia.com/cuda/parallel-thread-execution/#half-precision-floating-point-instructions-fma