Summary
XPU SYCL kernels in src/ATen/native/xpu/sycl/ currently call math functions through std::, the global namespace (::), or bare C names (e.g. std::exp, ::expf, sqrtf) instead of the SYCL-native sycl:: or sycl::native:: equivalents. These calls should be unified on sycl:: functions for correctness, portability, and potential performance benefits on Intel GPU hardware.
Mapping convention (mirroring CUDA → SYCL)
| Current (std/bare) | SYCL replacement | Notes |
|---|---|---|
| std::exp, ::expf, expf | sycl::exp | |
| std::log, ::logf, logf | sycl::log | |
| std::sqrt, sqrtf | sycl::sqrt | |
| std::abs, std::fabs, ::abs | sycl::fabs (float/double), sycl::abs (int) | complex → keep std::abs |
| std::pow | sycl::pow | |
| std::ceil, ceilf | sycl::ceil | |
| std::floor, floorf | sycl::floor | |
| std::tanh, ::tanhf | sycl::tanh | complex → keep std:: |
| std::fmod | sycl::fmod | |
| std::lgamma | sycl::lgamma | |
| std::log10 | sycl::log10 | complex → keep std:: |
| std::log1p, ::log1p | sycl::log1p | complex → keep std:: |
| std::log2 | sycl::log2 | complex → keep std:: |
| std::isinf | sycl::isinf | device code only |
| std::isnan | sycl::isnan | device code only |
| std::isfinite | sycl::isfinite | device code only |
| std::copysign | sycl::copysign | |
| std::erf | sycl::erf | |
| std::erfc | sycl::erfc | |
| std::exp2 | sycl::exp2 | |
| std::expm1 | sycl::expm1 | complex → keep std:: |
| std::frexp | sycl::frexp | |
| std::sin | sycl::sin | complex → keep std:: |
| std::cos | sycl::cos | complex → keep std:: |
| std::tan | sycl::tan | complex → keep std:: |
| std::asin | sycl::asin | complex → keep std:: |
| std::acos | sycl::acos | complex → keep std:: |
| std::atan | sycl::atan | complex → keep std:: |
| std::sinh | sycl::sinh | complex → keep std:: |
| std::cosh | sycl::cosh | complex → keep std:: |
| std::asinh | sycl::asinh | |
| std::acosh | sycl::acosh | |
| std::atanh | sycl::atanh | |
| std::trunc, std::truncf | sycl::trunc | complex component calls → keep std:: |
| std::nearbyintf | sycl::rint (closest SYCL equivalent) | needs accuracy verification |
| std::rsqrt | sycl::rsqrt | |
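The abs/fabs row is the one mapping that splits three ways by type, so a small sketch may help. The following is only an illustration under stated assumptions: `device_math` is a hypothetical host-side stand-in for `sycl::` (so the snippet builds with any C++17 compiler), and `is_complex` and `abs_compat` are names invented here, not helpers from the code base.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <type_traits>

// Hypothetical stand-ins for sycl::fabs / sycl::abs so the sketch builds
// with a plain C++17 host compiler. In a kernel these would be the real
// sycl:: functions.
namespace device_math {
inline float fabs(float x) { return std::fabs(x); }
inline double fabs(double x) { return std::fabs(x); }
template <typename T>
inline T abs(T x) { return x < 0 ? -x : x; }
}  // namespace device_math

// Local trait for illustration only; not the trait used by the real code.
template <typename T> struct is_complex : std::false_type {};
template <typename T> struct is_complex<std::complex<T>> : std::true_type {};

// Picks the call by type, following the abs/fabs row of the table.
template <typename scalar_t>
inline auto abs_compat(scalar_t x) {
  if constexpr (is_complex<scalar_t>::value) {
    return std::abs(x);           // complex: keep std::abs
  } else if constexpr (std::is_floating_point_v<scalar_t>) {
    return device_math::fabs(x);  // float/double: would be sycl::fabs
  } else {
    return device_math::abs(x);   // integers: would be sycl::abs
  }
}

// Host-side usage check.
inline void abs_compat_demo() {
  assert(abs_compat(-2.5f) == 2.5f);
  assert(abs_compat(-3) == 3);
  assert(std::fabs(abs_compat(std::complex<float>(3.0f, 4.0f)) - 5.0f) < 1e-6f);
}
```

Because the branches are selected with `if constexpr`, the branch that would call the real-only function is never instantiated for complex types, which is the property a real migration would rely on.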
Important notes
Complex types: sycl:: math functions do NOT support c10::complex/std::complex types. When an operator is dispatched for both real and complex types, the std:: call must be preserved for complex and only changed to sycl:: for real types.
Host code: std:: calls in host code (outside kernels/functors) should remain as-is.
STD_FUNCTOR macro: ForeachUnaryKernels.cpp defines a STD_FUNCTOR macro (line 181) that generates std::OP_NAME(t) inside device functors. Many functions are affected by this single macro. Migrating these requires either specializing the macro or replacing it with explicit functor definitions that dispatch between sycl:: and std:: based on type.
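One possible shape for a type-dispatching replacement of the macro is sketched below. It is not the macro from ForeachUnaryKernels.cpp: `SYCL_OR_STD_FUNCTOR`, `is_complex`, and the `device_math` namespace (a host-side stand-in for `sycl::`) are all hypothetical names chosen so the snippet compiles with a plain C++17 compiler. Reduced-precision types such as half are not handled here.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <type_traits>

// Hypothetical stand-in for sycl::. It deliberately offers real overloads
// only, mirroring the fact that sycl:: math has no complex support.
namespace device_math {
inline float exp(float x) { return std::exp(x); }
inline double exp(double x) { return std::exp(x); }
inline float sin(float x) { return std::sin(x); }
inline double sin(double x) { return std::sin(x); }
}  // namespace device_math

// Local trait for illustration only.
template <typename T> struct is_complex : std::false_type {};
template <typename T> struct is_complex<std::complex<T>> : std::true_type {};

// Generates a functor that keeps std::OP_NAME for complex types and uses
// the device math namespace (sycl:: in real code) for real types.
#define SYCL_OR_STD_FUNCTOR(OP_NAME, FUNCTOR_NAME)   \
  template <typename scalar_t>                       \
  struct FUNCTOR_NAME {                              \
    scalar_t operator()(scalar_t t) const {          \
      if constexpr (is_complex<scalar_t>::value) {   \
        return std::OP_NAME(t);                      \
      } else {                                       \
        return device_math::OP_NAME(t);              \
      }                                              \
    }                                                \
  };

SYCL_OR_STD_FUNCTOR(exp, ExpFunctor)
SYCL_OR_STD_FUNCTOR(sin, SinFunctor)

// Host-side usage check.
inline void functor_demo() {
  assert(ExpFunctor<float>{}(0.0f) == 1.0f);
  assert(SinFunctor<double>{}(0.0) == 0.0);
  std::complex<float> one(1.0f, 0.0f);
  assert(std::abs(ExpFunctor<std::complex<float>>{}({0.0f, 0.0f}) - one) < 1e-6f);
}
```

Keeping a single macro means each op still needs only one line at the use site; explicit per-op functors would trade that brevity for easier per-function special cases.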
SYCL math function reference
https://github.khronos.org/SYCL_Reference/iface/math-functions.html
Task List
Each task = one function → one PR with code change + accuracy test + performance test.
PR must be merged before checking the box.
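As a rough idea of what a per-function accuracy test could look like, the sketch below counts inputs on which a candidate function disagrees with a reference. It runs on the host only: `std::rint` stands in for `sycl::rint`, which would need to be evaluated inside a kernel, and `count_mismatches` and `rounding_edge_cases` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Counts inputs where candidate(x) differs from reference(x).
// Two NaN results count as agreement; +0 and -0 count as different.
template <typename Ref, typename Cand>
std::size_t count_mismatches(const std::vector<float>& inputs,
                             Ref reference, Cand candidate) {
  std::size_t mismatches = 0;
  for (float x : inputs) {
    const float r = reference(x);
    const float c = candidate(x);
    if (std::isnan(r) && std::isnan(c)) continue;
    if (r != c || std::signbit(r) != std::signbit(c)) ++mismatches;
  }
  return mismatches;
}

// Halfway cases and large magnitudes, where rounding functions tend to differ.
inline std::vector<float> rounding_edge_cases() {
  return {-2.5f, -1.5f, -0.5f, 0.0f, 0.5f, 1.5f, 2.5f, 8388607.5f, 1e30f};
}

// Host-side usage: nearbyintf vs. rint under the default rounding mode.
inline void rounding_demo() {
  const std::size_t bad = count_mismatches(
      rounding_edge_cases(),
      [](float x) { return std::nearbyintf(x); },
      [](float x) { return std::rint(x); });
  assert(bad == 0);
}
```

A real test would feed the same inputs through the device kernel and compare against the host reference, possibly with a tolerance for functions that are not expected to be bit-exact.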
Infrastructure
0. XPUMathCompat.h internal updates
   - ::expf → sycl::exp, ::tanhf → sycl::tanh, etc.
   - [...] CUDAMathCompat.h: abs, ceil, copysign, floor, log, log1p, max, min, pow, sincos, sqrt, tan, normcdf
0b. NumericUtils.h: add XPU fast math branches (PyTorch main repo)
   - aten/src/ATen/NumericUtils.h defines at::exp, at::log, at::log1p, at::tan template functions
   - [...] __expf, __logf, __log1pf, __tanf)
   - [...] ::exp, ::log, etc. (C libm)
   - [...] #elif defined(__SYCL_DEVICE_ONLY__) branches using sycl::native::exp, sycl::native::log, etc. to match CUDA behavior
   - [...] at::exp etc. (0 occurrences in torch-xpu-ops/src/), but aligning for correctness and future use
Math functions (alphabetical)
1. abs/fabs: std::abs, std::fabs, ::abs → sycl::fabs (float/double), sycl::abs (int)
   - Keep std::abs for complex types in: AbsKernel.cpp, ForeachUnaryKernels.cpp, UnarySignKernels.cpp, SharedReduceOps.h
2. acos: std::acos → sycl::acos
   - Keep std:: for complex
3. acosh: std::acosh → sycl::acosh
4. asin: std::asin → sycl::asin
   - Keep std:: for complex
5. asinh: std::asinh → sycl::asinh
6. atan: std::atan → sycl::atan
   - Keep std:: for complex
7. atanh: std::atanh → sycl::atanh
8. ceil: std::ceil, bare ceilf → sycl::ceil
   - [...] ceilf as-is
9. copysign: std::copysign → sycl::copysign
10. cos: std::cos → sycl::cos
   - Keep std:: for complex
11. cosh: std::cosh → sycl::cosh
   - Keep std:: for complex
12. erf: std::erf → sycl::erf
13. erfc: std::erfc → sycl::erfc
14. exp: std::exp, ::expf, bare expf → sycl::exp
   - [...] expf, 4 occurrences)
15. exp2: std::exp2 → sycl::exp2
   - [...] sycl::exp(ln_2 * x) instead (no sycl::exp2 for complex)
16. expm1: std::expm1 → sycl::expm1
   - Keep std:: for complex
17. floor: std::floor, bare floorf → sycl::floor
   - [...] floorf)
   - [...] floorf)
18. fmod: std::fmod → sycl::fmod
19. frexp: std::frexp → sycl::frexp
20. isinf/isnan/isfinite: std::isinf/isnan/isfinite → sycl::isinf/isnan/isfinite
   - Keep std:: in host code: SummaryOpsKernels.cpp, DistanceKernels.cpp
21. lgamma: std::lgamma → sycl::lgamma
22. log: std::log, bare logf → sycl::log
   - [...] logf, 2 occurrences)
23. log1p: std::log1p, ::log1p → sycl::log1p
   - [...] std::)
24. log2: std::log2 → sycl::log2
   - Keep std:: for complex
25. log10: std::log10 → sycl::log10
   - [...] std::)
   - [...] std::
26. nearbyintf: std::nearbyintf → sycl::rint (closest SYCL equivalent)
   - nearbyint uses current rounding mode, rint may raise inexact. Needs careful accuracy testing.
   - [...] std::nearbyintf
27. pow: std::pow → sycl::pow
   - [...] #define compat_pow std::pow → sycl::pow
28. sin: std::sin → sycl::sin
   - Keep std:: for complex
29. sinh: std::sinh → sycl::sinh
   - Keep std:: for complex
30. sqrt: std::sqrt, bare sqrtf → sycl::sqrt
   - [...] sqrtf)
   - [...] sqrtf)
   - [...] #define device_sqrt std::sqrt → sycl::sqrt
31. tan: std::tan → sycl::tan
   - Keep std:: for complex
32. tanh: std::tanh → sycl::tanh
33. trunc/truncf: std::trunc, std::truncf → sycl::trunc
   - [...] std::truncf), :206-207 (complex components)
   - [...] std::truncf
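For the NumericUtils.h item (0b), the branch structure being proposed might look roughly like the sketch below. This is an assumption-laden illustration, not the contents of the real header: `fast_exp` is a hypothetical float-only helper, and the preprocessor conditions are simplified. On a plain host compiler only the final fallback branch is compiled; the SYCL header is included only when a SYCL compiler defines `SYCL_LANGUAGE_VERSION`.

```cpp
#include <cassert>
#include <math.h>

#if defined(SYCL_LANGUAGE_VERSION)
#include <sycl/sycl.hpp>  // only pulled in when a SYCL compiler is in use
#endif

// Hypothetical helper; the name and float-only signature are illustrative.
inline float fast_exp(float x) {
#if defined(__CUDA_ARCH__)
  return __expf(x);             // CUDA device pass: fast intrinsic
#elif defined(__SYCL_DEVICE_ONLY__)
  return sycl::native::exp(x);  // SYCL device pass: native (reduced-precision) path
#else
  return ::exp(x);              // host fallback through C libm
#endif
}

// Host-side usage check (exercises the fallback branch only).
inline void fast_exp_demo() {
  assert(fabsf(fast_exp(0.0f) - 1.0f) < 1e-6f);
  assert(fabsf(fast_exp(1.0f) - 2.7182818f) < 1e-5f);
}
```

Since `sycl::native::` functions have implementation-defined precision, a device-side accuracy test would likely need a looser tolerance than the host check shown here.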