Decompose dot_xpu_mkl into mul and sum in non oneMKL path #3265
Silv3S wants to merge 4 commits into intel:main from
Conversation
Pull request overview
This PR addresses XPU torch.dot failing for torch.int64 by decomposing the operation into elementwise mul followed by sum, and it removes the CPU fallback when oneMKL is unavailable to avoid device↔host copies.
Changes:
- Add an explicit `Long` guard in the oneMKL path to run `mul`+`sum` instead of `dot_xpu_mkl`.
- Replace the non-oneMKL CPU fallback with `mul`+`sum` directly on XPU.
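The change rests on the identity `dot(a, b) == sum(a * b)`. A minimal torch-free sketch in plain Python (illustrative only; the actual change calls `at::mul(self, other).sum()` in C++):

```python
# Plain-Python sketch of the dot -> mul + sum decomposition used by this PR.
# Helper names here are hypothetical, for illustration only.

def elementwise_mul(a, b):
    return [x * y for x, y in zip(a, b)]

def dot_via_mul_sum(a, b):
    # dot(a, b) == sum(a_i * b_i); this route works for int64 inputs
    # that oneMKL's dot kernel does not dispatch for.
    return sum(elementwise_mul(a, b))

a = [2**40, 3, -5]
b = [2, 7, 11]
print(dot_via_mul_sum(a, b))  # 2199023255518
```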
```cpp
}

#if defined(USE_ONEMKL_XPU)
  if (self.scalar_type() == at::ScalarType::Long) {
```
dot_xpu_mkl only dispatches floating/complex (see AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES... in src/ATen/native/xpu/mkl/BlasImpl.cpp), so in USE_ONEMKL_XPU builds this function will still throw for other integer dtypes (e.g., Int/Short/Byte) even after the Long guard. If torch.dot is expected to work for all integral types, consider expanding this guard to cover all integral (non-bool) dtypes (or adding an explicit error for the unsupported ones).
Suggested change:
```diff
- if (self.scalar_type() == at::ScalarType::Long) {
+ if (c10::isIntegralType(self.scalar_type(), /*includeBool=*/false)) {
```
@Silv3S Since oneMKL dot doesn't support all integral types, similar issues can also occur when the input tensor is another integer dtype. Two suggestions, depending on the goal:
- To fix the UT only => skip the failing cases, just like CUDA does
- To support comprehensive functionality => add support not only for `Long` but for all integral types
Thanks for the review. I think the first suggestion is better than extending functionality for ints. If needed, the user can invoke `mul` and `sum`, which support int inputs.
I notice that CUDA's impl
Thanks for the review @guangyey. You're right - we don't need it. I assumed it should be implemented based on the failing UT from the open issue. Then I checked that

But as a general improvement I'd consider replacing the existing CPU fallback with:

```cpp
#if defined(USE_ONEMKL_XPU)
  return at::native::xpu::dot_xpu_mkl(self, other);
#else
  // return at::native::dot(self.cpu(), other.cpu()).to(self.device());
  return at::mul(self, other).sum();
#endif
```
```cpp
#if defined(USE_ONEMKL_XPU)
  if (self.scalar_type() == at::ScalarType::Long) {
    return at::mul(self, other).sum();
```
Just a reminder: not sure whether this mathematically equivalent approach will overflow for large inputs.
For functionality first, this approach is acceptable.
Agree. They are numerically close, but not an exact match. I checked multiple cases and they pass at 1e-5 tolerance (in fp32), but to avoid introducing any instabilities it may be better to leave the CPU fallback as is.
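Why `mul`+`sum` and a fused dot need not match bit-for-bit: floating-point addition is not associative, so a different reduction order can change the rounded result. A small torch-free illustration in plain Python (real kernels differ by reduction order, not by `fsum`; the example just makes the order-dependence visible):

```python
import math

# Left-to-right summation loses the 1.0: 1e16 + 1.0 rounds back to 1e16
# in float64, so the subsequent -1e16 leaves 0.0.
vals = [1e16, 1.0, -1e16]

naive = sum(vals)        # order-dependent accumulation
exact = math.fsum(vals)  # error-free summation recovers the 1.0

print(naive)  # 0.0
print(exact)  # 1.0
```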
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
If oneMKL is not available, replace the CPU fallback with a `dot` to `mul`+`sum` decomposition to avoid unnecessary data copies between devices.