Add triangular solve function for sparse CSR tensors in XPU #3261
tszulist-hbn wants to merge 1 commit into intel:main
Pull request overview
Adds SparseCsrXPU support for triangular_solve by wiring a SparseCsrXPU dispatch and implementing an XPU sparse CSR fallback that densifies A and calls the existing dense XPU solve.
Changes:
- Register SparseCsrXPU dispatch for triangular_solve.X (and delegate triangular_solve to it).
- Implement triangular_solve_out_sparse_csr_xpu in the XPU sparse CSR math file using A.to_dense() + at::triangular_solve.
- Add an _nnz()==0 fast-path intended to match CUDA behavior (fills X with NaNs).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| yaml/native/native_functions.yaml | Adds SparseCsrXPU dispatch registration for triangular_solve.X and structured delegate entry for triangular_solve. |
| src/ATen/native/sparse/xpu/SparseCsrTensorMath.cpp | Implements SparseCsrXPU out-kernel that densifies A and delegates to dense at::triangular_solve, with an _nnz()==0 special-case. |
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
Overall LGTM. Could you please add a test case? Maybe test/regressions is a good place.
Defer to @CuiYifeng
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
CuiYifeng left a comment:
LGTM, but I noticed that cases like test_block_triangular_solve_block_size are still failing in CI: https://github.com/intel/torch-xpu-ops/actions/runs/24331820702/job/71154120366?pr=3261. Please check, thanks.
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
Resolves: #3167
This PR adds the SparseCsrXPU dispatch for triangular_solve by converting the sparse matrix to dense and delegating to the existing dense XPU triangular_solve implementation. This follows the same pattern used by other XPU sparse ops (addmm, baddbmm, etc.).
Changes:
- yaml/native/native_functions.yaml: Register the SparseCsrXPU: triangular_solve_out_sparse_csr_xpu dispatch for both triangular_solve.X (structured out variant) and triangular_solve (structured delegate).
- src/ATen/native/sparse/xpu/SparseCsrTensorMath.cpp: Implement triangular_solve_out_sparse_csr_xpu(), which handles the zero-nnz edge case (fills X with NaN, matching CUDA behavior) and otherwise converts the sparse A to dense before calling at::triangular_solve.
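
For readers unfamiliar with the densify-then-solve pattern, here is a minimal pure-Python sketch of the control flow described above. It is not the ATen/XPU code from this PR: the function names csr_to_dense, dense_triangular_solve, and sparse_csr_triangular_solve are hypothetical, the right-hand side is a single vector rather than a batched tensor, and the empty check is modeled on the zero-nnz case only.

```python
import math


def csr_to_dense(crow, col, vals, n_rows, n_cols):
    """Expand a CSR matrix (row pointers, column indices, values) into dense rows."""
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for k in range(crow[i], crow[i + 1]):
            dense[i][col[k]] = vals[k]
    return dense


def dense_triangular_solve(A, b, upper=True):
    """Solve A x = b for dense triangular A by back (upper) or forward (lower) substitution."""
    n = len(b)
    x = [0.0] * n
    order = range(n - 1, -1, -1) if upper else range(n)
    for i in order:
        known = range(i + 1, n) if upper else range(i)
        s = sum(A[i][j] * x[j] for j in known)
        x[i] = (b[i] - s) / A[i][i]
    return x


def sparse_csr_triangular_solve(crow, col, vals, b, upper=True):
    """Hypothetical fallback: NaN-fill when A stores no values, else densify and solve."""
    n = len(b)
    if len(vals) == 0:
        # Zero-nnz edge case: the result is filled with NaN.
        return [math.nan] * n
    A = csr_to_dense(crow, col, vals, n, n)
    return dense_triangular_solve(A, b, upper=upper)


# Upper-triangular A = [[2, 1], [0, 4]] stored in CSR form.
x_upper = sparse_csr_triangular_solve([0, 2, 3], [0, 1, 1], [2.0, 1.0, 4.0], [4.0, 8.0])
# A with no stored values.
x_empty = sparse_csr_triangular_solve([0, 0, 0], [], [], [1.0, 2.0])
```

The trade-off of this pattern is that densifying costs memory proportional to the full matrix size, in exchange for reusing an already tested dense solver.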
Tests verified:
- test_block_triangular_solve: 64/64 variants pass (block_size 2/3, int32/int64, contiguous/noncontiguous, float32/float64/complex64/complex128)
- test_sparse_triangular_solve_xpu: 4/4 variants pass (float32/float64/complex64/complex128)
Created test_triangular_solve_sparse.py with test cases covering:
- the numel() == 0 early-exit path that fills with NaN

Each test constructs a non-singular triangular matrix in CSR format, solves on XPU, and compares against the dense CPU result. The style follows the existing test_linalg.py pattern in the same directory.
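
The test pattern described above (build a non-singular triangular matrix, store it as CSR, solve, compare against a dense reference) can be sketched in pure Python as follows. This is an illustration only, not the contents of test_triangular_solve_sparse.py: all helper names are hypothetical, both solves run on the CPU in plain Python, and only the lower-triangular real-valued case is shown.

```python
import math
import random


def make_lower_triangular(n, seed=0):
    """Random lower-triangular matrix; diagonal entries in [1, 2) keep it non-singular."""
    rng = random.Random(seed)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            A[i][j] = rng.uniform(-1.0, 1.0)
        A[i][i] = 1.0 + rng.random()
    return A


def dense_to_csr(A):
    """Convert dense rows to CSR (row pointers, column indices, values)."""
    crow, col, vals = [0], [], []
    for row in A:
        for j, v in enumerate(row):
            if v != 0.0:
                col.append(j)
                vals.append(v)
        crow.append(len(col))
    return crow, col, vals


def solve_lower_dense(A, b):
    """Reference: forward substitution on the dense matrix."""
    x = []
    for i, row in enumerate(A):
        s = sum(row[j] * x[j] for j in range(i))
        x.append((b[i] - s) / row[i])
    return x


def solve_lower_csr(crow, col, vals, b):
    """Forward substitution reading only the stored CSR entries."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        s, diag = 0.0, None
        for k in range(crow[i], crow[i + 1]):
            if col[k] == i:
                diag = vals[k]
            elif col[k] < i:
                s += vals[k] * x[col[k]]
        x[i] = (b[i] - s) / diag
    return x


def check_case(n, seed=0, tol=1e-9):
    """Return True when the CSR solve matches the dense reference within tol."""
    A = make_lower_triangular(n, seed)
    b = [float(i + 1) for i in range(n)]
    expected = solve_lower_dense(A, b)
    actual = solve_lower_csr(*dense_to_csr(A), b)
    return all(math.isclose(a, e, rel_tol=tol, abs_tol=tol)
               for a, e in zip(actual, expected))
```

Keeping the diagonal away from zero is what makes the comparison meaningful: a singular matrix would produce infinities or NaNs on both sides and hide real mismatches.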