Feature: Add initial SYCL backend support for gsplat by asrathore-ai · Pull Request #1 · isl-org/gsplat

asrathore-ai · 2025-08-13T15:14:18Z

This pull request introduces initial support for the SYCL backend in gsplat, enabling it to run on a wider range of hardware beyond CUDA-enabled GPUs, particularly Intel GPUs. This is achieved with minimal, non-intrusive changes to the existing codebase, laying a clean foundation for a multi-backend architecture.

Motivation

The primary goal is to broaden the accessibility of gsplat by supporting open, cross-platform standards. By enabling SYCL, we unlock a larger ecosystem of hardware and empower more users to contribute to and benefit from this project. This PR also serves as a template for how other backends can be integrated in the future.

Key Changes

To ensure a smooth integration and easy review, the changes are intentionally minimal and isolated:

setup.py: Updated to detect the SYCL toolchain and compile the SYCL-specific kernels when the appropriate environment is found. The CUDA build path remains the default and is unaffected if a SYCL compiler is not present.

gsplat/__init__.py: Modified to conditionally import the correct compiled backend (cuda or sycl) at runtime. This avoids breaking changes for existing users and abstracts the backend logic away from the core library.

_torch_impl.py/_torch_impl_2dgs.py: Since these files are shared by both the CUDA and SYCL backends, they were moved to the gsplat package root.

No changes were made to the core CUDA implementation or the high-level Python logic. This approach ensures that the existing functionality remains stable and untouched.
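
The runtime backend selection described for gsplat/__init__.py can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in this PR: the helper name `load_first_available` is hypothetical, and standard-library modules stand in for the compiled extensions so the snippet runs anywhere.

```python
import importlib
from types import ModuleType
from typing import Iterable, Tuple


def load_first_available(candidates: Iterable[str]) -> Tuple[str, ModuleType]:
    """Import and return the first module in `candidates` that can be imported.

    Hypothetical helper: tries each dotted module name in order and moves on
    to the next one when the import fails.
    """
    errors = []
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            errors.append(f"{name}: {exc}")
    raise ImportError("no compiled backend could be imported:\n" + "\n".join(errors))


# Demonstration: the first name does not exist, so the second one is chosen.
name, module = load_first_available(["_no_such_backend_", "json"])
print(name)
```

In the package itself, the candidate list would name the compiled CUDA extension first and the SYCL extension second (the exact module paths are not shown in this description), which keeps CUDA as the default whenever it is present.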

Testing Strategy

tests/test_basic.py, tests/test_2dgs.py, tests/test_rasterization.py, tests/test_strategy.py: The basic kernel tests now run on both cuda and xpu (SYCL).
tests/test_rasterization.py, tests/test_basic.py, tests/test_2dgs.py: Verification of rasterized results against the PyTorch implementation is disabled on XPU, because it requires nerfacc, which is unavailable on XPU.
tests/test_ftheta.py, tests/test_compression.py: These features are not implemented for SYCL.
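
The device handling described above could look roughly like the following sketch. The function names `pick_test_device` and `can_check_against_torch_reference` are hypothetical and not taken from the test files. The availability flags are passed in as plain booleans so the snippet runs without PyTorch installed.

```python
from typing import Optional


def pick_test_device(cuda_available: bool, xpu_available: bool) -> Optional[str]:
    """Return the device string tests should run on, or None to skip them."""
    if cuda_available:
        return "cuda"
    if xpu_available:
        return "xpu"
    return None


def can_check_against_torch_reference(device: str) -> bool:
    """The reference comparison needs nerfacc, reported above as unavailable on XPU."""
    return device == "cuda"


# In a real test module the flags would come from PyTorch, for example
# torch.cuda.is_available() and, on builds that provide it, torch.xpu.is_available().
device = pick_test_device(cuda_available=False, xpu_available=True)
print(device, can_check_against_torch_reference(device))
```

Keeping the reference check behind a separate predicate lets the kernel tests still run on XPU while only the nerfacc-dependent comparison is skipped.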

Future Work

This PR represents the foundational step for full SYCL support. A few kernels are still in development and will be added in subsequent PRs.

We are fully committed to the success of this integration. Our team will take responsibility for completing the SYCL backend: adding the remaining kernels, testing thoroughly, and addressing any issues.

We believe this addition will be a valuable asset to the nerf-studio community and look forward to your feedback.

Feature list:

ssheorey · 2025-09-11T21:12:19Z

@asrathore-ai can you check the CI failures?

Copilot

Pull Request Overview

This pull request introduces initial support for the SYCL backend in gsplat, enabling it to run on a wider range of hardware beyond CUDA-enabled GPUs, particularly Intel GPUs. The changes are designed to be minimal, non-intrusive, and lay a clean foundation for a multi-backend architecture.

Adds SYCL backend detection and compilation infrastructure in setup.py
Implements backend selection logic in gsplat/__init__.py to conditionally import CUDA or SYCL compiled modules
Moves shared torch implementations to package root for use by both backends
Updates test infrastructure to support both CUDA and SYCL devices with appropriate backend detection

Reviewed Changes

Copilot reviewed 66 out of 67 changed files in this pull request and generated 4 comments.

File	Description
setup.py	Adds SYCL toolchain detection and CMake-based build configuration for SYCL backend
tests/test_basic.py	Updates test infrastructure to support both CUDA and SYCL backends with conditional device selection
gsplat/sycl/src/*.cpp	Implements SYCL kernel wrappers and operators for spherical harmonics, projection, rasterization and other core functions
gsplat/sycl/include/*.hpp	Provides SYCL kernel implementations and utility headers for mathematical operations
gsplat/sycl/ext.cpp	Python binding definitions for SYCL backend functions

Comments suppressed due to low confidence (2)

gsplat/sycl/include/kernels/RasterizeToPixelsFwdKernel.hpp:1

The local accessor slm_color is declared but only conditionally initialized. When the condition is false, slm_color remains uninitialized but may still be passed to the kernel constructor.

#ifndef RasterizeToPixelsFwdKernel_HPP

gsplat/sycl/include/kernels/RasterizeToPixelsBwdKernel.hpp:1

Same issue as in the forward kernel - slm_color is declared but only conditionally initialized, which could lead to undefined behavior when passed to the kernel constructor.

#ifndef RasterizeToPixelsBwdKernel_HPP

Copilot · 2025-09-12T00:05:40Z

+						if constexpr(BufferType<S, COLOR_DIM>::isVec && COLOR_DIM == 4){
+                                   // means(2) + conics(3) + colors(4) + opac(1)
+                                   auto temp1 = *(reinterpret_cast<const sycl::vec<S,8>*>(data) );
+                                   auto temp2 = *(reinterpret_cast<const sycl::vec<S,2>*>(data + 8) );


Magic numbers like 8 and 2 in the data offsets should be defined as constants (e.g., MEANS_SIZE + CONICS_SIZE) to improve code readability and maintainability.

Copilot · 2025-09-12T00:05:40Z

+				if (m_v_means2d_abs != nullptr) {
+					local_mean_abs = sycl::reduce_over_group( work_item.get_group(), v_xy_abs_local, sycl::plus<sycl::vec<S, 2>>());
+				}
+


The condition threadRank == idx seems counterintuitive. Consider adding a comment to explain why this specific thread is chosen for the atomic operations.

Suggested change

// Only the thread with threadRank == idx performs the atomic operations.
// This ensures that exactly one thread per group (the one corresponding to the original work item)
// updates the global memory after the group reduction, avoiding redundant atomic updates.

ssheorey

[WIP] Some comments and updates. I tried building and running on Windows and got some build and link errors.
Current linker error:

lld-link: error: undefined symbol: __declspec(dllimport) public: static class pybind11::handle __cdecl pybind11::detail::type_caster<class at::Tensor, void>::cast(class at::Tensor const &, enum pybind11::return_value_policy, class pybind11::handle)
>>> referenced by ext.cpp
>>>               C:\Users\intel\AppData\Local\Temp\icx-00d308f3f9\ext-ed469e.obj

lld-link: error: undefined symbol: __declspec(dllimport) public: bool __cdecl pybind11::detail::type_caster<class at::Tensor, void>::load(class pybind11::handle, bool)
>>> referenced by ext.cpp
>>>               C:\Users\intel\AppData\Local\Temp\icx-00d308f3f9\ext-ed469e.obj

lld-link: error: undefined symbol: __declspec(dllimport) public: class at::Tensor && __cdecl pybind11::detail::type_caster<class at::Tensor, void>::operator class at::Tensor &&(void) &&
>>> referenced by ext.cpp
>>>               C:\Users\intel\AppData\Local\Temp\icx-00d308f3f9\ext-ed469e.obj
icx: error: linker command failed with exit code 1 (use -v to see invocation)

ssheorey · 2025-09-12T18:18:44Z

Is the glm upgrade required? If yes, we need to explain why. If no, this should not be part of this PR.

ssheorey · 2025-09-12T23:01:18Z

Change file names to CamelCase.cpp to match CUDA src file convention.

added sycl code

9089568

ssheorey requested a review from Copilot September 12, 2025 00:04

Copilot AI reviewed Sep 12, 2025

Resolve Windows compiler / linker errors.

dedfc2d

fix mismatch between pass by value (CUDA) and pass by ref (sycl) for at::Tensor

ssheorey reviewed Sep 13, 2025

ssheorey and others added 24 commits September 12, 2025 22:56

Fix Windows linker errors

6f44999

Loading kernels now works in Windows.

a9eaf99

Corrected libtorch path issues

2f28ba3

Updated correct shape calculation

3fff3de

Updated proj changes

10c0efd

corrected isect code

7b3bfb8

Update fully fused projection kernels

0d25c9e

wip packed kernel

02a596b

added fully fused projection packed

fc801e8

Update kernel

f773859

updated training code

4dfea00

update steps for memory error

6624ec0

torch_acc uniform API for both cuda and xpu. Maybe replaced by torch.accelerator in the future. Ensure empty_cache() and synchronize() for XPU match those for CUDA,

6597d44

Added forward pass for 2fgs fully fused projection

0a472cf

added backward pass

417942e

update tests

6763ada

Working tests

f4d0123

Added rasterize forward kernel

167d0b6

added backward kernel

a4f90ac

Update rasterize_to_pixels_2dgs_bwd.cpp

aa6ff08

Update RasterizeToPixels2DGSBwdKernel.hpp

6542d4c

Update RasterizeToPixels2DGSBwdKernel.hpp

cb5c72a

Update RasterizeToPixels2DGSFwdKernel.hpp

4a5bde3

Update rasterize_to_pixels_2dgs_fwd.cpp for correct block size computation

11604f5

ssheorey had a problem deploying to production February 23, 2026 20:14 — with GitHub Actions Failure

Copilot AI and others added 2 commits February 23, 2026 23:00

Initial plan

2406dfb

Fix XPU workflow: truncate oneAPI version X.Y.Z to X.Y for apt package install

a0025c6

Co-authored-by: ssheorey <41028320+ssheorey@users.noreply.github.com>

ssheorey had a problem deploying to production February 26, 2026 13:51 — with GitHub Actions Failure

ssheorey temporarily deployed to production February 26, 2026 13:51 — with GitHub Actions Inactive

ssheorey and others added 10 commits February 27, 2026 11:41

Windows wheel workflow

1256a06

Fix

1baa172

Fix setup.py for source only wheel

3328824

installer fix

b3a8f5d

fix

02b390b

Separate Windows and Linux build steps.

54de18a

fix build type for windows

1442ff5

Update docs

21f1f85

SYCL CI - wheels and PyPI repo hosting on github pages.

711958b

Add info about pre-built wheels.

760d963

asrathore-ai wants to merge 61 commits into main from sycl_code_integration

4 participants