Feature: Add initial SYCL backend support for gsplat#1

Open
asrathore-ai wants to merge 61 commits into main from sycl_code_integration

@asrathore-ai
Collaborator

@asrathore-ai asrathore-ai commented Aug 13, 2025

This pull request introduces initial support for the SYCL backend in gsplat, enabling it to run on a wider range of hardware beyond CUDA-enabled GPUs, particularly Intel GPUs. This is achieved with minimal, non-intrusive changes to the existing codebase, laying a clean foundation for a multi-backend architecture.

Motivation

The primary goal is to broaden the accessibility of gsplat by supporting open, cross-platform standards. By enabling SYCL, we unlock a larger ecosystem of hardware and empower more users to contribute to and benefit from this project. This PR also serves as a template for how other backends can be integrated in the future.

Key Changes

To ensure a smooth integration and easy review, the changes are intentionally minimal and isolated:

setup.py: Updated to detect the SYCL toolchain and compile the SYCL-specific kernels when the appropriate environment is found. The CUDA build path remains the default and is unaffected if a SYCL compiler is not present.

gsplat/__init__.py: Modified to conditionally import the correct compiled backend (cuda or sycl) at runtime. This avoids breaking changes for existing users and abstracts the backend logic away from the core library.

_torch_impl.py / _torch_impl_2dgs.py: Since these files are shared by both the CUDA and SYCL backends, they were moved to the gsplat package root.

No changes were made to the core CUDA implementation or the high-level Python logic. This approach ensures that the existing functionality remains stable and untouched.
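The runtime backend selection described for gsplat/__init__.py can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function `select_backend`, the list `BACKEND_CANDIDATES`, and both module paths are assumptions made for the example. The idea is simply to try each compiled backend in priority order and keep the first one that imports.

```python
import importlib
from types import ModuleType
from typing import Callable, Iterable, Optional, Tuple


def select_backend(
    candidates: Iterable[Tuple[str, str]],
    importer: Callable[[str], ModuleType] = importlib.import_module,
) -> Tuple[Optional[str], Optional[ModuleType]]:
    """Return (backend_name, module) for the first candidate that imports.

    `candidates` is an ordered iterable of (backend_name, module_path) pairs.
    Returns (None, None) when no candidate can be imported.
    """
    for name, module_path in candidates:
        try:
            return name, importer(module_path)
        except ImportError:
            # This backend was not built or is not installed; try the next one.
            continue
    return None, None


# Hypothetical module paths, for illustration only. CUDA is listed first
# so that it remains the default when both backends are present.
BACKEND_CANDIDATES = [
    ("cuda", "gsplat.cuda._backend"),
    ("sycl", "gsplat.sycl._backend"),
]
```

A caller would run `select_backend(BACKEND_CANDIDATES)` once at import time and route kernel calls through the returned module. A real extension module can also fail to load with errors other than `ImportError` (for example a missing shared library), which this sketch does not handle.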

Testing Strategy

tests/test_basic.py, tests/test_2dgs.py, tests/test_rasterization.py, tests/test_strategy.py: The basic kernel tests now run on both CUDA and XPU (SYCL).
tests/test_rasterization.py, tests/test_basic.py, tests/test_2dgs.py: Verification of the rasterized results against the PyTorch reference implementation is disabled on XPU, since it requires nerfacc, which is unavailable on XPU.
tests/test_ftheta.py, tests/test_compression.py: These features are not implemented.
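As a rough sketch of how a test module might pick its device and decide whether to run the PyTorch-reference comparison, consider the helpers below. The names `detect_device` and `reference_check_enabled` are hypothetical and not taken from the PR's tests. The torch module is passed in as an argument so the example runs without PyTorch installed; in a real test one would pass the imported `torch`.

```python
from types import SimpleNamespace


def detect_device(torch_module) -> str:
    """Pick "cuda", then "xpu", then "cpu", based on what the module reports."""
    if torch_module.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no torch.xpu attribute at all.
    xpu = getattr(torch_module, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    return "cpu"


def reference_check_enabled(device: str) -> bool:
    """Per the PR description, the reference comparison is skipped on XPU
    because it depends on nerfacc, which is unavailable there."""
    return device != "xpu"


# Stand-in for `import torch`, simulating a machine with an XPU and no CUDA.
fake_torch = SimpleNamespace(
    cuda=SimpleNamespace(is_available=lambda: False),
    xpu=SimpleNamespace(is_available=lambda: True),
)
device = detect_device(fake_torch)
run_reference = reference_check_enabled(device)
```

In a pytest suite, `run_reference` would typically feed a `pytest.mark.skipif` condition on the comparison tests rather than being checked inside each test body.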

Future Work

This PR represents the foundational step for full SYCL support. A few kernels are still in development and will be added in subsequent PRs.

We are fully committed to the success of this integration. Our team will take responsibility for completing the SYCL backend: adding the remaining kernels, testing thoroughly, and addressing any issues.

We believe this addition will be a valuable asset to the nerfstudio community and look forward to your feedback.

Feature list:

  • 3DGS fused kernels
  • 3DGS packed kernels
  • Distributed training (PyTorch 2.8+)
  • MCMC strategy (relocation kernel)
  • 2DGS fused kernels
  • 2DGS packed kernels
  • 3DGUT kernels (+FTheta cameras)
  • Fused Bilateral grid kernels (from https://github.com/harry7557558/fused-bilagrid)
  • 3DGS Compression (requires PLAS)
  • rasterize_to_indices_{2,3}dgs and rasterize_to_pixels_from_world_3dgs kernels (only used internally for testing)

@ssheorey
Member

@asrathore-ai can you check the CI failures?

@ssheorey ssheorey requested a review from Copilot September 12, 2025 00:04

Copilot AI left a comment

Pull Request Overview

This pull request introduces initial support for the SYCL backend in gsplat, enabling it to run on a wider range of hardware beyond CUDA-enabled GPUs, particularly Intel GPUs. The changes are designed to be minimal, non-intrusive, and lay a clean foundation for a multi-backend architecture.

  • Adds SYCL backend detection and compilation infrastructure in setup.py
  • Implements backend selection logic in gsplat/__init__.py to conditionally import CUDA or SYCL compiled modules
  • Moves shared torch implementations to package root for use by both backends
  • Updates test infrastructure to support both CUDA and SYCL devices with appropriate backend detection
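The build-time detection summarized in the first bullet above could look something like the sketch below. It is an assumption-laden illustration, not the logic in this PR's setup.py: the environment variable `GSPLAT_SYCL_CXX`, the `want_sycl` flag, and both function names are invented for the example. `icpx` and `icx` are Intel oneAPI compiler driver names (the linker log in this thread shows `icx` in use on Windows).

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def find_sycl_compiler(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return a path to a SYCL-capable compiler, or None if none is found."""
    override = env.get("GSPLAT_SYCL_CXX")  # hypothetical override variable
    if override:
        return which(override)
    for candidate in ("icpx", "icx"):  # Intel oneAPI compiler drivers
        path = which(candidate)
        if path:
            return path
    return None


def choose_build_backend(sycl_compiler: Optional[str], want_sycl: bool) -> str:
    """CUDA stays the default; SYCL is chosen only when it was requested
    (hypothetical `want_sycl` flag) and a compiler was actually found."""
    return "sycl" if (want_sycl and sycl_compiler) else "cuda"
```

Keeping detection separate from the decision makes the "CUDA path is unaffected when no SYCL compiler is present" property easy to check: `choose_build_backend(None, True)` still returns `"cuda"`.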

Reviewed Changes

Copilot reviewed 66 out of 67 changed files in this pull request and generated 4 comments.

setup.py: Adds SYCL toolchain detection and CMake-based build configuration for SYCL backend
tests/test_basic.py: Updates test infrastructure to support both CUDA and SYCL backends with conditional device selection
gsplat/sycl/src/*.cpp: Implements SYCL kernel wrappers and operators for spherical harmonics, projection, rasterization and other core functions
gsplat/sycl/include/*.hpp: Provides SYCL kernel implementations and utility headers for mathematical operations
gsplat/sycl/ext.cpp: Python binding definitions for SYCL backend functions
Comments suppressed due to low confidence (2)

gsplat/sycl/include/kernels/RasterizeToPixelsFwdKernel.hpp:1

  • The local accessor slm_color is declared but only conditionally initialized. When the condition is false, slm_color remains uninitialized but may still be passed to the kernel constructor.
#ifndef RasterizeToPixelsFwdKernel_HPP

gsplat/sycl/include/kernels/RasterizeToPixelsBwdKernel.hpp:1

  • Same issue as in the forward kernel - slm_color is declared but only conditionally initialized, which could lead to undefined behavior when passed to the kernel constructor.
#ifndef RasterizeToPixelsBwdKernel_HPP

Comment thread setup.py Outdated
Comment thread tests/test_basic.py Outdated
Comment on lines +173 to +176
if constexpr(BufferType<S, COLOR_DIM>::isVec && COLOR_DIM == 4){
// means(2) + conics(3) + colors(4) + opac(1)
auto temp1 = *(reinterpret_cast<const sycl::vec<S,8>*>(data) );
auto temp2 = *(reinterpret_cast<const sycl::vec<S,2>*>(data + 8) );
Copilot AI Sep 12, 2025

Magic numbers like 8 and 2 in the data offsets should be defined as constants (e.g., MEANS_SIZE + CONICS_SIZE) to improve code readability and maintainability.
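To make the arithmetic behind those offsets concrete, here is a small Python sketch that derives them from named field sizes. It assumes the fields are packed contiguously in the order given by the code comment, means(2) + conics(3) + colors(4) + opac(1); the names `FIELD_SIZES`, `OFFSETS`, `RECORD_SIZE`, `LOAD_WIDTHS`, and `unpack` are illustrative and do not appear in the PR.

```python
from itertools import accumulate

# Field widths in elements, taken from the comment in the reviewed snippet.
FIELD_SIZES = {"means": 2, "conics": 3, "colors": 4, "opac": 1}

# Running sum of the widths gives each field's start offset in a record.
starts = [0, *accumulate(FIELD_SIZES.values())]
OFFSETS = dict(zip(FIELD_SIZES, starts))
RECORD_SIZE = starts[-1]

# The snippet reads one record as an 8-wide load followed by a 2-wide load.
LOAD_WIDTHS = (8, 2)


def unpack(record):
    """Split a flat record into named fields using the derived offsets."""
    return {
        name: record[OFFSETS[name]: OFFSETS[name] + size]
        for name, size in FIELD_SIZES.items()
    }
```

Under that layout assumption the record is 10 elements long, which matches the 8 + 2 split of the two vector loads. Note that 8 is a vector-width boundary rather than a field boundary: means and conics together occupy only 5 elements, so the first load also covers three of the four color channels. Named constants for the load widths would therefore describe the code more accurately than a sum of field sizes.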

if (m_v_means2d_abs != nullptr) {
local_mean_abs = sycl::reduce_over_group( work_item.get_group(), v_xy_abs_local, sycl::plus<sycl::vec<S, 2>>());
}

Copilot AI Sep 12, 2025

The condition threadRank == idx seems counterintuitive. Consider adding a comment to explain why this specific thread is chosen for the atomic operations.

Suggested change
// Only the thread with threadRank == idx performs the atomic operations.
// This ensures that exactly one thread per group (the one corresponding to the original work item)
// updates the global memory after the group reduction, avoiding redundant atomic updates.

fix mismatch between pass by value (CUDA) and pass by ref (sycl) for at::Tensor
Member

@ssheorey ssheorey left a comment

[WIP] Some comments and updates. I tried building and running on Windows and got some build and link errors.
Current linker error:

lld-link: error: undefined symbol: __declspec(dllimport) public: static class pybind11::handle __cdecl pybind11::detail::type_caster<class at::Tensor, void>::cast(class at::Tensor const &, enum pybind11::return_value_policy, class pybind11::handle)
>>> referenced by ext.cpp
>>>               C:\Users\intel\AppData\Local\Temp\icx-00d308f3f9\ext-ed469e.obj

lld-link: error: undefined symbol: __declspec(dllimport) public: bool __cdecl pybind11::detail::type_caster<class at::Tensor, void>::load(class pybind11::handle, bool)
>>> referenced by ext.cpp
>>>               C:\Users\intel\AppData\Local\Temp\icx-00d308f3f9\ext-ed469e.obj

lld-link: error: undefined symbol: __declspec(dllimport) public: class at::Tensor && __cdecl pybind11::detail::type_caster<class at::Tensor, void>::operator class at::Tensor &&(void) &&
>>> referenced by ext.cpp
>>>               C:\Users\intel\AppData\Local\Temp\icx-00d308f3f9\ext-ed469e.obj
icx: error: linker command failed with exit code 1 (use -v to see invocation)

Member

Is the glm upgrade required? If yes, we need to explain why. If no, this should not be part of this PR.

Comment thread gsplat/sycl/src/adam.cpp
Member

Change file names to CamelCase.cpp to match CUDA src file convention.

Comment thread gsplat/sycl/src/intersect_offset.cpp Outdated
Comment thread gsplat/sycl/include/utils.hpp Outdated
Comment thread gsplat/sycl/include/kernels/ComputeShFwdKernel.hpp Outdated
Copilot AI and others added 2 commits February 23, 2026 23:00
…e install

Co-authored-by: ssheorey <41028320+ssheorey@users.noreply.github.com>