Feature: Add initial SYCL backend support for gsplat #1
asrathore-ai wants to merge 61 commits into main
@asrathore-ai can you check the CI failures?
Pull Request Overview
This pull request introduces initial support for the SYCL backend in gsplat, enabling it to run on a wider range of hardware beyond CUDA-enabled GPUs, particularly Intel GPUs. The changes are designed to be minimal, non-intrusive, and lay a clean foundation for a multi-backend architecture.
- Adds SYCL backend detection and compilation infrastructure in setup.py
- Implements backend selection logic in gsplat/__init__.py to conditionally import CUDA or SYCL compiled modules
- Moves shared torch implementations to package root for use by both backends
- Updates test infrastructure to support both CUDA and SYCL devices with appropriate backend detection
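The device selection that the test changes rely on can be sketched roughly as follows. This is a minimal illustration, not the PR's actual test code: `pick_test_device` is a hypothetical helper, and a stand-in object replaces the real `torch` module so the sketch runs without PyTorch installed.

```python
from types import SimpleNamespace


def pick_test_device(torch_like):
    """Return "cuda", "xpu", or None depending on which accelerator is usable."""
    cuda = getattr(torch_like, "cuda", None)
    if cuda is not None and cuda.is_available():
        return "cuda"
    # torch.xpu is only present in PyTorch builds with XPU support,
    # so probe for the attribute instead of assuming it exists.
    xpu = getattr(torch_like, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    return None


# Stand-in for the torch module (assumption: no CUDA device, one XPU device).
fake_torch = SimpleNamespace(
    cuda=SimpleNamespace(is_available=lambda: False),
    xpu=SimpleNamespace(is_available=lambda: True),
)
device = pick_test_device(fake_torch)
```

In a real test module one would presumably pass the imported `torch` module and skip the test (for example with `pytest.mark.skipif`) when the helper returns `None`.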
Reviewed Changes
Copilot reviewed 66 out of 67 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| setup.py | Adds SYCL toolchain detection and CMake-based build configuration for SYCL backend |
| tests/test_basic.py | Updates test infrastructure to support both CUDA and SYCL backends with conditional device selection |
| gsplat/sycl/src/*.cpp | Implements SYCL kernel wrappers and operators for spherical harmonics, projection, rasterization and other core functions |
| gsplat/sycl/include/*.hpp | Provides SYCL kernel implementations and utility headers for mathematical operations |
| gsplat/sycl/ext.cpp | Python binding definitions for SYCL backend functions |
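
As a rough idea of what toolchain detection in setup.py might involve, the sketch below looks for a SYCL compiler on PATH and falls back to CUDA when none is found. It is an assumption-laden illustration, not the PR's code: the function names are hypothetical, `icpx` and `icx` are the Intel oneAPI compiler drivers (the latter appears in the Windows linker log in this conversation), and the rule "SYCL wins whenever a compiler is found" is assumed.

```python
import shutil


def find_sycl_compiler(candidates=("icpx", "icx"), which=shutil.which):
    """Return the path of the first candidate compiler found on PATH, or None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None


def choose_build_backend(which=shutil.which):
    """Pick "sycl" when a SYCL compiler is present, otherwise default to "cuda"."""
    # Assumption: the presence of a SYCL compiler is enough to select SYCL.
    return "sycl" if find_sycl_compiler(which=which) else "cuda"
```

The `which` parameter exists only so the lookup can be replaced in tests; a real build script might additionally honor an environment variable to force one backend.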
Comments suppressed due to low confidence (2)
gsplat/sycl/include/kernels/RasterizeToPixelsFwdKernel.hpp:1
- The local accessor slm_color is declared but only conditionally initialized. When the condition is false, slm_color remains uninitialized but may still be passed to the kernel constructor.
#ifndef RasterizeToPixelsFwdKernel_HPP
gsplat/sycl/include/kernels/RasterizeToPixelsBwdKernel.hpp:1
- Same issue as in the forward kernel: slm_color is declared but only conditionally initialized, which could lead to undefined behavior when passed to the kernel constructor.
#ifndef RasterizeToPixelsBwdKernel_HPP
    if constexpr(BufferType<S, COLOR_DIM>::isVec && COLOR_DIM == 4){
        // means(2) + conics(3) + colors(4) + opac(1)
        auto temp1 = *(reinterpret_cast<const sycl::vec<S,8>*>(data) );
        auto temp2 = *(reinterpret_cast<const sycl::vec<S,2>*>(data + 8) );
Magic numbers like 8 and 2 in the data offsets should be defined as constants (e.g., MEANS_SIZE + CONICS_SIZE) to improve code readability and maintainability.
    if (m_v_means2d_abs != nullptr) {
        local_mean_abs = sycl::reduce_over_group( work_item.get_group(), v_xy_abs_local, sycl::plus<sycl::vec<S, 2>>());
    }
The condition threadRank == idx seems counterintuitive. Consider adding a comment to explain why this specific thread is chosen for the atomic operations.
    // Only the thread with threadRank == idx performs the atomic operations.
    // This ensures that exactly one thread per group (the one corresponding to the original work item)
    // updates the global memory after the group reduction, avoiding redundant atomic updates.
fix mismatch between pass by value (CUDA) and pass by ref (sycl) for at::Tensor
ssheorey left a comment
[WIP] Some comments and updates. I tried building and running on Windows and got some build and link errors.
Current linker error:
lld-link: error: undefined symbol: __declspec(dllimport) public: static class pybind11::handle __cdecl pybind11::detail::type_caster<class at::Tensor, void>::cast(class at::Tensor const &, enum pybind11::return_value_policy, class pybind11::handle)
>>> referenced by ext.cpp
>>> C:\Users\intel\AppData\Local\Temp\icx-00d308f3f9\ext-ed469e.obj
lld-link: error: undefined symbol: __declspec(dllimport) public: bool __cdecl pybind11::detail::type_caster<class at::Tensor, void>::load(class pybind11::handle, bool)
>>> referenced by ext.cpp
>>> C:\Users\intel\AppData\Local\Temp\icx-00d308f3f9\ext-ed469e.obj
lld-link: error: undefined symbol: __declspec(dllimport) public: class at::Tensor && __cdecl pybind11::detail::type_caster<class at::Tensor, void>::operator class at::Tensor &&(void) &&
>>> referenced by ext.cpp
>>> C:\Users\intel\AppData\Local\Temp\icx-00d308f3f9\ext-ed469e.obj
icx: error: linker command failed with exit code 1 (use -v to see invocation)
Is the glm upgrade required? If yes, we need to explain why. If no, this should not be part of this PR.
Change file names to CamelCase.cpp to match CUDA src file convention.
…accelerator in the future. Ensure empty_cache() and synchronize() for XPU match those for CUDA,
…e install Co-authored-by: ssheorey <41028320+ssheorey@users.noreply.github.com>
This pull request introduces initial support for the SYCL backend in gsplat, enabling it to run on a wider range of hardware beyond CUDA-enabled GPUs, particularly Intel GPUs. This is achieved with minimal, non-intrusive changes to the existing codebase, laying a clean foundation for a multi-backend architecture.
Motivation
The primary goal is to broaden the accessibility of gsplat by supporting open, cross-platform standards. By enabling SYCL, we unlock a larger ecosystem of hardware and empower more users to contribute to and benefit from this project. This PR also serves as a template for how other backends can be integrated in the future.
Key Changes
To ensure a smooth integration and easy review, the changes are intentionally minimal and isolated:
- setup.py: Updated to detect the SYCL toolchain and compile the SYCL-specific kernels when the appropriate environment is found. The CUDA build path remains the default and is unaffected if a SYCL compiler is not present.
- gsplat/__init__.py: Modified to conditionally import the correct compiled backend (CUDA or SYCL) at runtime. This avoids breaking changes for existing users and abstracts the backend logic away from the core library.
- _torch_impl.py / _torch_impl_2dgs.py: Since these files are shared by both the CUDA and SYCL backends, they were moved to the gsplat package root.

No changes were made to the core CUDA implementation or the high-level Python logic. This approach ensures that the existing functionality remains stable and untouched.
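The conditional import described for gsplat/__init__.py could look roughly like the sketch below. It is illustrative only: `load_backend` is a hypothetical helper, the module paths passed to it are placeholders (the real extension module names are not given here), and trying backends in a fixed order is an assumption.

```python
import importlib


def load_backend(candidates):
    """Try each (name, module_path) pair in order; return the first that imports."""
    errors = []
    for name, module_path in candidates:
        try:
            return name, importlib.import_module(module_path)
        except ImportError as exc:
            # Remember why this backend was skipped so the final error is useful.
            errors.append(f"{name}: {exc}")
    raise ImportError("no compiled backend available; tried " + "; ".join(errors))


# Stand-in module paths so the sketch runs anywhere: the first does not exist,
# the second is a stdlib module playing the role of a compiled extension.
backend_name, backend_module = load_backend(
    [("sycl", "no_such_sycl_ext"), ("cuda", "json")]
)
```

Keeping the fallback in one function means the rest of the package only ever sees a single backend module, which matches the stated aim of abstracting the backend choice away from the core library.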
Testing Strategy
- tests/test_basic.py, tests/test_2dgs.py, tests/test_rasterization.py, tests/test_strategy.py: The basic kernel tests now work for both CUDA and XPU (SYCL).
- tests/test_rasterization.py, tests/test_basic.py, tests/test_2dgs.py: Verification of rasterized results against the PyTorch implementation is disabled for XPU, since it requires nerfacc, which is unavailable on XPU.
- tests/test_ftheta.py, tests/test_compression.py: These features are not implemented.

Future Work
This PR represents the foundational step for full SYCL support. A few kernels are still in development and will be added in subsequent PRs.
We are fully committed to the success of this integration. Our team will take on the role of fully integrating the SYCL backend, adding the remaining kernels and testing thoroughly, addressing any issues.
We believe this addition will be a valuable asset to the nerf-studio community and look forward to your feedback.
Feature list:
- rasterize_to_indices_{2,3}dgs and rasterize_to_pixels_from_world_3dgs kernels (only used internally for testing)