
feat: add rope_in_place tilelang kernel for npu device. #964

Merged: zhang-minchao merged 2 commits into jd-opensource:main from zhang-minchao:feat/tilelang on Mar 31, 2026


Conversation

@zhang-minchao (Collaborator) commented on Feb 28, 2026:

| Case | torch native (ms) | TileLang (ms) | Speedup |
|---|---|---|---|
| 1x576_start512_rope64 | 0.116037 | 0.0230664 | 5.03056x |
| 8x576_start512_rope64 | 0.116174 | 0.0230544 | 5.03914x |
| 47x576_start512_rope64 | 0.119172 | 0.023024 | 5.17598x |
| 48x576_start512_rope64 | 0.118605 | 0.0232646 | 5.09811x |
| 49x576_start512_rope64 | 0.119092 | 0.023336 | 5.10335x |
| 95x576_start512_rope64 | 0.119298 | 0.0229754 | 5.19244x |
| 96x576_start512_rope64 | 0.119057 | 0.0230834 | 5.15769x |
| 97x576_start512_rope64 | 0.118857 | 0.0230322 | 5.16046x |
| 128x576_start512_rope64 | 0.119686 | 0.0230274 | 5.19754x |
| 512x576_start512_rope64 | 0.119086 | 0.025742 | 4.62613x |
| 1024x576_start512_rope64 | 0.136578 | 0.0328556 | 4.1569x |
| 2048x576_start512_rope64 | 0.143184 | 0.0464708 | 3.08116x |
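The speedup column is the ratio `torch_native_ms / tilelang_ms`. The short Python check below recomputes it for three rows copied from the table. It also makes the trend easy to see: the TileLang time stays near 0.023 ms up to 128 rows and then grows to about 0.046 ms at 2048 rows, while the torch-native time rises only from about 0.116 ms to 0.143 ms, so the speedup falls from roughly 5.2x to 3.1x.

```python
# (case, torch_native_ms, tilelang_ms, reported speedup), copied from the table above.
ROWS = [
    ("1x576_start512_rope64", 0.116037, 0.0230664, 5.03056),
    ("128x576_start512_rope64", 0.119686, 0.0230274, 5.19754),
    ("2048x576_start512_rope64", 0.143184, 0.0464708, 3.08116),
]


def speedup(torch_ms: float, tilelang_ms: float) -> float:
    """Speedup of the TileLang kernel over the torch-native baseline."""
    return torch_ms / tilelang_ms


for case, torch_ms, tl_ms, reported in ROWS:
    computed = speedup(torch_ms, tl_ms)
    # The reported values carry about six significant digits, so compare loosely.
    assert abs(computed - reported) < 1e-3, case
    print(f"{case}: {computed:.4f}x")
```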

@gemini-code-assist (bot, Contributor) left a comment:


Code Review

This pull request introduces a new TileLang-based RoPE kernel for Ascend NPUs, including the necessary build system infrastructure, C++ wrappers, and tests. The changes are extensive and add significant new functionality. However, there are several critical and high-severity issues that should be addressed. The build process relies on a brittle script patching mechanism in setup.py that could easily break. The C++ wrapper for the kernel has a critical performance issue due to inefficient tensor broadcasting and an unsafe use of const_cast. Additionally, the CMake configuration for building the kernel is not flexible, with hardcoded dimensions and a fragile sed-based code modification step. Addressing these issues will improve the robustness, performance, and maintainability of the new kernel and its build process.
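The concern about hardcoded dimensions is easier to weigh against an alternative: compile an explicit set of shape variants, then validate the requested shape and dispatch at runtime, failing loudly when no variant matches. The Python sketch below is hypothetical; `VARIANTS`, `select_variant`, and the kernel names are illustrative and not taken from the PR. Only the (576, 512, 64) shape comes from the benchmark cases, and reading the case names as (row width, first rotary column, rotary width) is an assumption. The second registry entry is purely illustrative.

```python
from typing import Dict, Tuple

# Hypothetical registry: (head_dim, rope_start, rope_dim) -> precompiled kernel name.
VARIANTS: Dict[Tuple[int, int, int], str] = {
    (576, 512, 64): "rope_in_place_d576_s512_r64",   # shape from the benchmark cases
    (128, 0, 128): "rope_in_place_d128_s0_r128",     # illustrative only
}


def select_variant(head_dim: int, rope_start: int, rope_dim: int) -> str:
    """Validate the requested shape and return the matching kernel variant name."""
    if rope_dim <= 0 or rope_dim % 2 != 0:
        raise ValueError(f"rope_dim must be a positive even number, got {rope_dim}")
    if rope_start < 0 or rope_start + rope_dim > head_dim:
        raise ValueError(
            f"rotary region [{rope_start}, {rope_start + rope_dim}) "
            f"does not fit in head_dim={head_dim}"
        )
    key = (head_dim, rope_start, rope_dim)
    if key not in VARIANTS:
        supported = ", ".join(str(k) for k in sorted(VARIANTS))
        raise NotImplementedError(f"no compiled variant for {key}; supported: {supported}")
    return VARIANTS[key]
```

Raising `NotImplementedError` instead of silently picking another kernel keeps a missing variant visible at the call site; a production wrapper might fall back to the torch-native path instead.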

Comment thread: setup.py (outdated)
Comment thread: xllm/core/kernels/npu/tilelang/rope_wrapper.cpp
Comment thread: xllm/core/kernels/npu/tilelang/CMakeLists.txt (outdated)
Comment thread: xllm/core/kernels/npu/tilelang/CMakeLists.txt (outdated)
Comment thread: xllm/core/kernels/npu/tilelang/rope_wrapper.cpp (outdated)
@XuZhang99 changed the title from "feat: add rope_in_place tilelang kernel." to "feat: add rope_in_place tilelang kernel for npu device." on Feb 28, 2026
@zhang-minchao marked this pull request as ready for review on March 19, 2026 15:15
@zhang-minchao (Collaborator, Author) commented:

/gemini review

@gemini-code-assist (bot, Contributor) left a comment:


Code Review

This pull request introduces a new TileLang kernel for in-place RoPE (Rotary Positional Embedding) on NPU devices, significantly improving performance as demonstrated by the provided benchmarks. The changes include adding a new submodule for tilelang-ascend, comprehensive documentation for kernel development in both English and Chinese, and the necessary Python and CMake infrastructure to compile and integrate these kernels. The implementation includes robust validation checks and a well-structured dispatch mechanism for different kernel variants. Unit tests are also provided to ensure correctness and measure performance.

One high-severity issue was identified related to the patching of a third-party script, which could lead to build fragility.
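For reference, the semantics of an in-place RoPE over part of each row can be written in a few lines of plain Python. This is a sketch under assumptions the PR does not state: the case names `NxD_startS_ropeR` are read as N rows of width D whose columns [S, S+R) are rotated, pairs use the half-split (GPT-NeoX style) layout rather than interleaved pairs, and angles follow the common base-10000 schedule. `rope_cos_sin` and `rope_in_place` are hypothetical names, not the kernel's API.

```python
import math
from typing import List, Tuple


def rope_cos_sin(position: int, rope_dim: int, base: float = 10000.0) -> Tuple[List[float], List[float]]:
    """cos/sin tables for one position, one entry per rotated pair (assumed base-10000 schedule)."""
    half = rope_dim // 2
    angles = [position * base ** (-2.0 * i / rope_dim) for i in range(half)]
    return [math.cos(a) for a in angles], [math.sin(a) for a in angles]


def rope_in_place(rows: List[List[float]], positions: List[int],
                  rope_start: int, rope_dim: int) -> None:
    """Rotate columns [rope_start, rope_start + rope_dim) of each row in place.

    Column rope_start + i is paired with rope_start + half + i (half-split
    pairing, an assumption). All other columns are left untouched.
    """
    half = rope_dim // 2
    for row, pos in zip(rows, positions):
        cos, sin = rope_cos_sin(pos, rope_dim)
        for i in range(half):
            a = row[rope_start + i]
            b = row[rope_start + half + i]
            # 2-D rotation of the pair (a, b) by the angle for pair i.
            row[rope_start + i] = a * cos[i] - b * sin[i]
            row[rope_start + half + i] = b * cos[i] + a * sin[i]
```

Because only the rotary columns are written, an in-place kernel with this layout touches 64 of the 576 values in each row and leaves the rest of the buffer as-is.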

Comment thread: xllm/compiler/tilelang/bootstrap.py (outdated)
Comment thread: xllm/core/kernels/npu/tilelang/tilelang_ops_api.h
Comment thread: xllm/core/kernels/npu/tilelang/rope_wrapper.cpp (outdated)
@yingxudeng previously approved these changes on Mar 20, 2026
@yingxudeng (Collaborator) left a comment:


Nice addition. The included skill enables quick integration trials downstream.

Comment thread: .codex/skills/tilelang_ascend_kernel/SKILL.md (outdated)
@yingxudeng previously approved these changes on Mar 22, 2026
@XuZhang99 previously approved these changes on Mar 23, 2026
@yingxudeng previously approved these changes on Mar 24, 2026
@XuZhang99 previously approved these changes on Mar 24, 2026
@zhang-minchao dismissed stale reviews from XuZhang99 and yingxudeng via 1de275d on March 24, 2026 10:06
@yingxudeng previously approved these changes on Mar 25, 2026
@yingxudeng previously approved these changes on Mar 25, 2026
Comment thread: third_party/tilelang-ascend
@yingxudeng previously approved these changes on Mar 25, 2026
@yingxudeng previously approved these changes on Mar 26, 2026
@XuZhang99 previously approved these changes on Mar 26, 2026
Comment thread: .codex/skills/tilelang_ascend_kernel/agents/openai.yaml (outdated)
@zhang-minchao merged commit 1df961a into jd-opensource:main on Mar 31, 2026
15 of 23 checks passed
DongheJin pushed a commit to DongheJin/xllm that referenced this pull request on Apr 3, 2026

Labels: None yet
Projects: None yet


3 participants