@rolandschulz, could you help review? Thanks very much. The context is that we are integrating the flash-attention-2 kernel into Hugging Face, so we need to align the API with CUDA to make the integration easy. Thanks!
We need to figure out how to place the Python package, the torch extension, and the associated Triton kernel. Let's hold this since it requires a redesign.
We have refined the Flash Attention implementation. If you still need this PR, please update it based on the latest source code.
Author: Got it, thanks!
This PR provides a template for a flash-attn Python API. Based on a sycl-tla kernel, it currently reproduces part of flash-attn's functionality, such as fwd and varlen_fwd.
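For context, a minimal sketch of the interface this targets, assuming the XPU build mirrors the public Dao-AILab/flash-attention signatures (the `xpu` device string and the tensor shapes are illustrative, not taken from this PR):

```python
# Sketch of the interface this PR targets, assuming it mirrors the public
# Dao-AILab/flash-attention API; shapes and the "xpu" device are illustrative.
import torch
from flash_attn import flash_attn_func, flash_attn_varlen_func

# Dense forward (fwd): q, k, v have shape (batch, seqlen, nheads, headdim).
q, k, v = (torch.randn(2, 1024, 16, 64, device="xpu", dtype=torch.float16) for _ in range(3))
out = flash_attn_func(q, k, v, dropout_p=0.0, softmax_scale=None, causal=True)

# Variable-length forward (varlen_fwd): tokens from all sequences are packed
# into (total_tokens, nheads, headdim); cu_seqlens marks the batch boundaries.
seqlens = torch.tensor([512, 1024], dtype=torch.int32)
cu_seqlens = torch.nn.functional.pad(torch.cumsum(seqlens, 0, dtype=torch.int32), (1, 0)).to("xpu")
qp, kp, vp = (torch.randn(1536, 16, 64, device="xpu", dtype=torch.float16) for _ in range(3))
out_varlen = flash_attn_varlen_func(
    qp, kp, vp,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=1024, max_seqlen_k=1024,
    causal=True,
)
```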
I see that the current implementation of the kernel does not expose an external interface, and the test files are quite limited in scope. If possible, perhaps we can jointly maintain a common API, similar to what Dao-AILab/flash-attention does. Subsequent unit tests can be based on `test_flash_attn.py` to remain consistent with the official CUDA interface. During the reproduction I found some edge-case issues when running tests, so I modified `xe_flash_attn_prefill_epilogue.hpp` and `xe_flash_attn_prefill.hpp`. If you are interested, we can discuss this further. If there are any issues with the current code, please let me know. Thanks!
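To illustrate what tests modeled on `test_flash_attn.py` could look like, here is a minimal sketch that checks the fwd path against a plain PyTorch attention reference; the device string and tolerances are assumptions, not something taken from this PR:

```python
# Minimal correctness check in the spirit of Dao-AILab's test_flash_attn.py.
# The "xpu" device and the tolerances are assumptions for illustration only.
import math
import torch
from flash_attn import flash_attn_func

def reference_attention(q, k, v, causal=True):
    # q, k, v: (batch, seqlen, nheads, headdim); compute softmax(QK^T / sqrt(d)) V in fp32.
    q, k, v = [t.transpose(1, 2).float() for t in (q, k, v)]  # -> (batch, nheads, seqlen, headdim)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    if causal:
        mask = torch.triu(torch.ones(scores.shape[-2:], dtype=torch.bool, device=q.device), 1)
        scores = scores.masked_fill(mask, float("-inf"))
    return (scores.softmax(-1) @ v).transpose(1, 2)

def test_fwd_matches_reference():
    torch.manual_seed(0)
    q, k, v = (torch.randn(2, 256, 8, 64, device="xpu", dtype=torch.float16) for _ in range(3))
    out = flash_attn_func(q, k, v, causal=True)
    ref = reference_attention(q, k, v, causal=True)
    assert torch.allclose(out.float(), ref, atol=2e-2, rtol=2e-2)
```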
Current method to build the API:
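The build could follow the usual torch C++ extension flow; a hypothetical `setup.py` sketch, where the module name, source list, and compiler flag are placeholders rather than this PR's actual configuration:

```python
# Hypothetical setup.py skeleton for building the SYCL kernel as a torch
# extension; module name, sources, and flags below are placeholders only.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name="flash_attn",
    ext_modules=[
        CppExtension(
            name="flash_attn_2_xpu",          # placeholder extension module name
            sources=["csrc/flash_api.cpp"],   # placeholder binding source
            extra_compile_args=["-fsycl"],    # assumes a SYCL-capable compiler (e.g. icpx)
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```

With a layout like this, `pip install -e .` would build the extension in place.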
After that, you can run tests using the same import statements as in `test_flash_attn.py`.
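For example, the test-side imports would stay identical to the upstream CUDA test file (assuming the XPU package keeps the same module layout):

```python
# Same imports as the upstream test_flash_attn.py; only the device under test changes.
from flash_attn import flash_attn_func, flash_attn_varlen_func
from flash_attn.bert_padding import pad_input, unpad_input
```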