
Add python API for flash-attn #558

Closed
YangKai0616 wants to merge 3 commits into intel:main from YangKai0616:main

Conversation


@YangKai0616 YangKai0616 commented Oct 13, 2025

This PR provides a template for a flash-attn Python API. Currently, based on a sycl-tla kernel, I have reproduced part of flash-attn’s functionality, such as fwd and varlen_fwd.
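To make the semantics concrete, here is a minimal NumPy reference for what the fwd path computes, assuming standard scaled-dot-product attention (softmax(Q Kᵀ / √d) V) with the (batch, seqlen, nheads, headdim) layout used by Dao-AILab's flash-attn. This is a semantic sketch for testing against, not the fused sycl-tla kernel itself:

```python
import numpy as np

def attention_ref(q, k, v, causal=False):
    """Naive reference attention; shapes are (batch, seqlen, nheads, headdim)."""
    b, s, h, d = q.shape
    scale = 1.0 / np.sqrt(d)
    # (batch, nheads, q_len, k_len) score matrix
    scores = np.einsum("bqhd,bkhd->bhqk", q, k) * scale
    if causal:
        # Mask out keys that lie in the future of each query position
        mask = np.triu(np.ones((s, s), dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(scores)
    p = p / p.sum(axis=-1, keepdims=True)
    return np.einsum("bhqk,bkhd->bqhd", p, v)
```

A reference like this is also how test_flash_attn.py-style unit tests typically validate kernel outputs (compare against the naive computation within a tolerance).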

I see that the current implementation of the kernel does not expose an external interface, and the test files are quite limited in scope. If possible, perhaps we could jointly maintain a common API, similar to what Dao-AILab/flash-attention does. Subsequent unit tests could be based on test_flash_attn.py to stay consistent with the official CUDA interface.

During the reproduction I ran into some edge-case issues when running tests. As a result, I modified xe_flash_attn_prefill_epilogue.hpp and xe_flash_attn_prefill.hpp.

If you are interested, we can discuss this further. If there are any issues with the current code, please let me know. Thanks!

Current method to build the API:

cd /workspace/sycl-tla/examples/06_bmg_flash_attention/flash-attn
CUTLASS_SYCL_SRC_DIR=/workspace/sycl-tla pip install --no-build-isolation .

After that, you can run tests using the same import statements as in test_flash_attn.py.
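For the varlen_fwd path, the input layout matters more than the call itself. Below is a hedged sketch of how variable-length batches are typically packed for a varlen-style call, assuming the same convention as Dao-AILab's flash_attn_varlen_func: sequences are concatenated into one (total_tokens, nheads, headdim) tensor and cu_seqlens holds cumulative start offsets. The helper name is illustrative, not one of the PR's actual functions, and NumPy stands in for torch tensors:

```python
import numpy as np

def pack_varlen(seqs):
    """Pack a list of (seqlen_i, nheads, headdim) arrays for a varlen call.

    Returns the packed tensor, the cumulative-offset array cu_seqlens,
    and the maximum sequence length in the batch.
    """
    lens = [s.shape[0] for s in seqs]
    # cu_seqlens[i] is where sequence i starts in the packed tensor;
    # cu_seqlens[-1] equals the total token count.
    cu_seqlens = np.cumsum([0] + lens).astype(np.int32)
    packed = np.concatenate(seqs, axis=0)  # (total_tokens, nheads, headdim)
    return packed, cu_seqlens, max(lens)
```

With this layout, a varlen kernel avoids padding entirely: each sequence is sliced out of the packed tensor via consecutive cu_seqlens offsets.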


yao-matrix commented Oct 15, 2025

@rolandschulz, could you help review? Thanks very much. For context: we are integrating the flash-attention-2 kernel into Hugging Face, so the API needs to align with CUDA's to make integration easy. Thanks!

@Antonyvance Antonyvance added the wontfix (This will not be worked on), redesign required (Implementation requires a redesign), and information required (The PR requires more information to review it properly) labels Oct 17, 2025
@Antonyvance

Need to figure out how to place the Python package, the torch extension, and the associated Triton kernel. Let's hold this since it requires a redesign.


tdeng5 commented Mar 31, 2026

We have refined the Flash Attention implementation. If you still need this PR, please update it based on the latest source code.

@YangKai0616
Author

> We have refined the Flash Attention implementation. If you still need this PR, please update it based on the latest source code.

Got it, thx!

@YangKai0616 YangKai0616 closed this Apr 1, 2026
