
Why are there two different splash attention kernels? #361


Description

@cthi

I noticed you can use splash attention either from jax-ml (flash) or from tokamax (tokamax_flash). I'm wondering why there are two different sources for it, and which one is recommended? Glancing at the two versions, the tokamax one seems to be a bit more up to date, but I wanted to double-check. Thanks.
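
For reference, by the jax-ml version I mean the Pallas splash attention kernel that ships with JAX (`jax.experimental.pallas.ops.tpu.splash_attention`). A rough sketch of how that kernel is typically wired up is below; the shapes, block sizes, and the plain `vmap` over the batch are illustrative assumptions on my part, not necessarily how this repo invokes it:

```python
# Minimal sketch of the jax-ml splash attention path (TPU-only Pallas kernel).
# Shapes here are illustrative; default block sizes are used when none are given.
import jax
import jax.numpy as jnp
from jax.experimental.pallas.ops.tpu.splash_attention import splash_attention_kernel
from jax.experimental.pallas.ops.tpu.splash_attention import splash_attention_mask

num_heads, seq_len, head_dim = 8, 1024, 128

# Per-head causal mask, replicated across all heads.
causal = splash_attention_mask.CausalMask(shape=(seq_len, seq_len))
mha_mask = splash_attention_mask.MultiHeadMask(masks=(causal,) * num_heads)

# Build the fused multi-head attention kernel.
kernel = splash_attention_kernel.make_splash_mha(
    mask=mha_mask, head_shards=1, q_seq_shards=1
)

# The kernel expects [num_heads, seq_len, head_dim] per example; vmap over batch.
q = jnp.zeros((2, num_heads, seq_len, head_dim), jnp.bfloat16)
k = jnp.zeros((2, num_heads, seq_len, head_dim), jnp.bfloat16)
v = jnp.zeros((2, num_heads, seq_len, head_dim), jnp.bfloat16)
out = jax.vmap(kernel)(q * (head_dim ** -0.5), k, v)  # q is scaled before the kernel
```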
