Flash Attention on hipBLAS #143

This is more of a request for info than an actual issue. Running the default koboldCPP build with the Vulkan backend and Flash Attention enabled completely tanks processing speed. I noticed that the same thing happens with the hipBLAS version, which is a bit surprising. Is there any workaround for this? I'd like to experiment with KV cache quantization, but quantizing the V cache requires Flash Attention. I haven't tested K cache quantization on hipBLAS yet, but on Vulkan, at least, quantizing the K cache doesn't save any memory. (My GPU is an RX 6650 XT and I'm running 64-bit Windows 10. I mention this because, as I understand it, some of this works differently on Windows than on Linux.)