Commit 3191462
vulkan: improve partial offloading performance on AMD (ggml-org#19976)
* vulkan: fix and enable cpy_tensor_async function
* use transfer_queue for async transfers on AMD, synchronize with timeline semaphore
* update offload_op logic
* fix missing transfer submission
* disable async transfer queue on AMD GCN
* revert op batch size change
* fix cpy_tensor_async checks
1 parent 66d65ec commit 3191462
1 file changed
Lines changed: 177 additions & 86 deletions
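The commit message mentions running async transfers on a separate transfer queue and synchronizing with a timeline semaphore. As a rough illustration of that ordering idea only, here is a CPU-side toy model. It is not Vulkan API code and not the code from this commit. It assumes a timeline semaphore behaves as a monotonically increasing 64-bit counter, and every name in it (`Submission`, `run_queues`, `example_order`, the "layer" labels) is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for one queue submission.
struct Submission {
    uint64_t wait_value;    // may run once the timeline counter >= this (0 = no wait)
    uint64_t signal_value;  // counter value set on completion (0 = no signal)
    std::string name;
};

// Drains both queues with a single shared timeline counter and returns the
// order in which submissions ran. The compute queue is always tried first,
// so any copy-before-compute ordering comes from the wait values alone.
std::vector<std::string> run_queues(std::deque<Submission> transfer,
                                    std::deque<Submission> compute) {
    uint64_t timeline = 0;
    std::vector<std::string> order;

    auto try_run = [&](std::deque<Submission>& queue) {
        if (queue.empty() || queue.front().wait_value > timeline) {
            return false;  // queue is empty or its head is still blocked
        }
        const Submission s = queue.front();
        queue.pop_front();
        if (s.signal_value != 0) {
            assert(s.signal_value > timeline);  // timeline values only grow
            timeline = s.signal_value;
        }
        order.push_back(s.name);
        return true;
    };

    while (!transfer.empty() || !compute.empty()) {
        if (try_run(compute)) continue;
        if (try_run(transfer)) continue;
        break;  // nothing can make progress (unsatisfiable wait)
    }
    return order;
}

// Two weight uploads on the "transfer queue", each followed by a dependent
// operation on the "compute queue".
std::vector<std::string> example_order() {
    std::deque<Submission> transfer = {
        {0, 1, "copy layer 0"},
        {0, 2, "copy layer 1"},
    };
    std::deque<Submission> compute = {
        {1, 0, "compute layer 0"},
        {2, 0, "compute layer 1"},
    };
    return run_queues(transfer, compute);
}
```

In this model, `example_order()` returns `copy layer 0`, `compute layer 0`, `copy layer 1`, `compute layer 1`. Each compute step is held back until the copy it depends on has signaled, and the second copy does not have to wait for the first compute step to be submitted.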