[ExecuTorch][WebGPU] Add update_cache op (llama.update_cache) #20083
JulianCloudNTH wants to merge 6 commits into
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20083
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 Unclassified Failure
As of commit 53c3eb6 with merge base ff2bf9c:
UNCLASSIFIED FAILURE - DrCI could not classify the following job because the workflow did not run on the merge base. The failure may be pre-existing on trunk or introduced by this PR:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
@claude review
Claude finished @JulianCloudNTH's task in 1m 56s. Code Review:
Stack from ghstack (oldest at bottom):
Add `llama.update_cache.default`: an in-place KV-cache write. The shader scatters the new K/V (`[1,S,H,D]`) into the cache (`[1,Cmax,H,D]`) at `dst_offset = input_pos*n_heads*head_dim`, bounds-checked against the cache size. The handler validates shape (batch==1, matching n_heads/head_dim) and sizes the 1D dispatch from the device limit via `WebGPUUtils` before allocating. Mirrors the Vulkan `sdpa_kv_cache_update` reference. The export/delegation test is the follow-up diff stacked directly above. Authored with assistance from Claude.

@exported-using-ghexport
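For reviewers, the intended semantics can be modeled on the CPU with a small Python reference. This is a sketch, not the shader or handler code from this PR. The function names `update_cache_reference` and `workgroups_for`, the workgroup size of 64, and the choice to silently skip out-of-range writes are all assumptions made for illustration. The 65535 default matches WebGPU's default `maxComputeWorkgroupsPerDimension` limit; how `WebGPUUtils` actually queries and applies the device limit is not shown here.

```python
def update_cache_reference(value, cache, input_pos, seq_len, n_heads, head_dim):
    """CPU model of the scatter (hypothetical helper, not the PR's code).

    value: flat list holding a [1, S, H, D] tensor in row-major order.
    cache: flat list holding a [1, Cmax, H, D] tensor; mutated in place.
    """
    row = n_heads * head_dim  # elements per sequence position
    if len(value) != seq_len * row:
        raise ValueError("value does not match [1, S, H, D]")
    if len(cache) % row != 0:
        raise ValueError("cache does not match [1, Cmax, H, D]")

    dst_offset = input_pos * row
    # Each loop iteration stands in for one shader invocation.
    for i in range(len(value)):
        dst = dst_offset + i
        # Bounds check against the cache size. Skipping the write is an
        # assumption about what the shader does for out-of-range indices.
        if dst < len(cache):
            cache[dst] = value[i]
    return cache


def workgroups_for(numel, workgroup_size=64, max_workgroups=65535):
    """Size a 1D dispatch (hypothetical helper; parameters are assumptions)."""
    count = (numel + workgroup_size - 1) // workgroup_size  # ceiling division
    if count > max_workgroups:
        raise ValueError("dispatch exceeds the per-dimension workgroup limit")
    return count
```

With `S=2`, `H=2`, `D=3` and `input_pos=1`, the 12 new elements land at flat indices 6 through 17 of the cache, and the rest of the cache is left untouched.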
Differential Revision: D107547308