[ExecuTorch][WebGPU] Add update_cache tests (native numeric + export) by JulianCloudNTH · Pull Request #20084 · pytorch/executorch

JulianCloudNTH · 2026-06-06T07:14:48Z

Stack from ghstack (oldest at bottom):

[ExecuTorch][WebGPU] SDPA test suite: replay + dynamic input_pos + in-graph KV cache #20087
[ExecuTorch][WebGPU] Add fused SDPA (sdpa_with_kv_cache) with dynamic input_pos #20086
[ExecuTorch][WebGPU] SymInt live-scalar mechanism + et_vk.select_as_symint #20085
-> [ExecuTorch][WebGPU] Add update_cache tests (native numeric + export) #20084
[ExecuTorch][WebGPU] Add update_cache op (llama.update_cache) #20083
[ExecuTorch][WebGPU] Add per-pass dispatch ordering + scratch buffer tests #20080
[ExecuTorch][WebGPU] Switch native backend from wgpu-native to Dawn (Tint) + SwiftShader #20079
[ExecuTorch][WebGPU] Graph-owned scratch buffers for fused-op intermediates #20073
[ExecuTorch][WebGPU] Per-pass compute dispatch ordering for fused multi-dispatch ops #20072

Tests for `llama.update_cache.default`, stacked on the op diff below. `test/ops/sdpa/test_update_cache.py` lowers the op through `VulkanPartitioner` (asserting it delegates to VulkanBackend) and exports per-case `.pte` files; `test/native/test_update_cache.cpp` runs them on-GPU and compares the returned cache against an integer-exact scatter golden. Coverage mirrors the Vulkan KV-cache test (`VulkanSDPATest`): single-shot writes at varied shapes and offsets, plus a multi-step replay with an advancing `input_pos` that threads the returned cache across steps over the same GQA parameter sets (including llama3 `head_dim=128`). Comparing the cache directly is a stronger check than the Vulkan test, which verifies the cache only indirectly via the SDPA output. Authored with assistance from Claude.
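As a rough illustration of the integer-exact scatter golden and the multi-step replay described above, here is a minimal C++ sketch of a CPU reference. It is not taken from `test_update_cache.cpp`: the helper names `golden_update_cache` and `golden_replay`, the assumed layout (batch 1, row-major `[max_seq_len, num_heads, head_dim]`), and the `1, 2, 3, ...` fill pattern are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CPU reference, not the PR's code. Assumes batch == 1 and a
// row-major cache of shape [max_seq_len, num_heads, head_dim].
std::vector<float> golden_update_cache(
    std::vector<float> cache,
    const std::vector<float>& value,
    std::size_t seq_len,
    std::size_t num_heads,
    std::size_t head_dim,
    std::size_t start_pos) {
  const std::size_t row = num_heads * head_dim; // elements per token
  assert(value.size() == seq_len * row);
  assert((start_pos + seq_len) * row <= cache.size());
  // Scatter: token s of `value` lands at sequence slot start_pos + s.
  for (std::size_t s = 0; s < seq_len; ++s) {
    for (std::size_t i = 0; i < row; ++i) {
      cache[(start_pos + s) * row + i] = value[s * row + i];
    }
  }
  return cache;
}

// Replay: thread the returned cache through successive steps, advancing the
// write position by each step's seq_len. Values are small integers stored as
// floats, so an exact == comparison against device output is meaningful.
std::vector<float> golden_replay(
    std::size_t max_seq_len,
    std::size_t num_heads,
    std::size_t head_dim,
    const std::vector<std::size_t>& seq_lens) {
  const std::size_t row = num_heads * head_dim;
  std::vector<float> cache(max_seq_len * row, 0.0f);
  std::size_t input_pos = 0;
  float next = 1.0f; // assumed fill pattern: 1, 2, 3, ...
  for (std::size_t seq_len : seq_lens) {
    std::vector<float> value(seq_len * row);
    for (float& v : value) {
      v = next;
      next += 1.0f;
    }
    cache = golden_update_cache(
        std::move(cache), value, seq_len, num_heads, head_dim, input_pos);
    input_pos += seq_len;
  }
  return cache;
}
```

Under these assumptions, a native test would compare the cache returned by the device with `golden_replay(...)` element by element using `==`, with no tolerance.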
@exported-using-ghexport

Differential Revision: D107547307

[ghstack-poisoned]

pytorch-bot · 2026-06-06T07:14:52Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20084

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Unclassified Failure

As of commit c5ff88f with merge base ff2bf9c:

UNCLASSIFIED FAILURE - DrCI could not classify the following job because the workflow did not run on the merge base. The failure may be pre-existing on trunk or introduced by this PR:

Test WebGPU Native (Dawn) / test-webgpu-native / linux-job (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
/pytorch/executorch/backends/webgpu/test/native/test_scratch_buffer.cpp:67:3: error: use of undeclared identifier 'webgpu_wait'

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-06-06T07:15:44Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

[ghstack-poisoned]

Pull Request resolved: #20084 (commit message repeats the PR description above). ghstack-source-id: 391097890 @exported-using-ghexport Differential Revision: [D107547307](https://our.internmc.facebook.com/intern/diff/D107547307/)

JulianCloudNTH · 2026-06-08T18:16:43Z

@claude review

claude · 2026-06-08T18:17:37Z

Claude finished @JulianCloudNTH's task in 1m 31s

Code Review: WebGPU update_cache tests

Read and understand all changed files
Review CMakeLists.txt changes
Review native C++ test file
Review Python export test file
Post review feedback

Overall this is a well-structured, thorough test suite. The coverage across single-shot writes, multi-step replay, and negative/guard-violation cases is solid. A few observations:

`backends/webgpu/test/native/test_update_cache.cpp`

1. Short-circuit on failure (line 263): `ok = run_case(dir, tc) && ok` always evaluates `run_case` first, so every case still runs after an earlier failure; only the read of `ok` on the right-hand side can be skipped. If the operands were accidentally swapped to `ok && run_case(...)`, later cases would be skipped once `ok` is false. The current form is correct but might trip someone up in a refactor. A slightly more explicit pattern would be `if (!run_case(...)) ok = false;`, though this is a stylistic nit, not a bug.
2. `std::vector` in `ReplayCase` prevents `constexpr` (lines 116-119): `kCases` is `constexpr` but `kReplays` (line 266) cannot be, because of `std::vector<int> seq_lens`. This is fine for a test; I am only noting the asymmetry. A `std::initializer_list<int>` or a fixed-size array plus a count would allow `constexpr`, but that is not worth the complexity for a test binary.
3. Negative test assumes `load_forward` surfaces the error (lines 225-226): The test expects `DelegateInvalidCompatibility` from `module.load_forward()`. I confirmed the WebGPU backend does return this error from its init when `graph->build()` throws. This coupling is correct for the current implementation. Note that it is an integration-level assertion: if the backend changes how it surfaces guard failures, this test would need updating.
4. No cleanup of `WebGPUContext` on early exit (line 257): If `create_webgpu_context()` succeeds but a later `run_case` crashes (rather than returning false), the context leaks. Not a real concern for a test binary, but an RAII wrapper would be more robust. Minor nit.
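The operand-order point in the first item can be checked with a small C++ sketch. `fake_run_case` is a hypothetical stand-in for the test's `run_case` that only counts how often it is called; the three loop shapes are the ones discussed above, not code copied from the PR.

```cpp
#include <vector>

// Hypothetical stand-in for run_case: records that it ran, returns the outcome.
bool fake_run_case(bool passes, int& calls) {
  ++calls;
  return passes;
}

struct LoopResult {
  bool ok;   // overall verdict
  int calls; // how many cases actually ran
};

// Call on the left: every case runs; `ok` only accumulates the result.
LoopResult call_first(const std::vector<bool>& outcomes) {
  LoopResult r{true, 0};
  for (bool passes : outcomes) {
    r.ok = fake_run_case(passes, r.calls) && r.ok;
  }
  return r;
}

// Operands swapped: once ok is false, && short-circuits and cases are skipped.
LoopResult ok_first(const std::vector<bool>& outcomes) {
  LoopResult r{true, 0};
  for (bool passes : outcomes) {
    r.ok = r.ok && fake_run_case(passes, r.calls);
  }
  return r;
}

// Explicit form: the call is unconditional, so it survives operand reordering.
LoopResult explicit_if(const std::vector<bool>& outcomes) {
  LoopResult r{true, 0};
  for (bool passes : outcomes) {
    if (!fake_run_case(passes, r.calls)) {
      r.ok = false;
    }
  }
  return r;
}
```

With outcomes `{pass, fail, pass, pass}`, `call_first` and `explicit_if` run all four cases, while `ok_first` stops calling after the second; all three report overall failure.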

`backends/webgpu/test/ops/sdpa/test_update_cache.py`

5. Export helpers are not exercised by `unittest.main()` (lines 195-196): The file has two roles: (a) unittest cases (`TestUpdateCache`) that verify delegation, and (b) standalone export functions (`export_update_cache_cases`, `export_update_cache_replay`, `export_update_cache_negative`) that produce `.pte` files for the native test. This dual-purpose pattern is consistent with how the rest of the WebGPU tests work, but there is no indication of how the export functions get invoked (presumably a CI script or Makefile target). Consider adding a brief note or pointer in the module docstring about the invocation path.
6. `VulkanPartitioner` usage is intentional and consistent with other WebGPU op tests (e.g., `test_add.py`). The WebGPU backend reuses Vulkan's FlatBuffer format and partitioner, which is documented in the CMakeLists.txt comments.
7. `_NEGATIVE_CASES` comments (line 162): The inline comments `# batch must be 1` and `# fp32-only` are helpful documentation of the guards being tested. Good practice.

`backends/webgpu/CMakeLists.txt`

8. Clean integration (lines 128-131): The new test target follows the established `add_webgpu_native_test` pattern exactly. No issues.

Summary

This is clean, well-organized test code. No bugs or correctness issues found. The test design (a deterministic, integer-exact scatter with inline golden computation and no external golden files) is a good pattern that avoids flaky tolerance issues while still exercising the full export-to-GPU pipeline. The negative tests are a nice touch for ensuring graceful error handling.
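One reason an exact comparison can work here: a scatter only copies values, and IEEE-754 binary32 represents every integer of magnitude up to 2^24 exactly, so integer-valued test data involves no rounding. The sketch below illustrates that under the assumption that `float` is binary32; `integer_pattern` and `first_mismatch` are hypothetical helpers, not names from the PR.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// 2^digits, where digits is the float significand width (24 for binary32).
// Every integer in [0, kMaxExactInt] is exactly representable as a float.
constexpr int kMaxExactInt = 1 << std::numeric_limits<float>::digits;

// Hypothetical fill: small integers (0 .. modulus-1, repeating) stored as floats.
std::vector<float> integer_pattern(std::size_t n, std::size_t modulus) {
  std::vector<float> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = static_cast<float>(i % modulus);
  }
  return out;
}

// Exact, tolerance-free comparison. Returns the index of the first differing
// element, the shorter length if only the sizes differ, or -1 on a full match.
std::ptrdiff_t first_mismatch(
    const std::vector<float>& got, const std::vector<float>& want) {
  const std::size_t n = got.size() < want.size() ? got.size() : want.size();
  for (std::size_t i = 0; i < n; ++i) {
    if (got[i] != want[i]) {
      return static_cast<std::ptrdiff_t>(i);
    }
  }
  return got.size() == want.size() ? -1 : static_cast<std::ptrdiff_t>(n);
}
```

Reporting the first mismatching index, rather than a bare pass or fail, tends to make a failing scatter easier to localize to a sequence slot.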

Verdict: Looks good to merge. The only actionable suggestion is (5), documenting how the export helper functions get called in CI, and that could also live in a follow-up or in the CI script itself.

[ghstack-poisoned]

Pull Request resolved: #20084 (commit message repeats the PR description above). ghstack-source-id: 391115052 @exported-using-ghexport Differential Revision: [D107547307](https://our.internmc.facebook.com/intern/diff/D107547307/)

[ghstack-poisoned]

Update

133e019

[ghstack-poisoned]

JulianCloudNTH requested review from kirklandsign and larryliu0820 as code owners June 6, 2026 07:14

meta-cla Bot added the CLA Signed label Jun 6, 2026

Update

e1c5b46

[ghstack-poisoned]

meta-codesync Bot added the meta-exported label Jun 8, 2026

Update

139fc65

[ghstack-poisoned]

SS-JIA approved these changes Jun 8, 2026

JulianCloudNTH added 4 commits June 8, 2026 22:26

Update

44e9b27

[ghstack-poisoned]

Update

947cfbb

[ghstack-poisoned]

Update

156937d

[ghstack-poisoned]

Update

c5ff88f

[ghstack-poisoned]

JulianCloudNTH wants to merge 7 commits into gh/JulianCloudNTH/17/base from gh/JulianCloudNTH/17/head

3 participants
