[ExecuTorch][WebGPU] Add update_cache tests (native numeric + export) #20084

Open
JulianCloudNTH wants to merge 7 commits into gh/JulianCloudNTH/17/base from gh/JulianCloudNTH/17/head

JulianCloudNTH (Contributor) commented Jun 6, 2026

Stack from ghstack (oldest at bottom):

Tests for `llama.update_cache.default`, stacked on the op diff below. `test/ops/sdpa/test_update_cache.py` lowers the op through `VulkanPartitioner` (asserting it delegates to VulkanBackend) and exports per-case `.pte`s; `test/native/test_update_cache.cpp` runs them on-GPU and checks an integer-exact scatter golden against the returned cache. Coverage mirrors the Vulkan KV-cache test (`VulkanSDPATest`): single-shot writes at varied shapes/offsets, plus a multi-step advancing-input_pos replay that threads the returned cache across steps over the same GQA param sets (incl. llama3 head_dim=128). Comparing the cache directly is a stronger check than the Vulkan test, which verifies the cache only indirectly via the SDPA output. Authored with assistance from Claude.
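For readers unfamiliar with the op, the golden described above can be sketched as a plain host-side scatter. This is a minimal sketch and not the code in `test_update_cache.cpp`: the row-major `[1, max_seq_len, n_kv_heads, head_dim]` layout, the `CacheShape` struct, and the names `integer_fill`, `scatter_golden`, and `replay_golden` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Assumed layout (not confirmed by the PR text): cache is
// [1, max_seq_len, n_kv_heads, head_dim] and value is
// [1, seq_len, n_kv_heads, head_dim], both row-major fp32.
struct CacheShape {
  std::size_t max_seq_len;
  std::size_t n_kv_heads;
  std::size_t head_dim;
};

// Small integers only, so every element is exactly representable in fp32.
std::vector<float> integer_fill(std::size_t n, int base) {
  std::vector<float> v(n);
  for (std::size_t i = 0; i < n; ++i) {
    v[i] = static_cast<float>(base + static_cast<int>(i % 97));
  }
  return v;
}

// Single-shot golden: copy seq_len rows of `value` into `cache`,
// starting at row `input_pos`, and return the updated cache.
std::vector<float> scatter_golden(
    std::vector<float> cache,
    const std::vector<float>& value,
    const CacheShape& s,
    std::size_t seq_len,
    std::size_t input_pos) {
  const std::size_t row = s.n_kv_heads * s.head_dim;
  assert(cache.size() == s.max_seq_len * row);
  assert(value.size() == seq_len * row);
  assert(input_pos + seq_len <= s.max_seq_len);
  for (std::size_t i = 0; i < seq_len * row; ++i) {
    cache[input_pos * row + i] = value[i];
  }
  return cache;
}

// Multi-step replay: thread the returned cache through successive steps,
// advancing input_pos by each step's seq_len.
std::vector<float> replay_golden(
    const CacheShape& s,
    const std::vector<std::size_t>& seq_lens) {
  const std::size_t row = s.n_kv_heads * s.head_dim;
  std::vector<float> cache(s.max_seq_len * row, 0.0f);
  std::size_t input_pos = 0;
  int step = 0;
  for (std::size_t seq_len : seq_lens) {
    std::vector<float> value = integer_fill(seq_len * row, 100 * (step + 1));
    cache = scatter_golden(std::move(cache), value, s, seq_len, input_pos);
    input_pos += seq_len;
    ++step;
  }
  return cache;
}
```

Because the fill values are small integers, a native test built along these lines could compare the returned cache to the golden with `==` rather than a tolerance.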
@exported-using-ghexport

Differential Revision: D107547307

[ghstack-poisoned]

pytorch-bot Bot commented Jun 6, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20084

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Unclassified Failure

As of commit c5ff88f with merge base ff2bf9c (image):

UNCLASSIFIED FAILURE - DrCI could not classify the following job because the workflow did not run on the merge base. The failure may be pre-existing on trunk or introduced by this PR:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions Bot commented Jun 6, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

[ghstack-poisoned]
JulianCloudNTH added a commit that referenced this pull request Jun 8, 2026
Pull Request resolved: #20084

ghstack-source-id: 391097890
@exported-using-ghexport

Differential Revision: [D107547307](https://our.internmc.facebook.com/intern/diff/D107547307/)

JulianCloudNTH (Contributor Author) commented

@claude review

claude Bot commented Jun 8, 2026

Claude finished @JulianCloudNTH's task in 1m 31s


Code Review: WebGPU update_cache tests

  • Read and understand all changed files
  • Review CMakeLists.txt changes
  • Review native C++ test file
  • Review Python export test file
  • Post review feedback

Overall this is a well-structured, thorough test suite. The coverage across single-shot writes, multi-step replay, and negative/guard-violation cases is solid. A few observations:

backends/webgpu/test/native/test_update_cache.cpp

  1. Failure accumulation (line 263): `ok = run_case(dir, tc) && ok` does not skip any cases. `run_case` is the left operand of `&&`, so it is always evaluated, and a failure only clears `ok`. If the operands were swapped to `ok && run_case(...)`, short-circuit evaluation would skip every case after the first failure. The current form is correct but easy to break in a refactor; `if (!run_case(...)) ok = false;` makes the intent explicit. This is a stylistic nit, not a bug.

  2. `std::vector` in `ReplayCase` prevents `constexpr` (lines 116-119): `kCases` is `constexpr`, but `kReplays` (line 266) cannot be, because `ReplayCase` holds a `std::vector<int> seq_lens`. This is fine for a test; just noting the asymmetry. A `std::initializer_list<int>` or a fixed-size array plus a count would allow `constexpr`, but it is not worth the complexity for a test binary.

  3. Negative test assumes load_forward surfaces the error (lines 225-226): The test expects DelegateInvalidCompatibility from module.load_forward(). I confirmed the WebGPU backend does return this error from its init when graph->build() throws. This coupling is correct for the current implementation — just noting it's an integration-level assertion (if the backend changes how it surfaces guard failures, this test would need updating).

  4. No cleanup of WebGPUContext on abnormal exit (line 257): If `create_webgpu_context()` succeeds but a later `run_case` crashes (actually crashes, rather than returning false), the context leaks. Not a real concern for a test binary. An RAII wrapper would make cleanup robust against early returns and exceptions, though not against a hard crash, which skips destructors either way. Minor nit.
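
Items 1, 2, and 4 above can be sketched together in a small runner. This is an illustrative sketch only, not the code under review: `FakeContext`, `create_fake_context`, `destroy_fake_context`, `ContextGuard`, `run_all`, the capacity of 8, and the two table entries are hypothetical stand-ins, since the real `WebGPUContext` API and the real `ReplayCase` fields are not shown in this thread.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the WebGPU context; the real type and its
// create/destroy functions are not shown in this thread.
struct FakeContext {
  bool* released;
};
inline FakeContext create_fake_context(bool* released) {
  *released = false;
  return FakeContext{released};
}
inline void destroy_fake_context(FakeContext& ctx) {
  *ctx.released = true;
}

// RAII guard: releases the context on every normal exit path
// (early return or exception). A hard crash still skips destructors.
class ContextGuard {
 public:
  explicit ContextGuard(FakeContext ctx) : ctx_(ctx) {}
  ~ContextGuard() { destroy_fake_context(ctx_); }
  ContextGuard(const ContextGuard&) = delete;
  ContextGuard& operator=(const ContextGuard&) = delete;

 private:
  FakeContext ctx_;
};

// Fixed-capacity array plus a count keeps the table constexpr,
// unlike a std::vector<int> member.
struct ReplayCase {
  const char* name;
  std::array<int, 8> seq_lens;
  std::size_t num_steps;
};

constexpr ReplayCase kReplays[] = {
    {"prefill_then_decode", {4, 1, 1}, 3},
    {"decode_only", {1, 1}, 2},
};

// Explicit accumulation: every case runs, and any failure clears `ok`.
template <typename RunCase>
bool run_all(RunCase run_case, int* executed) {
  assert(executed != nullptr);
  bool ok = true;
  for (const ReplayCase& rc : kReplays) {
    ++*executed;
    if (!run_case(rc)) {
      ok = false;
    }
  }
  return ok;
}
```

A `main` built on this would construct a `ContextGuard` right after context creation and return `run_all(...) ? 0 : 1`, so the context is released on every return path without explicit cleanup calls.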

backends/webgpu/test/ops/sdpa/test_update_cache.py

  1. Export helpers are not exercised by `unittest.main()` (lines 195-196): The file has two roles: (a) unittest cases (`TestUpdateCache`) that verify delegation, and (b) standalone export functions (`export_update_cache_cases`, `export_update_cache_replay`, `export_update_cache_negative`) that produce `.pte` files for the native test. This dual-purpose pattern is consistent with how the rest of the WebGPU tests work, but nothing indicates how the export functions get invoked (presumably a CI script or Makefile target). Consider adding a brief note or pointer in the module docstring about the invocation path.

  2. VulkanPartitioner usage is intentional and consistent with other WebGPU op tests (e.g., test_add.py). The WebGPU backend reuses Vulkan's FlatBuffer format and partitioner, which is documented in the CMakeLists.txt comments.

  3. `_NEGATIVE_CASES` comments (line 162): The inline comments `# batch must be 1` and `# fp32-only` are helpful documentation of the guards being tested. Good practice.

backends/webgpu/CMakeLists.txt

  1. Clean integration (lines 128-131): The new test target follows the established add_webgpu_native_test pattern exactly. No issues.

Summary

This is clean, well-organized test code. No bugs or correctness issues found. The test design — deterministic integer-exact scatter with inline golden computation, no external golden files — is a good pattern that avoids flaky tolerance issues while still exercising the full export-to-GPU pipeline. The negative tests are a nice touch for ensuring graceful error handling.
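
The reason an integer-exact golden needs no tolerance can be sketched as follows. This is a hedged illustration, not code from the PR: `kMaxExactFp32Int`, `is_exact_in_fp32`, `to_fp32_exact`, and `first_mismatch` are hypothetical helper names. It relies only on the fact that IEEE-754 binary32 has a 24-bit significand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// fp32 has a 24-bit significand, so every integer with |v| <= 2^24
// converts to float and back without rounding.
constexpr std::int64_t kMaxExactFp32Int = std::int64_t{1} << 24;

constexpr bool is_exact_in_fp32(std::int64_t v) {
  return v >= -kMaxExactFp32Int && v <= kMaxExactFp32Int;
}

// Converts a test value to fp32, asserting that no rounding can occur.
inline float to_fp32_exact(std::int64_t v) {
  assert(is_exact_in_fp32(v));
  return static_cast<float>(v);
}

// Returns the index of the first element that differs, or -1 when the
// buffers compare equal element by element. If one buffer is a strict
// prefix of the other, the shorter length is reported.
std::ptrdiff_t first_mismatch(
    const std::vector<float>& got,
    const std::vector<float>& want) {
  const std::size_t n = got.size() < want.size() ? got.size() : want.size();
  for (std::size_t i = 0; i < n; ++i) {
    if (got[i] != want[i]) {  // exact compare, no tolerance
      return static_cast<std::ptrdiff_t>(i);
    }
  }
  return got.size() == want.size() ? -1 : static_cast<std::ptrdiff_t>(n);
}
```

Since a scatter only copies values and performs no arithmetic, any mismatch reported by a comparison like this points at a wrong index or a missed write rather than at rounding.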

Verdict: Looks good to merge. The only actionable suggestion is the first item under backends/webgpu/test/ops/sdpa/test_update_cache.py: documenting how the export helper functions get called in CI. That could also live in a follow-up or in the CI script itself.


[ghstack-poisoned]
JulianCloudNTH added a commit that referenced this pull request Jun 8, 2026
Pull Request resolved: #20084

ghstack-source-id: 391115052
@exported-using-ghexport

Differential Revision: [D107547307](https://our.internmc.facebook.com/intern/diff/D107547307/)
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
Labels: CLA Signed, meta-exported

3 participants