Commit 0a8917e
[ExecuTorch][WebGPU] SDPA test suite: replay + dynamic input_pos + in-graph KV cache
Pull Request resolved: #20087
Adds the WebGPU SDPA test coverage as its own diff, stacked on the SDPA op (which already carries the dynamic-`input_pos` consumption) and the SymInt mechanism below it. The suite covers three cases: multi-step prefill->mt->decode replay, runtime-dynamic `input_pos` (autoregressive decode), and an in-graph mutable KV cache. Each is compared against a torch `F.scaled_dot_product_attention` golden.
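The in-graph cache case can be illustrated with a toy, pure-Python sketch. It is not the test code: `ToyDecodeCache`, `TOKENS`, `golden`, and `decode` are hypothetical names, the data is made up, and a single head with no batching is assumed. It only shows why a cache that persists across steps matches a full-history reference while a fresh cache per step does not.

```python
import math

def sdpa(q, keys, values):
    # Single-query scaled dot-product attention: softmax(q.k / sqrt(d)) . v
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [
        sum(w * v[d] for w, v in zip(weights, values)) / total
        for d in range(len(values[0]))
    ]

class ToyDecodeCache:
    # Hypothetical stand-in for a mutable KV cache; not the test's DecodeCacheModule.
    def __init__(self, max_len, dim):
        self.k = [[0.0] * dim for _ in range(max_len)]
        self.v = [[0.0] * dim for _ in range(max_len)]

    def step(self, q, k_new, v_new, input_pos):
        # Write the new token at input_pos, then attend over slots 0..input_pos.
        self.k[input_pos] = list(k_new)
        self.v[input_pos] = list(v_new)
        n = input_pos + 1
        return sdpa(q, self.k[:n], self.v[:n])

# Made-up (q, k, v) rows for three decode steps.
TOKENS = [
    ([1.0, 0.0], [1.0, 0.0], [1.0, 2.0]),
    ([0.0, 1.0], [0.0, 1.0], [3.0, 4.0]),
    ([1.0, 1.0], [1.0, 1.0], [5.0, 6.0]),
]

def golden(tokens):
    # Reference: each step attends over the full history up to and including itself.
    outs = []
    for pos, (q, _, _) in enumerate(tokens):
        keys = [k for _, k, _ in tokens[: pos + 1]]
        values = [v for _, _, v in tokens[: pos + 1]]
        outs.append(sdpa(q, keys, values))
    return outs

def decode(tokens, persistent):
    # persistent=True keeps one cache across steps; False rebuilds it every step.
    cache = ToyDecodeCache(len(tokens), 2)
    outs = []
    for pos, (q, k, v) in enumerate(tokens):
        if not persistent:
            cache = ToyDecodeCache(len(tokens), 2)
        outs.append(cache.step(q, k, v, pos))
    return outs

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

With a persistent cache, `decode(TOKENS, True)` agrees with `golden(TOKENS)`. With `persistent=False`, every step after the first attends over zeroed slots and drifts from the reference, which is the kind of divergence a fresh-state control is meant to expose.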
- `test/ops/sdpa/test_sdpa.py`: `ReplaySeq`/`REPLAY_SEQS` + per-step replay export/golden; `DynamicSdpaModule` + `export_dynamic_decode` (one `.pte`, `input_pos` supplied at runtime as a SymInt); `DecodeCacheModule` + `export_incache_decode` (KV cache as `register_buffer` mutable buffers, so the cache persists in-graph and `forward()` takes only the new token + `input_pos`).
- `test/test_webgpu_native.cpp`: `test_sdpa_replay`, `test_sdpa_dynamic_decode` (+ negative control: a pinned `input_pos` diverges), `test_sdpa_incache_decode` (+ static control: a fresh Module per step diverges, proving in-graph accumulation is real), `test_symint_roundtrip`, `test_resize_hook`; shared per-element tolerance `sdpa_within_tol` (abs 1e-4 OR rel 1e-3).
- `test/test_build_webgpu.sh`: export the replay / dynamic / in-graph-cache models for the native test.
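The "abs 1e-4 OR rel 1e-3" rule can be sketched as a per-element check. This is an illustration in Python, not the C++ `sdpa_within_tol` itself: the name `within_tol` is hypothetical, and measuring the relative error against the expected (golden) value is an assumption, since the commit message does not say which operand is the reference.

```python
def within_tol(actual, expected, abs_tol=1e-4, rel_tol=1e-3):
    # Pass if EITHER the absolute or the relative bound holds.
    # Assumption: relative error is taken against the expected (golden) value.
    diff = abs(actual - expected)
    return diff <= abs_tol or diff <= rel_tol * abs(expected)

def all_within_tol(actual, expected):
    # Per-element comparison over two equal-length flat sequences.
    return len(actual) == len(expected) and all(
        within_tol(a, e) for a, e in zip(actual, expected)
    )
```

The OR lets the absolute bound cover values near zero, where relative error is meaningless, while the relative bound covers large magnitudes, where a fixed 1e-4 would be too strict for float32 output.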
ghstack-source-id: 391373155
@exported-using-ghexport
Differential Revision: [D107595144](https://our.internmc.facebook.com/intern/diff/D107595144/)
1 parent 7c2ed45 commit 0a8917e
3 files changed
Lines changed: 1607 additions & 0 deletions