diff --git a/backends/webgpu/README.md b/backends/webgpu/README.md
index c4886bbc64c..0efb7da279c 100644
--- a/backends/webgpu/README.md
+++ b/backends/webgpu/README.md
@@ -2,7 +2,26 @@
 
 Run ExecuTorch models on the GPU via [WebGPU](https://www.w3.org/TR/webgpu/). The backend compiles delegated subgraphs into WGSL compute shaders executed natively through [wgpu-native](https://github.com/gfx-rs/wgpu-native) (Metal on macOS, Vulkan on Linux/Windows).
 
-> **Status: Prototype.** The backend supports a single operator today and is under active development. See [TODO.md](TODO.md) for the roadmap.
+> **Status: Prototype.** The backend supports `add` and `rms_norm` today and is under active development. See [Progress](#progress) for shipped milestones.
+
+## Progress
+
+Milestones landed on `main`:
+
+| Date | Milestone | Pull Request |
+|---|---|---|
+| 2026-04 | Made it possible to run ExecuTorch models on the GPU through WebGPU — built the backend from the ground up, including the runtime delegate that builds the GPU graph (buffers, pipelines, bind groups) and runs the model on Metal and Vulkan | [#18808](https://github.com/pytorch/executorch/pull/18808) |
+| 2026-06 | Grew model support beyond element-wise operators — added the root-mean-square normalization operator (`rms_norm`) and named-data weight loading | [#19963](https://github.com/pytorch/executorch/pull/19963) |
+| 2026-06 | Made sure every change is automatically tested — added WebGPU to ExecuTorch's standard backend test suite, running on Linux/x86 in CI | [#19964](https://github.com/pytorch/executorch/pull/19964) |
+| 2026-06 | Removed a class of bugs and manual upkeep — the WGSL shaders are now generated automatically, with a build-time check that fails the build on shader/source drift | [#19981](https://github.com/pytorch/executorch/pull/19981) |
+| 2026-06 | Got the test suite to actually run work on the GPU — added operator-allowlist delegation (unsupported operations fall back to the CPU) and a process-wide GPU device context, so models execute on the GPU during testing | [#20036](https://github.com/pytorch/executorch/pull/20036) |
+
+In review:
+
+| Milestone | Pull Request |
+|---|---|
+| Makes testing match the WebGPU standard exactly — switches the tests to Google's Dawn shader compiler (Tint, the source-of-truth WGSL implementation) running on SwiftShader for headless GPU execution | [#20079](https://github.com/pytorch/executorch/pull/20079) |
+| Strengthens correctness for models that run in several GPU passes — adds dispatch-ordering and scratch-buffer (temporary GPU memory) tests | [#20080](https://github.com/pytorch/executorch/pull/20080) |
 
 ## Architecture
 
@@ -36,8 +55,9 @@ Key design choices:
 | Operator | WGSL Shader | Notes |
 |---|---|---|
 | `aten.add.Tensor` | `binary_add.wgsl` | Element-wise with alpha: `out = in1 + alpha * in2` |
+| `et_vk.rms_norm.default` | `rms_norm.wgsl` | Root-mean-square normalization |
 
-**Planned:** `sub`, `mul`, `relu`, `linear` (matmul), `softmax`, `layer_norm`
+**Planned:** scaled-dot-product attention (KV cache), quantized linear (4-bit weight-only and 8da4w post-training quantization), quantized embedding, RoPE, `mul`, `sigmoid`, and shape ops (`view`, `permute`, `slice`, `select`, `cat`, `squeeze`/`unsqueeze`).
 
 ## Quick Start
 
@@ -83,27 +103,37 @@ This runs Python export tests, exports a .pte, builds the native runtime, and va
 backends/webgpu/
 ├── CMakeLists.txt
 ├── README.md
-├── TODO.md
 ├── runtime/
 │   ├── WebGPUBackend.h/cpp          # BackendInterface (init/execute)
 │   ├── WebGPUGraph.h/cpp            # GPU graph: buffers, pipelines, dispatch
 │   ├── WebGPUDelegateHeader.h/cpp   # VH00 header parser
 │   ├── WebGPUDevice.h/cpp           # wgpu-native device abstraction
+│   ├── WebGPUUtils.h                # Workgroup-size helpers
 │   └── ops/
 │       ├── OperatorRegistry.h/cpp   # Op dispatch table
-│       └── add/
-│           ├── BinaryOp.cpp         # aten.add.Tensor implementation
-│           ├── binary_add.wgsl      # WGSL shader source
-│           └── binary_add_wgsl.h    # Shader as C++ string constant
+│       ├── add/
+│       │   ├── BinaryOp.cpp         # aten.add.Tensor implementation
+│       │   ├── binary_add.wgsl      # WGSL shader source
+│       │   └── binary_add_wgsl.h    # Shader as C++ string constant
+│       └── rms_norm/
+│           ├── RmsNorm.cpp          # et_vk.rms_norm implementation
+│           ├── rms_norm.wgsl        # WGSL shader source
+│           └── rms_norm_wgsl.h      # Shader as C++ string constant
 ├── scripts/
-│   └── setup-wgpu-native.sh         # Download wgpu-native binaries
+│   ├── setup-wgpu-native.sh         # Download wgpu-native binaries
+│   └── gen_wgsl_headers.py          # Generate the embedded *_wgsl.h shader headers
 └── test/
     ├── conftest.py
+    ├── tester.py                    # Partitioner stages + supported-op list
     ├── test_build_webgpu.sh         # End-to-end build + test
     ├── test_webgpu_native.cpp       # C++ native test runner
-    └── ops/
-        └── add/
-            └── test_add.py          # Python export tests
+    ├── test_wgsl_codegen.py         # Shader codegen check
+    ├── native/                      # C++ operator tests
+    └── ops/                         # Python export tests
+        ├── add/
+        │   └── test_add.py          # add export tests
+        └── rms_norm/
+            └── test_rms_norm.py     # rms_norm export tests
 ```
 
 ## Requirements