52 changes: 41 additions & 11 deletions backends/webgpu/README.md
@@ -2,7 +2,26 @@

Run ExecuTorch models on the GPU via [WebGPU](https://www.w3.org/TR/webgpu/). The backend compiles delegated subgraphs into WGSL compute shaders executed natively through [wgpu-native](https://github.com/gfx-rs/wgpu-native) (Metal on macOS, Vulkan on Linux/Windows).

> **Status: Prototype.** The backend supports a single operator today and is under active development. See [TODO.md](TODO.md) for the roadmap.
> **Status: Prototype.** The backend supports `add` and `rms_norm` today and is under active development. See [Progress](#progress) for shipped milestones.

## Progress

Milestones landed on `main`:

| Date | Milestone | Pull Request |
|---|---|---|
| 2026-04 | Made it possible to run ExecuTorch models on the GPU through WebGPU — built the backend from the ground up, including the runtime delegate that builds the GPU graph (buffers, pipelines, bind groups) and runs the model on Metal and Vulkan | [#18808](https://github.com/pytorch/executorch/pull/18808) |
| 2026-06 | Grew model support beyond element-wise operators — added the root-mean-square normalization operator (`rms_norm`) and named-data weight loading | [#19963](https://github.com/pytorch/executorch/pull/19963) |
| 2026-06 | Made sure every change is automatically tested — added WebGPU to ExecuTorch's standard backend test suite, running on Linux/x86 in CI | [#19964](https://github.com/pytorch/executorch/pull/19964) |
| 2026-06 | Removed a class of bugs and manual upkeep — the WGSL shaders are now generated automatically, with a build-time check that fails the build on shader/source drift | [#19981](https://github.com/pytorch/executorch/pull/19981) |
| 2026-06 | Got the test suite to actually run work on the GPU — added operator-allowlist delegation (unsupported operations fall back to the CPU) and a process-wide GPU device context, so models execute on the GPU during testing | [#20036](https://github.com/pytorch/executorch/pull/20036) |
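
The shader-generation milestone above can be illustrated with a minimal sketch of a generate-then-compare check. This is not the actual `gen_wgsl_headers.py`; the function names `render_header` and `is_stale`, the constant naming scheme, and the header layout are all assumptions made for illustration. The idea is only that the header is a pure function of the `.wgsl` source, so a build step can regenerate it and fail when the committed copy differs.

```python
def render_header(name: str, wgsl_source: str) -> str:
    """Render WGSL source as a C++ header holding one raw string constant.

    Hypothetical layout: the real generated headers may look different.
    """
    return (
        "// Generated file; do not edit by hand.\n"
        "#pragma once\n\n"
        f'inline constexpr const char* k{name}Wgsl = R"wgsl(\n'
        f"{wgsl_source}"
        ')wgsl";\n'
    )


def is_stale(name: str, wgsl_source: str, committed_header: str) -> bool:
    """Return True when the committed header no longer matches the shader source."""
    return render_header(name, wgsl_source) != committed_header


if __name__ == "__main__":
    source = "@compute @workgroup_size(64) fn main() {}\n"
    header = render_header("BinaryAdd", source)
    # An unchanged shader passes; an edited shader without regeneration is flagged.
    print(is_stale("BinaryAdd", source, header))
    print(is_stale("BinaryAdd", source + "// edited\n", header))
```

A build-time check built this way would exit non-zero whenever `is_stale` returns `True`, which is one way to get the "fails the build on shader/source drift" behavior described in the table.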

In review:

| Milestone | Pull Request |
|---|---|
| Makes testing match the WebGPU standard exactly — switches the tests to Google's Dawn shader compiler (Tint, the source-of-truth WGSL implementation) running on SwiftShader for headless GPU execution | [#20079](https://github.com/pytorch/executorch/pull/20079) |
| Strengthens correctness for models that run in several GPU passes — adds dispatch-ordering and scratch-buffer (temporary GPU memory) tests | [#20080](https://github.com/pytorch/executorch/pull/20080) |
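
The operator-allowlist delegation listed under landed milestones can be sketched as follows. This is a simplified illustration over a linear list of operator names, whereas a real partitioner works on a graph; the `partition` function and the `SUPPORTED_OPS` set are hypothetical names, populated with the two operators this README lists as supported.

```python
from typing import List, Set, Tuple

# Assumption: the allowlist holds exactly the operators the README lists as supported.
SUPPORTED_OPS: Set[str] = {"aten.add.Tensor", "et_vk.rms_norm.default"}


def partition(
    ops: List[str], allowlist: Set[str] = SUPPORTED_OPS
) -> List[Tuple[str, List[str]]]:
    """Group consecutive ops into ("gpu", [...]) or ("cpu", [...]) segments.

    Ops on the allowlist are delegated to the GPU; everything else falls back to the CPU.
    """
    segments: List[Tuple[str, List[str]]] = []
    for op in ops:
        target = "gpu" if op in allowlist else "cpu"
        if segments and segments[-1][0] == target:
            # Extend the current segment while the target stays the same.
            segments[-1][1].append(op)
        else:
            segments.append((target, [op]))
    return segments


if __name__ == "__main__":
    model_ops = [
        "aten.add.Tensor",
        "aten.add.Tensor",
        "aten.mul.Tensor",
        "et_vk.rms_norm.default",
    ]
    for target, group in partition(model_ops):
        print(target, group)
```

In this sketch `aten.mul.Tensor` lands in a CPU segment because it is not on the allowlist, which splits the supported operators into two separate GPU segments.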

## Architecture

@@ -36,8 +55,9 @@ Key design choices:
| Operator | WGSL Shader | Notes |
|---|---|---|
| `aten.add.Tensor` | `binary_add.wgsl` | Element-wise with alpha: `out = in1 + alpha * in2` |
| `et_vk.rms_norm.default` | `rms_norm.wgsl` | Root-mean-square normalization |

**Planned:** `sub`, `mul`, `relu`, `linear` (matmul), `softmax`, `layer_norm`
**Planned:** scaled-dot-product attention (KV cache), quantized linear (4-bit weight-only and 8da4w post-training quantization), quantized embedding, RoPE, `mul`, `sigmoid`, and shape ops (`view`, `permute`, `slice`, `select`, `cat`, `squeeze`/`unsqueeze`).
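
As a CPU-side reference for what the two supported operators compute, here is a minimal pure-Python sketch. The `add` formula is taken from the table above. The `rms_norm` function uses the standard root-mean-square normalization definition; the per-element `weight` argument and the `eps` default are assumptions, since the exact signature of `et_vk.rms_norm.default` is not shown here.

```python
import math
from typing import List, Sequence


def add_with_alpha(
    in1: Sequence[float], in2: Sequence[float], alpha: float = 1.0
) -> List[float]:
    """Element-wise add with alpha: out = in1 + alpha * in2."""
    return [a + alpha * b for a, b in zip(in1, in2)]


def rms_norm(
    x: Sequence[float], weight: Sequence[float], eps: float = 1e-6
) -> List[float]:
    """Scale x by the reciprocal of its root mean square, then by weight.

    Assumption: eps is added to the mean of squares before the square root.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


if __name__ == "__main__":
    print(add_with_alpha([1.0, 2.0], [3.0, 4.0], alpha=2.0))
    print(rms_norm([3.0, 4.0], [1.0, 1.0]))
```

A reference like this is useful mainly for comparing GPU output against expected values within a floating-point tolerance.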

## Quick Start

@@ -83,27 +103,37 @@ This runs Python export tests, exports a .pte, builds the native runtime, and va
backends/webgpu/
├── CMakeLists.txt
├── README.md
├── TODO.md
├── runtime/
│ ├── WebGPUBackend.h/cpp # BackendInterface (init/execute)
│ ├── WebGPUGraph.h/cpp # GPU graph: buffers, pipelines, dispatch
│ ├── WebGPUDelegateHeader.h/cpp # VH00 header parser
│ ├── WebGPUDevice.h/cpp # wgpu-native device abstraction
│ ├── WebGPUUtils.h # Workgroup-size helpers
│ └── ops/
│ ├── OperatorRegistry.h/cpp # Op dispatch table
│ └── add/
│ ├── BinaryOp.cpp # aten.add.Tensor implementation
│ ├── binary_add.wgsl # WGSL shader source
│ └── binary_add_wgsl.h # Shader as C++ string constant
│ ├── add/
│ │ ├── BinaryOp.cpp # aten.add.Tensor implementation
│ │ ├── binary_add.wgsl # WGSL shader source
│ │ └── binary_add_wgsl.h # Shader as C++ string constant
│ └── rms_norm/
│ ├── RmsNorm.cpp # et_vk.rms_norm implementation
│ ├── rms_norm.wgsl # WGSL shader source
│ └── rms_norm_wgsl.h # Shader as C++ string constant
├── scripts/
│ └── setup-wgpu-native.sh # Download wgpu-native binaries
│ ├── setup-wgpu-native.sh # Download wgpu-native binaries
│ └── gen_wgsl_headers.py # Generate the embedded *_wgsl.h shader headers
└── test/
├── conftest.py
├── tester.py # Partitioner stages + supported-op list
├── test_build_webgpu.sh # End-to-end build + test
├── test_webgpu_native.cpp # C++ native test runner
└── ops/
└── add/
└── test_add.py # Python export tests
├── test_wgsl_codegen.py # Shader codegen check
├── native/ # C++ operator tests
└── ops/ # Python export tests
├── add/
│ └── test_add.py # add export tests
└── rms_norm/
└── test_rms_norm.py # rms_norm export tests
```

## Requirements