[ET-VK][conv1d] Implement height-packed depthwise conv1d operator by SS-JIA · Pull Request #18333 · pytorch/executorch

SS-JIA · 2026-03-19T19:15:04Z

Stack from ghstack (oldest at bottom):

Implement a depthwise conv1d operator using height-packed layout where channels
are the packed dimension (WHCN dim 1). Depthwise conv applies a separate filter
to each channel independently (groups=C), so 4 channels can be processed in
parallel using element-wise vec4 FMA over kernel positions.

Thread mapping: X=C/4, Y=L_out, Z=N. Each thread computes one output texel
(4 channels at one spatial position). Inner loop iterates over kernel positions
K with bounds-checked input access for padding.

Weight [C,1,K] is prepacked as channels-packed so each vec4 load gives 4
channels' weights at one kernel position. Supports both buffer and texture3d
storage, fp32/fp16, optional bias, and arbitrary stride/padding/dilation.
Registered as et_vk.conv1d_dw.default (standalone custom op).

Performance on Adreno 750 (S24):

[1,128,4096] K=31 buffer f16: 231 GFLOP/s
[1,128,4096] K=31 buffer f32: 155 GFLOP/s
[1,512,2048] K=5 buffer f32: 66 GFLOP/s

Differential Revision: D97344091

Implement a depthwise conv1d operator using height-packed layout where channels are the packed dimension (WHCN dim 1). Depthwise conv applies a separate filter to each channel independently (groups=C), so 4 channels can be processed in parallel using element-wise vec4 FMA over kernel positions. Thread mapping: X=C/4, Y=L_out, Z=N. Each thread computes one output texel (4 channels at one spatial position). Inner loop iterates over kernel positions K with bounds-checked input access for padding. Weight [C,1,K] is prepacked as channels-packed so each vec4 load gives 4 channels' weights at one kernel position. Supports both buffer and texture3d storage, fp32/fp16, optional bias, and arbitrary stride/padding/dilation. Registered as et_vk.conv1d_dw.default (standalone custom op). Performance on Adreno 750 (S24): - [1,128,4096] K=31 buffer f16: 231 GFLOP/s - [1,128,4096] K=31 buffer f32: 155 GFLOP/s - [1,512,2048] K=5 buffer f32: 66 GFLOP/s Differential Revision: [D97344091](https://our.internmc.facebook.com/intern/diff/D97344091/) [ghstack-poisoned]

pytorch-bot · 2026-03-19T19:15:08Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18333

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 2d4d76c with merge base 9076110 ():

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

pull / unittest / windows / windows-job (gh) (matched win rule in flaky-rules.json)
##[error]The operation was canceled.
pull / unittest-editable / windows / windows-job (gh) (matched win rule in flaky-rules.json)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-03-19T19:17:07Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

…perator" Implement a depthwise conv1d operator using height-packed layout where channels are the packed dimension (WHCN dim 1). Depthwise conv applies a separate filter to each channel independently (groups=C), so 4 channels can be processed in parallel using element-wise vec4 FMA over kernel positions. Thread mapping: X=C/4, Y=L_out, Z=N. Each thread computes one output texel (4 channels at one spatial position). Inner loop iterates over kernel positions K with bounds-checked input access for padding. Weight [C,1,K] is prepacked as channels-packed so each vec4 load gives 4 channels' weights at one kernel position. Supports both buffer and texture3d storage, fp32/fp16, optional bias, and arbitrary stride/padding/dilation. Registered as et_vk.conv1d_dw.default (standalone custom op). Performance on Adreno 750 (S24): - [1,128,4096] K=31 buffer f16: 231 GFLOP/s - [1,128,4096] K=31 buffer f32: 155 GFLOP/s - [1,512,2048] K=5 buffer f32: 66 GFLOP/s Differential Revision: [D97344091](https://our.internmc.facebook.com/intern/diff/D97344091/) [ghstack-poisoned]

meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 19, 2026

This was referenced Mar 19, 2026

[ET-VK][conv1d] Implement height-packed pointwise conv1d operator #18332

Open

[ET-VK][conv1d] Route conv1d to height-packed implementations in export pipeline #18334

Open

[ET-VK][CI] Add test-vulkan-genai job for Parakeet on NVIDIA GPU runner #18335

Open

meta-codesync bot added fb-exported meta-exported labels Mar 19, 2026

ssjia added 2 commits March 19, 2026 15:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ET-VK][conv1d] Implement height-packed depthwise conv1d operator#18333

[ET-VK][conv1d] Implement height-packed depthwise conv1d operator#18333
SS-JIA wants to merge 3 commits intogh/SS-JIA/495/basefrom
gh/SS-JIA/495/head

SS-JIA commented Mar 19, 2026 •

edited

Loading

Uh oh!

pytorch-bot bot commented Mar 19, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

SS-JIA commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot bot commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18333

✅ You can merge normally! (2 Unrelated Failures)

Uh oh!

github-actions bot commented Mar 19, 2026

This PR needs a release notes: label

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

SS-JIA commented Mar 19, 2026 •

edited

Loading

pytorch-bot bot commented Mar 19, 2026 •

edited

Loading

This PR needs a `release notes:` label