Cortex-M: Add depthwise conv2d operator #16233
Conversation
🔗 Helpful Links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16233

Note: links to docs will display an error until the docs builds have completed.

As of commit 577364c with merge base 662b973: ❌ 1 new failure, 3 unrelated failures.

- NEW FAILURE - the following job has failed.
- BROKEN TRUNK - the following jobs failed but were also present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
- UNSTABLE - the following job is marked as unstable, possibly due to flakiness on trunk.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from db21fc0 to 3de2c83.
Add quantized depthwise convolution operator for the Cortex-M backend using CMSIS-NN's optimized arm_depthwise_conv_wrapper_s8 function.

Key changes:
- New op_quantized_depthwise_conv2d.cpp with CMSIS-NN implementation
- Python operator registration in operators.py with reference implementation
- Operator schema definition in operators.yaml
- Updated ConvertToCortexMPass to automatically detect and route depthwise convolutions (where groups == input_channels) to the specialized operator
- Comprehensive test coverage with 5 test cases covering different depthwise convolution scenarios (stride, padding, bias, depth multiplier)

The implementation validates the depthwise constraint (groups must equal input channels) and supports NHWC layout, int8 quantization, per-channel requantization, and configurable stride/padding/dilation parameters.
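The routing decision described above can be sketched roughly as follows. This is a minimal illustration of the detection rule the commit message states (groups == input_channels); the function name and return strings are assumptions, not the actual pass code:

```python
def route_conv(in_channels: int, groups: int) -> str:
    """Sketch: route convolutions whose groups count equals the input
    channel count to the specialized depthwise operator; everything
    else stays on the regular quantized conv path."""
    if groups == in_channels:
        return "quantized_depthwise_conv2d"
    return "quantized_conv2d"
```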
…lidations

Key changes:
- Move depth_multiplier calculation from runtime to the AOT pass (eliminates runtime division by computing depth_multiplier = output_channels / input_channels in the graph transformation pass)
- Add critical defensive validations in validate_depthwise_conv2d_arguments():
  * Validate IHWO weight layout (dimension 0 must be 1)
  * Validate dilation == 1 (CMSIS-NN constraint)
  * Validate depth_multiplier consistency with channel counts
- Fix CMSIS-NN API usage:
  * Use arm_depthwise_conv_wrapper_s8_get_buffer_size() with correct parameters
  * Improve buffer allocation error handling with detailed error messages
- Add _compute_depthwise_conv2d_output_shape() to read channels from the correct dimension (dim 3 for IHWO layout vs dim 0 for OHWI)
- Update the operator schema to use a depth_multiplier parameter instead of groups

This ensures proper validation of CMSIS-NN constraints and moves computation to compile time where possible.
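The shape and depth-multiplier arithmetic above can be illustrated with a small sketch. This is not the PR's `_compute_depthwise_conv2d_output_shape`, just the standard conv output-size formula applied to the layouts the commit describes (NHWC activations, IHWO weights, so output channels live in weight dim 3):

```python
def compute_depthwise_conv2d_output_shape(input_shape, weight_shape,
                                          stride, padding, dilation):
    """NHWC input [N, H, W, C_in]; IHWO weight [1, kH, kW, C_out].
    Output channels are read from weight dim 3 (IHWO), not dim 0 (OHWI)."""
    n, h, w, _ = input_shape
    _, kh, kw, out_ch = weight_shape
    oh = (h + 2 * padding[0] - dilation[0] * (kh - 1) - 1) // stride[0] + 1
    ow = (w + 2 * padding[1] - dilation[1] * (kw - 1) - 1) // stride[1] + 1
    return (n, oh, ow, out_ch)


def compute_depth_multiplier(in_channels: int, out_channels: int) -> int:
    """Computed once at compile time, avoiding a runtime division."""
    assert out_channels % in_channels == 0, "channel counts inconsistent"
    return out_channels // in_channels
```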
CMSIS-NN arm_depthwise_conv_wrapper_s8 only supports batch size 1. Add validation in both the AOT pass (fail during compilation) and the runtime (defensive check).

Add 6 test cases covering edge cases:
- Combined stride/padding/bias
- 1x1 kernels (common in mobile networks)
- Higher depth_multiplier (4)
- Asymmetric kernels (1x3)
- Asymmetric stride/padding
- Larger kernels (5x5)

Fix the depthwise_conv2d_stride test to use batch size 1.
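The AOT-side batch check described above amounts to something like the sketch below (function name and message are illustrative assumptions; the runtime performs the same check defensively in C++):

```python
def check_batch_size(input_shape) -> None:
    """Reject batched inputs at compile time: CMSIS-NN's
    arm_depthwise_conv_wrapper_s8 only supports batch size 1."""
    if input_shape[0] != 1:
        raise ValueError(
            f"depthwise conv2d: expected batch size 1, got {input_shape[0]}"
        )
```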
Force-pushed from 3de2c83 to 577364c.
mansnils
left a comment
Thanks for this @rascani ! It looks good, just a couple of comments.
```python
# Detect depthwise convolution:
# PyTorch depthwise weight is [out_ch, 1, H, W] where dimension 1 is 1
# and groups == input_channels (groups > 1)
is_depthwise = weight_tensor.shape[1] == 1 and groups > 1
```
I think groups could be 1 for a DW conv? A better condition would then be:

```python
is_depthwise = (in_channels == groups) and (out_channels % in_channels) == 0
```
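The reviewer's point can be checked with a quick example using hypothetical channel counts: a single-input-channel depthwise conv has groups == 1, so the original `groups > 1` test rejects it, while the suggested condition accepts it:

```python
def suggested_is_depthwise(in_channels: int, out_channels: int, groups: int) -> bool:
    # Reviewer-suggested condition from the comment above.
    return (in_channels == groups) and (out_channels % in_channels) == 0

# Single input channel, depth multiplier 4: depthwise, yet groups == 1,
# so a `groups > 1` check would miss this case.
print(suggested_is_depthwise(in_channels=1, out_channels=4, groups=1))
```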
```python
# and groups == input_channels (groups > 1)
is_depthwise = weight_tensor.shape[1] == 1 and groups > 1
...
if is_depthwise:
```
Here we actually have the benefit of choosing between a regular and a DW conv. It is likely, but not certain, that the un-optimized CMSIS-NN DW conv (or the one without any SIMD) is less efficient than the corresponding CMSIS-NN conv. We don't know exactly until we measure. We could then add something like this for now, with a TODO comment:

```python
optimal_dw_conv_constraints = (
    in_channels == out_channels and dilation == [1, 1]
) or in_channels == 1
```
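Taken together with the depthwise detection, the suggested gating would look roughly like the sketch below (variable and function names are assumptions for illustration, not the actual pass code):

```python
def use_optimized_dw_kernel(in_channels: int, out_channels: int,
                            groups: int, dilation: list) -> bool:
    # TODO: benchmark; outside these cases the regular CMSIS-NN conv may win.
    is_depthwise = (in_channels == groups) and (out_channels % in_channels) == 0
    optimal_dw_conv_constraints = (
        in_channels == out_channels and dilation == [1, 1]
    ) or in_channels == 1
    return is_depthwise and optimal_dw_conv_constraints
```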
Summary
Add quantized depthwise convolution operator for the Cortex-M backend using CMSIS-NN's optimized arm_depthwise_conv_wrapper_s8 function.
Fixes #16105
Test plan