
Support per-op partitioning in XNNPACK delegate for NHWC ops #8265

@GregoryComer

Description


🚀 The feature, motivation and pitch

We currently support per-op partitioning in the XNNPACK delegate, which allows all activation tensor memory to be owned by ExecuTorch and thus overlapped with other ExecuTorch-owned activation memory. However, this isn't currently practical for ops that run in channels-last (NHWC) dim order, because the delegate assumes that tensors entering or leaving a partition are always channels-first (NCHW) and thus inserts dim order conversions around every op. This is a perf issue, but more importantly, it means that XNNPACK ends up owning some of the activation memory.
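To make the cost concrete, here is a pure-Python sketch (no ExecuTorch or XNNPACK APIs; the function names are illustrative, not delegate internals) of the NCHW <-> NHWC conversions that get inserted around each delegated NHWC op today:

```python
def nchw_to_nhwc(t):
    """Permute a nested-list tensor from (N, C, H, W) to (N, H, W, C)."""
    n, c, h, w = len(t), len(t[0]), len(t[0][0]), len(t[0][0][0])
    return [[[[t[ni][ci][hi][wi] for ci in range(c)]
              for wi in range(w)]
             for hi in range(h)]
            for ni in range(n)]

def nhwc_to_nchw(t):
    """Permute a nested-list tensor from (N, H, W, C) back to (N, C, H, W)."""
    n, h, w, c = len(t), len(t[0]), len(t[0][0]), len(t[0][0][0])
    return [[[[t[ni][hi][wi][ci] for wi in range(w)]
              for hi in range(h)]
             for ci in range(c)]
            for ni in range(n)]

# A per-op partition today effectively runs:
#   nchw_to_nhwc -> op -> nhwc_to_nchw
# so every delegated NHWC op pays two full data movements on its
# activations, and the intermediate NHWC buffers live in delegate-owned
# memory rather than in ExecuTorch's planned memory.
```

The round trip is the identity, which is exactly why back-to-back `nhwc_to_nchw` / `nchw_to_nhwc` pairs between adjacent NHWC ops are pure overhead.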

Ideally, we can leverage the recent dim order support in the core runtime to let the framework manage the dim order conversions, at least in single-op mode. How this interacts with partitioning is not entirely clear, since the conversion insertion would have to happen after partitioning. Initially, it's likely fine to leave the dim order conversions undelegated. This needs a bit more design discussion, but it is a high-ROI feature and may be necessary for memory parity with LI in some cases, even with workspace sharing.
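As a toy model of the proposal (not actual ExecuTorch/XNNPACK APIs; `to_nhwc`, `to_nchw`, and the op names are hypothetical placeholders), the difference is just where the conversion nodes live relative to the partition boundary:

```python
def lower_per_op(graph, framework_handles_dim_order):
    """Split a flat op list into per-op partitions (toy model).

    If the framework handles dim order, each partition contains only the
    op itself and the conversions stay in the outer, ExecuTorch-owned
    graph, where their buffers are planned by the memory planner.
    Otherwise each partition is wrapped in delegate-owned conversions,
    whose intermediate buffers XNNPACK must allocate itself.
    """
    outer, partitions = [], []
    for op in graph:
        if framework_handles_dim_order:
            outer += ["to_nhwc", f"delegate({op})", "to_nchw"]
            partitions.append([op])
        else:
            outer.append(f"delegate({op})")
            partitions.append(["to_nhwc", op, "to_nchw"])
    return outer, partitions
```

With framework-managed dim order, the delegated partitions hold no conversion nodes, so all activation memory, including the converted buffers, can be overlapped with other ExecuTorch-owned activations.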

Alternatives

No response

Additional context

No response

RFC (Optional)

No response

cc @digantdesai @mcr229

Metadata


Labels

    module: xnnpack — Issues related to xnnpack delegation and the code under backends/xnnpack
    triaged — This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
