Conversation

@catcor01 (Contributor)

No description provided.

@catcor01 force-pushed the multihead_attention branch 2 times, most recently from a98526f to cf45a2e on November 25, 2025 at 08:59
@Lallapallooza (Collaborator) left a comment

Thanks for the patch, a few comments.

- Legalize Torch scaled_dot_product_attention into TOSA by adding the necessary patterns
  in TorchToTosa.cpp plus backend type-conversion hooks.
- Introduce a detailed decomposition path for multi-head attention within DecomposeComplexOps.cpp,
  preparing inputs for TOSA lowering (a rough sketch of the target pipeline follows below).
- Expand the PT1 e2e suite with a dedicated multi-head attention MLIR/Python test and
  drop the corresponding xfails now that the path works.

Signed-off-by: Cathal Corbett <cathal.corbett@arm.com>
Change-Id: I96c17aefd25b979f1cf6e897d91d5a29f0a2fa85
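
For reference, a minimal Python sketch of the matmul/softmax pipeline that such a decomposition targets. This is an illustration in plain PyTorch, not the actual DecomposeComplexOps.cpp patterns; the sdpa_reference name and the default scale handling are assumptions.

import math
import torch

def sdpa_reference(q, k, v, scale=None):
    # Matmul/softmax pipeline for scaled dot product attention:
    #   softmax((q @ k^T) * scale) @ v
    # The 1/sqrt(head_dim) default mirrors the usual sdpa convention (assumption).
    if scale is None:
        scale = 1.0 / math.sqrt(q.shape[-1])
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, v)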
PatternRewriter &rewriter);

namespace {
// Decompose scaled dot product attention into matmul/softmax pipeline when
A member left a comment:

Is this decomposition producing any IR different from what you get by leveraging the sdpa decomposition with ExportedProgram.run_decompositions (https://docs.pytorch.org/docs/stable/export.html#export-ir-decompositions)? See https://discord.com/channels/636084430946959380/742573221882364009/1446121930922004623 for reference.

I am wondering if the sdpa op should instead be added to the default decomposition list:

DEFAULT_DECOMPOSITIONS = [
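
For comparison, a minimal sketch of the upstream path referred to above, assuming a toy module (the Attn name and input shapes are hypothetical). Printing the decomposed graph shows how, or whether, sdpa is lowered by the default table.

import torch
from torch.export import export

class Attn(torch.nn.Module):
    # Toy module (hypothetical) wrapping a single sdpa call.
    def forward(self, q, k, v):
        return torch.nn.functional.scaled_dot_product_attention(q, k, v)

q = torch.randn(1, 4, 8, 16)
k = torch.randn_like(q)
v = torch.randn_like(q)

ep = export(Attn(), (q, k, v))
# run_decompositions() with no arguments applies the default decomposition table;
# inspecting the resulting graph shows whether sdpa survives or gets decomposed.
decomposed = ep.run_decompositions()
print(decomposed.graph_module.code)

If the IR produced this way matches what the DecomposeComplexOps.cpp path emits, listing sdpa in DEFAULT_DECOMPOSITIONS might avoid duplicating the decomposition in C++; that is the trade-off being asked about here, not a confirmed conclusion.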
