Support block_ptr/TensorDescriptor with extra_mask for loads #1768

Merged
jansel merged 16 commits into pytorch:main from hinriksnaer:extra-load-mask
Mar 26, 2026
Conversation

@hinriksnaer
Collaborator

Addresses #97

This is somewhat of a new area for me, and the changes are based on my interpretation of the #97 description. I'm looking for feedback here to make sure there isn't something I missed or should approach from a different angle.

Approach

This enables block_ptr and TensorDescriptor support for hl.load(..., extra_mask=...). Previously, this would fall back to pointer indexing; this approach instead decomposes the mask into a separate epilogue (along with an aten.view for cases where the mask rank differs from the load rank).

Before:

%load = call_function[target=hl.load](%x, [%tile_m, %tile_n, %tile_k], extra_mask=%mask)
%store = call_function[target=hl.store](%out, ..., %load)

After:

%load = call_function[target=hl.load](%x, [%tile_m, %tile_n, %tile_k], extra_mask=None)
%where = call_function[target=aten.where.ScalarOther](%mask, %load, 0)
%store = call_function[target=hl.store](%out, ..., %where)
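The same-rank decomposition above can be modeled in plain PyTorch. This is only a sketch of the semantics — hl.load runs inside a Helion kernel, so the unmasked load is stood in for here by the tensor itself, and the helper names are hypothetical:

```python
import torch

def masked_load_reference(x, mask):
    # Semantics of hl.load(x, ..., extra_mask=mask): elements where the
    # mask is False read as 0. (Standalone model of the op, not Helion API.)
    return torch.where(mask, x, torch.zeros((), dtype=x.dtype))

def decomposed_load(x, mask):
    # The decomposition: an unmasked load (here, just x) followed by a
    # where epilogue that zeroes masked-off elements.
    load = x  # stands in for hl.load(..., extra_mask=None)
    return torch.where(mask, load, 0)  # aten.where.ScalarOther

x = torch.arange(12.0).reshape(3, 4)
mask = x < 6
assert torch.equal(masked_load_reference(x, mask), decomposed_load(x, mask))
```

Both paths produce the same tensor, which is what lets the mask be peeled off the load and applied afterwards.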

For a non-matching mask rank:

Before:

%load = call_function[target=hl.load](%x, [%tile_m, %tile_n, %tile_k], extra_mask=%mask)
%store = call_function[target=hl.store](%out, ..., %load)

After:

%load = call_function[target=hl.load](%x, [%tile_m, %tile_n, %tile_k], extra_mask=None)
%view = call_function[target=aten.view](%mask, [block_M, 1, 1])
%where = call_function[target=aten.where.ScalarOther](%view, %load, 0)
%store = call_function[target=hl.store](%out, ..., %where)
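The rank-mismatch path can likewise be sketched in plain PyTorch: the rank-1 mask is viewed to [block_M, 1, 1] so it broadcasts over the 3-D load in the where epilogue. The tile sizes below are hypothetical, and a random tensor stands in for the unmasked load:

```python
import torch

block_M, block_N, block_K = 4, 3, 2
load = torch.randn(block_M, block_N, block_K)       # unmasked 3-D load
row_mask = torch.tensor([True, False, True, True])  # rank-1 mask over M

# aten.view reshapes the mask so it broadcasts over the trailing dims.
view = row_mask.view(block_M, 1, 1)
out = torch.where(view, load, 0)  # aten.where.ScalarOther

assert out[1].abs().sum().item() == 0.0  # masked-off row zeroed
assert torch.equal(out[0], load[0])      # kept rows unchanged
```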

@meta-cla meta-cla bot added the CLA Signed label Mar 20, 2026
Contributor

@jansel jansel left a comment


I think if we always decompose these it will be a performance regression for non-block ptrs.

I also worry this will interact badly with the existing mask_to op and optimizations to remove masks. (We propagate masking information.)

Maybe we should also do this pass at codegen time and only if block pointers are chosen for the given op.

@hinriksnaer
Collaborator Author

hinriksnaer commented Mar 20, 2026

These are all very good points and I appreciate the broader context.

Sounds to me like the next steps in the right direction would be:

  • Remove the decomposition from the fx graph
  • Move the functionality into the generated Triton code when TensorDescriptor / block ptrs are selected
  • Modify is_supported so it no longer rejects masked loads for non-pointer-indexing strategies
  • Update tests

Anything you would like to add?

@hinriksnaer hinriksnaer requested a review from jansel March 23, 2026 16:42
@hinriksnaer
Collaborator Author

Made changes based on your feedback @jansel.

Assuming this looks good, would you want me to add support for lower-rank masks in a future PR? e.g.

out[tile_m, tile_n] = hl.load(
    x, [tile_m, tile_n], extra_mask=row_mask[tile_m]
)
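For reference, a plain-PyTorch sketch of what that lowering could look like, assuming the rank-1 row_mask would be reshaped to broadcast across tile_n (the names and tile sizes are illustrative, not Helion API):

```python
import torch

tile_m, tile_n = 4, 5
x = torch.randn(tile_m, tile_n)                      # stands in for the 2-D load
row_mask = torch.tensor([True, True, False, True])   # rank-1 mask over tile_m

# View the rank-1 mask as a column so it broadcasts across tile_n.
out = torch.where(row_mask.view(tile_m, 1), x, 0)
assert torch.equal(out[2], torch.zeros(tile_n))  # masked-off row zeroed
```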

@hinriksnaer
Collaborator Author

@jansel all tests are passing.

@jansel jansel merged commit 733351e into pytorch:main Mar 26, 2026
18 checks passed