
feat: add attention mask support for padded token sequences#115

Open
richardk53 wants to merge 1 commit into main from feature-mixed-attention-attention-mask
Conversation


@richardk53 richardk53 commented Mar 10, 2026

The mixed attention implementation currently does not support variable-length sequences because attention masks are not supported. This PR adds masking support so that sequences can be padded.

The main use case is to batch multiple anchor token sequences of different lengths into sequences of the same length by padding them.

The mask applies to keys only and is broadcast over queries, preventing attention to tokens that are not valid. For anchor attention, where anchors produce keys and values, this means no anchor or query can attend to anchors that are masked out.
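A minimal sketch of the masking scheme described above, in NumPy (the function name and shapes are illustrative, not the PR's actual API): a boolean key mask of shape `(batch, n_keys)` is broadcast over the query axis, and masked scores are pushed to `-inf` before the softmax so padded keys receive zero attention weight.

```python
import numpy as np

def masked_attention(q, k, v, key_mask):
    """Scaled dot-product attention with a key-only padding mask.

    q: (batch, n_queries, d), k/v: (batch, n_keys, d)
    key_mask: bool (batch, n_keys); True marks valid (non-padded) tokens.
    The mask is broadcast over the query axis, so no query can attend
    to a padded key. Illustrative sketch, not the PR's implementation.
    """
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (b, n_q, n_k)
    # Broadcast the key mask over queries; masked positions become -inf.
    scores = np.where(key_mask[:, None, :], scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)                      # exp(-inf) -> 0 for padded keys
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because the padded keys get exactly zero weight, the output is independent of whatever values sit in the padded positions, which is what makes batching variable-length anchor sequences by padding safe.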

I have read the CLA Document and I hereby sign the CLA


github-actions bot commented Mar 10, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@github-actions

Coverage

| Tests | Skipped | Failures | Errors | Time |
|-------|---------|----------|--------|------|
| 1101  | 21 💤   | 0 ❌     | 0 🔥   | 27.430s ⏱️ |

@richardk53 richardk53 changed the title Add support for attention masks indicating valid versus padded tokens feat: add attention mask support for padded token sequences Mar 10, 2026
@richardk53
Author

I have read the CLA Document and I hereby sign the CLA

@richardk53
Author

recheck
