
Remove head_mask and attention weights from VideoGPT #536

Open
stashuk-olek wants to merge 2 commits into facebookresearch:main from stashuk-olek:export-D92927089

Conversation

@stashuk-olek

Summary:
Remove dead head_mask, return_attn_weights, and attention_weights from the VideoGPT stack. These features were never used by any consumer — head_mask was always None or all-ones, and return_attn_weights was always False except in tests that verified the feature itself.

This removes:

  • attention_weights field from TransformerDecoderOutput and TransformerLayerOutput NamedTuples
  • head_mask and return_attn_weights params from MultimodalGPT, MultimodalTransformerDecoder, TransformerDecoder, and TransformerDecoderLayer
  • head_mask param from AxialAttention.forward in video_vqvae.py
  • return_attn_weights param from GenerationUtil.sample
  • All head_mask and return_attn_weights usage from tests
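
To illustrate the NamedTuple change, here is a minimal sketch in plain Python. The field names mirror the description above, but the classes are simplified, hypothetical stand-ins, not the actual torchmultimodal definitions:

```python
from typing import NamedTuple, Optional

# Hypothetical, simplified stand-in for the pre-cleanup output type:
# the layer output carried an attention_weights slot that every caller
# left as None.
class TransformerLayerOutputBefore(NamedTuple):
    hidden_states: list
    attention_weights: Optional[list]  # dead: always None in practice
    past_key_values: Optional[tuple]

# After the cleanup the dead field is gone, so no caller can come to
# depend on it.
class TransformerLayerOutput(NamedTuple):
    hidden_states: list
    past_key_values: Optional[tuple]

out = TransformerLayerOutput(hidden_states=[1.0, 2.0], past_key_values=None)
print(out.hidden_states)                   # [1.0, 2.0]
print(hasattr(out, "attention_weights"))   # False
```

Because NamedTuples are positional, dropping a middle field like this is an intentional breaking change for any caller that unpacked the tuple by position, which is why the cleanup also touches every call site and test.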

Differential Revision: D92927089

meta-cla bot added the CLA Signed label on Feb 11, 2026

meta-codesync bot commented Feb 11, 2026

@stashuk-olek has exported this pull request. If you are a Meta employee, you can view the originating Diff in D92927089.

stashuk-olek added a commit to stashuk-olek/multimodal that referenced this pull request Feb 11, 2026
…h#536)

Summary:

Remove dead `head_mask`, `return_attn_weights`, and `attention_weights` from the VideoGPT stack. These features were never used by any consumer — `head_mask` was always `None` or all-ones, and `return_attn_weights` was always `False` except in tests that verified the feature itself.

This removes:
- `attention_weights` field from `TransformerDecoderOutput` and `TransformerLayerOutput` NamedTuples
- `head_mask` and `return_attn_weights` params from `MultimodalGPT`, `MultimodalTransformerDecoder`, `TransformerDecoder`, and `TransformerDecoderLayer`
- `head_mask` param from `AxialAttention.forward` in video_vqvae.py
- `return_attn_weights` param from `GenerationUtil.sample`
- All `head_mask` and `return_attn_weights` usage from tests

Differential Revision: D92927089
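
A sketch of why removing `return_attn_weights` simplifies the whole call chain: when a boolean flag is always `False`, each layer of the chain still has to thread and branch on it. The functions below are hypothetical stand-ins in plain Python, not the real `TransformerDecoder` code:

```python
from typing import List, Optional, Tuple

# Before: every level threads a flag that real callers never set.
def layer_forward(
    x: List[float], return_attn_weights: bool = False
) -> Tuple[List[float], Optional[List[float]]]:
    out = [v * 2.0 for v in x]  # stand-in for the real layer computation
    if return_attn_weights:
        return out, [1.0 / len(x)] * len(x)
    return out, None

def decoder_forward(x: List[float], return_attn_weights: bool = False):
    weights = []
    for _ in range(3):  # three stacked layers
        x, w = layer_forward(x, return_attn_weights)
        weights.append(w)
    return x, weights

# After: with the dead flag removed, the chain collapses to the
# False path and stops returning a list of Nones.
def decoder_forward_cleaned(x: List[float]) -> List[float]:
    for _ in range(3):
        x = [v * 2.0 for v in x]
    return x

hidden, attn = decoder_forward([1.0, 2.0])
assert hidden == decoder_forward_cleaned([1.0, 2.0])
assert attn == [None, None, None]  # the "weights" were never real data
```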

stashuk-olek added a commit to stashuk-olek/multimodal that referenced this pull request Feb 13, 2026
…h#536)

Summary:

Remove dead `head_mask`, `return_attn_weights`, and `attention_weights` from the VideoGPT stack. These features were never used by any consumer — `head_mask` was always `None` or all-ones, and `return_attn_weights` was always `False` except in tests that verified the feature itself.

Reviewed By: OmarPavel

Differential Revision: D92927089
… weights in FLAVA (facebookresearch#535)

Summary:

The `attentions` field on `TransformerOutput` and the `return_attn_weights`/`head_mask` parameters in the FLAVA encoder stack were never used by any consumer.

This diff cleans them up. The longer-term intent is to simplify attention usage and move to a common API for it.
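
To see why an always-`None`-or-all-ones `head_mask` is dead code: the mask multiplies attention probabilities per head, so a mask of all ones is a no-op. A dependency-free sketch (a toy softmax, not the real FLAVA attention):

```python
import math
from typing import List, Optional

# Toy single-head attention-probability computation. In the real models
# head_mask multiplies the post-softmax probabilities per head; here one
# list of scores stands in for one head.
def attention_probs(
    scores: List[float], head_mask: Optional[List[float]] = None
) -> List[float]:
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    if head_mask is not None:
        probs = [p * m for p, m in zip(probs, head_mask)]
    return probs

scores = [0.1, 0.5, 0.4]
# Multiplying by 1.0 is exact in IEEE floats, so an all-ones mask
# produces bit-identical output to passing no mask at all.
assert attention_probs(scores) == attention_probs(scores, [1.0, 1.0, 1.0])
```

Since every call site fell into one of these two no-op cases, deleting the parameter changes no observable behavior.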

Reviewed By: OmarPavel

Differential Revision: D92927086

Labels: CLA Signed, fb-exported, meta-exported

1 participant