Disable Sage Attention sm90 backend due to confetti/noisy output #12785
Conversation
The _SAGE_QK_INT8_PV_FP8_CUDA_SM90 backend is causing confetti/noisy output on SM 9.0+ GPUs. Temporarily disabling this backend by commenting out its registration until the upstream sageattention library fixes the issue. Fixes huggingface#12783
Pull request overview
This PR temporarily disables the Sage Attention sm90 backend to fix confetti/noisy output issues on SM 9.0+ (Hopper) GPUs by commenting out the backend registration and implementation.
Key Changes:
- Commented out the `_sage_qk_int8_pv_fp8_cuda_sm90_attention` function registration to prevent automatic selection on Hopper GPUs
- Added a comment referencing issue #12783 to track re-enabling when upstream fixes the bug
```python
#     AttentionBackendName._SAGE_QK_INT8_PV_FP8_CUDA_SM90,
#     constraints=[_check_device_cuda_atleast_smXY(9, 0), _check_shape],
# )
# def _sage_qk_int8_pv_fp8_cuda_sm90_attention(
#     query: torch.Tensor,
#     key: torch.Tensor,
#     value: torch.Tensor,
#     is_causal: bool = False,
#     scale: Optional[float] = None,
#     return_lse: bool = False,
#     _parallel_config: Optional["ParallelConfig"] = None,
# ) -> torch.Tensor:
#     return sageattn_qk_int8_pv_fp8_cuda_sm90(
#         q=query,
#         k=key,
#         v=value,
#         tensor_layout="NHD",
#         is_causal=is_causal,
#         sm_scale=scale,
#         return_lse=return_lse,
#     )
```
Copilot AI commented on Dec 3, 2025
Commenting out code is not the recommended approach for temporarily disabling functionality. This creates several issues:
- The `AttentionBackendName._SAGE_QK_INT8_PV_FP8_CUDA_SM90` enum member still exists (line 195) but now has no registered implementation
- The backend is still referenced in `_check_attention_backend_requirements()` (line 459), which could cause confusion
- Commented-out code creates maintenance burden and can become stale
Recommended approach:
Instead of commenting out the function, keep it registered but add a runtime check that raises a clear error or logs a warning and falls back to another backend. For example:
```python
@_AttentionBackendRegistry.register(
    AttentionBackendName._SAGE_QK_INT8_PV_FP8_CUDA_SM90,
    constraints=[_check_device_cuda_atleast_smXY(9, 0), _check_shape],
)
def _sage_qk_int8_pv_fp8_cuda_sm90_attention(
    query: torch.Tensor,
    key: torch.Tensor,
    value: torch.Tensor,
    is_causal: bool = False,
    scale: Optional[float] = None,
    return_lse: bool = False,
    _parallel_config: Optional["ParallelConfig"] = None,
) -> torch.Tensor:
    # Temporarily disabled due to issue #12783 - sm90 backend causes confetti/noisy output
    raise NotImplementedError(
        "The sm90 backend for Sage Attention is temporarily disabled due to producing "
        "incorrect output (see issue #12783). Please use a different attention backend."
    )
```
Alternatively, if you want automatic fallback, modify the constraints to never match, or add a feature flag to control this behavior.
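A minimal sketch of that alternative, assuming an environment-variable feature flag; the variable name and the exact signature the registry expects for constraint callables are illustrative assumptions, not existing diffusers options:

```python
import os

# Hypothetical flag name; DIFFUSERS_ENABLE_SAGE_SM90 does not exist in diffusers today.
_ENABLE_SAGE_SM90 = os.environ.get("DIFFUSERS_ENABLE_SAGE_SM90", "0") == "1"


def _check_sage_sm90_enabled(*args, **kwargs) -> bool:
    # Extra constraint that only passes when the flag is set, so by default the
    # dispatcher never selects this backend and falls back to the other
    # registered attention backends.
    return _ENABLE_SAGE_SM90


# The registration would then keep the function but append the flag check:
# constraints=[_check_device_cuda_atleast_smXY(9, 0), _check_shape, _check_sage_sm90_enabled]
```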
What does this PR do?
Fixes #12783
This PR temporarily disables the Sage Attention sm90 backend, which is causing confetti/noisy output on SM 9.0+ (Hopper) GPUs.
The Problem
The `_SAGE_QK_INT8_PV_FP8_CUDA_SM90` backend was automatically being selected on SM 9.0+ GPUs (Hopper architecture) due to its registration constraint. However, this backend is producing incorrect output (described as "confetti" or "noisy" output), indicating a bug in the underlying sageattention library's sm90 implementation.
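For reference, this is the registration that makes the backend eligible on compute capability 9.0+ devices (reproduced from the diff above):

```python
@_AttentionBackendRegistry.register(
    AttentionBackendName._SAGE_QK_INT8_PV_FP8_CUDA_SM90,
    constraints=[_check_device_cuda_atleast_smXY(9, 0), _check_shape],
)
```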
The Solution
Temporarily disabled the sm90 backend by commenting out its registration:
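```python
# Abbreviated; the full commented-out block is shown in the diff above.
# @_AttentionBackendRegistry.register(
#     AttentionBackendName._SAGE_QK_INT8_PV_FP8_CUDA_SM90,
#     constraints=[_check_device_cuda_atleast_smXY(9, 0), _check_shape],
# )
# def _sage_qk_int8_pv_fp8_cuda_sm90_attention(...):
#     ...
```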
Impact
Future Work
This backend can be re-enabled once the sageattention library fixes the sm90 implementation bug.
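In the meantime, users on Hopper GPUs can explicitly select a different attention backend. A hedged sketch, assuming the `set_attention_backend` helper on diffusers model components; the model id and backend string below are illustrative placeholders:

```python
import torch
from diffusers import DiffusionPipeline

# "some/model-id" and the "sage" backend string are placeholders for illustration;
# the point is only to pick a backend other than the disabled sm90 variant.
pipe = DiffusionPipeline.from_pretrained("some/model-id", torch_dtype=torch.bfloat16).to("cuda")
pipe.transformer.set_attention_backend("sage")
```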