Efficient Long-Context Modeling in Diffusion Language Models
CVPR 2026 (Findings)
- Up to 6.95× speedup over FlashAttention
- Training-free sparse attention (no finetuning required)
- Maintains near-full-attention performance at 50% sparsity
- Strong generalization across language, multimodal, and video generation
We propose Block Approximate Sparse Attention (BA-Att), a training-free block-sparse attention framework for Diffusion Language Models (DLMs).
Unlike prior approaches that rely on fixed sparsity patterns, BA-Att:
- Performs block selection in a downsampled space
- Uses norm-based ranking to reduce approximation error
- Applies covariance compensation to recover accuracy
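
The block-selection idea can be illustrated with a minimal PyTorch sketch. This is not the released implementation: the function and parameter names (`block_sparse_attention`, `block_size`, `keep_ratio`) are illustrative, the mean-pooling and norm-weighting details are assumptions based on the description above, and the covariance-compensation step is omitted.

```python
# Illustrative sketch only; not the official BA-Att code.
import torch


def block_sparse_attention(q, k, v, block_size=64, keep_ratio=0.5):
    """Select key/value blocks per query block using pooled (downsampled)
    scores weighted by key-block norms, then attend only to kept blocks.
    Shapes: q, k, v are (batch, heads, seq, dim); seq divisible by block_size."""
    B, H, S, D = q.shape
    nb = S // block_size

    # 1) Downsample: mean-pool queries and keys within each block.
    q_blk = q.view(B, H, nb, block_size, D).mean(dim=3)  # (B, H, nb, D)
    k_blk = k.view(B, H, nb, block_size, D).mean(dim=3)  # (B, H, nb, D)

    # 2) Norm-based ranking (assumed form): score key blocks by the pooled
    #    dot product, scaled by each key block's average token norm.
    k_norm = k.view(B, H, nb, block_size, D).norm(dim=-1).mean(dim=-1)  # (B, H, nb)
    scores = torch.einsum("bhqd,bhkd->bhqk", q_blk, k_blk) * k_norm.unsqueeze(2)

    # Keep the top-scoring key blocks for every query block.
    k_keep = max(1, int(nb * keep_ratio))
    top_idx = scores.topk(k_keep, dim=-1).indices  # (B, H, nb, k_keep)

    # Build a block-level mask and expand it to token resolution.
    blk_mask = torch.zeros(B, H, nb, nb, dtype=torch.bool, device=q.device)
    blk_mask.scatter_(-1, top_idx, True)
    mask = blk_mask.repeat_interleave(block_size, dim=2) \
                   .repeat_interleave(block_size, dim=3)  # (B, H, S, S)

    # 3) Masked attention over the selected blocks only (covariance
    #    compensation for dropped blocks is not shown here).
    attn = torch.einsum("bhqd,bhkd->bhqk", q, k) / D ** 0.5
    attn = attn.masked_fill(~mask, float("-inf"))
    return torch.einsum("bhqk,bhkd->bhqd", attn.softmax(dim=-1), v)


if __name__ == "__main__":
    q = torch.randn(1, 4, 256, 64)
    k = torch.randn(1, 4, 256, 64)
    v = torch.randn(1, 4, 256, 64)
    print(block_sparse_attention(q, k, v).shape)  # torch.Size([1, 4, 256, 64])
```

In this sketch, ranking happens over pooled block representations, so its cost grows with the number of blocks rather than with sequence length, which is what keeps the selection step cheap relative to dense attention.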
Code is coming soon!
We are currently cleaning and organizing the codebase.
