Hi, thanks for your great work.
I noticed that in the `MixedAttention` function, the following code first computes the self-attention of each query (`q`) within its corresponding chunk:
```python
# self attn
_, _, _, _, self_attn_out_sh, self_attn_lse_hs, _, _ = (
    _flash_attn_varlen_forward(
        q=q,
        k=k,
        v=v,
        cu_seqlens_q=self_attn_cu_seqlen,
        cu_seqlens_k=self_attn_cu_seqlen,
        max_seqlen_q=max_seqlen,
        max_seqlen_k=max_seqlen,
        softmax_scale=softmax_scale,
        causal=True,
        dropout_p=0.0,
    )
)
```
However, `max_seqlen` is clearly larger than the longest chunk described by `self_attn_cu_seqlen` (i.e., the maximum difference between consecutive cumulative offsets).
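To make the concern concrete, here is a minimal plain-Python sketch (with hypothetical chunk sizes, not taken from MoBA) of the difference between the tight per-chunk bound and a looser total-length bound. As far as I understand, flash-attn uses `max_seqlen` mainly to size the kernel launch grid, so a loose bound should cost scheduling overhead rather than correctness, but I'd like to confirm:

```python
# Hypothetical cu_seqlens describing three chunks of lengths 3, 5, and 2,
# in the cumulative-offset format that flash-attn varlen kernels expect.
cu_seqlens = [0, 3, 8, 10]

# The tightest valid max_seqlen is the largest gap between consecutive
# offsets (here 5), not the final cumulative value.
tight_max_seqlen = max(b - a for a, b in zip(cu_seqlens, cu_seqlens[1:]))

# A looser bound such as the total sequence length (here 10) still covers
# every chunk, so it should remain numerically valid; the question is
# whether it wastes work in the kernel.
loose_max_seqlen = cu_seqlens[-1]
```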
I would like to know whether this leads to any potential issues, such as reduced computational efficiency or unintended behavior in the attention computation.
@hewr2010 @whitelez @xptree
Code reference: `MoBA/moba/moba_efficient.py`, line 96 at `b5d5836`.