Unexpected behaviour of helper function `_get_feat_extract_output_lengths` in qwen3_omni_moe

### System Info

- `transformers` version: 5.0.0
- Platform: Linux-6.6.113+-x86_64-with-glibc2.35
- Python version: 3.12.13
- Huggingface_hub version: 1.7.1
- Safetensors version: 0.7.0
- Accelerate version: 1.13.0
- Accelerate config: 	not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.10.0+cpu (NA)
- Using distributed or parallel set-up in script?: No

### Who can help?

_No response_

### Information

- [x] The official example scripts
- [x] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [x] My own task or dataset (give details below)

### Reproduction

https://github.com/huggingface/transformers/blob/9a9997fd73c5eb29fb3677d3c489f5d3cd0765f6/src/transformers/models/qwen3_omni_moe/modular_qwen3_omni_moe.py#L117
The function linked above computes the output length of the audio encoder, but its result does not match the output-size formula given in the PyTorch [Conv2d](https://docs.pytorch.org/docs/stable/generated/torch.nn.Conv2d.html) documentation.
The audio encoder convolutions are defined here:
https://github.com/huggingface/transformers/blob/9a9997fd73c5eb29fb3677d3c489f5d3cd0765f6/src/transformers/models/qwen3_omni_moe/modular_qwen3_omni_moe.py#L871

### Expected behavior

The current implementation is:
```python
def _get_feat_extract_output_lengths(input_lengths):
    """
    Computes the output length of the convolutional layers and the output length of the audio encoder
    """

    input_lengths_leave = input_lengths % 100
    feat_lengths = (input_lengths_leave - 1) // 2 + 1
    output_lengths = ((feat_lengths - 1) // 2 + 1 - 1) // 2 + 1 + (input_lengths // 100) * 13
    return output_lengths
```
and the expected implementation is:
```python
def _get_feat_extract_output_lengths(input_lengths):
    """
    Computes the output length of the convolutional layers and the output length of the audio encoder
    """

    feat_lengths = (input_lengths - 1) // 2 + 1
    output_lengths = ((feat_lengths - 1) // 2 + 1 - 1) // 2 + 1
    return output_lengths
```
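
To make the discrepancy concrete, here is a minimal pure-Python sketch that compares both formulas against the Conv2d output-size formula applied three times. It assumes the encoder stacks three convolutions with `kernel_size=3`, `stride=2`, `padding=1` along the time axis; that is my reading of the linked definition, not something verified here. The names `conv_out_len`, `current_lengths`, `proposed_lengths` and `stacked_conv_lengths` are hypothetical helpers for illustration only.

```python
def conv_out_len(length, kernel_size=3, stride=2, padding=1, dilation=1):
    # Output-size formula from the PyTorch Conv2d docs, for one dimension.
    # The default hyperparameters are an ASSUMPTION about the audio encoder.
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def current_lengths(n):
    # Formula currently in the library (copied from the issue).
    leave = n % 100
    feat = (leave - 1) // 2 + 1
    return ((feat - 1) // 2 + 1 - 1) // 2 + 1 + (n // 100) * 13


def proposed_lengths(n):
    # Formula proposed in this issue.
    feat = (n - 1) // 2 + 1
    return ((feat - 1) // 2 + 1 - 1) // 2 + 1


def stacked_conv_lengths(n, num_layers=3):
    # Apply the Conv2d formula once per (assumed) stride-2 layer.
    for _ in range(num_layers):
        n = conv_out_len(n)
    return n


if __name__ == "__main__":
    print("input  current  proposed  stacked_conv")
    for n in (50, 100, 150, 200, 1000):
        print(n, current_lengths(n), proposed_lengths(n), stacked_conv_lengths(n))
```

Under these assumptions the two formulas agree for inputs up to 100 frames and diverge beyond that; for example, an input of 200 frames gives 26 with the current formula and 25 with the proposed one. The sketch only checks the arithmetic against the Conv2d formula on the full-length input; it does not show how the encoder actually handles inputs longer than 100 frames.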
<img width="571" height="455" alt="Image" src="https://github.com/user-attachments/assets/f85260b3-d698-459a-a5ec-26faf85d899b" />
