Description
System Info
- transformers version: 5.0.0
- Platform: Linux-6.6.113+-x86_64-with-glibc2.35
- Python version: 3.12.13
- Huggingface_hub version: 1.7.1
- Safetensors version: 0.7.0
- Accelerate version: 1.13.0
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.10.0+cpu (NA)
- Using distributed or parallel set-up in script?: No
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
    def _get_feat_extract_output_lengths(input_lengths):

The implementation of the function above, which computes the output length of the audio encoder, does not match the official output-size formula for PyTorch Conv2d.
The audio encoder convolution is defined as:

    self.conv2d1 = nn.Conv2d(1, config.downsample_hidden_size, 3, 2, padding=1)
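For reference, here is a minimal sketch of the per-dimension output-size formula from the PyTorch Conv2d documentation, applied to the layer above (kernel size 3, stride 2, padding 1, and the default dilation of 1). The helper names `conv_output_length` and `simplified` are hypothetical and are not part of transformers or PyTorch.

```python
def conv_output_length(length, kernel_size=3, stride=2, padding=1, dilation=1):
    """Output size along one dimension, following the Conv2d shape formula.

    Defaults mirror nn.Conv2d(1, hidden, 3, 2, padding=1) from the issue.
    """
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def simplified(length):
    """The same formula with kernel_size=3, stride=2, padding=1 substituted in."""
    return (length - 1) // 2 + 1


# The general formula and the simplified form agree for these settings.
for n in (1, 2, 99, 100, 101):
    assert conv_output_length(n) == simplified(n)
```

With these settings each convolution maps a length `L` to `(L - 1) // 2 + 1`, which is the expression repeated in the implementations quoted below.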
Expected behavior
The current implementation is:

def _get_feat_extract_output_lengths(input_lengths):
    """
    Computes the output length of the convolutional layers and the output length of the audio encoder
    """
    input_lengths_leave = input_lengths % 100
    feat_lengths = (input_lengths_leave - 1) // 2 + 1
    output_lengths = ((feat_lengths - 1) // 2 + 1 - 1) // 2 + 1 + (input_lengths // 100) * 13
    return output_lengths

and the expected implementation is:

def _get_feat_extract_output_lengths(input_lengths):
    """
    Computes the output length of the convolutional layers and the output length of the audio encoder
    """
    feat_lengths = (input_lengths - 1) // 2 + 1
    output_lengths = ((feat_lengths - 1) // 2 + 1 - 1) // 2 + 1
    return output_lengths
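To make the discrepancy concrete, the two formulas can be compared on plain integers. This is only a sketch: the bodies are copied from the two versions above, and the names `current_lengths` and `expected_lengths` are hypothetical labels for them. It shows where the two formulas diverge; it does not by itself establish which one matches the encoder's actual behavior.

```python
def current_lengths(input_lengths):
    # Formula from the current implementation quoted in this issue.
    input_lengths_leave = input_lengths % 100
    feat_lengths = (input_lengths_leave - 1) // 2 + 1
    return ((feat_lengths - 1) // 2 + 1 - 1) // 2 + 1 + (input_lengths // 100) * 13


def expected_lengths(input_lengths):
    # Formula from the expected implementation quoted in this issue.
    feat_lengths = (input_lengths - 1) // 2 + 1
    return ((feat_lengths - 1) // 2 + 1 - 1) // 2 + 1


# Collect every length in 1..300 where the two formulas disagree.
mismatches = [
    (n, current_lengths(n), expected_lengths(n))
    for n in range(1, 301)
    if current_lengths(n) != expected_lengths(n)
]

for n in (50, 100, 101, 200):
    print(n, current_lengths(n), expected_lengths(n))
```

For these sample inputs the two agree at 50 and 100, while 101 gives 14 versus 13 and 200 gives 26 versus 25, so the formulas start to differ once the input is longer than 100 frames.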