11 changes: 9 additions & 2 deletions pystoi/stoi.py
@@ -46,14 +46,18 @@ def stoi(x, y, fs_sig, extended=False):
IEEE Transactions on Audio, Speech and Language Processing, 2016.
"""
if x.shape != y.shape:
raise Exception('x and y should have the same length,' +
raise Exception('x and y should have the same length,'
'found {} and {}'.format(x.shape, y.shape))

# Resample is fs_sig is different than fs
if fs_sig != FS:
x = utils.resample_oct(x, FS, fs_sig)
y = utils.resample_oct(y, FS, fs_sig)

if min(x.shape[0], y.shape[0]) < N_FRAME:
raise Exception('x and y should at least {} miliseconds long'
.format(int(1000 * float(N_FRAME) / float(FS))))
Owner

Did you take into account the overlap-add in this calculation?

Author

This is the bound on the input size for the silence removal.
On reflection, I think it is already covered by the other check after the removal.
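
The two quantities in this exchange can be made concrete with a small sketch. It assumes `FS = 10000` and `N_FRAME = 256`, values that do not appear in this diff, and it uses a common half-overlap framing convention (`1 + (L - w) // h` frames) that may differ from what `utils` actually does. `count_frames` and `min_samples_for` are hypothetical helpers, not pystoi functions.

```python
# Assumed constants; neither value is shown in this diff.
FS = 10000      # sampling rate in Hz
N_FRAME = 256   # analysis window length in samples


def min_duration_ms(n_frame=N_FRAME, fs=FS):
    """Bound reported by the pre-check: one window, in whole milliseconds."""
    return int(1000 * float(n_frame) / float(fs))


def count_frames(num_samples, window=N_FRAME, hop=N_FRAME // 2):
    """Number of full frames under a generic overlapping-frame convention."""
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // hop


def min_samples_for(n_frames, window=N_FRAME, hop=N_FRAME // 2):
    """Smallest signal length that yields n_frames full frames."""
    return (n_frames - 1) * hop + window


print(min_duration_ms())
print(count_frames(min_samples_for(30)))
```

Under these assumptions the pre-check rejects inputs shorter than 25 ms, while obtaining a given number of half-overlapping frames requires `(n_frames - 1) * hop + window` samples.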


# Remove silent frames
x, y = utils.remove_silent_frames(x, y, DYN_RANGE, N_FRAME, int(N_FRAME/2))

@@ -65,7 +69,10 @@ def stoi(x, y, fs_sig, extended=False):
if x_spec.shape[-1] < N:
warnings.warn('Not enough STFT frames to compute intermediate '
'intelligibility measure after removing silent '
'frames. Returning 1e-5. Please check you wav files',
'frames. At least {} seconds are required, '
'but only {} were found. '
'Returning 1e-5. Please check you wav files'
.format(((N - 1) * N_FRAME + NFFT) / FS, x.shape[0] / FS),
Comment on lines +72 to +75
Owner

This seems unrelated. It's after removing silent frames, so the minimum audio length cannot be known a priori.

Author

It means "the size after silent-frame removal". I thought it was implied by the previous sentence in the comment.
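
For a sense of scale, the expression in the new warning message can be evaluated directly. This is only a sketch: `N = 30` and `NFFT = 512`, along with `FS` and `N_FRAME`, are assumed values not shown in this diff, and `required_seconds` and `found_seconds` are hypothetical helpers. As noted above, the figure applies to the signal after silent-frame removal, so it is not a bound on the original input length.

```python
# Assumed constants; none of these values appear in this diff.
FS = 10000      # sampling rate in Hz
N_FRAME = 256   # analysis window length in samples
NFFT = 512      # FFT size in samples
N = 30          # frames needed for one intermediate measure


def required_seconds(n=N, n_frame=N_FRAME, nfft=NFFT, fs=FS):
    """Evaluate the 'at least {} seconds' expression from the warning."""
    return ((n - 1) * n_frame + nfft) / float(fs)


def found_seconds(num_samples, fs=FS):
    """Evaluate the 'only {} were found' expression from the warning."""
    return num_samples / float(fs)


print(required_seconds())
print(found_seconds(5000))
```

With these assumed values, a 5000-sample signal left after silence removal would be reported as 0.5 seconds against a requirement of 0.7936 seconds.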

RuntimeWarning)
return 1e-5
