Hey,
when I run UTMOSv2 several times on the same data, the scores vary strongly between runs. According to the inference documentation at https://github.com/sarulab-speech/UTMOSv2/blob/main/docs/inference.md, UTMOSv2 randomly selects frames from the input to compute the metric on. To get more reliable results, the number of folds could be increased.
I suggest adding a UTMOSv2 parameter in VERSA that can be used to increase the number of folds. I attached the scores for 4 runs on the same data. The x-axis shows the file index (identical across runs), and the files are sorted by the UTMOSv2 score from run 1.
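To illustrate why more averaging should help, here is a toy sketch. It does not call UTMOSv2 or VERSA: `noisy_score` is a hypothetical stand-in for one randomized prediction, and the noise level and the `num_draws` name are assumptions made up for the example. It only shows that averaging several random draws per file shrinks the run-to-run spread.

```python
import random
import statistics


def noisy_score(true_score, rng, noise_std=0.3):
    """Hypothetical stand-in for one randomized prediction (not the UTMOSv2 API)."""
    return true_score + rng.gauss(0.0, noise_std)


def averaged_score(true_score, rng, num_draws):
    """Average several randomized predictions for the same file."""
    return statistics.fmean(noisy_score(true_score, rng) for _ in range(num_draws))


def run_to_run_spread(num_draws, num_runs=200, seed=0):
    """Standard deviation of the averaged score across repeated runs."""
    rng = random.Random(seed)
    scores = [averaged_score(3.5, rng, num_draws) for _ in range(num_runs)]
    return statistics.stdev(scores)


spread_1 = run_to_run_spread(num_draws=1)
spread_8 = run_to_run_spread(num_draws=8)
print(f"spread with 1 draw:  {spread_1:.3f}")
print(f"spread with 8 draws: {spread_8:.3f}")
```

If the real scores behave roughly like this, exposing such a setting in VERSA would let users trade runtime for stability.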
Best,
Wolfgang
