IDEA contains many samples of non-native English speech from a range of dialects and speaker backgrounds. Some of these have IPA transcriptions, but the quality of the annotations varies and the notation conventions are not entirely consistent. It would be great to have a phonologist go through them and determine which samples, if any, we can use, and what work is needed to process them for training. Then we can buy training rights for that subset from IDEA.