Unintelligible Voice after Training Malagasy Corpus #239

@Tiana-Andria

Hello,
I have been working on training a FastSpeech2 model for the Malagasy language and encountered issues with the output quality. The synthesized voice is unintelligible despite successfully completing the training process. Below is an outline of the steps I've taken and the model configuration.

Steps Taken:

  • Created a corpus of Malagasy (~19 hours of audio).
  • Aligned the data using the Montreal Forced Aligner (MFA).
  • Used a custom text cleaner for the Malagasy language.
  • Ran the prepare_align and preprocess steps successfully.
  • Modified the pinyin.py and cmudict.py files to add Malagasy phonemes.
  • Trained the model for 21,000 steps.
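
One sanity check that fits the steps above is confirming that every phoneme label produced by the aligner is present in the symbol set the model was trained with. The following is a minimal sketch of that check, not the repository's own code: `find_unknown_phonemes`, the symbol set, and the sample utterances are all hypothetical placeholders. In practice the phoneme sequences would be read from the MFA TextGrid files and the symbol set from the modified phoneme files.

```python
from collections import Counter


def find_unknown_phonemes(utterances, symbol_set):
    """Count phonemes that occur in the aligned data but not in the symbol set."""
    unknown = Counter()
    for phones in utterances:
        for phone in phones:
            if phone not in symbol_set:
                unknown[phone] += 1
    return unknown


# Hypothetical data for illustration only (not real Malagasy alignments).
symbol_set = {"a", "e", "i", "o", "m", "n", "ts", "dz", "sp"}
utterances = [
    ["m", "a", "n", "a", "o"],
    ["ts", "a", "r", "a"],  # "r" is deliberately missing from symbol_set
]

unknown = find_unknown_phonemes(utterances, symbol_set)
for phone, count in unknown.most_common():
    print(f"missing symbol {phone!r}: {count} occurrence(s)")
```

An empty result would suggest the symbol set covers the aligned data. Any reported label is one the model has no dedicated symbol for.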

I am using HiFi-GAN as the vocoder with the universal speaker setting.
Pitch and energy features are configured at the phoneme level, with normalization set to true.
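
To make the normalization setting concrete, here is a minimal sketch that assumes normalization means z-scoring each value with a mean and standard deviation computed over the corpus. The function name `zscore_normalize` and the pitch values are hypothetical, chosen only for illustration. If normalization is applied as assumed, the normalized features should have a mean near 0 and a standard deviation near 1, which is worth confirming on the preprocessed data.

```python
import statistics


def zscore_normalize(values):
    """Return (normalized values, mean, std) using population statistics."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot normalize values with zero variance")
    return [(v - mean) / std for v in values], mean, std


# Hypothetical phoneme-level pitch values in Hz (illustrative only).
pitch_hz = [110.0, 125.0, 140.0, 155.0, 170.0]

normalized, mean, std = zscore_normalize(pitch_hz)
print(f"mean={mean:.1f} Hz, std={std:.2f} Hz")
print([round(v, 3) for v in normalized])
```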

Pitch losses ranged from 1.1 to 5.17.
Energy losses ranged from 0.55 to 0.9.

Could the unintelligibility be caused by high pitch loss during training? If so, what would be the best way to address this in terms of configuration or data preparation?
