From c6eeb06321a42a26cc5444b3f9977e3d8af1d0d0 Mon Sep 17 00:00:00 2001
From: Erjan Kalybek
Date: Tue, 18 Jun 2019 07:23:28 +0200
Subject: [PATCH] Fix typo

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index f281ffe..ebc552c 100644
--- a/README.md
+++ b/README.md
@@ -105,7 +105,7 @@ You may have to change tfr_dir and model_dir to work on your settings.
 - For fp16 settings, you need 1 week to train 1M steps with 4 V100 GPUs.
 - I haven't tried fp32 training, so there might be some issues to train high quality models.
 - As fp16 training is not robust enough (at now), I usually train FiLM enabled model and unabled model consequently and choose one which survives.
-- For a single speaker dataset(LJ Speech dataset), trained model vocoding quality is good enough compared to mel-spectrogram condtioned one.
+- For a single speaker dataset(LJ Speech dataset), trained model vocoding quality is good enough compared to mel-spectrogram conditioned one.
 - For multi-speaker dataset(VCTK Corpus), disentangling between speaker identity and local condition does not work well (at now). I am investigating reasons though.
 - The next step would be training Text-to-LatentCodes model(as Transformer) so that fully TTS is possible.
 - If you're interested in this project, please improve models with me!