VITS sounds drunk on German #2834
Hi @cschaefer26, I have no idea what the problem might be either, but I want to congratulate you on your great forward_taco_melgan model 👏. It sounds really good.
I am not sure, but you could try disabling the blank_token in the config; it might make the speech more fluent. How large is your dataset?
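For context, here is a minimal sketch of what the blank-token option does, assuming it follows the original VITS convention of inserting a blank ID before, between and after every symbol. The helper name `intersperse_blank` is hypothetical; in Coqui TTS the switch appears to be the `add_blank` field of `VitsConfig`, but that is an assumption worth checking against your installed version.

```python
def intersperse_blank(tokens, blank):
    """Return a new list with `blank` before, between and after every token.

    A sequence of n tokens becomes 2n + 1 tokens long.
    """
    result = [blank] * (2 * len(tokens) + 1)
    result[1::2] = tokens  # odd positions hold the original tokens
    return result


if __name__ == "__main__":
    ids = [12, 4, 31]  # hypothetical token IDs
    print(intersperse_blank(ids, 0))  # → [0, 12, 0, 4, 0, 31, 0]
```

With the option off, the text encoder sees only the symbols themselves, so the input sequence is roughly half as long.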
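Here is the training script, in case it helps:

```python
import os

from trainer import Trainer, TrainerArgs

from TTS.tts.configs.shared_configs import BaseDatasetConfig
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.vits import Vits, VitsArgs
from TTS.tts.utils.text.characters import IPAPhonemes
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor

output_path = os.path.dirname(os.path.abspath(__file__))
model_args = VitsArgs()
config = VitsConfig(
    # ... (arguments not preserved in this copy of the post)
)
# Use the IPA phoneme set as the tokenizer's character set.
tokenizer, config = TTSTokenizer.init_from_config(config, IPAPhonemes())
ap = AudioProcessor.init_from_config(config)
train_samples, eval_samples = load_tts_samples(
    # ... (arguments not preserved)
)
model = Vits(config, ap, tokenizer, speaker_manager=None)
trainer = Trainer(
    # ... (arguments not preserved)
)
```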


Hi, first of all, thanks for your hard work taking on the proprietary TTS systems; I love it. I am currently training German VITS models with Coqui and finding that the prosody is really weird. Here is a model output after about 200k steps:
Sentence:
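Es ist schade, dass die EU als die humanste und moralischste aller Ländergruppierungen angesehen wird, aber sie wollen die Menschenrechte nicht aufrechterhalten und den Magnitsky Act nicht nutzen.
(English: It is a pity that the EU is regarded as the most humane and most moral of all groupings of countries, yet they do not want to uphold human rights or use the Magnitsky Act.)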
Phonemes:
ɛs ɪst ʃaːdə, das diː eːʔuː als diː humaːnstə ʊnt moʁaːlɪʃstə alɐ lɛndɐɡʁʊpiːʁʊŋən anɡəzeːən vɪʁt, aːbɐ ziː vɔlən diː mɛnʃn̩ʁɛçtə nɪçt aʊfʁɛçtʔɛɐhaltn̩ ʊnt deːn maɡnɪt͡ski ɛkt nɪçt nʊt͡sn̩.
noise=0.8
audio_noise_0.8.mp4
noise=0
audio_noise_0.mp4
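For readers comparing the two clips: assuming `noise` refers to the inference noise scale applied when sampling the latent from the prior (the reference VITS inference code computes it as mean + randn * exp(log_std) * noise_scale), its effect can be sketched in plain Python as below. `sample_prior` is a hypothetical helper, not Coqui's implementation. The reference VITS code also has a separate scale (`noise_scale_w`) for the stochastic duration predictor, which is the component that sets timing.

```python
import math
import random


def sample_prior(mean, log_std, noise_scale, rng=None):
    """Elementwise z = mean + exp(log_std) * eps * noise_scale, with eps ~ N(0, 1).

    noise_scale = 0 removes the random term and returns the mean unchanged.
    """
    rng = rng or random.Random()
    return [
        m + math.exp(s) * rng.gauss(0.0, 1.0) * noise_scale
        for m, s in zip(mean, log_std)
    ]


if __name__ == "__main__":
    mean, log_std = [0.5, -1.0, 2.0], [0.0, -0.5, 0.3]  # toy values
    print(sample_prior(mean, log_std, 0.0))  # → [0.5, -1.0, 2.0]
    print(sample_prior(mean, log_std, 0.8, random.Random(0)))  # varies around the mean
```

With noise=0 every run produces the same latent for the same input, which makes the two clips useful for separating sampling randomness from what the model has actually learned.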
For comparison, here is our trained ForwardTacotron model (100k steps) with a modified MelGAN vocoder:
forward_taco_melgan.mp4
Any idea what the problem could be? I switched off phonemization and use IPAPhonemes as the character set; the rest of the config is default. Any help would be appreciated :) If needed, I can of course post TensorBoard graphs, configs, etc.
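Since the input is fed as IPA with a fixed character set, one thing worth ruling out is symbols that are not in the vocabulary, especially combining characters such as the tie bar in t͡s (U+0361) or the syllabic mark in n̩ (U+0329), which are separate code points from the letters they attach to. Below is a library-independent sketch; `find_unknown_symbols` and `describe` are hypothetical helpers, and the toy vocabulary stands in for the characters of whatever set is actually configured.

```python
import unicodedata
from collections import Counter


def find_unknown_symbols(text, vocab, ignore=" ,.!?"):
    """Count code points in `text` that are neither in `vocab` nor in `ignore`.

    Checks one Unicode code point at a time, so combining marks are
    reported separately from the base letter they attach to.
    """
    return Counter(ch for ch in text if ch not in vocab and ch not in ignore)


def describe(missing):
    """Human-readable lines such as 'U+0361 COMBINING DOUBLE INVERTED BREVE x1'."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')} x{count}"
        for ch, count in missing.most_common()
    ]


if __name__ == "__main__":
    phonemes = "n\u028at\u0361sn\u0329."  # "nʊt͡sn̩." from the phoneme string above
    toy_vocab = set("n\u028ats")          # hypothetical stand-in vocabulary
    for line in describe(find_unknown_symbols(phonemes, toy_vocab)):
        print(line)
```

Running the full phoneme string against the real character set would show whether anything in it cannot be represented.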