Improved Controllable Multilingual
This release extends the toolkit's functionality and provides new checkpoints.
- new sampling rate for the vocoder: using 24 kHz instead of 48 kHz lowers the theoretical upper bound for quality, but produces fewer artifacts in practice
- the flow-based postnet from PortaSpeech is included in the new TTS model, which brings cleaner results at almost no additional cost
- new controllability options through artificial speaker generation in a lower-dimensional space with a better embedding function
- quality-of-life changes, such as an integrated finetuning example, an arbiter that selects which train loop to use, and vocoder finetuning (although that should really not be necessary)
- diverse bugfixes and speed increases
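
The "theoretical upper bound" in the sampling rate item follows from the Nyquist limit: a signal sampled at a given rate can only represent frequencies up to half that rate. A minimal sketch of the arithmetic (the function name is illustrative and not part of the toolkit):

```python
def nyquist_limit_hz(sampling_rate_hz: int) -> float:
    """Return the highest frequency representable at the given sampling rate."""
    return sampling_rate_hz / 2


# Compare the previous and the new vocoder sampling rates.
for rate in (48_000, 24_000):
    print(f"{rate} Hz sampling -> content up to {nyquist_limit_hz(rate):.0f} Hz")
```

Moving from 48 kHz to 24 kHz therefore halves the representable bandwidth from 24 kHz to 12 kHz.
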
This release breaks backwards compatibility. Please download the new models, or stick to a prior release if you rely on your old models.
Future releases will include one more change to the vocoder used (BigVGAN generator) and many changes to scale up the multilingual capabilities of a single model.