Improved Controllable Multilingual

@Flux9665 released this 22 Feb 17:08
· 7 commits to ImprovedControllableMultilingual since this release

This release extends the toolkit's functionality and provides new checkpoints.

  • new sampling rate for the vocoder: using 24 kHz instead of 48 kHz lowers the theoretical upper bound for quality, but produces fewer artifacts in practice
  • the flow-based postnet from PortaSpeech is included in the new TTS model, which brings cleaner results at almost no additional cost
  • new controllability options through artificial speaker generation in a lower-dimensional space with a better embedding function
  • quality-of-life changes, such as an integrated finetuning example, an arbiter that decides which train loop is used, and vocoder finetuning (although that should really not be necessary)
  • diverse bugfixes and speed increases
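
One way to read the "theoretical upper bound" mentioned for the vocoder sampling rate is the Nyquist limit: a signal sampled at a given rate can only represent frequencies up to half that rate. The sketch below is an interpretation rather than something taken from the toolkit, and the function name `nyquist_hz` is purely illustrative.

```python
def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency representable at the given sampling rate (Nyquist limit)."""
    return sample_rate_hz / 2


# Compare the old and new vocoder sampling rates named in this release.
for sample_rate in (48_000, 24_000):
    limit_khz = nyquist_hz(sample_rate) / 1000
    print(f"{sample_rate} Hz sampling -> content up to {limit_khz:g} kHz")
```

Under this reading, moving from 48 kHz to 24 kHz halves the representable bandwidth from 24 kHz to 12 kHz, which is the quality ceiling being traded for fewer artifacts.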

This release breaks backwards compatibility. Please download the new models, or stick to a prior release if you rely on your old models.

Future releases will include one more change to the vocoder (the BigVGAN generator) and many changes to scale up the multilingual capabilities of a single model.