@@ -47,6 +47,12 @@ names.
reduced, in practice, the vocoder produces far fewer artifacts.
- Lots of quality-of-life changes: Finetuning your own model from the provided pretrained checkpoints is easier than
ever! Just follow the `finetuning_example.py` pipeline.
+ - We now have the option of choosing between Avocodo and [BigVGAN](https://arxiv.org/abs/2206.04658), which has an improved
+ generator over HiFiGAN. It is significantly slower on CPU, but the quality is the best I have heard at the time of writing
+ this. The speed on GPU is fine; for CPU inference you might want to stick with Avocodo.
+ - We compile a bunch of quality enhancements from all our previous works into one very stable and nice-sounding
+ architecture, which we call **ToucanTTS**. We submitted a system based on this architecture to the Blizzard Challenge 2023;
+ you can try out our system [speaking French here](https://huggingface.co/spaces/Flux9665/Blizzard2023IMS).

### 2022

@@ -149,16 +155,16 @@ appropriate names.

## Creating a new Pipeline 🦆

- ### Build an Avocodo Pipeline
+ ### Build an Avocodo/BigVGAN Pipeline

This should not be necessary, because we provide a pretrained model, and one of the key benefits of vocoders in general
is how incredibly speaker-independent they are. But in case you want to train your own anyway, here are the
instructions: You will need a function that returns a list of the absolute paths to all of
- the audio files in your dataset as strings. If you already have a *path_to_transcript_dict* of your data for FastSpeech
- 2 training, you can simply take the keys of the dict and transform them into a list.
+ the audio files in your dataset as strings. If you already have a *path_to_transcript_dict* of your data for ToucanTTS training,
+ you can simply take the keys of the dict and transform them into a list.
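
For illustration, here is a minimal sketch of such a function. The dataset root and the `.wav` extension are assumptions; adjust them to your data:

```python
from pathlib import Path

def get_file_list():
    # Option 1: collect the absolute paths of all audio files under a
    # dataset root ("/data/my_dataset" is a hypothetical placeholder).
    return [str(path) for path in Path("/data/my_dataset").glob("**/*.wav")]

# Option 2: if you already have a path_to_transcript_dict, its keys are
# the audio file paths, so the list is simply:
# file_list = list(path_to_transcript_dict.keys())
```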

Then go to the directory
- *TrainingInterfaces/TrainingPipelines*. In there, make a copy of any existing pipeline that has Avocodo in its name. We
+ *TrainingInterfaces/TrainingPipelines*. In there, make a copy of any existing pipeline that has Avocodo or BigVGAN in its name. We
will use this as a reference and only make the necessary changes to use the new dataset. Look out for a variable called
*model_save_dir*. This is the default directory that checkpoints will be saved into, unless you specify another one when
calling the training script. Change it to whatever you like. Then pass the list of paths to the instantiation of the
@@ -173,7 +179,8 @@ Now you need to add your newly created pipeline to the pipeline dictionary in th

What we call ToucanTTS is actually mostly FastSpeech 2, but with a couple of changes, such as the normalizing-flow-based
PostNet that was introduced in PortaSpeech. We found the VAE used in PortaSpeech too unstable for low-resource
- cases, so we continue experimenting with those in experimental branches of the toolkit.
+ cases, so we continue experimenting with those in experimental branches of the toolkit. There are a bunch of other
+ changes that mostly relate to low-resource scenarios. For more info, have a look at the ToucanTTS docstring.
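
To make the flow-based PostNet idea concrete, here is a schematic affine-coupling block, the standard building block of a normalizing flow, in PyTorch. This is a generic sketch of the technique, not the toolkit's actual implementation; all names and sizes are assumptions:

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One affine coupling block; several of these stacked (with the
    halves swapped in between) form an invertible PostNet. Purely
    illustrative, not the code used in ToucanTTS."""

    def __init__(self, channels: int, hidden: int = 256):
        super().__init__()
        assert channels % 2 == 0, "coupling splits the feature dim in half"
        self.net = nn.Sequential(
            nn.Linear(channels // 2, hidden),
            nn.ReLU(),
            nn.Linear(hidden, channels),  # predicts log-scale and shift
        )

    def forward(self, x):
        # Transform one half of the features conditioned on the other half.
        x_a, x_b = x.chunk(2, dim=-1)
        log_s, t = self.net(x_a).chunk(2, dim=-1)
        y_b = x_b * torch.exp(log_s) + t   # invertible affine map
        log_det = log_s.sum(dim=-1)        # needed for the exact likelihood
        return torch.cat([x_a, y_b], dim=-1), log_det

    def inverse(self, y):
        # Exact inverse of forward(), used to generate a refined spectrogram.
        y_a, y_b = y.chunk(2, dim=-1)
        log_s, t = self.net(y_a).chunk(2, dim=-1)
        x_b = (y_b - t) * torch.exp(-log_s)
        return torch.cat([y_a, x_b], dim=-1)
```

Because every block is exactly invertible, such a PostNet can be trained by maximum likelihood instead of the variational objective a VAE needs.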

In the directory called
*Utility* there is a file called