XTTSv2 Finetuning Guide for New Languages

This guide provides instructions for finetuning XTTSv2 on a new language, using Vietnamese (vi) as an example.

[UPDATE] A finetuned model for Vietnamese is now available at anhnh2002/vnTTS on Hugging Face.

Table of Contents

1. Installation
2. Data Preparation
3. Pretrained Model Download
4. Vocabulary Extension and Configuration Adjustment
5. DVAE Finetuning (Optional)
6. GPT Finetuning
7. Audio Synthesis

1. Installation

This repo already contains all the necessary Coqui TTS code. Just download this repo; there is no need to download Coqui TTS separately or install it.

git clone https://github.com/EL-CL/XTTSv2-Finetuning-for-New-Languages.git
(If the connection to GitHub fails, as it often does at the moment, download this repo as a ZIP instead, upload it to your server, and unzip it there.)

cd XTTSv2-Finetuning-for-New-Languages
conda create -n coqui python=3.10 -y
conda activate coqui
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia -y
pip install -r requirements.txt
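
To verify the environment before moving on, a quick check (not part of the original guide) confirms that PyTorch sees the GPU and that the bundled TTS package imports cleanly from the repo root:

import torch
import TTS  # the Coqui TTS code bundled in this repo

print("torch", torch.__version__)
print("CUDA available:", torch.cuda.is_available())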

2. Data Preparation

Dataset format: put all audio files in a single folder; the folder can be located anywhere. The sample rate does not need to be adjusted, as it is converted automatically during training. Each clip should preferably be no longer than 12 s, otherwise training will fail with an out-of-bounds array error.

Transcript file format — bel_tts_formatter (for a single speaker, or when speakers are not distinguished):

001.wav|How do you do?
002.wav|Nice to meet you.
003.wav|Good to see you.
...

Transcript file format — coqui (for multiple speakers):

audio_file|text|speaker_name
001.wav|How do you do?|@X
002.wav|Nice to meet you.|@Y
003.wav|Good to see you.|@Z
...

(These formats are defined in TTS/tts/datasets/formatters.py)

Punctuation in the transcripts is not required. Set formatter= on line 70 of train_gpt_xtts.py to the name of the format you use.

The transcript file can be saved anywhere (e.g. models/metadata.txt); it does not need to sit in the same folder as the dataset.

If you are adding several new languages, prepare one transcript file per language. A quick sanity check over the transcript file and audio durations is sketched below.
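
The following minimal sketch (not part of the repo) validates a transcript file against the audio folder before training; the paths models/metadata.txt and datasets/wavs are placeholders for your own layout. It uses soundfile, which the Coqui TTS requirements pull in, to flag clips longer than 12 s:

from pathlib import Path
import soundfile as sf

METADATA = Path("models/metadata.txt")  # hypothetical paths; adjust to your layout
WAV_DIR = Path("datasets/wavs")

for lineno, line in enumerate(METADATA.read_text(encoding="utf-8").splitlines(), 1):
    if lineno == 1 and line.startswith("audio_file|"):
        continue  # the coqui format carries a header row
    fields = line.split("|")
    if len(fields) not in (2, 3):  # bel_tts_formatter has 2 fields, coqui has 3
        print(f"line {lineno}: expected 2 or 3 pipe-separated fields, got {len(fields)}")
        continue
    wav = WAV_DIR / fields[0]
    if not wav.exists():
        print(f"line {lineno}: missing audio file {wav}")
        continue
    duration = sf.info(str(wav)).duration
    if duration > 12.0:  # clips over ~12 s can trigger the out-of-bounds error above
        print(f"line {lineno}: {wav.name} is {duration:.1f} s long; consider splitting it")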

3. Pretrained Model Download

Hugging Face is not reachable from inside mainland China; download the pretrained model yourself and upload it to the server (e.g. into original_model under this directory).

The pretrained model is at https://huggingface.co/coqui/XTTS-v2/tree/main; see the links in download_checkpoint.py for which files to download.
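
If your machine can reach Hugging Face (directly or through a mirror), a minimal sketch along these lines fetches the usual XTTS-v2 files with huggingface_hub; cross-check the file list against download_checkpoint.py, which remains the authoritative source:

from huggingface_hub import hf_hub_download

# Commonly needed XTTS-v2 files; verify against download_checkpoint.py
FILES = ["model.pth", "config.json", "vocab.json", "dvae.pth", "mel_stats.pth", "speakers_xtts.pth"]

for name in FILES:
    path = hf_hub_download(
        repo_id="coqui/XTTS-v2",
        filename=name,
        local_dir="original_model",  # matches the folder suggested above
    )
    print("saved", path)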

4. Vocabulary Extension and Configuration Adjustment

Vocabulary extension: run extend_vocab.py to extend the vocabulary from the new language's transcripts (see the end of that file for details).

Config extension: copy the pretrained model's config.json (e.g. to models/config.json) and add the language code/name of each new language (matching what you used in extend_vocab.py) to the "languages" list on line 130.
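
If you prefer to script this edit, a minimal sketch (assuming the copy lives at models/config.json and "vi" is the new language code) is:

import json

CONFIG = "models/config.json"  # the copied config from above
NEW_LANG = "vi"                # must match the code used in extend_vocab.py

with open(CONFIG, encoding="utf-8") as f:
    config = json.load(f)

if NEW_LANG not in config["languages"]:
    config["languages"].append(NEW_LANG)

with open(CONFIG, "w", encoding="utf-8") as f:
    json.dump(config, f, ensure_ascii=False, indent=2)  # rewrites with 2-space indentation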

5. DVAE Finetuning (Optional)

(We skipped this step.)

To finetune the DVAE, run:

CUDA_VISIBLE_DEVICES=0 python train_dvae_xtts.py \
--output_path=checkpoints/ \
--train_csv_path=datasets/metadata_train.csv \
--eval_csv_path=datasets/metadata_eval.csv \
--language="vi" \
--num_epochs=5 \
--batch_size=512 \
--lr=5e-6

Update: If your dataset contains enough short texts (about 20 hours), you do not need to finetune the DVAE.

6. GPT Finetuning

The training parameters are set via the assignments at the top of train_gpt_xtts.py.

Single-GPU training:

TRAINER_TELEMETRY=0 CUDA_VISIBLE_DEVICES=0 python train_gpt_xtts.py

Multi-GPU training:

TRAINER_TELEMETRY=0 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m trainer.distribute --script train_gpt_xtts.py

Here, CUDA_VISIBLE_DEVICES lists the indices of the GPUs to use. TRAINER_TELEMETRY controls whether anonymous usage data is reported back to Coqui during training; leave it disabled (set to 0), otherwise training will fail when the external network is unreachable.

During training, only a single best model is kept; you may want to copy out intermediate checkpoints manually along the way, as sketched below.
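
A minimal sketch of such a backup step (the checkpoints directory and the best_model.pth file name are assumptions; check what your trainer run actually writes):

import shutil
import time
from pathlib import Path

RUN_DIR = Path("checkpoints")            # assumed trainer output dir; adjust to yours
BACKUP_DIR = Path("checkpoints_backup")
BACKUP_DIR.mkdir(exist_ok=True)

# Copy the current best model out under a timestamped name so that
# later "best" checkpoints do not overwrite it.
for best in RUN_DIR.rglob("best_model.pth"):  # assumed file name; check your run folder
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(best.stat().st_mtime))
    target = BACKUP_DIR / f"{best.stem}_{stamp}.pth"
    if not target.exists():
        shutil.copy2(best, target)
        print("backed up", best, "->", target)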

Note: Finetuning the HiFiGAN decoder was attempted but resulted in worse performance. DVAE and GPT finetuning are sufficient for optimal results.

7. Audio Synthesis

Via script

Run run_tts.py; see the file for the inputs and parameters.
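
For reference, a minimal inference sketch with the Coqui XTTS API is shown below; the paths are placeholders, and run_tts.py in this repo remains the authoritative version:

import torch
import torchaudio
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

config = XttsConfig()
config.load_json("models/config.json")  # the extended config from step 4
model = Xtts.init_from_config(config)
# Directory assumed to contain the finetuned model.pth and the extended vocab.json
model.load_checkpoint(config, checkpoint_dir="checkpoints/", use_deepspeed=False)
model.cuda()

# Condition on a short reference clip from the target speaker
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(audio_path=["reference.wav"])

out = model.inference(
    "Xin chào, rất vui được gặp bạn.",  # text to synthesize
    "vi",                                # the new language code
    gpt_cond_latent,
    speaker_embedding,
)
torchaudio.save("output.wav", torch.tensor(out["wav"]).unsqueeze(0), 24000)  # XTTS outputs 24 kHz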

Via Web UI

Install Streamlit (pip install streamlit==1.40.1 numpy==1.22.0), then run streamlit run run_tts_webui.py
