EmoVoice is an emotion-controllable TTS model that leverages large language models (LLMs) to enable fine-grained, freestyle natural-language emotion control. EmoVoice achieves state-of-the-art (SOTA) performance on the English EmoVoice-DB and Chinese Secap test sets.
### Create a separate environment if needed
```bash
conda create -n EmoVoice python=3.10
conda activate EmoVoice
pip install -r requirements.txt
```

### Inference
```bash
bash examples/tts/scripts/inference_EmoVoice.sh
bash examples/tts/scripts/inference_EmoVoice-PP.sh
bash examples/tts/scripts/inference_EmoVoice_1.5B.sh
```

### Training
```bash
# First Stage: Pretrain TTS
bash examples/tts/scripts/pretrain_EmoVoice.sh
bash examples/tts/scripts/pretrain_EmoVoice-PP.sh
bash examples/tts/scripts/pretrain_EmoVoice_1.5B.sh

# Second Stage: Finetune Emotional TTS
bash examples/tts/scripts/ft_EmoVoice.sh
bash examples/tts/scripts/ft_EmoVoice-PP.sh
bash examples/tts/scripts/ft_EmoVoice_1.5B.sh
```

- Model checkpoints can be found on Hugging Face: https://huggingface.co/yhaha/EmoVoice.
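The checkpoints on Hugging Face can also be fetched from the command line with the `huggingface_hub` CLI. This is only a convenience sketch, not part of the official scripts, and the local target directory is our own choice:

```shell
# Sketch: download the released EmoVoice checkpoints from Hugging Face.
# Assumes the hub CLI is installed: pip install -U "huggingface_hub[cli]"
CKPT_DIR=./ckpts/EmoVoice        # local target directory (our choice, not prescribed by the repo)
mkdir -p "$CKPT_DIR"
huggingface-cli download yhaha/EmoVoice --local-dir "$CKPT_DIR"
```

The inference scripts above expect checkpoint paths to be set inside the scripts, so adjust them to wherever you place the downloaded files.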
- Datasets for Pretraining TTS: VoiceAssistant and Belle.
- Datasets for Finetuning Emotional TTS: EmoVoice-DB and part of laions_got_talent (the part we use is also uploaded to EmoVoice-DB).
If our work is useful to you, please cite it as:
```bibtex
@article{yang2025emovoice,
  title={EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting},
  author={Yang, Guanrou and Yang, Chen and Chen, Qian and Ma, Ziyang and Chen, Wenxi and Wang, Wen and Wang, Tianrui and Yang, Yifan and Niu, Zhikang and Liu, Wenrui and others},
  journal={arXiv preprint arXiv:2504.12867},
  year={2025}
}
```
Our code is released under the MIT License. The pre-trained models are licensed under CC-BY-NC because the training data includes Emilia, an in-the-wild dataset. Sorry for any inconvenience this may cause.



