The general purpose of this repository is to support real-time generation with open-source TTS (text-to-speech) models across common device architectures using the GGML tensor library. Rapid STT (speech-to-text), embedding generation, and LLM generation are already well supported on GGML (via whisper.cpp and llama.cpp respectively). As such, this repo seeks to complement those functionalities with a similarly optimized and portable TTS library.
In this endeavor, macOS and Metal support will be treated as the primary platform; functionality will initially be developed for macOS and later extended to other operating systems.
Warning! TTS.cpp should currently be treated as a proof of concept and is subject to further development. Existing functionality has not been tested outside of a macOS environment.
| Models | CPU | Metal Acceleration | Quantization | GGUF files |
|---|---|---|---|---|
| Parler TTS Mini | ✓ | ✓ | ✓ | here |
| Parler TTS Large | ✓ | ✓ | ✓ | here |
| Kokoro | ✓ | ✗ | ✓ | here |
| Dia | ✓ | ✓ | ✓ | here |
| Orpheus | ✓ | ✗ | ✗ | here |
Additional model support will be added based primarily on open-source model performance in both the old TTS model arena and the new TTS model arena, as well as the availability of those models' architectures and checkpoints.
| Planned Functionality | OS X | Linux | Windows |
|---|---|---|---|
| Basic CPU Generation | ✓ | ✓ | ✗ |
| Metal Acceleration | ✓ | _ | _ |
| CUDA support | _ | ✗ | ✗ |
| Quantization | ✓* | ✗ | ✗ |
| Layer Offloading | ✗ | ✗ | ✗ |
| Server Support | ✓ | ✓ | ✗ |
| Vulkan Support | _ | ✗ | ✗ |
| Kompute Support | _ | ✗ | ✗ |
| Streaming Audio | ✗ | ✗ | ✗ |
* Currently only the generative model supports quantization.
WARNING! This library is currently only supported on OS X.
- Local GGUF format model file (see py-gguf for information on how to convert the Hugging Face models to GGUF)
- C++17 and C17
  - XCode Command Line Tools (via `xcode-select --install`) should suffice for OS X
- CMake (>=3.14)
- GGML pulled locally (see the example below)
  - this can be accomplished via `git clone -b support-for-tts [email protected]:mmwillet/ggml.git`
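As a quick sanity check, the commands below (a sketch assuming a macOS shell; the clone destination is left up to you) verify the toolchain requirements and fetch the patched GGML branch:

```bash
# Confirm the Command Line Tools and a sufficiently new CMake are available
xcode-select -p   # prints the Command Line Tools path if they are installed
cmake --version   # should report 3.14 or newer

# Pull the patched GGML branch required by TTS.cpp
git clone -b support-for-tts [email protected]:mmwillet/ggml.git
```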
The local GGML library includes several required patches to the main branch of GGML (making the current TTS ggml branch out of date with modern GGML). Specifically, these patches include major modifications to the convolutional transposition operation as well as several new GGML operations implemented for TTS-specific purposes, including `ggml_reciprocal`, `ggml_round`, `ggml_mod`, `ggml_cumsum`, and STFT and iSTFT operations.
We are currently working on upstreaming some of these operations in order to deprecate this patch requirement going forward.
Assuming the above requirements are met, the library and the basic CLI example can be built by running the following commands in the repository's base directory:
```bash
cmake -B build
cmake --build build --config Release
```
The CLI executable and other executables will be in the `./build` directory (e.g. `./build/cli`), and the compiled library will be in `./build/src` (currently it is named parler, after the first supported model).
If you wish to install TTS.cpp with espeak-ng phonemization support, first install espeak-ng. Depending on your installation method, the path of the installed library will vary. Upon identifying the installation path of espeak-ng (it should contain `./lib`, `./bin`, `./include`, and `./share` directories), you can compile TTS.cpp with espeak phonemization support by running the following in the repository's base directory:
```bash
export ESPEAK_INSTALL_DIR=/absolute/path/to/espeak-ng/dir
cmake -B build
cmake --build build --config Release
```
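If espeak-ng was installed with Homebrew on macOS (an assumption; adjust the lookup to your package manager), the install prefix can be queried directly rather than located by hand:

```bash
# Assumes a Homebrew installation of the espeak-ng formula
export ESPEAK_INSTALL_DIR="$(brew --prefix espeak-ng)"
cmake -B build
cmake --build build --config Release
```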
On Linux, you don't need to manually download or `export` anything. Our build system will automatically detect the development packages installed on your machine:
```bash
# Change `apt` and the package names to match your distro
sudo apt install build-essential cmake                        # Minimum requirements
sudo apt install git libespeak-ng-dev libsdl2-dev pkg-config  # Optional requirements
cmake -B build
cmake --build build --config Release
```
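To confirm that the optional development packages are discoverable before configuring, you can query pkg-config (assuming your distro ships the usual `espeak-ng` and `sdl2` pkg-config files):

```bash
# Each command prints compile and link flags if the package is visible to pkg-config
pkg-config --cflags --libs espeak-ng
pkg-config --cflags --libs sdl2
```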
See the CLI example readme for more details on its general usage.
See the quantization CLI readme for more details on its general usage and behavior. Please note that quantization and lower-precision conversion are currently only supported for Parler TTS models.
Given that the central goal of this library is to support real time speech generation on OS X, generation speed has only been rigorously tested in that environment with supported models (i.e. Parler Mini version 1.0).
With the introduction of Metal acceleration support for the DAC audio decoder model, text-to-speech generation is nearly possible in real time on a standard Apple M1 Max with ~3GB of memory overhead. The best real-time factor for accelerated models is currently 1.112033; that is, for every second of generated audio, the accelerated models require approximately 1.112033 seconds of generation time (with Q5_0 quantization applied to the generative model). For the latest stats from the performance battery, see the readme therein.
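For example, at that real-time factor a 30-second clip would take roughly 30 × 1.112 ≈ 33.4 seconds to generate under the same conditions.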