IMHO: This is what we need. #1507

Ratteler · 2022-04-19T00:48:19Z

Amazing things can be done with the TTS stuff, but what we really need is the ability to train a voice from one source and then overlay it on a performance.
Altered.ai is now offering that, but by starting at $160 USD a month with a 3-month minimum commitment, and going UP from there, they have ensured that ONLY the bad actors have access to this technology.

I don't know about you, but I don't have $500 to commit to a project, with yet another gatekeeper telling me that THEIR ethics might allow me to use their tech for my villain, or even my hero!!!
We need the FOSS community to break this open and democratize this tech so it's THEIR Genie in THEIR bottle.

It's really only within the framework of THIS kind of performance technology that the TTS abilities make sense. No matter whose voice is currently being used to speak text, the text is still missing too much tone, cadence, and authenticity to work as a performance.

A character "Font" needs to be able to recreate an audio performance, and then tweak it through TTS, so the TTS portion is limited to fixing a few lines. I would also like to see these voice "Fonts" be downloadable by anyone so we can start a library of virtual actors.
Ideally, those who don't have the equipment to create the Voice Font would still be able to use them once the training data set is done, without web access or a network. A Font needs to be a completely local resource.

Is there any chance this project could move in this direction? Or is anyone doing this research working on THIS angle?

https://www.altered.ai/
https://youtu.be/AALf9w37COM

domcross · 2022-04-19T06:26:38Z

Is YourTTS going the direction you are looking for?

2 replies

Ratteler Apr 19, 2022
Author

Thanks for the reply. I find all this stuff fascinating. I've been saying for a decade that what we need is a "Photoshop" for audio, but it seems more like we are getting an MS Word for audio.
We need a smart voice changer. I do a little voice acting work, and I've been a 3D animator for over 30 years. While I've been able to do a human caricature at various levels of convincing for over 25 years, the only voice I've ever had to work with was my own. No matter what kind of "effects" I can process, it never sounds like someone else.
What the AI learns about how someone speaks should be used to alter their voice first, and then for text.

My ideal application would never require the text to be useful. I could record MY performance and sample the voice I wanted, so it was their voice with MY performance.

That is my HOPE for where this tech is going.

erogol May 3, 2022
Maintainer

We're getting there. Just hang tight!
