Emphasis on syllables – How to choose?

Hi there, 

Over the last few days I've been trying out the Thorsten voice in a Python virtual environment, set up as described in the German-language video [Freie Thorsten Stimme in LINUX lokal nutzen Text-to-Speech TTS Tutorial](https://www.youtube.com/watch?v=uyG1Sx7_3Yg) (roughly: "Using the free Thorsten voice locally on Linux: text-to-speech TTS tutorial").

I'm amazed by how natural the voice sounds. Only in a few words did I notice the stress falling on syllables that are not normally stressed in spoken German.

For example, one test phrase contained the English loanword "Marketing", which was stressed on the second syllable.

Now I'm wondering whether there is any way to instruct the `tts` program or `tts-server` to put the stress on the first syllable instead.

During a web search I came across a [question](https://stackoverflow.com/questions/69561420/how-to-use-change-stress-in-words-azure-speech) where the original poster wrote:
> I know that some voice engines use special characters like + or 'in front of a stressed  vowel.


I tried this suggestion several times, using different methods (though I mostly placed the marker in front of the syllable rather than in front of the vowel):

1) Directly, by running the following commands:
`tts --text "Marketing." --model_name tts_models/de/thorsten/vits --out_path marketing1.wav`
`tts --text "+Marketing." --model_name tts_models/de/thorsten/vits --out_path marketing2.wav`
`tts --text "'Marketing." --model_name tts_models/de/thorsten/vits --out_path marketing3.wav`

2) By starting a server
`tts-server --model_name tts_models/de/thorsten/vits`

and then using:

a) the web interface at `localhost:5002/`, entering the strings `+Marketing.` (saved as marketing4.wav) and `'Marketing.` (saved as marketing5.wav).

b) curl:
`curl -o marketing6.wav http://localhost:5002/api/tts?text=+Marketing.`
`curl -o marketing7.wav http://localhost:5002/api/tts?text=\'Marketing.`

c) [cTTS](https://pypi.org/project/cTTS/) (Python3):
`import cTTS` 
`cTTS.synthesizeToFile("marketing8.wav", "+Marketing.")`
`cTTS.synthesizeToFile("marketing9.wav", "'Marketing.")`
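
One detail that may matter for method 2b: in a URL query string, a literal `+` is conventionally decoded as a space (form encoding), so `text=+Marketing.` most likely reached the server as ` Marketing.`, with the marker lost. The sketch below uses only Python's standard `urllib.parse` to show this and to build a URL in which the marker is percent-encoded as `%2B`. It assumes `tts-server` decodes its query string in the usual way; `build_tts_url` and `decoded_text` are hypothetical helpers, not part of TTS or cTTS.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://localhost:5002/api/tts"  # server address used in method 2


def build_tts_url(text: str) -> str:
    """Hypothetical helper: build an /api/tts URL with `text` safely encoded."""
    # urlencode() percent-encodes reserved characters: "+" becomes "%2B",
    # "'" becomes "%27", and spaces become "+".
    return f"{BASE_URL}?{urlencode({'text': text})}"


def decoded_text(url: str) -> str:
    """Decode the `text` parameter the way a typical form-style parser would."""
    query = urlsplit(url).query
    return parse_qs(query)["text"][0]


if __name__ == "__main__":
    raw = BASE_URL + "?text=+Marketing."
    print(repr(decoded_text(raw)))      # the unencoded "+" arrives as a space
    encoded = build_tts_url("+Marketing.")
    print(encoded)                      # ...?text=%2BMarketing.
    print(repr(decoded_text(encoded)))  # the "+" marker survives
```

With curl, the same effect should be achievable by quoting the pre-encoded URL (`curl -o marketing6.wav "http://localhost:5002/api/tts?text=%2BMarketing."`) or by letting curl encode the value with `curl -G --data-urlencode "text=+Marketing." -o marketing6.wav http://localhost:5002/api/tts`.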


The resulting sound files are attached as a zip file.

To my ears, however, there is hardly any difference between them: the stress still seems to fall mainly on the second syllable.
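
To compare more spellings than the ones listed above, the CLI calls from method 1 can be generated by a short script. This is only a sketch under assumptions: it reuses just the `tts` options already shown (`--text`, `--model_name`, `--out_path`), the marker characters come from the quoted Stack Overflow post (which refers to "some voice engines", not necessarily this one), and whether the Thorsten VITS model interprets either marker at all is exactly the open question. `stress_variants` and `build_commands` are hypothetical helpers; the first places each marker both before the whole word (as tried above) and directly before the first vowel, which for "Marketing" is the vowel that should carry the stress.

```python
import shlex
import subprocess
from pathlib import Path

MODEL = "tts_models/de/thorsten/vits"
VOWELS = "aeiouäöüAEIOUÄÖÜ"


def stress_variants(word: str, markers: str = "+'") -> list[str]:
    """Hypothetical helper: spell `word` with each stress marker placed
    (a) before the whole word and (b) directly before its first vowel."""
    # Index of the first vowel; falls back to 0 if the word has none.
    first_vowel = next((i for i, ch in enumerate(word) if ch in VOWELS), 0)
    variants = [word]  # unmarked baseline for comparison
    for m in markers:
        variants.append(m + word)
        variants.append(word[:first_vowel] + m + word[first_vowel:])
    return variants


def build_commands(texts: list[str], out_dir: Path) -> list[list[str]]:
    """One `tts` CLI call per text, using only the options from method 1."""
    return [
        ["tts", "--text", text, "--model_name", MODEL,
         "--out_path", str(out_dir / f"variant{i}.wav")]
        for i, text in enumerate(texts, start=1)
    ]


if __name__ == "__main__":
    out_dir = Path("variants")
    out_dir.mkdir(exist_ok=True)
    texts = [v + "." for v in stress_variants("Marketing")]
    for cmd in build_commands(texts, out_dir):
        print(shlex.join(cmd))          # show the equivalent shell command
        subprocess.run(cmd, check=True)  # requires `tts` on PATH
```

For "Marketing" this produces five files: the unmarked word, `+Marketing.`, `M+arketing.`, `'Marketing.` and `M'arketing.`, so both marker positions can be compared side by side.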


Now I'm wondering what else I could try. If you have any ideas or suggestions, I would be very grateful to hear them.

I should perhaps mention that I am only taking my first steps in programming. My system is an up-to-date Linux (a Debian 11 derivative without systemd) running on an older machine, which is probably why I can currently only use the `vits` model.

Thanks in advance

[marketing_wav.zip](https://github.com/thorstenMueller/Thorsten-Voice/files/11934349/marketing_wav.zip)
Emphasis on syllables – How to choose? #53