Help with f5-tts_infer-cli Configuration (pt/br model) #774

marvinbelfort · 2025-02-10T23:09:10Z

Hello! When I use the model without REF_AUDIO and REF_TEXT, it works fine with the default voice, generating audio in Brazilian Portuguese.
However, when I try to use it with REF_AUDIO and REF_TEXT, it manages to clone the voice's timbre, but the words become unrecognizable (gibberish).

I'm using it like this:

#!/bin/bash
# myvoice.mp3 contains audio in Brazilian Portuguese; myvoice.txt contains its transcription.
f5-tts_infer-cli \
   --REF_AUDIO "samples/refaudio/myvoice.mp3" \
   --REF_TEXT "samples/refaudio/myvoice.txt" \
   --CKPT_FILE "modelos/firepixel/ptbr/model_last.pt" \
   --GEN_TEXT "Isso é um teste de geração de áudio em português Brasileiro" \
   -w resultado.wav

Could you guide me on what I might be doing wrong?
Thank you!

Answered by SWivid

SWivid (Maintainer) · 2025-02-11T08:01:36Z

Two possible reasons:

1. ref_text needs to be the transcription text itself, not a file path. See https://github.com/SWivid/F5-TTS/tree/main/src/f5_tts/infer

2. If your fine-tuned model was trained on shorter audio samples, make sure the total length (reference audio + generated audio) is shorter than the maximum length seen during fine-tuning. You can adjust this in, e.g., https://github.com/SWivid/F5-TTS/blob/main/src/f5_tts/infer/utils_infer.py

F5-TTS/src/f5_tts/infer/utils_infer.py, line 291 in f062403:

    def preprocess_ref_audio_text(ref_audio_orig, ref_text, clip_short=True, show_info=print, device=device):

and F5-TTS/src/f5_tts/infer/utils_infer.py, lines 385 to 386 in f062403:

    max_chars = int(len(ref_text.encode("utf-8")) / (audio.shape[-1] / sr) * (25 - audio.shape[-1] / sr))
    gen_text_batches = chunk_text(gen_text, max_chars=max_chars)
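Both points can be sketched in Python. This is an illustration under stated assumptions, not the project's own code. The helper names (`load_ref_text`, `build_infer_command`, `max_chars_per_batch`) and the `max_total_seconds` parameter are hypothetical. The lowercase flag names are assumed to be what the CLI accepts, so check them with `f5-tts_infer-cli --help`. The constant 25 in the quoted line appears to act as a total duration budget in seconds; the sketch only turns it into a parameter so the effect of lowering it is visible.

```python
from pathlib import Path


def load_ref_text(path: str) -> str:
    """Read the transcription from a text file and return it as plain text."""
    return Path(path).read_text(encoding="utf-8").strip()


def build_infer_command(ref_audio, ref_text_file, ckpt_file, gen_text, output_file):
    """Build an argument list that passes the transcription itself, not its path.

    Flag names are assumptions; verify them against the CLI's --help output.
    """
    ref_text = load_ref_text(ref_text_file)
    return [
        "f5-tts_infer-cli",
        "--ref_audio", ref_audio,
        "--ref_text", ref_text,  # text content, not "myvoice.txt"
        "--ckpt_file", ckpt_file,
        "--gen_text", gen_text,
        "-w", output_file,
    ]


def max_chars_per_batch(ref_text, ref_audio_seconds, max_total_seconds=25.0):
    """Same arithmetic as the quoted max_chars line, with 25 made a parameter.

    ref_audio_seconds stands in for audio.shape[-1] / sr.
    """
    bytes_per_second = len(ref_text.encode("utf-8")) / ref_audio_seconds
    return int(bytes_per_second * (max_total_seconds - ref_audio_seconds))


if __name__ == "__main__":
    # A 60-byte transcription spoken over 6 seconds of reference audio.
    print(max_chars_per_batch("a" * 60, 6.0))        # budget of 25 s
    print(max_chars_per_batch("a" * 60, 6.0, 12.0))  # tighter budget of 12 s
```

The list returned by `build_infer_command` can be handed to `subprocess.run`. Lowering `max_total_seconds` shrinks each generated chunk, which is the kind of adjustment the answer suggests for a model fine-tuned on short clips.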

1 reply

marvinbelfort Feb 11, 2025
Author

Thanks!

pedrokaco · 2025-02-26T21:43:47Z

Hey Marvin, are you managing to use F5 in Portuguese with your own voice? I'd like some help too. Is there a tutorial for what you did?

jba-eng · 2025-04-27T23:14:08Z

Hey everyone, did it work? I'm also getting "gibberish" as output and can't generate with this "model_last.pt". Please let me know if you got it working.

jba-eng · 2025-04-28T22:06:00Z

@SWivid Do we need a specific vocab.txt file to run this model in Portuguese? The model runs without errors, but the audio comes out as gibberish when I use the default "Emilia_ZH_EN_pinyin" vocab.txt file. I'm guessing that is the issue, but I'm not sure.
