Two possible reasons:

  1. ref_text needs to be the transcript text itself, not a file path. See:
    https://github.com/SWivid/F5-TTS/tree/main/src/f5_tts/infer

  2. If your finetuned model was trained on shorter audio samples, make sure the total length (ref audio + generated audio) stays shorter than the max length seen during finetuning.
    You can adjust this in e.g.
    https://github.com/SWivid/F5-TTS/blob/main/src/f5_tts/infer/utils_infer.py

    def preprocess_ref_audio_text(ref_audio_orig, ref_text, clip_short=True, show_info=print, device=device):
    and
    max_chars = int(len(ref_text.encode("utf-8")…
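
A rough sketch of both checks, in case it helps. This is not code from the F5-TTS repo: `load_ref_text`, `fits_training_length`, `max_gen_seconds` and the `max_train_seconds` parameter are hypothetical names made up for illustration, and the durations are assumed to be measured in seconds by whatever means you already have.

```python
import os


def load_ref_text(ref_text: str) -> str:
    """Return the transcript text (hypothetical helper, not part of F5-TTS).

    If the caller accidentally passed a path to an existing file,
    read the transcript from that file instead of using the path as text.
    """
    if os.path.isfile(ref_text):
        with open(ref_text, encoding="utf-8") as f:
            return f.read().strip()
    return ref_text.strip()


def fits_training_length(ref_seconds: float, gen_seconds: float,
                         max_train_seconds: float) -> bool:
    """True if ref + generated audio stays within the longest sample
    seen during finetuning (max_train_seconds is an assumed input)."""
    return ref_seconds + gen_seconds <= max_train_seconds


def max_gen_seconds(ref_seconds: float, max_train_seconds: float) -> float:
    """Seconds left for generated audio once the ref audio is accounted for."""
    return max(0.0, max_train_seconds - ref_seconds)


if __name__ == "__main__":
    # Assumed example: finetuning samples were at most 10 s long, ref clip is 4 s.
    print(fits_training_length(4.0, 5.0, 10.0))
    print(max_gen_seconds(4.0, 10.0))
```

If the second check fails, either use a shorter reference clip or split the text to generate into smaller chunks so that each chunk fits within the remaining budget.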

Answer selected by marvinbelfort