Triton runtime produces repetitive non-speech audio with a custom F5-TTS checkpoint (SWivid default works) #1189
spacearound404 asked this question in Q&A · Unanswered · 0 replies
Environment
- `Dockerfile.server` in `src/f5_tts/runtime/triton_trtllm`
- `cuda-python==12.6` (otherwise got `ImportError: cannot import name 'cudart'`; see reference)

Models
- `SWivid/F5-TTS`: works correctly via Triton. https://huggingface.co/SWivid/F5-TTS/tree/main
- `Misha24-10/F5-TTS_RUSSIAN/F5TTS_v1_Base_v2/model_last.pt`: https://huggingface.co/Misha24-10/F5-TTS_RUSSIAN/tree/main/F5TTS_v1_Base_v2

Expected behavior
Actual behavior
Repro steps
- `docker build . -f Dockerfile.server -t soar97/triton-f5-tts:24.12`

Triton config highlights
`model_repo/f5_tts/config.pbtxt` points to:
- `model_path: ./F5-TTS/F5TTS_v1_Base_v2/model_last.pt`
- `vocab_file: ./F5-TTS/F5TTS_v1_Base/vocab.txt` (I also tried replacing this with the exact Russian `vocab.txt` that works in my CLI; see below)
- `tllm_model_dir: ./f5_trt_llm_engine`
- vocos

What I tried
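Because the `vocab_file` entry above lives under `F5TTS_v1_Base` while the checkpoint comes from `F5TTS_v1_Base_v2`, it may be worth confirming that the two vocab files really are identical in size and order. The following is only a minimal sketch: it assumes a vocab file holds one token per line and that a token's index is its line number, and the second path in the usage line is hypothetical.

```python
import sys


def load_vocab(path):
    """Read a vocab file as an ordered list of tokens, one per line."""
    with open(path, "r", encoding="utf-8") as f:
        # Strip only the newline: a token may itself be a single space.
        return [line.rstrip("\n") for line in f]


def compare_vocabs(path_a, path_b):
    """Summarise how two vocab files differ in size, order and content."""
    a, b = load_vocab(path_a), load_vocab(path_b)
    # Index of the first position where the token differs, if any.
    first_diff = next(
        (i for i, (x, y) in enumerate(zip(a, b)) if x != y), None
    )
    # Same shared prefix but different lengths: the diff starts where
    # the shorter file ends.
    if first_diff is None and len(a) != len(b):
        first_diff = min(len(a), len(b))
    return {
        "size_a": len(a),
        "size_b": len(b),
        "first_diff": first_diff,
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example (second path is hypothetical):
    #   python compare_vocabs.py ./F5-TTS/F5TTS_v1_Base/vocab.txt ./russian/vocab.txt
    print(compare_vocabs(sys.argv[1], sys.argv[2]))
```

If `first_diff` is not `None`, token indices diverge from that line onward, so text would map to the wrong embedding rows even when both files contain the same characters.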
- `vocab.txt`:
  - `vocab.txt` everywhere (Triton and local) that gives good results in the CLI.
  - `config.pbtxt` actually points to that file.
- `nfe_steps` to 32: as expected, did not fix the “nonsense audio” (it affects quality/stability but not token correctness).
- `model_path` uses `model_last.pt` (EMA), not `model_last_inference.safetensors`.

Logs/artifacts
- `Parameter dtype is None, using default dtype: DataType.FLOAT ...`
- `Provided but not required tensors: {... text_embed.* ...}`
- `f5_tts` and `vocoder` become READY, HTTP on 8000, gRPC on 8001, metrics 8002.

Why I suspect TRT-LLM conversion
vocab.txt). If the vocab is correct and the CLI works with it, but only TRT produces nonsense, the conversion or the engine may be at fault.

Hypotheses
- `scripts/convert_checkpoint.py` (name mapping, EMA handling, dtype, head scaling, etc.) for this custom checkpoint, leading to an invalid DiT engine that outputs garbage mels.
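One way to narrow down the name-mapping, EMA and dtype hypothesis is to diff the tensor names, shapes and dtypes of the working SWivid checkpoint against the custom one before conversion. The sketch below is not taken from `convert_checkpoint.py`. It assumes the checkpoint layout used by F5-TTS training checkpoints: EMA weights under `ema_model_state_dict`, keys prefixed with `ema_model.`, plus `initted` and `step` bookkeeping entries. These key names are assumptions and should be checked against the actual files.

```python
EMA_PREFIX = "ema_model."  # assumed key prefix in F5-TTS EMA checkpoints


def unwrap_state_dict(checkpoint, use_ema=True):
    """Return a flat name -> tensor mapping from a training checkpoint."""
    if use_ema and "ema_model_state_dict" in checkpoint:
        skip = {"initted", "step"}  # assumed EMA bookkeeping entries
        return {
            (k[len(EMA_PREFIX):] if k.startswith(EMA_PREFIX) else k): v
            for k, v in checkpoint["ema_model_state_dict"].items()
            if k not in skip
        }
    # Fall back to non-EMA weights, or to an already-flat mapping.
    return checkpoint.get("model_state_dict", checkpoint)


def describe(tensor):
    """(shape, dtype) for any tensor-like object (torch, numpy, ...)."""
    shape = tuple(getattr(tensor, "shape", ()))
    dtype = str(getattr(tensor, "dtype", type(tensor).__name__))
    return shape, dtype


def diff_state_dicts(ref, custom):
    """Report names, shapes and dtypes that differ between two mappings."""
    ref_info = {k: describe(v) for k, v in ref.items()}
    cus_info = {k: describe(v) for k, v in custom.items()}
    shared = sorted(ref_info.keys() & cus_info.keys())
    return {
        "missing_in_custom": sorted(ref_info.keys() - cus_info.keys()),
        "extra_in_custom": sorted(cus_info.keys() - ref_info.keys()),
        "mismatched": {
            k: (ref_info[k], cus_info[k])
            for k in shared
            if ref_info[k] != cus_info[k]
        },
    }


def load_pt(path):
    """Load a .pt checkpoint on CPU (requires PyTorch)."""
    import torch  # imported lazily so the helpers above work without it

    return torch.load(path, map_location="cpu", weights_only=True)
```

With PyTorch installed, `diff_state_dicts(unwrap_state_dict(load_pt(ref_path)), unwrap_state_dict(load_pt(custom_path)))` would list any tensors whose name, shape or dtype differs; `ref_path` and `custom_path` are placeholders for the two checkpoint files. A shape difference limited to the text-embedding weights would point at vocab size rather than at the DiT weights themselves.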