Hello,
I’m getting unintelligible output from a FastSpeech2 model trained on a low-resource language.
The model trains successfully, but the synthesized speech is not understandable.
Could this be related to pitch modeling, alignment issues, or phoneme representation?
Any guidance would be appreciated.