Unintelligible Voice after Training

Hello,

I’m getting unintelligible output after training FastSpeech2 on a low-resource language.
The model trains successfully, but the synthesized speech is not understandable.

Could this be related to pitch modeling, alignment issues, or phoneme representation?
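
For the alignment hypothesis, one quick sanity check is that the per-phoneme durations for each utterance sum to the number of mel frames, since FastSpeech2's length regulator expands phonemes by those durations. Below is a minimal sketch, assuming durations are integer frame counts and the mel-spectrogram is a NumPy array of shape (frames, n_mels). The function name `check_alignment` and the `max_zero_frac` threshold are hypothetical and not taken from any particular FastSpeech2 implementation.

```python
import numpy as np

def check_alignment(durations, mel, n_phonemes, max_zero_frac=0.3):
    """Return a list of alignment problems found for one utterance.

    Assumed layout (hypothetical): `durations` holds one integer frame
    count per phoneme, and `mel` has shape (frames, n_mels).
    """
    durations = np.asarray(durations)
    problems = []

    # One duration per input phoneme is required by the length regulator.
    if durations.shape[0] != n_phonemes:
        problems.append(
            f"{durations.shape[0]} durations for {n_phonemes} phonemes"
        )

    # Expanded phoneme sequence must match the target mel length.
    total = int(durations.sum())
    if total != mel.shape[0]:
        problems.append(
            f"durations sum to {total}, mel has {mel.shape[0]} frames"
        )

    # Many zero-length phonemes often point to a poor forced alignment.
    # The 0.3 threshold is an arbitrary illustrative value.
    if durations.size and float((durations == 0).mean()) > max_zero_frac:
        problems.append("large fraction of zero-length phonemes")

    return problems

# Example with synthetic data: 10 mel frames, 80 mel bins.
mel = np.zeros((10, 80))
print(check_alignment([3, 4, 3], mel, n_phonemes=3))
print(check_alignment([3, 4], mel, n_phonemes=3))
```

If a check like this reports mismatches on many utterances, the alignments or the phoneme inventory would be the first thing to look at before the pitch predictor.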

Any guidance would be appreciated.