High duration loss

Hi @p0p4k, thanks for making this repo!

I am currently trying to train a 44.1kHz English model, but its duration loss is considerably higher than the one in your TensorBoard logs. It currently looks as follows:

![image](https://github.com/p0p4k/pflowtts_pytorch/assets/23167175/38684e71-deb4-4aa1-ad38-d69ef9e41791)

The other loss terms appear to behave as expected.

Also, when the generated mel-spectrogram is passed to a vocoder, the pronunciation in the resulting audio is badly off; maybe only half of it is right.

My P-Flow config can be found [here](https://github.com/bookbot-hive/pflowtts_pytorch/blob/eng/configs/data/en_au_dean2zak.yaml), and the corresponding HiFi-GAN vocoder config can be found [here](https://github.com/bookbot-hive/hifi-gan/blob/master/config_v4.json).
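In case it helps with diagnosis, here is a minimal sketch of how the mel-spectrogram settings in the two configs could be compared, since a mismatch between the acoustic model and the vocoder is one possible source of bad audio. The key names on both sides are my assumptions about how the two config files are laid out, the `mel_mismatches` helper is hypothetical, and the values are illustrative only, not copied from the linked configs.

```python
# Assumed key pairs: (P-Flow data config key, HiFi-GAN config key).
# These names are assumptions; adjust them to match the actual files.
KEY_PAIRS = [
    ("sample_rate", "sampling_rate"),
    ("n_fft", "n_fft"),
    ("n_feats", "num_mels"),
    ("hop_length", "hop_size"),
    ("win_length", "win_size"),
    ("f_min", "fmin"),
    ("f_max", "fmax"),
]


def mel_mismatches(acoustic_cfg, vocoder_cfg):
    """Return (acoustic_key, vocoder_key, acoustic_value, vocoder_value)
    for every paired setting whose values differ.

    A key missing from one config shows up as None on that side;
    a key missing from both configs is not reported.
    """
    mismatches = []
    for a_key, v_key in KEY_PAIRS:
        a_val = acoustic_cfg.get(a_key)
        v_val = vocoder_cfg.get(v_key)
        if a_val != v_val:
            mismatches.append((a_key, v_key, a_val, v_val))
    return mismatches


# Illustrative values only (hypothetical, not taken from the linked configs).
pflow_cfg = {
    "sample_rate": 44100, "n_fft": 2048, "n_feats": 80,
    "hop_length": 512, "win_length": 2048, "f_min": 0, "f_max": 22050,
}
hifigan_cfg = {
    "sampling_rate": 44100, "n_fft": 2048, "num_mels": 80,
    "hop_size": 256, "win_size": 2048, "fmin": 0, "fmax": 22050,
}

for a_key, v_key, a_val, v_val in mel_mismatches(pflow_cfg, hifigan_cfg):
    print(f"{a_key}={a_val} (P-Flow) vs {v_key}={v_val} (HiFi-GAN)")
```

With real configs, the two dictionaries would be loaded from the YAML and JSON files instead of being written inline. An empty result would only rule out a mismatch in these particular settings, not other causes.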

Could you please let me know where I might be going wrong? Thanks in advance!
