Replies: 1 comment
-
>>> erogol
-
>>> geneing
[May 21, 2019, 4:49pm]
I need help understanding something.
I removed the linear spectrogram part of the loss function, along with the
postnet that generates it. I didn't need the linear spectrogram for the
vocoder, and removing it saves a LOT of GPU memory during training.
However, the reduced model doesn't produce a reasonable attention
alignment even after 50K steps. For the full model, attention was
reasonable after only a few thousand steps.
Why isn't the mel spectrogram part of the loss enough to train the
attention?
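To make the two setups concrete, here is a minimal sketch of the difference, assuming a plain L1 loss on each output. The names `l1_loss`, `full_loss` and `reduced_loss` are hypothetical and are not taken from the TTS codebase, which may mask padded frames or weight the terms differently.

```python
def l1_loss(pred, target):
    """Mean absolute error over two equally shaped 2-D lists (frames x bins)."""
    total = 0.0
    count = 0
    for pred_frame, target_frame in zip(pred, target):
        for p, t in zip(pred_frame, target_frame):
            total += abs(p - t)
            count += 1
    return total / count


def full_loss(mel_pred, mel_target, linear_pred, linear_target):
    # Full model (assumed form): decoder mel term plus postnet linear term.
    return l1_loss(mel_pred, mel_target) + l1_loss(linear_pred, linear_target)


def reduced_loss(mel_pred, mel_target):
    # Reduced model: postnet and its linear term removed, only the mel term remains.
    return l1_loss(mel_pred, mel_target)


# Toy values for illustration only: one frame, 2 mel bins, 3 linear bins.
mel_pred, mel_target = [[1.0, 2.0]], [[0.0, 4.0]]
linear_pred, linear_target = [[0.5, 0.5, 0.5]], [[0.0, 0.0, 0.0]]

print(full_loss(mel_pred, mel_target, linear_pred, linear_target))
print(reduced_loss(mel_pred, mel_target))
```

In this sketch the only change between the two runs is that the second term, and the gradient it sends back through the postnet into the decoder, is gone.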
[This is an archived TTS discussion thread from discourse.mozilla.org/t/no-alignment-without-linear-spectrograms]