Description
I'm trying to understand the intuition behind the input normalization using layer norm like this:
waveforms = nn.functional.layer_norm(waveforms, waveforms.shape)
link
If the input is [B, L], this code will normalize it across batch elements. I.e., to compute the mean it sums up all values regardless of which batch element they belong to, and the same goes for the variance. Is it really the intended behaviour that one batch element can influence another?
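A quick check with toy values (my own example, not from the repo, using layer_norm's default behaviour) shows this cross-batch coupling:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
waveforms = torch.randn(2, 5)  # toy [B, L] batch

# Normalizing over the full shape pools mean/variance across the whole batch.
full = F.layer_norm(waveforms, waveforms.shape)

# Perturb only the second batch element and renormalize.
perturbed = waveforms.clone()
perturbed[1] += 100.0
full_perturbed = F.layer_norm(perturbed, perturbed.shape)

# The first element's output changes even though its own input did not.
print(torch.allclose(full[0], full_perturbed[0]))  # prints False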
The original paper states: "The raw waveform input to the encoder is normalized to zero mean and unit variance." It says nothing about normalizing across the batch.
I think the right way is to normalize each batch element independently, and the code should be changed to:
waveforms = nn.functional.layer_norm(waveforms, waveforms.shape[1:])
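As a sanity check (again a toy example I wrote, relying on layer_norm's defaults of eps=1e-5 and biased variance), this variant matches independent per-row standardization:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
waveforms = torch.randn(2, 5)  # toy [B, L] batch

# Each row is now normalized with its own mean/variance.
per_item = F.layer_norm(waveforms, waveforms.shape[1:])

# Equivalent manual per-row standardization.
mean = waveforms.mean(dim=1, keepdim=True)
var = waveforms.var(dim=1, unbiased=False, keepdim=True)
manual = (waveforms - mean) / torch.sqrt(var + 1e-5)
print(torch.allclose(per_item, manual))  # prints True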