Wav2Vec2 pipeline feature extractor normalizes input over batch dimension, is it a feature or bug in design? #5609

Open
@ivan-alles

Description

I'm trying to understand the intuition behind the input normalization using layer norm like this:

waveforms = nn.functional.layer_norm(waveforms, waveforms.shape)

If the input has shape [B, L], this code normalizes it across batch elements. That is, to compute the mean, it sums all values regardless of which batch element they belong to, and the same goes for the variance. Is it really the intended behaviour that one batch element can influence another?
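To make the concern concrete, here is a minimal sketch, assuming a toy batch of two synthetic waveforms (the names `quiet` and `loud` and their amplitudes are made up for illustration, not taken from the pipeline). It compares normalizing over the full shape with normalizing one waveform on its own:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical batch of shape [B, L] = [2, 1000]: one quiet and one loud waveform.
quiet = 0.01 * torch.randn(1000)
loud = 10.0 * torch.randn(1000) + 5.0
batch = torch.stack([quiet, loud])

# normalized_shape == batch.shape: a single mean and variance for the whole batch.
whole = F.layer_norm(batch, batch.shape)

# The same quiet waveform normalized alone, as a batch of size 1.
alone = F.layer_norm(quiet.unsqueeze(0), (1, quiet.shape[0]))

# Statistics over the whole batch are close to zero mean and unit variance ...
print(whole.mean().item(), whole.std(unbiased=False).item())

# ... but the quiet row on its own is nowhere near unit variance,
# because the loud row dominates the shared statistics.
print(whole[0].std(unbiased=False).item())

# The result for the quiet waveform depends on what else is in the batch.
print(torch.allclose(whole[0], alone[0]))
```

If this sketch reflects what the pipeline does, the output for a given waveform would change depending on which other waveforms it happens to be batched with.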

The original paper states: "The raw waveform input to the encoder is normalized to zero mean and unit variance." It says nothing about normalization across the batch.

I think the right way is to normalize each batch element independently, and the code should be changed to: waveforms = nn.functional.layer_norm(waveforms, waveforms.shape[1:])
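As a rough check of the proposed change, the sketch below applies layer norm over `waveforms.shape[1:]` to a synthetic [B, L] batch (the batch size, length, and per-row scales are assumptions chosen for illustration) and compares it with an explicit per-row computation using the default `eps` of 1e-5:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical batch of 4 waveforms with very different amplitudes.
scales = torch.tensor([[0.1], [1.0], [5.0], [20.0]])
waveforms = torch.randn(4, 16000) * scales

# Proposed change: normalize over every dimension except the batch dimension.
per_item = F.layer_norm(waveforms, waveforms.shape[1:])

# Each row should now have roughly zero mean and unit variance on its own.
row_means = per_item.mean(dim=1)
row_stds = per_item.std(dim=1, unbiased=False)
print(row_means, row_stds)

# The same computation written out by hand, one mean and variance per row.
eps = 1e-5
mean = waveforms.mean(dim=1, keepdim=True)
var = waveforms.var(dim=1, unbiased=False, keepdim=True)
manual = (waveforms - mean) / torch.sqrt(var + eps)
print(torch.allclose(per_item, manual, atol=1e-4))
```

With this variant, the statistics are computed per row, so the normalized output for one waveform does not depend on the others in the batch.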
