BatchNorm on GPU without affine or tracking statistics #1810

Open
@avik-pal

Description

Currently, when BatchNorm runs on the GPU, we assert that the parameters are trainable and that statistics are tracked. This is a reasonable assumption, given that CUDNN requires an explicit mean and variance during inference.

However, there are quite a few cases where we might want to disable these (for example, we typically don't set track_stats=true inside a Deep Equilibrium Model). Given this, I think that if either of these options is disabled, we should fall back to the CPU implementation, which relies on broadcasting and simple linear algebra operations. (We already use those for GroupNorm and LayerNorm, so we might as well use them for BatchNorm.)
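
To make the proposed fallback concrete, here is a minimal sketch of the arithmetic such a path would perform. It is written in plain Python for illustration only and is not the library's actual code: the function name `batchnorm_fallback`, its argument names, the `(samples, channels)` input layout, and the `eps` default are all assumptions. When no running statistics are supplied (the track_stats=false case), it normalizes with the statistics of the current batch; when no scale or shift is supplied (the no-affine case), it uses the identity transform.

```python
from math import sqrt
from statistics import fmean, pvariance


def batchnorm_fallback(x, gamma=None, beta=None,
                       running_mean=None, running_var=None, eps=1e-5):
    """Hypothetical generic batch norm over a list of N samples with C channels each."""
    # Transpose to C columns of N values so statistics are computed per channel.
    channels = list(zip(*x))
    n_channels = len(channels)

    # Without tracked statistics, fall back to the current batch's mean
    # and biased (population) variance.
    mean = running_mean if running_mean is not None else [fmean(c) for c in channels]
    var = running_var if running_var is not None else [pvariance(c) for c in channels]

    # Without affine parameters, scale by 1 and shift by 0.
    gamma = gamma if gamma is not None else [1.0] * n_channels
    beta = beta if beta is not None else [0.0] * n_channels

    return [
        [g * (v - m) / sqrt(s + eps) + b
         for v, m, s, g, b in zip(row, mean, var, gamma, beta)]
        for row in x
    ]


# Two samples, two channels, no affine parameters and no tracked statistics.
batch = [[1.0, 10.0], [3.0, 30.0]]
normalized = batchnorm_fallback(batch)
```

In an array language the per-channel loops collapse into broadcasted expressions, which is what makes the same code usable on both CPU and GPU arrays.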
