Currently, if BatchNorm is performed on the GPU we assert that the parameters must be trainable and that statistics must be tracked. This is a reasonable requirement, since CUDNN expects an explicit mean and variance during inference.
However, there are quite a few cases where we might want to disable these (for example, we typically don't set track_stats=true inside a Deep Equilibrium Model). Given that, I feel that if either of these options is disabled we should fall back to the CPU implementation, which relies on broadcasting and simple linear-algebra operations. (We already use that approach for GroupNorm and LayerNorm, so we might as well use it for BatchNorm.)
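For illustration, here is a minimal language-agnostic sketch (in NumPy) of what the broadcasting-based fallback computes when statistics are not tracked: normalize with the current batch's mean and variance, and only apply scale/shift if the affine parameters exist. The function name `batchnorm_fallback` and the `(batch, features)` layout are assumptions for this sketch, not the library's actual API.

```python
import numpy as np

def batchnorm_fallback(x, gamma=None, beta=None,
                       running_mean=None, running_var=None, eps=1e-5):
    """Sketch of a broadcasting-based BatchNorm (hypothetical helper).

    x is assumed to have shape (batch, features). When running statistics
    are not tracked (track_stats disabled), fall back to batch statistics.
    """
    if running_mean is None or running_var is None:
        # No tracked statistics: compute them from the current batch.
        mean = x.mean(axis=0, keepdims=True)
        var = x.var(axis=0, keepdims=True)
    else:
        mean, var = running_mean, running_var

    # Plain broadcasting; no CUDNN-specific requirements.
    y = (x - mean) / np.sqrt(var + eps)

    # Affine parameters are optional (non-trainable / absent is fine).
    if gamma is not None:
        y = gamma * y
    if beta is not None:
        y = y + beta
    return y
```

The same broadcasting pattern is what GroupNorm and LayerNorm already use, just with different reduction axes, which is why reusing it for BatchNorm is cheap.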