SGD optimizer crashes on 0-d (scalar) parameters with momentum #322

@lvyufeng

Description

torch.optim.SGD with momentum > 0 crashes when optimizing scalar (0-dimensional) parameters. The momentum buffer is allocated via zeros_like(param), which creates a 0-d tensor; the CPU kernel then tries to slice-assign into a 0-d NumPy array, which NumPy rejects.

Repro

import candle as torch

p = torch.nn.Parameter(torch.randn(()))
optimizer = torch.optim.SGD([p], lr=1e-3, momentum=0.9)

loss = (p * 2).sum()
loss.backward()
optimizer.step()   # IndexError

Error

IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed [op=_sgd_step, device=cpu]

Root cause

src/candle/_backends/cpu/optim_ops.py line 37:

buf_data[:] = momentum * buf_data + (1 - dampening) * g_data

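For 0-d NumPy arrays, arr[:] is invalid and raises IndexError, because a slice needs at least one dimension to index. The assignment should use np.copyto(buf_data, ...) or buf_data[()] = ..., both of which work for arrays of any dimensionality, 0-d included. buf_data.fill(...) also handles the scalar case, but it only accepts a scalar value, so it is not a general replacement.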
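The failure and the suggested replacements can be reproduced in plain NumPy, without candle. This is a minimal sketch: the names buf and grad are illustrative stand-ins for buf_data and g_data, and the constants are arbitrary.

```python
import numpy as np

buf = np.zeros(())      # 0-d array, like the momentum buffer of a scalar parameter
grad = np.array(2.0)    # 0-d gradient

# Slice assignment needs at least one dimension, so this raises IndexError.
try:
    buf[:] = 0.9 * buf + grad
    slice_ok = True
except IndexError:
    slice_ok = False

# In-place alternatives that accept 0-d and n-d arrays alike.
np.copyto(buf, 0.9 * buf + grad)   # buf: 0.0 -> 2.0
buf[()] = 0.9 * buf + grad         # buf: 2.0 -> 3.8
buf[...] = 0.9 * buf + grad        # buf: 3.8 -> 5.42
```

All three replacements write into the existing buffer rather than rebinding the name, which matters if other code holds a reference to the same array.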

The same pattern may affect _write_back() (line 21) and other optimizer kernels that use arr[:] on buffers that could be 0-d.
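One way to cover every such call site is to route in-place writes through a single shape-agnostic helper. The sketch below is an assumption about how that could look, not the actual candle code: assign_inplace and momentum_update are hypothetical names, and the real kernels may take different arguments.

```python
import numpy as np

def assign_inplace(dst, src):
    """Hypothetical helper: overwrite dst with src in place, for any ndim (0-d included)."""
    np.copyto(dst, src)
    return dst

def momentum_update(buf_data, g_data, momentum, dampening):
    """Hypothetical stand-in for the buffer update on line 37 of optim_ops.py."""
    return assign_inplace(buf_data, momentum * buf_data + (1 - dampening) * g_data)

# Scalar (0-d) buffer: the case that currently crashes.
buf0 = np.zeros(())
momentum_update(buf0, np.array(2.0), momentum=0.9, dampening=0.0)   # 0.0 -> 2.0
momentum_update(buf0, np.array(2.0), momentum=0.9, dampening=0.0)   # 2.0 -> 3.8

# Regular 1-d buffer: behaviour is unchanged from arr[:] = ...
buf1 = np.zeros(3)
momentum_update(buf1, np.ones(3), momentum=0.9, dampening=0.0)      # all ones
```

Exercising the 0-d and 1-d cases side by side, as above, would also make a compact regression test for each optimizer kernel.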

Context

Discovered while running the pytorch_with_examples.py tutorial (DynamicNet section), where every model parameter is a scalar created with nn.Parameter(torch.randn(())).
