Description
torch.optim.SGD with momentum > 0 crashes when optimizing scalar (0-dimensional) parameters. The momentum buffer is allocated via zeros_like(param), which creates a 0-d tensor, and the CPU kernel then tries to slice-assign into the underlying 0-d NumPy array.
Repro
import candle as torch
p = torch.nn.Parameter(torch.randn(()))
optimizer = torch.optim.SGD([p], lr=1e-3, momentum=0.9)
loss = (p * 2).sum()
loss.backward()
optimizer.step() # IndexError
Error
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed [op=_sgd_step, device=cpu]
Root cause
src/candle/_backends/cpu/optim_ops.py line 37:
buf_data[:] = momentum * buf_data + (1 - dampening) * g_data
For 0-d NumPy arrays, arr[:] is invalid because there is no axis to slice. The assignment needs to use np.copyto(buf_data, ...), buf_data.fill(...), or buf_data[()] = ... to handle the scalar case.
The same pattern may affect _write_back() (line 21) and other optimizer kernels that use arr[:] on buffers that could be 0-d.
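The failure and the candidate replacements can be checked in plain NumPy, independent of candle. The sketch below is only an illustration under assumptions: the helper name overwrite_in_place is hypothetical and not part of candle, and the variable names simply mirror the line quoted above. It shows that slice assignment raises on a 0-d array while np.copyto, arr[()] and arr[...] all write in place for both 0-d and n-d buffers.

```python
import numpy as np


def overwrite_in_place(dst, value):
    """Hypothetical helper: overwrite dst in place for any ndim, including 0-d."""
    np.copyto(dst, value)
    return dst


momentum, dampening = 0.9, 0.0
g_data = np.array(2.0)      # 0-d gradient, like the grad of a scalar parameter
buf_data = np.zeros(())     # 0-d momentum buffer, like zeros_like(param)

# The pattern from the issue: slice assignment on a 0-d array raises IndexError.
try:
    buf_data[:] = momentum * buf_data + (1 - dampening) * g_data
    slice_failed = False
except IndexError:
    slice_failed = True

update = momentum * buf_data + (1 - dampening) * g_data

# Three in-place alternatives that accept 0-d arrays.
via_copyto = overwrite_in_place(np.zeros(()), update)

via_empty_tuple = np.zeros(())
via_empty_tuple[()] = update

via_ellipsis = np.zeros(())
via_ellipsis[...] = update

# The same calls still behave like arr[:] = ... on ordinary n-d buffers.
vec = np.zeros(3)
vec_id = id(vec)
overwrite_in_place(vec, np.array([1.0, 2.0, 3.0]))
```

Of the three, np.copyto and arr[...] read most like the original arr[:] intent of "overwrite everything in place", so either could be a drop-in candidate for the kernels mentioned above; which one fits candle's code style is left to the maintainers.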
Context
Discovered while running the pytorch_with_examples.py tutorial (DynamicNet section), where every model parameter is a scalar created with nn.Parameter(torch.randn(())).