SGD optimizer crashes on 0-d (scalar) parameters with momentum #322

## Description

`torch.optim.SGD` with `momentum > 0` crashes when optimizing scalar (0-dimensional) parameters. The momentum buffer is allocated via `zeros_like(param)`, which creates a 0-d tensor; the CPU kernel then tries to slice-assign into the underlying 0-d NumPy array and fails.

## Repro

```python
import candle as torch

p = torch.nn.Parameter(torch.randn(()))
optimizer = torch.optim.SGD([p], lr=1e-3, momentum=0.9)

loss = (p * 2).sum()
loss.backward()
optimizer.step()   # IndexError
```

## Error

```
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed [op=_sgd_step, device=cpu]
```

## Root cause

`src/candle/_backends/cpu/optim_ops.py` line 37:

```python
buf_data[:] = momentum * buf_data + (1 - dampening) * g_data
```

For 0-d NumPy arrays, `arr[:]` is invalid because a slice needs at least one axis to index. The assignment needs to use `np.copyto(buf_data, ...)`, `buf_data.fill(...)`, or `buf_data[()] = ...` to handle the scalar case.
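The failure can be reproduced with NumPy alone, independent of candle. The snippet below is a minimal sketch: `buf` and `new_value` are illustrative names, not identifiers from `optim_ops.py`.

```python
import numpy as np

buf = np.zeros(())            # 0-d array, like the momentum buffer of a scalar parameter
new_value = np.asarray(0.5)

slice_failed = False
try:
    buf[:] = new_value        # a slice needs at least one axis to index
except IndexError as exc:
    slice_failed = True
    print("slice assignment failed:", exc)

# Each of these writes in place and accepts 0-d and N-d arrays alike.
buf[...] = new_value
np.copyto(buf, new_value * 2)
buf[()] = new_value * 3
```

Note that `ndarray.fill` takes a single scalar value, so on its own it covers only the 0-d case; `np.copyto`, `buf[()]`, and `buf[...]` also handle buffers with one or more dimensions.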

The same pattern may affect `_write_back()` (line 21) and other optimizer kernels that use `arr[:]` on buffers that could be 0-d.
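One way to guard against this across kernels is a shape-parametrized check. The following is a sketch under assumptions: `momentum_update` is a hypothetical stand-in that mirrors the expression on line 37, not candle's actual kernel, and it uses Ellipsis indexing as one possible dimension-agnostic write.

```python
import numpy as np

def momentum_update(buf, grad, momentum=0.9, dampening=0.0):
    """Hypothetical stand-in for the buffer update; not candle's actual code."""
    # Ellipsis indexing selects the whole array for any number of dimensions,
    # so the same line covers scalar and non-scalar parameters.
    buf[...] = momentum * buf + (1 - dampening) * grad
    return buf

# Exercise the update on a scalar, a vector, and a matrix.
for shape in [(), (3,), (2, 2)]:
    buf = np.ones(shape)
    grad = np.full(shape, 2.0)
    out = momentum_update(buf, grad)
    assert out is buf                          # written in place
    assert out.shape == shape                  # shape preserved, including 0-d
    assert np.allclose(out, 0.9 * 1.0 + 2.0)   # momentum * buf + grad
```

Including `()` in the list of tested shapes is the important part: it would catch any remaining `arr[:]` writes in `_write_back()` or other kernels when the same loop is pointed at the real functions.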

## Context

Discovered while running the **pytorch_with_examples.py** tutorial (DynamicNet section), where every model parameter is a scalar created with `nn.Parameter(torch.randn(()))`.
