mps: quickstart tutorial is SIGKILLed late in training #327

## Summary
- `quickstart_tutorial.py` with `USE_CANDLE=1` on local MPS learns normally for several epochs, but the process is still killed by the host (`Killed: 9`) late in epoch 5.
- Other tutorial blockers on this branch were fixed already (`default_collate`, MPS `nll_loss` backward, legacy `_rebuild_tensor` compatibility, and full-module `Parameter` pickling), so this appears to be the remaining MPS-specific tutorial/runtime issue.
- This is no longer the previous zero-gradient bug: training semantics look reasonable before the process is killed.

## Repro
```bash
source ~/miniconda3/etc/profile.d/conda.sh
USE_CANDLE=1 MPLBACKEND=Agg \
PYTHONPATH="/Users/lvyufeng/Projects/candle/.worktrees/test-pytorch-basics-mps:/Users/lvyufeng/Projects/candle/.worktrees/test-pytorch-basics-mps/src" \
conda run -n candle311-tutorials \
python /Users/lvyufeng/Projects/candle/.worktrees/test-pytorch-basics-mps/.tmp_tutorials/quickstart_tutorial.py
```

Observed tail:
```text
Killed: 9
ERROR conda.cli.main_run:execute(127): `conda run python quickstart_tutorial.py` failed.
```
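Because `conda run` reports its own failure, one way to confirm that the Python process itself died from SIGKILL is to launch it directly and inspect the return code. The sketch below is an illustration, not part of the original repro: `describe_exit` and `run_and_describe` are hypothetical helper names, and the demo child is a stand-in that kills itself rather than the tutorial script.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short description."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as the negated signal number.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def run_and_describe(cmd: list) -> str:
    """Run cmd to completion and describe how it ended."""
    proc = subprocess.run(cmd)
    return describe_exit(proc.returncode)


def demo_self_kill() -> str:
    """Stand-in child that sends SIGKILL to itself (assumption: POSIX host)."""
    child = [
        sys.executable,
        "-c",
        "import os, signal; os.kill(os.getpid(), signal.SIGKILL)",
    ]
    return run_and_describe(child)
```

To apply this to the repro, pass the `python .../quickstart_tutorial.py` command (with the same environment variables) to `run_and_describe` from inside the activated environment instead of going through `conda run`.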

## Evidence gathered
A 5-epoch diagnostic replica of the tutorial reached:
- epoch 1: test loss `2.1940`, acc `0.3491`
- epoch 2: test loss `1.9793`, acc `0.5190`
- epoch 3: test loss `1.6272`, acc `0.5989`
- epoch 4: test loss `1.3211`, acc `0.6321`
- then the process was killed during epoch 5 after step 800

Tutorial-style object counting (train + test loop, no manual cleanup) showed:
- baseline: `tensors=12`, `nodes=0`, `saved=0`, `mps_storages=6`, RSS `~288 MB`
- during training: `tensors=106`, `nodes=18`, `saved=23`, `mps_storages=28`, RSS `~344 MB`
- during test: `tensors=108`, `nodes=18`, `saved=23`, `mps_storages=30`
- after epoch end: `tensors=42`, `nodes=18`, `saved=23`, `mps_storages=30`
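The counters above can be compared against the baseline to see what is still held once an epoch has finished. The sketch below uses only the numbers reported in this issue; `retained` is a hypothetical helper name.

```python
# Counters copied from the measurements listed above.
BASELINE = {"tensors": 12, "nodes": 0, "saved": 0, "mps_storages": 6}
AFTER_EPOCH = {"tensors": 42, "nodes": 18, "saved": 23, "mps_storages": 30}


def retained(baseline: dict, later: dict) -> dict:
    """Return the per-counter difference for every counter that changed."""
    return {
        name: later[name] - baseline[name]
        for name in baseline
        if later[name] != baseline[name]
    }


print(retained(BASELINE, AFTER_EPOCH))
# {'tensors': 30, 'nodes': 18, 'saved': 23, 'mps_storages': 24}
```

Taking the same snapshot at the end of every epoch would show whether these retained counts stay flat or grow from epoch to epoch.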

A separate train-only diagnostic with explicit per-step cleanup did **not** show unbounded growth:
- CPU stayed around `7` live tensors
- MPS stayed around `6` live tensors / `4` MPS storages
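A minimal sketch of this kind of per-step check, under stated assumptions: `StubTensor` and `train_step` are hypothetical stand-ins (not candle APIs), and live objects are counted by class name through the standard `gc` module, which only sees objects tracked by the garbage collector. The real diagnostic may count tensors differently.

```python
import gc
from collections import Counter


def live_counts(type_names: list) -> dict:
    """Count GC-tracked live objects whose class name is in type_names."""
    gc.collect()  # drop unreachable cycles first so only live objects are counted
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return {name: counts.get(name, 0) for name in type_names}


class StubTensor:
    """Hypothetical stand-in for a framework tensor."""

    def __init__(self, size: int):
        self.data = [0.0] * size


def train_step() -> StubTensor:
    """Stand-in train step: allocates a batch and returns a 'loss'."""
    batch = StubTensor(1024)
    loss = StubTensor(1)
    return loss


def run(steps: int) -> list:
    """Run steps with explicit cleanup, recording the live count after each."""
    history = []
    for _ in range(steps):
        loss = train_step()
        del loss  # explicit per-step cleanup
        history.append(live_counts(["StubTensor"])["StubTensor"])
    return history
```

A flat history means no per-step accumulation; a steadily rising one would point at a linear leak in the step itself.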

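So this does **not** look like a simple linear leak in the core train step. In the tutorial-style run, however, `nodes`, `saved`, and `mps_storages` stay at their in-training levels (`18`, `23`, `30`) after the epoch ends instead of returning to baseline. The problem therefore seems specific to the full tutorial-style long-running train+eval process on MPS.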

## Suspected area
The remaining problem looks like an MPS-specific runtime/resource-management issue in the full quickstart train+eval loop. It may involve long-lived tensors around the tutorial evaluation path, or host/runtime limits that are reached only in the longer run.
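To test the host-limit hypothesis, peak process memory could be logged at intervals during the long run. The sketch below uses the standard `resource` module; `peak_rss_mb` is a hypothetical helper name. Whether RSS reflects MPS-side allocations has not been established in this issue, so a flat reading would not rule out memory pressure on the device side.

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Return this process's peak resident set size in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def log_memory(step: int) -> str:
    """Format one log line; call this every N steps from the training loop."""
    return f"step {step}: peak RSS {peak_rss_mb():.1f} MB"
```

Calling `print(log_memory(step))` every hundred steps or so would show whether host memory climbs toward the point where the process is killed.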
