For both CPU and CUDA. See https://github.com/microsoft/onnxruntime-training-examples/pull/190#issuecomment-2197924047 and https://github.com/microsoft/onnxruntime-training-examples/issues/189