This repository was archived by the owner on Jul 1, 2023. It is now read-only.
#758 adds Python TensorFlow reference implementations for checking optimizer numerical correctness.
This issue tracks numerical differences between the Swift optimizer implementations and those reference implementations. See the references to TF-759 in Tests/TensorFlowTests/OptimizerTests.swift for specific occurrences.
Some differences are larger than others. I think we should strive for exact numerical equality if possible, given the same optimizer parameters.
Current examples:
- `SGD(for: values, learningRate: 1e-3)`: big difference
  - Swift: `[0.49999535, -0.10000112, -3.000017]`
  - Python: `[0.49999967, -0.00999999, -0.01999998]`
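For comparison, the vanilla SGD update (no momentum) that the Python reference should reduce to can be sketched in plain Python. The `params` and `grads` values below are hypothetical illustration data, not the test's actual `values`:

```python
def sgd_step(params, grads, learning_rate=1e-3):
    # theta_t = theta_{t-1} - lr * grad  (plain SGD, no momentum)
    return [p - learning_rate * g for p, g in zip(params, grads)]

# Hypothetical illustration values.
params = [0.5, -0.01, -0.02]
grads = [1.0, 2.0, 3.0]
print(sgd_step(params, grads))  # each parameter moves by -lr * grad
```

With an update rule this simple, a discrepancy as large as the one above suggests a bookkeeping bug (e.g. a stale or mis-scaled gradient) rather than floating-point drift.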
- `AdaGrad(for: values, learningRate: 1e-3, epsilon: 1e-7)`: big difference for the third value
  - Swift: `[0.061354622, -0.057095252, -0.061786927]`
  - Python: `[0.06179592, -0.05709525, -0.05987222]`
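One common AdaGrad formulation (epsilon added outside the square root, as in legacy Keras) is sketched below. Implementations differ on whether epsilon goes inside or outside the `sqrt`, and on the accumulator's initial value, either of which can by itself produce small per-element gaps like the ones above. All names and values here are illustrative:

```python
import math

def adagrad_step(params, grads, accum, learning_rate=1e-3, epsilon=1e-7):
    # accum_t = accum_{t-1} + grad^2
    # theta_t = theta_{t-1} - lr * grad / (sqrt(accum_t) + epsilon)
    new_accum = [a + g * g for a, g in zip(accum, grads)]
    new_params = [p - learning_rate * g / (math.sqrt(a) + epsilon)
                  for p, g, a in zip(params, grads, new_accum)]
    return new_params, new_accum
```

Checking which of these two formulation choices each implementation makes would be a good first step toward explaining the third value's divergence.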
- `AdaMax(for: values, learningRate: 1e-3, epsilon: 1e-7)`: small difference
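For reference, the AdaMax update from the Adam paper can be sketched as follows; the epsilon placement and the hyperparameter values shown are assumptions, and the moment buffers are illustrative:

```python
def adamax_step(params, grads, m, u, t,
                learning_rate=1e-3, beta1=0.9, beta2=0.999, epsilon=1e-7):
    # m_t = beta1 * m_{t-1} + (1 - beta1) * grad      (first moment)
    # u_t = max(beta2 * u_{t-1}, |grad|)              (infinity norm)
    # theta_t = theta_{t-1} - (lr / (1 - beta1^t)) * m_t / (u_t + epsilon)
    new_m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grads)]
    new_u = [max(beta2 * ui, abs(g)) for ui, g in zip(u, grads)]
    new_params = [p - (learning_rate / (1 - beta1 ** t)) * mi / (ui + epsilon)
                  for p, mi, ui in zip(params, new_m, new_u)]
    return new_params, new_m, new_u
```

Since the remaining difference here is small, a likely culprit is a subtle formulation detail such as where epsilon is applied or how the bias-correction term `1 - beta1^t` is computed.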