This repository was archived by the owner on Jul 1, 2023. It is now read-only.
#758 adds Python TensorFlow reference implementations for checking optimizer numerical correctness.
This issue tracks numerical differences between the Swift optimizer implementations and those reference implementations. See the references to TF-759 in Tests/TensorFlowTests/OptimizerTests.swift for specific occurrences.
Some differences are larger than others. I think we should strive for exact numerical equality if possible, given the same optimizer parameters.
Current examples:
- `SGD(for: values, learningRate: 1e-3)`: big difference
  - Swift: `[0.49999535, -0.10000112, -3.000017]`
  - Python: `[0.49999967, -0.00999999, -0.01999998]`
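For comparison, the vanilla SGD update (no momentum) that the Python reference should reduce to can be sketched in plain Python. The `params` and `grads` values below are hypothetical illustration data, not the test's actual `values`:

```python
def sgd_step(params, grads, learning_rate=1e-3):
    # theta_t = theta_{t-1} - lr * grad  (plain SGD, no momentum)
    return [p - learning_rate * g for p, g in zip(params, grads)]

# Hypothetical illustration values.
params = [0.5, -0.01, -0.02]
grads = [1.0, 2.0, 3.0]
print(sgd_step(params, grads))  # each parameter moves by -lr * grad
```

With an update rule this simple, a discrepancy as large as the one above suggests a bookkeeping bug (e.g. a stale or mis-scaled gradient) rather than floating-point drift.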
- `AdaGrad(for: values, learningRate: 1e-3, epsilon: 1e-7)`: big difference for the third value
  - Swift: `[0.061354622, -0.057095252, -0.061786927]`
  - Python: `[0.06179592, -0.05709525, -0.05987222]`
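One common AdaGrad formulation (epsilon added outside the square root, as in legacy Keras) is sketched below. Implementations differ on whether epsilon goes inside or outside the `sqrt`, and on the accumulator's initial value, either of which can by itself produce small per-element gaps like the ones above. All names and values here are illustrative:

```python
import math

def adagrad_step(params, grads, accum, learning_rate=1e-3, epsilon=1e-7):
    # accum_t = accum_{t-1} + grad^2
    # theta_t = theta_{t-1} - lr * grad / (sqrt(accum_t) + epsilon)
    new_accum = [a + g * g for a, g in zip(accum, grads)]
    new_params = [p - learning_rate * g / (math.sqrt(a) + epsilon)
                  for p, g, a in zip(params, grads, new_accum)]
    return new_params, new_accum
```

Checking which of these two formulation choices each implementation makes would be a good first step toward explaining the third value's divergence.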
- `AdaMax(for: values, learningRate: 1e-3, epsilon: 1e-7)`: small difference
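For reference, the AdaMax update from the Adam paper can be sketched as follows; the epsilon placement and the hyperparameter values shown are assumptions, and the moment buffers are illustrative:

```python
def adamax_step(params, grads, m, u, t,
                learning_rate=1e-3, beta1=0.9, beta2=0.999, epsilon=1e-7):
    # m_t = beta1 * m_{t-1} + (1 - beta1) * grad      (first moment)
    # u_t = max(beta2 * u_{t-1}, |grad|)              (infinity norm)
    # theta_t = theta_{t-1} - (lr / (1 - beta1^t)) * m_t / (u_t + epsilon)
    new_m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grads)]
    new_u = [max(beta2 * ui, abs(g)) for ui, g in zip(u, grads)]
    new_params = [p - (learning_rate / (1 - beta1 ** t)) * mi / (ui + epsilon)
                  for p, mi, ui in zip(params, new_m, new_u)]
    return new_params, new_m, new_u
```

Since the remaining difference here is small, a likely culprit is a subtle formulation detail such as where epsilon is applied or how the bias-correction term `1 - beta1^t` is computed.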