
The comparison in the Gradient Centralization example might not be fair #2111


Description

@albertoesmp

Issue Type

Documentation Bug

Source

source

Keras Version

Keras 3.9.2

Custom Code

Yes

OS Platform and Distribution

Linux Ubuntu 24.04

Python version

3.12

GPU model and memory

RTX 4090 with 24564 MiB

Current Behavior?

I think the example Gradient Centralization for Better Training Performance (https://keras.io/examples/vision/gradient_centralization/#train-the-model-with-gc) contains an unfair comparison between the GC and no-GC models. I've prepared a modified version of the Colab notebook that wraps the model-building logic in a make_model() function and calls it twice, once before compiling the no-GC model and once before compiling the GC one, so the GC model is trained independently from a fresh initialization (sketched below). The original comparison is misleading because the GC model is just the already-trained no-GC model recompiled with the GC optimizer, which explains why it reaches 71% accuracy in the very first epoch.

My corrected Colab example can be seen here: https://colab.research.google.com/drive/1HyTaaKevI1Izv6XsswO2NnktodO6_tuv?usp=sharing
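
For reference, here is a minimal sketch of the corrected comparison. It assumes GCRMSprop and train_ds are defined as in the original example, and the architecture inside make_model() is a simplified stand-in for the notebook's model, not the real one:

```python
import keras
from keras import layers


def make_model():
    # Build a fresh, untrained model so each run starts from scratch.
    # (Simplified stand-in for the architecture used in the notebook.)
    inputs = keras.Input(shape=(300, 300, 3))
    x = layers.Conv2D(16, 3, activation="relu")(inputs)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    return keras.Model(inputs, outputs)


# Baseline: fresh model trained with plain RMSprop (no GC).
model_no_gc = make_model()
model_no_gc.compile(
    loss="binary_crossentropy",
    optimizer=keras.optimizers.RMSprop(learning_rate=1e-4),
    metrics=["accuracy"],
)
history_no_gc = model_no_gc.fit(train_ds, epochs=10)

# GC run: a *second* fresh model, so it does not inherit the baseline's
# trained weights. GCRMSprop is the gradient-centralized optimizer
# defined earlier in the notebook.
model_gc = make_model()
model_gc.compile(
    loss="binary_crossentropy",
    optimizer=GCRMSprop(learning_rate=1e-4),
    metrics=["accuracy"],
)
history_gc = model_gc.fit(train_ds, epochs=10)
```

The key point is that make_model() returns a freshly initialized model each time it is called, so both runs start from random weights and the two training curves are directly comparable.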

Standalone code to reproduce the issue or tutorial link

This is the original Colab with the flawed comparison, in which the GC model's training starts from the weights of the already-trained no-GC model: https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/vision/ipynb/gradient_centralization.ipynb
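
Schematically, the problematic pattern looks like this (a paraphrase of the notebook's flow reusing the illustrative make_model() from above, not verbatim notebook code):

```python
# One model object is reused for both runs.
model = make_model()

# No-GC run.
model.compile(
    loss="binary_crossentropy",
    optimizer=keras.optimizers.RMSprop(learning_rate=1e-4),
    metrics=["accuracy"],
)
history_no_gc = model.fit(train_ds, epochs=10)  # model is now trained

# "GC" run: compile() does NOT reset the weights, so this run continues
# from the trained no-GC state instead of starting from scratch.
model.compile(
    loss="binary_crossentropy",
    optimizer=GCRMSprop(learning_rate=1e-4),
    metrics=["accuracy"],
)
history_gc = model.fit(train_ds, epochs=10)
```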
