Implementation of the GaussianNoise transform for uint8 inputs #9169
Implements #9148, i.e. an implementation of the `GaussianNoise` transform for `uint8` inputs.
Several configurations have been both benchmarked and validated for this. Two main approaches have been tested (a rough sketch of both follows this list):

- The first one converts the input image to a `float` data type, adds the Gaussian noise, and converts the output back to `uint8`. A key detail is that the image is not converted from `[0, 255]` to `[0, 1]` and back, as those conversions are performed per-pixel and can be slow on big images. Instead, since the noise is going to be multiplied by the `sigma` parameter anyway, we can use `sigma * 255` as the coefficient (and then add `mean * 255`), thus saving two floating-point array-wide operations.
- The second one converts the noise to an intermediate int dtype (`int16`), performs the addition in integer arithmetic, and finally converts the result back to `uint8`. The noise is still scaled by `sigma * 255` and has `mean * 255` added before being converted to `int16`. That dtype is needed to represent the `[-255, 255]` range that the noise tensor would theoretically generate, and also leaves a margin to be able to clamp pixels that might lie outside said range; `int16` offers a range of `[-32_768, 32_767]`, which is more than enough for our use case. Variations of this approach were also tested: converting the input image to `int16` before adding the mean (thus performing addition of ints instead of floats), rounding the result (to have more accurate results at a performance cost), and performing implicit data type conversions instead of explicit ones.
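For concreteness, here is a rough sketch of the two candidate strategies. This is my own illustration, not the PR's code; the tensor shapes, variable names, and parameter values are arbitrary. Both variants fold the `255` factor into `sigma` and `mean` so the image is never rescaled to `[0, 1]`:

```python
import torch

img = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8)
mean, sigma = 0.0, 0.1

# Noise generated in float32 and scaled directly to the [0, 255] value range.
noise = torch.randn(img.shape) * (sigma * 255) + (mean * 255)

# Approach 1: do the addition in float, then come back to uint8.
out_float = (img.to(torch.float32) + noise).clamp_(0, 255).to(torch.uint8)

# Approach 2: cast the noise to int16 and do an integer addition
# (uint8 + int16 promotes to int16), then come back to uint8.
out_int16 = (img + noise.to(torch.int16)).clamp_(0, 255).to(torch.uint8)
```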
I've created a separate repo to host both the benchmark and the validation code (and results!) for these implementations. Here's the TL;DR:

- `int16` is almost always the best intermediate int dtype.
- Rounding the scaled noise before converting it to `int16` is slightly slower on all hardware (0.93x speed), and the outputs are indistinguishable in spite of the better numerical precision (see the snippet below). Leaving rounding off only introduces a very slight bias that lowers the effective standard deviation by about `0.3 * (1/255)`.
- Converting the image to `int16` before adding a rounded mean parameter doesn't get us a significant speedup (some slowdown was even measured, depending on hardware).
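To make the rounding bullet concrete, here is a tiny illustration of the two conversion variants. It assumes PyTorch's float-to-int casts truncate toward zero; the noise shape and scale are arbitrary:

```python
import torch

scaled_noise = torch.randn(3, 256, 256) * (0.1 * 255)

# Plain cast: truncates toward zero, which slightly shrinks the noise
# magnitude on average (the small std bias mentioned above).
truncated = scaled_noise.to(torch.int16)

# Benchmarked alternative: round first (~0.93x speed, indistinguishable output).
rounded = scaled_noise.round().to(torch.int16)
```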
Given these results, the solution provided uses (the sketch after this list shows the pieces put together):

- Noise generated in `float32` dtype. It's faster on CPUs and slightly slower on GPUs; I decided to prioritize the CPU, given the typical use case of offloading augmentation to it and the fact that it's the device that is slower by orders of magnitude.
- The noise is scaled by `255*sigma`, has `255*mean` added, and is converted to `int16` to be added to the input image.
- The result of the addition is clamped to the valid range of `uint8`.
- Finally, the result is converted back to `uint8`.
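Putting those choices together, the `uint8` path is roughly equivalent to the following sketch. The function name, signature, and the way the `clamp` argument is handled are mine, for illustration only, and are not the PR's actual code:

```python
import torch

def add_gaussian_noise_uint8(image: torch.Tensor, mean: float = 0.0,
                             sigma: float = 0.1, clamp: bool = True) -> torch.Tensor:
    # float32 noise, scaled and shifted straight into the uint8 value range.
    noise = torch.randn(image.shape, dtype=torch.float32, device=image.device)
    noise = noise * (255 * sigma) + (255 * mean)
    # Truncating cast to int16, then integer addition (uint8 + int16 -> int16).
    out = image + noise.to(torch.int16)
    if clamp:
        out = out.clamp_(0, 255)
    # Without clamping, out-of-range values overflow when cast back to uint8.
    return out.to(torch.uint8)
```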
The repo I mentioned before contains all the results that validate this methodology, as well as visualizations that display the very slight differences in results between the proposed implementation and a more traditional one.

The PR also updates the documentation of the `GaussianNoise` class (I validated the HTML doc render), updates one exception message, implements a new suite of tests for this functionality, and passes all the tests that were already there (as well as the flake8 checks). Two tests do fail, but I've checked and those are test configuration errors for the `RandomIoUCrop` class, which has not been touched in these changes.

It's also important to mention that integer overflow is basically unavoidable when setting the `clamp` flag to `False`, but I've documented that too!
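For completeness, a minimal usage sketch of the new behaviour. It assumes the existing `torchvision.transforms.v2.GaussianNoise` constructor and the output-dtype behaviour described above; I haven't run this snippet against the PR branch:

```python
import torch
from torchvision.transforms import v2

img = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8)

# Previously non-float inputs were rejected; with this change the uint8
# image is handled directly.
noisy = v2.GaussianNoise(mean=0.0, sigma=0.1)(img)
print(noisy.dtype)  # expected: torch.uint8
```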
Very open to criticism and/or improvements, hope you consider this PR!!