
Implementation of the GaussianNoise transform for uint8 inputs #9169


Open · diaz-esparza wants to merge 1 commit into main

Conversation

diaz-esparza

Implements #9148, i.e. support for the GaussianNoise transform on uint8 inputs.

Several configurations were benchmarked and validated for this. Two main approaches were tested:

  1. The first method converts the input to a float dtype, adds the Gaussian noise, and converts the result back to uint8.
    • Note that we don't necessarily have to move the input image's representation range from [0, 255] to [0, 1] and back, as those conversions are performed per-pixel and can be slow on big images. Instead, since the noise is going to be multiplied by the sigma parameter anyway, we can use sigma * 255 as the coefficient instead (and then add mean * 255), saving two array-wide floating-point operations.
  2. The second method involves fewer floating-point operations, converting both the noise and the input image to an intermediate data type (int16), performing the addition there, and finally converting the result back to uint8 (both approaches are sketched right after this list).
    • Again, we multiply the noise by sigma * 255 and add mean * 255 before converting it to int16.
    • Using a larger, signed integer dtype is essential here, as we need to both cover the legitimate [-255, 255] range that the noise tensor can theoretically generate and have enough margin to clamp pixels that might lie outside said range. int16 offers a range of [-32_768, 32_767], which is more than enough for our use case.
    • We have also tested some other configurations for this setup, like converting the noise to int16 before adding the mean (thus performing integer instead of floating-point addition), rounding the result (for more accurate results at a performance cost), and performing implicit dtype conversions instead of explicit ones.
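
For clarity, here's a rough, self-contained sketch of both approaches (the function names, signatures, and the clamp parameter are illustrative, not the PR's actual code):

```python
import torch


def gaussian_noise_uint8_float(
    image: torch.Tensor, mean: float = 0.0, sigma: float = 0.1, clamp: bool = True
) -> torch.Tensor:
    # Approach 1: float roundtrip. The noise is scaled by 255 * sigma (and shifted
    # by 255 * mean) so the image itself never has to be rescaled to [0, 1] and back.
    noise = torch.randn(image.shape, dtype=torch.float32, device=image.device)
    out = image.to(torch.float32) + noise * (255.0 * sigma) + 255.0 * mean
    if clamp:
        out = out.clamp_(0, 255)
    return out.to(torch.uint8)


def gaussian_noise_uint8_int16(
    image: torch.Tensor, mean: float = 0.0, sigma: float = 0.1, clamp: bool = True
) -> torch.Tensor:
    # Approach 2: intermediate int16. Only the noise tensor goes through float math;
    # the image is converted to int16 and the addition itself is integer arithmetic.
    noise = torch.randn(image.shape, dtype=torch.float32, device=image.device)
    noise = (noise * (255.0 * sigma) + 255.0 * mean).to(torch.int16)
    out = image.to(torch.int16) + noise
    if clamp:
        out = out.clamp_(0, 255)
    return out.to(torch.uint8)
```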

I've created a separate repo to host both the benchmark and the validation code (and results!) for these implementations. Here's the TL;DR (a minimal timing sketch follows the list):

  1. The 'intermediate int' approach is significantly faster on the consumer CPUs tested (Ryzen 2600 & Ryzen 5600), being at least 1.3x faster than converting the input image to a floating-point format and back.
    • Note: that speedup was measured with the largest input image my machine could process (8500x8500). Differences with tiny inputs (100x100) are negligible (1.05x), but results with 1000x1000 images can exceed a 2x speedup.
    • We also tested different dtypes for every method (e.g. {float16, float32, float64} for the floats and {int16, int32, int64} for the ints) and report the best result across every combination of dtypes.
  2. Results on the GPUs tested (GTX 1070 Ti & RTX 3060 Ti) are a bit more nuanced. On the 1070 Ti, the same method is 1.17x faster, while on the 3060 Ti it's 0.95x as fast (so just a bit slower).
    • That's at 8500x8500. At 1000x1000, results are once again better (1.3x on the 1070 Ti, 1.06x on the 3060 Ti).
  3. Generating the noise with 32-bit floats is fastest on CPUs. On GPUs, 16-bit floats are generally a bit more performant. int16 is almost always the best intermediate int dtype.
  4. Implicit conversion is, for some reason, faster on GPUs, while there's next to no difference on CPUs. Outputs are identical in both settings.
  5. In the 'intermediate int' approach, rounding the noise before converting to int16 is slightly slower on all hardware (0.93x speed), and the outputs are indistinguishable despite the better numerical precision. Skipping the rounding only introduces a very slight bias that lowers the effective standard deviation by about 0.3 * (1/255).
  6. Finally, also in the 'intermediate int' approach, converting the noise to int16 before adding a rounded mean parameter doesn't yield a significant speedup (some slowdown was even measured depending on hardware).
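
For reference, a minimal sketch of how such a timing comparison can be run with torch.utils.benchmark (this is not the benchmark code from the linked repo; it reuses the two illustrative functions sketched earlier):

```python
import torch
import torch.utils.benchmark as benchmark

# A 1000x1000 RGB uint8 image, one of the benchmarked sizes.
image = torch.randint(0, 256, (3, 1000, 1000), dtype=torch.uint8)

results = []
for sub_label, stmt in [
    ("float roundtrip", "gaussian_noise_uint8_float(image)"),
    ("intermediate int16", "gaussian_noise_uint8_int16(image)"),
]:
    timer = benchmark.Timer(
        stmt=stmt,
        globals={
            "image": image,
            "gaussian_noise_uint8_float": gaussian_noise_uint8_float,
            "gaussian_noise_uint8_int16": gaussian_noise_uint8_int16,
        },
        label="GaussianNoise on uint8",
        sub_label=sub_label,
    )
    results.append(timer.blocked_autorange(min_run_time=1))

benchmark.Compare(results).print()
```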

Given these results, the solution provided uses the following (a short sketch follows the list):

  1. Noise generated in the float32 dtype. It's faster on CPUs and slightly slower on GPUs; I decided to prioritize the CPU given the typical use case of offloading augmentation to it, and the fact that it's orders of magnitude slower than the GPU to begin with.
  2. An approach where the N(0, 1)-distributed noise is multiplied by 255 * sigma, has 255 * mean added to it, and is converted to int16 before being added to the input image.
  3. An implicit conversion of the input image from uint8 (via type promotion when adding the int16 noise).
  4. After adding the noise, an optional clamp and a mandatory cast back to uint8.
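
Putting that together, the core of the proposed uint8 path would look roughly like this (a sketch based on the description above, not the literal diff; the function name and clamp parameter are assumptions):

```python
import torch


def _gaussian_noise_uint8(
    image: torch.Tensor, mean: float, sigma: float, clamp: bool
) -> torch.Tensor:
    # 1. Noise generated in float32 (faster on CPU, marginally slower on GPU).
    noise = torch.randn(image.shape, dtype=torch.float32, device=image.device)
    # 2. Scale by 255 * sigma, shift by 255 * mean, then convert to int16.
    noise = (noise * (255.0 * sigma) + 255.0 * mean).to(torch.int16)
    # 3. uint8 + int16 promotes the image to int16 implicitly; no explicit cast.
    out = image + noise
    # 4. Optional clamp to [0, 255], then the mandatory cast back to uint8.
    if clamp:
        out = out.clamp_(0, 255)
    return out.to(torch.uint8)
```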

The repo I mentioned before contains all the results that validate this methodology, as well as visualizations showing the very slight differences between the proposed implementation and a more traditional one.

The PR also updates the documentation of the GaussianNoise class (the HTML doc render has been validated), updates one exception message, implements a new suite of tests for this functionality, and passes all the pre-existing tests (as well as the flake8 checks). Two tests do fail, but I've checked and those are test configuration errors for the RandomIoUCrop class, which has not been touched in these changes.

It's also important to mention that integer overflow is basically unavoidable when the clamp flag is set to False, but I've documented that too!
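
As a quick illustration of the intended end-user behavior (assuming the PR's semantics; mean, sigma and clip are the existing parameters of torchvision.transforms.v2.GaussianNoise):

```python
import torch
from torchvision.transforms import v2

img_u8 = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8)

# With this PR, uint8 inputs are accepted directly and the output stays uint8.
noisy = v2.GaussianNoise(mean=0.0, sigma=0.1, clip=True)(img_u8)

# With clip=False, values pushed outside [0, 255] can wrap around when cast
# back to uint8 (the integer-overflow caveat documented in the PR).
noisy_wrapped = v2.GaussianNoise(sigma=0.1, clip=False)(img_u8)
```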

Very open to criticism and/or improvements, hope you consider this PR!!


pytorch-bot bot commented Aug 6, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9169

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.


meta-cla bot commented Aug 6, 2025

Hi @diaz-esparza!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!


meta-cla bot commented Aug 6, 2025

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@diaz-esparza diaz-esparza changed the title Implements #9148 Implementation of the GaussianNoise transform for uint8 inputs Aug 7, 2025