
Reimplement gradient L2 norm computation with correct math and unit test#990

Open
mandira15 wants to merge 3 commits into fossasia:master from mandira15:add-gradient-norm-logger-v3

Conversation


@mandira15 mandira15 commented Feb 27, 2026

Description

This PR reworks the gradient L2 norm computation to ensure correct mathematical implementation, clean API behavior, and proper test coverage.

Motivation and Context

The previous implementation had issues related to incorrect L2 norm calculation (double square root), duplicate initialization, and unclear return type behavior. This update fixes the mathematical computation and ensures the function returns a consistent float value.

Changes

  • Correct implementation of global L2 norm:
    sqrt(sum(||g_i||^2))
  • Removed duplicate initialization
  • Removed unnecessary extra sqrt operation
  • Avoided usage of .data
  • Ensured return type is float
  • Added unit test for validation
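
A minimal sketch of what the corrected helper could look like under these changes (the actual implementation lives in the PR's diff and may differ in detail):

```python
import math


def compute_gradient_l2_norm(model):
    """Global L2 norm of all parameter gradients: sqrt(sum(||g_i||^2))."""
    total_sq = 0.0
    for param in model.parameters():
        if param.grad is None:  # e.g. before any backward pass
            continue
        # .detach() instead of the deprecated .data; .item() yields a Python float
        total_sq += param.grad.detach().norm(2).item() ** 2
    return math.sqrt(total_sq)
```

Taking a single square root at the end, rather than one per parameter plus a final one, is exactly what removes the "double square root" bug and makes this the global L2 norm.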

How Has This Been Tested?

  • Added unit test verifying:
    • Return type is float
    • Norm is positive after backward pass
  • Manually verified behavior with a simple nn.Linear model
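
The added test is essentially of this shape (sketched with a local stand-in for the helper so the snippet runs on its own; the real test imports compute_gradient_l2_norm from visdom):

```python
import math

import torch
import torch.nn as nn


def compute_gradient_l2_norm(model):  # stand-in; the PR imports this from visdom
    total_sq = sum(
        p.grad.detach().norm(2).item() ** 2
        for p in model.parameters()
        if p.grad is not None
    )
    return math.sqrt(total_sq)


def test_compute_gradient_l2_norm_returns_float():
    model = nn.Linear(5, 1)
    x = torch.randn(3, 5)
    y = torch.randn(3, 1)

    loss = nn.MSELoss()(model(x), y)
    loss.backward()  # populates .grad on the model parameters

    norm = compute_gradient_l2_norm(model)

    assert isinstance(norm, float)
    assert norm > 0
```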

Types of changes

  • Bug fix
  • Code refactor
  • Added tests

Summary by Sourcery

Implement a correct global L2 gradient norm helper and add coverage for its behavior.

New Features:

  • Add a compute_gradient_l2_norm helper to compute the global L2 norm of all gradients in a PyTorch model.

Bug Fixes:

  • Fix incorrect gradient L2 norm calculation by using the mathematically correct aggregation over parameter gradients and returning a scalar float value.

Tests:

  • Add a unit test to verify compute_gradient_l2_norm returns a positive float after a backward pass.


sourcery-ai bot commented Feb 27, 2026

Reviewer's Guide

Reimplements gradient L2 norm computation as a standalone helper using correct global L2 math, avoids deprecated/unsafe tensor APIs, and adds a focused unit test to validate return type and positivity after backprop.

File-Level Changes

Introduce a standalone helper to compute the global L2 norm of all gradients in a PyTorch model using correct math and cleaner tensor handling. (py/visdom/__init__.py)
  • Added compute_gradient_l2_norm(model) that iterates over model.parameters(), accumulates the squared L2 norms of existing gradients, and returns sqrt(sum(||g_i||^2)).

Add a unit test validating the gradient L2 norm helper behavior after a backward pass. (tests/test_gradient_norm.py)
  • Created a simple nn.Linear model and MSE loss setup; runs forward and backward to populate gradients.
  • Calls compute_gradient_l2_norm and asserts that the return value is a float and strictly positive after backpropagation.



@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 3 issues, and left some high-level feedback:

  • The new compute_gradient_l2_norm is defined at the same indentation level as other instance methods but takes only model and no self, so it should either be moved to module-level or annotated as a @staticmethod to avoid confusion and potential misuse.
  • The docstring for compute_gradient_l2_norm is currently commented out with #; consider converting it into a proper triple-quoted docstring so that tooling and users can see the function description.
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="py/visdom/__init__.py" line_range="1748-1762" />
<code_context>
+        if len(parameters) == 0:
+            return 0.0
+
+        device = parameters[0].device
+        total_norm = torch.zeros(1, device=device)
+
+        for param in parameters:
+            if param.grad is not None:
+                param_norm = param.grad.detach().norm(2)
</code_context>
<issue_to_address>
**suggestion:** Assuming all parameters and their gradients are on the same device may break in mixed-device setups.

`device` is taken from `parameters[0]` and used for `total_norm`, but you then iterate over all params. In mixed-device setups (model-parallel, sharded, etc.), this can cause device mismatch when accumulating norms. Consider either asserting all params/grads are on the same device, or computing norms per device and combining them safely.

```suggestion
        parameters = list(model.parameters())
        if len(parameters) == 0:
            return 0.0

        # Accumulate on CPU to safely handle mixed-device (model-parallel/sharded) setups.
        total_norm_sq = 0.0
        for param in parameters:
            if param.grad is not None:
                # Compute norm on the parameter's device, then move the scalar to CPU.
                param_norm = param.grad.detach().norm(2).item()
                total_norm_sq += param_norm ** 2

        total_norm = total_norm_sq ** 0.5
        return float(total_norm)
```
</issue_to_address>

### Comment 2
<location path="tests/test_gradient_norm.py" line_range="5" />
<code_context>
+import torch.nn as nn
+from visdom import compute_gradient_l2_norm
+
+def test_compute_gradient_l2_norm_returns_float():
+    model = nn.Linear(5, 1)
+    x = torch.randn(3, 5)
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test for the case where no `.backward()` has been called so all gradients are `None` and the function should return `0.0`.

Please also add a test that calls `compute_gradient_l2_norm(model)` before any backward pass and asserts it returns `0.0`, confirming that parameters with `grad is None` are handled correctly.
</issue_to_address>
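
Such a test could look roughly like this (a sketch with a local stand-in for the helper; the real test would import it from visdom):

```python
import math

import torch.nn as nn


def compute_gradient_l2_norm(model):  # stand-in for the helper under review
    total_sq = sum(
        p.grad.detach().norm(2).item() ** 2
        for p in model.parameters()
        if p.grad is not None
    )
    return math.sqrt(total_sq)


def test_norm_is_zero_before_backward():
    model = nn.Linear(5, 1)
    # No backward pass has run, so every param.grad is still None.
    assert compute_gradient_l2_norm(model) == 0.0
```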

### Comment 3
<location path="tests/test_gradient_norm.py" line_range="15-18" />
<code_context>
+    loss = criterion(output, y)
+    loss.backward()
+
+    norm = compute_gradient_l2_norm(model)
+
+    assert isinstance(norm, float)
+    assert norm > 0
\ No newline at end of file
</code_context>
<issue_to_address>
**suggestion (testing):** Add a deterministic test that checks the actual numeric value of the L2 norm for a small model.

This only checks type and positivity and doesn’t validate that the implementation actually computes `sqrt(sum(||g_i||^2))`. Please add a deterministic test (with a fixed seed) using a tiny model (e.g., one Linear layer) where you:
- run a forward/backward pass,
- manually compute the norm from `param.grad` tensors,
- and assert `compute_gradient_l2_norm(model)` matches that value (e.g., with `pytest.approx`).
</issue_to_address>
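
A deterministic check along those lines might look like this (a sketch with a local stand-in for the helper; math.isclose is used here, though pytest.approx works equally well):

```python
import math

import torch
import torch.nn as nn


def compute_gradient_l2_norm(model):  # stand-in for the helper under review
    total_sq = sum(
        p.grad.detach().norm(2).item() ** 2
        for p in model.parameters()
        if p.grad is not None
    )
    return math.sqrt(total_sq)


def test_norm_matches_manual_computation():
    torch.manual_seed(0)  # fixed seed -> reproducible gradients
    model = nn.Linear(4, 2)
    x = torch.randn(8, 4)
    y = torch.randn(8, 2)

    nn.MSELoss()(model(x), y).backward()

    # Recompute sqrt(sum(||g_i||^2)) directly from the raw gradient tensors.
    expected = math.sqrt(
        sum(p.grad.norm(2).item() ** 2 for p in model.parameters())
    )
    assert math.isclose(compute_gradient_l2_norm(model), expected, rel_tol=1e-6)
```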


@mandira15 (Author)

Hi @norbusan ,

I have addressed all the requested changes and pushed the updates.
Could you please review it once when you have time?

Thank you for your guidance.
