
Add gradient norm logger with PyTorch Lightning integration, tests, and hook profiler#1003

Open
Jayantparashar10 wants to merge 6 commits into fossasia:master from Jayantparashar10:feat/gradient-norm-logger-with-lightning

Conversation

@Jayantparashar10

Description

This PR adds gradient norm logging and PyTorch Lightning integration to Visdom.

Right now there's no way to monitor gradient norms without writing the math yourself and calling viz.line() manually. If you're using Lightning, there's no logger at all, so you end up duplicating the same boilerplate in every project. This adds proper support for both.

New files:

  • py/visdom/grad_norm.py contains two utility functions: compute_grad_norm() for the global Lp norm (same formula as torch.nn.utils.clip_grad_norm_) and compute_layer_grad_norms() for per-parameter norms with named keys.

  • py/visdom/lightning_logger.py has the Lightning integration. VisdomLogger is a full Logger subclass so you just pass it to Trainer and every self.log() call in your LightningModule automatically gets plotted in Visdom. Each run gets its own versioned environment so nothing gets overwritten. GradientNormCallback registers autograd tensor hooks during training so gradient norms are captured and logged without touching your model code at all.

  • example/pytorch_mnist_demo.py is a manual MNIST training loop using viz.line() directly, as a baseline.

  • example/demo_lightning_mnist_grad.py is the same training setup rewritten with Lightning, showing what the workflow looks like when you use VisdomLogger and GradientNormCallback together.

  • example/hook_overhead_profile.py runs a quick benchmark across five hook scenarios (no hook, forward_pre, forward, backward, grad norm) to show how much overhead each adds.

  • tests/test_loggers.py has 53 unit tests covering the norm math, the log_gradient_norm method, logger behavior, and the callback lifecycle. Everything is mocked so no live server is needed to run them.
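For reference, the global Lp norm that compute_grad_norm() is described as mirroring (the same formula as torch.nn.utils.clip_grad_norm_) can be sketched in plain Python. The function name and inputs here are illustrative, not the PR's actual implementation:

```python
def global_grad_norm(grads, p=2.0):
    """Illustrative global Lp norm over flat gradient vectors:
    total = (sum_i ||g_i||_p ** p) ** (1 / p), the same formula
    torch.nn.utils.clip_grad_norm_ uses to compute the total norm."""
    per_param = [sum(abs(x) ** p for x in g) ** (1.0 / p) for g in grads]
    return sum(n ** p for n in per_param) ** (1.0 / p)

# Two "parameters" with gradients [3, 4] and [12]:
# per-param L2 norms are 5.0 and 12.0, so the global norm is
# sqrt(25 + 144) = 13.0
print(global_grad_norm([[3.0, 4.0], [12.0]]))  # 13.0
```

compute_layer_grad_norms() would presumably return the intermediate per-parameter values keyed by name instead of reducing them to one scalar.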

Modified:

  • py/visdom/__init__.py gets a new log_gradient_norm() method on the Visdom client and lazy imports so from visdom import VisdomLogger, GradientNormCallback works out of the box without breaking anything existing.
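Lazy top-level exports like this are typically implemented with a module-level `__getattr__` (PEP 562), so `import visdom` stays cheap and does not require lightning to be installed. A minimal sketch of the pattern, using a throwaway module rather than the PR's actual `__init__.py`:

```python
# PEP 562 lazy exports: the expensive import only happens when the
# attribute is first accessed. Here a dynamically created module and a
# placeholder string stand in for visdom's real __init__.py and classes.
import types

mod = types.ModuleType("visdom_sketch")

def _lazy(name):
    if name in ("VisdomLogger", "GradientNormCallback"):
        # Real code would do: from .lightning_logger import VisdomLogger, ...
        return f"<lazily loaded {name}>"
    raise AttributeError(name)

mod.__getattr__ = _lazy  # Python 3.7+ consults this when lookup fails

print(mod.VisdomLogger)  # <lazily loaded VisdomLogger>
```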

Motivation and Context

Visdom has no built-in gradient norm logging and no Lightning integration, so users end up writing viz.line() calls manually for every metric. VisdomLogger works the same way the TensorBoard and W&B loggers do, with versioned environments per run so previous runs are not overwritten. GradientNormCallback hooks into autograd so grad norms are captured during the backward pass with no changes needed in the model. For users not using Lightning, log_gradient_norm() is a simple one-liner on the existing Visdom client.

How Has This Been Tested?

Ran the full test suite:

python -m pytest tests/test_loggers.py -q --rootdir=. --import-mode=importlib

Tests cover the grad norm math, the log_gradient_norm method, logger metric and hyperparam logging, and the callback's hook lifecycle, log_every throttle, and per_layer mode.
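The "no live server" setup usually means the Visdom client is replaced with a mock and the tests assert on the calls it received. A hedged sketch of that style of test (log_gradient_norm is this PR's method name, but the wrapper and assertion details here are illustrative):

```python
from unittest import mock

# Stand-in for a Visdom client: every method call is recorded and nothing
# touches the network, so tests run without `python -m visdom.server`.
viz = mock.MagicMock()

def log_norm(viz, value, step, win="grad_norm"):
    # Illustrative wrapper: the real log_gradient_norm() presumably
    # bottoms out in a viz.line() call roughly like this one.
    viz.line(Y=[value], X=[step], win=win, update="append")

log_norm(viz, 0.73, step=1)
viz.line.assert_called_once_with(Y=[0.73], X=[1], win="grad_norm",
                                 update="append")
```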

Also ran the end-to-end demo:

python -m visdom.server
python example/demo_lightning_mnist_grad.py

Opened http://localhost:8097, switched to the mnist_lightning-v* environment, and confirmed all windows showed up: train_loss, val_loss, val_acc, grad_norm, per-layer grad_norm/model/*, grad_hook_ms, and hparams. Training ran for 5 epochs and finished at val_acc 0.992.

Hook benchmark:

python example/hook_overhead_profile.py

The gradient norm hook adds less than 1% overhead on average.
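A minimal version of that kind of micro-benchmark can be written with only the standard library; the workload and the "hook" below are stand-ins, not the profiler's actual five scenarios:

```python
import time

def workload():
    # Stand-in for one training step.
    return sum(i * i for i in range(10_000))

def with_hook():
    # Same step plus a cheap extra computation, standing in for a
    # gradient-norm hook firing during the backward pass.
    result = workload()
    _ = result ** 0.5
    return result

def timed(fn, reps=50):
    start = time.perf_counter()
    for _ in range(reps):
        fn()
    return (time.perf_counter() - start) / reps

base = timed(workload)
hooked = timed(with_hook)
print(f"relative overhead: {(hooked - base) / base:.2%}")
```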

Installed from source (pip install -e .) and verified no regressions in basic manual logging.

Tested on Python 3.12, PyTorch 2.10.0, lightning 2.x.

Screenshots (if appropriate):

Visdom dashboard after running demo_lightning_mnist_grad.py (env mnist_lightning-v20260303_012236):

  • Per-layer grad norm charts for conv1/weight, conv2/weight, fc1/weight, etc., all decaying over roughly 4700 steps
  • Global grad_norm chart following the same trend
  • val_loss dropping from around 0.05 to 0.025 over 5 epochs, val_acc reaching 0.992
  • train_loss dropping from around 0.15 to 0.03
  • grad_hook_ms staying near zero, confirming negligible hook overhead
  • hparams table with lr=0.001

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Code refactor or cleanup (changes to existing code for improved readability or performance)

Checklist:

  • I adapted the version number under py/visdom/VERSION according to Semantic Versioning
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

Trains a LeNet-style CNN (2 conv + 2 fc layers) on MNIST for 5 epochs
and logs four metrics to Visdom live during training via explicit
viz.line() calls:

  - training loss per epoch
  - test accuracy per epoch
  - learning rate per epoch
  - global L2 gradient norm per epoch

Also displays a batch of 8 sample MNIST images before training starts
using viz.images().
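The per-epoch logging in that manual baseline follows visdom's usual pattern: viz.line() creates a window on the first call and returns its id, and subsequent calls pass that id back with update="append". A sketch of the pattern, with a tiny recording stub standing in for the real visdom.Visdom client so it runs offline:

```python
# StubViz records calls instead of talking to a server; its line() mimics
# the real viz.line() contract of returning a window id.
class StubViz:
    def __init__(self):
        self.calls = []

    def line(self, Y, X, win=None, update=None, opts=None):
        self.calls.append((Y, X, win, update))
        return win or "win_1"

viz = StubViz()
win = None
for epoch, loss in enumerate([0.9, 0.4, 0.2]):
    win = viz.line(Y=[loss], X=[epoch], win=win,
                   update="append" if win else None,
                   opts={"title": "train_loss"})

print(len(viz.calls))  # 3
```

The first call has win=None (creates the window); the two that follow append to it, which is what produces the live per-epoch curves.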
Copilot AI review requested due to automatic review settings March 2, 2026 20:46
Contributor

@sourcery-ai sourcery-ai bot left a comment

Sorry @Jayantparashar10, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

Contributor

Copilot AI left a comment

Pull request overview

This PR adds gradient norm logging utilities and a PyTorch Lightning integration to Visdom. It introduces standalone math helpers in grad_norm.py, a VisdomLogger / GradientNormCallback pair in lightning_logger.py, a new log_gradient_norm() method on the existing Visdom client, lazy top-level exports via __getattr__, and a test suite covering all new components (53 tests, fully mocked).

Changes:

  • New py/visdom/grad_norm.py with compute_grad_norm and compute_layer_grad_norms utilities
  • New py/visdom/lightning_logger.py with VisdomLogger (Lightning Logger subclass) and GradientNormCallback (autograd hook-based callback)
  • py/visdom/__init__.py gains log_gradient_norm() on Visdom, module-level lazy exports, and minor formatter-driven style cleanups

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
py/visdom/grad_norm.py Core gradient-norm math shared between manual and Lightning code paths
py/visdom/lightning_logger.py Lightning Logger and Callback implementations; main feature code
py/visdom/__init__.py Adds log_gradient_norm() to Visdom, lazy imports for Lightning exports, minor style changes
tests/test_loggers.py 53-test suite covering math utilities, Visdom client method, and Lightning components
example/pytorch_mnist_demo.py Manual MNIST training baseline showing explicit viz.line() calls
example/demo_lightning_mnist_grad.py Lightning-based MNIST demo using VisdomLogger + GradientNormCallback
example/hook_overhead_profile.py Benchmark comparing hook overhead across five scenarios


@Jayantparashar10 Jayantparashar10 force-pushed the feat/gradient-norm-logger-with-lightning branch from 8fb3ee2 to 0c0eefa Compare March 3, 2026 05:55
@Jayantparashar10 Jayantparashar10 force-pushed the feat/gradient-norm-logger-with-lightning branch from 0c0eefa to 233d5ca Compare March 3, 2026 06:04
Copilot AI review requested due to automatic review settings March 3, 2026 06:04
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.



