Add gradient norm logger with PyTorch Lightning integration, tests, and hook profiler #1003
Conversation
Trains a LeNet-style CNN (2 conv + 2 fc layers) on MNIST for 5 epochs and logs four metrics to Visdom live during training via explicit `viz.line()` calls:
- training loss per epoch
- test accuracy per epoch
- learning rate per epoch
- global L2 gradient norm per epoch

Also displays a batch of 8 sample MNIST images before training starts using `viz.images()`.
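The per-epoch `viz.line()` pattern the demo relies on can be sketched as follows. `Viz` here is a hypothetical stand-in that records calls so the snippet runs without a live server; with a real `visdom.Visdom()` client the `line()` calls are identical.

```python
# Sketch of the per-epoch viz.line() "append" pattern. The Viz class is
# a stand-in recording client, so no live Visdom server is needed.
class Viz:
    def __init__(self):
        self.calls = []

    def line(self, Y, X, win=None, update=None, opts=None):
        # Record enough to show how the window fills up epoch by epoch.
        self.calls.append((win, X[0], Y[0], update))


viz = Viz()
for epoch in range(5):
    train_loss = 1.0 / (epoch + 1)  # placeholder metric
    viz.line(
        Y=[train_loss],
        X=[epoch],
        win="train_loss",
        update="append" if epoch > 0 else None,  # create window, then append
        opts={"title": "train_loss"},
    )

print(viz.calls[-1])  # → ('train_loss', 4, 0.2, 'append')
```

The first call creates the window; every later call passes `update="append"` so the curve grows in place instead of replacing the plot.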
Pull request overview
This PR adds gradient norm logging utilities and a PyTorch Lightning integration to Visdom. It introduces standalone math helpers in `grad_norm.py`, a `VisdomLogger` / `GradientNormCallback` pair in `lightning_logger.py`, a new `log_gradient_norm()` method on the existing `Visdom` client, lazy top-level exports via `__getattr__`, and a test suite covering all new components (53 tests, fully mocked).
Changes:
- New `py/visdom/grad_norm.py` with `compute_grad_norm` and `compute_layer_grad_norms` utilities
- New `py/visdom/lightning_logger.py` with `VisdomLogger` (Lightning `Logger` subclass) and `GradientNormCallback` (autograd hook-based callback)
- `py/visdom/__init__.py` gains `log_gradient_norm()` on `Visdom`, module-level lazy exports, and minor formatter-driven style cleanups
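The norm math these utilities presumably implement follows `torch.nn.utils.clip_grad_norm_`: the global Lp norm is the Lp norm of the vector of per-parameter Lp norms. A dependency-free sketch of both variants (function names here are illustrative, not the actual `grad_norm.py` API):

```python
import math


def global_grad_norm(param_grads, p=2.0):
    """Global Lp norm over flattened per-parameter gradients.

    Same formula as torch.nn.utils.clip_grad_norm_: take the Lp norm
    of each parameter's gradient, then the Lp norm of those norms.
    """
    per_param = [
        math.fsum(abs(g) ** p for g in grads) ** (1.0 / p)
        for grads in param_grads
    ]
    return math.fsum(n ** p for n in per_param) ** (1.0 / p)


def layer_grad_norms(named_grads, p=2.0):
    """Per-parameter norms with named keys (the per-layer variant)."""
    return {
        name: math.fsum(abs(g) ** p for g in grads) ** (1.0 / p)
        for name, grads in named_grads.items()
    }


grads = {"fc.weight": [3.0, 4.0], "fc.bias": [0.0]}
print(layer_grad_norms(grads))           # → {'fc.weight': 5.0, 'fc.bias': 0.0}
print(global_grad_norm(grads.values()))  # → 5.0
```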
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| `py/visdom/grad_norm.py` | Core gradient-norm math shared between manual and Lightning code paths |
| `py/visdom/lightning_logger.py` | Lightning `Logger` and `Callback` implementations; main feature code |
| `py/visdom/__init__.py` | Adds `log_gradient_norm()` to `Visdom`, lazy imports for Lightning exports, minor style changes |
| `tests/test_loggers.py` | 53-test suite covering math utilities, Visdom client method, and Lightning components |
| `example/pytorch_mnist_demo.py` | Manual MNIST training baseline showing explicit `viz.line()` calls |
| `example/demo_lightning_mnist_grad.py` | Lightning-based MNIST demo using `VisdomLogger` + `GradientNormCallback` |
| `example/hook_overhead_profile.py` | Benchmark comparing hook overhead across five scenarios |
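The run-versioned environments mentioned for `VisdomLogger` (and visible later as names like `mnist_lightning-v20260303_012236`) suggest a timestamp suffix per run. A hypothetical helper sketching that naming scheme (`versioned_env` is not the PR's actual API, just an illustration of the idea):

```python
import datetime


def versioned_env(base):
    # One Visdom environment per run: suffix the base name with a
    # timestamp so a new run never overwrites an earlier run's windows.
    stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"{base}-v{stamp}"


print(versioned_env("mnist_lightning"))  # e.g. mnist_lightning-v20260303_012236
```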
Pull request overview
Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.
Description
This PR adds gradient norm logging and PyTorch Lightning integration to Visdom.
Right now there's no way to monitor gradient norms without writing the math yourself and calling `viz.line()` manually. If you're using Lightning, there's no logger at all, so you end up duplicating the same boilerplate in every project. This adds proper support for both.

New files:
- `py/visdom/grad_norm.py` contains two utility functions: `compute_grad_norm()` for the global Lp norm (same formula as `torch.nn.utils.clip_grad_norm_`) and `compute_layer_grad_norms()` for per-parameter norms with named keys.
- `py/visdom/lightning_logger.py` has the Lightning integration. `VisdomLogger` is a full `Logger` subclass, so you just pass it to `Trainer` and every `self.log()` call in your `LightningModule` automatically gets plotted in Visdom. Each run gets its own versioned environment, so nothing gets overwritten. `GradientNormCallback` registers autograd tensor hooks during training, so gradient norms are captured and logged without touching your model code at all.
- `example/pytorch_mnist_demo.py` is a manual MNIST training loop using `viz.line()` directly, as a baseline.
- `example/demo_lightning_mnist_grad.py` is the same training setup rewritten with Lightning, showing what the workflow looks like when you use `VisdomLogger` and `GradientNormCallback` together.
- `example/hook_overhead_profile.py` runs a quick benchmark across five hook scenarios (no hook, forward_pre, forward, backward, grad norm) to show how much overhead each adds.
- `tests/test_loggers.py` has 53 unit tests covering the norm math, the `log_gradient_norm` method, logger behavior, and the callback lifecycle. Everything is mocked, so no live server is needed to run them.

Modified:
- `py/visdom/__init__.py` gets a new `log_gradient_norm()` method on the `Visdom` client and lazy imports, so `from visdom import VisdomLogger, GradientNormCallback` works out of the box without breaking anything existing.

Motivation and Context
Visdom has no built-in gradient norm logging and no Lightning integration, so users end up writing `viz.line()` calls manually for every metric. `VisdomLogger` works the same way the TensorBoard or W&B loggers do, with versioned environments per run so previous runs are not overwritten. `GradientNormCallback` hooks into autograd, so grad norms are captured during the backward pass with no changes needed in the model. For users not using Lightning, `log_gradient_norm()` is a simple one-liner on the existing `Visdom` client.

How Has This Been Tested?
Ran the full test suite:
Tests cover the grad norm math, the `log_gradient_norm` method, logger metric and hyperparam logging, and the callback's hook lifecycle, `log_every` throttle, and `per_layer` mode.

Also ran the end-to-end demo:
Opened http://localhost:8097, switched to the `mnist_lightning-v*` environment, and confirmed all windows showed up: `train_loss`, `val_loss`, `val_acc`, `grad_norm`, per-layer `grad_norm/model/*`, `grad_hook_ms`, and `hparams`. Training ran for 5 epochs and finished at `val_acc` 0.992.

Hook benchmark:
The gradient norm hook adds less than 1% overhead on average.
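For context, the benchmark's measurement approach can be sketched without torch: time a training-step stand-in with and without a hook callback and report the relative difference. This illustrates the methodology only; the real `hook_overhead_profile.py` times actual autograd hooks.

```python
import time


def step(hook=None):
    # Stand-in for one forward/backward pass.
    x = sum(i * i for i in range(200))
    if hook is not None:
        hook(x)  # stand-in for a gradient-norm hook firing once per step
    return x


def mean_seconds(fn, n=5_000):
    # Average wall-clock time per call over n repetitions.
    t0 = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - t0) / n


base = mean_seconds(step)
hooked = mean_seconds(lambda: step(hook=lambda g: g * g))
print(f"hook overhead: {(hooked - base) / base:+.1%}")
```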
Installed from source (`pip install -e .`) and verified no regressions in basic manual logging.

Tested on Python 3.12, PyTorch 2.10.0, Lightning 2.x.
Screenshots (if appropriate):
Visdom dashboard after running `demo_lightning_mnist_grad.py` (env `mnist_lightning-v20260303_012236`):
- per-layer norms for `conv1/weight`, `conv2/weight`, `fc1/weight`, etc., all decaying over roughly 4700 steps
- `grad_norm` chart following the same trend
- `val_loss` dropping from around 0.05 to 0.025 over 5 epochs, `val_acc` reaching 0.992
- `train_loss` dropping from around 0.15 to 0.03
- `grad_hook_ms` staying near zero, confirming negligible hook overhead
- `hparams` table with `lr=0.001`

Types of changes
Checklist:
- `py/visdom/VERSION` according to Semantic Versioning