
Commit d49e9f0

Igor Shilov authored and facebook-github-bot committed
Functorch added to grad sample README (#497)
Summary: As we've landed the functorch-backed GradSampleModule, we also want to update the README that helps people navigate the different grad samplers. For the content of this README I've also run benchmarks for all the options. Some results are surprising and hard to interpret, but the overall picture is mostly consistent.

## tl;dr

* There's no difference on CPU.
* functorch performance depends on the exact GPU setup: the same benchmarks can come out up to 4x slower or 2x faster than the baseline depending on the GPU.
* ExpandedWeights is consistently 25-30% faster for linear layers, but not for conv layers.

## benchmarks

| device | benchmark | hooks | functorch | ExpandedWeights |
|:-------:|:-------:|:-------:|:-------:|:-------:|
| cpu | nn.Conv2d | 1x | 0.9x | 1x |
| cpu | nn.Linear | 1x | 1x | 0.9x |
| cpu | full epoch on CIFAR10 example | 1x | 1.5x | 1x |
| Tesla T4 (Google Colab) | nn.Conv2d | 1x | 4x | 0.9x |
| Tesla T4 (Google Colab) | nn.Linear | 1x | 1.25x | 0.75x |
| A100 (AWS) | nn.Conv2d | 1x | 0.5x | 1x |
| A100 (AWS) | nn.Linear | 1x | 1.5x | 0.75x |
| A100 (AWS) | full epoch on CIFAR10 example | 1x | 1.1x | 0.75x |

FYI samdow

Pull Request resolved: #497

Reviewed By: karthikprasad

Differential Revision: D39352067

Pulled By: ffuuugor

fbshipit-source-id: 19b4fff80fe3c1963fab24e1292ae625200bc749
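For reference, the three columns above correspond to the `grad_sample_mode` values accepted by ``PrivacyEngine.make_private()``. A minimal sketch of how one would switch between them (the toy model, optimizer, and data loader below are illustrative placeholders, not part of this change):

```
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Toy setup, used only to show where grad_sample_mode is selected.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data_loader = DataLoader(
    TensorDataset(torch.randn(64, 16), torch.randint(2, (64,))),
    batch_size=8,
)

privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
    grad_sample_mode="hooks",  # or "functorch" / "ew", as benchmarked above
)
```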
1 parent 5eb3ae0 commit d49e9f0

File tree

2 files changed (+44, -15 lines)

examples/char-lstm_README.md

Lines changed: 2 additions & 2 deletions

@@ -9,14 +9,14 @@ Download the training zip from https://download.pytorch.org/tutorial/data.zip an
 Run with dp:

 ```
-python char-lstm-classification.py --epochs=50 --learning-rate=2.0 --hidden-size=128 --delta=8e-5 --sample-rate=0.05 --n-lstm-layers=1 --sigma=1.0 --max-per-sample-grad-norm=1.5 --device=cuda:0 --data-root="/my/folder/data/names/" --test-every 5
+python char-lstm-classification.py --epochs=50 --learning-rate=2.0 --hidden-size=128 --delta=8e-5 --batch-size 64 --n-layers=1 --sigma=1.0 --max-per-sample-grad-norm=1.5 --device=cuda:0 --data-root="/my/folder/data/names/" --test-every 5
 ```

 You should get something like this: Test Accuracy: 0.739542 (ε = 11.83, δ = 8e-05) for α = 2.7

 Run without dp:

 ```
-python char-lstm-classification.py --epochs=50 --learning-rate=0.5 --hidden-size=128 --sample-rate=0.05 --n-lstm-layers=1 --disable-dp --device=cuda:1 --data-root="/my/folder/data/names/" --test-every 5
+python char-lstm-classification.py --epochs=50 --learning-rate=0.5 --hidden-size=128 --batch-size 64 --n-layers=1 --disable-dp --device=cuda:1 --data-root="/my/folder/data/names/" --test-every 5
 ```

 You should get something like this: Test Accuracy: 0.760716

opacus/grad_sample/README.md

Lines changed: 42 additions & 13 deletions

@@ -14,8 +14,11 @@ which one to use.
 improves upon ``GradSampleModule`` on performance and functionality.

 **TL;DR:** If you want a stable implementation, use ``GradSampleModule`` (`grad_sample_mode="hooks"`).
-If you want to experiment with the new functionality - try ``GradSampleModuleExpandedWeights``(`grad_sample_mode="ew"`)
-and switch back to ``GradSampleModule`` if you encounter strange errors or unexpexted behaviour.
+If you want to experiment with the new functionality, you have two options: try
+``GradSampleModuleExpandedWeights`` (`grad_sample_mode="ew"`) for better performance, or `grad_sample_mode="functorch"`
+if your model is not supported by ``GradSampleModule``.
+
+Please switch back to ``GradSampleModule`` (`grad_sample_mode="hooks"`) if you encounter strange errors or unexpected behaviour.
 We'd also appreciate it if you report these to us

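The same choice can also be made by wrapping a model directly with the classes listed in the sections below, bypassing ``PrivacyEngine``. A hedged sketch (the toy model is an assumption; ExpandedWeights requires PyTorch 1.13+ and `force_functorch` requires functorch to be installed):

```
import copy

from torch import nn
from opacus.grad_sample import GradSampleModule
from opacus.grad_sample.gsm_exp_weights import GradSampleModuleExpandedWeights

# Toy model; deep copies avoid wrapping the same instance more than once.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))

# Stable default: hooks-based grad sampler.
gsm_hooks = GradSampleModule(copy.deepcopy(model))

# Experimental: ExpandedWeights-backed grad sampler.
gsm_ew = GradSampleModuleExpandedWeights(copy.deepcopy(model))

# Experimental: functorch used for every layer, not only unsupported ones.
gsm_functorch = GradSampleModule(copy.deepcopy(model), force_functorch=True)
```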
 ## Hooks-based approach

@@ -26,6 +29,23 @@ Computes per-sample gradients for a model using backward hooks. It requires cust
 trainable layer in the model. We provide such methods for most popular PyTorch layers. Additionally, client can
 provide their own grad sampler for any new unsupported layer (see [tutorial](https://github.com/pytorch/opacus/blob/main/tutorials/guide_to_grad_sampler.ipynb))
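As a hedged illustration of such a client-provided grad sampler (the `MyLinear` layer and its gradient formula below are assumptions made for the sake of the example; the decorator signature may differ slightly between Opacus versions, so treat the linked tutorial as authoritative):

```
from typing import Dict

import torch
from torch import nn
from opacus.grad_sample import register_grad_sampler


class MyLinear(nn.Module):
    # Hypothetical custom layer: y = x @ W.T
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features))

    def forward(self, x):
        return x.matmul(self.weight.t())


@register_grad_sampler(MyLinear)
def compute_my_linear_grad_sample(
    layer: MyLinear, activations: torch.Tensor, backprops: torch.Tensor
) -> Dict[nn.Parameter, torch.Tensor]:
    # Per-sample weight gradient: outer product of backpropagated gradients
    # and input activations, computed independently for each sample n.
    return {layer.weight: torch.einsum("n...i,n...j->nij", backprops, activations)}
```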

+## Functorch approach
+- Model wrapping class: ``opacus.grad_sample.grad_sample_module.GradSampleModule`` (with `force_functorch=True`)
+- Keyword argument for ``PrivacyEngine.make_private()``: `grad_sample_mode="functorch"`
+
+[functorch](https://pytorch.org/functorch/stable/) provides JAX-like composable function transforms for PyTorch.
+With functorch we can compute per-sample gradients efficiently by using function transforms. With the efficient
+parallelization provided by `vmap`, we can obtain per-sample gradients for any function (i.e. any model) by
+doing essentially `vmap(grad(f(x)))`.
+
+Our experiments show that in most cases `vmap` computations are as fast as the manually written grad samplers used in
+the hooks-based approach.
+
+With the current implementation, `GradSampleModule` will use manual grad samplers for known modules (i.e. maintain the
+old behaviour for all previously supported models) and will only use functorch for unknown modules.
+
+With `force_functorch=True` passed to the constructor, `GradSampleModule` will rely exclusively on functorch.
+
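A hedged sketch of the `vmap(grad(...))` recipe described above, using functorch directly rather than through `GradSampleModule` (the model, loss, and data here are assumptions):

```
import torch
from torch import nn
from functorch import make_functional, vmap, grad

model = nn.Linear(10, 2)
fmodel, params = make_functional(model)  # stateless (functional) version of the model

def compute_loss(params, sample, target):
    # Loss for a single sample; vmap adds the batch dimension back.
    prediction = fmodel(params, sample.unsqueeze(0))
    return nn.functional.cross_entropy(prediction, target.unsqueeze(0))

# grad differentiates w.r.t. params (argnums=0 by default);
# vmap vectorizes over the batch dimension of data and targets.
per_sample_grads_fn = vmap(grad(compute_loss), in_dims=(None, 0, 0))

data, targets = torch.randn(32, 10), torch.randint(2, (32,))
per_sample_grads = per_sample_grads_fn(params, data, targets)
# per_sample_grads is a tuple with one tensor per parameter,
# each with a leading batch dimension of 32.
```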
 ## ExpandedWeights approach
 - Model wrapping class: ``opacus.grad_sample.gsm_exp_weights.GradSampleModuleExpandedWeights``
 - Keyword argument for ``PrivacyEngine.make_private()``: `grad_sample_mode="ew"`

@@ -42,14 +62,23 @@ is roughly the same.
 Please note that these are known limitations and we plan to improve Expanded Weights and bridge the gap in feature completeness

-| xxx | Hooks | Expanded Weights |
-|:-----:|:-------:|:------------------:|
-| Required PyTorch version | 1.8+ | 1.13+ |
-| Development status | Underlying mechanism deprecated | Beta |
-| Performance | - | ✅ Likely up to 2.5x faster |
-| torchscript models | Not supported | ✅ Supported |
-| Client-provided grad sampler | ✅ Supported | Not supported |
-| `batch_first=False` | ✅ Supported | Not supported |
-| Most popular nn.* layers | ✅ Supported | ✅ Supported |
-| Recurrent networks | ✅ Supported | Not supported |
-| Padding `same` in Conv | ✅ Supported | Not supported |
+|                              | Hooks                           | Expanded Weights | Functorch            |
+|:----------------------------:|:-------------------------------:|:----------------:|:--------------------:|
+| Required PyTorch version     | 1.8+                            | 1.13+            | 1.12 (to be updated) |
+| Development status           | Underlying mechanism deprecated | Beta             | Beta                 |
+| Runtime Performance†         | baseline                        | ~25% faster      | 🟨 0-50% slower      |
+| Any DP-allowed†† layers      | Not supported                   | Not supported    | ✅ Supported         |
+| Most popular nn.* layers     | ✅ Supported                    | ✅ Supported     | ✅ Supported         |
+| torchscripted models         | Not supported                   | ✅ Supported     | Not supported        |
+| Client-provided grad sampler | ✅ Supported                    | Not supported    | ✅ Not needed        |
+| `batch_first=False`          | ✅ Supported                    | Not supported    | ✅ Supported         |
+| Recurrent networks           | ✅ Supported                    | Not supported    | ✅ Supported         |
+| Padding `same` in Conv       | ✅ Supported                    | Not supported    | ✅ Supported         |
+
+† Note that performance differences are unstable and can vary a lot depending on the exact model and batch size.
+Numbers above are averaged over benchmarks with small models consisting of convolutional and linear layers.
+Also note that performance differences are only observed in GPU training; CPU performance seems to be almost identical
+for all approaches.
+
+†† Layers that produce joint computations on batch samples (e.g. BatchNorm) are not allowed under any approach.
