Commit 543de98

Fix formatting and typos in modifiers.
1 parent 99e231e commit 543de98

11 files changed: +179 additions, -159 deletions

src/llmcompressor/modifiers/autoround/base.py

Lines changed: 31 additions & 27 deletions
@@ -62,35 +62,39 @@ class AutoRoundModifier(Modifier, QuantizationMixin):
     This modifier leverages signed gradient descent (SignSGD) optimizer and
     block-wise loss to optimize rounding values and weight clipping in a few steps.
 
-    | Sample yaml:
-    | test_stage:
-    |     modifiers:
-    |         AutoRoundModifier:
-    |             iters: 200
-    |             config_groups:
-    |                 group_0:
-    |                     targets:
-    |                         - "Linear"
-    |                     input_activations: null
-    |                     output_activations: null
-    |                     weights:
-    |                         num_bits: 4
-    |                         type: "int"
-    |                         symmetric: true
-    |                         strategy: group
-    |                         group_size: 128
+    Sample yaml:
+
+    ```yaml
+    test_stage:
+        modifiers:
+            AutoRoundModifier:
+                iters: 200
+                config_groups:
+                    group_0:
+                        targets:
+                            - "Linear"
+                        input_activations: null
+                        output_activations: null
+                        weights:
+                            num_bits: 4
+                            type: "int"
+                            symmetric: true
+                            strategy: group
+                            group_size: 128
+    ```
 
     Lifecycle:
-        - on_initialize
-            - apply config to model
-        - on_start
-            - add input capture hooks to decoding layers
-        - on_sequential_epoch_end
-            - apply_autoround
-            - post_autoround_cleanup
-        - on_finalize
-            - remove_hooks()
-            - model.apply(freeze_module_quantization)
+
+    - on_initialize
+        - apply config to model
+    - on_start
+        - add input capture hooks to decoding layers
+    - on_sequential_epoch_end
+        - apply_autoround
+        - post_autoround_cleanup
+    - on_finalize
+        - remove_hooks()
+        - model.apply(freeze_module_quantization)
 
     :param config_groups: dictionary specifying quantization schemes to apply to target
         modules. Modules not matching a scheme target will NOT be quantized.

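The docstring above only names the idea behind AutoRound: the rounding decision for each weight is treated as a learnable offset that is optimized with SignSGD against a block-wise reconstruction loss. Below is a minimal, self-contained sketch of that quantize-dequantize step with a learnable rounding offset. It is illustrative only and is not `AutoRoundModifier`'s actual implementation; the straight-through estimator, shapes, and optimizer loop are simplified assumptions.

```python
import torch


def quant_dequant_with_offset(w: torch.Tensor, v: torch.Tensor,
                              num_bits: int = 4, group_size: int = 128) -> torch.Tensor:
    """Symmetric group-wise quant-dequant where `v` nudges the rounding decision.

    AutoRound-style methods learn `v` (clamped to roughly [-0.5, 0.5]) so that
    round(w / scale + v) picks the rounding direction that minimizes a
    block-wise reconstruction loss, instead of always rounding to nearest.
    """
    out_features, in_features = w.shape
    wg = w.reshape(out_features, in_features // group_size, group_size)
    vg = v.reshape_as(wg)

    qmax = 2 ** (num_bits - 1) - 1                                 # 7 for 4-bit signed
    scale = (wg.abs().amax(dim=-1, keepdim=True) / qmax).clamp(min=1e-8)

    z = wg / scale + vg.clamp(-0.5, 0.5)
    z = z + (torch.round(z) - z).detach()       # straight-through estimator for round()
    q = torch.clamp(z, -qmax - 1, qmax)
    return (q * scale).reshape(out_features, in_features)


# Toy usage: optimize the rounding offsets with signed gradient descent (SignSGD).
torch.manual_seed(0)
w = torch.randn(8, 256)
x = torch.randn(32, 256)
v = torch.zeros_like(w, requires_grad=True)

lr = 5e-3
for _ in range(200):                            # mirrors `iters: 200` in the sample recipe
    w_q = quant_dequant_with_offset(w, v)
    loss = torch.nn.functional.mse_loss(x @ w_q.T, x @ w.T)
    loss.backward()
    with torch.no_grad():
        v -= lr * v.grad.sign()                 # SignSGD: step by the sign of the gradient
        v.grad = None

with torch.no_grad():
    rtn = torch.nn.functional.mse_loss(x @ quant_dequant_with_offset(w, torch.zeros_like(w)).T, x @ w.T)
    opt = torch.nn.functional.mse_loss(x @ quant_dequant_with_offset(w, v).T, x @ w.T)
    print(f"round-to-nearest MSE {rtn:.5f} -> learned-rounding MSE {opt:.5f}")
```
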
src/llmcompressor/modifiers/awq/base.py

Lines changed: 20 additions & 20 deletions
@@ -58,7 +58,6 @@ class AWQModifier(Modifier, QuantizationMixin):
           balance_layers: ["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"]
         - smooth_layer: "re:.*final_layer_norm"
           balance_layers: ["re:.*fc1"]
-    ]
     ignore: ["lm_head"]
     config_groups:
       group_0:

@@ -75,25 +74,26 @@ class AWQModifier(Modifier, QuantizationMixin):
     ```
 
     Lifecycle:
-        - on_initialize
-            - resolve mappings
-            - capture kwargs needed for forward passes into modules
-        - on_start
-            - set up activation cache hooks to capture input activations
-              to balance layers
-        - on sequential epoch end
-            - apply smoothing to each smoothing layer
-                - consume cached activations across all batches
-                - clear cached activations as they are used
-            - find best smoothing scale for each smoothing layer
-            - apply to model weights
-            - raise error if any unused activations remain
-        - on_end
-            - re-run logic of sequential epoch end (in case of basic pipeline)
-            - set scales and zero points
-            - remove activation hooks
-        - on_finalize
-            - clear resolved mappings and captured activations
+
+    - on_initialize
+        - resolve mappings
+        - capture kwargs needed for forward passes into modules
+    - on_start
+        - set up activation cache hooks to capture input activations
+          to balance layers
+    - on sequential epoch end
+        - apply smoothing to each smoothing layer
+            - consume cached activations across all batches
+            - clear cached activations as they are used
+        - find best smoothing scale for each smoothing layer
+        - apply to model weights
+        - raise error if any unused activations remain
+    - on_end
+        - re-run logic of sequential epoch end (in case of basic pipeline)
+        - set scales and zero points
+        - remove activation hooks
+    - on_finalize
+        - clear resolved mappings and captured activations
 
     :param sequential_targets: list of module names to compress in
         the same calibration pass

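The lifecycle above finds a smoothing scale per smooth layer and applies it to the model weights. The sketch below shows the identity this kind of smoothing relies on: dividing a norm's (smooth layer's) per-channel parameters by a scale while multiplying the matching input columns of the downstream (balance) layer by the same scale leaves the float output unchanged, while moving activation outliers into the weights. This is a hand-rolled illustration of the math, not `AWQModifier`'s code; the layer shapes and the way the scale is chosen here are assumptions.

```python
import torch

torch.manual_seed(0)
hidden = 64

norm = torch.nn.LayerNorm(hidden)           # "smooth layer", e.g. re:.*final_layer_norm
fc1 = torch.nn.Linear(hidden, 4 * hidden)   # "balance layer", e.g. re:.*fc1

x = torch.randn(4, hidden)
ref = fc1(norm(x))

# Per-channel smoothing scale; AWQ searches over candidate scales, here we just
# derive one from activation magnitude for illustration.
act_mag = norm(x).abs().mean(dim=0)
scale = act_mag.clamp(min=1e-5).sqrt()

with torch.no_grad():
    # Fold 1/scale into the smooth layer...
    norm.weight.div_(scale)
    norm.bias.div_(scale)
    # ...and scale into the balance layer's input channels (columns of fc1.weight).
    fc1.weight.mul_(scale)

out = fc1(norm(x))
print(torch.allclose(out, ref, atol=1e-5))  # True: smoothing is output-preserving
```
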
src/llmcompressor/modifiers/pruning/sparsegpt/base.py

Lines changed: 21 additions & 17 deletions
@@ -26,24 +26,28 @@ class SparseGPTModifier(SparsityModifierBase):
     """
     Modifier for applying the one-shot SparseGPT algorithm to a model
 
-    | Sample yaml:
-    | test_stage:
-    |     obcq_modifiers:
-    |         SparseGPTModifier:
-    |             sparsity: 0.5
-    |             mask_structure: "2:4"
-    |             dampening_frac: 0.001
-    |             block_size: 128
-    |             targets: ['Linear']
-    |             ignore: ['re:.*lm_head']
+    Sample yaml:
+
+    ```yaml
+    test_stage:
+        obcq_modifiers:
+            SparseGPTModifier:
+                sparsity: 0.5
+                mask_structure: "2:4"
+                dampening_frac: 0.001
+                block_size: 128
+                targets: ['Linear']
+                ignore: ['re:.*lm_head']
+    ```
 
     Lifecycle:
-        - on_initialize
-            - register_hook(module, calibrate_module, "forward")
-        - on_sequential_batch_end
-            - sparsify_weight
-        - on_finalize
-            - remove_hooks()
+
+    - on_initialize
+        - register_hook(module, calibrate_module, "forward")
+    - on_sequential_batch_end
+        - sparsify_weight
+    - on_finalize
+        - remove_hooks()
 
     :param sparsity: Sparsity to compress model to
     :param sparsity_profile: Can be set to 'owl' to use Outlier Weighed

@@ -92,7 +96,7 @@ def calibrate_module(
 
         :param module: module being calibrated
        :param args: inputs to the module, the first element of which is the
-            cannonical input
+            canonical input
        :param _output: uncompressed module output, unused
        """
        # Assume that the first argument is the input

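The sample recipe sets `mask_structure: "2:4"`, the semi-structured pattern in which two of every four consecutive weights stay non-zero (N:M sparsity, which sparse tensor-core hardware can accelerate). SparseGPT chooses which weights to drop using Hessian information from calibration data; the sketch below only illustrates what a 2:4 mask looks like, using simple magnitude ranking as a stand-in for the actual selection criterion.

```python
import torch


def n_m_mask(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Boolean mask keeping the `n` largest-magnitude weights in every group of
    `m` consecutive weights along the input dimension ("n:m" sparsity).
    """
    out_features, in_features = weight.shape
    groups = weight.abs().reshape(out_features, in_features // m, m)
    topk = groups.topk(n, dim=-1).indices            # top-n entries within each group
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(-1, topk, torch.ones_like(topk, dtype=torch.bool))
    return mask.reshape(out_features, in_features)


w = torch.randn(8, 16)
mask = n_m_mask(w, n=2, m=4)                         # "2:4": keep 2 of every 4 weights
sparse_w = w * mask
assert mask.reshape(-1, 4).sum(dim=-1).eq(2).all()   # exactly 2 kept per group of 4
print(f"overall sparsity: {(1 - mask.float().mean()).item():.2f}")   # 0.50
```
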
src/llmcompressor/modifiers/pruning/wanda/base.py

Lines changed: 20 additions & 16 deletions
@@ -26,23 +26,27 @@ class WandaPruningModifier(SparsityModifierBase):
     Modifier for applying the one-shot WANDA algorithm to a model
     from the paper: https://arxiv.org/abs/2306.11695
 
-    | Sample yaml:
-    | test_stage:
-    |     sparsity_modifiers:
-    |         WandaPruningModifier:
-    |             sparsity: 0.5
-    |             mask_structure: "2:4"
+    Sample yaml:
+
+    ```yaml
+    test_stage:
+        sparsity_modifiers:
+            WandaPruningModifier:
+                sparsity: 0.5
+                mask_structure: "2:4"
+    ```
 
     Lifecycle:
-        - on_initialize
-            - register_hook(module, calibrate_module, "forward")
-        - run_sequential / run_basic
-            - make_empty_row_scalars
-            - accumulate_row_scalars
-        - on_sequential_batch_end
-            - sparsify_weight
-        - on_finalize
-            - remove_hooks()
+
+    - on_initialize
+        - register_hook(module, calibrate_module, "forward")
+    - run_sequential / run_basic
+        - make_empty_row_scalars
+        - accumulate_row_scalars
+    - on_sequential_batch_end
+        - sparsify_weight
+    - on_finalize
+        - remove_hooks()
 
     :param sparsity: Sparsity to compress model to
     :param sparsity_profile: Can be set to 'owl' to use Outlier Weighed

@@ -78,7 +82,7 @@ def calibrate_module(
 
        :param module: module being calibrated
        :param args: inputs to the module, the first element of which is the
-            cannonical input
+            canonical input
        :param _output: uncompressed module output, unused
        """
        # Assume that the first argument is the input

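The linked paper's criterion, for which `accumulate_row_scalars` gathers statistics, scores each weight by its magnitude times the L2 norm of the corresponding input activation, then removes the lowest-scoring weights per output row. A minimal sketch of that metric follows; it is not the modifier's hook-based implementation, and it uses unstructured rather than 2:4 masking for brevity.

```python
import torch


def wanda_prune(weight: torch.Tensor, inputs: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Prune `weight` (out_features x in_features) using the WANDA metric:
    score_ij = |W_ij| * ||X_j||_2, where X_j collects calibration inputs for
    input feature j. The lowest-scoring weights in each output row are zeroed.
    """
    # Per-input-feature activation norm over all calibration tokens
    act_norm = inputs.reshape(-1, inputs.shape[-1]).norm(p=2, dim=0)
    scores = weight.abs() * act_norm                    # broadcasts across output rows

    num_prune = int(weight.shape[1] * sparsity)
    prune_idx = scores.argsort(dim=1)[:, :num_prune]    # lowest scores per row
    pruned = weight.clone()
    pruned.scatter_(1, prune_idx, 0.0)
    return pruned


w = torch.randn(8, 64)
calib = torch.randn(16, 10, 64)           # (batch, seq_len, hidden) calibration inputs
wp = wanda_prune(w, calib, sparsity=0.5)
print((wp == 0).float().mean().item())    # ~0.5
```
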
src/llmcompressor/modifiers/quantization/gptq/base.py

Lines changed: 38 additions & 34 deletions
@@ -36,40 +36,44 @@ class GPTQModifier(Modifier, QuantizationMixin):
     """
     Implements the GPTQ algorithm from https://arxiv.org/abs/2210.17323. This modifier
     uses activations to calibrate a hessian matrix, which is then used to determine
-    optimal quantizion values and orderings for the model weights.
-
-    | Sample yaml:
-    | test_stage:
-    |     obcq_modifiers:
-    |         GPTQModifier:
-    |             block_size: 128
-    |             dampening_frac: 0.001
-    |             offload_hessians: False
-    |             actorder: static
-    |             config_groups:
-    |                 group_0:
-    |                     targets:
-    |                         - "Linear"
-    |                     input_activations: null
-    |                     output_activations: null
-    |                     weights:
-    |                         num_bits: 8
-    |                         type: "int"
-    |                         symmetric: true
-    |                         strategy: group
-    |                         group_size: 128
+    optimal quantization values and orderings for the model weights.
+
+    Sample yaml:
+
+    ```yaml
+    test_stage:
+        obcq_modifiers:
+            GPTQModifier:
+                block_size: 128
+                dampening_frac: 0.001
+                offload_hessians: False
+                actorder: static
+                config_groups:
+                    group_0:
+                        targets:
+                            - "Linear"
+                        input_activations: null
+                        output_activations: null
+                        weights:
+                            num_bits: 8
+                            type: "int"
+                            symmetric: true
+                            strategy: group
+                            group_size: 128
+    ```
 
     Lifecycle:
-        - on_initialize
-            - apply config to model
-        - on_start
-            - add activation calibration hooks
-            - add gptq weight calibration hooks
-        - on_sequential_epoch_end
-            - quantize_weight
-        - on_finalize
-            - remove_hooks()
-            - model.apply(freeze_module_quantization)
+
+    - on_initialize
+        - apply config to model
+    - on_start
+        - add activation calibration hooks
+        - add gptq weight calibration hooks
+    - on_sequential_epoch_end
+        - quantize_weight
+    - on_finalize
+        - remove_hooks()
+        - model.apply(freeze_module_quantization)
 
     :param sequential_targets: list of layer names to compress during GPTQ, or
         '__ALL__' to compress every layer in the model

@@ -99,7 +103,7 @@ class GPTQModifier(Modifier, QuantizationMixin):
         the kv_cache_scheme gets converted into a QuantizationScheme that:
             - targets the `q_proj` and `k_proj` modules of the model. The outputs
               of those modules are the keys and values that might be cached
-            - quantizes the outputs of the aformentioned layers, so that
+            - quantizes the outputs of the aforementioned layers, so that
              keys and values are compressed before storing them in the cache
        There is an explicit assumption that the model contains modules with
        `k_proj` and `v_proj` in their names. If this is not the case

@@ -220,7 +224,7 @@ def calibrate_module(
 
        :param module: module being calibrated
        :param args: inputs to the module, the first element of which is the
-            cannonical input
+            canonical input
        :param _output: uncompressed module output, unused
        """
        # Assume that first argument is the input

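For contrast with the YAML recipe above, `GPTQModifier` is often constructed directly in Python and handed to llmcompressor's `oneshot` entrypoint. A hedged sketch in the spirit of the project's quickstart examples follows; the model name, dataset, and keyword-argument values are illustrative assumptions rather than a verbatim copy of any one example.

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

# Quantize all Linear layers (except the LM head) to 4-bit weights using GPTQ.
recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["lm_head"],
    dampening_frac=0.001,
)

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",   # any HF causal LM id or loaded model
    dataset="open_platypus",                      # built-in calibration dataset name
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)
```
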
src/llmcompressor/modifiers/quantization/gptq/gptq_quantize.py

Lines changed: 1 addition & 1 deletion
@@ -286,7 +286,7 @@ def _apply_activation_ordering(
     W: torch.Tensor, H: torch.Tensor
 ) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
     """
-    Permute weight and hessian in order of greatest outupt activations
+    Permute weight and hessian in order of greatest output activations
 
     :param W: weight to permute
     :param H: hessian used to determine activation ordering

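The corrected docstring belongs to the helper behind `actorder: static` in the GPTQ recipe above: columns are quantized in order of decreasing activation energy, as estimated by the Hessian's diagonal. A self-contained sketch of that permutation follows; it is a conceptual stand-in, and the exact return values of the real helper are assumptions.

```python
import torch


def apply_activation_ordering(W: torch.Tensor, H: torch.Tensor):
    """Permute weight columns and the Hessian so the columns with the largest
    estimated activation energy (diagonal of H) are quantized first.
    """
    perm = torch.argsort(torch.diag(H), descending=True)
    return W[:, perm], H[perm][:, perm], perm


# Toy Hessian H ~ 2 * X^T X / n built from calibration activations X.
X = torch.randn(128, 16)
H = 2 * X.T @ X / X.shape[0]
W = torch.randn(32, 16)

W_p, H_p, perm = apply_activation_ordering(W, H)
inv_perm = torch.argsort(perm)              # undo the permutation after quantizing
assert torch.equal(W_p[:, inv_perm], W)
```
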
src/llmcompressor/modifiers/quantization/quantization/base.py

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ class QuantizationModifier(Modifier, QuantizationMixin):
         the kv_cache_scheme gets converted into a QuantizationScheme that:
             - targets the `q_proj` and `k_proj` modules of the model. The outputs
               of those modules are the keys and values that might be cached
-            - quantizes the outputs of the aformentioned layers, so that
+            - quantizes the outputs of the aforementioned layers, so that
               keys and values are compressed before storing them in the cache
         There is an explicit assumption that the model contains modules with
         `k_proj` and `v_proj` in their names. If this is not the case

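The `kv_cache_scheme` parameter described here takes the same quantization-args shape as a weights or activations entry. A hedged example of what an 8-bit float KV-cache configuration might look like when constructing the modifier in Python; the field values are assumptions modeled on common FP8 KV-cache recipes, not taken from this commit.

```python
from llmcompressor.modifiers.quantization import QuantizationModifier

# Quantize Linear weights/activations to FP8 and also compress the KV cache:
# per the docstring above, kv_cache_scheme becomes an output-activation scheme
# on the key/value projections, so cached keys and values are stored in 8-bit.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],
    kv_cache_scheme={
        "num_bits": 8,
        "type": "float",
        "strategy": "tensor",
        "dynamic": False,
        "symmetric": True,
    },
)
```
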
src/llmcompressor/modifiers/quantization/quantization/mixin.py

Lines changed: 20 additions & 18 deletions
@@ -43,26 +43,28 @@
 
 class QuantizationMixin(HooksMixin):
     """
-    Mixin which enables a Modifier to act as a quantization config, attching observers,
+    Mixin which enables a Modifier to act as a quantization config, attaching observers,
     calibration hooks, and compression wrappers to modifiers
 
     Lifecycle:
-        - on_initialize: QuantizationMixin.initialize_quantization
-            - Attach schemes to modules
-            - Attach observers to modules
-            - Disable quantization until calibration starts/finishes
-        - on_start: QuantizationMixin.start_calibration
-            - Attach calibration hooks
-            - Apply calibration status
-            - Enable quantization during calibration
-        - on_end: QuantizationMixin.end_calibration
-            - Remove calibration hooks
-            - Apply freeze status
-            - Keep quantization enabled for future steps
-    NOTE: QuantizationMixin does not update scales and zero-points on its own,
-    as this is not desired for all Modifiers inheriting from it. Modifier must
-    explicitly call `update_weight_zp_scale`.
-    See QuantizationModifier.on_start method for example
+
+    - on_initialize: QuantizationMixin.initialize_quantization
+        - Attach schemes to modules
+        - Attach observers to modules
+        - Disable quantization until calibration starts/finishes
+    - on_start: QuantizationMixin.start_calibration
+        - Attach calibration hooks
+        - Apply calibration status
+        - Enable quantization during calibration
+    - on_end: QuantizationMixin.end_calibration
+        - Remove calibration hooks
+        - Apply freeze status
+        - Keep quantization enabled for future steps
+
+    NOTE: QuantizationMixin does not update scales and zero-points on its own,
+    as this is not desired for all Modifiers inheriting from it. Modifier must
+    explicitly call `update_weight_zp_scale`.
+    See QuantizationModifier.on_start method for example
 
     :param config_groups: dictionary specifying quantization schemes to apply to target
         modules. Modules not matching a scheme target will NOT be quantized.

@@ -85,7 +87,7 @@ class QuantizationMixin(HooksMixin):
         the kv_cache_scheme gets converted into a QuantizationScheme that:
             - targets the `q_proj` and `k_proj` modules of the model. The outputs
               of those modules are the keys and values that might be cached
-            - quantizes the outputs of the aformentioned layers, so that
+            - quantizes the outputs of the aforementioned layers, so that
              keys and values are compressed before storing them in the cache
         There is an explicit assumption that the model contains modules with
         `k_proj` and `v_proj` in their names. If this is not the case

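Because the NOTE above says an inheriting Modifier must call `update_weight_zp_scale` itself, here is a schematic sketch of how a Modifier built on QuantizationMixin might wire the mixin's phases into its own lifecycle. Only the method names mentioned in the docstring are taken from the source; the import paths, method signatures, and state plumbing are simplified assumptions, not the library's exact API.

```python
# Schematic only: import paths and signatures below are assumptions.
from llmcompressor.modifiers import Modifier
from llmcompressor.modifiers.quantization.calibration import update_weight_zp_scale
from llmcompressor.modifiers.quantization.quantization.mixin import QuantizationMixin


class MyQuantizingModifier(Modifier, QuantizationMixin):
    def on_initialize(self, state, **kwargs) -> bool:
        # Attach schemes/observers and keep quantization disabled until calibration.
        QuantizationMixin.initialize_quantization(self, state.model)
        return True

    def on_start(self, state, event, **kwargs):
        # Attach calibration hooks and enable quantization during calibration.
        QuantizationMixin.start_calibration(self, state.model)
        # The mixin does NOT do this part for us (see NOTE above):
        state.model.apply(update_weight_zp_scale)

    def on_end(self, state, event, **kwargs):
        # Remove calibration hooks and freeze the calibrated parameters.
        QuantizationMixin.end_calibration(self, state.model)
```
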