
Commit a454f0f

Merge branch 'main' into 97_fix_saving
2 parents: b5e2dd1 + b77175d

82 files changed: +259, -3269 lines (large commit; only a subset of the changed files is shown below).

.github/workflows/test-check-transformers.yaml (4 additions, 4 deletions)

```diff
@@ -62,7 +62,7 @@ jobs:
     steps:
       - uses: actions/setup-python@v5
         with:
-          python-version: '3.10'
+          python-version: '3.12'
       - uses: actions/checkout@v4
         with:
           fetch-depth: 0
@@ -73,7 +73,7 @@ jobs:
         run: uv pip install .[dev]
       - uses: actions/checkout@v4
         with:
-          repository: "neuralmagic/compressed-tensors"
+          repository: "vllm-project/compressed-tensors"
           path: "compressed-tensors"
           fetch-depth: 0
           fetch-tags: true
@@ -93,10 +93,10 @@ jobs:
         if: (success() || failure()) && steps.install.outcome == 'success'
         run: |
           pytest -v tests/llmcompressor/transformers/compression
-      - name: Run Finetune Tests
+      - name: Run Data Tests
         if: (success() || failure()) && steps.install.outcome == 'success'
         run: |
-          pytest -v tests/llmcompressor/transformers/finetune
+          pytest -v tests/llmcompressor/transformers/data
       - name: Running GPTQ Tests
         if: (success() || failure()) && steps.install.outcome == 'success'
         run: |
```
.github/workflows/test-check.yaml (14 additions, 6 deletions)

```diff
@@ -22,10 +22,14 @@ jobs:
     runs-on: ubuntu-22.04
     env:
       COVERAGE_FILE: ".coverage.base"
+    strategy:
+      matrix:
+        python: ["3.10", "3.13"]
     steps:
-      - uses: actions/setup-python@v5
+      - name: Set up Python
+        uses: actions/setup-python@v5
         with:
-          python-version: '3.12'
+          python-version: ${{ matrix.python }}
       - uses: actions/checkout@v4
         with:
           fetch-depth: 0
@@ -36,7 +40,7 @@ jobs:
         run: uv pip install .[dev]
       - uses: actions/checkout@v4
         with:
-          repository: "neuralmagic/compressed-tensors"
+          repository: "vllm-project/compressed-tensors"
           path: "compressed-tensors"
           fetch-depth: 0
           fetch-tags: true
@@ -73,10 +77,14 @@ jobs:
     runs-on: ubuntu-22.04
    env:
      COVERAGE_FILE: ".coverage.pytorch"
+    strategy:
+      matrix:
+        python: ["3.10", "3.13"]
     steps:
-      - uses: actions/setup-python@v5
+      - name: Set up Python
+        uses: actions/setup-python@v5
         with:
-          python-version: '3.11'
+          python-version: ${{ matrix.python }}
       - uses: actions/checkout@v4
         with:
           fetch-depth: 0
@@ -87,7 +95,7 @@ jobs:
         run: uv pip install .[dev]
       - uses: actions/checkout@v4
         with:
-          repository: "neuralmagic/compressed-tensors"
+          repository: "vllm-project/compressed-tensors"
           path: "compressed-tensors"
           fetch-depth: 0
           fetch-tags: true
```

.gitignore (3 additions, 0 deletions)

```diff
@@ -805,6 +805,9 @@ timings/
 output_finetune/
 env_log.json
 
+# LM Eval cache
+.lmeval_cache/
+
 # uv artifacts
 uv.lock
 .venv/
```

examples/quantization_2of4_sparse_w4a16/2of4_w4a16_group-128_recipe.yaml (0 additions, 13 deletions)

```diff
@@ -5,19 +5,6 @@ sparsity_stage:
       mask_structure: "2:4"
       targets: ["Linear"]
       ignore: ["re:.*lm_head"]
-finetuning_stage:
-  finetuning_modifiers:
-    ConstantPruningModifier:
-      targets: [
-        're:.*q_proj.weight',
-        're:.*k_proj.weight',
-        're:.*v_proj.weight',
-        're:.*o_proj.weight',
-        're:.*gate_proj.weight',
-        're:.*up_proj.weight',
-        're:.*down_proj.weight',
-      ]
-      start: 0
 quantization_stage:
   quantization_modifiers:
     GPTQModifier:
```

examples/quantization_2of4_sparse_w4a16/README.md (11 additions, 49 deletions)

````diff
@@ -4,9 +4,10 @@
 
 > `2:4 sparsity + int4/int8` mixed precision computation is supported in vLLM on Nvidia capability > 8.0 (Ampere, Ada Lovelace, Hopper).
 
-## NOTE:
-Fine tuning can require more steps than is shown in the example.
-See the Axolotl integration blog post for best fine tuning practices
+## NOTE: The following example no longer includes finetuning as training
+Training support has been deprecated as of v0.9.0. To apply finetuning
+to your sparse model, see the Axolotl integration blog post for best
+fine tuning practices
 https://developers.redhat.com/articles/2025/06/17/axolotl-meets-llm-compressor-fast-sparse-open
 
 
@@ -78,22 +79,11 @@ output_path = Path(output_dir)
 splits = {"calibration": "train_gen[:5%]", "train": "train_gen"}
 max_seq_length = 512
 num_calibration_samples = 512
-
-# set training parameters for finetuning
-# increase num_train_epochs for longer training
-num_train_epochs = 0.01
-logging_steps = 500
-save_steps = 5000
-gradient_checkpointing = True  # saves memory during training
-learning_rate = 0.0001
-bf16 = False  # using full precision for training
-lr_scheduler_type = "cosine"
-warmup_ratio = 0.1
 preprocessing_num_workers = 8
 ```
 
-## Step 2: Run `sparsification`, `fine-tuning`, and `quantization`
-The compression process now runs in three stages: sparsification, fine-tuning, and quantization.
+## Step 2: Run `sparsification` and `quantization`
+The compression process now runs in two stages: sparsification and quantization.
 Each stage saves the intermediate model outputs to the `output_llama7b_2of4_w4a16_channel` directory.
 
 ```python
@@ -106,47 +96,19 @@ output_path = Path(output_dir)
 # 1. Oneshot sparsification: apply pruning
 oneshot(
     model=model,
-    dataset=dataset,
-    recipe=recipe,
-    splits=splits,
-    num_calibration_samples=num_calibration_samples,
-    preprocessing_num_workers=preprocessing_num_workers,
+    **oneshot_kwargs,
     output_dir=output_dir,
     stage="sparsity_stage",
 )
 
-# 2. Sparse fine-tuning: improve accuracy on pruned model
-train(
-    model=output_path / "sparsity_stage",
-    dataset=dataset,
-    recipe=recipe,
-    splits=splits,
-    num_calibration_samples=num_calibration_samples,
-    preprocessing_num_workers=preprocessing_num_workers,
-    bf16=bf16,
-    max_seq_length=max_seq_length,
-    num_train_epochs=num_train_epochs,
-    logging_steps=logging_steps,
-    save_steps=save_steps,
-    gradient_checkpointing=gradient_checkpointing,
-    learning_rate=learning_rate,
-    lr_scheduler_type=lr_scheduler_type,
-    warmup_ratio=warmup_ratio,
-    output_dir=output_dir,
-    stage="finetuning_stage",
-)
 
-# 3. Oneshot quantization: compress model weights to lower precision
+# 2. Oneshot quantization: compress model weights to lower precision
 quantized_model = oneshot(
-    model=output_path / "finetuning_stage",
-    dataset=dataset,
-    recipe=recipe,
-    splits=splits,
-    num_calibration_samples=num_calibration_samples,
-    preprocessing_num_workers=preprocessing_num_workers,
-    output_dir=output_dir,
+    model=(output_path / "sparsity_stage"),
+    **oneshot_kwargs,
     stage="quantization_stage",
 )
+
 # skip_sparsity_compression_stats is set to False
 # to account for sparsity in the model when compressing
 quantized_model.save_pretrained(
````
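The updated README snippets pass `**oneshot_kwargs` into both calls without showing its definition, which lives in the accompanying script (diffed next). As a rough sketch only: the script diff confirms `dataset`, `recipe`, and `splits` as entries, while the calibration entries below are assumptions inferred from the Step 1 parameters, and the `dataset`/`recipe` values are placeholders not visible in any hunk.

```python
# Sketch of the shared arguments referenced as **oneshot_kwargs above.
dataset = "ultrachat-200k"  # placeholder: actual value not shown in the diff
recipe = "2of4_w4a16_group-128_recipe.yaml"  # placeholder: actual value not shown
splits = {"calibration": "train_gen[:5%]", "train": "train_gen"}

oneshot_kwargs = dict(
    dataset=dataset,
    recipe=recipe,
    num_calibration_samples=512,  # assumption: elided by the hunk
    preprocessing_num_workers=8,  # assumption: elided by the hunk
    splits=splits,
)
```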

examples/quantization_2of4_sparse_w4a16/llama7b_sparse_w4a16.py (10 additions, 44 deletions)

```diff
@@ -1,5 +1,8 @@
-# NOTE: Fine tuning can require more steps than is shown in the example
-# See the Axolotl integration blog post for best fine tuning practices
+# NOTE: The following example no longer includes finetuning as training.
+
+# Training support has been deprecated as of v0.9.0. To apply finetuning
+# to your sparse model, see the Axolotl integration blog post for best
+# fine tuning practices
 # https://developers.redhat.com/articles/2025/06/17/axolotl-meets-llm-compressor-fast-sparse-open
 
 from pathlib import Path
@@ -8,7 +11,7 @@
 from loguru import logger
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
-from llmcompressor import oneshot, train
+from llmcompressor import oneshot
 
 # load the model in as bfloat16 to save on memory and compute
 model_stub = "neuralmagic/Llama-2-7b-ultrachat200k"
@@ -26,22 +29,11 @@
 output_path = Path(output_dir)
 
 # set dataset config parameters
-splits = {"calibration": "train_gen[:5%]", "train": "train_gen"}
+splits = {"calibration": "train_gen[:5%]"}
 max_seq_length = 512
-num_calibration_samples = 512
-
-# set training parameters for finetuning
-num_train_epochs = 0.01
-logging_steps = 500
-save_steps = 5000
-gradient_checkpointing = True  # saves memory during training
-learning_rate = 0.0001
-bf16 = False  # using full precision for training
-lr_scheduler_type = "cosine"
-warmup_ratio = 0.1
+num_calibration_samples = 10
 preprocessing_num_workers = 64
 
-
 oneshot_kwargs = dict(
     dataset=dataset,
     recipe=recipe,
@@ -50,46 +42,20 @@
     splits=splits,
 )
 
-training_kwargs = dict(
-    bf16=bf16,
-    max_seq_length=max_seq_length,
-    num_train_epochs=num_train_epochs,
-    logging_steps=logging_steps,
-    save_steps=save_steps,
-    gradient_checkpointing=gradient_checkpointing,
-    learning_rate=learning_rate,
-    lr_scheduler_type=lr_scheduler_type,
-    warmup_ratio=warmup_ratio,
-)
-
-# This will run the targeted stage of the recipe
-# oneshot sparsification -> finetuning -> oneshot quantization
-
 # Models are automatically saved in
-# ./output_llama7b_2of4_w4a16_channel/ + (finetuning/sparsity/quantization)_stage
+# ./output_llama7b_2of4_w4a16_channel/ + (sparsity/quantization)_stage
 
 # Oneshot sparsification
-
 oneshot(
     model=model,
     **oneshot_kwargs,
     output_dir=output_dir,
     stage="sparsity_stage",
 )
 
-# Sparse finetune
-# This step can be supplanted by fine tuning via integrated FT libraries such as Axolotl
-train(
-    model=(output_path / "sparsity_stage"),
-    **oneshot_kwargs,
-    **training_kwargs,
-    output_dir=output_dir,
-    stage="finetuning_stage",
-)
-
 # Oneshot quantization
 quantized_model = oneshot(
-    model=(output_path / "finetuning_stage"),
+    model=(output_path / "sparsity_stage"),
     **oneshot_kwargs,
     stage="quantization_stage",
 )
```
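Read end to end, the post-change script reduces to a two-stage pipeline. The following is a consolidated sketch, not the file itself: the `dataset` and `recipe` values are not visible in the hunks and are placeholders, the contents of `oneshot_kwargs` beyond `dataset`, `recipe`, and `splits` are assumptions, and the `save_pretrained` call is completed from the README comment about `skip_sparsity_compression_stats`.

```python
from pathlib import Path

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot

# load the model in as bfloat16 to save on memory and compute (per the script)
model_stub = "neuralmagic/Llama-2-7b-ultrachat200k"
model = AutoModelForCausalLM.from_pretrained(model_stub, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_stub)

dataset = "ultrachat-200k"  # placeholder: not shown in the hunks
recipe = "2of4_w4a16_group-128_recipe.yaml"  # placeholder: not shown in the hunks

output_dir = "output_llama7b_2of4_w4a16_channel"
output_path = Path(output_dir)

# shared arguments; the middle of this dict is elided in the diff, so the
# calibration entries here are assumptions based on the variables above it
oneshot_kwargs = dict(
    dataset=dataset,
    recipe=recipe,
    num_calibration_samples=10,
    preprocessing_num_workers=64,
    splits={"calibration": "train_gen[:5%]"},
)

# Stage 1: oneshot sparsification (2:4 pruning per the recipe)
oneshot(
    model=model,
    **oneshot_kwargs,
    output_dir=output_dir,
    stage="sparsity_stage",
)

# Stage 2: oneshot quantization, resuming from the sparsified checkpoint
quantized_model = oneshot(
    model=(output_path / "sparsity_stage"),
    **oneshot_kwargs,
    stage="quantization_stage",
)

# skip_sparsity_compression_stats is set to False to account for sparsity
# in the model when compressing (per the README; other save arguments are
# not shown in the diff)
quantized_model.save_pretrained(
    output_dir,
    skip_sparsity_compression_stats=False,
)
```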

examples/trl_mixin/README.md (0 additions, 32 deletions)

This file was deleted.

examples/trl_mixin/ex_trl_constant.py (0 additions, 64 deletions)

This file was deleted.
