Changes from all commits (73 commits)
6c038f9
Add modelopt/torch/_compress CODEOWNERS
kevalmorabia97 Oct 27, 2025
230cee1
Merge branch 'main' into feature/compress
kevalmorabia97 Oct 27, 2025
54c5f0f
Remove llm_ptq example tests from CICD
kevalmorabia97 Oct 27, 2025
9eeee25
E2E test for the experimental compress algorithm based on https://arx…
danielkorzekwa Oct 28, 2025
ad1d18e
Merge branch 'main' into feature/compress
kevalmorabia97 Oct 28, 2025
cef3655
Add convert_llama3_config_to_decilm_config + unit test (#465)
danielkorzekwa Oct 29, 2025
002b8b5
Implement nas.convert() api for the compress algorithm (#482)
danielkorzekwa Oct 31, 2025
1c12fd8
modelopt nas search() implementation for the compress algorithm (#490)
danielkorzekwa Nov 3, 2025
f7d547f
Add decilm modelling code (#505)
danielkorzekwa Nov 12, 2025
50a580c
Compress tutorial (PoC) (#492)
danielkorzekwa Nov 12, 2025
b121945
Add llama converter (no dependency on internal Nvidia code) - part 1/…
danielkorzekwa Nov 13, 2025
866e400
llama converter is self-contained now (no dependency on internal nvid…
danielkorzekwa Nov 14, 2025
0868f1c
Add integration test for attention pruning (#562)
danielkorzekwa Nov 14, 2025
69726cc
Merge branch 'main' into feature/compress
kevalmorabia97 Nov 15, 2025
07ca24d
Merge branch 'main' into feature/compress
kevalmorabia97 Nov 15, 2025
1dde209
Add score_pruning_activations (step 2/6) (#563)
danielkorzekwa Nov 18, 2025
2e559e7
Update README.md
kevalmorabia97 Nov 18, 2025
f10be0d
Add activation hooks used for pruning (#576)
danielkorzekwa Nov 20, 2025
194b532
Add sewing kit and utilities used for pruning scoring - pruning scori…
danielkorzekwa Nov 24, 2025
8c9cdd4
Add L2NormHook and use it in megatron.py (#599)
danielkorzekwa Nov 26, 2025
1f72466
Add pruning checkpoints for the compress algorithm (#607)
danielkorzekwa Nov 27, 2025
97fe7f0
Add build replacement library to the compress algorithm. (#616)
danielkorzekwa Dec 1, 2025
954103e
Add subblock stats to the compress algorithm (#623)
danielkorzekwa Dec 1, 2025
dcc425f
Add 1-block scoring to the compress algorithm (#625)
danielkorzekwa Dec 2, 2025
56d95de
Add checkpoint save/load to ForwardHook + add IterativeChannelContrib…
danielkorzekwa Dec 2, 2025
74aae83
Add MIP step to the compress algorithm (#627)
danielkorzekwa Dec 4, 2025
a1f63bc
Merge branch 'main' into feature/compress
kevalmorabia97 Dec 8, 2025
a99f503
Remove unused mip functions + fix multi-gpu test (#660)
kevalmorabia97 Dec 8, 2025
67489f4
Fix a bug in IterativeChannelContributionHook + tools for activation …
danielkorzekwa Dec 11, 2025
1d8bd20
Remove runtime.py and directly use torch dist utils + remove unused f…
kevalmorabia97 Dec 11, 2025
f7a0cb0
Use shared activation hooks component in the puzzle algorithm (#687)
danielkorzekwa Dec 17, 2025
db866d9
Clean up Puzzle Compress Tutorial (#711)
LianaMikael Dec 22, 2025
2e813bf
Two bug fixes: mix checkpointing and dtype (#718)
danielkorzekwa Dec 22, 2025
83ac3b1
Merge remote-tracking branch 'origin/main' into feature/compress
kevalmorabia97 Jan 13, 2026
0eecfc6
Fix test assertions for 2-gpu (#772)
kevalmorabia97 Jan 13, 2026
43b3cfa
Rename compress to puzzletron (#776)
kevalmorabia97 Jan 14, 2026
4c30bd5
Add NeMo Conversion Scripts to Puzzletron (#784)
LianaMikael Jan 15, 2026
96bb0ba
Merge branch 'main' into feature/compress
kevalmorabia97 Mar 3, 2026
8c84fee
[CI] Update to only run puzzletron tests
kevalmorabia97 Mar 3, 2026
5812777
Merge branch 'main' into feature/puzzletron
kevalmorabia97 Mar 3, 2026
5f77c81
Pin torchprofile==0.0.4 to fix CI
kevalmorabia97 Mar 10, 2026
82df595
Add anymodel-core to feature/puzzletron (#974)
danielkorzekwa Mar 11, 2026
4dc9932
Draft: anymodel activation scoring (#989)
danielkorzekwa Mar 12, 2026
d358eb3
Draft: Merge anymodel pruning (#990)
danielkorzekwa Mar 12, 2026
8e827f3
Draft: Merging anymodel:build_library_and_stats (#993)
danielkorzekwa Mar 12, 2026
eb4b210
Draft: merge any model calc one block scores (#994)
danielkorzekwa Mar 12, 2026
8fe318d
Draft: merge any_model: mip_and_realize_models (#995)
danielkorzekwa Mar 13, 2026
2fbdf0e
Update uv.lock for nspect puzzletron scanning
kevalmorabia97 Mar 13, 2026
1b42f0b
Dkorzekwa/any model other models (#1007)
danielkorzekwa Mar 17, 2026
67999eb
Dkorzekwa/anymodel gptoss (#1020)
danielkorzekwa Mar 17, 2026
660dc17
Merge any_model tutorial (#1035)
danielkorzekwa Mar 19, 2026
01cba6a
Merge mbridge distillation for any_model (#1036)
danielkorzekwa Mar 20, 2026
2b6572c
MR branch for the remaining difference between dkorzekwa/any_model an…
danielkorzekwa Mar 20, 2026
110316a
Dkorzekwa/decilm hf code cleanup (#1071)
danielkorzekwa Mar 23, 2026
4190275
Dkorzekwa/decilm hf code cleanup 2 (#1073)
danielkorzekwa Mar 23, 2026
0708ca2
Dkorzekwa/anymodel subblock stats (#1085)
danielkorzekwa Mar 24, 2026
3193f30
Dkorzekwa/anymodel subblock stats nodecilm (#1102)
danielkorzekwa Mar 24, 2026
928036e
Dkorzekwa/decilm cleanup post subblockstats (#1103)
danielkorzekwa Mar 24, 2026
e508b76
code clean up (#1110)
danielkorzekwa Mar 24, 2026
f460d16
Merge branch 'main' into feature/puzzletron
kevalmorabia97 Mar 25, 2026
2f55c73
Dkorzekwa/puzzletron use importance hooks from prune (#1115)
danielkorzekwa Mar 25, 2026
c5ec50b
Merge remote-tracking branch 'origin/main' into feature/puzzletron
kevalmorabia97 Mar 25, 2026
d257871
Merge branch 'main' into feature/puzzletron
kevalmorabia97 Mar 30, 2026
7e15fdd
Revert CICD and other config changes
kevalmorabia97 Mar 30, 2026
d0209dc
Make Qwen and QwenVL descriptor generic so can be used for other vari…
kevalmorabia97 Mar 25, 2026
d987bad
Set strict=True in distill_hf export
kevalmorabia97 Mar 30, 2026
75651cc
add basic ruff fixes
kevalmorabia97 Mar 25, 2026
03118ce
Apply coderabbit suggestions
kevalmorabia97 Mar 30, 2026
2a170b9
Set weights_only=True in checkpoint_utils.py
kevalmorabia97 Mar 30, 2026
d6f8ddb
More fixes
kevalmorabia97 Mar 30, 2026
4621b65
reuse puzzletron tokenizer in other tests
kevalmorabia97 Mar 30, 2026
be4bd3a
disable puzzletron in coverage check as its covered in gpu tests only
kevalmorabia97 Mar 30, 2026
45426ca
Remove custom DistillationProvider and simplify mbridge distillation …
kevalmorabia97 Apr 1, 2026
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -24,6 +24,7 @@ modelopt/torch/nas @NVIDIA/modelopt-torch-nas-prune-codeowners
 modelopt/torch/opt @NVIDIA/modelopt-torch-opt-codeowners
 modelopt/torch/peft @NVIDIA/modelopt-torch-peft-codeowners
 modelopt/torch/prune @NVIDIA/modelopt-torch-nas-prune-codeowners
+modelopt/torch/puzzletron @NVIDIA/modelopt-torch-puzzletron-codeowners
 modelopt/torch/quantization @NVIDIA/modelopt-torch-quantization-codeowners
 modelopt/torch/sparsity @NVIDIA/modelopt-torch-sparsity-codeowners
 modelopt/torch/speculative @NVIDIA/modelopt-torch-speculative-codeowners
6 changes: 3 additions & 3 deletions .github/workflows/example_tests.yml
@@ -125,14 +125,14 @@ jobs:
     strategy: &nemo_strategy
       fail-fast: false
       matrix:
-        example: [megatron_bridge]
+        example: [megatron_bridge, puzzletron]
     uses: ./.github/workflows/_example_tests_runner.yml
     secrets: inherit
     with:
       docker_image: "nvcr.io/nvidia/nemo:26.02"
       example: ${{ matrix.example }}
       timeout_minutes: 30
-      pip_install_extras: "[hf,dev-test]"
+      pip_install_extras: "[hf,puzzletron,dev-test]"
       runner: linux-amd64-gpu-rtxpro6000-latest-1

   nemo-non-pr:
@@ -144,7 +144,7 @@ jobs:
       docker_image: "nvcr.io/nvidia/nemo:26.02"
       example: ${{ matrix.example }}
       timeout_minutes: 30
-      pip_install_extras: "[hf,dev-test]"
+      pip_install_extras: "[hf,puzzletron,dev-test]"
       runner: linux-amd64-gpu-rtxpro6000-latest-2

 ##### ONNX/TensorRT Example Tests #####
2 changes: 2 additions & 0 deletions .github/workflows/gpu_tests.yml
@@ -85,6 +85,8 @@ jobs:
       - name: Setup environment variables
         run: |
           echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/include:/usr/lib/x86_64-linux-gnu" >> $GITHUB_ENV
+      - name: Install dependencies for mip
+        run: apt-get update && apt-get install -y libffi-dev
       - name: Run gpu tests
         run: pip install tox-current-env && tox -e cuda13-${{ matrix.example }} --current-env
   gpu-tests-non-pr:
2 changes: 2 additions & 0 deletions .pre-commit-config.yaml
@@ -79,6 +79,7 @@ repos:
         modelopt/onnx/quantization/ort_patching.py|
         modelopt/torch/_deploy/utils/onnx_utils.py|
         modelopt/torch/export/transformer_engine.py|
+        modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_pruned_to_mxfp4.py|
         modelopt/torch/quantization/export_onnx.py|
         modelopt/torch/quantization/plugins/attention.py|
         modelopt/torch/speculative/eagle/utils.py|
@@ -95,6 +96,7 @@
         examples/llm_eval/modeling.py|
         examples/llm_qat/main.py|
         examples/llm_sparsity/weight_sparsity/finetune.py|
+        examples/puzzletron/evaluation/lm_eval_anymodel.py|
         examples/specdec_bench/specdec_bench/models/specbench_medusa.py|
         examples/speculative_decoding/main.py|
         examples/speculative_decoding/medusa_utils.py|
1 change: 1 addition & 0 deletions examples/pruning/README.md
@@ -7,6 +7,7 @@ Pruning can involve removal (prune) of Linear and Conv layers; and Transformer a
 This section focuses on applying Model Optimizer's state-of-the-art complementary pruning modes to enable you to search for the best subnet architecture from your provided base model:

 1. [Minitron](https://arxiv.org/pdf/2408.11796): A pruning method developed by NVIDIA Research for pruning GPT (and later extended to Mamba, MoE, and Hybrid Transformer Mamba) models in NVIDIA Megatron-LM (M-LM) or Megatron-Bridge (M-Bridge) framework. It uses the activation magnitudes to prune the embedding hidden size; mlp ffn hidden size; transformer attention heads; mamba heads and head dimension; MoE number of experts, ffn hidden size, and shared expert intermediate size; and number of layers of the model.
+1. [Puzzletron](../puzzletron/README.md): An advanced pruning method by NVIDIA that uses a Mixed Integer Programming (MIP)-based NAS search algorithm.
 1. FastNAS: A pruning method recommended for Computer Vision models. Given a pretrained model, FastNAS finds the subnet which maximizes the score function while meeting the given constraints.
 1. GradNAS: A light-weight pruning method recommended for language models like Hugging Face BERT, GPT-J. It uses the gradient information to prune the model's linear layers and attention heads to meet the given constraints.
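The MIP step behind a search like Puzzletron's can be pictured as a small integer program: pick exactly one variant per transformer block (e.g. keep it intact, drop its attention, or shrink its FFN) so that the summed accuracy score is maximal under a parameter budget. A minimal brute-force sketch of that selection problem follows; the block names, parameter counts, and scores are invented for illustration and this is not ModelOpt's API, which hands the same program to a real MIP solver instead of enumerating.

```python
# Illustrative sketch only: variant names and numbers are hypothetical.
from itertools import product


def select_variants(blocks, param_budget):
    """Pick one variant per block, maximizing total score within the budget.

    blocks: per block, a list of (name, num_params, score) alternatives.
    Returns (chosen_names, total_score) for the best feasible combination.
    """
    best_choice, best_score = None, float("-inf")
    for combo in product(*blocks):
        total_params = sum(params for _, params, _ in combo)
        total_score = sum(score for _, _, score in combo)
        if total_params <= param_budget and total_score > best_score:
            best_choice = [name for name, _, _ in combo]
            best_score = total_score
    return best_choice, best_score


# Three blocks, each with three candidate variants (hypothetical numbers).
blocks = [
    [("full", 100, 1.00), ("no_attn", 60, 0.93), ("ffn_half", 70, 0.95)],
    [("full", 100, 1.00), ("no_attn", 60, 0.90), ("ffn_half", 70, 0.97)],
    [("full", 100, 1.00), ("no_attn", 60, 0.97), ("ffn_half", 70, 0.96)],
]
choice, score = select_variants(blocks, param_budget=240)
# Keeping all three blocks full (300 params) exceeds the budget, so the
# search trades away the cheapest accuracy per block.
```

Exhaustive search is exponential in the number of blocks; the MIP formulation lets an off-the-shelf solver handle realistic model sizes.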
14 changes: 14 additions & 0 deletions examples/puzzletron/GPTOSS.md
@@ -0,0 +1,14 @@

## GptOss

With this release, the Puzzle algorithm supports only expert removal for `Gpt-Oss`.

This model ships as a quantized checkpoint, i.e. the MoE expert matrices are stored in the _MXFP4_ format.
During the pruning steps, Puzzle uses the decompressed model (converted back to BF16) to compute statistics and scores.
This means that during the conversion to the Puzzle format we decompress the model and store it in BF16.
Once pruning is done, i.e. the experts to remove have been identified and the process has finished, you may want to restore the _MXFP4_ format of the checkpoint.
To do so, an additional script takes the original and the pruned checkpoints and outputs the pruned checkpoint in _MXFP4_ format:

```bash
python -m modelopt.torch.puzzletron.anymodel.models.gpt_oss.gpt_oss_pruned_to_mxfp4 --student-path /workspaces/any_model_gpt_oss/mip/puzzle_solutions/stats_num_params_18014757184/solutions--checkpoints/solution_0/ --original-path /workspaces/source_model_checkpoints/openai_gpt-oss-20b/ --output-path /workspaces/any_model_gpt_oss/mip/puzzle_solutions/stats_num_params_18014757184/solutions--checkpoints/mxfp4-ckpt/ --num-layers 24
```
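As a rough illustration of what the BF16 round-trip above involves: MXFP4, from the public OCP Microscaling (MX) specification, stores each weight as a 4-bit E2M1 code plus one shared power-of-two scale per block of 32 elements. The sketch below decodes such a block to plain floats; it is a hypothetical helper following the public spec, not ModelOpt's decoder.

```python
# Illustrative MXFP4-style dequantization, not ModelOpt's code.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # magnitude codes 0..7


def dequant_mxfp4_block(codes, scale_exp):
    """Decode one MXFP4 block to plain floats (BF16 precision in practice).

    codes: 4-bit integers (bit 3 = sign, bits 0-2 = E2M1 magnitude code),
           typically 32 per block; scale_exp: the block's shared exponent.
    """
    scale = 2.0 ** scale_exp
    return [
        (-1.0 if c & 0b1000 else 1.0) * E2M1_MAGNITUDES[c & 0b0111] * scale
        for c in codes
    ]


# A block whose 32 elements all hold code 0b0101 (= +3.0), scaled by 2**-2:
block = dequant_mxfp4_block([0b0101] * 32, scale_exp=-2)
# each element decodes to 3.0 * 0.25 = 0.75
```

Because the 4-bit codes only cover a handful of magnitudes per block, pruning statistics are computed on the decompressed BF16 values, and the script above re-quantizes the surviving experts afterwards.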