
Merge puzzletron compression algorithm #1121

Open
danielkorzekwa wants to merge 73 commits into main from feature/puzzletron

Conversation

@danielkorzekwa
Contributor

@danielkorzekwa danielkorzekwa commented Mar 25, 2026

What does this PR do?

Implement puzzletron compression algorithm based on Puzzle paper (https://arxiv.org/abs/2411.19146)

The list of reviewed and merged MRs that resulted in the feature/puzzletron branch:

Merging dkorzekwa/any_model to feature/puzzletron

Add anymodel directories to feature/puzzletron by danielkorzekwa · Pull Request #974 · NVIDIA/Model-Optimizer - merged

Draft: anymodel activation scoring by danielkorzekwa · Pull Request #989 · NVIDIA/Model-Optimizer - merged

Draft: Merge anymodel pruning by danielkorzekwa · Pull Request #990 · NVIDIA/Model-Optimizer - merged

Draft: Merging anymodel:build_library_and_stats by danielkorzekwa · Pull Request #993 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/any model calc one block scores by danielkorzekwa · Pull Request #994 · NVIDIA/Model-Optimizer - merged

Draft: merge any_model: mip_and_realize_models by danielkorzekwa · Pull Request #995 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/any model other models by danielkorzekwa · Pull Request #1007 · NVIDIA/Model-Optimizer - merged

PR to 1007: #1039 - merged

Dkorzekwa/anymodel gptoss by danielkorzekwa · Pull Request #1020 · NVIDIA/Model-Optimizer - merged

Merge any_model tutorial by danielkorzekwa · Pull Request #1035 · NVIDIA/Model-Optimizer - merged

Merge mbridge distillation for any_model by danielkorzekwa · Pull Request #1036 · NVIDIA/Model-Optimizer - merged

MR branch for the remaining difference between dkorzekwa/any_model an… by danielkorzekwa · Pull Request #1047 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/decilm hf code cleanup by danielkorzekwa · Pull Request #1071 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/decilm hf code cleanup 2 by danielkorzekwa · Pull Request #1073 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/anymodel subblock stats by danielkorzekwa · Pull Request #1085 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/anymodel subblock stats nodecilm by danielkorzekwa · Pull Request #1102 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/decilm cleanup post subblockstats by danielkorzekwa · Pull Request #1103 · NVIDIA/Model-Optimizer - merged

code clean up by danielkorzekwa · Pull Request #1110 · NVIDIA/Model-Optimizer - merged

Merging into main:

Activation hooks redesign (reuse hooks component across both minitron and puzzletron) by danielkorzekwa · Pull Request #1022 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/puzzletron use importance hooks from prune by danielkorzekwa · Pull Request #1115 · NVIDIA/Model-Optimizer - merged

Usage

Puzzletron tutorial:
https://github.com/NVIDIA/Model-Optimizer/tree/feature/puzzletron/examples/puzzletron

Testing

The main e2e test for compressing 9 models with Puzzletron:
https://github.com/NVIDIA/Model-Optimizer/blob/feature/puzzletron/tests/gpu/torch/puzzletron/test_puzzletron.py

Before your PR is "Ready for review"

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: ❌

Summary by CodeRabbit

  • New Features
    • Added Puzzletron: an end-to-end heterogeneous LLM compression workflow with MIP-based NAS, model conversion, pruning, realization, and evaluation.
    • Expanded example support for Llama, Mistral, Qwen, NeMo, GPT-OSS and multimodal variants; lm-eval and vLLM evaluation paths; Megatron-Bridge export and distillation tooling.
  • Documentation
    • New guides, READMEs, and example configs covering full Puzzletron pipelines, evaluation, and distillation.
  • Chores
    • CI/workflow updates to run Puzzletron examples; CODEOWNERS and pre-commit exclusions updated; new example requirements added.

kevalmorabia97 and others added 30 commits October 27, 2025 17:42
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…iv.org/abs/2411.19146 (#464)

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do?

Add decilm modelling code

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do?

Compress tutorial (PoC) + compress cli app.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Signed-off-by: Liana Mikaelyan <45925959+LianaMikael@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Co-authored-by: Liana Mikaelyan <45925959+LianaMikael@users.noreply.github.com>
#545)

## What does this PR do?

Add llama converter (no dependency on internal Nvidia code) - part 1/2
- change top-level dependencies in convert_llama3_to_decilm.py from
puzzle_tools.... to modelopt.....
- added modelopt.torch._compress.tools module
- remove tokenization_mistral.py - not used

scope of 2/2 part (will come once part 1/2 is merged):
- change all deeper dependencies from puzzle_tools.... to
modelopt....
- test_convert_llama3_config_to_decilm_config.py should run without any
internal nvidia dependencies

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…ia code) (#552)

## What does this PR do?
llama converter is self-contained now (no dependency on internal nvidia
code)

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do?
Make score pruning activations self-contained (no dependency on internal
Nvidia code) - 1/6
- add integration test for attention pruning (pre-step to make adding
activation scoring safer)

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do?
- Add score_pruning_activations.py

Notes:
- validate_model.py still depends on Nvidia internal code (will be
changed in the subsequent MR)
- sharded_checkpoint_utils.py - for now it needs to use DeciLM from
internal Nvidia code, to be changed in the next MR

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do?

Add activation hooks used for pruning

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…ng is self-contained now (#584)

## What does this PR do?

Add sewing kit and utilities used for pruning scoring - pruning scoring
is self-contained now - no dependency on internal Nvidia code.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do?
- Add L2NormHook and use it in megatron.py
- Using L2NormHook removes code duplication between
_DynamicSelfAttention and _DynamicMLP

This is the first step towards reusing activation scores logic across
Minitron and Puzzle. Next steps:
- complete redesign of megatron.py - move other activation hooks logic
to hooks.py
- then combine those hooks.py with the similar hooks.py functionality in
puzzle
(modelopt/torch/_compress/activation_scoring/activation_hooks/hooks.py)

Questions:
- why, in the code before and after this redesign, do we store temp variables
in two ways (_register_temp_attribute and self.hook_handle)?

```
self._register_temp_attribute("_activation_hook", activation_hook)
# TODO: confusion: why hook_handle is removed manually in export() and not using _register_temp_attribute?
self.hook_handle = self.linear_fc2.register_forward_hook(activation_hook)
```

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do?

Add pruning checkpoints for the compress algorithm.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do?

Add build replacement library to the compress algorithm.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do?

Add subblock stats to the compress algorithm.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do?

Add 1-block scoring to the compress algorithm.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…utionHook (#610)

## What does this PR do?

Add checkpoint save/load to ForwardHook + add
IterativeChannelContributionHook.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do?

Add MIP step to the compress algorithm.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do?

**Type of change:** Improvement

- Fix tests for 2-gpu: Some places hard-coded cpu device for distributed
communications which was causing this issue
- Remove unused constrain_search_space.py
- Remove `is_multi_layer_puzzle: False` case
- Remove `use_greedy_search: False` case
- Remove knapsack mip case
- Remove unused `num_solutions` and `minimal_diversity` flags

## Testing

- GH CICD test passing
- Tested on 2-gpu setup locally as well



## Summary by CodeRabbit

# Release Notes

* **Refactor**
  * Optimized solver implementation with improved library integration.
  * Simplified model compression configuration by removing deprecated search options.
  * Consolidated optimization paths for streamlined processing.

* **Chores**
  * Updated dependencies for improved compatibility.

* **Documentation**
  * Clarified Model-Optimizer installation instructions in examples.


---------

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…hooks analysis (#670)

## What does this PR do?

Fix a bug in IterativeChannelContributionHook + tools for activation
hooks analysis

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…unctions (#667)

## What does this PR do?

- Remove `runtime.py` and directly use
`modelopt.torch.utils.distributed` functions
- Read model_dtype and autocast_dtype from validate_model_defaults.yaml
instead of runtime object
- Remove more unused functions
- Remove unnecessary parse_args for intermediate steps, and improve
docstrings

## Testing

- No change in functionality
- Existing tests passing

---------

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@coderabbitai
Contributor

coderabbitai bot commented Mar 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Adds a new Puzzletron feature: an AnyModel-based heterogeneous LLM compression/NAS pipeline with converters, model descriptors, pruning hooks/mixins, activation scoring, MIP optimization, replacement libraries, pruning checkpoint tooling, evaluation/deployment adapters, and many example configs/docs/workflows.

Changes

Cohort / File(s) Summary
Repo config & CI
.github/CODEOWNERS, .github/workflows/example_tests.yml, .github/workflows/gpu_tests.yml, .pre-commit-config.yaml
Added CODEOWNERS for modelopt/torch/puzzletron; extended NeMo test matrix to include puzzletron extras; installed libffi-dev in GPU CI job; updated pre-commit insert-license exclusions.
Examples & docs
examples/puzzletron/*, examples/pruning/README.md, examples/puzzletron/requirements.txt
New Puzzletron README, detailed example docs (GPTOSS, distillation, evaluation), example scripts (main, lm_eval_anymodel, build/distill instructions), and requirements including lm-eval.
Entrypoints / orchestration
examples/puzzletron/main.py, modelopt/torch/puzzletron/puzzletron.py, .../build_library_and_stats.py, .../mip/*, .../sweep.py
New CLI/Hydra entrypoints coordinating conversion, activation scoring, replacement-library build, MIP search/sweep, and model realization with distributed sync and mip-only mode.
AnyModel framework
modelopt/torch/puzzletron/anymodel/**
Introduces AnyModel conversion framework: Converter base, ConverterFactory, ModelDescriptor base/factory, puzzformer helpers, no-op replacements, and many model-specific converters/descriptors (llama, qwen2/3, qwen3_vl, mistral_small, nemotron_h/_v2, gpt_oss).
Converters & export utilities
anymodel/converter/*, export/mbridge/*, export_mbridge_to_hf.py
Converter implementations, converter registry, Megatron-Bridge heterogeneous adapters, distillation provider, and export-to-HF helper.
Pruning infrastructure
modelopt/torch/prune/importance_hooks/*, modelopt/torch/puzzletron/pruning/*
New importance hooks (expert removal, ranked-choice voting), pruning mixins (FFN, KV-heads, expert removal), pruning utils, checkpoint generation, and bypassed-training init helpers.
Activation scoring & hooks
activation_scoring/*, importance_hooks/base_hooks.py
Activation hook registration utilities, scoring launch flow, checkpointing/complete-detection, and safer hook kwargs handling.
Replacement library & loading
replacement_library/*
Replacement-library builder, replacement parsing/utilities, ReplacementLibrary loader, and temporary checkpoint assembly for realized models.
Subblock stats & sizing
subblock_stats/*
Per-subblock memory/parameter calculators (including MoE, Mamba, KV-cache), stat generation CLI, and multi-scenario aggregation.
Checkpoint & sharded I/O
tools/checkpoint_utils*.py, tools/sharded_checkpoint_utils.py, tools/save/load
Save/load helpers for safetensors/sharded formats, meta-device initialization, tokenizer copy, and sharded model construction/loading.
Validation, evaluation & deployment
tools/validate_model.py, examples/puzzletron/evaluation/*
Validation pipeline integrating hooks, HF/Triton/Ray deployable wrapper, lm-eval integration, and NeMo evaluator instructions.
Sewing kit / runtime graph utils
sewing_kit/*, sewing_kit/passage/*
New stitched-module graph system (Needle/StitchedModule), Passage capture system, activity contexts, fake-tensor and distributed helpers for patching/capture.
Tools & utilities
tools/* (logger, hydra_utils, robust_json, kd_model, common, etc.)
Hydra resolvers/init helpers, distributed-aware logger, robust JSON encoder, KD-loss helpers, dataset prep, and other utilities.
Large new modules (high-risk / heavy logic)
tools/bypassed_training/*, replacement_library/build_replacement_library.py, mip/run_puzzle.py, subblock_stats/calc_subblock_stats.py, tools/sharded_checkpoint_utils.py, tools/Passage/core.py, sewing_kit/core.py
Substantial new logic implementing multi-threaded child init, replacement-library assembly, MIP orchestration/metrics ingestion, subblock stats generation, sharded checkpointing, passage instrumentation, and sewn-module execution — review focus areas for correctness, concurrency, serialization, and distributed behavior.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI as Puzzletron CLI
    participant Hydra as Hydra Config
    participant Converter
    participant Scorer as Activation Scoring
    participant MIP as MIP Solver
    participant Realize as Model Realizer/Checkpoint Writer

    User->>CLI: run --config <hydra_cfg> [--mip-only]
    CLI->>Hydra: load config (puzzle_dir, dataset_path, overrides)
    CLI->>Converter: convert HF -> AnyModel subblocks (master only)
    CLI->>Scorer: launch_score_activations (distributed)
    Scorer-->>CLI: activations logs / checkpoints
    CLI->>MIP: run_puzzle (uses activations, subblock stats, replacement_library)
    MIP-->>Realize: chosen solutions
    Realize-->>CLI: realized model checkpoints
    CLI->>User: finished (paths to realized models)

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • Merge any_model tutorial #1035: Overlapping Puzzletron feature additions — similar example files, configs, and AnyModel pipeline changes; likely code-level overlap.

@github-actions
Contributor

github-actions bot commented Mar 25, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1121/

Built to branch gh-pages at 2026-04-01 12:20 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🟡 Minor comments (15)
modelopt/torch/puzzletron/tools/logger.py-164-168 (1)

164-168: ⚠️ Potential issue | 🟡 Minor

Docstring says "rank -1 in node 0" but implementation may not match.

The lprint docstring says "Last rank prints only (rank -1 in node 0)" but the implementation checks local_rank != world_size - 1. If the intent is the last rank globally, this logic appears incorrect. If the intent is the last local rank within a node, the comparison should be against local world size, not global world size.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/logger.py` around lines 164 - 168, The
docstring for lprint mismatches its rank-check logic; update either the
docstring or the implementation in lprint to match the intended behavior: if you
want "last rank in node 0" (last local rank) change the check to compare
local_rank against local_world_size - 1 (use the local world size variable used
elsewhere in logger or from torch/distributed) before calling logger.dist_log,
or if you intended the global last rank adjust the docstring to "last global
rank" and ensure the check compares local_rank against world_size - 1
accordingly; reference the lprint function and logger.dist_log to locate where
to change the comparison and/or docstring.
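To make the "last local rank" interpretation concrete, here is a minimal sketch of the check the review comment suggests, assuming the standard torchrun environment variables (`is_last_local_rank` is a hypothetical helper, not code from this PR; the actual `lprint` implementation may read ranks differently):

```python
import os


def is_last_local_rank() -> bool:
    """Return True only on the last rank within the current node.

    Uses the LOCAL_RANK / LOCAL_WORLD_SIZE environment variables set by
    torchrun, and compares against the *local* world size rather than the
    global one -- the distinction the review comment raises.
    """
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    local_world_size = int(os.environ.get("LOCAL_WORLD_SIZE", 1))
    return local_rank == local_world_size - 1
```

If the intent is instead the globally last rank, the comparison would use the global rank against the global world size, and the docstring should say so.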
modelopt/torch/puzzletron/tools/logger.py-62-66 (1)

62-66: ⚠️ Potential issue | 🟡 Minor

Error message is inconsistent with valid choices.

The error message on line 65 lists ['all', 'main', 'local_main'] but 'last' is also a valid choice per line 62.

🐛 Proposed fix
         if ranks not in ["all", "main", "local_main", "last"]:
             raise NotImplementedError(
                 f"Could not broadcast msg {msg} - "
-                f"ranks parameters choices are ['all', 'main', 'local_main']. Got {ranks}"
+                f"ranks parameters choices are ['all', 'main', 'local_main', 'last']. Got {ranks}"
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/logger.py` around lines 62 - 66, The
NotImplementedError message for the ranks check mentions the wrong set of valid
choices; update the f-string in the ranks validation (the raise
NotImplementedError block that references the variable ranks and msg) to include
'last' in the displayed list (i.e., ["all", "main", "local_main", "last"]) so
the error accurately reflects the valid options.
examples/pruning/README.md-10-10 (1)

10-10: ⚠️ Potential issue | 🟡 Minor

Fix compound adjective hyphenation in docs.

At Line 10, use “MIP-based” (not “MIP based”) for correct technical phrasing.

✏️ Proposed edit
-1. [Puzzletron](../puzzletron/README.md): An advanced pruning method by NVIDIA using Mixed Integer Programming (MIP) based NAS search algorithm.
+1. [Puzzletron](../puzzletron/README.md): An advanced pruning method by NVIDIA using a Mixed Integer Programming (MIP)-based NAS search algorithm.

As per coding guidelines, "**/*.{py,cpp,h,hpp,md}: All code (Python, C++, Markdown, etc.) must adhere to coding standards as checked by pre-commit hooks".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/pruning/README.md` at line 10, Update the compound adjective "MIP
based" in the Puzzletron README entry to the hyphenated form "MIP-based" so the
phrase reads "MIP-based NAS search algorithm"; locate the text string
"Puzzletron" / the line containing "MIP based NAS search algorithm" and replace
it with "MIP-based" to comply with the Markdown/style pre-commit checks.
examples/puzzletron/configs/gptoss-20b_remove_experts_memory/gptoss-20b_remove_experts_memory.yaml-14-17 (1)

14-17: ⚠️ Potential issue | 🟡 Minor

Fix the mismatch between target_memory and its comment.

16_000 MiB is about 15.6 GiB; 45 GiB would be 46_080 MiB. One of these is wrong, and the current ambiguity can make the example optimize for the wrong budget.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/gptoss-20b_remove_experts_memory/gptoss-20b_remove_experts_memory.yaml`
around lines 14 - 17, The mip.human_constraints.target_memory value
(target_memory: 16_000) and its comment (“45 GiB”) disagree; update one to match
the other: either set target_memory to 46_080 (MiB) if you mean 45 GiB, or
change the comment to match 16_000 MiB (~15.6 GiB). Locate the target_memory key
under mip -> human_constraints and make the numeric value and the GiB comment
consistent.
examples/puzzletron/configs/nemotron-nano-12b-v2/nemotron_nano_12b_v2_pruneffn_memory.yaml-14-17 (1)

14-17: ⚠️ Potential issue | 🟡 Minor

Align the MiB value with the GiB comment.

90_000 MiB is about 87.9 GiB, not 90 GiB. If the intended cap is 90 GiB, this should be 92_160; otherwise update the comment so the search target is unambiguous.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/nemotron-nano-12b-v2/nemotron_nano_12b_v2_pruneffn_memory.yaml`
around lines 14 - 17, The target_memory under mip.human_constraints is
inconsistent with the "90 GiB" comment; update the value of target_memory to
92_160 (MiB) to represent 90 GiB, or alternatively change the comment to the
actual GiB represented by 90_000 MiB (~87.9 GiB); edit the target_memory key in
this YAML (mip -> human_constraints -> target_memory) accordingly to keep value
and comment aligned.
examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/llama-3_2-3B_pruneffn_memory.yaml-17-17 (1)

17-17: ⚠️ Potential issue | 🟡 Minor

Memory unit comment is inaccurate.

Line 17 uses 45_000 MiB, which is ~43.95 GiB, not 45 GiB. Please update either the value or the comment to avoid misconfigured constraints.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/llama-3_2-3B_pruneffn_memory.yaml`
at line 17, The comment next to the target_memory value is incorrect:
target_memory: 45_000 is in MiB (~43.95 GiB), not 45 GiB; either set
target_memory to 46_080 (45 * 1024) to represent 45 GiB exactly or update the
inline comment to the correct GiB approximation (≈43.95 GiB); locate the
target_memory entry and apply one of these fixes so the value and its comment
match.
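The three `target_memory` findings above share the same root cause: mixing decimal round numbers with binary MiB/GiB units. A small conversion helper (hypothetical, for illustration only) makes the binary factor explicit:

```python
def gib_to_mib(gib: float) -> int:
    """Convert GiB to MiB using the binary factor (1 GiB = 1024 MiB)."""
    return int(gib * 1024)


def mib_to_gib(mib: float) -> float:
    """Convert MiB to GiB using the binary factor."""
    return mib / 1024
```

Under this convention, 45 GiB is 46_080 MiB and 90 GiB is 92_160 MiB, while 45_000 MiB is only about 43.95 GiB -- which is exactly the discrepancy flagged in these configs.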
modelopt/torch/puzzletron/anymodel/models/llama/llama_converter.py-36-52 (1)

36-52: ⚠️ Potential issue | 🟡 Minor

Return type annotation mismatch: method declares List[BlockConfig] but returns List[dict].

The method signature declares a return type of List[BlockConfig], but line 50 calls .to_dict() on each BlockConfig instance, causing the method to actually return List[dict]. Update the return type annotation to match the actual behavior.

🔧 Suggested fix
     @staticmethod
-    def create_block_configs_from_main_config(config: LlamaConfig) -> List[BlockConfig]:
+    def create_block_configs_from_main_config(config: LlamaConfig) -> List[dict]:
         """Create uniform block configs for all Llama layers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/anymodel/models/llama/llama_converter.py` around
lines 36 - 52, The function create_block_configs_from_main_config currently
builds BlockConfig objects but calls .to_dict(), so its declared return type
List[BlockConfig] is incorrect; update the signature to return List[dict] (or
List[MutableMapping] if you prefer an abstract type) to match the actual
returned value, and ensure the typing import (List) is present/adjusted
accordingly; reference create_block_configs_from_main_config, LlamaConfig,
BlockConfig, and the .to_dict() call when making the change.
modelopt/torch/puzzletron/anymodel/converter/converter_factory.py-43-47 (1)

43-47: ⚠️ Potential issue | 🟡 Minor

Handle potential None from inspect.getmodule().

inspect.getmodule(ref) can return None if the module cannot be determined, which would cause an AttributeError when accessing .__name__.

🛡️ Proposed fix
-                raise KeyError(
-                    f"Could not register `{cls_name}`: {cls_type}, "
-                    f"`{cls_name}` is already registered and points to "
-                    f"`{inspect.getmodule(ref).__name__}.{ref.__name__}`"
-                )
+                module = inspect.getmodule(ref)
+                module_name = module.__name__ if module else "<unknown>"
+                raise KeyError(
+                    f"Could not register `{cls_name}`: {cls_type}, "
+                    f"`{cls_name}` is already registered and points to "
+                    f"`{module_name}.{ref.__name__}`"
+                )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/anymodel/converter/converter_factory.py` around
lines 43 - 47, The KeyError message in converter_factory.py assumes
inspect.getmodule(ref) always returns a module; update the error construction in
the registration failure branch (where cls_name, cls_type and ref are used) to
guard against None from inspect.getmodule(ref) by using a safe fallback (e.g.,
module = inspect.getmodule(ref); module_name = module.__name__ if module is not
None else getattr(ref, "__module__", "unknown")) and use module_name rather than
inspect.getmodule(ref).__name__; ensure ref.__name__ (or getattr(ref,
"__name__", repr(ref))) is similarly accessed safely so the f-string does not
raise AttributeError.
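A self-contained sketch of the guarded lookup described above, assuming nothing about the surrounding factory code (`safe_module_name` is a hypothetical helper name):

```python
import inspect


def safe_module_name(ref) -> str:
    """Resolve the defining module name of ref without risking AttributeError.

    inspect.getmodule can return None (e.g. for objects defined in C
    extensions or exec'd code), so fall back to __module__ and finally
    to a placeholder string.
    """
    module = inspect.getmodule(ref)
    if module is not None:
        return module.__name__
    return getattr(ref, "__module__", None) or "<unknown>"
```

The error message in the registry could then interpolate `safe_module_name(ref)` instead of `inspect.getmodule(ref).__name__`.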
modelopt/torch/puzzletron/puzzletron.py-60-74 (1)

60-74: ⚠️ Potential issue | 🟡 Minor

Step numbering is inconsistent.

Comments jump from "Step 2" to "Step 4", skipping "Step 3". This appears to be a typo or leftover from a refactor. Consider renumbering for clarity:

📝 Proposed fix
     # Step 2: pruning_ckpts (single process)
     if dist.is_master():
         pruning_ckpts.launch_prune_ckpt(hydra_cfg)
     dist.barrier()

-    # Step 4: build_library_and_stats (single process)
+    # Step 3: build_library_and_stats (single process)
     if dist.is_master():
         build_library_and_stats.launch_build_library_and_stats(hydra_cfg)
     dist.barrier()

-    # Step 5: calc_one_block_scores (distributed processing)
+    # Step 4: calc_one_block_scores (distributed processing)
     scoring.launch_scoring(hydra_cfg)

-    # Step 6: mip_and_realize_models (distributed processing)
+    # Step 5: mip_and_realize_models (distributed processing)
     mip_and_realize_models.launch_mip_and_realize_model(hydra_cfg)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/puzzletron.py` around lines 60 - 74, The step
comments are misnumbered (jump from "Step 2" to "Step 4"); update the inline
step comments in puzzletron.py to use a consistent sequence (e.g., Step 2 → Step
3 → Step 4 etc.) around the blocks that call pruning_ckpts.launch_prune_ckpt,
build_library_and_stats.launch_build_library_and_stats, scoring.launch_scoring,
and mip_and_realize_models.launch_mip_and_realize_model so the step numbers
reflect the actual order of execution.
examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/pruning_defaults.yaml-20-31 (1)

20-31: ⚠️ Potential issue | 🟡 Minor

Remove unused ffn_list definition or align with intermediate_size_list usage.

The file serves as a defaults template for Hydra configuration, but there's an inconsistency: ffn_list is defined on line 21 with no value, yet the actual FFN pruning configuration (ffn_pruning.yaml) populates and uses intermediate_size_list instead. Either rename ffn_list to intermediate_size_list for consistency, or remove the unused ffn_list definition.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/pruning_defaults.yaml`
around lines 20 - 31, The defaults file defines an unused key ffn_list while the
FFN pruning logic uses intermediate_size_list (see ffn_pruning.yaml); fix by
either removing the ffn_list entry entirely or renaming it to
intermediate_size_list and ensuring the init mode key (mlp_init_mode) matches
the expected name used by ffn_pruning.yaml so the Hydra defaults align with the
pruning config.
modelopt/torch/puzzletron/tools/hydra_utils.py-29-36 (1)

29-36: ⚠️ Potential issue | 🟡 Minor

Validate block and mbs before division in warmup_steps.

block=0 or mbs=0 causes a runtime failure. Add explicit positive-value checks and fail early with a clear error.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/hydra_utils.py` around lines 29 - 36, The
warmup_steps function currently divides by block and mbs which will raise on
zero; update warmup_steps to validate that block and mbs are positive integers
(e.g., block > 0 and mbs > 0) at the top of the function and raise a clear
ValueError if not; keep the existing logic (compute steps =
(tokens//block)//mbs, compute w = pct * steps, return max(1, round(w))) but fail
fast with an explanatory error mentioning the parameter names block and mbs so
callers get a clear message when invalid values are passed.
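The validated helper the prompt describes can be sketched as follows. The signature and the steps/w computation are taken from the review text; everything else (exact parameter names, message wording) is an assumption about hydra_utils.py.

```python
def warmup_steps(tokens: int, block: int, mbs: int, pct: float) -> int:
    """Warmup steps as a fraction of total training steps (sketch of the
    reviewed function in tools/hydra_utils.py; names assumed from the review)."""
    # Fail fast with a clear message instead of a bare ZeroDivisionError.
    if block <= 0 or mbs <= 0:
        raise ValueError(
            f"block and mbs must be positive integers, got block={block}, mbs={mbs}"
        )
    steps = (tokens // block) // mbs  # total optimizer steps for this run
    w = pct * steps                   # warmup fraction of those steps
    return max(1, round(w))
```

The existing arithmetic is untouched; only the guard at the top changes, so callers that pass block=0 or mbs=0 now see which parameter was invalid.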
modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py-46-47 (1)

46-47: ⚠️ Potential issue | 🟡 Minor

Avoid mutating caller-owned activation_hooks_kwargs.

This function modifies the input dict in place (activation_hooks_kwargs["model"] = model), which can leak state across calls. Copy it before enrichment.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py`
around lines 46 - 47, The code mutates the caller-owned dict
activation_hooks_kwargs by setting activation_hooks_kwargs["model"] = model;
instead, make a shallow copy of activation_hooks_kwargs (e.g., new_kwargs =
activation_hooks_kwargs.copy() or dict(activation_hooks_kwargs)), add the
"model" key to the copy, and use/return that copy so the original dict isn’t
mutated; update any return/usage to use the new_kwargs variable.
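The non-mutating version the prompt asks for looks roughly like this; enrich_hooks_kwargs is a hypothetical stand-in for the real function in activation_hooks/utils.py:

```python
from typing import Any


def enrich_hooks_kwargs(activation_hooks_kwargs: dict[str, Any], model: Any) -> dict[str, Any]:
    """Return an enriched copy of the kwargs instead of mutating the caller's dict
    (hypothetical helper illustrating the suggested fix)."""
    # Shallow copy: the key set is duplicated, values stay shared with the caller.
    new_kwargs = dict(activation_hooks_kwargs)
    new_kwargs["model"] = model
    return new_kwargs
```

A shallow copy is enough here because only top-level keys are added; nested values are never modified.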
examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml-13-13 (1)

13-13: ⚠️ Potential issue | 🟡 Minor

Minor typo in comment.

ppath should be path.

Proposed fix
-dataset_path: ??? # ppath to Nemotron-Post-Training-Dataset-v2
+dataset_path: ??? # path to Nemotron-Post-Training-Dataset-v2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml`
at line 13, Fix the typo in the YAML comment for the dataset key: change the
comment after the dataset_path entry so it reads "path to
Nemotron-Post-Training-Dataset-v2" instead of "ppath to
Nemotron-Post-Training-Dataset-v2" so the comment for the dataset_path key is
correct and clear.
modelopt/torch/puzzletron/tools/validate_model.py-138-140 (1)

138-140: ⚠️ Potential issue | 🟡 Minor

Potential issue: validation_full_iters could be zero.

If args.eval_samples is less than args.micro_batch_size, the integer division results in validation_full_iters = 0. This could cause unexpected behavior in downstream code that expects at least one iteration.

🛡️ Proposed fix: Add validation or use ceiling division
     validation_full_iters = (
-        args.eval_samples // args.micro_batch_size
+        max(1, args.eval_samples // args.micro_batch_size)
     )  # model pipeline, single data rank

Or add an assertion:

     validation_full_iters = (
         args.eval_samples // args.micro_batch_size
     )  # model pipeline, single data rank
+    assert validation_full_iters > 0, f"eval_samples ({args.eval_samples}) must be >= micro_batch_size ({args.micro_batch_size})"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/validate_model.py` around lines 138 - 140,
The calculation of validation_full_iters can yield zero when args.eval_samples <
args.micro_batch_size; update the logic around validation_full_iters in
validate_model.py to ensure at least one validation iteration by either
validating inputs (raise/exit if args.eval_samples < args.micro_batch_size) or
computing a ceiling division and then taking max(1, ...); reference the variable
validation_full_iters and the CLI args args.eval_samples and
args.micro_batch_size when making the change so downstream loops that expect >=1
iteration do not break.
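The ceiling-division variant mentioned in the prompt can be written without importing math; validation_iters is a hypothetical helper, not the actual code in validate_model.py:

```python
def validation_iters(eval_samples: int, micro_batch_size: int) -> int:
    """At least one validation iteration, even when
    eval_samples < micro_batch_size (sketch of the suggested fix)."""
    if micro_batch_size <= 0:
        raise ValueError(f"micro_batch_size must be positive, got {micro_batch_size}")
    # -(-a // b) is integer ceiling division, so partial batches still count.
    return max(1, -(-eval_samples // micro_batch_size))
```

Plain floor division would return 0 for eval_samples=3, micro_batch_size=8; the ceiling form returns 1 and covers the leftover samples of any partial final batch.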
modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py-316-321 (1)

316-321: ⚠️ Potential issue | 🟡 Minor

Actually emit the missing-MoE-stats warning.

Warning(...) only instantiates an object; it does not notify the caller. Right now a missing moe_stats.json silently changes active_params from MoE-aware counts to total params. Use warnings.warn(...) or mprint(...) here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py` around lines
316 - 321, The code creates a pathlib object moe_stats_file and on missing file
only instantiates Warning(...) which does nothing; update the block around
moe_stats_file (using variables moe_stats_file, moe_stats_filename,
master_puzzle_dir and the downstream active_params logic) to actually emit a
warning by calling warnings.warn(...) (or mprint(...)) with a clear message like
f"MOE stats file {moe_stats_file} does not exist, can't calculate num active
params", and add `import warnings` at the top if using warnings.warn; keep
setting moe_stats_file = None so the fallback to total params remains.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 31341e54-9727-48f6-b865-606bb7607464

📥 Commits

Reviewing files that changed from the base of the PR and between 291498b and c5ec50b.

⛔ Files ignored due to path filters (2)
  • examples/puzzletron/mip_sweep_example.png is excluded by !**/*.png
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (212)
  • .github/CODEOWNERS
  • .github/workflows/example_tests.yml
  • .github/workflows/gpu_tests.yml
  • .pre-commit-config.yaml
  • examples/pruning/README.md
  • examples/puzzletron/GPTOSS.md
  • examples/puzzletron/README.md
  • examples/puzzletron/configs/gptoss-20b_remove_experts_memory/gptoss-20b.yaml
  • examples/puzzletron/configs/gptoss-20b_remove_experts_memory/gptoss-20b_remove_experts_memory.yaml
  • examples/puzzletron/configs/gptoss-20b_remove_experts_memory/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/gptoss-20b_remove_experts_memory/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/gptoss-20b_remove_experts_memory/validate_model_defaults.yaml
  • examples/puzzletron/configs/gptoss-20b_remove_experts_memory/validate_solutions_defaults.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/llama-3_1-8B_pruneffn_memory.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/pruning/attn_pruning.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/pruning/hidden_dim_pruning.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/validate_model_defaults.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/validate_solutions_defaults.yaml
  • examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/Llama-3_2-3B.yaml
  • examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/llama-3_2-3B_pruneffn_memory.yaml
  • examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/validate_model_defaults.yaml
  • examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/validate_solutions_defaults.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/Mistral-Small-24B.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/mistral-small-24b-instruct-2501_pruneffn_memory.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/pruning/attn_pruning.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/pruning/hidden_dim_pruning.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/validate_model_defaults.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/validate_solutions_defaults.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/nemotron_nano_12b_v2.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/nemotron_nano_12b_v2_pruneffn_memory.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/attn_pruning.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/hidden_dim_pruning.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/validate_model_defaults.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/validate_solutions_defaults.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/pruning/attn_pruning.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/pruning/hidden_dim_pruning.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/qwen2_5_7b_instruct.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/qwen2_5_7b_instruct_pruneffn_memory.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/validate_model_defaults.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/validate_solutions_defaults.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/pruning/attn_pruning.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/pruning/hidden_dim_pruning.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/qwen3_8b.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/qwen3_8b_pruneffn_memory.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/validate_model_defaults.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/validate_solutions_defaults.yaml
  • examples/puzzletron/evaluation/hf_deployable_anymodel.py
  • examples/puzzletron/evaluation/lm_eval_anymodel.py
  • examples/puzzletron/evaluation/nemo_evaluator_instructions.md
  • examples/puzzletron/main.py
  • examples/puzzletron/mbridge_distillation/README.md
  • examples/puzzletron/mbridge_distillation/distill_hf.py
  • examples/puzzletron/requirements.txt
  • modelopt/torch/prune/importance_hooks/__init__.py
  • modelopt/torch/prune/importance_hooks/base_hooks.py
  • modelopt/torch/prune/importance_hooks/expert_removal_hooks.py
  • modelopt/torch/puzzletron/README.md
  • modelopt/torch/puzzletron/__init__.py
  • modelopt/torch/puzzletron/activation_scoring/activation_hooks/__init__.py
  • modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py
  • modelopt/torch/puzzletron/activation_scoring/score_pruning_activations.py
  • modelopt/torch/puzzletron/anymodel/README.md
  • modelopt/torch/puzzletron/anymodel/__init__.py
  • modelopt/torch/puzzletron/anymodel/converter/__init__.py
  • modelopt/torch/puzzletron/anymodel/converter/convert_any_model.py
  • modelopt/torch/puzzletron/anymodel/converter/converter.py
  • modelopt/torch/puzzletron/anymodel/converter/converter_factory.py
  • modelopt/torch/puzzletron/anymodel/model_descriptor/__init__.py
  • modelopt/torch/puzzletron/anymodel/model_descriptor/model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/model_descriptor/model_descriptor_factory.py
  • modelopt/torch/puzzletron/anymodel/models/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/gpt_oss/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_converter.py
  • modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_pruned_to_mxfp4.py
  • modelopt/torch/puzzletron/anymodel/models/llama/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/llama/llama_converter.py
  • modelopt/torch/puzzletron/anymodel/models/llama/llama_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/mistral_small/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/mistral_small/mistral_small_converter.py
  • modelopt/torch/puzzletron/anymodel/models/mistral_small/mistral_small_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_converter.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_converter.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen2/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_converter.py
  • modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_8b/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_8b/qwen3_8b_converter.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_8b/qwen3_8b_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl_30b_a3b_instruct/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl_30b_a3b_instruct/qwen3_vl_30b_a3b_instruct_converter.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl_30b_a3b_instruct/qwen3_vl_30b_a3b_instruct_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/puzzformer/__init__.py
  • modelopt/torch/puzzletron/anymodel/puzzformer/no_op.py
  • modelopt/torch/puzzletron/anymodel/puzzformer/utils.py
  • modelopt/torch/puzzletron/build_library_and_stats.py
  • modelopt/torch/puzzletron/dataset/__init__.py
  • modelopt/torch/puzzletron/dataset/prepare_dataset.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/__init__.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py
  • modelopt/torch/puzzletron/export/mbridge/__init__.py
  • modelopt/torch/puzzletron/export/mbridge/base.py
  • modelopt/torch/puzzletron/export/mbridge/distillation_provider.py
  • modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py
  • modelopt/torch/puzzletron/export/mbridge/llama.py
  • modelopt/torch/puzzletron/export/mbridge/qwen3.py
  • modelopt/torch/puzzletron/mip/mip_and_realize_models.py
  • modelopt/torch/puzzletron/mip/mip_with_multi_layer_replacements.py
  • modelopt/torch/puzzletron/mip/run_puzzle.py
  • modelopt/torch/puzzletron/mip/sweep.py
  • modelopt/torch/puzzletron/mip/utils.py
  • modelopt/torch/puzzletron/nas/plugins/puzzletron_nas_plugin.py
  • modelopt/torch/puzzletron/pruning/expert_removal_pruning_mixin.py
  • modelopt/torch/puzzletron/pruning/ffn_intermediate_pruning_mixin.py
  • modelopt/torch/puzzletron/pruning/kv_heads_pruning_mixin.py
  • modelopt/torch/puzzletron/pruning/pruning_ckpts.py
  • modelopt/torch/puzzletron/pruning/pruning_mixin.py
  • modelopt/torch/puzzletron/pruning/pruning_utils.py
  • modelopt/torch/puzzletron/puzzletron.py
  • modelopt/torch/puzzletron/replacement_library/build_replacement_library.py
  • modelopt/torch/puzzletron/replacement_library/replacement_library.py
  • modelopt/torch/puzzletron/replacement_library/replacement_utils.py
  • modelopt/torch/puzzletron/scoring/scoring.py
  • modelopt/torch/puzzletron/sewing_kit/__init__.py
  • modelopt/torch/puzzletron/sewing_kit/core.py
  • modelopt/torch/puzzletron/sewing_kit/passage/__init__.py
  • modelopt/torch/puzzletron/sewing_kit/passage/core.py
  • modelopt/torch/puzzletron/sewing_kit/utils.py
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_params_and_memory.py
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
  • modelopt/torch/puzzletron/tools/__init__.py
  • modelopt/torch/puzzletron/tools/bypassed_training/child_init.py
  • modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
  • modelopt/torch/puzzletron/tools/checkpoint_utils.py
  • modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py
  • modelopt/torch/puzzletron/tools/common.py
  • modelopt/torch/puzzletron/tools/hydra_utils.py
  • modelopt/torch/puzzletron/tools/kd_model.py
  • modelopt/torch/puzzletron/tools/logger.py
  • modelopt/torch/puzzletron/tools/post_init_sparse.py
  • modelopt/torch/puzzletron/tools/robust_json.py
  • modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py
  • modelopt/torch/puzzletron/tools/validate_model.py
  • modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py
  • modelopt/torch/puzzletron/tools/validation_utils.py
  • modelopt/torch/puzzletron/utils/checkpoint_manager.py
  • modelopt/torch/puzzletron/utils/data/dataloaders.py
  • modelopt/torch/puzzletron/utils/data/dataset.py
  • modelopt/torch/puzzletron/utils/dummy_modules.py
  • modelopt/torch/puzzletron/utils/parsing.py
  • modelopt/torch/puzzletron/utils/utils.py
  • modelopt/torch/puzzletron/utils/validate_runtime_pipeline.py
  • modelopt/torch/puzzletron/utils/validation.py
  • modelopt/torch/utils/robust_json.py
  • pyproject.toml
  • tests/_test_utils/torch/puzzletron/utils.py
  • tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py
  • tests/gpu/torch/conftest.py
  • tests/gpu/torch/puzzletron/conftest.py
  • tests/gpu/torch/puzzletron/nas/plugins/test_nas_convert.py
  • tests/gpu/torch/puzzletron/nas/plugins/test_nas_search.py
  • tests/gpu/torch/puzzletron/resources/configs/Qwen/Qwen2.5-7B-Instruct/Qwen2.5-7B-Instruct.yaml
  • tests/gpu/torch/puzzletron/resources/configs/Qwen/Qwen2.5-7B-Instruct/pruning/ffn_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/Qwen/Qwen3-8B/Qwen3-8B.yaml
  • tests/gpu/torch/puzzletron/resources/configs/Qwen/Qwen3-8B/pruning/ffn_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/Qwen/Qwen3-VL-30B-A3B-Instruct/Qwen3-VL-30B-A3B-Instruct.yaml
  • tests/gpu/torch/puzzletron/resources/configs/Qwen/Qwen3-VL-30B-A3B-Instruct/pruning/expert_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/meta-llama/Llama-3.1-8B-Instruct/Llama-3.1-8B-Instruct-attn-pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/meta-llama/Llama-3.1-8B-Instruct/Llama-3.1-8B-Instruct.yaml
  • tests/gpu/torch/puzzletron/resources/configs/meta-llama/Llama-3.1-8B-Instruct/pruning/attn_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/meta-llama/Llama-3.1-8B-Instruct/pruning/ffn_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/meta-llama/Llama-3.2-3B-Instruct/Llama-3.2-3B-Instruct.yaml
  • tests/gpu/torch/puzzletron/resources/configs/meta-llama/Llama-3.2-3B-Instruct/pruning/ffn_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/mistralai/Mistral-Small-24B-Instruct-2501/Mistral-Small-24B-Instruct-2501.yaml
  • tests/gpu/torch/puzzletron/resources/configs/mistralai/Mistral-Small-24B-Instruct-2501/pruning/ffn_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16.yaml
  • tests/gpu/torch/puzzletron/resources/configs/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16/pruning/expert_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16/pruning/ffn_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/nvidia/NVIDIA-Nemotron-Nano-12B-v2/NVIDIA-Nemotron-Nano-12B-v2.yaml
  • tests/gpu/torch/puzzletron/resources/configs/nvidia/NVIDIA-Nemotron-Nano-12B-v2/pruning/ffn_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/openai/gpt-oss-20b/gpt-oss-20b.yaml
  • tests/gpu/torch/puzzletron/resources/configs/openai/gpt-oss-20b/pruning/expert_removal.yaml
  • tests/gpu/torch/puzzletron/resources/configs/pruning/attn_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/pruning/ffn_pruning_base.yaml
  • tests/gpu/torch/puzzletron/resources/configs/pruning/hidden_dim_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/pruning/pruning_defaults.yaml
  • tests/gpu/torch/puzzletron/resources/configs/validate_model_defaults.yaml
  • tests/gpu/torch/puzzletron/resources/configs/validate_solutions_defaults.yaml
  • tests/gpu/torch/puzzletron/resources/tokenizer/special_tokens_map.json
  • tests/gpu/torch/puzzletron/resources/tokenizer/tokenizer.json
  • tests/gpu/torch/puzzletron/resources/tokenizer/tokenizer_config.json
  • tests/gpu/torch/puzzletron/resources/tokenizer/truncate_tokenizer.py
  • tests/gpu/torch/puzzletron/test_puzzletron.py
  • tox.ini

Comment on lines +624 to +677
process_group = node.target.process_group
peers = node.target.peer_rank
if not isinstance(peers, Sequence):
    peers = [peers]

if len(values_to_node) > 0:
    items_to_send = list(self.create_output_overrides(values_to_node).items())

    data_descriptors: list[RemoteDataDescriptor] = []
    tensors_to_send: list[torch.Tensor] = []

    for key, value in items_to_send:
        if isinstance(value, torch.Tensor):
            if value.is_cuda:
                tensor_device = "cuda"
            elif value.is_cpu:
                tensor_device = "cpu"
            else:
                raise RuntimeError(
                    f"Invalid tensor device to send to remote target: {value.device}"
                )

            data_descriptor = RemoteTensorDataDescriptor(
                key=key,
                device=tensor_device,
                dtype=value.dtype,
                shape=value.shape,
            )
            tensors_to_send.append(value)

        else:
            data_descriptor = RemotePythonDataDescriptor(
                key=key,
                value=value,
            )

        data_descriptors.append(data_descriptor)

    works: list[Optional[torch.distributed.Work]] = []
    for peer in peers:
        if process_group is not None:
            peer = torch.distributed.get_global_rank(process_group, peer)

        peer_works = distributed_isend_obj(data_descriptors, dst=peer)
        works.extend(peer_works)

        for tensor in tensors_to_send:
            work = torch.distributed.isend(tensor, dst=peer)
            works.append(work)

    if node.target.blocking:
        for work in works:
            if work is not None:
                work.wait()

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, explore the file structure and locate the target file
find . -name "core.py" -path "*puzzletron/sewing_kit*" | head -20

Repository: NVIDIA/Model-Optimizer

Length of output: 167


🏁 Script executed:

# Check the file size and read the relevant sections
wc -l modelopt/torch/puzzletron/sewing_kit/core.py

Repository: NVIDIA/Model-Optimizer

Length of output: 114


🏁 Script executed:

# Read the specific lines mentioned in the review (624-677 and 687-718)
sed -n '620,730p' modelopt/torch/puzzletron/sewing_kit/core.py | cat -n

Repository: NVIDIA/Model-Optimizer

Length of output: 6346


🏁 Script executed:

# Search for all uses of distributed_isend_obj, distributed_recv_obj in the file
rg "distributed_(isend_obj|recv_obj|isend|irecv)" modelopt/torch/puzzletron/sewing_kit/core.py -B2 -A2

Repository: NVIDIA/Model-Optimizer

Length of output: 698


🏁 Script executed:

# Check the function signatures - find where these are imported from
rg "from.*distributed_isend_obj|import.*distributed_isend_obj" modelopt/torch/puzzletron/sewing_kit/ -B2 -A2

Repository: NVIDIA/Model-Optimizer

Length of output: 445


🏁 Script executed:

# Check if there's a utils module or similar that defines distributed_isend_obj
find modelopt/ -name "*.py" -type f -exec rg -l "def distributed_isend_obj|def distributed_recv_obj" {} \;

Repository: NVIDIA/Model-Optimizer

Length of output: 111


🏁 Script executed:

# Check the function signatures in utils.py
sed -n '1,200p' modelopt/torch/puzzletron/sewing_kit/utils.py | grep -A 15 "def distributed_isend_obj\|def distributed_recv_obj"

Repository: NVIDIA/Model-Optimizer

Length of output: 48


🏁 Script executed:

# Get full definitions of both functions
rg "def distributed_(isend_obj|recv_obj)" modelopt/torch/puzzletron/sewing_kit/utils.py -A 20

Repository: NVIDIA/Model-Optimizer

Length of output: 1582


🏁 Script executed:

# Check PyTorch distributed documentation - search for torch.distributed.isend signature
rg "torch.distributed.isend|torch.distributed.irecv" modelopt/torch/puzzletron/sewing_kit/ -B2 -A8

Repository: NVIDIA/Model-Optimizer

Length of output: 3244


🏁 Script executed:

# Search for any examples in the codebase that use group parameter with these functions
rg "distributed_isend_obj.*group|torch.distributed.isend.*group|torch.distributed.irecv.*group" modelopt/

Repository: NVIDIA/Model-Optimizer

Length of output: 652


🏁 Script executed:

# Search for any other potential P2P call sites in the file that might have similar issues
rg "distributed.isend|distributed.irecv|distributed.send|distributed.recv|distributed_isend|distributed_recv" modelopt/torch/puzzletron/sewing_kit/core.py -n

Repository: NVIDIA/Model-Optimizer

Length of output: 483


Pass group=process_group to all distributed P2P calls.

The code extracts process_group and converts local peer ranks to global ranks when process_group is not None (lines 646 and 688 via torch.distributed.get_global_rank()), but the actual distributed_isend_obj(), isend(), distributed_recv_obj(), and irecv() calls at lines 667, 671, 690, and 713 do not receive the group parameter. This causes all communication to use the default process group instead of the specified subgroup, resulting in hangs or communication with the wrong peers.

🛠️ Suggested fix
-                            peer_works = distributed_isend_obj(data_descriptors, dst=peer)
+                            peer_works = distributed_isend_obj(
+                                data_descriptors,
+                                dst=peer,
+                                group=process_group,
+                            )
@@
-                                work = torch.distributed.isend(tensor, dst=peer)
+                                work = torch.distributed.isend(
+                                    tensor,
+                                    dst=peer,
+                                    group=process_group,
+                                )
@@
-                        data_descriptors = distributed_recv_obj(src=peer)
+                        data_descriptors = distributed_recv_obj(
+                            src=peer,
+                            group=process_group,
+                        )
@@
-                            work = torch.distributed.irecv(tensor, src=peer)
+                            work = torch.distributed.irecv(
+                                tensor,
+                                src=peer,
+                                group=process_group,
+                            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/sewing_kit/core.py` around lines 624 - 677, The P2P
calls are missing the process group argument so they use the default group;
update all distributed send/recv calls to pass group=process_group where
available: add group=process_group to calls to distributed_isend_obj(...) and
distributed_recv_obj(...), and to torch.distributed.isend(...) and
torch.distributed.irecv(...); keep the existing logic that maps local->global
ranks via torch.distributed.get_global_rank(process_group, peer) and only pass
group when process_group is not None so subgroup communication uses the correct
group.

Comment on lines +405 to +416
def _load_subblock_configs_from_subblock_library(master_puzzle_dir: Path) -> list[SubblockConfig]:
    subblocks_df = pd.read_json(master_puzzle_dir / "subblock_library.json")
    subblocks_df["attention_config"] = subblocks_df["attention_config"].apply(
        partial(_dataclass_from_dict, cls=AttentionConfig)
    )
    subblocks_df["ffn_config"] = subblocks_df["ffn_config"].apply(
        partial(_dataclass_from_dict, cls=FFNConfig)
    )
    attention_configs = subblocks_df["attention_config"].dropna().drop_duplicates().tolist()
    ffn_configs = subblocks_df["ffn_config"].dropna().drop_duplicates().tolist()
    subblock_configs = attention_configs + ffn_configs
    return subblock_configs

⚠️ Potential issue | 🔴 Critical

Return the same indexed shape from the fallback loader.

This fallback returns bare AttentionConfig/FFNConfig instances, but calculate_subblock_stats() later indexes each item as ["subblock_config"] and ["parent_layer_indices"]. When replacement_library.json is absent, this path fails immediately instead of producing stats.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py` around lines
405 - 416, The fallback loader _load_subblock_configs_from_subblock_library
currently returns raw AttentionConfig/FFNConfig objects but callers
(calculate_subblock_stats) expect list entries with keys/fields
"subblock_config" and "parent_layer_indices" (i.e. SubblockConfig-shaped items);
change the function to construct and return SubblockConfig instances (or dicts
matching SubblockConfig) for each unique attention/ffn config, populating
subblock_config with the AttentionConfig/FFNConfig and setting
parent_layer_indices to a sensible default (e.g. empty list) or the original
parent indices if present in the JSON, so the returned list matches the same
indexed shape used by calculate_subblock_stats. Ensure the function name
_load_subblock_configs_from_subblock_library and the
SubblockConfig/AttentionConfig/FFNConfig symbols are used to locate and
instantiate the correct dataclass.
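The "same indexed shape" requirement means each returned item must support ["subblock_config"] and ["parent_layer_indices"] lookups. A sketch of the wrapping step, using a stand-in dataclass and key names assumed from the review:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionConfig:
    """Minimal stand-in for the real AttentionConfig dataclass."""
    n_heads_in_group: int


def wrap_configs(raw_configs: list) -> list[dict]:
    """Wrap bare configs in the dict shape calculate_subblock_stats indexes,
    defaulting "parent_layer_indices" to an empty list (assumed keys)."""
    return [
        {"subblock_config": cfg, "parent_layer_indices": []}
        for cfg in raw_configs
    ]
```

With this wrapper the subblock-library fallback and the replacement-library path produce interchangeable items, so downstream indexing never has to branch on which loader ran.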

Comment on lines +425 to +443
# Prepare arguments for parallel processing
process_layer_partial = partial(
    _process_single_layer,
    pruning_mixin=pruning_mixin,
    descriptor=descriptor,
    parent_state_dict=original_state_dict,
    new_state_dict=new_state_dict,
    original_config=original_config,
    new_config=new_config,
    gqa_init_mode=gqa_init_mode,
    mlp_init_mode=mlp_init_mode,
    mlp_init_config=mlp_init_config,
    linear_init_mode=linear_init_mode,
    ignored_keys=ignored_keys,
    keys=keys,
    is_original_mha=is_original_mha,
    head_size=head_size,
    hidden_size=hidden_size,
)

⚠️ Potential issue | 🔴 Critical

Don't share a mutable keys map across worker threads.

create_child_state_dict() passes the same keys dict into every worker, and _init_mlp() mutates it with keys.pop(...). That introduces a real race in the fallback path: workers can remove each other's entries, yielding nondeterministic KeyErrors or copying the wrong leftovers into the final state dict.

Also applies to: 452-455, 694-696

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/bypassed_training/child_init.py` around lines
425 - 443, The issue is that the same mutable keys dict is shared across worker
tasks causing races when _init_mlp mutates it; update each partial so it
receives a worker-local copy of keys (e.g., pass keys.copy() or dict(keys))
instead of the original dict when creating process_layer_partial (and the other
partials at the other sites noted), and ensure
create_child_state_dict/_process_single_layer/_init_mlp keep using their
existing names (create_child_state_dict, _process_single_layer, _init_mlp) but
operate on the copied keys to avoid cross-thread pops and nondeterministic
KeyError or wrong-state copying.
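The fix can be sketched in isolation (worker and key names below are hypothetical stand-ins for _process_single_layer and the real state-dict keys): each task receives its own copy of the mapping, so one worker's pop can never remove an entry another worker still needs.

```python
from concurrent.futures import ThreadPoolExecutor

def process_layer(layer_idx, keys):
    # Stand-in for the real worker: mutates `keys` by popping its entry,
    # mirroring the keys.pop(...) done in _init_mlp.
    return keys.pop(f"layer.{layer_idx}.weight", None)

keys = {f"layer.{i}.weight": i for i in range(8)}

# Safe: submit each task with a private copy (dict(keys)), so concurrent
# pops cannot race on a shared mapping.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(process_layer, i, dict(keys)) for i in range(8)]
    results = [f.result() for f in futures]

# The shared original is untouched and every layer found its key.
assert results == list(range(8))
assert len(keys) == 8
```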

Comment on lines +335 to +360
def _write_file_process_safe(
    content: Any,
    path: Path | str,
    write_fn: Callable[[Any, BinaryIO], None] = _write_text,
) -> None:
    """
    Write a file in a multi-process safe way.
    If another process tries to write the same file using this method, the current process
    "gives up" and assumes that the matter is being taken care of by another process.

    write_fn is a function that receives file contents and a binary file object,
    and writes the content to the file. It can be _write_text (defined above), or torch.save,
    or a similar function (not safetensors.torch.save_file since it expects a path).
    """
    with open(path, "wb") as f:
        # Try to acquire an exclusive, non-blocking lock
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return  # Exit immediately if the lock is not acquired

        write_fn(content, f)  # Write the content if lock is acquired
        f.flush()  # Ensure data is written to disk

        # Release the lock
        fcntl.flock(f, fcntl.LOCK_UN)

⚠️ Potential issue | 🔴 Critical

Acquire the file lock before truncating the index file.

open(path, "wb") truncates the file before flock() runs. If another process already owns the lock, this contender still zeroes the file and then returns, so the writer you intended to protect can lose or corrupt the index.

🛠️ Proposed fix
-    with open(path, "wb") as f:
+    with open(path, "ab+") as f:
         # Try to acquire an exclusive, non-blocking lock
         try:
             fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
         except BlockingIOError:
             return  # Exit immediately if the lock is not acquired
-
+        f.seek(0)
+        f.truncate()
         write_fn(content, f)  # Write the content if lock is acquired
         f.flush()  # Ensure data is written to disk
+        os.fsync(f.fileno())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py` around lines 335 -
360, The current _write_file_process_safe opens the file with "wb" which
truncates before acquiring the flock, risking data loss; change the open/lock
sequence so you acquire the exclusive non-blocking lock before
truncating/writing: open the file in a non-truncating mode that still creates
the file if missing (e.g., "a+b"), immediately acquire fcntl.flock(f,
fcntl.LOCK_EX | fcntl.LOCK_NB), return on BlockingIOError, then seek to start
and f.truncate(0) (or otherwise reset) before calling write_fn(content, f),
flushing, and finally releasing the lock in _write_file_process_safe. Ensure you
still support write_fn semantics and file creation behavior.
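The proposed ordering can be sketched end to end as a simplified, bytes-only variant of the function (assuming a POSIX system with fcntl; the helper name is illustrative, not the module's API):

```python
import fcntl
import os
import tempfile

def write_file_process_safe(content: bytes, path: str) -> bool:
    # "ab+" creates the file if missing but does NOT truncate it, so a
    # contender that fails to get the lock leaves the file intact.
    with open(path, "ab+") as f:
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # another process owns the write; give up
        f.seek(0)
        f.truncate()  # safe to discard old contents only after locking
        f.write(content)
        f.flush()
        os.fsync(f.fileno())  # push bytes to disk before unlocking
        fcntl.flock(f, fcntl.LOCK_UN)
        return True

path = os.path.join(tempfile.mkdtemp(), "index.json")
assert write_file_process_safe(b'{"v": 1}', path)
with open(path, "rb") as f:
    assert f.read() == b'{"v": 1}'
```

Note that in append mode writes always land at the end of the file, which after the truncate is offset 0, so the new content replaces the old cleanly.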

@codecov

codecov bot commented Mar 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.17%. Comparing base (a3f5c46) to head (2a170b9).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1121      +/-   ##
==========================================
+ Coverage   70.14%   70.17%   +0.02%     
==========================================
  Files         230      230              
  Lines       26053    26057       +4     
==========================================
+ Hits        18276    18285       +9     
+ Misses       7777     7772       -5     

☔ View full report in Codecov by Sentry.

@kevalmorabia97 kevalmorabia97 requested review from kevalmorabia97 and removed request for Edwardf0t1, jenchen13 and shengliangxu March 28, 2026 17:47
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py (3)

76-82: Consider deleting frame reference to avoid reference cycles.

Per Python documentation, holding references to frame objects can prevent garbage collection. Delete the frame reference when done.

♻️ Proposed fix
 def _is_called_from_post_init() -> bool:
     frame = inspect.currentframe()
-    while frame:
-        if frame.f_code.co_name == "__post_init__":
-            return True
-        frame = frame.f_back
-    return False
+    try:
+        while frame:
+            if frame.f_code.co_name == "__post_init__":
+                return True
+            frame = frame.f_back
+        return False
+    finally:
+        del frame
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py` around
lines 76 - 82, The function _is_called_from_post_init currently retains a
reference to the inspected frame object which can create reference cycles;
update the function to ensure the frame reference is cleared when done (e.g.,
use a try/finally or explicitly del frame before returning) so the frame object
can be garbage-collected; keep the existing traversal logic (checking
frame.f_code.co_name == "__post_init__") but ensure you delete the frame
variable at the end of the function and also delete any temporary frame
references used inside the loop.

89-93: Bare except: catches too much.

This catches SystemExit, KeyboardInterrupt, and other exceptions that shouldn't be silently swallowed. Use except Exception: to catch only regular exceptions.

♻️ Proposed fix
     try:
         _get_dataclass_type(tp)
         return True
-    except:
+    except Exception:
         return False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py` around
lines 89 - 93, The current try/except around the call to _get_dataclass_type
uses a bare except which swallows SystemExit/KeyboardInterrupt; change the bare
except to catch only regular exceptions (use except Exception:) in the block
that tries _get_dataclass_type and returns True/False so only normal errors are
caught while allowing critical exceptions to propagate.

96-96: Incorrect return type annotation.

dataclass is a decorator, not a type. The return type should be Type[Any] or similar.

♻️ Proposed fix
-def _get_dataclass_type(tp: Type) -> dataclass:
+def _get_dataclass_type(tp: Type) -> Type[Any]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py` at line 96,
The return type annotation on _get_dataclass_type is wrong: `dataclass` is a
decorator, not a type; change the signature to return a type object (e.g. use ->
Type[Any] or a generic -> Type[T]) and update imports as needed (from typing
import Any, Type or TypeVar T) so the function signature becomes something like
def _get_dataclass_type(tp: Type) -> Type[Any]:; ensure any callers/types align
with the new typing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py`:
- Around line 146-155: The __post_init__ validation in class BlockConfig
incorrectly references self.top_k (undefined) instead of
self.num_experts_per_tok; update all occurrences in the __post_init__ method
(the checks that raise ValueError) to use self.num_experts_per_tok for the
numeric comparisons and the error messages, and make the messages consistent
(e.g., "num_experts_per_tok must be positive" and "num_experts_per_tok (...)
cannot be greater than num_local_experts (...)") so no AttributeError occurs at
runtime.

---

Nitpick comments:
In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py`:
- Around line 76-82: The function _is_called_from_post_init currently retains a
reference to the inspected frame object which can create reference cycles;
update the function to ensure the frame reference is cleared when done (e.g.,
use a try/finally or explicitly del frame before returning) so the frame object
can be garbage-collected; keep the existing traversal logic (checking
frame.f_code.co_name == "__post_init__") but ensure you delete the frame
variable at the end of the function and also delete any temporary frame
references used inside the loop.
- Around line 89-93: The current try/except around the call to
_get_dataclass_type uses a bare except which swallows
SystemExit/KeyboardInterrupt; change the bare except to catch only regular
exceptions (use except Exception:) in the block that tries _get_dataclass_type
and returns True/False so only normal errors are caught while allowing critical
exceptions to propagate.
- Line 96: The return type annotation on _get_dataclass_type is wrong:
`dataclass` is a decorator, not a type; change the signature to return a type
object (e.g. use -> Type[Any] or a generic -> Type[T]) and update imports as
needed (from typing import Any, Type or TypeVar T) so the function signature
becomes something like def _get_dataclass_type(tp: Type) -> Type[Any]:; ensure
any callers/types align with the new typing.

📥 Commits

Reviewing files that changed from the base of the PR and between c5ec50b and 1b25b4a.

📒 Files selected for processing (4)
  • .github/workflows/example_tests.yml
  • .github/workflows/gpu_tests.yml
  • .pre-commit-config.yaml
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py
✅ Files skipped from review due to trivial changes (1)
  • .github/workflows/gpu_tests.yml
🚧 Files skipped from review as they are similar to previous changes (2)
  • .pre-commit-config.yaml
  • .github/workflows/example_tests.yml

Comment on lines +146 to +155
    def __post_init__(self):
        # Validate the configuration
        if self.num_local_experts <= 0:
            raise ValueError(f"num_local_experts must be positive, got {self.num_local_experts}")
        if self.num_experts_per_tok <= 0:
            raise ValueError(f"top_k must be positive, got {self.top_k}")
        if self.num_experts_per_tok > self.num_local_experts:
            raise ValueError(
                f"top_k ({self.top_k}) cannot be greater than num_local_experts ({self.num_local_experts})"
            )

⚠️ Potential issue | 🔴 Critical

Critical bug: self.top_k is undefined; should be self.num_experts_per_tok.

The validation references self.top_k in error messages and comparisons, but the field is named num_experts_per_tok. This will raise AttributeError at runtime whenever these validation branches are hit.

🐛 Proposed fix
     def __post_init__(self):
         # Validate the configuration
         if self.num_local_experts <= 0:
             raise ValueError(f"num_local_experts must be positive, got {self.num_local_experts}")
         if self.num_experts_per_tok <= 0:
-            raise ValueError(f"top_k must be positive, got {self.top_k}")
+            raise ValueError(f"num_experts_per_tok must be positive, got {self.num_experts_per_tok}")
         if self.num_experts_per_tok > self.num_local_experts:
             raise ValueError(
-                f"top_k ({self.top_k}) cannot be greater than num_local_experts ({self.num_local_experts})"
+                f"num_experts_per_tok ({self.num_experts_per_tok}) cannot be greater than num_local_experts ({self.num_local_experts})"
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py` around
lines 146 - 155, The __post_init__ validation in class BlockConfig incorrectly
references self.top_k (undefined) instead of self.num_experts_per_tok; update
all occurrences in the __post_init__ method (the checks that raise ValueError)
to use self.num_experts_per_tok for the numeric comparisons and the error
messages, and make the messages consistent (e.g., "num_experts_per_tok must be
positive" and "num_experts_per_tok (...) cannot be greater than
num_local_experts (...)") so no AttributeError occurs at runtime.
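The corrected validation can be exercised in isolation; the enclosing dataclass below is a hypothetical stand-in for BlockConfig's MoE fields, not the project's class:

```python
from dataclasses import dataclass

@dataclass
class MoEConfig:
    num_local_experts: int
    num_experts_per_tok: int

    def __post_init__(self):
        if self.num_local_experts <= 0:
            raise ValueError(
                f"num_local_experts must be positive, got {self.num_local_experts}"
            )
        if self.num_experts_per_tok <= 0:
            raise ValueError(
                f"num_experts_per_tok must be positive, got {self.num_experts_per_tok}"
            )
        if self.num_experts_per_tok > self.num_local_experts:
            raise ValueError(
                f"num_experts_per_tok ({self.num_experts_per_tok}) cannot be greater "
                f"than num_local_experts ({self.num_local_experts})"
            )

MoEConfig(num_local_experts=8, num_experts_per_tok=2)  # passes validation

# The failing branch now raises ValueError instead of AttributeError:
try:
    MoEConfig(num_local_experts=8, num_experts_per_tok=9)
except ValueError as e:
    assert "cannot be greater" in str(e)
```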

…ants

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 13

♻️ Duplicate comments (2)
modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py (1)

404-415: ⚠️ Potential issue | 🟠 Major

Return format mismatch in fallback loader (existing issue).

The _load_subblock_configs_from_subblock_library function returns raw AttentionConfig/FFNConfig objects, but calculate_subblock_stats (lines 151-152) expects items with ["subblock_config"] and ["parent_layer_indices"] keys. This will cause a runtime error when replacement_library.json is absent.

🐛 Proposed fix
 def _load_subblock_configs_from_subblock_library(master_puzzle_dir: Path) -> list[SubblockConfig]:
     subblocks_df = pd.read_json(master_puzzle_dir / "subblock_library.json")
     subblocks_df["attention_config"] = subblocks_df["attention_config"].apply(
         partial(_dataclass_from_dict, cls=AttentionConfig)
     )
     subblocks_df["ffn_config"] = subblocks_df["ffn_config"].apply(
         partial(_dataclass_from_dict, cls=FFNConfig)
     )
     attention_configs = subblocks_df["attention_config"].dropna().drop_duplicates().tolist()
     ffn_configs = subblocks_df["ffn_config"].dropna().drop_duplicates().tolist()
-    subblock_configs = attention_configs + ffn_configs
+    subblock_configs = [
+        immutabledict({"subblock_config": cfg, "parent_layer_indices": tuple([-1])})
+        for cfg in attention_configs + ffn_configs
+    ]
     return subblock_configs
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py` around lines
404 - 415, The fallback loader _load_subblock_configs_from_subblock_library
currently returns raw AttentionConfig/FFNConfig objects but
calculate_subblock_stats expects each item to be a mapping with keys
"subblock_config" and "parent_layer_indices"; change
_load_subblock_configs_from_subblock_library to wrap each config into a dict
like {"subblock_config": <config>, "parent_layer_indices": []} (use an empty
list or appropriate default for parent_layer_indices) for both attention_configs
and ffn_configs so the combined subblock_configs list matches the shape expected
by calculate_subblock_stats.
modelopt/torch/puzzletron/mip/run_puzzle.py (1)

396-424: ⚠️ Potential issue | 🟠 Major

CLI validation crashes on argparse.Namespace (existing issue).

The checks like "subblock_stats_args" not in args (line 408) use in operator on argparse.Namespace, which is not iterable and will raise TypeError. The past review comment already flagged this issue.

🐛 Proposed fix
 def _assert_valid_config(args, puzzle_profile):
     required_args = (
         "subblock_stats_path",
         "objective",
         "output_path",
     )
-    missing_args = [arg for arg in required_args if arg not in args or getattr(args, arg) is None]
+    missing_args = [arg for arg in required_args if getattr(args, arg, None) is None]
     if missing_args:
         mprint(f"error: The following arguments are required: {', '.join(missing_args)}")
         sys.exit(1)

     # Make sure we have specified subblock_stats_args
-    if "subblock_stats_args" not in args and "subblock_stats_args" not in puzzle_profile:
+    if getattr(args, "subblock_stats_args", None) is None and "subblock_stats_args" not in puzzle_profile:
         mprint(
             "error: Must specify `subblock_stats_arrs` in either puzzle_profile or as a commandline arg."
         )
         sys.exit(1)

     # Make sure we have specified constraints
     if (
-        "mip_constraints" not in args
-        and "human_constraints" not in args
+        getattr(args, "mip_constraints", None) is None
+        and getattr(args, "human_constraints", None) is None
         and "mip_constraints" not in puzzle_profile
         and "human_constraints" not in puzzle_profile
     ):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/mip/run_puzzle.py` around lines 396 - 424, In
_assert_valid_config, the checks like `"subblock_stats_args" not in args` and
similar use the in-operator on argparse.Namespace (causing TypeError); change
those presence checks to use hasattr(args, "subblock_stats_args") or
getattr(args, "subblock_stats_args") is not None for each CLI arg check, while
keeping the existing membership checks against puzzle_profile (which is a dict);
update the two conditional blocks that validate subblock_stats_args and the
constraints (mip_constraints, human_constraints) to use hasattr/getattr on args
and keep the puzzle_profile checks as-is so validation works for both Namespace
and profile.
🧹 Nitpick comments (9)
modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_converter.py (1)

33-33: Return type annotation mismatch with actual return value.

The signature declares List[BlockConfig] but the method returns List[dict] after calling .to_dict(). While context snippet 4 shows the Llama converter follows the same pattern (so this is established behavior), it creates a disconnect with the base class contract in converter.py:193-210.

Consider updating the return type to List[Dict[str, Any]] to accurately reflect what is returned, or coordinate a broader fix across all converters.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_converter.py` at line
33, The function create_block_configs_from_main_config currently declares a
return type of List[BlockConfig] but actually returns a list of dicts (via
.to_dict()); update the signature to return List[Dict[str, Any]] and add the
necessary typing imports (Dict, Any) so the annotation matches the actual return
value, and scan related converters (e.g., the Llama converter and base converter
methods) to ensure callers expect dicts or adjust them consistently.
modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_model_descriptor.py (1)

64-65: Add return type annotation for consistency with base class.

The base class in model_descriptor.py:42-54 declares block_config_to_layer_overrides with return type Dict[str, Any]. Adding the annotation here would improve type safety and IDE support.

♻️ Proposed fix
+from typing import Any, Dict, List
-from typing import Dict, List
...
     `@staticmethod`
-    def block_config_to_layer_overrides(block_config: BlockConfig):
+    def block_config_to_layer_overrides(block_config: BlockConfig) -> Dict[str, Any]:
         return {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_model_descriptor.py`
around lines 64 - 65, The method block_config_to_layer_overrides in
qwen3_model_descriptor.py is missing the return type annotation declared on the
base class; update its signature to include -> Dict[str, Any] (import Dict and
Any from typing if not already present) so it matches the base class declaration
in model_descriptor.py (see block_config_to_layer_overrides and BlockConfig) and
restores proper type safety and IDE support.
modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py (1)

15-15: Narrow the file-wide mypy suppression.

Line 15 (# mypy: ignore-errors) disables type checking for the entire file. Multiple static methods lack return type annotations (e.g., decoder_layer_cls(), input_embedding_name(), layer_block_name(index: int), etc.). Add return type annotations to these methods and use targeted # type: ignore comments only where necessary (e.g., for untyped third-party imports from transformers).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py` at
line 15, The file-wide "# mypy: ignore-errors" must be removed and replaced by
precise type annotations on the static methods (e.g., add return types to
decoder_layer_cls, input_embedding_name, layer_block_name(index: int), and other
static methods) and only use targeted "# type: ignore" comments for specific
untyped third-party imports; for example declare decoder_layer_cls() ->
Type[Any] or the proper transformer layer class, input_embedding_name() -> str,
layer_block_name(index: int) -> str, and any other methods with explicit return
types so mypy can validate the file while leaving isolated ignores on the import
lines that actually require them.
modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_converter.py (1)

74-76: Consider using mprint instead of print for logging.

Other modules in this PR use mprint from modelopt.torch.puzzletron.tools.logger for consistent logging. Using print() directly may not integrate well with logging configurations.

♻️ Suggested change
+from modelopt.torch.puzzletron.tools.logger import mprint
+
 ...
 
-        print(
+        mprint(
             f"Created {len(block_configs)} block configs for Qwen3-VL MoE (decoder_sparse_step={decoder_sparse_step})"
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_converter.py`
around lines 74 - 76, Replace the direct print call with the project logging
helper: import mprint from modelopt.torch.puzzletron.tools.logger and change the
print(...) in qwen3_vl_converter.py to mprint(f"Created {len(block_configs)}
block configs for Qwen3-VL MoE (decoder_sparse_step={decoder_sparse_step})") so
the message uses the consistent logger; ensure the import is added near other
imports and no other behavior changes to block_configs or decoder_sparse_step.
modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py (1)

27-27: Setting environment variable at module import time has global side effects.

Setting os.environ["PYTORCH_CUDA_ALLOC_CONF"] at the top level of the module means this side effect occurs whenever any code imports this module, even if that code doesn't use the functions here. Consider moving this inside the functions that need it, or making it conditional.

♻️ Suggested change
-os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
+def _set_cuda_alloc_config():
+    """Set CUDA allocator config for memory-intensive operations."""
+    if "PYTORCH_CUDA_ALLOC_CONF" not in os.environ:
+        os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

Then call _set_cuda_alloc_config() at the start of calculate_subblock_stats or calculate_subblock_stats_for_puzzle_dir.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py` at line 27,
The module currently sets os.environ["PYTORCH_CUDA_ALLOC_CONF"] at import time
which creates a global side effect; instead create a small helper (e.g.
_set_cuda_alloc_config()) that sets that env var only when needed, remove the
top-level assignment, and call _set_cuda_alloc_config() at the start of
calculate_subblock_stats and calculate_subblock_stats_for_puzzle_dir so the CUDA
config is applied only when those functions run (ensure the helper is idempotent
to avoid repeated sets).
modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py (2)

165-166: Remove duplicate variable assignment.

actual_io_workers is computed on line 165 and then recomputed identically on line 181. The second assignment is redundant.

♻️ Suggested fix
     # Profile _save_checkpoint with automatic I/O worker calculation
     mprint("Starting _save_checkpoint...")
-    actual_io_workers = max_workers if max_workers else "auto"
-    mprint(f"I/O Settings: max_workers={actual_io_workers}")
+    mprint(f"I/O Settings: max_workers={max_workers if max_workers else 'auto'}")
     start_time = time.time()
     ...
     # Print profiling summary with actual worker counts used
     total_core_time = create_child_state_dict_time + save_checkpoint_time
     actual_layer_workers = max_layer_workers if max_layer_workers else "auto"
-    actual_io_workers = max_workers if max_workers else "auto"
+    actual_io_workers = max_workers if max_workers else "auto"  # keep this one

Also applies to: 180-181

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py`
around lines 165 - 166, Remove the duplicated assignment of actual_io_workers
and its associated mprint call in init_child_from_parent.py: keep the first
computation (actual_io_workers = max_workers if max_workers else "auto") and the
initial mprint("I/O Settings:..."), and delete the later identical reassignment
and print (the repeated lines around where actual_io_workers is set again near
line 181). This removes the redundant code while leaving any subsequent uses of
actual_io_workers and the mprint calls (function mprint) intact.

90-91: Add error handling for JSON parsing.

If model_config_overrides_dict is a malformed JSON string, json.loads() will raise a JSONDecodeError. Consider wrapping this in a try-except with a more informative error message.

🛡️ Suggested improvement
     # Parse JSON if string
     if isinstance(model_config_overrides_dict, str):
-        model_config_overrides_dict = json.loads(model_config_overrides_dict)
+        try:
+            model_config_overrides_dict = json.loads(model_config_overrides_dict)
+        except json.JSONDecodeError as e:
+            raise ValueError(f"Invalid JSON in model_config_overrides_dict: {e}") from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py`
around lines 90 - 91, Wrap the json.loads call that parses
model_config_overrides_dict in a try-except that catches json.JSONDecodeError,
so malformed JSON yields a clear, actionable error; specifically, around the
block that checks isinstance(model_config_overrides_dict, str) and calls
json.loads, catch JSONDecodeError and raise a ValueError (or log and raise) with
a message that includes that model_config_overrides_dict failed to parse and
include the offending string (or truncated preview) and the original exception
message to aid debugging.
modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py (2)

119-126: Consider adding defensive checks for missing attributes.

_block_no_op_post_init accesses decoder_layer.config.block_configs[decoder_layer.layer_idx] directly. If decoder_layer lacks config, block_configs, or layer_idx attributes (e.g., when called on a DummyBlock during initialization), this will raise an AttributeError.

🛡️ Suggested defensive check
 `@staticmethod`
 def _block_no_op_post_init(decoder_layer):
     """
     Due to the subblock structure of NemotronH always one of the subblock is set to no-op, for a real no-op both attention & ffn no-op should be set to True.
     """
+    if not hasattr(decoder_layer, 'config') or not hasattr(decoder_layer, 'layer_idx'):
+        return
     block_config = decoder_layer.config.block_configs[decoder_layer.layer_idx]
     if block_config.ffn.no_op and block_config.attention.no_op:
         decoder_layer.norm = Same()
         decoder_layer.mixer = MatchingZeros()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py`
around lines 119 - 126, The _block_no_op_post_init function assumes
decoder_layer has config, block_configs, and layer_idx; add defensive checks at
the start of _block_no_op_post_init to avoid AttributeError (e.g., if called on
a DummyBlock): verify hasattr(decoder_layer, "config"),
hasattr(decoder_layer.config, "block_configs"), and hasattr(decoder_layer,
"layer_idx") (or use a try/except AttributeError), ensure
decoder_layer.layer_idx is an int within range of
decoder_layer.config.block_configs, and return early if any check fails; then
proceed to access block_config and apply the existing ffn/attention no_op logic
on decoder_layer.norm and decoder_layer.mixer.

40-52: Dynamic module discovery may be slow for large module trees.

The get_dynamic_modules function walks all packages under transformers_modules and imports each module to find matching classes. This could be slow if the module tree is large. Consider caching the result or limiting the search scope if performance becomes an issue.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py`
around lines 40 - 52, get_dynamic_modules currently imports every module under
transformers_modules which can be slow; to fix, add caching and an optional
scope parameter: decorate get_dynamic_modules with functools.lru_cache (or
maintain a module-level cache keyed by module_cls_str and optional scope), and
add an optional argument (e.g., package_whitelist: Optional[Iterable[str]] or
max_results/first_match: int) so the caller can limit search to specific
subpackages or stop after N matches; update the function body to skip packages
not in the whitelist and to short-circuit when max_results is reached, keeping
references to get_dynamic_modules, transformers_modules, matches, and the
import/package iteration logic so changes are easy to locate.
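One possible shape of that caching (the function and prefix below are illustrative, not the module's actual API): check already-imported modules first and memoize the result with functools.lru_cache, so a given query scans at most once.

```python
import functools
import json  # ensures at least one matching module is loaded for the demo
import sys

@functools.lru_cache(maxsize=None)
def find_loaded_classes(prefix: str, class_name: str) -> tuple:
    """Return dotted names of already-imported modules under `prefix`
    that expose an attribute `class_name`. The lru_cache makes repeat
    lookups free; a pkgutil.walk_packages fallback could run only when
    this returns an empty tuple."""
    hits = [
        f"{name}.{class_name}"
        for name, mod in list(sys.modules.items())
        if mod is not None
        and name.split(".")[0] == prefix
        and hasattr(mod, class_name)
    ]
    return tuple(sorted(hits))

first = find_loaded_classes("json", "JSONDecoder")
assert "json.JSONDecoder" in first
# The second call is served from the cache, not a fresh scan:
assert find_loaded_classes("json", "JSONDecoder") is first
```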
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py`:
- Around line 90-93: The MoE override assignments in the block-construction path
are writing None into override_kwargs when a partial BlockConfig.ffn.moe is
provided; update the logic in the block config handling (the code that builds
override_kwargs for MoE) to check each MoE field individually before setting
override_kwargs["moe_intermediate_size"] and override_kwargs["n_routed_experts"]
(i.e., only assign block_config.ffn.moe.expert_intermediate_dim and
block_config.ffn.moe.num_local_experts if they are not None), so that
unspecified MoE values do not overwrite model defaults in the NemotronH_v2 model
descriptor.
- Around line 168-174: The loop that filters singleton groups (weight_groups ->
valid_weight_groups) is too broad: it currently skips any single entry ending
with "norm.weight" and containing "layers", which can accidentally drop valid
tensors like "backbone.layers.<i>.mixer.norm.weight". Change the condition in
that loop (variables: weight_groups, valid_weight_groups, only_name) to only
skip when only_name exactly matches the root-layer norm pattern, e.g. use a
regex or precise string check to match "backbone.layers.<number>.norm.weight"
(backbone\.layers\.\d+\.norm\.weight) and only continue in that case. Ensure
other single-entry names that end with "norm.weight" but include additional
subblocks (like ".mixer.") are preserved.
- Around line 39-51: The dynamic discovery currently imports every module in the
shared transformers_modules cache; change get_dynamic_modules to first search
already-imported modules in sys.modules whose names start with
transformers_modules.__name__ + "." and collect matching classes, and only if
none found (or if a specific checkpoint package name is provided) limit
pkgutil.walk_packages to that specific checkpoint subpackage instead of walking
transformers_modules.__path__ entirely; update uses of pkgutil.walk_packages and
importlib.import_module in get_dynamic_modules to restrict imports to that
narrower package scope (reference symbols: get_dynamic_modules, module_cls_str,
transformers_modules, pkgutil.walk_packages, importlib.import_module).
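The narrowed singleton-group filter suggested for lines 168-174 can be exercised in isolation. This is a small sketch (the helper name `is_root_layer_norm` is illustrative); the regex matches only the duplicated root-layer norm, leaving nested names such as `.mixer.norm.weight` intact.

```python
import re

# Match only "backbone.layers.<number>.norm.weight", nothing nested.
ROOT_NORM_RE = re.compile(r"backbone\.layers\.\d+\.norm\.weight")


def is_root_layer_norm(name):
    return ROOT_NORM_RE.fullmatch(name) is not None


print(is_root_layer_norm("backbone.layers.12.norm.weight"))        # True
print(is_root_layer_norm("backbone.layers.12.mixer.norm.weight"))  # False
```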

In `@modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py`:
- Around line 24-28: The module currently does eager, top-level imports of
Qwen2DecoderLayer, Qwen2ForCausalLM, and Qwen2RotaryEmbedding which breaks when
transformers is absent; change to lazy imports by removing those module-level
imports and instead: add a TYPE_CHECKING guard to import those types for type
hints only, and import the actual classes inside the functions/methods that
instantiate or return them (e.g., wherever Qwen2ForCausalLM, Qwen2DecoderLayer,
or Qwen2RotaryEmbedding are used in this descriptor), or use the repository's
import_plugin() pattern to load transformers at call time so the optional
dependency is gated at runtime. Ensure all usage sites reference the locally
imported symbols and keep type hints under TYPE_CHECKING.
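The TYPE_CHECKING pattern described above looks roughly like this sketch. The `model_cls` name is illustrative; the key property is that the module imports cleanly even when transformers is absent, because the real import only runs at call time.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Annotation-only import: seen by type checkers, never executed at
    # runtime, so a missing optional dependency cannot break module import.
    from transformers.models.qwen2.modeling_qwen2 import Qwen2ForCausalLM


def model_cls() -> "type[Qwen2ForCausalLM]":
    # Deferred import: transformers is only required when this is called.
    from transformers.models.qwen2.modeling_qwen2 import Qwen2ForCausalLM

    return Qwen2ForCausalLM
```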

In
`@modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py`:
- Around line 22-26: The module-level imports of Qwen3VLMoeTextDecoderLayer,
Qwen3VLMoeTextRotaryEmbedding, and Qwen3VLMoeVisionRotaryEmbedding will raise
ModuleNotFoundError on transformers 4.56.x; either bump the transformers
requirement to >=4.57.0 in pyproject.toml, or change the top-level import in
qwen3_vl_model_descriptor.py to a guarded lazy import: wrap the current import
block for
Qwen3VLMoeTextDecoderLayer/Qwen3VLMoeTextRotaryEmbedding/Qwen3VLMoeVisionRotaryEmbedding
in try/except ImportError and fall back to importing alternatives or deferring
import inside the functions/classes that use them so the module can load under
older transformers versions.
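The try/except guard described above can be sketched generically. Here `fancy_layers` is a deliberately hypothetical package standing in for the version-gated transformers module, so the fallback path is what executes:

```python
try:
    from fancy_layers import FancyDecoderLayer  # hypothetical optional dep
except ImportError:
    FancyDecoderLayer = None


def decoder_layer_cls():
    # Fail at use time with an actionable message, not at import time.
    if FancyDecoderLayer is None:
        raise ImportError(
            "fancy_layers is required for this descriptor; "
            "install or upgrade the dependency that provides it"
        )
    return FancyDecoderLayer
```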

In `@modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_converter.py`:
- Around line 36-42: The block_configs list currently uses [block_config] *
num_hidden_layers which creates repeated references to the same BlockConfig
dict; change this to produce independent dicts per layer by constructing
block_configs with a comprehension or loop that calls BlockConfig(...).to_dict()
(or deep-copies) for each index so each entry is its own object—modify the code
around the BlockConfig/AttentionConfig/FFNConfig creation and the block_configs
variable in qwen3_converter.py to use a list comprehension like creating a new
BlockConfig per iteration rather than repeating the same dict reference.
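The aliasing pitfall behind this finding is easy to demonstrate with plain dicts standing in for `BlockConfig.to_dict()` output:

```python
num_hidden_layers = 4

# Pitfall: list multiplication repeats one dict object, not copies of it.
shared = [{"ffn_intermediate_dim": 1024}] * num_hidden_layers
shared[0]["ffn_intermediate_dim"] = 256
print(shared[3]["ffn_intermediate_dim"])  # 256 -- every layer changed

# Fix: build an independent dict per layer.
independent = [{"ffn_intermediate_dim": 1024} for _ in range(num_hidden_layers)]
independent[0]["ffn_intermediate_dim"] = 256
print(independent[3]["ffn_intermediate_dim"])  # 1024 -- others untouched
```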

In `@modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py`:
- Line 20: Remove the module-level "from megatron.bridge import AutoBridge" and
instead call import_plugin("megatron.bridge") inside the export routine (e.g.,
export_mbridge_to_hf) to lazily load the optional Megatron integration; assign
the returned module or object to a local name and use that (or its AutoBridge
attribute) where currently AutoBridge is referenced so imports don't fail when
Megatron is absent. Ensure you import and use the existing import_plugin()
helper and update any code that referred to the top-level AutoBridge symbol to
use the locally loaded module/attribute.

In `@modelopt/torch/puzzletron/mip/run_puzzle.py`:
- Around line 158-160: Assertion message references 'block_size' while the code
checks for 'batch_size' and 'generation_seq_len'; update the assertion message
in the assert statement that checks all(key in subblock_stats_args for key in
("batch_size", "generation_seq_len")) to mention 'batch_size' instead of
'block_size' (e.g., "Can't realize human constraints without 'batch_size' and
'generation_seq_len' in subblock_stats_args.") so the text matches the checked
keys.

In `@modelopt/torch/puzzletron/sewing_kit/utils.py`:
- Line 15: Remove the file-wide directive "# mypy: ignore-errors" at the top of
the sewing_kit.utils module; instead run mypy to locate the specific type
errors, add precise type annotations to public functions/classes in this module,
and apply narrowly-scoped ignores (e.g., "# type: ignore[assignment]" or "#
type: ignore[arg-type]") only on the specific Torch internals or import lines
that mypy cannot model; ensure exported symbols have explicit return/argument
types so the module is type-checkable without disabling mypy globally.
- Around line 94-126: Fix three bugs in ActivityContext: in __init__/__call__
stop using insert(self.head_index) for normal order — for reversed=True insert
at index 0, otherwise use list.append to push to the end so nested contexts
unwind LIFO; in __call__ only pop when this call actually pushed an item (set a
local flag pushed=False, set pushed=True after insert/append, and in finally
only pop if pushed using pop(0) for reversed or pop() for normal); and correct
the is_active check elsewhere to call the method (use self.is_active() not
self.is_active) so get_active() can return None for empty stacks. Ensure
references changed affect __init__, __call__, is_active, and get_active.
- Around line 357-373: Update _get_device_for_distributed to consistently return
torch.device objects (change the return type annotation from -> str to ->
torch.device); when not initialized return torch.device("cpu"), and when backend
is "nccl" return torch.device("cuda", torch.cuda.current_device()) (or
torch.device("cuda", index) using torch.cuda.current_device()) so both branches
return torch.device instances and usage with Tensor.to() is consistent.
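The stack-discipline fixes suggested for ActivityContext can be sketched with a simplified stand-in class (this is not the real `ActivityContext`, just an illustration of the corrected push/pop behavior, the `pushed` flag, and calling `is_active()` rather than referencing the bound method):

```python
from contextlib import contextmanager


class ActivityStack:
    def __init__(self, reversed_order=False):
        self._items = []
        self._reversed = reversed_order

    def push(self, item):
        # reversed=True inserts at the front; normal order appends to the end.
        if self._reversed:
            self._items.insert(0, item)
        else:
            self._items.append(item)

    def pop(self):
        return self._items.pop(0) if self._reversed else self._items.pop()

    def is_active(self):
        return bool(self._items)

    def get_active(self):
        # Call is_active(); a bare method reference is always truthy.
        if not self.is_active():
            return None
        return self._items[0] if self._reversed else self._items[-1]


@contextmanager
def activity(stack, item):
    pushed = False
    try:
        stack.push(item)
        pushed = True
        yield
    finally:
        if pushed:  # only pop what this call actually pushed
            stack.pop()


stack = ActivityStack()
with activity(stack, "outer"):
    with activity(stack, "inner"):
        print(stack.get_active())  # inner
print(stack.get_active())  # None -- nested contexts unwound LIFO
```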

In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py`:
- Around line 131-140: The call to calc_runtime_ms_for_subblocks inside the
block guarded by is_calc_runtime is unreachable because earlier code raises
NotImplementedError for is_calc_runtime==True; remove the dead call to
calc_runtime_ms_for_subblocks (and any related vars runtime_by_subblock_dict,
non_block_runtime_ms) or mark it explicitly as a TODO/placeholder with a clear
comment linking calc_subblock_stats and is_calc_runtime so future implementers
know to implement/ import calc_runtime_ms_for_subblocks; ensure no other
references to runtime_by_subblock_dict or non_block_runtime_ms remain in
calc_subblock_stats.
- Around line 317-320: Replace the incorrect use of the Warning(...) constructor
with warnings.warn(...) so the message is actually emitted (e.g.,
warnings.warn(f"MOE stats file {moe_stats_file} does not exist, can't calculate
num active params", UserWarning, stacklevel=2)); ensure the warnings module is
imported at top of the file and remove the unused Warning() call or exception
object assignment; locate the call by searching for the Warning(...) invocation
or the moe_stats_file variable in calc_subblock_stats.py to apply the change.
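The `Warning(...)` vs `warnings.warn(...)` distinction is worth seeing directly: constructing a `Warning` just builds an exception object and discards it, while `warnings.warn` actually emits. A minimal demonstration (the path is a placeholder):

```python
import warnings

moe_stats_file = "/tmp/moe_stats.json"  # hypothetical path for illustration

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Bug: this only constructs an exception object; nothing is emitted.
    Warning(f"MOE stats file {moe_stats_file} does not exist")
    # Fix: warnings.warn actually emits the message.
    warnings.warn(
        f"MOE stats file {moe_stats_file} does not exist, "
        "can't calculate num active params",
        UserWarning,
        stacklevel=2,
    )

print(len(caught))  # 1 -- only the warnings.warn call was recorded
```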

---

Duplicate comments:
In `@modelopt/torch/puzzletron/mip/run_puzzle.py`:
- Around line 396-424: In _assert_valid_config, the checks like
`"subblock_stats_args" not in args` and similar use the in-operator on
argparse.Namespace (causing TypeError); change those presence checks to use
hasattr(args, "subblock_stats_args") or getattr(args, "subblock_stats_args") is
not None for each CLI arg check, while keeping the existing membership checks
against puzzle_profile (which is a dict); update the two conditional blocks that
validate subblock_stats_args and the constraints (mip_constraints,
human_constraints) to use hasattr/getattr on args and keep the puzzle_profile
checks as-is so validation works for both Namespace and profile.

In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py`:
- Around line 404-415: The fallback loader
_load_subblock_configs_from_subblock_library currently returns raw
AttentionConfig/FFNConfig objects but calculate_subblock_stats expects each item
to be a mapping with keys "subblock_config" and "parent_layer_indices"; change
_load_subblock_configs_from_subblock_library to wrap each config into a dict
like {"subblock_config": <config>, "parent_layer_indices": []} (use an empty
list or appropriate default for parent_layer_indices) for both attention_configs
and ffn_configs so the combined subblock_configs list matches the shape expected
by calculate_subblock_stats.
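The shape fix above amounts to wrapping each raw config in the expected mapping. A minimal sketch (the function name is illustrative, and strings stand in for AttentionConfig/FFNConfig objects):

```python
def wrap_subblock_configs(attention_configs, ffn_configs):
    """Wrap raw configs into the mapping shape calculate_subblock_stats
    expects: {"subblock_config": ..., "parent_layer_indices": [...]}."""
    return [
        {"subblock_config": cfg, "parent_layer_indices": []}
        for cfg in [*attention_configs, *ffn_configs]
    ]


subblocks = wrap_subblock_configs(["attn_cfg_a"], ["ffn_cfg_a", "ffn_cfg_b"])
print(subblocks[0])  # {'subblock_config': 'attn_cfg_a', 'parent_layer_indices': []}
```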

---

Nitpick comments:
In
`@modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py`:
- Around line 119-126: The _block_no_op_post_init function assumes decoder_layer
has config, block_configs, and layer_idx; add defensive checks at the start of
_block_no_op_post_init to avoid AttributeError (e.g., if called on a
DummyBlock): verify hasattr(decoder_layer, "config"),
hasattr(decoder_layer.config, "block_configs"), and hasattr(decoder_layer,
"layer_idx") (or use a try/except AttributeError), ensure
decoder_layer.layer_idx is an int within range of
decoder_layer.config.block_configs, and return early if any check fails; then
proceed to access block_config and apply the existing ffn/attention no_op logic
on decoder_layer.norm and decoder_layer.mixer.
- Around line 40-52: get_dynamic_modules currently imports every module under
transformers_modules which can be slow; to fix, add caching and an optional
scope parameter: decorate get_dynamic_modules with functools.lru_cache (or
maintain a module-level cache keyed by module_cls_str and optional scope), and
add an optional argument (e.g., package_whitelist: Optional[Iterable[str]] or
max_results/first_match: int) so the caller can limit search to specific
subpackages or stop after N matches; update the function body to skip packages
not in the whitelist and to short-circuit when max_results is reached, keeping
references to get_dynamic_modules, transformers_modules, matches, and the
import/package iteration logic so changes are easy to locate.
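The defensive prologue suggested for `_block_no_op_post_init` can be sketched like this (a simplified stand-in: it returns the resolved block config instead of applying the real no-op logic, and uses `SimpleNamespace` objects in place of decoder layers):

```python
from types import SimpleNamespace


def block_no_op_post_init(decoder_layer):
    """Return early instead of raising AttributeError/IndexError when
    called on an object (e.g. a DummyBlock) missing expected attributes."""
    if not (
        hasattr(decoder_layer, "config")
        and hasattr(decoder_layer.config, "block_configs")
        and hasattr(decoder_layer, "layer_idx")
    ):
        return None
    idx = decoder_layer.layer_idx
    configs = decoder_layer.config.block_configs
    if not isinstance(idx, int) or not 0 <= idx < len(configs):
        return None
    return configs[idx]  # the real code would apply ffn/attention no-ops here


dummy = SimpleNamespace()  # no config/layer_idx at all
print(block_no_op_post_init(dummy))  # None

layer = SimpleNamespace(
    config=SimpleNamespace(block_configs=["b0", "b1"]), layer_idx=1
)
print(block_no_op_post_init(layer))  # b1
```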

In `@modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py`:
- Line 15: The file-wide "# mypy: ignore-errors" must be removed and replaced by
precise type annotations on the static methods (e.g., add return types to
decoder_layer_cls, input_embedding_name, layer_block_name(index: int), and other
static methods) and only use targeted "# type: ignore" comments for specific
untyped third-party imports; for example declare decoder_layer_cls() ->
Type[Any] or the proper transformer layer class, input_embedding_name() -> str,
layer_block_name(index: int) -> str, and any other methods with explicit return
types so mypy can validate the file while leaving isolated ignores on the import
lines that actually require them.

In `@modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_converter.py`:
- Around line 74-76: Replace the direct print call with the project logging
helper: import mprint from modelopt.torch.puzzletron.tools.logger and change the
print(...) in qwen3_vl_converter.py to mprint(f"Created {len(block_configs)}
block configs for Qwen3-VL MoE (decoder_sparse_step={decoder_sparse_step})") so
the message uses the consistent logger; ensure the import is added near other
imports and no other behavior changes to block_configs or decoder_sparse_step.

In `@modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_converter.py`:
- Line 33: The function create_block_configs_from_main_config currently declares
a return type of List[BlockConfig] but actually returns a list of dicts (via
.to_dict()); update the signature to return List[Dict[str, Any]] and add the
necessary typing imports (Dict, Any) so the annotation matches the actual return
value, and scan related converters (e.g., the Llama converter and base converter
methods) to ensure callers expect dicts or adjust them consistently.

In `@modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_model_descriptor.py`:
- Around line 64-65: The method block_config_to_layer_overrides in
qwen3_model_descriptor.py is missing the return type annotation declared on the
base class; update its signature to include -> Dict[str, Any] (import Dict and
Any from typing if not already present) so it matches the base class declaration
in model_descriptor.py (see block_config_to_layer_overrides and BlockConfig) and
restores proper type safety and IDE support.

In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py`:
- Line 27: The module currently sets os.environ["PYTORCH_CUDA_ALLOC_CONF"] at
import time which creates a global side effect; instead create a small helper
(e.g. _set_cuda_alloc_config()) that sets that env var only when needed, remove
the top-level assignment, and call _set_cuda_alloc_config() at the start of
calculate_subblock_stats and calculate_subblock_stats_for_puzzle_dir so the CUDA
config is applied only when those functions run (ensure the helper is idempotent
to avoid repeated sets).
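An idempotent version of the proposed `_set_cuda_alloc_config()` helper might look like this sketch (the allocator value shown is an assumed example, not taken from the PR):

```python
import os

# Start from a clean slate for the demo.
os.environ.pop("PYTORCH_CUDA_ALLOC_CONF", None)


def _set_cuda_alloc_config(value="expandable_segments:True"):
    """Apply the CUDA allocator setting on first use rather than as an
    import-time side effect; repeat calls (and pre-set env) are no-ops."""
    if os.environ.get("PYTORCH_CUDA_ALLOC_CONF") is not None:
        return
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = value


_set_cuda_alloc_config()
_set_cuda_alloc_config("some_other_value")  # no-op: already configured
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])  # expandable_segments:True
```

The entry points (`calculate_subblock_stats` and friends) would then call this helper at the top of their bodies.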

In `@modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py`:
- Around line 165-166: Remove the duplicated assignment of actual_io_workers and
its associated mprint call in init_child_from_parent.py: keep the first
computation (actual_io_workers = max_workers if max_workers else "auto") and the
initial mprint("I/O Settings:..."), and delete the later identical reassignment
and print (the repeated lines around where actual_io_workers is set again near
line 181). This removes the redundant code while leaving any subsequent uses of
actual_io_workers and the mprint calls (function mprint) intact.
- Around line 90-91: Wrap the json.loads call that parses
model_config_overrides_dict in a try-except that catches json.JSONDecodeError,
so malformed JSON yields a clear, actionable error; specifically, around the
block that checks isinstance(model_config_overrides_dict, str) and calls
json.loads, catch JSONDecodeError and raise a ValueError (or log and raise) with
a message that includes that model_config_overrides_dict failed to parse and
include the offending string (or truncated preview) and the original exception
message to aid debugging.
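The JSON-parsing guard described above can be sketched as follows (the function name is illustrative):

```python
import json


def parse_config_overrides(overrides):
    """Parse a JSON overrides string, surfacing a clear, actionable error
    (including a truncated preview of the input) on malformed JSON."""
    if not isinstance(overrides, str):
        return overrides
    try:
        return json.loads(overrides)
    except json.JSONDecodeError as e:
        preview = overrides[:200]
        raise ValueError(
            f"model_config_overrides_dict failed to parse as JSON ({e}); "
            f"input preview: {preview!r}"
        ) from e


print(parse_config_overrides('{"num_layers": 12}'))  # {'num_layers': 12}
```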
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: dad8af7f-6d6b-49c2-bbc4-3bd3b55b551b

📥 Commits

Reviewing files that changed from the base of the PR and between 7e15fdd and 75651cc.

📒 Files selected for processing (23)
  • .pre-commit-config.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/pruning/ffn_pruning.yaml
  • modelopt/torch/puzzletron/anymodel/models/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_converter.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_converter.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py
  • modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py
  • modelopt/torch/puzzletron/mip/run_puzzle.py
  • modelopt/torch/puzzletron/mip/utils.py
  • modelopt/torch/puzzletron/sewing_kit/core.py
  • modelopt/torch/puzzletron/sewing_kit/utils.py
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
  • modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
  • pyproject.toml
  • tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py
  • tests/gpu/torch/puzzletron/resources/configs/Qwen/Qwen3-8B/pruning/ffn_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/Qwen/Qwen3-VL-30B-A3B-Instruct/pruning/expert_pruning.yaml
✅ Files skipped from review due to trivial changes (5)
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3/__init__.py
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/pruning/ffn_pruning.yaml
  • modelopt/torch/puzzletron/mip/utils.py
  • modelopt/torch/puzzletron/anymodel/models/__init__.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • .pre-commit-config.yaml
  • modelopt/torch/puzzletron/sewing_kit/core.py

Comment on lines +39 to +51
def get_dynamic_modules(module_cls_str: str) -> List[Type[nn.Module]]:
    import transformers_modules

    matches = []
    for finder, modname, ispkg in pkgutil.walk_packages(
        transformers_modules.__path__, transformers_modules.__name__ + "."
    ):
        module = importlib.import_module(modname)
        for _, obj in inspect.getmembers(module, inspect.isclass):
            if obj.__name__ == module_cls_str:
                matches.append(obj)

    return matches
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Hugging Face Transformers, how is the transformers_modules namespace for trust_remote_code models populated? Is it shared across cached repos/revisions, and does importing every discovered submodule execute top-level code from unrelated cached models?

💡 Result:

transformers_modules is not a “registry” that Transformers eagerly fills; it’s a filesystem-backed Python package tree rooted at HF_MODULES_CACHE (by default under your HF cache), and it’s populated lazily when you load a trust_remote_code=True model.

How it gets populated

When Transformers needs a class from remote code, it calls get_cached_module_file(...), which:

  1. Ensures HF_MODULES_CACHE exists, contains a top-level __init__.py, and is added to sys.path (init_hf_modules). (raw.githubusercontent.com)
  2. Creates a package path HF_MODULES_CACHE/transformers_modules/<sanitized_repo_id>/... with __init__.py files (create_dynamic_module). (raw.githubusercontent.com)
  3. Copies (or downloads+copies) the requested module file (and any relative-imported .py files) into that package tree. (raw.githubusercontent.com)
  4. Imports the module by file path and executes it via module_spec.loader.exec_module(module) when the class is needed (get_class_in_module). (raw.githubusercontent.com)
    Transformers explicitly warns that calling this mechanism “will execute the code in the module file”. (raw.githubusercontent.com)

Is the namespace shared across cached repos / revisions?

Yes, in the sense that there is one shared root package (transformers_modules) under HF_MODULES_CACHE that accumulates subpackages for whatever you’ve loaded on that machine/env. (raw.githubusercontent.com)

But remote-code revisions are not merged: for non-local repos, Transformers places code under an additional commit-hash subfolder, e.g.:
.../transformers_modules/<repo_id>/<commit_hash>/module.py (raw.githubusercontent.com)
So different revisions get distinct import paths.

Does “importing every discovered submodule” execute unrelated cached models’ top-level code?

  • Importing a module executes its top-level code (and get_class_in_module uses exec_module, which does exactly that). (raw.githubusercontent.com)
  • Transformers itself does not automatically import all cached transformers_modules.* modules; it imports only the specific module needed for the class you’re loading. (raw.githubusercontent.com)
  • However, if you enumerate the transformers_modules package (e.g., via pkgutil.walk_packages) and then importlib.import_module(...) every discovered module, then yes: you would end up executing top-level code from unrelated cached remote-code modules—because you explicitly imported them. (raw.githubusercontent.com)


Scope dynamic-module discovery to the selected checkpoint.

Lines 39–51 walk the entire transformers_modules namespace and import every discovered module. Since transformers_modules is a shared filesystem cache that accumulates remote-code packages from all previously loaded trust_remote_code=True models, this code unnecessarily executes top-level code from unrelated cached models. The function should discover modules only within the package cached for the specific checkpoint being converted, or from modules already loaded in the current session.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py`
around lines 39 - 51, The dynamic discovery currently imports every module in
the shared transformers_modules cache; change get_dynamic_modules to first
search already-imported modules in sys.modules whose names start with
transformers_modules.__name__ + "." and collect matching classes, and only if
none found (or if a specific checkpoint package name is provided) limit
pkgutil.walk_packages to that specific checkpoint subpackage instead of walking
transformers_modules.__path__ entirely; update uses of pkgutil.walk_packages and
importlib.import_module in get_dynamic_modules to restrict imports to that
narrower package scope (reference symbols: get_dynamic_modules, module_cls_str,
transformers_modules, pkgutil.walk_packages, importlib.import_module).

Comment on lines +90 to +93
        if block_config.ffn is not None and block_config.ffn.moe is not None:
            override_kwargs["moe_intermediate_size"] = block_config.ffn.moe.expert_intermediate_dim
            override_kwargs["n_routed_experts"] = block_config.ffn.moe.num_local_experts


⚠️ Potential issue | 🟠 Major

Don't write None into MoE constructor overrides.

Unlike the dense FFN and attention branches, Lines 91-92 forward the MoE fields without checking for None. A partial BlockConfig.ffn.moe will overwrite model defaults with None, which can break block construction and sweep configs that change only one MoE dimension.

🐛 Proposed fix
-        if block_config.ffn is not None and block_config.ffn.moe is not None:
-            override_kwargs["moe_intermediate_size"] = block_config.ffn.moe.expert_intermediate_dim
-            override_kwargs["n_routed_experts"] = block_config.ffn.moe.num_local_experts
+        if block_config.ffn is not None and block_config.ffn.moe is not None:
+            if block_config.ffn.moe.expert_intermediate_dim is not None:
+                override_kwargs["moe_intermediate_size"] = (
+                    block_config.ffn.moe.expert_intermediate_dim
+                )
+            if block_config.ffn.moe.num_local_experts is not None:
+                override_kwargs["n_routed_experts"] = block_config.ffn.moe.num_local_experts
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py`
around lines 90 - 93, The MoE override assignments in the block-construction
path are writing None into override_kwargs when a partial BlockConfig.ffn.moe is
provided; update the logic in the block config handling (the code that builds
override_kwargs for MoE) to check each MoE field individually before setting
override_kwargs["moe_intermediate_size"] and override_kwargs["n_routed_experts"]
(i.e., only assign block_config.ffn.moe.expert_intermediate_dim and
block_config.ffn.moe.num_local_experts if they are not None), so that
unspecified MoE values do not overwrite model defaults in the NemotronH_v2 model
descriptor.

Comment on lines +168 to +174
        valid_weight_groups = {}
        for group, names in weight_groups.items():
            if len(names) == 1:
                only_name = names[0]
                if only_name.endswith("norm.weight") and "layers" in only_name:
                    # Skip and don't append this group to valid_weight_groups
                    continue

⚠️ Potential issue | 🟠 Major

Narrow the singleton-group filter to the duplicated root norm only.

Line 172 drops any one-element group ending in norm.weight, which also matches unique tensors such as backbone.layers.<i>.mixer.norm.weight. If a subblock legitimately has only that weight, get_weight_groups() will silently omit it from the checkpoint map. Match only backbone.layers.<i>.norm.weight here.

🐛 Proposed fix
-                if only_name.endswith("norm.weight") and "layers" in only_name:
+                if re.fullmatch(r"backbone\.layers\.\d+\.norm\.weight", only_name):
                     # Skip and don't append this group to valid_weight_groups
                     continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py`
around lines 168 - 174, The loop that filters singleton groups (weight_groups ->
valid_weight_groups) is too broad: it currently skips any single entry ending
with "norm.weight" and containing "layers", which can accidentally drop valid
tensors like "backbone.layers.<i>.mixer.norm.weight". Change the condition in
that loop (variables: weight_groups, valid_weight_groups, only_name) to only
skip when only_name exactly matches the root-layer norm pattern, e.g. use a
regex or precise string check to match "backbone.layers.<number>.norm.weight"
(backbone\.layers\.\d+\.norm\.weight) and only continue in that case. Ensure
other single-entry names that end with "norm.weight" but include additional
subblocks (like ".mixer.") are preserved.

Comment on lines +24 to +28
from transformers.models.qwen2.modeling_qwen2 import (
    Qwen2DecoderLayer,
    Qwen2ForCausalLM,
    Qwen2RotaryEmbedding,
)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Verify module-level HF imports in descriptor modules
rg -n --type=py '^\s*(from|import)\s+transformers' modelopt/torch/puzzletron/anymodel/models -C2

# Locate existing import_plugin usage to align this file with repo pattern
rg -n --type=py '\bimport_plugin\(' modelopt -C2

Repository: NVIDIA/Model-Optimizer

Length of output: 24064


🏁 Script executed:

# Read the full qwen2_model_descriptor.py to see how imports are used
head -100 modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py

Repository: NVIDIA/Model-Optimizer

Length of output: 3511


🏁 Script executed:

# Check if Qwen2ForCausalLM, Qwen2DecoderLayer are used in type hints at module level
rg -A 5 'Qwen2ForCausalLM|Qwen2DecoderLayer|Qwen2RotaryEmbedding' modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py

Repository: NVIDIA/Model-Optimizer

Length of output: 1140


🏁 Script executed:

# Check how descriptor modules are loaded - search for registration/loading patterns
rg -n 'qwen2_model_descriptor|register.*descriptor' modelopt/torch/puzzletron -C 3

Repository: NVIDIA/Model-Optimizer

Length of output: 2186


Use lazy imports to gate optional transformers dependency in this descriptor module.

Lines 24–28 eagerly import Qwen2 classes at module level, making the descriptor fragile in environments without transformers installed. Per coding guidelines, optional dependencies must use lazy imports.

Move imports to function scope:

  • Use TYPE_CHECKING guard for type hints in method signatures
  • Import class at each usage site where it's instantiated or returned

This pattern aligns with the codebase's import_plugin() approach used throughout optional integrations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py`
around lines 24 - 28, The module currently does eager, top-level imports of
Qwen2DecoderLayer, Qwen2ForCausalLM, and Qwen2RotaryEmbedding which breaks when
transformers is absent; change to lazy imports by removing those module-level
imports and instead: add a TYPE_CHECKING guard to import those types for type
hints only, and import the actual classes inside the functions/methods that
instantiate or return them (e.g., wherever Qwen2ForCausalLM, Qwen2DecoderLayer,
or Qwen2RotaryEmbedding are used in this descriptor), or use the repository's
import_plugin() pattern to load transformers at call time so the optional
dependency is gated at runtime. Ensure all usage sites reference the locally
imported symbols and keep type hints under TYPE_CHECKING.

Comment on lines +22 to +26
from transformers.models.qwen3_vl_moe.modeling_qwen3_vl_moe import (
    Qwen3VLMoeTextDecoderLayer,
    Qwen3VLMoeTextRotaryEmbedding,
    Qwen3VLMoeVisionRotaryEmbedding,
)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Does transformers library version 4.56 support Qwen3VLMoe model with module path transformers.models.qwen3_vl_moe?

💡 Result:

No. In the Transformers 4.56.x line, the transformers.models.qwen3_vl_moe package/module does not exist—src/transformers/models/qwen3_vl_moe was added in 4.57.0 (it appears in the 4.56.2 → 4.57.0 diff as a newly added directory). [1] The 4.57.0 release notes also list Qwen3-related additions, consistent with support landing in 4.57+. [2]

So you need Transformers >= 4.57.0 for that module path/model support. [1][3]

Sources: [1] [2] [3]


🏁 Script executed:

# Find dependency configuration files
git ls-files | grep -E '(pyproject\.toml|requirements|setup\.py)' | head -20

Repository: NVIDIA/Model-Optimizer

Length of output: 867


🏁 Script executed:

# Check for main project config files at root
ls -la pyproject.toml setup.py setup.cfg 2>/dev/null || echo "Checking for them..."

# Also search for pyproject.toml anywhere
find . -maxdepth 2 -name "pyproject.toml" -type f

Repository: NVIDIA/Model-Optimizer

Length of output: 167


🏁 Script executed:

# Check the transformers dependency version in pyproject.toml
cat pyproject.toml | grep -A 5 -B 5 "transformers"

Repository: NVIDIA/Model-Optimizer

Length of output: 578


🏁 Script executed:

# Check the full file to see if imports are at module level and how the descriptor is registered
cat -n modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py | head -50

Repository: NVIDIA/Model-Optimizer

Length of output: 2179


🏁 Script executed:

# Check if this descriptor module is imported or registered anywhere
rg "qwen3_vl_model_descriptor\|Qwen3VLModelDescriptor" --type py | head -20

Repository: NVIDIA/Model-Optimizer

Length of output: 48


🏁 Script executed:

# Check how model descriptors are discovered/registered
find modelopt/torch/puzzletron/anymodel/models -name "__init__.py" -exec cat {} \;

Repository: NVIDIA/Model-Optimizer

Length of output: 8632


🏁 Script executed:

# Check what's in qwen3_vl __init__.py to confirm eager import
cat modelopt/torch/puzzletron/anymodel/models/qwen3_vl/__init__.py

Repository: NVIDIA/Model-Optimizer

Length of output: 965


Update minimum transformers version or use lazy imports for Qwen3-VL MoE imports.

The import at lines 22-26 uses transformers.models.qwen3_vl_moe.modeling_qwen3_vl_moe, which was added in transformers 4.57.0. However, pyproject.toml specifies transformers>=4.56,<5.0, allowing 4.56.x where this module does not exist. This will cause an immediate ModuleNotFoundError when the module is loaded.

Either update pyproject.toml to require transformers>=4.57.0 or wrap these imports in a try/except block with lazy loading to handle both versions gracefully.
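A minimal sketch of the guarded-import option described above (the symbol names are taken from the finding; the `None` fallbacks and the availability flag are assumptions about how the descriptor might degrade on older transformers versions):

```python
# Guarded import: the qwen3_vl_moe module only exists in transformers >= 4.57.
# On older versions we record unavailability instead of crashing at import time.
try:
    from transformers.models.qwen3_vl_moe.modeling_qwen3_vl_moe import (
        Qwen3VLMoeTextDecoderLayer,
        Qwen3VLMoeTextRotaryEmbedding,
        Qwen3VLMoeVisionRotaryEmbedding,
    )

    HAS_QWEN3_VL_MOE = True
except ImportError:
    # Placeholders so the module still loads; registration code can check the flag.
    Qwen3VLMoeTextDecoderLayer = None
    Qwen3VLMoeTextRotaryEmbedding = None
    Qwen3VLMoeVisionRotaryEmbedding = None
    HAS_QWEN3_VL_MOE = False
```

Code that registers the descriptor would then skip registration when `HAS_QWEN3_VL_MOE` is `False` rather than failing on import.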

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py`
around lines 22 - 26, The module-level imports of Qwen3VLMoeTextDecoderLayer,
Qwen3VLMoeTextRotaryEmbedding, and Qwen3VLMoeVisionRotaryEmbedding will raise
ModuleNotFoundError on transformers 4.56.x; either bump the transformers
requirement to >=4.57.0 in pyproject.toml, or change the top-level import in
qwen3_vl_model_descriptor.py to a guarded lazy import: wrap the current import
block for
Qwen3VLMoeTextDecoderLayer/Qwen3VLMoeTextRotaryEmbedding/Qwen3VLMoeVisionRotaryEmbedding
in try/except ImportError and fall back to importing alternatives or deferring
import inside the functions/classes that use them so the module can load under
older transformers versions.

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# mypy: ignore-errors
⚠️ Potential issue | 🟡 Minor

Avoid file-wide mypy: ignore-errors.

This disables type checking for the entire module and makes it much easier to miss bugs in a new shared utility surface. Prefer targeted # type: ignore[...] on the specific Torch internals that mypy cannot model.

As per coding guidelines, "Ensure type hints are properly annotated for static type checking with mypy".
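For illustration, a narrowly scoped ignore looks like this (the function and the untyped payload here are hypothetical, not code from `sewing_kit/utils.py`):

```python
from typing import Any


def load_layer_ids(raw: dict[str, Any]) -> list[int]:
    """Hypothetical helper: extract layer ids from an untyped payload."""
    layers = raw.get("layers", [])
    # A file-wide "# mypy: ignore-errors" would hide real bugs elsewhere in the
    # module; instead suppress only this line, with an explicit error code.
    return layers  # type: ignore[no-any-return]
```

The error code in brackets keeps the suppression auditable: if the underlying type issue is later fixed, mypy's `--warn-unused-ignores` flags the stale ignore.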

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/sewing_kit/utils.py` at line 15, Remove the
file-wide directive "# mypy: ignore-errors" at the top of the sewing_kit.utils
module; instead run mypy to locate the specific type errors, add precise type
annotations to public functions/classes in this module, and apply
narrowly-scoped ignores (e.g., "# type: ignore[assignment]" or "# type:
ignore[arg-type]") only on the specific Torch internals or import lines that
mypy cannot model; ensure exported symbols have explicit return/argument types
so the module is type-checkable without disabling mypy globally.

Comment on lines +94 to +126
def __init__(self, max_depth: Optional[int] = None, no_duplicates=False, reversed=False):
self.activity_stack: list[T] = []
self.max_depth = max_depth
self.no_duplicates = no_duplicates
self.reversed = reversed
self.head_index = 0 if self.reversed else -1

def __contains__(self, value: T) -> bool:
result = value in self.activity_stack
return result

def __call__(self, value: T) -> ContextManager:
@contextmanager
def fn():
try:
if self.no_duplicates and value in self.activity_stack:
raise ActivityContextDuplicateException(
f"Activity stack cannot have a duplicate of item {value}"
)

self.activity_stack.insert(self.head_index, value)

if self.max_depth is not None and len(self) > self.max_depth:
raise ActivityContextMaxDepthException(
f"Activity stack exceeds max depth of {self.max_depth}"
)

yield
finally:
assert self.is_active()
self.activity_stack.pop(self.head_index)

return fn()

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "utils.py" -path "*/puzzletron/sewing_kit/*" -type f

Repository: NVIDIA/Model-Optimizer

Length of output: 113


🏁 Script executed:

cat -n "modelopt/torch/puzzletron/sewing_kit/utils.py" | head -160

Repository: NVIDIA/Model-Optimizer

Length of output: 5831


🏁 Script executed:

grep -n "get_active" "modelopt/torch/puzzletron/sewing_kit/utils.py"

Repository: NVIDIA/Model-Optimizer

Length of output: 110


🏁 Script executed:

find . -name "core.py" -path "*/passage/*" -type f

Repository: NVIDIA/Model-Optimizer

Length of output: 120


🏁 Script executed:

grep -n "ActivityContext\|max_depth\|no_duplicates" "modelopt/torch/puzzletron/sewing_kit/passage/core.py" | head -20

Repository: NVIDIA/Model-Optimizer

Length of output: 248


🏁 Script executed:

rg -A 3 -B 3 "ActivityContext" "modelopt/torch/puzzletron/sewing_kit/passage/core.py"

Repository: NVIDIA/Model-Optimizer

Length of output: 419


🏁 Script executed:

python3 << 'EOF'
# Test list.insert(-1, value) behavior
test_list = [1, 2, 3]
print("Original list:", test_list)

# Insert -1 with value 99
test_list_copy = test_list.copy()
test_list_copy.insert(-1, 99)
print("After insert(-1, 99):", test_list_copy)
print("Does insert(-1, value) insert BEFORE the last element? Yes")

# What append does
test_list_copy2 = test_list.copy()
test_list_copy2.append(99)
print("After append(99):", test_list_copy2)

# Test with default reversed=False (head_index = -1)
print("\n--- Simulating ActivityContext with reversed=False (head_index=-1) ---")
stack = []
stack.insert(-1, "A")  # head_index=-1
print(f"After insert(-1, 'A'): {stack}")
stack.insert(-1, "B")  # head_index=-1
print(f"After insert(-1, 'B'): {stack}")
stack.insert(-1, "C")  # head_index=-1
print(f"After insert(-1, 'C'): {stack}")
print(f"pop(-1) removes: {stack.pop(-1)}")
print(f"Stack after pop(-1): {stack}")
print(f"pop(-1) removes: {stack.pop(-1)}")
print(f"Stack after pop(-1): {stack}")

# Show expected behavior with append/pop
print("\n--- Expected behavior (append, pop(-1)) ---")
stack2 = []
stack2.append("A")
print(f"After append('A'): {stack2}")
stack2.append("B")
print(f"After append('B'): {stack2}")
stack2.append("C")
print(f"After append('C'): {stack2}")
print(f"pop(-1) removes: {stack2.pop(-1)}")
print(f"Stack after pop(-1): {stack2}")
EOF

Repository: NVIDIA/Model-Optimizer

Length of output: 674


🏁 Script executed:

grep -B 10 -A 10 "ActivityContext(" "modelopt/torch/puzzletron/sewing_kit/passage/core.py" | head -60

Repository: NVIDIA/Model-Optimizer

Length of output: 48


🏁 Script executed:

grep -n "ActivityContext\|max_depth\|no_duplicates" "modelopt/torch/puzzletron/sewing_kit/passage/core.py"

Repository: NVIDIA/Model-Optimizer

Length of output: 248


Fix ActivityContext enter/exit bookkeeping — three critical bugs.

  1. list.insert(-1, value) inserts before the last element, not at the end. With reversed=False (default), the stack winds up as ['B', 'C', 'A'] instead of ['A', 'B', 'C'], causing nested contexts to unwind in the wrong order.

  2. The finally block unconditionally pops even when ActivityContextDuplicateException is raised before the insert happens, removing an already-active entry that was never pushed in this call.

  3. Line 145: if self.is_active: checks the method object without calling it, so the condition is always truthy. This means get_active() never returns None for empty stacks and instead attempts to index an empty list, raising IndexError.

These issues manifest in passage/core.py where create_fn_context uses max_depth=1 and active_passages_context uses no_duplicates=True.

Suggested fix
 class ActivityContext(Generic[T]):
     def __call__(self, value: T) -> ContextManager:
         `@contextmanager`
         def fn():
+            inserted = False
             try:
                 if self.no_duplicates and value in self.activity_stack:
                     raise ActivityContextDuplicateException(
                         f"Activity stack cannot have a duplicate of item {value}"
                     )
 
-                self.activity_stack.insert(self.head_index, value)
+                if self.reversed:
+                    self.activity_stack.insert(0, value)
+                else:
+                    self.activity_stack.append(value)
+                inserted = True
 
                 if self.max_depth is not None and len(self) > self.max_depth:
                     raise ActivityContextMaxDepthException(
                         f"Activity stack exceeds max depth of {self.max_depth}"
                     )
 
                 yield
             finally:
-                assert self.is_active()
-                self.activity_stack.pop(self.head_index)
+                if inserted:
+                    assert self.is_active()
+                    self.activity_stack.pop(0 if self.reversed else -1)
 
         return fn()
@@
     def get_active(self) -> Optional[T]:
-        if self.is_active:
+        if self.is_active():
             return self.activity_stack[-1]
-        else:
-            return None
+        return None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/sewing_kit/utils.py` around lines 94 - 126, Fix
three bugs in ActivityContext: in __init__/__call__ stop using
insert(self.head_index) for normal order — for reversed=True insert at index 0,
otherwise use list.append to push to the end so nested contexts unwind LIFO; in
__call__ only pop when this call actually pushed an item (set a local flag
pushed=False, set pushed=True after insert/append, and in finally only pop if
pushed using pop(0) for reversed or pop() for normal); and correct the is_active
check elsewhere to call the method (use self.is_active() not self.is_active) so
get_active() can return None for empty stacks. Ensure references changed affect
__init__, __call__, is_active, and get_active.

Comment on lines +357 to +373
def _get_device_for_distributed(
group: Optional[torch.distributed.ProcessGroup] = None,
) -> str:
"""
Determine the appropriate device for distributed communication based on the backend.
NCCL backend requires CUDA tensors, while Gloo supports both CPU and CUDA.
"""
if not torch.distributed.is_initialized():
return "cpu"

backend = torch.distributed.get_backend(group)
if backend == "nccl":
# NCCL requires CUDA tensors
return torch.cuda.current_device()
else:
# Gloo and other backends support CPU tensors
return "cpu"

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, locate and examine the function
find . -name "utils.py" -path "*/puzzletron/sewing_kit/*" | head -5

Repository: NVIDIA/Model-Optimizer

Length of output: 113


🏁 Script executed:

# Search for the file containing the function
find . -type f -name "utils.py" | xargs grep -l "_get_device_for_distributed" 2>/dev/null

Repository: NVIDIA/Model-Optimizer

Length of output: 113


🏁 Script executed:

# Also check the entire puzzletron directory structure
find . -type d -name "puzzletron" 2>/dev/null

Repository: NVIDIA/Model-Optimizer

Length of output: 209


🏁 Script executed:

# Examine the function implementation
cat -n ./modelopt/torch/puzzletron/sewing_kit/utils.py | sed -n '350,380p'

Repository: NVIDIA/Model-Optimizer

Length of output: 1165


🏁 Script executed:

# Find all callers of _get_device_for_distributed
rg "_get_device_for_distributed" -B 2 -A 5 --no-heading

Repository: NVIDIA/Model-Optimizer

Length of output: 2217


🏁 Script executed:

# Search for distributed_recv_obj and _object_to_tensor to see how the device is used
rg "distributed_recv_obj|_object_to_tensor" -B 2 -A 10 --no-heading -t py

Repository: NVIDIA/Model-Optimizer

Length of output: 10214


🏁 Script executed:

# Let me check the exact line where .to(device) is called to understand the context better
cat -n ./modelopt/torch/puzzletron/sewing_kit/utils.py | sed -n '395,415p'

Repository: NVIDIA/Model-Optimizer

Length of output: 724


🌐 Web query:

PyTorch torch.cuda.current_device return type and Tensor.to() device parameter documentation

💡 Result:

  • torch.cuda.current_device() return type: It returns the index of the currently selected CUDA device as a Python int. [1]

  • Tensor.to() “device” argument (what you can pass):

    • Tensor.to() supports a call form torch.to(device=None, dtype=None, ...), which returns a tensor on the specified device (and optionally dtype). [2]
    • For the device you can pass a torch.device, a properly formatted device string (e.g. "cuda:1", "cpu", "mps"), or an integer device ordinal (interpreted as the current accelerator type’s device index). [3]

Sources: PyTorch docs for torch.cuda.current_device() [1], torch.Tensor.to() [2], and torch.device / device argument conventions [3].


Fix the function return type and use consistent torch.device objects.

The function currently has an inconsistent return type—it returns a string "cpu" in some branches but torch.cuda.current_device() (an integer) in others. While PyTorch's Tensor.to() and _object_to_tensor() technically accept integer device indices, the documented and preferred approach is to pass torch.device objects for clarity and consistency. The return type annotation -> str is also incorrect. Update the function to return torch.device objects consistently.

🚚 Suggested fix
 def _get_device_for_distributed(
     group: Optional[torch.distributed.ProcessGroup] = None,
-) -> str:
+) -> torch.device:
     """
     Determine the appropriate device for distributed communication based on the backend.
     NCCL backend requires CUDA tensors, while Gloo supports both CPU and CUDA.
     """
     if not torch.distributed.is_initialized():
-        return "cpu"
+        return torch.device("cpu")
 
     backend = torch.distributed.get_backend(group)
     if backend == "nccl":
         # NCCL requires CUDA tensors
-        return torch.cuda.current_device()
+        return torch.device("cuda", torch.cuda.current_device())
     else:
         # Gloo and other backends support CPU tensors
-        return "cpu"
+        return torch.device("cpu")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/sewing_kit/utils.py` around lines 357 - 373, Update
_get_device_for_distributed to consistently return torch.device objects (change
the return type annotation from -> str to -> torch.device); when not initialized
return torch.device("cpu"), and when backend is "nccl" return
torch.device("cuda", torch.cuda.current_device()) (or torch.device("cuda",
index) using torch.cuda.current_device()) so both branches return torch.device
instances and usage with Tensor.to() is consistent.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (8)
modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py (2)

60-61: Confusing TypeVar redefinition.

T_DataClass is defined as a TypeVar at line 61, then shadowed at line 482 with a type annotation. The line 482 syntax T_DataClass: TypeVar = Type[dataclasses.dataclass] is confusing—it annotates T_DataClass as having type TypeVar but assigns a Type value. If line 482 is intended as a type alias, consider using TypeAlias:

♻️ Proposed fix
-T_DataClass: TypeVar = Type[dataclasses.dataclass]
+# Type alias for dataclass types (used in _dataclass_from_dict)
+DataClassType = Type[T_DataClass]

Or simply remove line 482 and use the existing T_DataClass TypeVar with the function's generic signature.

Also applies to: 482-482

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py` around lines
60 - 61, T_DataClass is defined as a TypeVar at the top and then re-used
incorrectly at the later declaration `T_DataClass: TypeVar =
Type[dataclasses.dataclass]`, which shadows and confuses the type system; remove
the second declaration and either (A) use the original TypeVar `T_DataClass` in
the generic function signatures that need it, or (B) if you meant a type alias
for dataclass types, replace the second declaration with a proper alias using
`TypeAlias` (e.g., `T_DataClass: TypeAlias = Type[dataclasses.dataclass]`).
Ensure all functions referencing `T_DataClass` (the generic parameter) import
and use the original TypeVar instead of the mistaken reassignment.

28-28: Module-level side effect may cause unexpected behavior.

Setting os.environ["PYTORCH_CUDA_ALLOC_CONF"] at import time affects all subsequent PyTorch CUDA allocations globally, even for code that only imports this module without using it. Consider moving this to within the functions that require it, or making it configurable.

♻️ Proposed refactor
-os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
-
 import pandas as pd
 import torch

Then set it within calculate_subblock_stats_for_puzzle_dir or launch_calc_subblock_stats if needed:

def launch_calc_subblock_stats(cfg: DictConfig) -> None:
    os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
    # ... rest of function
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py` at line 28,
The module currently sets os.environ["PYTORCH_CUDA_ALLOC_CONF"] at import time
which produces a global side effect; move this environment change into the
runtime path that actually needs it (e.g., inside launch_calc_subblock_stats or
calculate_subblock_stats_for_puzzle_dir) and use os.environ.setdefault or assign
it at the start of that function so importing the module no longer mutates
global process state; update references to the top-level os.environ assignment
(the current module-level line) by removing it and adding the environment setup
at the start of the specified functions (launch_calc_subblock_stats /
calculate_subblock_stats_for_puzzle_dir).
modelopt/torch/puzzletron/mip/run_puzzle.py (1)

509-529: Typo in function name: "puzle" should be "puzzle".

The function gather_multi_layer_puzle_metrics is missing a 'z' in "puzzle". This inconsistency may cause confusion.

♻️ Proposed fix
-def gather_multi_layer_puzle_metrics(
+def gather_multi_layer_puzzle_metrics(
     single_replacement_validation_dir: Path,
 ) -> MultiLayerPuzzleMetrics:

Also update the call site at line 450:

-        gathered_metrics = gather_multi_layer_puzle_metrics(
+        gathered_metrics = gather_multi_layer_puzzle_metrics(
             args.single_block_replacement_validation_dir
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/mip/run_puzzle.py` around lines 509 - 529, Function
name has a typo: rename gather_multi_layer_puzle_metrics to
gather_multi_layer_puzzle_metrics; update the function definition and all call
sites that reference gather_multi_layer_puzle_metrics to use the corrected name,
and adjust any imports/exports or type annotations (e.g.,
MultiLayerPuzzleMetrics) that reference the old symbol so references remain
consistent.
modelopt/torch/puzzletron/tools/logger.py (2)

89-100: Stack frame depth is hardcoded and fragile.

The get_caller_location() relies on exactly 3 f_back calls to reach the original caller. If the helper functions (aprint, mprint, etc.) are refactored or if dist_log is called differently, this will silently report the wrong location.

Consider making the depth configurable or using inspect.stack() with a known function name anchor.
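A sketch of the name-anchored approach, assuming the helper names mentioned above (`aprint`, `mprint`, `dist_log`); this is illustrative, not the module's actual implementation:

```python
import inspect
import os

# Frames belonging to the logger utilities themselves; the first frame NOT in
# this set is treated as the real caller, regardless of wrapper depth.
_LOGGER_INTERNALS = {"get_caller_location", "aprint", "mprint", "dist_log"}


def get_caller_location() -> str:
    """Return 'filename:lineno' of the first frame outside the logger helpers."""
    for frame_info in inspect.stack()[1:]:
        if frame_info.function not in _LOGGER_INTERNALS:
            return f"{os.path.basename(frame_info.filename)}:{frame_info.lineno}"
    return "<unknown>"
```

Unlike a fixed chain of `f_back` hops, this stays correct if a helper is inlined, renamed out of the chain, or called through an extra wrapper, as long as the exclusion set is kept up to date.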

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/logger.py` around lines 89 - 100,
get_caller_location currently walks a fixed 3-frame chain which is fragile;
update get_caller_location to inspect the full call stack (e.g., via
inspect.stack()) and locate the first frame that is not inside the logger
utilities (exclude function names like "get_caller_location", "aprint",
"mprint", "dist_log" and this module's filename) or accept an optional depth
parameter, then return that frame's filename:lineno; modify references to
get_caller_location accordingly so caller location resolution is robust to
refactors.

103-112: Global logger class mutation at import time.

Calling logging.setLoggerClass(DistributedLogger) affects all subsequent getLogger() calls process-wide, which may surprise users who import this module indirectly. This is intentional based on the design (relevant snippet shows distillation_provider.py relies on it), but worth documenting in the module docstring.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/logger.py` around lines 103 - 112, The module
currently calls logging.setLoggerClass(DistributedLogger) at import time which
mutates the global logger class for the whole process; update the module-level
docstring to explicitly state this side-effect and its implications (mention
DistributedLogger, logging.setLoggerClass, and that subsequent
logging.getLogger() calls will use the custom class), describing why this choice
was made and pointing callers to an alternative (e.g., an explicit init
function) if they need to avoid the global mutation.
modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py (3)

263-263: Use explicit exception instead of assert for input validation.

assert statements can be disabled with python -O. For runtime input validation, raise an explicit exception.

Proposed fix
-    assert solutions_path.exists(), f"{solutions_path=} does not exist"
+    if not solutions_path.exists():
+        raise FileNotFoundError(f"{solutions_path=} does not exist")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py`
at line 263, Replace the assert used for input validation (the line referencing
solutions_path) with an explicit exception: check whether
solutions_path.exists() and if not raise a FileNotFoundError (or ValueError)
with a clear message that includes solutions_path; update the code around the
validation (the block using solutions_path) to use this explicit raise so the
check cannot be skipped by running Python with optimizations.

284-284: Use mprint instead of print for distributed-aware logging.

This file imports and uses distributed utilities but uses bare print() here. For consistency with the rest of the Puzzletron codebase, consider using mprint from the logger module to ensure only the main rank prints.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py`
at line 284, Replace the bare print call that logs sorted solutions (the line
printing f"sorted solutions by {sort_solutions_by}. {vals[:10]=} {vals[-10:]=}")
with the distributed-aware logger mprint so only main rank prints; import mprint
from the logger module if it's not already imported and call mprint with the
same formatted message, referencing the existing variables sort_solutions_by and
vals to preserve output content and context.

188-191: TODO comment for symlink checkpoint saving.

There's a TODO indicating that save_checkpoint_as_symlinks is not currently supported.

Would you like me to open an issue to track implementing symlink-based checkpoint saving?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py`
around lines 188 - 191, The TODO about symlink-based checkpoint saving indicates
missing work around save_checkpoint_as_symlinks inside the dist.is_master()
branch; open a tracking issue titled "Support symlink checkpoint saving in
Puzzletron" that references save_checkpoint_as_symlinks and the checkpoint save
flow used in validate_puzzle_with_multi_replacements.py (the dist.is_master()
block) and documents expected behavior, platforms, and possible implementation
approaches; meanwhile, replace the silent pass with an explicit runtime-visible
note (e.g., a processLogger.warning or raising NotImplementedError) and add a
TODO comment that references the new issue ID so callers know symlink saving is
intentionally unsupported for now.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/prune/importance_hooks/compare_module_outputs.py`:
- Around line 184-203: The --trust-inputs flag is inverted causing unsafe
default loading; update the two torch.load calls in compare_module_outputs.py
(the ref_data and comp_data loads) to pass weights_only=not args.trust_inputs so
that by default weights_only=True (safe) and when the user supplies
--trust-inputs it becomes weights_only=False (trusted), and adjust the
parser.add_argument help text if needed to reflect this semantics.

In `@modelopt/torch/puzzletron/mip/run_puzzle.py`:
- Around line 408-412: Typo in the error string: update the message in the
run_puzzle.py block that checks for "subblock_stats_args" (the if that
references args and puzzle_profile and calls mprint/sys.exit) to refer to
"subblock_stats_args" instead of the incorrect "subblock_stats_arrs"; locate the
check using the identifiers args, puzzle_profile, and mprint and change only the
quoted error text to the correct key name.

In `@modelopt/torch/puzzletron/tools/checkpoint_utils.py`:
- Around line 107-118: The function init_empty_module changes global dtype via
torch.set_default_dtype which is not thread-safe; protect the set/restore and
the call to skip_init by introducing a module-level threading.Lock (e.g.,
_default_dtype_lock) and wrap the torch.set_default_dtype(...) / skip_init(...)
/ restore sequence in a with _default_dtype_lock: block (use try/finally to
ensure the original dtype is restored), so concurrent calls cannot interleave;
reference init_empty_module, torch.set_default_dtype, skip_init, and the new
_default_dtype_lock in your change.
- Around line 146-156: The code accesses model_config.block_configs directly
which can raise AttributeError if that attribute is absent; change the check to
safely query the attribute (e.g., use getattr(model_config, "block_configs",
None) or hasattr(model_config, "block_configs")) in the block inside the try in
the function that calls load_model_config (referencing model_config,
checkpoint_dir, trust_remote_code), warn and return False when the attribute is
missing instead of allowing an exception to be raised; keep the existing warning
text but ensure it triggers when the attribute is absent.

In `@modelopt/torch/puzzletron/tools/logger.py`:
- Around line 62-66: The NotImplementedError raised when validating the ranks
parameter incorrectly omits "last" from the displayed allowed choices; update
the error message in the validation block that checks variable ranks (the raise
NotImplementedError(...) near the ranks check) to include "last" in the list of
valid choices (e.g., "['all', 'main', 'local_main', 'last']") or, better,
reference a single ALLOWED_RANKS constant used for the check to keep the message
and the validation in sync.

In `@modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py`:
- Around line 241-242: There is a bare "except:" in the try/except block in
validate_puzzle_with_multi_replacements.py which will catch BaseException
subclasses (e.g., KeyboardInterrupt/SystemExit); replace "except:" with "except
Exception:" to only catch regular exceptions (keep the existing "pass" or add
minimal handling/logging if desired) so that system-level exceptions can
propagate; locate the try/except around the puzzle validation logic (the bare
except at lines shown) and make this single-line change.
- Around line 234-245: The _load_tokenizer function currently hardcodes
trust_remote_code=True creating an RCE risk; change its signature to accept a
trust_remote_code: bool = False parameter and pass that parameter into both
AutoTokenizer.from_pretrained calls (replace the hardcoded True), and update any
callers (the code that invokes _load_tokenizer) to pass through the intended
trust_remote_code value (defaulting to False) so callers can opt in explicitly
when loading tokenizers; reference function name _load_tokenizer and the
AutoTokenizer.from_pretrained invocations to locate the changes.

---

Nitpick comments:
In `@modelopt/torch/puzzletron/mip/run_puzzle.py`:
- Around line 509-529: Function name has a typo: rename
gather_multi_layer_puzle_metrics to gather_multi_layer_puzzle_metrics; update
the function definition and all call sites that reference
gather_multi_layer_puzle_metrics to use the corrected name, and adjust any
imports/exports or type annotations (e.g., MultiLayerPuzzleMetrics) that
reference the old symbol so references remain consistent.

In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py`:
- Around line 60-61: T_DataClass is defined as a TypeVar at the top and then
re-used incorrectly at the later declaration `T_DataClass: TypeVar =
Type[dataclasses.dataclass]`, which shadows and confuses the type system; remove
the second declaration and either (A) use the original TypeVar `T_DataClass` in
the generic function signatures that need it, or (B) if you meant a type alias
for dataclass types, replace the second declaration with a proper alias using
`TypeAlias` (e.g., `T_DataClass: TypeAlias = Type[dataclasses.dataclass]`).
Ensure all functions referencing `T_DataClass` (the generic parameter) import
and use the original TypeVar instead of the mistaken reassignment.
- Line 28: The module currently sets os.environ["PYTORCH_CUDA_ALLOC_CONF"] at
import time which produces a global side effect; move this environment change
into the runtime path that actually needs it (e.g., inside
launch_calc_subblock_stats or calculate_subblock_stats_for_puzzle_dir) and use
os.environ.setdefault or assign it at the start of that function so importing
the module no longer mutates global process state; update references to the
top-level os.environ assignment (the current module-level line) by removing it
and adding the environment setup at the start of the specified functions
(launch_calc_subblock_stats / calculate_subblock_stats_for_puzzle_dir).

In `@modelopt/torch/puzzletron/tools/logger.py`:
- Around line 89-100: get_caller_location currently walks a fixed 3-frame chain
which is fragile; update get_caller_location to inspect the full call stack
(e.g., via inspect.stack()) and locate the first frame that is not inside the
logger utilities (exclude function names like "get_caller_location", "aprint",
"mprint", "dist_log" and this module's filename) or accept an optional depth
parameter, then return that frame's filename:lineno; modify references to
get_caller_location accordingly so caller location resolution is robust to
refactors.
- Around line 103-112: The module currently calls
logging.setLoggerClass(DistributedLogger) at import time which mutates the
global logger class for the whole process; update the module-level docstring to
explicitly state this side-effect and its implications (mention
DistributedLogger, logging.setLoggerClass, and that subsequent
logging.getLogger() calls will use the custom class), describing why this choice
was made and pointing callers to an alternative (e.g., an explicit init
function) if they need to avoid the global mutation.
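A runnable illustration of the side effect being documented, with a minimal stand-in `DistributedLogger`:

```python
import logging

class DistributedLogger(logging.Logger):
    """Stand-in for the module's rank-aware logger class."""

# The import-time side effect in question: after this call, every logger
# created via logging.getLogger(name) anywhere in the process uses this class.
logging.setLoggerClass(DistributedLogger)

log = logging.getLogger("puzzletron.demo")
```

Note that loggers created *before* the `setLoggerClass` call (including the root logger) keep their original class, which is exactly why the docstring should spell out the import-order implications.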

In `@modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py`:
- Line 263: Replace the assert used for input validation (the line referencing
solutions_path) with an explicit exception: check whether
solutions_path.exists() and if not raise a FileNotFoundError (or ValueError)
with a clear message that includes solutions_path; update the code around the
validation (the block using solutions_path) to use this explicit raise so the
check cannot be skipped by running Python with optimizations.
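A sketch of the explicit check, with a hypothetical `load_solutions` wrapper standing in for the real validation block:

```python
from pathlib import Path

def load_solutions(solutions_path: Path) -> str:
    # Explicit check instead of `assert`: still enforced under `python -O`,
    # and the error message names the offending path.
    if not solutions_path.exists():
        raise FileNotFoundError(f"Solutions file not found: {solutions_path}")
    return solutions_path.read_text()
```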
- Line 284: Replace the bare print call that logs sorted solutions (the line
printing f"sorted solutions by {sort_solutions_by}. {vals[:10]=} {vals[-10:]=}")
with the distributed-aware logger mprint so only main rank prints; import mprint
from the logger module if it's not already imported and call mprint with the
same formatted message, referencing the existing variables sort_solutions_by and
vals to preserve output content and context.
- Around line 188-191: The TODO about symlink-based checkpoint saving indicates
missing work around save_checkpoint_as_symlinks inside the dist.is_master()
branch; open a tracking issue titled "Support symlink checkpoint saving in
Puzzletron" that references save_checkpoint_as_symlinks and the checkpoint save
flow used in validate_puzzle_with_multi_replacements.py (the dist.is_master()
block) and documents expected behavior, platforms, and possible implementation
approaches; meanwhile, replace the silent pass with an explicit runtime-visible
note (e.g., a processLogger.warning or raising NotImplementedError) and add a
TODO comment that references the new issue ID so callers know symlink saving is
intentionally unsupported for now.
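The runtime-visible failure can be sketched like this; the function name, flag, and issue reference are placeholders for the real checkpoint-save flow:

```python
def save_checkpoint(state: dict, save_as_symlinks: bool = False) -> dict:
    if save_as_symlinks:
        # TODO(<tracking-issue-id>): symlink saving is intentionally
        # unsupported; fail loudly instead of silently falling through.
        raise NotImplementedError(
            "Symlink checkpoint saving is not supported yet "
            "(see the Puzzletron tracking issue)."
        )
    return state  # placeholder for the real save path
```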

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cb7e3231-2259-4883-8ea7-c1ebb8e0538a

📥 Commits

Reviewing files that changed from the base of the PR and between 75651cc and 2a170b9.

📒 Files selected for processing (10)
  • modelopt/torch/prune/importance_hooks/base_hooks.py
  • modelopt/torch/prune/importance_hooks/compare_module_outputs.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_converter.py
  • modelopt/torch/puzzletron/mip/run_puzzle.py
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
  • modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
  • modelopt/torch/puzzletron/tools/checkpoint_utils.py
  • modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py
  • modelopt/torch/puzzletron/tools/logger.py
  • modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py
✅ Files skipped from review due to trivial changes (1)
  • modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_converter.py
  • modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py

Comment on lines +107 to +118
def init_empty_module(
    module_cls: type[NNModule],
    dtype: torch.dtype,
    *init_args,
    **init_kwargs,
) -> NNModule:
    default_dtype = torch.get_default_dtype()
    current_device = torch.ones(1).device
    torch.set_default_dtype(dtype)
    module = skip_init(module_cls, *init_args, device=current_device, **init_kwargs)
    torch.set_default_dtype(default_dtype)
    return module

⚠️ Potential issue | 🟡 Minor

Global default dtype manipulation is not thread-safe.

torch.set_default_dtype() modifies global state. If multiple threads call init_empty_module concurrently, they may interfere with each other's dtype settings. Consider using a lock or torch.set_default_dtype as a context manager if available in the target PyTorch version.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/checkpoint_utils.py` around lines 107 - 118,
The function init_empty_module changes global dtype via torch.set_default_dtype
which is not thread-safe; protect the set/restore and the call to skip_init by
introducing a module-level threading.Lock (e.g., _default_dtype_lock) and wrap
the torch.set_default_dtype(...) / skip_init(...) / restore sequence in a with
_default_dtype_lock: block (use try/finally to ensure the original dtype is
restored), so concurrent calls cannot interleave; reference init_empty_module,
torch.set_default_dtype, skip_init, and the new _default_dtype_lock in your
change.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

@cjluo-nv cjluo-nv left a comment


Summary

PR #1121 introduces Puzzletron, an end-to-end heterogeneous LLM compression framework based on the Puzzle paper (arxiv 2411.19146). It adds a 6-step pipeline (activation scoring → pruning → library building → scoring → MIP optimization → model realization) with support for 9 model architectures across 8 families. The PR includes an AnyModel abstraction layer (Converter/ModelDescriptor factories), a Sewing Kit for distributed module graph stitching, MIP-based neural architecture search, expert removal pruning for MoE models, and bypassed training with multi-threaded child initialization. This is a large addition (~94 files) consolidating 18+ sub-PRs.

Issues Found

[Correctness] Thread safety concern in child_init.py shared dictionary mutation

File: modelopt/torch/puzzletron/tools/bypassed_training/child_init.py lines 685-697

Workers in a ThreadPoolExecutor call keys.pop() on a shared dictionary. Each thread processes a different layer_idx and therefore a different key prefix, so the keys do not overlap in practice. However, dict.pop() is not guaranteed to be thread-safe under concurrent mutation in CPython, and nothing in the code documents or enforces this invariant. A future refactor that adds overlapping keys or reorders layers could introduce silent data corruption.

Suggestion: Pass each worker a pre-filtered copy of keys, or use a lock around shared dict mutation.
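The pre-filtering suggestion can be sketched like this, assuming a hypothetical "layers.<idx>.<param>" key format (the real naming scheme may differ):

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def partition_keys_by_layer(keys):
    """Group keys by layer index up front, so each worker receives its own
    disjoint list and no thread mutates a shared dict via keys.pop()."""
    buckets = defaultdict(list)
    for key in keys:
        # Hypothetical key format "layers.<idx>.<param>".
        buckets[int(key.split(".")[1])].append(key)
    return dict(buckets)

def init_layers(keys):
    buckets = partition_keys_by_layer(keys)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each worker owns buckets[idx] exclusively -- no shared mutation.
        results = pool.map(lambda item: (item[0], sorted(item[1])),
                           buckets.items())
    return dict(results)
```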

[Correctness] default_output_reducer_fn ignores accumulator — trap for custom reducers

File: modelopt/torch/puzzletron/sewing_kit/core.py lines 92, 460-468

The output reduction loop passes acc=None on the first iteration. The default reducer (default_output_reducer_fn) simply returns input_override, ignoring acc entirely, so it works. But any custom reducer that tries to accumulate (the natural interpretation of acc) will fail on the first call when acc is None. This is a latent correctness trap.

Suggestion: Initialize acc to a well-defined identity value, or document that reducers must handle acc=None on first call.
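The acc=None contract can be illustrated with a minimal reduction loop; sum_reducer_fn shows the first-call handling a custom reducer needs (the names mirror the review, but the bodies are a sketch, not the module's code):

```python
def default_output_reducer_fn(acc, input_override):
    # The default: returns the override and ignores the accumulator entirely,
    # which is why acc=None on the first iteration happens to be harmless.
    return input_override

def sum_reducer_fn(acc, input_override):
    # A custom reducer must treat acc=None as "first call" (the loop seeds
    # the accumulator with None); otherwise None + x raises TypeError.
    if acc is None:
        return input_override
    return acc + input_override

def reduce_outputs(outputs, reducer_fn):
    acc = None
    for out in outputs:
        acc = reducer_fn(acc, out)
    return acc
```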

[Readability] print() instead of mprint() in distributed code

File: modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py line 299

print(f"{dist.rank()} loaded state_dict shard")

All other logging in this module uses mprint() (rank-0 only). This print() fires on every rank, producing noisy output in multi-GPU runs.

[Readability] Unresolved TODO in main entry point

File: modelopt/torch/puzzletron/puzzletron.py line 45

# TODO @colunga: fix this, Hydra config should be immutable after this point

The main pipeline entry point mutates the config object after initialization. This should be resolved before merge or converted to a tracked issue.

[Duplicated Code] Redundant helper functions in MIP solver

File: modelopt/torch/puzzletron/mip/run_puzzle.py lines 728-755

_none_add and _none_max duplicate the logic of _none_add_list and _none_max_list (which handle the single-element case). The scalar versions could be replaced with calls to the list versions.

[Tests] Very limited unit test coverage for high-risk modules

Files: tests/unit/torch/puzzletron/test_convert_anymodel.py (only unit test), tests/gpu/torch/puzzletron/test_puzzletron.py (E2E only)

The test suite has strong E2E coverage (9 model architectures with regression values), but zero unit tests for:

  • Sewing Kit (core.py, 883 lines) — the distributed stitching framework
  • Child initialization (child_init.py, 1181 lines) — the largest file with thread-pool logic
  • Replacement library building (build_replacement_library.py, 616 lines)
  • MIP solver (run_puzzle.py, 760 lines)
  • Sharded checkpoint utilities (sharded_checkpoint_utils.py, 404 lines)

The only unit test (test_convert_anymodel.py) covers a single model family (Qwen3). E2E GPU tests catch integration regressions but won't isolate bugs in individual components. Given the complexity of modules like sewing_kit and child_init, this is a significant gap.

[Readability] Wildcard-style broad imports in several modules

Multiple files use from module import * or import large numbers of symbols without clear need. This makes it harder to trace dependencies across the 94-file addition.

Suggestions

  1. Add unit tests for sewing_kit, child_init, and MIP solver — even basic tests for the reducer loop, thread-pool key partitioning, and None-handling helpers would catch the latent issues identified above.
  2. Replace shared dict mutation in child_init.py with per-worker key sets to eliminate the thread-safety concern entirely.
  3. Resolve or file the TODO at puzzletron.py:45 — config immutability is important for reproducibility in distributed settings.
  4. Standardize logging — replace the stray print() with mprint() in sharded_checkpoint_utils.py.

Overall Assessment

This is an impressive, well-structured compression framework with clean factory abstractions and strong E2E test coverage across 9 model architectures. The AnyModel Converter/Descriptor pattern is extensible, and the pipeline orchestration in puzzletron.py is clear. The main concerns are: (1) a thread-safety design fragility in the largest file, (2) a latent correctness trap in the sewing kit's reducer interface, and (3) the near-absence of unit tests for complex core modules. None of these are showstoppers for the current code paths, but they create risk for future maintenance. The E2E tests provide a strong safety net, and the codebase is well-organized for its size.


…and hf export (#1122)

- The nemo:26.02.01 container already includes the DistillationProvider fix for
homogeneous models. That appears sufficient for heterogeneous models as well,
so the copied DistillationProvider is removed to simplify the code
- Replace the hacky Megatron-to-HF export logic with a simplified version

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Refactor**
* Reworked distillation and HuggingFace export flow to use upstream
bridge/export APIs, removed local monkey-patching and extra exception
logging, and simplified distributed cleanup.
* **Chores**
* Consolidated and renamed Qwen3 / Qwen3-VL model and converter
registrations; updated pruning configs, CLI export flags, and packaging
lint/dependency settings.
* **Tests**
* Updated integration tests to use Qwen3 checkpoints and adjusted export
verification.
* **Documentation**
  * Updated README example to reflect new CLI usage.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
