
Conversation

jiemingz
Contributor

What does this PR do?

Adds an option to wrap the training model with torch.compile, gated by a new config key (see the diff and review discussion below).

Issues

List issues that this PR closes:

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 
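As a stand-in for the snippet above, here is a minimal sketch of how the option might be enabled. The key name and its placement under dtensor_cfg are assumptions for illustration, not taken verbatim from this PR:

```python
# Hypothetical sketch: the exact key name and placement come from this PR's config changes.
policy_cfg = {
    "dtensor_cfg": {
        "custom_parallel_plan": None,  # existing key (shown in the diff later in this thread)
        "torch_compile": True,         # assumed name of the new flag that enables torch.compile
    },
}
```

When the flag is set, the worker wraps the model with torch.compile (see the diff further down in this thread).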

Before your PR is "Ready for review"

Pre checks:

  • Make sure you have read and followed the Contributor guidelines.
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests.
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build, and test the docs.

Additional Information

  • ...

Signed-off-by: Jimmy Zhang <[email protected]>
@jiemingz jiemingz requested a review from terrykong June 10, 2025 14:01
@jiemingz jiemingz self-assigned this Jun 10, 2025
Contributor

Could you add this key to all the configs/recipes?

Contributor

@terrykong terrykong left a comment

Is this possible to unit test?
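One possible approach, sketched here under assumptions rather than taken from the PR: compile a tiny module with fixed input shapes and assert that the compiled forward pass matches eager mode.

```python
# Hedged sketch of a possible unit test (not from this PR): compiled output should match eager.
import torch

def test_torch_compile_matches_eager():
    torch.manual_seed(0)
    model = torch.nn.Linear(8, 4)
    x = torch.randn(2, 8)

    eager_out = model(x)
    compiled = torch.compile(model)   # same wrapping approach as the diff in this PR
    compiled_out = compiled(x)

    torch.testing.assert_close(compiled_out, eager_out)
```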

@terrykong
Contributor

@jiemingz Is the only thing blocking this PR the seq-packing change, since we need static shapes for torch.compile?

@terrykong
Contributor

Dependent on #300

@SahilJain314
Contributor

DTensor sequence packing has been merged. @ahmadki to support max-padding packed sequences in DTensor to enable torch.compile (fixed seqlen).

@ahmadki
Member

ahmadki commented Jul 24, 2025

> DTensor sequence packing has been merged. @ahmadki to support max-padding packed sequences in DTensor to enable torch.compile (fixed seqlen).

tracking here
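For context, a rough illustration (assumed example, not code from this repo) of why a fixed seqlen matters: torch.compile specializes graphs on tensor shapes, so variable-length packed batches can trigger repeated recompilation, while max-padding every packed batch to one fixed length keeps the shapes static.

```python
# Assumed illustration (not repo code): pad packed batches to a fixed max length so that
# torch.compile always sees the same shape and reuses a single compiled graph.
import torch
import torch.nn.functional as F

def pad_packed_to_max(input_ids: torch.Tensor, max_seqlen: int, pad_id: int = 0) -> torch.Tensor:
    """Right-pad a 1D packed token tensor to a fixed length."""
    return F.pad(input_ids, (0, max_seqlen - input_ids.shape[0]), value=pad_id)

packed = torch.randint(0, 100, (1234,))            # packed length varies batch to batch
fixed = pad_packed_to_max(packed, max_seqlen=2048)
assert fixed.shape == (2048,)                      # static shape -> no recompilation per batch
```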

@@ -195,6 +196,9 @@ def __init__(
custom_parallel_plan=self.cfg["dtensor_cfg"]["custom_parallel_plan"],
)

if self.torch_compile:
    self.model = torch.compile(model)

Could you try model.compile() instead? That should fix the _orig_mod issue. This is also the recommended way of compiling a model now. We'll work on adding warnings and publicizing this to raise awareness.
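For reference, a small sketch of the behavior the reviewer is pointing at (illustrative assumptions, not code from this PR): torch.compile(model) returns an OptimizedModule wrapper whose state_dict keys are prefixed with _orig_mod., whereas the in-place model.compile() leaves the parameter names untouched.

```python
# Illustrative sketch of the _orig_mod issue (not from this PR).
import torch

model = torch.nn.Linear(4, 4)

wrapped = torch.compile(model)            # returns an OptimizedModule wrapping the original
print(list(wrapped.state_dict().keys()))  # ['_orig_mod.weight', '_orig_mod.bias']

model.compile()                           # compiles in place; the recommended approach
print(list(model.state_dict().keys()))    # ['weight', 'bias'] -- checkpoint keys unchanged
```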

@terrykong terrykong linked an issue Aug 7, 2025 that may be closed by this pull request
Successfully merging this pull request may close these issues: torch.compile for training