
Conversation

frozenleaves

What does this PR do?

Fix the bug: when running FSDP2 on an NPU device, it raises an error:

    AssertionError: Torch not compiled with CUDA enabled.
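For context, this is the generic assertion PyTorch raises whenever a CUDA-only code path runs on a build compiled without CUDA support. A minimal repro on such a build (illustrative only, not the exact call site in accelerate):

    import torch

    # On a PyTorch build without CUDA support (e.g. an NPU/Ascend build),
    # any hard-coded CUDA call trips the same assertion:
    t = torch.empty(2, 2, device="cuda")
    # -> AssertionError: Torch not compiled with CUDA enabled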

The imports under review in the patch:

    from torch.distributed.tensor import distribute_tensor

    # Model was previously copied to meta device
    from accelerate.state import PartialState
Member

No need to import here; the accelerator is passed into the function, so you can take the device from there.
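A minimal sketch of what that suggestion looks like (the helper name and signature are assumptions for illustration, not the actual accelerate code):

    import torch
    from torch.distributed.tensor import distribute_tensor

    # Hypothetical helper showing the review suggestion: reuse the
    # accelerator that is already passed into the function instead of
    # importing PartialState to discover the current device.
    def materialize_and_shard(accelerator, meta_param, device_mesh, placements):
        # accelerator.device resolves to "npu:X" on Ascend, "cuda:X" on
        # NVIDIA, etc., so nothing is hard-coded to CUDA.
        local = torch.empty_like(meta_param, device=accelerator.device)
        return distribute_tensor(local, device_mesh, placements)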

Author

Done. Because some packages, such as llama-factory, depend on version 1.7.0, we hope to fix this issue on this version as well.

@S1ro1
Member

S1ro1 commented Sep 22, 2025

Small nit; other than that LGTM, thank you for noticing!

@frozenleaves frozenleaves force-pushed the v1.7.0-release branch 2 times, most recently from 48a6dac to f8bbcf8 on September 22, 2025 at 11:36
@S1ro1
Member

S1ro1 commented Sep 22, 2025

cc @SunMarc for a patch, not sure how we want to approach this

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc
Member

SunMarc commented Sep 22, 2025

@bot /style

Contributor

Style fix is beginning .... View the workflow run here.

Member

@SunMarc SunMarc left a comment

Thanks! I will probably do a patch in a few days.

@SunMarc
Member

SunMarc commented Sep 22, 2025

@bot /style

Contributor

Style fix is beginning .... View the workflow run here.

@SunMarc
Member

SunMarc commented Sep 22, 2025

Can you fix the style? The other failures are not related.

@frozenleaves
Author

Can you fix the style? The other failures are not related.

OK, the fixed code has already been pushed.

…::TrainerUtilsTest::test_executable_batch_size - AssertionError: Lists differ: [64, 32, 16] != [64, 57, 51, 45, 40, 36, 32, 28, 25, 22, 19, 17, 15]
@frozenleaves
Author

@SunMarc Hi, I have tried to fix the CI problem. I noticed that accelerate's CI depends on transformers' unit tests, and there is a test case as follows:

    @require_accelerate
    def test_executable_batch_size(self):
        batch_sizes = []

        @find_executable_batch_size(starting_batch_size=64, auto_find_batch_size=True)
        def mock_training_loop_function(batch_size):
            nonlocal batch_sizes
            batch_sizes.append(batch_size)
            if batch_size > 16:
                # Simulate an OOM until the batch size is small enough
                raise RuntimeError("CUDA out of memory.")

        mock_training_loop_function()
        # Expects the batch size to shrink by 10% after every simulated OOM
        self.assertEqual(batch_sizes, [64, 57, 51, 45, 40, 36, 32, 28, 25, 22, 19, 17, 15])

In the main branch of transformers, this test still expects the batch size to be reduced by 10% on each retry, but the find_executable_batch_size implementation in accelerate reduces it by 50% each time, which makes the unit test fail.

Here, I modified the accelerate implementation so that it passes the unit test. Please consider whether such a fix is reasonable, or whether the corresponding transformers test code should be fixed instead.
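For reference, a minimal sketch of the 10%-decay retry loop this test expects (illustrative only, not the actual accelerate or transformers source):

    import functools
    import gc

    def find_executable_batch_size_sketch(function=None, starting_batch_size=64, auto_find_batch_size=False):
        # Support use both as @decorator(...) and as a plain function wrapper
        if function is None:
            return functools.partial(
                find_executable_batch_size_sketch,
                starting_batch_size=starting_batch_size,
                auto_find_batch_size=auto_find_batch_size,
            )

        @functools.wraps(function)
        def wrapper(*args, **kwargs):
            batch_size = starting_batch_size
            while batch_size > 0:
                try:
                    return function(batch_size, *args, **kwargs)
                except RuntimeError as e:
                    if auto_find_batch_size and "out of memory" in str(e).lower():
                        gc.collect()
                        # 10% decay: 64 -> 57 -> 51 -> 45 -> ... -> 15
                        batch_size = int(batch_size * 0.9)
                    else:
                        raise
            raise RuntimeError("No executable batch size found, reached zero.")

        return wrapper

With starting_batch_size=64, the 10% decay produces exactly the sequence the test asserts, [64, 57, 51, ..., 15], whereas halving yields the [64, 32, 16] seen in the CI failure above.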

@SunMarc
Copy link
Member

SunMarc commented Sep 25, 2025

Oh, the real issue is that you are trying to merge this PR into this specific branch, huggingface:v1.7.0-release. Can you reopen the PR with the right target branch?

@frozenleaves
Author

Oh, the real issue is that you are trying to merge this PR into this specific branch, huggingface:v1.7.0-release. Can you reopen the PR with the right target branch?

Sure, is my operation correct?

@SunMarc
Member

SunMarc commented Sep 26, 2025

I mean you need to recreate a pull request with the target branch main. Right now, you are still trying to merge into another branch:
frozenleaves wants to merge 2 commits into huggingface:v1.7.0-release from frozenleaves:v1.7.0-release
