@robert-mcdermott robert-mcdermott commented Nov 2, 2025

While following the DGX Spark unsloth playbook, I ran into an error and some dependency issues. This PR addresses all of them, and I have tested the result on my DGX Spark, where it now works correctly. Here's a description of the changes:

Step 6 has the wrong URL for curl: it downloads GitHub's HTML page for the script, not the raw contents of the script:

Wrong:

curl -O https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/unsloth/assets/test_unsloth.py

Correct:

curl -O https://raw.githubusercontent.com/NVIDIA/dgx-spark-playbooks/refs/heads/main/nvidia/unsloth/assets/test_unsloth.py
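The same mistake is easy to repeat with other playbook assets, so here is a minimal sketch of the blob-to-raw rewrite. The helper names `github_blob_to_raw` and `looks_like_html` are hypothetical, and the code assumes the ref (branch, tag, or commit) contains no slashes. It emits the short `/<ref>/` raw form instead of the `refs/heads/<branch>` form shown above; for a branch such as `main` both should resolve to the same file.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url: str) -> str:
    """Rewrite a github.com '/blob/' page URL to its raw.githubusercontent.com form.

    URLs that do not match the expected layout are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc != "github.com":
        return url
    segments = parts.path.strip("/").split("/")
    # Expected layout: <owner>/<repo>/blob/<ref>/<path...>
    # Assumption: <ref> is a single path segment (no slashes in the branch name).
    if len(segments) < 5 or segments[2] != "blob":
        return url
    owner, repo, _, ref, *path = segments
    raw_path = "/" + "/".join([owner, repo, ref, *path])
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))


def looks_like_html(data: bytes) -> bool:
    """Heuristic check for an HTML page saved where a script was expected."""
    head = data.lstrip()[:64].lower()
    return head.startswith((b"<!doctype html", b"<html"))
```

Running `looks_like_html` on the first bytes of a downloaded file is a quick way to notice that curl fetched the web page and not the script.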

Then, when I tried to execute the test_unsloth.py script, unsloth raised an error indicating that it needed a different version of the 'trl' library.

After that was fixed, the script progressed further but hit another error because the 'hf_transfer' package, which is required to download models from Hugging Face, was missing.
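Both dependency problems can be spotted before running the test script by listing what is installed inside the container. The following is a rough sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package list simply mirrors the names mentioned above. It does not know which 'trl' version unsloth expects, so compare the output against unsloth's own error message.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Package names taken from the errors described in this PR.
    for name, ver in installed_versions(["unsloth", "trl", "hf_transfer"]).items():
        print(f"{name}: {ver or 'not installed'}")
```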

Also, with the original docker command, users lose their fine-tuned model outputs when they exit the container. I've updated the command to map the local 'outputs' directory to the 'workspace/outputs' directory in the container so that users don't lose their work.
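The mapping described above amounts to a bind mount passed with docker's `-v` flag. Here is a small sketch that only builds the argument list and does not run anything. The function name `docker_run_with_outputs` and the image name are placeholders, the absolute container path `/workspace/outputs` is an assumption based on the description, and the other flags the playbook's command uses are not reproduced here and would need to be added.

```python
import shlex
from pathlib import Path


def docker_run_with_outputs(image, host_dir="outputs", container_dir="/workspace/outputs"):
    """Build a `docker run` argument list that bind-mounts host_dir into the container."""
    # Docker expects an absolute host path for a bind mount.
    host_path = Path(host_dir).resolve()
    return ["docker", "run", "--rm", "-it", "-v", f"{host_path}:{container_dir}", image]


if __name__ == "__main__":
    # "example-image:tag" is a placeholder, not the playbook's actual image.
    print(shlex.join(docker_run_with_outputs("example-image:tag")))
```

Anything the script writes under the mounted container directory then lands in the local 'outputs' directory and survives container exit.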

The playbook changes provided by this PR have been tested on DGX Spark and the unsloth playbook now works correctly.

@robert-mcdermott
Author

robert-mcdermott commented Nov 6, 2025

This PR also adds the missing hf_transfer package

@margaretz-nv
Collaborator

@robert-mcdermott Thank you so much for investigating the issue and providing a fix. Unfortunately we don't take contributions to this repository at the moment. I'll take the issue back to the team to provide a fix.

I can close out the PR for you after we provide the fix. Thank you for your understanding!

@margaretz-nv
Collaborator

Hi @robert-mcdermott, this issue is fixed in the instructions. Feel free to close out your PR. Many thanks!

@robert-mcdermott
Author

robert-mcdermott commented Nov 7, 2025

@margaretz-nv

> @robert-mcdermott Thank you so much for investigating the issue and providing a fix. Unfortunately we don't take contributions to this repository at the moment. I'll take the issue back to the team to provide a fix.
>
> I can close out the PR for you after we provide the fix. Thank you for your understanding!

Then why have a public repository if you don't want the community to use it? By limiting community participation you are limiting your success with this as building a community is critical. You clearly didn't test the playbooks before publishing them. If you let the community contribute, they can keep them up to date, improve them and add new playbooks as new models and frameworks are released. I'm going to hard fork this and maintain my own version and create new playbooks where others can contribute. thanks

@robert-mcdermott
Author

robert-mcdermott commented Nov 7, 2025

@margaretz-nv

> Hi @robert-mcdermott, this issue is fixed in the instructions. Feel free to close out your PR. Many thanks!

You partially fixed it, but there is still a dependency version issue. I'll leave that for you to figure out.

@raphaelamorim

> @margaretz-nv
>
> > @robert-mcdermott Thank you so much for investigating the issue and providing a fix. Unfortunately we don't take contributions to this repository at the moment. I'll take the issue back to the team to provide a fix.
> >
> > I can close out the PR for you after we provide the fix. Thank you for your understanding!
>
> Then why have a public repository if you don't want the community to use it? By limiting community participation you are limiting your success with this as building a community is critical. You clearly didn't test the playbooks before publishing them. If you let the community contribute, they can keep them up to date, improve them and add new playbooks as new models and frameworks are released. I'm going to hard fork this and maintain my own version and create new playbooks where others can contribute. thanks

Robert, I think it's because the NVIDIA team uses GitLab internally and has some automation that generates the playbooks and publishes them to this repo. Right now they're still trying to make sure the environment is stable, and since this is a highly visible and completely new product, they decided to take the safe route. There are a lot of great contributions happening in the Forums, and I think you should spend more time there than here for now. I expect they will figure out a way of processing external patches in the future.
