@eeangelini eeangelini commented Sep 26, 2025

What does this PR do?

Fixes #495.

This PR updates the pinned torch version in pyproject.toml from ~=2.0.0 (which is compatible with CUDA 11.7 and 11.8) to ~=2.6.0 (which is the latest version compatible with CUDA 12.4).
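For reviewers less familiar with the `~=` ("compatible release") operator, here is a minimal sketch of what the new pin accepts. Under PEP 440, `~=2.6.0` is equivalent to `>=2.6.0, ==2.6.*`. The helper names below are hypothetical and written only for illustration; the sketch handles plain numeric release versions with an optional local label such as `+cu124`, not pre-releases or other PEP 440 forms. In practice pip and `packaging` do this resolution.

```python
def _release(version: str) -> tuple[int, ...]:
    """Parse '2.6.0' or '2.6.0+cu124' into (2, 6, 0), dropping the local label."""
    public = version.split("+", 1)[0]
    return tuple(int(part) for part in public.split("."))


def _pad(release: tuple[int, ...], length: int) -> tuple[int, ...]:
    """Right-pad a release tuple with zeros so tuples compare like versions."""
    return release + (0,) * max(0, length - len(release))


def satisfies_compatible_release(candidate: str, pinned: str) -> bool:
    """Return True if `candidate` satisfies the specifier `~=pinned`."""
    pin = _release(pinned)
    if len(pin) < 2:
        raise ValueError("~= needs at least two release segments, e.g. 2.6")
    width = max(len(pin), len(_release(candidate)))
    cand = _pad(_release(candidate), width)
    # Lower bound: candidate >= pinned.
    at_least_pinned = cand >= _pad(pin, width)
    # Upper bound: every segment except the last pinned one must match.
    prefix = pin[:-1]
    same_series = cand[: len(prefix)] == prefix
    return at_least_pinned and same_series


if __name__ == "__main__":
    for candidate in ("2.0.1", "2.6.0", "2.6.1+cu124", "2.7.0"):
        print(candidate, satisfies_compatible_release(candidate, "2.6.0"))
```

So the new pin allows patch releases in the 2.6 series but rejects both the old 2.0.x releases and anything from 2.7 onward.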

The CUDA drivers on the internal AICS A100s were recently updated from 11.7/11.8 to 12.4, which caused issues in the cellsmap repository, where torch is pinned to 2.0.0 for compatibility with cyto-dl. To update the torch version in that repository, we therefore also need to update it here.

Before submitting

  • Did you make sure the title is self-explanatory and the description concisely explains the PR?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you test your PR locally with pytest command?
  • Did you run pre-commit hooks with pre-commit run -a command?

Notes for reviewers

This is my first time contributing to cyto-dl, which is also no longer being actively maintained, so I wasn't sure who typically reviews PRs here. Please add anyone that I might have missed!

I am also not sure where to start testing to make sure that this change doesn't break anything.

@eeangelini eeangelini linked an issue Sep 26, 2025 that may be closed by this pull request
@eeangelini eeangelini added the bug, help wanted, question, and dependencies labels Sep 26, 2025
eeangelini commented Sep 26, 2025

UPDATE: Per @smishra3, this is not an issue with the torch version, but a problem with how 039 was un-MIG-ed and with torch not being able to access CUDA_VISIBLE_DEVICES. @fatwir will take the lead on posting to #infra-eng-requests to make this issue visible before the other two machines get un-MIG-ed.
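For anyone who hits something similar on another machine, here is a minimal diagnostic sketch that helps separate "wrong torch build" from "devices not visible". It is not the check that was actually run here; `cuda_report` is a hypothetical helper name, and it only uses `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, and `torch.cuda.device_count()`.

```python
import os


def cuda_report(env=None):
    """Collect the values most useful for triaging a 'torch cannot see the GPU' report.

    `env` defaults to the real process environment; it is a parameter only so
    the function can be exercised without touching os.environ. torch itself
    always reads the real environment.
    """
    env = os.environ if env is None else env
    report = {"cuda_visible_devices": env.get("CUDA_VISIBLE_DEVICES")}
    try:
        import torch
    except ImportError:
        report["torch_installed"] = False
        return report
    report.update(
        torch_installed=True,
        torch_version=torch.__version__,
        # CUDA version the wheel was built against (None for CPU-only builds).
        built_for_cuda=torch.version.cuda,
        cuda_available=torch.cuda.is_available(),
        device_count=torch.cuda.device_count(),
    )
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_for_cuda` looks right for the driver but `cuda_available` is False or `device_count` is 0, the problem is more likely device visibility on the machine than the pinned torch version.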

eeangelini commented

Closing without merging as infra fixed the issue with the un-MIG-ed devices 🥳

@eeangelini eeangelini closed this Sep 26, 2025
@eeangelini eeangelini deleted the bug/upgrade-pytorch branch September 26, 2025 20:43
Successfully merging this pull request may close these issues.

Upgrade Pytorch