prgenv-gnu with ROCm 7 #273
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
|
|
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
|
ROCm is a menace: https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/551234120955960/1440398897047560/-/jobs/12201564988#L3724. hipblaslt seems to be picking up amdclang++ from the system... needs further investigation. |
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
|
Hi, |
I'll try it out; hopefully there are no bigger issues (though note that I unfortunately have issues with other packages to resolve before this is usable). |
|
spack/spack-packages#2287 to add rocm 7.1.0 is also currently open. It may help, it may make things worse... The PR description does mention a change to hipblaslt, which may change something. |
Yes, it looks like an older version of ROCm is installed on the system and it's choosing the incorrect version of amdclang++. I'll see if I can reproduce the issue and put in a fix. |
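For anyone wanting to check this kind of shadowing locally, here is a minimal Python sketch that lists every `amdclang++` found on `PATH` in search order, so the first entry is the one a plain lookup would resolve to. The helper name `find_all_on_path` is made up for illustration and is not part of Spack, uenv, or ROCm; it also only looks at `PATH`, which may not be the only way the hipblaslt build locates a compiler.

```python
import os


def find_all_on_path(name, path=None):
    """Return every executable called `name` on PATH, in search order.

    The first entry is the one a plain `name` lookup would resolve to;
    later entries are shadowed by it.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            # Skip empty PATH components rather than treating them as cwd.
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches
```

Calling `find_all_on_path("amdclang++")` inside the build environment and seeing a system ROCm directory listed ahead of the uenv's own compiler would be consistent with the behaviour described above.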
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
I was able to reproduce and fix the previous error with hipblaslt by setting Tensile_COMPILER in the recipe: |
Good suggestion, thank you 👍 Any reason why this was not merged in |
I've been trying to get my 7.1.0 PR merged for the past couple of weeks, and I didn't want to add any more changes because I thought that might delay the merge. We'll add the change with 7.1.1: spack/spack-packages#2782 |
Great! Thank you very much for the clarification and the updates to the ROCm packages 👍 |
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
…with llvm-amdgpu to avoid rebuilding python packages
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
|
UPDATE 🤦🏻 the gtl library of cray-mpich |
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
I haven't looked into this issue but I saw that there's not only I'm not really familiar with the |
|
@iomaganaris The problem is that the binary rpms for cray-mpich are linked against A workaround would be to build without the gtl/hsa library ( |
|
BTW, it just occurred to me that if we want to avoid the issues with HPE's precompiled binaries, we could make an OpenMPI uenv with ROCm 7 instead. I'd make it a separate uenv, but at the moment it might be a faster/smoother option than trying to patch up HPE's binaries. What do you think? If ROCm 7 is otherwise building ok now, I'd expect changing to OpenMPI to be relatively simple. I can try to set that up if you think it's useful. I'm about to deploy #263 anyway. |
|
@msimberg I built q-e-sirius with openmpi+rocm7: I'm not sure how to run it correctly:
|
|
@simonpintarelli have a look here: https://docs.tds.cscs.ch/301/software/communication/openmpi/#uenv (not merged yet). The munge warning should be harmless if it ran otherwise (but I'm pretty sure you need to set the other variables mentioned on that page). That said, pmix might be set up differently on beverin and I haven't tested there. CXI should be the safe choice. LNX is faster if it works, but it may not work... |
|
I just did a quick test with |
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
|
cscs-ci run alps;system=beverin;uarch=mi200;uenv=prgenv-gnu:25.12 |
Just for testing builds, I don't know if this will work.