Built docs on GPU #237
|
@giordano see the CUDA toolchain:
- runtime 12.9, artifact installation
- driver 535.247.1 for 12.2
- compiler 12.9
CUDA libraries:
- cuBLAS: 12.9.1
- cuSPARSE: 12.5.10
- cuSOLVER: 11.7.5
- cuFFT: 11.4.1
- cuRAND: 10.3.10
- CUPTI: 2025.2.1 (API 12.9.1)
- NVML: 12.0.0+535.247.1
Julia packages:
- CUDACore: 6.0.0
- GPUArrays: 11.5.3
- GPUCompiler: 1.10.0
- KernelAbstractions: 0.9.41
- CUDA_Driver_jll: 13.2.1+2
- CUDA_Compiler_jll: 0.4.3+0
- CUDA_Runtime_jll: 0.21.0+1
Toolchain:
- Julia: 1.12.5
- LLVM: 18.1.7
Environment:
- JULIA_CUDA_USE_COMPAT: false
1 device:
0: NVIDIA TITAN V (sm_70, 11.770 GiB / 12.000 GiB available)
CUDA.versioninfo() = nothing |
|
That wouldn't explain the error: the driver is way below v580, and the toolkit is 12.9, not 13+. |
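A rough sketch of that consistency check, in case it helps when reading future `versioninfo()` output. `MIN_DRIVER` and `driver_supports` are hypothetical names, not part of CUDA.jl, and the thresholds are assumptions: 580 for CUDA 13 is the figure quoted in this thread, and 525.60.13 for CUDA 12 is taken from NVIDIA's minor-version compatibility notes.

```julia
# Hypothetical helper, not part of CUDA.jl.
# Assumed minimum Linux driver version per CUDA toolkit major version.
const MIN_DRIVER = Dict(12 => v"525.60.13", 13 => v"580.0.0")

# Return true if `driver` meets the assumed minimum for `toolkit`'s major version.
function driver_supports(driver::VersionNumber, toolkit::VersionNumber)
    major = Int(toolkit.major)
    haskey(MIN_DRIVER, major) || error("no threshold recorded for CUDA $major")
    return driver >= MIN_DRIVER[major]
end

driver = v"535.247.1"                       # driver reported above
println(driver_supports(driver, v"12.9"))   # toolkit reported above
println(driver_supports(driver, v"13.0"))   # toolkit the error message would imply
```

Under those assumptions the reported driver passes for toolkit 12.9 and fails for 13.0, which is why the output above doesn't match the error.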
|
So how do we upgrade the toolkit? Could I do it? |
|
Or the driver also? |
|
No, nothing should be changed! My point is that the output isn't consistent with the error message. Also, the build is actually progressing, no? |
|
Upgrading the driver would actually wreck everything; no one should touch it. |
giordano
left a comment
I'd merge this PR and keep monitoring the situation. When there's an error we can see what versioninfo() reported. As I said above, the current output doesn't explain the error message you saw before at all.
|
The build progresses, but if I include a literated example on GPU it'll fail! So how do we make the GPU work? I think this is the tartarus GPU, so perhaps it reached the end of its lifetime? |
Co-authored-by: Mosè Giordano <765740+giordano@users.noreply.github.com>
But the error doesn't make any sense. |
|
This got past the point where there was the error in previous builds. This may have been fixed by JuliaPackaging/Yggdrasil#13712, if so that'd finally explain the error message: before that PR, CUDA toolkit 13+ was installed, causing the error message above (which would then make sense again). But now we're getting CUDA toolkit 12.9 being installed automatically, and that solves the compatibility issues. |
|
So nothing we did here, just coincidentally it was fixed upstream? Great! |
|
Yes. Not entirely coincidental though: that PR was spurred by JuliaGPU/CUDA.jl#3132. However, Oceananigans is still broken: CliMA/Oceananigans.jl#5577. Can you please do the same debugging? I'm away from my computer and sorely need to sleep at this point. |
|
Uhm, actually the label didn't work because it doesn't match the label name. Could you fix it? I really must go now. |
|
Seems that the GPU examples didn't build? Despite the label? |
|
Changed the label to match the if condition and restarted the build. |
|
The fact that https://github.com/NumericalEarth/NumericalEarth.jl/actions/runs/25838710508/job/75919202625 has been running for over four and a half hours is a good sign, right? |
|
The GPU build goes through, but the speedy-weather coupled simulation crashes. |
|
Ok, that seems unrelated to the GPU issue though |
ERROR: LoadError: LoadError: ArgumentError: write failed, IOStream is not writeable
in expression starting at /storage5/github-action-runners/actions-runner-3/_work/NumericalEarth.jl/NumericalEarth.jl/docs/src/literated/global_climate_simulation.md:1
Is tartarus out of space? |
|
I get the same error locally, so it's not tartarus' fault; did something change in Oceananigans/SpeedyWeather/ClimaSeaIce? |
|
I merged and let's see... |
Attempt to deal with #234