
Built docs on GPU#237

Merged: navidcy merged 18 commits into main from navidcy-patch-1 on May 20, 2026


@navidcy (Member) commented May 14, 2026

Attempt to deal with #234

@navidcy (Member, Author) commented May 14, 2026

@giordano see the CUDA.versioninfo() output at https://github.com/NumericalEarth/NumericalEarth.jl/actions/runs/25834595689/job/75906773612?pr=237#step:7:21

CUDA toolchain: 
- runtime 12.9, artifact installation
- driver 535.247.1 for 12.2
- compiler 12.9
CUDA libraries: 
- cuBLAS: 12.9.1
- cuSPARSE: 12.5.10
- cuSOLVER: 11.7.5
- cuFFT: 11.4.1
- cuRAND: 10.3.10
- CUPTI: 2025.2.1 (API 12.9.1)
- NVML: 12.0.0+535.247.1
Julia packages: 
- CUDACore: 6.0.0
- GPUArrays: 11.5.3
- GPUCompiler: 1.10.0
- KernelAbstractions: 0.9.41
- CUDA_Driver_jll: 13.2.1+2
- CUDA_Compiler_jll: 0.4.3+0
- CUDA_Runtime_jll: 0.21.0+1
Toolchain:
- Julia: 1.12.5
- LLVM: 18.1.7
Environment:
- JULIA_CUDA_USE_COMPAT: false
1 device:
  0: NVIDIA TITAN V (sm_70, 11.770 GiB / 12.000 GiB available)
CUDA.versioninfo() = nothing

@giordano (Member)

That wouldn't explain the error:

┌ Error: Your NVIDIA TITAN V GPU (compute capability 7.0) is not supported on CUDA 13+.
│ Please use a device with at least capability 7.5, or downgrade your NVIDIA driver to below v580.
└ @ CUDACore /storage5/github-action-runners/julia-depot/packages/CUDACore/eDgUR/lib/cudadrv/state.jl:225

The driver (535) is well below v580, and the toolkit is 12.9, not 13+.
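The consistency check being made here can be written out as a tiny predicate. This is a hypothetical sketch that only restates the rule printed in the error message (a device below compute capability 7.5 is unsupported on CUDA 13+); the function name and the (major, minor) tuple encoding are assumptions for illustration, and this is not how CUDACore actually implements the check.

```python
# Hypothetical helper: restates the rule from the CUDACore error message,
# not CUDACore's actual implementation.
MIN_CAPABILITY_FOR_CUDA13 = (7, 5)  # "at least capability 7.5"


def cuda13_capability_error_expected(capability, toolkit):
    """Return True if the 'not supported on CUDA 13+' error should fire.

    capability and toolkit are (major, minor) tuples, e.g. (7, 0) and (12, 9).
    """
    uses_cuda13_or_later = toolkit[0] >= 13
    # Tuple comparison orders by major first, then minor.
    return uses_cuda13_or_later and capability < MIN_CAPABILITY_FOR_CUDA13


# Values from the CUDA.versioninfo() output: TITAN V is sm_70, toolkit 12.9.
print(cuda13_capability_error_expected((7, 0), (12, 9)))  # False
# The same device with a 13.x toolkit would trigger the error.
print(cuda13_capability_error_expected((7, 0), (13, 0)))  # True
```

With the reported toolkit (12.9) the predicate is false, which is exactly why the error message looks inconsistent with the versioninfo() output.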

@navidcy (Member, Author) commented May 14, 2026

So how do we upgrade the toolkit? Could I do it?

@navidcy (Member, Author) commented May 14, 2026

Or the driver as well?

@giordano (Member)

No, nothing should be changed! My point is that the output isn't consistent with the error message. Also, the build is actually progressing, no?

@giordano (Member)

Upgrading the driver would actually wreck everything; no one should touch it.

@giordano (Member) left a comment


I'd merge this PR and keep monitoring the situation. When there's an error, we can check what versioninfo() reported. As I said above, the current output doesn't explain the error message you saw before at all.

Comment thread docs/make.jl Outdated
Comment thread docs/make.jl Outdated
@navidcy (Member, Author) commented May 14, 2026

The build progresses but if I include a literated example on GPU it'll fail!

So how do we make the GPU work? I think this is the tartarus GPU, so perhaps it has reached the end of its lifetime?

navidcy and others added 2 commits May 14, 2026 11:04
Co-authored-by: Mosè Giordano <765740+giordano@users.noreply.github.com>
Co-authored-by: Mosè Giordano <765740+giordano@users.noreply.github.com>
@giordano (Member)

Quoting @navidcy: "The build progresses but if I include a literated example on GPU it'll fail!"

But the error doesn't make any sense.

@giordano added the "build all examples" label (add this label to build all the examples in the PR) on May 14, 2026
@giordano closed this on May 14, 2026
@giordano reopened this on May 14, 2026
Comment thread docs/make.jl Outdated
@giordano (Member) commented May 14, 2026

This got past the point where there was the error in previous builds.

This may have been fixed by JuliaPackaging/Yggdrasil#13712. If so, that would finally explain the error message: before that PR, CUDA toolkit 13+ was being installed, which triggered the error above (so the message would make sense after all). Now CUDA toolkit 12.9 is installed automatically, and that resolves the compatibility issue.

@navidcy (Member, Author) commented May 14, 2026

So it was nothing we did here; it just coincidentally got fixed upstream?

Great!

@giordano (Member)

Yes. Not entirely coincidental, though: that PR was spurred by JuliaGPU/CUDA.jl#3132. However, Oceananigans is still broken: CliMA/Oceananigans.jl#5577. Can you please do the same debugging there? I'm away from my computer and sorely need to sleep at this point.

@giordano (Member)

Uhm, actually the label didn't work because

NUMERICAL_EARTH_LABEL_BUILD_ALL_EXAMPLES: 'build all examples'
doesn't match the label name. Could you fix it? I really must go now.

@navidcy (Member, Author) commented May 14, 2026

It seems the GPU examples weren't built, despite the label?

@simone-silvestri (Member)

I changed the label to match the if condition and restarted the build.

@giordano (Member)

The fact that https://github.com/NumericalEarth/NumericalEarth.jl/actions/runs/25838710508/job/75919202625 has been running for over four and a half hours is a good sign, right?

@simone-silvestri (Member)

The GPU build goes through, but the SpeedyWeather coupled simulation crashes:
https://github.com/NumericalEarth/NumericalEarth.jl/actions/runs/25838710508/job/75919202625

@giordano (Member)

OK, that seems unrelated to the GPU issue, though.

@navidcy (Member, Author) commented May 15, 2026

ERROR: LoadError: LoadError: ArgumentError: write failed, IOStream is not writeable
in expression starting at /storage5/github-action-runners/actions-runner-3/_work/NumericalEarth.jl/NumericalEarth.jl/docs/src/literated/global_climate_simulation.md:1

https://github.com/NumericalEarth/NumericalEarth.jl/actions/runs/25845095025/job/75938479243?pr=237#step:7:64

Is tartarus out of space?

@navidcy (Member, Author) commented May 16, 2026

I get the same error locally, so it's not tartarus's fault; perhaps something changed in Oceananigans, SpeedyWeather, or ClimaSeaIce?

@navidcy merged commit 4c66e02 into main on May 20, 2026 (1 check failed)
@navidcy deleted the navidcy-patch-1 branch on May 20, 2026 21:30
@navidcy (Member, Author) commented May 20, 2026

I merged it; let's see...
Veros is breaking; see #263.

@navidcy removed the "DO NOT MERGE ⚠️ DON'T EVEN THINK ABOUT IT" label on May 24, 2026

Labels

build all examples: add this label to build all the examples in the PR


3 participants