Draft
Changes from 36 commits
Commits
48 commits
a8ccde1
Add test recipe for next version of prgenv-gnu with upstream libfabric…
msimberg Sep 4, 2025
466a169
Add prgenv-gnu/next to config
msimberg Sep 5, 2025
0ea7a0e
No zen for prgenv-gnu/next
msimberg Sep 5, 2025
ab2e9bb
Use special branch of alps-cluster-config
msimberg Sep 5, 2025
1ae291c
Disable xpmem kernel-module
msimberg Sep 5, 2025
f2b9f01
Use GCC 14.2 again
msimberg Sep 10, 2025
e80770d
Use libfabric 2.3
msimberg Oct 1, 2025
7cfa922
Add +gdrcopy variant to libfabric in prgenv-gnu/next
msimberg Oct 3, 2025
7ae62d1
Don't exclude gcc module
msimberg Oct 3, 2025
04027f8
Don't use custom alps-cluster-config
msimberg Oct 29, 2025
e42a796
Move prgenv-gnu/next to prgenv-gnu/25.11
msimberg Oct 29, 2025
fb5a928
Remove custom xpmem from prgenv-gnu
msimberg Oct 29, 2025
25124f1
Add prgenv-gnu/25.11 mc recipe
msimberg Oct 29, 2025
c649e60
Try gcc 14.3 again for prgenv-gnu/25.11
msimberg Oct 29, 2025
46675fb
Add openmpi view to prgenv-gnu/25.11
msimberg Oct 29, 2025
c769617
Use gcc 14.2 again in prgenv-gnu/25.11
msimberg Oct 29, 2025
ee2d15e
Merge branch 'prgenv-gnu-next' into prgenv-gnu-ompi
msimberg Oct 29, 2025
d1b83c6
Move openmpi environment to separate uenv
msimberg Oct 29, 2025
f84670a
Fix config.yaml
msimberg Oct 29, 2025
d9ea85b
No openmpi on eiger, for now...
msimberg Oct 29, 2025
e3647d7
Rename openmpi view
msimberg Oct 29, 2025
d055ea7
Add gmp to prgenv-gnu-openmpi/25.11
msimberg Oct 29, 2025
5d76a78
Add openmpi feature to prgenv-gnu-openmpi reframe metadata
msimberg Oct 31, 2025
4610066
Add nccl reframe feature to prgenv-gnu-openmpi/25.11
msimberg Nov 11, 2025
8fb2658
Add netcdf-cxx4 to prgenv-gnu-openmpi/25.11
msimberg Nov 13, 2025
780be85
Update libfabric spec
msimberg Nov 14, 2025
cea6437
Add patch for GPU-GPU communication with lnx in libfabric
msimberg Nov 13, 2025
d8c2054
Merge remote-tracking branch 'origin/main' into prgenv-gnu-ompi
msimberg Nov 14, 2025
015c11e
Try libfabric with system cray-xpmem
msimberg Nov 20, 2025
fa4d7ed
Upgrade spack-packages for openmpi 5.0.9
msimberg Nov 21, 2025
aaba38a
Disable libfabric patch temporarily
msimberg Nov 21, 2025
1fc7a38
Pin cuda to 12
msimberg Nov 21, 2025
f530498
Use [email protected]
msimberg Nov 21, 2025
d08b5ee
Update recipes/prgenv-gnu-openmpi/25.11/gh200/config.yaml
msimberg Nov 21, 2025
d151291
Update recipes/prgenv-gnu-openmpi/25.11/gh200/config.yaml
msimberg Nov 21, 2025
b3bba95
Merge remote-tracking branch 'origin/main' into prgenv-gnu-ompi
msimberg Nov 25, 2025
0f39f0a
Remove extra prgenv-gnu config entry
msimberg Nov 25, 2025
5f720c2
Merge remote-tracking branch 'origin/main' into prgenv-gnu-ompi
msimberg Nov 28, 2025
637aaa9
Remove custom repo for openmpi recipe
msimberg Nov 28, 2025
ca98862
Fix name of openmpi uenv
msimberg Dec 2, 2025
9639f29
Use simpler network spec for openmpi recipe
msimberg Dec 2, 2025
5a8fcb2
Add mc recipe for openmpi
msimberg Dec 2, 2025
c946017
Bump spack-packages for openmpi recipe
msimberg Dec 2, 2025
f9b9466
Try lifting compiler restriction in gh200 recipe
msimberg Dec 2, 2025
6535c10
Move openmpi recipe to 25.12
msimberg Dec 2, 2025
3f5cfcd
Enable openmpi recipe for eiger
msimberg Dec 2, 2025
e3ad543
Remove repo link in openmpi recipe
msimberg Dec 2, 2025
f847ed6
Fix config paths
msimberg Dec 2, 2025
20 changes: 20 additions & 0 deletions config.yaml
@@ -307,6 +307,15 @@ uenvs:
        daint: [gh200]
        santis: [gh200]
        eiger: [zen2]
    "25.11":
      recipes:
        zen2: 25.11/mc
        zen3: 25.11/mc
        gh200: 25.11/gh200
      deploy:
        clariden: [gh200]
        daint: [gh200]
        santis: [gh200]
    "25.7":
      recipes:
        mi200: 25.7/amdgpu
@@ -324,6 +333,17 @@ uenvs:
        santis: [gh200]
        bristen: [a100]
        eiger: [zen2]
  prgenv-gnu-openmpi:
    "25.11":
      recipes:
        # zen2: 25.11/mc
        # zen3: 25.11/mc
        gh200: 25.11/gh200
      deploy:
        clariden: [gh200]
        daint: [gh200]
        santis: [gh200]
        # eiger: [zen2]
Collaborator Author
To do: test an eiger build.

  prgenv-nvfortran:
    "24.11":
      recipes:
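The `recipes`/`deploy` pairing in the config.yaml hunk above lends itself to a quick consistency check: every architecture listed for a cluster under `deploy` should have a matching entry under `recipes`. The following is a minimal sketch of such a check, not part of this PR. The dict mirrors the `prgenv-gnu-openmpi` entry by hand so that no YAML parser is needed, and the function name `missing_recipes` is hypothetical.

```python
# Hand-written mirror of the prgenv-gnu-openmpi entry from config.yaml
# (an illustration; the real file would be loaded with a YAML parser).
uenvs = {
    "prgenv-gnu-openmpi": {
        "25.11": {
            "recipes": {"gh200": "25.11/gh200"},
            "deploy": {
                "clariden": ["gh200"],
                "daint": ["gh200"],
                "santis": ["gh200"],
            },
        }
    }
}


def missing_recipes(uenvs):
    """Return (uenv, version, cluster, arch) tuples whose arch has no recipe."""
    problems = []
    for name, versions in uenvs.items():
        for version, entry in versions.items():
            recipes = entry.get("recipes", {})
            for cluster, archs in entry.get("deploy", {}).items():
                for arch in archs:
                    if arch not in recipes:
                        problems.append((name, version, cluster, arch))
    return problems


print(missing_recipes(uenvs))
```

Under this check, uncommenting `eiger: [zen2]` without also enabling a `zen2` recipe would be reported as a missing recipe.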
2 changes: 2 additions & 0 deletions recipes/prgenv-gnu-openmpi/25.11/gh200/compilers.yaml
@@ -0,0 +1,2 @@
gcc:
  version: "14.2"
10 changes: 10 additions & 0 deletions recipes/prgenv-gnu-openmpi/25.11/gh200/config.yaml
@@ -0,0 +1,10 @@
name: prgenv-gnu
spack:
  repo: https://github.com/spack/spack.git
  commit: releases/v1.1
  packages:
    repo: https://github.com/spack/spack-packages.git
    commit: a896fdbe5d01981cbc6f9b5139a5d551ac2fe248 # develop on 2025-11-21
store: /user-environment
description: GNU Compiler toolchain with OpenMPI, Python, CMake and other development tools.
version: 2
51 changes: 51 additions & 0 deletions recipes/prgenv-gnu-openmpi/25.11/gh200/environments.yaml
@@ -0,0 +1,51 @@
gcc-env:
  compiler: [gcc]
  network:
    mpi: [email protected]
    specs:
    - [email protected] +gdrcopy fabrics=lnx,shm,cxi,xpmem # TODO: Add to/update alps-cluster-config
  unify: true
  specs:
  - boost +chrono +filesystem +iostreams +mpi +python +regex +serialization +shared +system +timer
  - cmake
  - fftw
  - fmt
  - gmp
  - gsl
  - hdf5+cxx+hl+fortran
  - kokkos +aggressive_vectorization ~alloc_async cuda_arch=90 +cuda_constexpr +cuda_lambda ~cuda_relocatable_device_code ~cuda_uvm cxxstd=17 +openmp +pic +serial +shared +tuning +wrapper
  - kokkos-kernels +blas +cublas +cusparse +cusolver +execspace_cuda +execspace_openmp +execspace_serial +lapack +memspace_cudaspace +openmp scalars=float,double,complex_float,complex_double +serial +shared +superlu
  - kokkos-tools +mpi +papi
  - netlib-scalapack
  - lua
  - libtree
  - lz4
  - meson
  - netcdf-c
  - netcdf-cxx
  - netcdf-cxx4
  - netcdf-fortran
  - ninja
  - openblas threads=openmp
  - osu-micro-benchmarks
  - papi
  - python
  - zlib-ng
  # add GPU-specific packages here, for easier comparison with mc version
  - nccl
  - nccl-tests
  - cuda@12
  - xcb-util-cursor
  - aws-ofi-nccl
  - superlu
  variants:
  - +mpi
  - +cuda
  - cuda_arch=90a
  views:
    default:
      link: roots
      uenv:
        add_compilers: true
        prefix_paths:
          LD_LIBRARY_PATH: [lib, lib64]
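The `prefix_paths` entry at the end of environments.yaml lists subdirectories that are meant to end up on `LD_LIBRARY_PATH` when the view is loaded. As a rough illustration only, the sketch below expands such a mapping into environment variable values. It assumes each entry is a subdirectory of the view root, and the view root `/user-environment/env/default` is an assumed path, not something stated in this PR. The function name `expand_prefix_paths` is hypothetical and this is not how the uenv tooling is implemented.

```python
import posixpath


def expand_prefix_paths(view_root, prefix_paths):
    """Join each listed subdirectory onto the view root, colon-separated per variable."""
    return {
        var: ":".join(posixpath.join(view_root, sub) for sub in subdirs)
        for var, subdirs in prefix_paths.items()
    }


# Assumed view root, chosen for illustration only.
env = expand_prefix_paths(
    "/user-environment/env/default",
    {"LD_LIBRARY_PATH": ["lib", "lib64"]},
)
print(env["LD_LIBRARY_PATH"])
```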
16 changes: 16 additions & 0 deletions recipes/prgenv-gnu-openmpi/25.11/gh200/extra/reframe.yaml
@@ -0,0 +1,16 @@
default:
  features:
    - cuda
    - mpi
    - openmpi
    - nccl
    - nccl-tests
    - openmp
    - osu-micro-benchmarks
    - prgenv
    - serial
  cc: mpicc
  cxx: mpic++
  ftn: mpifort
  views:
    - default
23 changes: 23 additions & 0 deletions recipes/prgenv-gnu-openmpi/25.11/gh200/modules.yaml
@@ -0,0 +1,23 @@
modules:
  # Paths to check when creating modules for all module sets
  prefix_inspections:
    bin:
      - PATH
    lib:
      - LD_LIBRARY_PATH
    lib64:
      - LD_LIBRARY_PATH

  default:
    arch_folder: false
    # Where to install modules
    roots:
      tcl: /user-environment/modules
    tcl:
      all:
        autoload: none
      hash_length: 0
      exclude_implicits: true
      exclude: []
      projections:
        all: '{name}/{version}'
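The combination of `projections: all: '{name}/{version}'` and `hash_length: 0` in modules.yaml means module names take the form `name/version` with no hash suffix. The sketch below approximates that naming rule with plain `str.format`. Spack uses its own spec format strings, so this is an illustration rather than Spack's implementation. The function name `module_name`, the package version, and the hash value are all hypothetical.

```python
def module_name(projection, name, version, spec_hash="", hash_length=0):
    """Approximate a projected module name, with an optional truncated hash suffix."""
    base = projection.format(name=name, version=version)
    suffix = spec_hash[:hash_length]
    return f"{base}-{suffix}" if suffix else base


# Hypothetical package version and hash, for illustration only.
print(module_name("{name}/{version}", "cmake", "3.30.5", spec_hash="abcdefg"))
print(module_name("{name}/{version}", "cmake", "3.30.5", spec_hash="abcdefg", hash_length=4))
```

With `hash_length: 0`, two installations of the same package version would project to the same module name, which is presumably acceptable here because the environment is built with `unify: true`.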
1 change: 1 addition & 0 deletions recipes/prgenv-gnu-openmpi/25.11/gh200/repo
@@ -0,0 +1,20 @@
diff --git a/prov/lnx/src/lnx_ops.c b/prov/lnx/src/lnx_ops.c
index ba3f097..2d6c187 100644
--- a/prov/lnx/src/lnx_ops.c
+++ b/prov/lnx/src/lnx_ops.c
@@ -455,6 +455,7 @@ ssize_t lnx_trecv(struct fid_ep *ep, void *buf, size_t len, void *desc,
 	struct lnx_ep *lep;
 	const struct iovec iov = {.iov_base = buf, .iov_len = len};

+	cuda_set_sync_memops(buf);
 	lep = container_of(ep, struct lnx_ep, le_ep.ep_fid.fid);
 	if (!lep)
 		return -FI_ENOSYS;
@@ -666,6 +667,7 @@ ssize_t lnx_tsenddata(struct fid_ep *ep, const void *buf, size_t len, void *desc
 	fi_addr_t core_addr;
 	void *core_desc = desc;

+	cuda_set_sync_memops(buf);
 	lep = container_of(ep, struct lnx_ep, le_ep.ep_fid.fid);
 	if (!lep)
 		return -FI_ENOSYS;
@@ -0,0 +1,19 @@
from spack_repo.builtin.packages.libfabric.package import Libfabric as BuiltinLibfabric

from spack.package import *

class Libfabric(BuiltinLibfabric):
    # This patches missing synchronization for GPU-GPU transfers in the lnx
    # provider of libfabric. The patch is from a comment on the
    # corresponding issue:
    # https://github.com/ofiwg/libfabric/issues/11231#issue-3252163450.
    #
    # It's unclear if this is a good patch, but it's sufficient for testing of
    # the lnx provider. If and when the correct fix is published the patch can
    # be backported to the upstream libfabric package.
    #
    # The patch may not apply for all versions (tested with 2.3.1), but there
    # is no version constraint as the patch is essential. Builds should fail if
    # the patch doesn't apply.
    # patch("issue-11231-cuda-sync.patch", when="fabrics=lnx")
    pass