Server Setup

Updates

To apply changes to the config, run the following:

sudo nix run github:wi2trier/gpu-server

Initial Setup

Ubuntu

First, install the dependencies for Nix and the CUDA installation. The uidmap package is needed for rootless podman.

sudo apt update
sudo apt upgrade -y
sudo apt install -y git curl wget uidmap
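To confirm that the packages above are usable before continuing, a small check along the following lines may help. This is only a sketch: `missing_commands` is a hypothetical helper name, and `newuidmap` and `newgidmap` are the binaries shipped by the uidmap package.

```shell
# Print every given command that cannot be found on PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# newuidmap and newgidmap come from the uidmap package.
missing="$(missing_commands git curl wget newuidmap newgidmap)"
if [ -n "$missing" ]; then
  echo "Missing commands:" $missing
else
  echo "All dependencies found."
fi
```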

Nix

Then install Nix using the Determinate Systems installer.

curl -fsSL https://install.determinate.systems/nix | sh -s -- install

Afterwards, open a new shell to apply the changes (e.g., exit and reconnect via SSH). Then apply the system manager configuration for the first time. The command uses the full path to nix, since sudo may not have nix on its PATH at this point.

sudo /nix/var/nix/profiles/default/bin/nix run github:wi2trier/gpu-server

Again, open a new shell to apply the changes.

CUDA

First, install the NVIDIA driver, the CUDA Toolkit, and the NVIDIA Container Toolkit. The keyring URL contains the distribution release (ubuntu2204 below), so make sure to update it when changing the distro!

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
rm cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install -y cuda-drivers-580 cuda-toolkit-12-9 nvidia-container-toolkit
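Since the keyring URL has to match the distro, the following sketch shows one way to derive it from `/etc/os-release` instead of editing it by hand. `cuda_keyring_url` is a hypothetical helper. It assumes that NVIDIA names the repository directory as the distro ID followed by the version without dots (as in ubuntu2204), and it keeps the x86_64 architecture and the keyring version 1.1-1 from the commands above. Check that the printed URL exists before using it.

```shell
# Build the cuda-keyring URL for a distro ID and version.
# Assumption: the repository directory is "<id><version without dots>".
cuda_keyring_url() {
  distro_id="$1"
  version_id="$2"
  release="$(printf '%s' "$version_id" | tr -d '.')"
  printf 'https://developer.download.nvidia.com/compute/cuda/repos/%s%s/x86_64/cuda-keyring_1.1-1_all.deb\n' \
    "$distro_id" "$release"
}

# Print the URL for the running system; the subshell keeps the
# variables from /etc/os-release out of the current shell.
if [ -r /etc/os-release ]; then
  (
    . /etc/os-release
    cuda_keyring_url "${ID:-unknown}" "${VERSION_ID:-}"
  )
fi
```

For example, `cuda_keyring_url ubuntu 22.04` reproduces the URL used in the commands above.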

Restart the server to load the new driver.

sudo reboot

Verify Installation

Correctly setting up the CUDA drivers is crucial, so please verify that the following commands work as expected.

CUDA_VISIBLE_DEVICES=0 apptainer run --nv docker://ubuntu nvidia-smi
podman run --rm --device nvidia.com/gpu=0 ubuntu nvidia-smi

In addition, you may test the setup using the PyTorch image:

CUDA_VISIBLE_DEVICES=0 apptainer run --nv docker://pytorch/pytorch python -c "import torch; print(torch.cuda.is_available())"
podman run --rm --device nvidia.com/gpu=0 pytorch/pytorch python -c "import torch; print(torch.cuda.is_available())"
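The checks above can also be wrapped in a small script that reports a pass or fail per command. The following is a sketch, not part of the repository: `run_check` and `verify_gpu_setup` are hypothetical names. Note that the PyTorch commands above exit successfully even when they print `False`, so the sketch uses a variant that exits with a non-zero status when no GPU is visible.

```shell
# Run a command quietly and report whether it exited successfully.
run_check() {
  label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $label"
  else
    echo "FAIL: $label"
    return 1
  fi
}

# Hypothetical wrapper around the checks above; returns non-zero if any check fails.
verify_gpu_setup() {
  failed=0
  run_check "apptainer nvidia-smi" \
    env CUDA_VISIBLE_DEVICES=0 apptainer run --nv docker://ubuntu nvidia-smi || failed=1
  run_check "podman nvidia-smi" \
    podman run --rm --device nvidia.com/gpu=0 ubuntu nvidia-smi || failed=1
  # Unlike the print-based test, this exits non-zero when CUDA is unavailable.
  run_check "podman pytorch" \
    podman run --rm --device nvidia.com/gpu=0 pytorch/pytorch \
    python -c "import sys, torch; sys.exit(0 if torch.cuda.is_available() else 1)" || failed=1
  return "$failed"
}
```

After sourcing the snippet, running `verify_gpu_setup` executes all three checks in order. Output of the individual commands is suppressed, so rerun a failing command by hand to see its error message.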

The new Apptainer runtime using nvidia-container-cli currently does not work:

CUDA_VISIBLE_DEVICES=0 apptainer --debug run --nvccli docker://ubuntu nvidia-smi

Uninstall

sudo nix run github:wi2trier/gpu-server#system-uninstall