Ability to add extra volumes and extra volume mounts to daemonsets #1532

We would like to use the gpu-operator to manage daemonsets such as `gpu-feature-discovery` and `dcgm-exporter` on GKE, where the driver is installed in a non-standard location, `/home/kubernetes/bin/nvidia/`.

We do not want to deploy the `device-plugin` and `container-toolkit` through the gpu-operator because:
- GKE node auto-provisioning (NAP) does not allow labelling nodes to disable GKE's default device plugin.
- Enabling only the container-toolkit via the gpu-operator can cause a race condition in which GKE's device plugin registers devices before the gpu-operator's toolkit gets a chance to run.
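
For context, the setup described above corresponds roughly to Helm values like the following. This is a sketch, not our exact configuration: the key names are the ones we understand the gpu-operator chart to use and should be checked against the chart version in use, and `driver.enabled: false` is an assumption based on the driver already being installed by GKE.

```yaml
# Sketch of gpu-operator Helm values for this setup (verify key names
# against your chart version).
driver:
  enabled: false        # assumption: GKE already installs the driver
toolkit:
  enabled: false        # avoid racing with GKE's device plugin
devicePlugin:
  enabled: false        # GKE's default device plugin stays in charge
gfd:
  enabled: true         # gpu-feature-discovery managed by the operator
dcgmExporter:
  enabled: true         # dcgm-exporter managed by the operator
```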

Unfortunately, this means we have to play by the rules of GKE's `runc`, which calls GKE's own `nvidia-container-cli` variant. That variant does not inject devices, `nvidia-smi`, or `libnvml` into the `gpu-feature-discovery` and `dcgm-exporter` pods, because these pods do not request any GPU devices through `resources.limits`.
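
For comparison, a container that does get GPU access is one that requests a GPU explicitly, along these lines. The container name and image are placeholders for illustration; only the `nvidia.com/gpu` limit is the relevant part.

```yaml
# Illustrative pod spec fragment: a workload that requests a GPU.
# The operand pods (gpu-feature-discovery, dcgm-exporter) carry no such limit.
containers:
  - name: cuda-workload                 # hypothetical name
    image: example.com/cuda-app:latest  # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1               # GPU request that the operand pods lack
```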

The ability to add volumes and volume mounts would let us bypass this limitation by manually exposing the devices, binaries, and libraries, for example:

```yaml
      volumes:
        - name: dev
          hostPath:
            path: /dev
            type: ''
        - name: nvidia-install-dir-host
          hostPath:
            path: /home/kubernetes/bin/nvidia
            type: ''
        - name: nvidia-config
          hostPath:
            path: /etc/nvidia
            type: ''
      containers:
        - name: nvidia-dcgm-exporter
          volumeMounts:
            - name: dev
              mountPath: /dev
            - name: nvidia-install-dir-host
              mountPath: /usr/local/nvidia
            - name: nvidia-config
              mountPath: /etc/nvidia
          env:
            - name: DCGM_EXPORTER_LISTEN
              value: ':9400'
            - name: DCGM_EXPORTER_KUBERNETES
              value: 'true'
            - name: DCGM_EXPORTER_COLLECTORS
              value: /etc/dcgm-exporter/dcp-metrics-included.csv
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.nodeName
            - name: DCGM_EXPORTER_KUBERNETES_GPU_ID_TYPE
              value: uid
            - name: NVIDIA_INSTALL_DIR_HOST
              value: /home/kubernetes/bin/nvidia
            - name: NVIDIA_INSTALL_DIR_CONTAINER
              value: /usr/local/nvidia
            - name: LD_LIBRARY_PATH
              value: >-
                /usr/local/nvidia/lib64:/usr/local/nvidia/lib:/usr/local/cuda/lib64
```
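
One possible shape for the requested option, expressed as Helm values. The `extraVolumes` and `extraVolumeMounts` keys are hypothetical: they illustrate the request and are not existing gpu-operator fields. The same pair would presumably be wanted under `gfd` as well.

```yaml
# Hypothetical values layout for the requested feature.
dcgmExporter:
  extraVolumes:            # hypothetical key, appended to the daemonset's volumes
    - name: dev
      hostPath:
        path: /dev
    - name: nvidia-install-dir-host
      hostPath:
        path: /home/kubernetes/bin/nvidia
    - name: nvidia-config
      hostPath:
        path: /etc/nvidia
  extraVolumeMounts:       # hypothetical key, appended to the container's volumeMounts
    - name: dev
      mountPath: /dev
    - name: nvidia-install-dir-host
      mountPath: /usr/local/nvidia
    - name: nvidia-config
      mountPath: /etc/nvidia
```

The environment variables shown earlier (such as `LD_LIBRARY_PATH`) would still need to be set separately so the mounted libraries are found.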
