Description
Problem
In HPC environments, DCGM Exporter cannot correlate GPU metrics with workload manager job IDs (e.g., Slurm jobs) because it lacks access to job mapping files.
According to the DCGM Exporter documentation (https://github.com/NVIDIA/dcgm-exporter?tab=readme-ov-file#enabling-hpc-job-mapping-on-dcgm-exporter), GPU-to-job mapping can be enabled by setting DCGM_HPC_JOB_MAPPING_DIR and providing access to a directory where the HPC cluster creates job mapping files.
Currently, the GPU Operator's ClusterPolicy CRD supports configuring DCGM Exporter's environment variables but does not support custom volumes/volumeMounts. This prevents HPC workload managers (like Slurm) from enabling GPU-to-job mapping in DCGM metrics.
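For reference, the environment variable half of the setup can already be expressed via ClusterPolicy today; it is only the mount that cannot. A minimal sketch (the directory path /var/run/dcgm-hpc-jobs is an illustrative value, not a required location):

```yaml
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  dcgmExporter:
    env:
      # Points DCGM Exporter at the job mapping directory, but the
      # corresponding volume/volumeMount cannot be declared here today.
      - name: DCGM_HPC_JOB_MAPPING_DIR
        value: "/var/run/dcgm-hpc-jobs"
```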
The workflow requires:
- The DCGM_HPC_JOB_MAPPING_DIR environment variable set to the mapping directory path
- The DCGM Exporter container to mount that directory so it can read the mapping files
The missing piece is the mount. The ClusterPolicy CRD doesn't expose volumes or volumeMounts fields for DCGM Exporter, and the base DaemonSet template (https://github.com/NVIDIA/gpu-operator/blob/main/assets/state-dcgm-exporter/0800_daemonset.yaml) only defines a fixed, limited set of volume mounts.
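Roughly, the rendered DaemonSet would need additions along these lines (a sketch only; the hostPath, mount path, and container name are placeholder assumptions, not the operator's actual template):

```yaml
# Sketch of the additions the DCGM Exporter DaemonSet would need so the
# container can read the mapping files the workload manager writes on the host.
spec:
  template:
    spec:
      containers:
        - name: nvidia-dcgm-exporter
          env:
            - name: DCGM_HPC_JOB_MAPPING_DIR
              value: /var/run/dcgm-hpc-jobs
          volumeMounts:
            - name: hpc-job-mapping
              mountPath: /var/run/dcgm-hpc-jobs
              readOnly: true
      volumes:
        - name: hpc-job-mapping
          hostPath:
            path: /var/run/dcgm-hpc-jobs
            type: DirectoryOrCreate
```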
Proposed Solution
Add an hpcJobMapping configuration to DCGMExporterSpec allowing users to enable and configure the job mapping directory.
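One possible shape for this in ClusterPolicy (illustrative only; field names beyond hpcJobMapping itself are assumptions, not a finalized API):

```yaml
spec:
  dcgmExporter:
    # Proposed: enabling this would set DCGM_HPC_JOB_MAPPING_DIR and add the
    # matching hostPath volume/volumeMount on the DCGM Exporter DaemonSet.
    hpcJobMapping:
      enabled: true
      directory: /var/run/dcgm-hpc-jobs  # hypothetical field and path
```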
PR
Testing
All existing tests pass, plus a new test covering the HPC job mapping transformation.