
Bug: Incorrect set_default_active_thread_percentage Behavior in Kubernetes Device Plugin with MPS #1494


Description


Summary

When using NVIDIA MPS with the Kubernetes device plugin, the set_default_active_thread_percentage value is set incorrectly, so workloads scheduled to the same GPU are severely throttled and cannot use the GPU's full capacity.

This parameter is global per MPS daemon, and since there is one MPS daemon per GPU, incorrectly setting it (e.g. based on replica count) results in throttling all associated workloads to a fraction of the GPU capacity.
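The global nature of the setting can be checked directly against the control daemon for a GPU. The following is a minimal sketch using standard nvidia-cuda-mps-control queries; the pipe directory path is an assumption and must match whatever the device plugin actually configured.

# Point at the MPS pipe directory set up by the device plugin
# (/tmp/nvidia-mps is only an assumed default here).
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps

# Default SM fraction handed to every new client on this GPU:
echo "get_default_active_thread_percentage" | nvidia-cuda-mps-control

# List running MPS servers, then query the limit of a specific one:
echo "get_server_list" | nvidia-cuda-mps-control
echo "get_active_thread_percentage <SERVER_PID>" | nvidia-cuda-mps-control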

Observed Behavior

  • With the default active_thread_percentage applied by the device plugin, nvidia-smi (or any other GPU monitoring tool) reports roughly 60% GPU utilization with 2 workloads; adding more workloads pushes utilization to 100% and the applications start to slow down.
  • After applying set_active_thread_percentage 100 manually via nvidia-cuda-mps-control, the same workloads drop to ~2–3% GPU utilization, showing that resources are then shared correctly rather than artificially limited.
  • This confirms that the device plugin configures MPS with the wrong active_thread_percentage during initialization.

How to Reproduce

  • Deploy a GPU workload using the Kubernetes NVIDIA device plugin with MPS enabled (a sample sharing config is sketched after this list).
  • Observe GPU utilization in DCGM or nvidia-smi.
  • Exec into the workload pod and run:
# get_server_list prints one server PID per line; this assumes a single MPS server.
SERVERID=$(echo "get_server_list" | nvidia-cuda-mps-control)
echo "set_active_thread_percentage $SERVERID 100" | nvidia-cuda-mps-control
  • Restart the workload to apply the new value.
  • Observe that the workload's GPU usage decreases, so it can scale up further, and the application no longer slows down.
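For reference, MPS sharing in the device plugin is enabled through its sharing configuration. The snippet below is only a sketch of such a config, with the resource name and replica count chosen to match the two-workload scenario above.

version: v1
sharing:
  mps:
    resources:
    - name: nvidia.com/gpu
      replicas: 2

With replicas: 2 the single physical GPU is advertised as two schedulable nvidia.com/gpu resources, both backed by the same per-GPU MPS control daemon.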

Root Cause

set_default_active_thread_percentage is derived from the replica count, but MPS runs only a single control daemon per GPU, so the value is applied globally and shared by every workload on that GPU.
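As an illustration of the effect (the exact commands issued by the plugin are not reproduced here; this sketch only mirrors the 100/replicas behavior described above):

# Hypothetical sketch of what the per-GPU control daemon ends up with
# when the default is derived from the replica count:
REPLICAS=2
echo "set_default_active_thread_percentage $((100 / REPLICAS))" | nvidia-cuda-mps-control
# There is exactly one control daemon per GPU, so this 50% cap applies
# to every client on the GPU, not just to one replica's slice.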

Expected Behavior

The device plugin should not override active_thread_percentage unless explicitly configured by the user.

Per-GPU or per-pod resource tuning should not be attempted this way, because active_thread_percentage is a global, per-GPU MPS setting rather than a per-client one.

Environment Details

  • GPU: e.g. NVIDIA RTX 4000
  • Driver version: 580.82.07
  • Container image: nvcr.io/nvidia/k8s-device-plugin:v0.17.4
  • Kubernetes version: 1.32.6
