Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 23 additions & 12 deletions deployments/helm/nvidia-dra-driver-gpu/templates/_helpers.tpl
Original file line number Diff line number Diff line change
Expand Up @@ -43,36 +43,47 @@ Create chart name and version as used by the chart label.
{{- end }}

{{/*
Common labels
Standard labels: documented at
https://helm.sh/docs/chart_best_practices/labels/
Apply this to all high-level objects (Deployment, DaemonSet, ...).
Pod template labels are included here to deliver name+instance.
*/}}
{{- define "nvidia-dra-driver-gpu.labels" -}}
helm.sh/chart: {{ include "nvidia-dra-driver-gpu.chart" . }}
{{ include "nvidia-dra-driver-gpu.templateLabels" . }}
{{- if .Chart.AppVersion }}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we always define that (and if we don't, I want to know about it)

app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{ include "nvidia-dra-driver-gpu.templateLabels" . }}
{{- end }}

{{/*
Template labels
Copy link
Collaborator Author

@jgehrcke jgehrcke May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly allergic to a thing named templateLabels being documented with "Template labels" :)

Apply this to all pod templates (a smaller set of labels compared to
the set of standard labels above, to not clutter individual pods too
much). Note that these labels cannot be used to distinguish
components within this Helm chart.
*/}}
{{- define "nvidia-dra-driver-gpu.templateLabels" -}}
app.kubernetes.io/name: {{ include "nvidia-dra-driver-gpu.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- if .Values.selectorLabelsOverride }}
{{ toYaml .Values.selectorLabelsOverride }}
{{- end }}
Copy link
Collaborator Author

@jgehrcke jgehrcke May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the selector label (single label, I propose) strictly independent.

"Template labels" I believe always meant "pod template labels" and I think it really always meant "apply a common set of labels to all pods of this Helm chart".

By that nature, this set of labels cannot distinguish individual groups of pods within the Helm chart.

Which is what "selector labels" want to do.

In that sense, those two concepts (template labels and selector labels) are generally incompatible. Which goes unnoticed for the special case of there being just one component (one relevant group of pods) in the Helm chart.

{{- end }}

{{/*
Selector labels
Selector label: precisely filter for just the pods of the corresponding
Deployment, DaemonSet, .... That is, this label key/value pair must be
different per-component (a component name is a required argument). This
could be many labels, but we want to use just one (with a sufficiently
unique key).

TOOD: remove the override feature, or make the override work per-component.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove the override before GA? @klueska do we know of anyone with SPECIFIC selector labels?

*/}}
{{- define "nvidia-dra-driver-gpu.selectorLabels" -}}
{{- if .Values.selectorLabelsOverride -}}
{{ toYaml .Values.selectorLabelsOverride }}
{{- if and (hasKey . "componentName") (hasKey . "context") -}}
{{- if .context.Values.selectorLabelsOverride -}}
{{ toYaml .context.Values.selectorLabelsOverride }}
{{- else -}}
{{ include "nvidia-dra-driver-gpu.templateLabels" . }}
Copy link
Collaborator Author

@jgehrcke jgehrcke May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need the template labels in the selector labels.

The current logic will write something like nvidia-dra-driver-gpu-component=controller into the matchLabels selector. That is precise, and not conflicting (unless one wants it to conflict with something).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the nvidia-dra-driver-gpu.templateLabels helper above also pulls in Values.selectorLabelsOverride. Here we only include the templateLabels if the override is not set -- in which case the templateLabels will NOT include the selector labels. This does seem recursive / convoluted though.

Copy link
Collaborator Author

@jgehrcke jgehrcke May 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking deeply. Very appreciated.

I will review this more carefully tomorrow.

I want to understand use cases that we want to support, and then remove complexity that is not currently needed.

(this nifty setup will have to be documented better so that it can survive if it must survive!)

Based on what I understand today, I don't think that template labels and selector labels should be entangled.

I think template labels, as far as I understood, are just labels added to any and all component deployed by this Helm chart. That looks good, and keeps things more or less organized.

Selector labels, on the other hand, have a deep technical (and internal, for hat matter) meaning. I am not sure if we even want to let people override those.

Will think about it more tomorrow.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went into history. NVIDIA/k8s-device-plugin@4820ac1#r157221509.

Note that the nvidia-dra-driver-gpu.templateLabels helper above also pulls in Values.selectorLabelsOverride

I think it's better to change that then, here. Will do.

It's already a challenge for me to put into English words what a good distinction is between "common labels" and "template labels". I will work on that.

For me, it's easier to see a clear distinction between [common labels, template labels] and [selector labels]. I want to reinforce that distinction.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a new commit with commentary declaring intent. Not perfect, but I think it's important for us to express the system that we want to use here.

{{ .context.Chart.Name }}-component: {{ .componentName }}
{{- end }}
{{- else -}}
fail "selectorLabels: both arguments are required: context, componentName"
{{- end }}
{{- end }}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ spec:
replicas: 1
selector:
matchLabels:
{{- include "nvidia-dra-driver-gpu.selectorLabels" . | nindent 6 }}
{{- include "nvidia-dra-driver-gpu.selectorLabels" (dict "context" . "componentName" "controller") | nindent 6 }}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I have now learned that this left-chomping-nindent-technique is the holy grail of Helm templates: helm/helm#3854 (comment)

template:
metadata:
{{- with .Values.controller.podAnnotations }}
Expand All @@ -34,6 +34,7 @@ spec:
{{- end }}
labels:
{{- include "nvidia-dra-driver-gpu.templateLabels" . | nindent 8 }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that templateLabels still includes selectorLabels. Is this intended now that selectorLabels have been removed from selector.matchLabels above?

Copy link
Collaborator Author

@jgehrcke jgehrcke May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that templateLabels still includes selectorLabels

Yes. That's fine.

I wanted to be minimally invasive. To me, that first and foremost meant that I did not want to remove any labels.

In the longer run, I want us to question the purpose of "selectorLabels" defined in our helm chart. What was the premise behind it? I assume it may be ill-posed in our case here

edit: okay, changed to using the selectorLabels approach again (now it's a dynamic label)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another update:

Note that templateLabels still includes selectorLabels

not anymore, see commentary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks: I think making selectorLabels more specific (and a reduced set is correct).

{{- include "nvidia-dra-driver-gpu.selectorLabels" (dict "context" . "componentName" "controller") | nindent 8 }}
spec:
{{- if .Values.controller.priorityClassName }}
priorityClassName: {{ .Values.controller.priorityClassName }}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ metadata:
spec:
selector:
matchLabels:
{{- include "nvidia-dra-driver-gpu.selectorLabels" . | nindent 6 }}
{{- include "nvidia-dra-driver-gpu.selectorLabels" (dict "context" . "componentName" "kubelet-plugin") | nindent 6 }}
{{- with .Values.kubeletPlugin.updateStrategy }}
updateStrategy:
{{- toYaml . | nindent 4 }}
Expand All @@ -37,6 +37,7 @@ spec:
{{- end }}
labels:
{{- include "nvidia-dra-driver-gpu.templateLabels" . | nindent 8 }}
{{- include "nvidia-dra-driver-gpu.selectorLabels" (dict "context" . "componentName" "kubelet-plugin") | nindent 8 }}
spec:
{{- if .Values.kubeletPlugin.priorityClassName }}
priorityClassName: {{ .Values.kubeletPlugin.priorityClassName }}
Expand Down
Loading