-
Notifications
You must be signed in to change notification settings - Fork 99
Use per-component matchLabels selector in pod specs (fix #355) #359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| {{- toYaml . | nindent 8 }} | ||
| {{- end }} | ||
| labels: | ||
| {{- include "nvidia-dra-driver-gpu.templateLabels" . | nindent 8 }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that templateLabels still includes selectorLabels. Is this intended now that selectorLabels have been removed from selector.matchLabels above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that templateLabels still includes selectorLabels
Yes. That's fine.
I wanted to be minimally invasive. To me, that first and foremost meant that I did not want to remove any labels.
In the longer run, I want us to question the purpose of "selectorLabels" defined in our helm chart. What was the premise behind it? I assume it may be ill-posed in our case here
edit: okay, changed to using the selectorLabels approach again (now it's a dynamic label)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another update:
Note that templateLabels still includes selectorLabels
not anymore, see commentary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks: I think making selectorLabels more specific (and a reduced set is correct).
|
I'd rather see something that still used the selectorLabels / templateLabels, but maybe it takes an optional input parameter to the template function where we can insert a component-type into all the labels. |
Can look into that. Are you ok with the rendered outcome, though? That is, with |
b7c1b58 to
6a203b9
Compare
|
Changed the approach to make the |
| {{- if .Values.selectorLabelsOverride -}} | ||
| {{ toYaml .Values.selectorLabelsOverride }} | ||
| {{- else -}} | ||
| {{ include "nvidia-dra-driver-gpu.templateLabels" . }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need the template labels in the selector labels.
The current logic will write something like nvidia-dra-driver-gpu-component=controller into the matchLabels selector. That is precise, and not conflicting (unless one wants it to conflict with something).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the nvidia-dra-driver-gpu.templateLabels helper above also pulls in Values.selectorLabelsOverride. Here we only include the templateLabels if the override is not set -- in which case the templateLabels will NOT include the selector labels. This does seem recursive / convoluted though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking deeply. Very appreciated.
I will review this more carefully tomorrow.
I want to understand use cases that we want to support, and then remove complexity that is not currently needed.
(this nifty setup will have to be documented better so that it can survive if it must survive!)
Based on what I understand today, I don't think that template labels and selector labels should be entangled.
I think template labels, as far as I understood, are just labels added to any and all component deployed by this Helm chart. That looks good, and keeps things more or less organized.
Selector labels, on the other hand, have a deep technical (and internal, for hat matter) meaning. I am not sure if we even want to let people override those.
Will think about it more tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went into history. NVIDIA/k8s-device-plugin@4820ac1#r157221509.
Note that the nvidia-dra-driver-gpu.templateLabels helper above also pulls in Values.selectorLabelsOverride
I think it's better to change that then, here. Will do.
It's already a challenge for me to put into English words what a good distinction is between "common labels" and "template labels". I will work on that.
For me, it's easier to see a clear distinction between [common labels, template labels] and [selector labels]. I want to reinforce that distinction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a new commit with commentary declaring intent. Not perfect, but I think it's important for us to express the system that we want to use here.
| nvidia-dra-driver-gpu-component: {{ .componentName }} | ||
| {{- end }} | ||
| {{- else -}} | ||
| fail "selectorLabels: both arguments are required: context, componentName" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I adopted this approach from some blog post: named and required function arguments, with a more or less good error message upon not providing arguments.
Note that the outer chart context in the function is now not . anymore, but .context.
Don't we love learning DSLs...and their funkiness :)
refs:
https://blog.palark.com/advanced-helm-templating/
https://jiminbyun.medium.com/understanding-the-chart-context-in-helm-templates-0524b520b120
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify: The fail here just checks that we're "invoking" this helper correctly? Meaning we need to set both the context and the component name where we include it in some other helm template.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! It's like an assert statement.
b7e0091 to
3042b5a
Compare
I made it a required parameter. Because not encoding that component-specific aspect led to the previous over-selection of pods. Working in this Helm templating logic feels rather brittle. Let's land an improvement, and then iterate later. The public-facing aspects here of course we need to more or less settle on (the label key/value names, which people may end up using in their kubectl commands, and we don't want to break those too often :-)). |
| {{- end }} | ||
| {{- if and (hasKey . "componentName") (hasKey . "context") -}} | ||
| {{- if .context.Values.selectorLabelsOverride -}} | ||
| {{ toYaml .context.Values.selectorLabelsOverride }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Since this is indented, is that going to mess with indentation in the rendered template?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the question.
I still don't know exactly, but based on a little bit more reading I think you might have the right hunch here and I will add a commit to dedent.
Most templating languages bring this type of challenge. They either respect or ignore the indentation in the source code. In Jinja, for example, it's ignored (so you can indent the source code to make it more readable w/o having to worry about breaking the output). That's not always the perfect choice either (which reminds me of pallets/jinja#178 (comment) from 2013).
I obviously indented for making things slightly more readable. And I could test that manually. Better: we need a test suite for our Helm chart that not only renders, but also deploys for different input parameters. Otherwise we cannot make such changes with confidence.
This question brought me to https://www.reddit.com/r/kubernetes/comments/1be15b8/helm_templating_language_was_a_mistake_90_of_helm/ which has this statement in the title:
90%+ of helm chart commits are bug fixes for indent
The discussion also brought me to https://leebriggs.co.uk/blog/2019/02/07/why-are-we-templating-yaml in which the author says
Needless to say, this isn’t what I want to spend my time doing. If fiddling around with whitespace requirements in a templating system doing something it’s not really designed for is what suits you, then I’m not going to stop you.
For the record, I had great success working with jsonnet in the past in the context of https://github.com/prometheus-operator/kube-prometheus.
Btw, the official Helm templating docs are wishywashy on the topic:

(https://helm.sh/docs/chart_best_practices/templates/)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way: Not sure whether the following is useful: https://github.com/NVIDIA/k8s-device-plugin/blob/main/tests/helm/helm_template_test.go
d5af042 to
83f6677
Compare
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
83f6677 to
5f6c7a5
Compare
| app.kubernetes.io/instance: {{ .Release.Name }} | ||
| {{- if .Values.selectorLabelsOverride }} | ||
| {{ toYaml .Values.selectorLabelsOverride }} | ||
| {{- end }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep the selector label (single label, I propose) strictly independent.
"Template labels" I believe always meant "pod template labels" and I think it really always meant "apply a common set of labels to all pods of this Helm chart".
By that nature, this set of labels cannot distinguish individual groups of pods within the Helm chart.
Which is what "selector labels" want to do.
In that sense, those two concepts (template labels and selector labels) are generally incompatible. Which goes unnoticed for the special case of there being just one component (one relevant group of pods) in the Helm chart.
| selector: | ||
| matchLabels: | ||
| {{- include "nvidia-dra-driver-gpu.selectorLabels" . | nindent 6 }} | ||
| {{- include "nvidia-dra-driver-gpu.selectorLabels" (dict "context" . "componentName" "controller") | nindent 6 }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, I have now learned that this left-chomping-nindent-technique is the holy grail of Helm templates: helm/helm#3854 (comment)
| {{- define "nvidia-dra-driver-gpu.labels" -}} | ||
| helm.sh/chart: {{ include "nvidia-dra-driver-gpu.chart" . }} | ||
| {{ include "nvidia-dra-driver-gpu.templateLabels" . }} | ||
| {{- if .Chart.AppVersion }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we always define that (and if we don't, I want to know about it)
| {{- end }} | ||
|
|
||
| {{/* | ||
| Template labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly allergic to a thing named templateLabels being documented with "Template labels" :)
| could be many labels, but we want to use just one (with a sufficiently | ||
| unique key). | ||
| TOOD: remove the override feature, or make the override work per-component. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we remove the override before GA? @klueska do we know of anyone with SPECIFIC selector labels?
elezar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jgehrcke.
I think this looks good from my side. Note that I don't have all the context required for all possible label combinations, but this PR does clean up the selector labels specifically. It also decouples these from the other sets of labels that are defined in the helpers which is a cleaner solution to start with.
|
Thanks for all the good feedback here. I am sure we want to do more work on this front (or even have to). |
A pragmatic proposal for addressing #355.
Consider a scenario with one controller and four plugin pods:
Without the patch, the following type/name selector for the controller yields all pods, and we after all do not see controller logs:
With the patch, the same command yields only controller logs, as expected:
With the patch, the following command refers to just the plugin pods (as expected):
(without the patch, this may erroneously yield controller logs as shown in #355).
Before the change, these are the labels for the controller selector:
I would love for others to chime in, but as far as I now understand the purpose of this selector the current value is wrong as it resolves to all pods deployed by the NVIDIA DRA Driver for GPUs. Instead, we want it to resolve to just the controller pods.
After the change:
CC @nojnhuh