Setting option for node draining by external controllers #952
base: master
Thanks for your PR,
To skip the vendors CIs, Maintainers can use one of:
Force-pushed from 30699af to cc4a257.
Pull Request Test Coverage Report for Build 19669073220
💛 - Coveralls
SchSeba
left a comment
In general I am fine with having this here, but can we please use a more generic name for the variable?
controllers/helper.go (outdated)
// UseMaintenanceOperatorDrainer indicates if internal drain controller is disabled
// and draining will be done by external NVIDIA maintenance operator
func UseMaintenanceOperatorDrainer() bool {
Can we please move this to the vars and consts folders instead of keeping it here?
done
Force-pushed from cc4a257 to 62632d8.
Force-pushed from 62632d8 to 237e08b.
Force-pushed from 16474e5 to cd01dcf.
Force-pushed from a7fbd43 to c3d7edf.
Force-pushed from daaf0f6 to 7fed536.
/test-all
pkg/daemon/daemon.go (outdated)
// add external drainer annotation if enabled
if vars.UseExternalDrainer {
	if err := utils.AnnotateNode(ctx,
Can we annotate the nodeState instead?
We store the related drain state in the nodeState object.
Done
func setupDrainController(mgr ctrl.Manager, restConfig *rest.Config,
Perhaps we should take a slightly different approach.
We could always start the drain controller and then skip drain requests if the use-external-drainer annotation is set.
That way, any "in-flight" drains will complete even if we switch on the external-drainer functionality.
WDYT?
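A minimal sketch of the "always start, then skip" idea, assuming the drain controller can tell whether a drain it owns is already in progress. The annotation key, the `drainRequest` type, and `shouldHandleDrain` are hypothetical and only illustrate the decision; they are not the operator's code.

```go
package main

import "fmt"

// externalDrainerAnnotation is a hypothetical key used for illustration;
// the real constant name and value in the PR may differ.
const externalDrainerAnnotation = "sriovnetwork.openshift.io/use-external-drainer"

// drainRequest is a simplified stand-in for a node state drain request.
type drainRequest struct {
	nodeName    string
	annotations map[string]string
	inFlight    bool // drain already started by the internal controller
}

// shouldHandleDrain reports whether the internal drain controller should act on
// a request. In-flight drains are always allowed to finish, so switching on the
// external drainer does not leave a node half-drained; new requests carrying the
// external-drainer annotation are skipped.
func shouldHandleDrain(req drainRequest) bool {
	if req.inFlight {
		return true
	}
	return req.annotations[externalDrainerAnnotation] != "true"
}

func main() {
	reqs := []drainRequest{
		{nodeName: "worker-0", annotations: map[string]string{}},
		{nodeName: "worker-1", annotations: map[string]string{externalDrainerAnnotation: "true"}},
		{nodeName: "worker-2", annotations: map[string]string{externalDrainerAnnotation: "true"}, inFlight: true},
	}
	for _, r := range reqs {
		fmt.Printf("%s handled internally: %v\n", r.nodeName, shouldHandleDrain(r))
	}
}
```

Keeping the controller running and filtering per request is what lets an in-flight drain finish after the flag flips.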
What about using the same annotations as the regular drain flow?
- the daemon would set sriovnetworknodestate.DesiredState = "Drain_Required"
- the in-operator drainer would do nothing
- the external drainer would drain the node using its own logic, then set node.CurrentState = DrainComplete
Would that be a cleaner implementation?
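The handshake proposed above can be sketched as a small in-memory state machine. This assumes both states live on one simplified `nodeState` struct; the real flow stores them as annotations on Kubernetes objects. `Drain_Required` and `DrainComplete` come from the comment, while the `Idle` value and all function names are hypothetical.

```go
package main

import "fmt"

const (
	drainIdle     = "Idle" // hypothetical idle value
	drainRequired = "Drain_Required"
	drainComplete = "DrainComplete"
)

// nodeState is a simplified stand-in for the desired/current drain annotations.
type nodeState struct {
	DesiredState string
	CurrentState string
}

// daemonRequestDrain: the config daemon asks for a drain.
func daemonRequestDrain(s *nodeState) { s.DesiredState = drainRequired }

// internalDrainer does nothing when an external drainer is in charge.
func internalDrainer(s *nodeState, useExternalDrainer bool) {
	if useExternalDrainer {
		return
	}
	if s.DesiredState == drainRequired {
		// The real cordon/evict logic is omitted in this sketch.
		s.CurrentState = drainComplete
	}
}

// externalDrainer drains with its own logic (omitted), then reports completion.
func externalDrainer(s *nodeState) {
	if s.DesiredState == drainRequired {
		s.CurrentState = drainComplete
	}
}

// daemonCanProceed: the daemon continues configuring once the drain is done.
func daemonCanProceed(s *nodeState) bool {
	return s.DesiredState == drainRequired && s.CurrentState == drainComplete
}

func main() {
	s := &nodeState{DesiredState: drainIdle, CurrentState: drainIdle}
	daemonRequestDrain(s)
	internalDrainer(s, true)
	fmt.Println("after internal drainer:", daemonCanProceed(s))
	externalDrainer(s)
	fmt.Println("after external drainer:", daemonCanProceed(s))
}
```

Because the daemon only watches `CurrentState`, it does not need to know which drainer did the work.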
Force-pushed from 7fed536 to ade293d.
Force-pushed from b2877d5 to 40509de.
Force-pushed from 62ddbb0 to 24a2bfe.
e0ne
left a comment
LGTM
Force-pushed from b8c3b8c to 721a627.
/test-all
Force-pushed from 914739d to 30a21f9.
*NOTE:* In the future we are going to drop the node annotation and only use the SriovNetworkNodeState.
*NOTE:* The internal drain controller can be disabled by setting the `USE_EXTERNAL_DRAINER` environment variable. Drain operations are then performed externally, for example by the [NVIDIA maintenance OP](https://github.com/Mellanox/maintenance-operator). In addition, `SriovNetworkPoolConfig` does not take effect during the drain procedure, since the maintenance operator is in charge of parallel node operations.
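A minimal sketch of turning `USE_EXTERNAL_DRAINER` into a boolean flag such as `vars.UseExternalDrainer`. The parsing rule (any value `strconv.ParseBool` accepts as true enables it; unset or invalid keeps the internal drain controller) is an assumption, not necessarily what the operator does. The lookup function is injected so the logic can be tested without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// useExternalDrainerFromEnv reads USE_EXTERNAL_DRAINER via the given lookup.
// Values accepted by strconv.ParseBool as true ("1", "t", "true", ...) enable
// the external drainer; unset or unparsable values keep the internal controller.
func useExternalDrainerFromEnv(lookup func(string) (string, bool)) bool {
	v, ok := lookup("USE_EXTERNAL_DRAINER")
	if !ok {
		return false
	}
	enabled, err := strconv.ParseBool(v)
	return err == nil && enabled
}

func main() {
	if useExternalDrainerFromEnv(os.LookupEnv) {
		fmt.Println("internal drain controller disabled; draining is done externally")
	} else {
		fmt.Println("internal drain controller enabled")
	}
}
```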
I am fine with adding this change here as well, not only in the README; please also update the updatedDate in the doc.
func setupDrainController(mgr ctrl.Manager, restConfig *rest.Config,
	platformsHelper platforms.Interface, scheme *runtime.Scheme) error {
	if vars.UseExternalDrainer {
		setupLog.Info("'UseExternalDrainer' is set, draining will be done externally")
Please add a note explaining why we still set up the drain controller here even when UseExternalDrainer is set.
Done
It's more complicated.
If we are in the middle of a configuration and we update the config-daemon YAML, it will start new pods that will add the label.
In this case we will be in the middle of a configuration with the two labels in parallel.
adrianchiris
left a comment
LGTM. Added a minor nit to clarify why we need to start the drain controller when an external drainer is used.
Force-pushed from 30a21f9 to fd9607c.
| } | ||
|
|
||
| func createNode(ctx context.Context, nodeName string) (*corev1.Node, *sriovnetworkv1.SriovNetworkNodeState) { | ||
| func createNode(ctx context.Context, nodeName string, useExternalDrainer bool) (*corev1.Node, *sriovnetworkv1.SriovNetworkNodeState) { |
Please avoid the boolean parameter, as it reduces readability.
Alternatives:
a. change the parameter to additionalAnnotation map[string]string
b. add a tweak parameter like tweak func(*corev1.Node, *sriovnetworkv1.SriovNetworkNodeState) so that a function can customize the creation of the node objects
c. remove the parameter and update the k8s objects after the creation
I prefer b, but the others would work too.
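Option b could look roughly like this. To keep the sketch self-contained, `Node` and `NodeState` are simplified stand-ins for `corev1.Node` and `sriovnetworkv1.SriovNetworkNodeState`, the `ctx` parameter and client calls are dropped, and the namespace, annotation key, and `withExternalDrainer` helper are hypothetical.

```go
package main

import "fmt"

// Simplified stand-ins for corev1.Node and sriovnetworkv1.SriovNetworkNodeState.
type Node struct {
	Name        string
	Annotations map[string]string
}

type NodeState struct {
	Name        string
	Namespace   string
	Annotations map[string]string
}

// createNode builds the test objects and lets callers customize them through
// optional tweak functions instead of a boolean flag.
func createNode(nodeName string, tweaks ...func(*Node, *NodeState)) (*Node, *NodeState) {
	node := &Node{Name: nodeName, Annotations: map[string]string{}}
	nodeState := &NodeState{
		Name:        nodeName,
		Namespace:   "sriov-network-operator", // hypothetical; the test uses vars.Namespace
		Annotations: map[string]string{},
	}
	for _, tweak := range tweaks {
		tweak(node, nodeState)
	}
	return node, nodeState
}

// withExternalDrainer is a hypothetical tweak marking the node state for an external drainer.
func withExternalDrainer(_ *Node, ns *NodeState) {
	ns.Annotations["sriovnetwork.openshift.io/use-external-drainer"] = "true"
}

func main() {
	_, plain := createNode("node0")
	_, external := createNode("node1", withExternalDrainer)
	fmt.Println(len(plain.Annotations), external.Annotations)
}
```

Making the tweaks variadic keeps existing `createNode(ctx, name)`-style call sites unchanged, while each test names the customization it needs.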
Name: nodeName,
Namespace: vars.Namespace,
Labels: map[string]string{
Annotations: map[string]string{
Any idea how these tests were working before this change?
If I remember correctly, the controller adds them if they don't exist.
I must say I don't follow this, sorry.
The daemon will still request a drain, and this check https://github.com/k8snetworkplumbingwg/sriov-network-operator/pull/952/files#diff-a53b7b593d3d778e62eaeeafa40088656f9212bfa2c2b7991df15fa78e60b0f0R256
will never pass.
Also, you can have a race, as we do the annotation update twice in:
	// add external drainer nodestate annotation if flag is enabled
	if vars.UseExternalDrainer {
		err := utils.AnnotateObject(ctx, desiredNodeState,
			consts.NodeStateExternalDrainerAnnotation, "true", dn.client)
		if err != nil {
			funcLog.Error(err, "failed to add nodestate external drainer annotation")
			return false, err
		}
	}
	// annotate both node and node state with drain or reboot
	annotation := consts.DrainRequired
	if reqReboot {
		annotation = consts.RebootRequired
	}
	return true, dn.annotate(ctx, desiredNodeState, annotation)
}
And I don't understand how the daemon will know that it can continue with the configuration because the drain was done.
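One way to narrow the race described above is to compute every annotation change first and write them in a single update, so no observer sees the external-drainer flag without the drain request (or vice versa). This sketch uses an in-memory `object` that counts writes in place of a real client `Update`/`Patch`, and covers one object only, whereas the daemon annotates both the node and the node state. The annotation keys and the `Reboot_Required` value are hypothetical.

```go
package main

import "fmt"

// Hypothetical keys; the real constants live in the operator's consts package.
const (
	externalDrainerKey = "sriovnetwork.openshift.io/use-external-drainer"
	desiredStateKey    = "sriovnetwork.openshift.io/desired-state"
)

// object is a simplified stand-in for a Kubernetes object whose writes are counted.
type object struct {
	annotations map[string]string
	writes      int
}

// update replaces the annotations in one write, as a single Update/Patch call would.
func (o *object) update(annotations map[string]string) {
	o.annotations = annotations
	o.writes++
}

// requestDrain builds the complete annotation set in memory, preserving existing
// entries, and then applies it with exactly one write.
func requestDrain(o *object, useExternalDrainer, reqReboot bool) {
	next := make(map[string]string, len(o.annotations)+2)
	for k, v := range o.annotations {
		next[k] = v
	}
	if useExternalDrainer {
		next[externalDrainerKey] = "true"
	}
	state := "Drain_Required"
	if reqReboot {
		state = "Reboot_Required" // hypothetical value
	}
	next[desiredStateKey] = state
	o.update(next)
}

func main() {
	o := &object{annotations: map[string]string{}}
	requestDrain(o, true, false)
	fmt.Println(o.writes, o.annotations[desiredStateKey], o.annotations[externalDrainerKey])
}
```

A single write per object does not by itself tell the daemon that the drain finished; that still needs a completion signal such as the `DrainComplete` state discussed earlier in the thread.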
// remove external drainer nodestate annotation if exists
annotations := desiredNodeState.GetAnnotations()
if _, ok := annotations[consts.NodeStateExternalDrainerAnnotation]; ok {
Can you explain why we need this here?
…sable SRIOV OP drain controller, in favor of using maintenance OP to drive node drain aspects
Signed-off-by: Ido Heyvi <[email protected]>
…rnal-drainer=true' in case external drainer is enabled
The motivation is external drainer verification, i.e. confirming that the SRIOV operator is configured with an external drainer.
Signed-off-by: Ido Heyvi <[email protected]>
Force-pushed from fd9607c to c050234.
The SRIOV OP internal drain controller can be disabled through USE_EXTERNAL_DRAINER. Draining can then be performed externally, for example by the NVIDIA maintenance operator.