-
Notifications
You must be signed in to change notification settings - Fork 99
Improve error msg in GetNodeIPs() (fix issue #321) #322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
||
| if cd.Status.Nodes == nil { | ||
| return nil, fmt.Errorf("error getting status of nodes in ComputeDomain: %w", err) | ||
| return nil, fmt.Errorf("error getting status of nodes in ComputeDomain: no nodes set yet") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not critical, but should this just be:
| return nil, fmt.Errorf("error getting status of nodes in ComputeDomain: no nodes set yet") | |
| return nil, fmt.Errorf("no nodes set") |
(I would remove the yet personally)
Going from Nodes == nil to error getting status of nodes implies intimate knowledge of GetComputDomain. Would it be better to just return an error from there instead?
We could also remove this check entirely sincelen(nil) == 0 it should fail the check bellow as well (assuming that NumNodes == 0 is not a valid configuration).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Evan! Thanks for looking :)
... implies intimate knowledge of GetComputDomain. Would it be better to just return an error from there instead?
Good question. I'd say "no" because GetComputeDomain() is expected to return the CD object regardless of where the CD is in its lifecycle.
Hence, it can at times return a CD that does not have .Status.Nodes populated yet. That's an expected outcome IIUC.
Node status is added out-of-band via (potentially many calls from different nodes to) AddNodeStatusToComputeDomain() at an unpredictable point in time.
That is, via multiple calls to GetNodeIPs() we detect and respond to changes in a CD's lifecycle. Only within GetNodeIPs() it is an error when no node status has been set yet.
should this just be: ...
no nodes set
I like to shorten this further, yes! In terms of concatenation of error messages, let's optimize for readability in case where this is prefixed with "error getting node IPs: ..."
I like "error getting node IPs: no nodes set yet for ComputeDomain"
I would remove the yet personally
The "yet" does not convey critical information, agree. It does however vaguely communicate the notion of an expected, temporary error (as opposed to unexpected, permanent error -- a distinction that is best done not just with words like "yet" :-)).
We could also remove this check entirely sincelen(nil) == 0
Ack. But I think it's good to have the fine-grained output so that we can see state transitioning explicitly: no nodes seen -> not all nodes seen -> all nodes seen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I think having different error messages for the two cases is fine. On the use of "yet". Looking at the logic, it is not within the scope of this function to determine whether the error is treated as transient or not. I would then remove the "yet" and have intent to retry conveyed in the caller or through some other mechanism.
"error getting node IPs: no nodes set for ComputeDomain"
sounds good. (I won't fight hard on the "yet" :) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to note is that the error getting node IPs would beis prepended in the caller, so here we would only have:
| return nil, fmt.Errorf("error getting status of nodes in ComputeDomain: no nodes set yet") | |
| return nil, fmt.Errorf("no nodes set for ComputeDomain") |
Update: The caller is:
nodeIPs, err := s.manager.GetNodeIPs(ctx, s.domain)
if err != nil {
return fmt.Errorf("error getting node IPs: %w", err)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate your arguments!
the "error getting node IPs" is prepended in the caller
:) yesss. That is what I meant when I wrote
where this is prefixed with "error getting node IPs: ..."
About "yet":
have intent to retry conveyed in the caller or through some other mechanism
it is not within the scope of this function to determine whether the error is treated as transient or not.
I appreciate this perspective. I share this perspective, had also shared it before.
I did weigh these above arguments against other arguments, because what's also true:
- no entity can distinguish those two cases -- only time does tell, in combination with a timeout criterion
- we know that typically this is a transient error (in the vast majority of the cases: all happy path cases)
- the fact that things "will (probably?) be retried again soon" is not easy to log in this case
Considering that big picture, adding the "yet" was my pragmatic way to indicate to the not-so-informed reader that this is generally not an error to worry about, a little bit of value added with just three letters.
I will remove the "yet": because we're all informed log readers :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done yet? Not yet, we haven't yet discussed:
not all nodes populated in ComputeDomain status yet
That's the current error message right below the one we're discussing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed commit, without the "yet" :).
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
elezar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jgehrcke. Looks great now.
A patch for #321.
To emit
instead of
If I am not mistaken, the condition of
cd.Status.Nodes == nilis not erroneous, but just a transient state where no node status has been set yet viaAddNodeStatusToComputeDomain().