-
Notifications
You must be signed in to change notification settings - Fork 0
Recreate firewall on unhealthy condition #63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Gerrit91
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for coming up with a PR for this.
controllers/set/delete.go
Outdated
|
|
||
| for _, fw := range fws { | ||
| fw := fw | ||
| if c.isFirewallUnhealthy(fw) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With some small changes on your newly introduced struct I think something like this would be clearer and really point out that this is about timeouts:
status := evaluateFirewallConditions(fw)
switch {
case status.CreateTimeout || status.HealthTimeout:
r.Log.Info("firewall health or creation timeout exceeded, deleting from set", "firewall-name", fw.Name)
err := c.deleteFirewalls(r, fw)
if err != nil {
return nil, err
}
result = append(result, fw)
}The isProgressing that's used in setStatus would also not be required anymore as it can be derived in case all other cases do not match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it, what do you think now?
| if fw.Status.Phase == v2.FirewallPhaseCreating && timeSinceReconcile > allocationTimeout { | ||
| c.log.Info("create timeout reached") | ||
| return firewallConditionStatus{CreateTimeout: true} | ||
| } | ||
|
|
||
| if seedConnected && unhealthyTimeout != 0 && created && timeSinceReconcile > unhealthyTimeout { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check if allocationTimeout is set to be able to disable this check
| fw.Status.ControllerStatus = &v2.ControllerConnection{} | ||
| } | ||
| //add a fake concile so the unhealty firewall gets deleted | ||
| fw.Status.ControllerStatus.SeedUpdated.Time = time.Now().Add(-20 * 24 * time.Hour) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicate of L1984 ?
| fw.Status.ControllerStatus.SeedUpdated.Time = time.Now().Add(-20 * 24 * time.Hour) | ||
| err = k8sClient.Status().Update(ctx, &fw) | ||
| if err != nil { | ||
| fmt.Printf("Failed to update firewall status: %v\n", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| fmt.Printf("Failed to update firewall status: %v\n", err) | |
| return fmt.Errorf("Failed to update firewall status: %v\n", err) |
| fw.Status.ControllerStatus = &v2.ControllerConnection{} | ||
| } | ||
| //add a fake concile so the unhealty firewall gets deleted | ||
| fw.Status.ControllerStatus.SeedUpdated.Time = time.Now().Add(-20 * 24 * time.Hour) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better reuse the existing time constants for health and create timeout and add or substract a specific amount of time, otherwise it is hard to follow what you are trying to test
| } | ||
|
|
||
| // duration after which a firewall in the creation phase will be recreated, exceeded | ||
| if fw.Status.Phase == v2.FirewallPhaseCreating && timeSinceReconcile > allocationTimeout { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allocation timestamp must be used for checking the creation timeout because the firewall-controller was never able to connect in this phase?
Description
Closes #62.
This pr introduces the functionality for deleting firewalls if they exceed the firewallHealthTimeout which for now is set to 20 minutes.
Integration tests where added to make sure everything works as intended.
CA were updated, otherwise it is not possible to deploy to mini-lab.