Skip to content

Conversation

@Honigeintopf
Copy link
Collaborator

@Honigeintopf Honigeintopf commented Nov 4, 2024

Description

Closes #62.

This pr introduces the functionality for deleting firewalls if they exceed the firewallHealthTimeout which for now is set to 20 minutes.
Integration tests where added to make sure everything works as intended.

CA were updated, otherwise it is not possible to deploy to mini-lab.

@Honigeintopf Honigeintopf requested a review from a team as a code owner November 4, 2024 14:31
@Honigeintopf Honigeintopf linked an issue Nov 4, 2024 that may be closed by this pull request
@Honigeintopf Honigeintopf changed the title Firewall health check Firewall delete on unhealthy condition Nov 4, 2024
@Honigeintopf Honigeintopf requested a review from Gerrit91 November 4, 2024 14:33
@Gerrit91 Gerrit91 changed the title Firewall delete on unhealthy condition Recreate firewall on unhealthy condition Nov 4, 2024
Copy link
Contributor

@Gerrit91 Gerrit91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for coming up with a PR for this.


for _, fw := range fws {
fw := fw
if c.isFirewallUnhealthy(fw) {
Copy link
Contributor

@Gerrit91 Gerrit91 Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With some small changes on your newly introduced struct I think something like this would be clearer and really point out that this is about timeouts:

status := evaluateFirewallConditions(fw)

switch {
case status.CreateTimeout || status.HealthTimeout:
    r.Log.Info("firewall health or creation timeout exceeded, deleting from set", "firewall-name", fw.Name)

    err := c.deleteFirewalls(r, fw)
    if err != nil {
        return nil, err
    }

    result = append(result, fw)
}

The isProgressing that's used in setStatus would also not be required anymore as it can be derived in case all other cases do not match.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it, what do you think now?

@github-project-automation github-project-automation bot moved this to Review in Development Jun 5, 2025
@Gerrit91 Gerrit91 removed the status in Development Jun 13, 2025
@Gerrit91 Gerrit91 moved this to Upcoming in Development Oct 20, 2025
Comment on lines +38 to +43
if fw.Status.Phase == v2.FirewallPhaseCreating && timeSinceReconcile > allocationTimeout {
c.log.Info("create timeout reached")
return firewallConditionStatus{CreateTimeout: true}
}

if seedConnected && unhealthyTimeout != 0 && created && timeSinceReconcile > unhealthyTimeout {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if allocationTimeout is set to be able to disable this check

fw.Status.ControllerStatus = &v2.ControllerConnection{}
}
//add a fake concile so the unhealty firewall gets deleted
fw.Status.ControllerStatus.SeedUpdated.Time = time.Now().Add(-20 * 24 * time.Hour)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicate of L1984 ?

fw.Status.ControllerStatus.SeedUpdated.Time = time.Now().Add(-20 * 24 * time.Hour)
err = k8sClient.Status().Update(ctx, &fw)
if err != nil {
fmt.Printf("Failed to update firewall status: %v\n", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fmt.Printf("Failed to update firewall status: %v\n", err)
return fmt.Errorf("Failed to update firewall status: %v\n", err)

fw.Status.ControllerStatus = &v2.ControllerConnection{}
}
//add a fake concile so the unhealty firewall gets deleted
fw.Status.ControllerStatus.SeedUpdated.Time = time.Now().Add(-20 * 24 * time.Hour)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better reuse the existing time constants for health and create timeout and add or substract a specific amount of time, otherwise it is hard to follow what you are trying to test

}

// duration after which a firewall in the creation phase will be recreated, exceeded
if fw.Status.Phase == v2.FirewallPhaseCreating && timeSinceReconcile > allocationTimeout {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allocation timestamp must be used for checking the creation timeout because the firewall-controller was never able to connect in this phase?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Upcoming

Development

Successfully merging this pull request may close these issues.

Firewall health check

4 participants