Skip to content

Conversation

@vignesh-goutham
Copy link
Contributor

Description of changes:
On removing containerd, the pods are not explicitly deleted. This is by design and containerd lets cri plugin or rather kubelet be the handler of these processes. When containerd exists, the pod sandboxes, containers, process are not terminated. Kubelet currently while attempts to stop pods, doesn't force clean up sandboxes. This leads to nodes to have pods running after nodeadm uninstall. This change finds and deletes running pod sandboxes and containers.

Tested manually.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@vignesh-goutham vignesh-goutham force-pushed the force-stop-pods branch 2 times, most recently from 8b2a106 to 754cf91 Compare May 16, 2025 21:09
// Client is a containerd runtime client wrapper
// Holds the internalapi.RuntimeService for pod/container operations
type Client struct {
Runtime internalapi.RuntimeService
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have a constructor, does this need to be exported?

}

for _, sandbox := range podSandboxes {
zap.L().Info("Stopping pod..", zap.String("pod", sandbox.Metadata.Name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
zap.L().Info("Stopping pod..", zap.String("pod", sandbox.Metadata.Name))
zap.L().Info("Stopping pod...", zap.String("pod", sandbox.Metadata.Name))

for _, sandbox := range podSandboxes {
zap.L().Info("Stopping pod..", zap.String("pod", sandbox.Metadata.Name))
err := util.RetryExponentialBackoff(3, 2*time.Second, func() error {
if err := c.Runtime.StopPodSandbox(sandbox.Id); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this operation idempotent? if not, we might need two different retries blocks

return nil
})
if err != nil {
zap.L().Info("ignored error stopping pod", zap.Error(err))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use a comment explaining why we can ignore it and what are the consequences of leaving a sandbox pod running. there is a comment below this but it doesn't say why

Also, this might be a bit better

Suggested change
zap.L().Info("ignored error stopping pod", zap.Error(err))
zap.L().Info("Failed to stop sandbox pod, ignoring and continue with containers", zap.Error(err))

return nil
})
if err != nil {
zap.L().Info("ignored error removing container", zap.Error(err))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo we should accumulate the errors here and return them at the end of the loop
It should be up to the caller to decide what to do with the error. From this method's perspective, we weren't able to do what our method promises to do, so we should not return success.

u.Logger.Info("Uninstalling containerd...")
client, err := containerd.NewClient()
if err != nil {
u.Logger.Info("ignored error creating containerd client", zap.Error(err))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
u.Logger.Info("ignored error creating containerd client", zap.Error(err))
u.Logger.Info("Failed to create containerd client, skipping container removal", zap.Error(err))

u.Logger.Info("ignored error creating containerd client", zap.Error(err))
} else {
if err := client.RemovePods(); err != nil {
u.Logger.Info("ignored error stopping pods", zap.Error(err))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
u.Logger.Info("ignored error stopping pods", zap.Error(err))
u.Logger.Info("Failed to delete pods, this might leave dangling processes after containerd is removed", zap.Error(err))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants