[tink worker] containers that fail to start aren't removed #972

@jacobweinstock

Description

The function in Tink Worker that creates and starts containers, func (w *Worker) execute(ctx context.Context, wfID string, action *proto.WorkflowAction) (proto.State, error), doesn't remove a container if it fails to start. The cleanup is deferred too late. See https://github.com/tinkerbell/tink/blob/main/cmd/tink-worker/worker/worker.go#L189-L194
A container that fails to start is left behind on the host. The Workflow fails and cannot be re-run as is, because the new container's name conflicts with the leftover container.

Expected Behaviour

Tink Worker should always remove the containers it creates, including containers that fail to start.

Possible Solution

Move the defer function to immediately after the successful creation of the container.


Labels

kind/bug: Categorizes issue or PR as related to a bug.
