
talos_machine_bootstrap does not wait for talos_machine_configuration_apply #265

@ErikThorsell

Description


Potential bug

When using the Terraform provider (invoked via OpenTofu), the talos_machine_bootstrap resource does not appear to wait for the talos_machine_configuration_apply resources listed in its depends_on. tofu apply proceeds to the bootstrap step even though the nodes are apparently not yet ready to receive the bootstrap command.

In the example below, I have only included the code relevant to the siderolabs/talos provider. My actual deployment also provisions VMware VMs. I have seen nothing indicating that the apply step fails to wait for the VMs to become available. On the contrary, the node dashboards show that the machine configuration has been applied: nodes get the correct IP addresses, hostnames, etc.

Steps to reproduce

Prerequisites

  • Use the siderolabs/talos provider, version 0.9.0-alpha.0.
  • Talos Linux (I have used version 1.10.3) running on a number of nodes.
  • A directory talos-conf/ with YAML patch files corresponding to your nodes.
    • Files whose names start with c are used to configure the control plane (CP) nodes.
    • Files whose names start with w are used to configure the worker nodes.
locals {
  cp_files = sort(fileset("${path.module}/talos-conf", "c*.yaml"))
  w_files  = sort(fileset("${path.module}/talos-conf", "w*.yaml"))

  num_control_plane_nodes = length(local.cp_files)
  num_worker_nodes        = length(local.w_files)

  control_plane_ips = ["10.0.1.11", "10.0.1.12", "10.0.1.13"]
  worker_ips        = ["10.0.1.15", "10.0.1.16"]
}

resource "talos_machine_secrets" "secrets" {}

data "talos_machine_configuration" "mc_cp" {
  count = local.num_control_plane_nodes

  cluster_name     = "test-cluster"
  cluster_endpoint = "https://10.0.1.10:6443"
  machine_type     = "controlplane"
  machine_secrets  = talos_machine_secrets.secrets.machine_secrets
  talos_version    = "v1.10.3"
}

data "talos_machine_configuration" "mc_w" {
  count = local.num_worker_nodes

  cluster_name     = "test-cluster"
  cluster_endpoint = "https://10.0.1.10:6443"
  machine_type     = "worker"
  machine_secrets  = talos_machine_secrets.secrets.machine_secrets
  talos_version    = "v1.10.3"
}

resource "talos_machine_configuration_apply" "apply_cp" {
  count = local.num_control_plane_nodes

  client_configuration        = talos_machine_secrets.secrets.client_configuration
  machine_configuration_input = data.talos_machine_configuration.mc_cp[count.index].machine_configuration
  node                        = local.control_plane_ips[count.index]
  config_patches              = [file("${path.module}/talos-conf/${local.cp_files[count.index]}")]
}

resource "talos_machine_configuration_apply" "apply_w" {
  count = local.num_worker_nodes

  client_configuration        = talos_machine_secrets.secrets.client_configuration
  machine_configuration_input = data.talos_machine_configuration.mc_w[count.index].machine_configuration
  node                        = local.worker_ips[count.index]
  config_patches              = [file("${path.module}/talos-conf/${local.w_files[count.index]}")]
}

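resource "talos_machine_bootstrap" "bootstrap" {
  client_configuration = talos_machine_secrets.secrets.client_configuration
  # The bootstrap command is sent to the first control plane node only.
  node                 = local.control_plane_ips[0]

  # Bootstrap should only run after every node has had its configuration applied.
  depends_on = [
    talos_machine_configuration_apply.apply_cp,
    talos_machine_configuration_apply.apply_w
  ]
}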
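The configuration above does not show the provider requirements. A minimal sketch pinning the provider version listed under Prerequisites, assuming it lives in the same root module (this block is not part of the original reproduction):

```hcl
terraform {
  required_providers {
    talos = {
      source = "siderolabs/talos"
      # Pre-release versions only match when pinned exactly like this.
      version = "0.9.0-alpha.0"
    }
  }
}
```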

Run:

tofu apply

Expected behaviour

The different nodes are configured according to their respective YAML patch files, the cluster is bootstrapped and after a while a ready-to-use Kubernetes cluster is available.

What actually happens

The apply command gets stuck on the talos_machine_bootstrap step and eventually times out. Re-running it resolves the problem.
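Since a second run succeeds, one possible stopgap (an untested assumption, not a fix) is to add an explicit delay between the configuration apply and the bootstrap. A minimal sketch using the time_sleep resource from the hashicorp/time provider, replacing the talos_machine_bootstrap resource from the reproduction; the resource name wait_for_config_apply is hypothetical and the 120s duration is an arbitrary guess that would need tuning:

```hcl
# hashicorp/time is resolved automatically from the "time_" resource prefix.
# Sleeps for create_duration after all config applies finish (on create only).
resource "time_sleep" "wait_for_config_apply" {
  depends_on = [
    talos_machine_configuration_apply.apply_cp,
    talos_machine_configuration_apply.apply_w,
  ]

  create_duration = "120s" # assumption: enough time for the nodes to become ready
}

resource "talos_machine_bootstrap" "bootstrap" {
  client_configuration = talos_machine_secrets.secrets.client_configuration
  node                 = local.control_plane_ips[0]

  # Bootstrap now waits for the delay, which itself waits for the config applies.
  depends_on = [time_sleep.wait_for_config_apply]
}
```

A fixed delay only masks the ordering problem: it may still be too short on slower hardware and adds unnecessary wait time on faster setups.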
