Description
Potential bug
When using the Terraform provider (invoked via OpenTofu), the `talos_machine_bootstrap` resource does not seem to wait correctly for the `talos_machine_configuration_apply` resources listed in its `depends_on`. `tofu apply` proceeds to the bootstrap step, but for a reason I have not identified the nodes are not yet ready to receive the bootstrap command.
In the example below, I have included only the code relevant to the `siderolabs/talos` provider. My actual deployment also creates VMware VMs. I have not seen anything that indicates that the apply step fails to wait for the VMs to become available. On the contrary, the dashboard on each node shows that the machine configuration has been applied: nodes get the correct IP addresses, hostnames, etc.
Steps to reproduce
Prerequisites
- Use the `siderolabs/talos` provider, version `0.9.0-alpha.0`.
- Talos Linux (I have used version 1.10.3) running on a number of nodes.
- A directory `talos-conf/` with YAML patch files corresponding to your nodes.
  - Files starting with `c` will be used to configure the CP nodes.
  - Files starting with `w` will be used to configure the Worker nodes.
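To pin the provider version listed in the prerequisites, a `required_providers` block along these lines can be used. This is a minimal sketch and was not part of the original report. The local name `talos` is just the conventional choice, and the exact-version constraint is an assumption based on the version mentioned above.

```hcl
terraform {
  required_providers {
    # Local name "talos" is a convention, not a requirement.
    talos = {
      source  = "siderolabs/talos"
      # Exact pin on the pre-release version named in the prerequisites.
      version = "0.9.0-alpha.0"
    }
  }
}
```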
```hcl
locals {
  # Patch files are discovered on disk and matched to nodes by sorted position.
  cp_files = sort(fileset("${path.module}/talos-conf", "c*.yaml"))
  w_files  = sort(fileset("${path.module}/talos-conf", "w*.yaml"))

  # Node counts follow the number of patch files; the IP lists below must be
  # at least as long, because they are indexed with count.index.
  num_control_plane_nodes = length(local.cp_files)
  num_worker_nodes        = length(local.w_files)
  control_plane_ips       = ["10.0.1.11", "10.0.1.12", "10.0.1.13"]
  worker_ips              = ["10.0.1.15", "10.0.1.16"]
}

resource "talos_machine_secrets" "secrets" {}

data "talos_machine_configuration" "mc_cp" {
  count            = local.num_control_plane_nodes
  cluster_name     = "test-cluster"
  cluster_endpoint = "https://10.0.1.10:6443"
  machine_type     = "controlplane"
  machine_secrets  = talos_machine_secrets.secrets.machine_secrets
  talos_version    = "v1.10.3"
}

data "talos_machine_configuration" "mc_w" {
  count            = local.num_worker_nodes
  cluster_name     = "test-cluster"
  cluster_endpoint = "https://10.0.1.10:6443"
  machine_type     = "worker"
  machine_secrets  = talos_machine_secrets.secrets.machine_secrets
  talos_version    = "v1.10.3"
}

# Apply the base configuration plus the per-node patch to each control plane node.
resource "talos_machine_configuration_apply" "apply_cp" {
  count                       = local.num_control_plane_nodes
  client_configuration        = talos_machine_secrets.secrets.client_configuration
  machine_configuration_input = data.talos_machine_configuration.mc_cp[count.index].machine_configuration
  node                        = local.control_plane_ips[count.index]
  config_patches              = [file("${path.module}/talos-conf/${local.cp_files[count.index]}")]
}

# Same for the worker nodes.
resource "talos_machine_configuration_apply" "apply_w" {
  count                       = local.num_worker_nodes
  client_configuration        = talos_machine_secrets.secrets.client_configuration
  machine_configuration_input = data.talos_machine_configuration.mc_w[count.index].machine_configuration
  node                        = local.worker_ips[count.index]
  config_patches              = [file("${path.module}/talos-conf/${local.w_files[count.index]}")]
}

# Bootstrap the first control plane node once every node has its configuration.
resource "talos_machine_bootstrap" "bootstrap" {
  client_configuration = talos_machine_secrets.secrets.client_configuration
  node                 = local.control_plane_ips[0]
  depends_on = [
    talos_machine_configuration_apply.apply_cp,
    talos_machine_configuration_apply.apply_w
  ]
}
```
Run:
```shell
tofu apply
```
Expected behaviour
The different nodes are configured according to their respective YAML patch files, the cluster is bootstrapped and after a while a ready-to-use Kubernetes cluster is available.
What actually happens
`tofu apply` gets stuck on the `talos_machine_bootstrap` step and eventually times out. Re-running `tofu apply` afterwards solves the problem: the second run completes.
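Since a second run succeeds, one way to check whether this is purely a timing problem would be to put an explicit delay between the apply and bootstrap steps. The sketch below is untested against this setup and assumes the `hashicorp/time` provider is available. The resource name `wait_after_apply` and the 60-second duration are arbitrary illustrative choices, not something the talos provider documents or requires. It would replace the `talos_machine_bootstrap` block from the reproduction code.

```hcl
# Assumption: the hashicorp/time provider is installed alongside siderolabs/talos.
# Hypothetical pause that starts only after every configuration apply has finished.
resource "time_sleep" "wait_after_apply" {
  depends_on = [
    talos_machine_configuration_apply.apply_cp,
    talos_machine_configuration_apply.apply_w
  ]

  # Arbitrary duration; tune to how long the nodes take to come back up.
  create_duration = "60s"
}

# Bootstrap now waits for the pause instead of directly for the apply resources.
resource "talos_machine_bootstrap" "bootstrap" {
  client_configuration = talos_machine_secrets.secrets.client_configuration
  node                 = local.control_plane_ips[0]
  depends_on           = [time_sleep.wait_after_apply]
}
```

If the first `tofu apply` then completes, that would suggest the nodes simply need more time after the configuration is applied. If it still times out, the cause is more likely elsewhere.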