Hello all, I am setting up an on-prem Omni instance in an air-gapped (disconnected) environment. Omni is hosted on a Talos-based Kubernetes cluster that was provisioned manually. The setup uses a self-signed CA for all the components (Image Factory and Omni). Below is the snippet used to deploy Omni:

    ... output omitted ...
    containers:
      - name: omni
        args:
          - --account-id=XXX
          - --advertised-api-url=https://omni.example.com/
          - --advertised-kubernetes-proxy-url=https://kube.omni.example.com/
          - --auth-auth0-enabled=false
          - --auth-saml-enabled=true
          - --auth-saml-url=https://auth.example.com/realms/omni/protocol/saml/descriptor
          - --bind-addr=0.0.0.0:8080
          - --cert=/etc/omni/ssl/omni-ingress-tls/tls.crt
          - --event-sink-port=8091
          - --image-factory-address=https://factory.example.com
          - --k8s-proxy-bind-addr=0.0.0.0:8095
          - --key=/etc/omni/ssl/omni-ingress-tls/tls.key
          - --kubernetes-registry=r.example.com/siderolabs/kubelet
          - --log-server-port=8092
          - --machine-api-advertised-url=https://api.omni.example.com/
          - --machine-api-bind-addr=0.0.0.0:8090
          - --machine-api-cert=/etc/omni/ssl/api-omni-ingress-tls/tls.crt
          - --machine-api-key=/etc/omni/ssl/api-omni-ingress-tls/tls.key
          - --name=onprem-omni
          - --private-key-source=file:///etc/omni/omni.asc
          - '--registry-mirror="factory.talos.dev=r.example.com,docker.io=r.example.com,gcr.io=r.example.com,ghcr.io=r.example.com,registry.k8s.io=r.example.com,quay.io=r.example.com,k8s.gcr.io=r.example.com"'
          - --siderolink-api-advertised-url=https://api.omni.example.com/
          - --siderolink-use-grpc-tunnel=false
          - --siderolink-wireguard-advertised-addr=wireguard.omni.example.com:50180
          - --siderolink-wireguard-bind-addr=0.0.0.0:50180
          - --talos-installer-registry=r.example.com/siderolabs/installer
        image: ghcr.io/siderolabs/omni:v1.2.1
        imagePullPolicy: IfNotPresent
        ports:
          - name: omni
            protocol: TCP
            containerPort: 8080
          - name: siderolink
            protocol: TCP
            containerPort: 8090
          - name: event-sink
            protocol: TCP
            containerPort: 8091
          - name: log-server
            protocol: TCP
            containerPort: 8092
          - name: k8s-proxy
            protocol: TCP
            containerPort: 8095
          - name: wireguard
            containerPort: 50180
            protocol: UDP
        volumeMounts:
          - mountPath: /etc/ssl/certs/user-ca-bundle.crt
            name: user-ca-bundle
            subPath: tls.crt
            readOnly: true
          - mountPath: /etc/omni/ssl/omni-ingress-tls
            name: omni-ingress-tls
            readOnly: true
          - mountPath: /etc/omni/ssl/api-omni-ingress-tls
            name: api-omni-ingress-tls
            readOnly: true
          - name: omni-certificates
            mountPath: /etc/omni/omni.asc
            subPath: omni.asc
            readOnly: true
    restartPolicy: Always
    volumes:
      - name: omni-certificates
        secret:
          secretName: omni-certificates
      - name: api-omni-ingress-tls
        secret:
          secretName: api-omni-ingress-tls
      - name: omni-ingress-tls
        secret:
          secretName: omni-ingress-tls
      - name: user-ca-bundle
        secret:
          secretName: ca-bundle

Omni starts properly, and the integration with the Image Factory works perfectly fine: Omni is able to communicate with the Image Factory and retrieve all the required data. Now comes the point where I need to use a disk image instead of the generated ISO. In addition, I don't want to manage my own custom Secure Boot certificates, so I rely on the disk image provided by factory.talos.dev, using nocloud-amd64-secureboot.raw.xz to spin up the virtual machines. Below is the sequence I followed:
At this point the machine establishes the WireGuard link properly and joins Omni, and I can see it as active and ready to be used. The problem occurs when creating a new cluster: the process gets stuck, and the controllers start emitting errors like:

Things I've tried during troubleshooting:
I cannot figure out what goes wrong in the process, and I might be doing something wrong, so any help is really appreciated.
Replies: 5 comments
Hi, can you let us know which customer or trialing company you are with, so we can get you the right support?
Hi, actually this is a home lab, not something for a customer or a company.
Ah, it's rare to see air-gapped at home. :-)
It looks like you get the machine to join Omni via a join config and trusted certificate chain, but when you create a cluster, Omni is removing the TrustedRootsConfig patch that was applied with the Omni connection config. I would try downloading/booting an image with talos.config.inline set with the custom CA you need, instead of applying it as a patch to a vanilla Talos image. You can see how to do it here: https://docs.siderolabs.com/talos/v1.11/reference/kernel#talos-config-early-and-talos-config-inline
Let us know if that works.

@rothgar Thank you so much for pointing me in the right direction. Indeed, loading the CA during VM boot made it work. However, I didn't try explicitly using I am trying to avoid building or customizing anything that could lead to requiring manual intervention. Repeating my thanks to you and the amazing team at SideroLabs.
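For reference, a custom CA is expressed in Talos as a TrustedRootsConfig machine configuration document, which is what the suggestion above would embed on the kernel command line. The sketch below shows roughly what such a document could look like; the name custom-ca and the elided PEM body are placeholders, not values from this thread. Per the Talos kernel reference linked above, a document passed through talos.config.inline has to be zstd-compressed and base64-encoded first (for example `zstd --stdout trusted-roots.yaml | base64 -w 0`, where trusted-roots.yaml is a hypothetical file name).

```yaml
# Sketch of a Talos TrustedRootsConfig document (placeholder values).
apiVersion: v1alpha1
kind: TrustedRootsConfig
name: custom-ca            # hypothetical name, pick any identifier
certificates: |-
  -----BEGIN CERTIFICATE-----
  ... your self-signed root CA (PEM) goes here ...
  -----END CERTIFICATE-----
```

The encoded value would then be passed as talos.config.inline=<encoded value>. With the Image Factory, one way to avoid hand-building images is to add that argument under customization.extraKernelArgs in the schematic; as far as I understand, Secure Boot images embed the kernel command line in the signed UKI, so for those images the argument has to be set in the schematic rather than at boot time. Keep the document small, since the kernel command line has a limited length.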