Problem
The Tailscale component installer fails when it tries to deploy the Connector custom resource immediately after the Helm installation completes:
Error: failed to apply manifest (kind=Connector, name=foundry-vip-connector): the server could not find the requested resource
Root Cause
The installation flow in v1/internal/component/tailscale/install.go executes these steps sequentially:
- Helm installs the Tailscale operator
- Immediately tries to deploy the Connector resource
- Immediately tries to deploy the DNSConfig resource
However, the operator needs time to:
- Start up (pod becomes Running)
- Register its CRDs with the Kubernetes API server (eventually consistent operation)
This timing gap causes the Connector deployment to fail with "resource not found".
Proposed Solution
Add a waitForOperatorReady() method between Helm installation and CRD deployment that:
- Waits for operator pod to be Running
- Waits for CRD registration by polling CRDExists() for connectors.tailscale.com
- Adds buffer time for full propagation
Implementation
File: v1/internal/component/tailscale/install.go
Add after Helm install (Step 4) and before Connector deployment (Step 5):
// Step 4.5: Wait for operator to be ready and CRDs to be registered
if err := i.waitForOperatorReady(ctx); err != nil {
	return fmt.Errorf("failed waiting for operator to be ready: %w", err)
}
New method:
func (i *Installer) waitForOperatorReady(ctx context.Context) error {
	const pollInterval = 2 * time.Second

	// One deadline bounds both waits; cancelling the caller's ctx also stops polling.
	ctx, cancel := context.WithTimeout(ctx, 120*time.Second)
	defer cancel()

	fmt.Println("Waiting for Tailscale operator to be ready...")

	// Wait for operator pod to be running
	running := false
	for !running {
		pods, err := i.kubeClient.GetPods(ctx, DefaultNamespace)
		if err == nil {
			for _, pod := range pods {
				if strings.Contains(pod.Name, "operator") && pod.Status == "Running" {
					running = true
					break
				}
			}
		}
		if !running {
			select {
			case <-ctx.Done():
				return fmt.Errorf("timeout waiting for operator pod to be ready: %w", ctx.Err())
			case <-time.After(pollInterval):
			}
		}
	}
	fmt.Println("✓ Operator pod is running")

	// Wait for Connector CRD to be registered
	for {
		exists, err := i.kubeClient.CRDExists(ctx, "connectors.tailscale.com")
		if err == nil && exists {
			fmt.Println("✓ Connector CRD is registered")
			time.Sleep(pollInterval) // buffer time for full propagation
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("timeout waiting for Connector CRD to be registered: %w", ctx.Err())
		case <-time.After(pollInterval):
		}
	}
}
Alternative Approach
Use retry with exponential backoff in DeployConnector() for better resilience.
Testing
- Fresh cluster installation with use_tailscale: true
- Verify operator starts successfully
- Verify Connector and DNSConfig deploy without errors
- Test timeout scenarios
Context
- Discovered during pedro-ops cluster deployment testing
- Required for automated foundry stack install with Tailscale enabled
- Follow-up to PR #2i: Tailscale component registry integration