2 changes: 2 additions & 0 deletions csil/v1/components/k3s.csil
@@ -22,6 +22,8 @@ Config = {
server_url: text @go_name("ServerURL"),
dns_servers: [* text] @go_name("DNSServers"),
? additional_registries: [* AdditionalRegistry] @go_name("AdditionalRegistries"),
? etcd_args: [* text] @go_name("EtcdArgs"),
? allow_cgnat_vip: bool @go_name("AllowCGNATVIP"),
}

; Additional registry configuration for user-defined registries
20 changes: 20 additions & 0 deletions csil/v1/components/tailscale.csil
@@ -0,0 +1,20 @@
; Tailscale component configuration
;
; Configuration for Tailscale operator integration, enabling secure connectivity
; to Tailscale overlay networks with automated VIP route advertisement and DNS.

options {
go_module: "github.com/catalystcommunity/foundry/v1",
go_package: "github.com/catalystcommunity/foundry/v1/internal/component/tailscale"
}

; Tailscale operator configuration
; OAuth credentials can be literal values or OpenBAO references: ${secret:path:key}
; Default values will be set in Go code SetDefaults() function
Config = {
? oauth_client_id: text @go_name("OAuthClientID"),
? oauth_client_secret: text @go_name("OAuthClientSecret"),
? operator_image: text,
? advertise_routes: [* text],
? tags: [* text]
}
4 changes: 3 additions & 1 deletion csil/v1/config/network-simple.csil
@@ -48,7 +48,9 @@ ClusterConfig = {
name: text,
? domain: text,
primary_domain: text @go_name("PrimaryDomain"),
vip: text @go_name("VIP")
vip: text @go_name("VIP"),
? allow_cgnat_vip: bool @go_name("AllowCGNATVIP"),
? use_tailscale: bool @go_name("UseTailscale")
}

; Map of component names to their configurations
241 changes: 241 additions & 0 deletions docs/tailscale-integration.md
@@ -0,0 +1,241 @@
# Using Foundry with Tailscale Networks

This guide covers deploying Foundry clusters on Tailscale overlay networks using CGNAT IP addresses (RFC 6598 Shared Address Space, 100.64.0.0/10).

## Overview

Tailscale uses the CGNAT range (100.64.0.0/10) for its overlay network, which falls outside the traditional RFC 1918 private IP ranges. By default, Foundry's VIP validation accepts only RFC 1918 addresses; the `allow_cgnat_vip` configuration flag extends it to Tailscale and similar overlay networks.
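
The gate amounts to a pair of CIDR checks. The sketch below is illustrative only: Foundry's real `ValidateVIP` (in the k3s component package) takes the VIP plus an allow flag, but the exact prefix lists and error messages here are assumptions.

```go
package main

import (
	"fmt"
	"net"
)

// validateVIP is an illustrative sketch: accept RFC 1918 addresses always,
// and CGNAT (100.64.0.0/10) only when allowCGNAT is set.
func validateVIP(vip string, allowCGNAT bool) error {
	ip := net.ParseIP(vip)
	if ip == nil {
		return fmt.Errorf("invalid IP: %q", vip)
	}
	for _, cidr := range []string{"10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"} {
		_, block, _ := net.ParseCIDR(cidr)
		if block.Contains(ip) {
			return nil
		}
	}
	_, cgnat, _ := net.ParseCIDR("100.64.0.0/10")
	if cgnat.Contains(ip) {
		if allowCGNAT {
			return nil
		}
		return fmt.Errorf("%s is in the CGNAT range; set allow_cgnat_vip: true", vip)
	}
	return fmt.Errorf("%s is not a private (RFC 1918) address", vip)
}

func main() {
	fmt.Println(validateVIP("100.81.89.62", false)) // rejected without the flag
	fmt.Println(validateVIP("100.81.89.62", true))  // <nil>
}
```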

## Prerequisites

- Tailscale installed and configured on all cluster nodes
- Nodes tagged appropriately (e.g., `tag:k8s`)
- Tailscale ACL configured to allow inter-node communication

## Required Tailscale ACL Configuration

Your Tailscale ACL must allow:
1. **Your local machine → cluster nodes** (for Foundry SSH access)
2. **Cluster nodes → cluster nodes** (for K3s cluster formation)

### Example ACL

```json
{
  "acls": [
    {
      "action": "accept",
      "src": ["*"],
      "dst": ["*:*"]
    }
  ],
  "ssh": [
    {
      "action": "accept",
      "src": ["autogroup:members"],
      "dst": ["tag:k8s"],
      "users": ["root", "ubuntu"]
    },
    {
      "action": "accept",
      "src": ["tag:k8s"],
      "dst": ["tag:k8s"],
      "users": ["root"]
    }
  ],
  "tagOwners": {
    "tag:k8s": ["autogroup:admin"]
  }
}
```

**Critical:** The second SSH rule (`tag:k8s` → `tag:k8s`) allows cluster nodes to SSH to each other, which is required for K3s agent installation on worker nodes.

## Configuration

### Single Control Plane Setup (Recommended)

For single control plane deployments, the simplest approach is to use the control plane's Tailscale IP as the VIP:

```yaml
cluster:
  name: my-cluster
  primary_domain: example.local
  vip: 100.81.89.62  # Control plane's Tailscale IP
  allow_cgnat_vip: true

hosts:
  - hostname: control-plane
    address: 100.81.89.62
    user: root
  - hostname: worker-1
    address: 100.70.90.12
    user: root
  - hostname: worker-2
    address: 100.125.196.1
    user: root
```

**Why this works:**
- Single control plane means no HA failover needed
- VIP is just a stable endpoint for workers to connect to
- Using the control plane's actual IP avoids routing complexity

### High Availability (Multi-Control-Plane) Setup

For HA setups with multiple control planes, you need to make the VIP routable via Tailscale:

#### Option 1: Tailscale Subnet Routes

Advertise the VIP as a subnet route from the active control plane:

```bash
# On the control plane node
tailscale up --advertise-routes=100.81.89.100/32
```

Then approve the route in the Tailscale admin console.

```yaml
cluster:
  name: my-cluster
  primary_domain: example.local
  vip: 100.81.89.100  # Dedicated VIP
  allow_cgnat_vip: true
```

**Note:** kube-vip will manage the VIP assignment, but you need to ensure the route is advertised from whichever node currently holds the VIP.
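
One way to automate that is a small watcher on each control plane that advertises the route only while it holds the VIP. The sketch below is hypothetical — neither `holdsVIP` nor this wiring is part of Foundry, though the `tailscale up --advertise-routes` flag is real:

```go
package main

import (
	"fmt"
	"net"
	"os/exec"
)

// holdsVIP reports whether any local interface currently has the VIP
// assigned (i.e. kube-vip has placed it on this node).
func holdsVIP(vip string) bool {
	target := net.ParseIP(vip)
	addrs, err := net.InterfaceAddrs()
	if err != nil {
		return false
	}
	for _, a := range addrs {
		if ipn, ok := a.(*net.IPNet); ok && ipn.IP.Equal(target) {
			return true
		}
	}
	return false
}

func main() {
	const vip = "100.81.89.100"
	if holdsVIP(vip) {
		// Re-advertise the /32 route from the node that owns the VIP.
		cmd := exec.Command("tailscale", "up", fmt.Sprintf("--advertise-routes=%s/32", vip))
		fmt.Println("would run:", cmd.String())
	}
}
```

Run periodically (or from a kube-vip leader-election hook), this keeps the advertised route on whichever node currently owns the VIP.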

#### Option 2: Tailscale Operator (Recommended for HA)

Install the Tailscale operator on control planes to automatically manage subnet route advertisements:

```yaml
# Future enhancement - see "Roadmap" section
cluster:
  name: my-cluster
  primary_domain: example.local
  vip: 100.81.89.100
  allow_cgnat_vip: true
  use_tailscale: true  # Not yet implemented
```

This will be available in a future Foundry release.

## Network Routing Considerations

### Understanding VIP Routing on Tailscale

Traditional kube-vip assumes Layer 2 networking where the VIP can "float" between nodes via ARP announcements. Tailscale is a Layer 3 overlay network where:

- **IPs are routed, not bridged** - Nodes communicate via Tailscale's WireGuard tunnels
- **No ARP** - IP routing is managed by Tailscale's coordination server
- **Explicit routes required** - Any IP that isn't a node's primary Tailscale IP needs to be advertised as a subnet route

### VIP Reachability

For worker nodes to reach the VIP:

**Single control plane:**
- VIP = control plane IP → Always routable (it's the node's primary IP)

**Multiple control planes:**
- VIP = dedicated IP → Must be advertised as subnet route
- Route must be updated when VIP moves between control planes
- Tailscale operator can automate this

## Troubleshooting

### Workers Can't Join Cluster

**Symptom:**
```
Failed to validate connection to cluster at https://100.81.89.100:6443:
failed to get CA certs: context deadline exceeded
```

**Diagnosis:**
Worker nodes cannot reach the VIP. Check:

```bash
# On a worker node
curl -k https://<VIP>:6443/version --max-time 5

# If it times out, the VIP is not routable
```

**Solution:**
- Single control plane: Set `vip` to control plane's IP
- Multi control plane: Advertise VIP as subnet route from active control plane

### SSH Connection Refused Between Nodes

**Symptom:**
```
tailscale: tailnet policy does not permit you to SSH to this node
```

**Diagnosis:**
Tailscale ACL doesn't allow SSH between cluster nodes.

**Solution:**
Add SSH rule allowing `tag:k8s` → `tag:k8s` as shown in the ACL example above.

### VIP Assigned But Not Reachable

**Symptom:**
- `ip addr show` on control plane shows VIP assigned
- Workers still can't reach it

**Diagnosis:**
VIP is assigned to the local interface but not advertised to Tailscale.

**Solution:**
```bash
# On control plane
tailscale up --advertise-routes=<VIP>/32

# Then approve in Tailscale admin console
```

## Validation Checklist

Before deploying:

- [ ] All nodes have Tailscale installed and connected
- [ ] Nodes are tagged appropriately (e.g., `tag:k8s`)
- [ ] Tailscale ACL allows SSH from your machine to nodes
- [ ] Tailscale ACL allows SSH between nodes (`tag:k8s` → `tag:k8s`)
- [ ] For HA setups: VIP subnet route is configured and approved
- [ ] `allow_cgnat_vip: true` is set in cluster config
- [ ] Workers can reach the VIP: `curl -k https://<VIP>:6443/version`

## Roadmap

Future enhancements planned for Tailscale integration:

1. **Tailscale Operator Integration** (`use_tailscale: true`)
- Automatic operator installation on control planes
- Automated VIP subnet route management
- Support for cross-pod network policies via Tailscale ACLs

2. **Multi-Cluster Mesh**
- Connect multiple Foundry clusters via Tailscale
- Cross-cluster service discovery
- Unified network policy across clusters

3. **GitOps for Tailscale ACLs**
- Version control for network policies
- CI/CD automation for ACL updates
- Integration with Foundry stack management

## References

- [RFC 6598 - Shared Address Space (CGNAT)](https://www.rfc-editor.org/rfc/rfc6598)
- [Tailscale ACL Documentation](https://tailscale.com/kb/1018/acls/)
- [Tailscale Subnet Routes](https://tailscale.com/kb/1019/subnets/)
- [kube-vip Documentation](https://kube-vip.io/)

## Contributing

Found an issue or have suggestions for Tailscale integration? Please open an issue on the [Foundry GitHub repository](https://github.com/catalystcommunity/foundry).
2 changes: 2 additions & 0 deletions v1/cmd/foundry/commands/cluster/init.go
@@ -284,6 +284,7 @@ func InitializeCluster(ctx context.Context, cfg *config.Config) error {
fmt.Sprintf("%s.%s", cfg.Cluster.Name, cfg.Cluster.PrimaryDomain),
},
DisableComponents: []string{"traefik", "servicelb"},
AllowCGNATVIP: cfg.Cluster.AllowCGNATVIP,
}

// Parse additional registries and etcd args from component config
@@ -345,6 +346,7 @@ func InitializeCluster(ctx context.Context, cfg *config.Config) error {
DisableComponents: k3sConfig.DisableComponents,
RegistryConfig: k3sConfig.RegistryConfig,
EtcdArgs: k3sConfig.EtcdArgs,
AllowCGNATVIP: k3sConfig.AllowCGNATVIP,
}

// Join control plane
5 changes: 3 additions & 2 deletions v1/internal/component/k3s/install.go
@@ -294,8 +294,9 @@ func waitForK3sReady(executor SSHExecutor, retryCfg RetryConfig) error {
func setupKubeVIP(ctx context.Context, executor SSHExecutor, cfg *Config) error {
// Determine VIP config
vipConfig := &VIPConfig{
VIP: cfg.VIP,
Interface: cfg.Interface,
VIP: cfg.VIP,
Interface: cfg.Interface,
AllowCGNATVIP: cfg.AllowCGNATVIP,
}

// Generate kube-vip manifests
3 changes: 1 addition & 2 deletions v1/internal/component/k3s/types.gen.go

Generated file; diff not rendered.

9 changes: 8 additions & 1 deletion v1/internal/component/k3s/types.go
@@ -64,6 +64,11 @@ func ParseConfig(cfg component.ComponentConfig) (*Config, error) {
config.VIP = vip
}

// Allow CGNAT VIP
if allowCGNAT, ok := cfg.GetBool("allow_cgnat_vip"); ok {
config.AllowCGNATVIP = &allowCGNAT
}

// Interface
if iface, ok := cfg.GetString("interface"); ok {
config.Interface = iface
@@ -194,7 +199,9 @@ func (c *Config) Validate() error {
return fmt.Errorf("VIP is required")
}

if err := ValidateVIP(c.VIP); err != nil {
// Dereference AllowCGNATVIP pointer (defaults to false if nil)
allowCGNAT := c.AllowCGNATVIP != nil && *c.AllowCGNATVIP
if err := ValidateVIP(c.VIP, allowCGNAT); err != nil {
return fmt.Errorf("VIP validation failed: %w", err)
}
