Validating ARM64 Kubernetes in the cloud before committing to bare-metal
Smart validation strategy: Test first, buy hardware second
Transform a 128°F office space heater (aka home lab) into a cloud-validated, ARM64-first Kubernetes deployment that:
- ✅ Validates ARM64 architecture on t4g instances before a Raspberry Pi purchase
- ✅ Runs experiments within a reasonable budget (baseline: $0.08/month, validation: <$15/month)
- ✅ Netboots Talos Linux with custom extensions (Spin + Tailscale subnet router)
- ✅ Demonstrates WebAssembly on ARM64 in production-like conditions
- ✅ Proves when cloud makes sense vs. efficient home lab hardware
Target: Live demo at CozySummit Virtual 2025 on December 3, 2025
✅ Working Tests (4/5):
- ✅ Patch validation (upstream conformance) - FIXED
- ✅ GitHub Actions workflow syntax
- ✅ Dependency verification (crane, skopeo, jq)
- ✅ Patch directory cleanliness (3 patches) - FIXED
❌ Failing Tests (1/5):
- ❌ ADR-003 documentation validation - missing the file the test expects
🚧 Image Build Tests (1/3 passing):
- ❌ Container image pulls (need actual published images)
- ❌ OCI manifest validation (images not yet published)
- ✅ Cost tracking validation
🎯 Conformance Achieved:
- Upstream CozyStack integration ✅
- Separate repository strategy ✅
- ARM64 native builds ✅
- Test suite reality alignment ✅ NEW
Home Lab Status: 🔥
Office Temperature: 93°F (ambient, with the door closed)
Electricity Bill: 📈
Wife's Patience: 📉
Running x86 workloads 24/7 in a home lab is:
- HOT - Space heater in every season
- EXPENSIVE - Power consumption adds up
- LOUD - Fans, lots of fans
- INFLEXIBLE - Can't easily scale down
The Solution? Validate in the cloud, then bring it home on ARM64 (Raspberry Pi CM3).
```
Internet → DD-WRT Router (10.17.12.1)
└─ Front Subnet (10.17.12.0/24)
   └─ Mikrotik Router (dual-homed)
      └─ Inner Subnet (10.17.13.0/24)
         ├─ Netboot Infrastructure
         │  ├─ dnsmasq (DHCP)
         │  ├─ matchbox (PXE)
         │  ├─ 5x registry caches
         │  └─ pi-hole (DNS)
         └─ Talos Nodes
            └─ CozyStack
```
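The dnsmasq + matchbox pair above is a standard iPXE chainloading setup. A minimal sketch of what the dnsmasq side could look like — the interface name, DHCP range, and the matchbox address are hypothetical, so adjust them to your inner subnet:

```
# /etc/dnsmasq.d/pxe.conf -- hedged sketch, not the project's actual config.
interface=eth0
dhcp-range=10.17.13.100,10.17.13.200,12h

# Detect iPXE clients (they send DHCP option 175); plain PXE firmware
# gets undionly.kpxe first, then iPXE chainloads the matchbox boot script.
dhcp-match=set:ipxe,175
dhcp-boot=tag:!ipxe,undionly.kpxe
dhcp-boot=tag:ipxe,http://10.17.13.2:8080/boot.ipxe  # matchbox, address assumed

enable-tftp
tftp-root=/var/lib/tftpboot
```

The two-stage handoff is what lets matchbox serve per-node Talos kernel/initramfs selections over HTTP instead of raw TFTP.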
```
VPC: 10.10.0.0/16 (eu-west-1)
└─ Public Subnet (10.10.0.0/24)
   ├─ Bastion: 10.10.0.100 (ENI + IPv6, t4g.small)
   │  └─ Services: registry caches, Wireguard NAT, Tailscale
   │
   ├─ Talos Gateway: 10.10.0.101 (t4g.medium)
   │  └─ Extensions: spin, tailscale (subnet router)
   │
   ├─ Talos Compute: 10.10.0.102 (t4g.medium)
   │  └─ Extensions: spin only
   │
   └─ Talos Compute: 10.10.0.103 (t4g.medium)
      └─ Extensions: spin only
```
Boot: boot-to-talos installs OCI images (no AMI management)
Cost: ~$16-20/month (mostly EBS, t4g free tier covers compute)
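One node from the topology above could be expressed in Terraform roughly like this — a hedged fragment, not the project's actual module: the resource names, the AMI lookup, and the tag are illustrative, and boot-to-talos changes how the image actually gets installed:

```hcl
# Sketch of one Talos compute node from the diagram above (assumptions marked).
resource "aws_instance" "talos_compute_1" {
  ami           = data.aws_ami.base.id # placeholder; boot-to-talos replaces the OS
  instance_type = "t4g.medium"         # Graviton (ARM64), free-tier family
  subnet_id     = aws_subnet.public.id # 10.10.0.0/24 public subnet
  private_ip    = "10.10.0.102"        # fixed IP matching the topology

  root_block_device {
    volume_size = 8 # GB, matches the per-session EBS cost estimate
  }

  tags = { Name = "talos-compute-1" } # illustrative naming
}
```

Pinning `private_ip` keeps the AWS layout an exact mirror of the home lab's static addressing, which is the point of the replication strategy.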
📋 AWS Design Summary - Ready for Stakpak agent
🏷️ Package Naming Cleanup - Fix those ugly package names!
Key Innovation: Exact replica of home lab topology in AWS, staying within free tier limits.
Talos Linux · CozyStack · WebAssembly (Spin) · Tailscale Subnet Router · AWS Graviton
Key insight: We use Tailscale's subnet router mode (not mesh!) to create clean network bridges between:
- AWS VPC private networks (10.20.0.0/16)
- Kubernetes pod CIDR (managed by CozyStack's CNI)
- Service networks (MetalLB load balancers in ARP mode)
- Home lab networks (10.17.13.0/24)
Architecture: Single privileged Talos node runs subnet router, other nodes use standard Kubernetes networking. This preserves CNI while providing seamless VPC access.
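On Talos, the subnet-router role on that single gateway node could be configured through an ExtensionServiceConfig document for the Tailscale extension. A hedged sketch — the auth key is a placeholder, and the advertised routes here are assumptions drawn from the CIDRs listed above, so verify them against your actual layout:

```yaml
# Sketch: Tailscale system extension config on the gateway node only.
apiVersion: v1alpha1
kind: ExtensionServiceConfig
name: tailscale
environment:
  - TS_AUTHKEY=tskey-auth-REDACTED            # placeholder, never commit a real key
  - TS_ROUTES=10.10.0.0/24,10.20.0.0/16       # assumed: VPC subnet + pod CIDR
  - TS_EXTRA_ARGS=--advertise-exit-node=false # subnet router, not an exit node
```

Because only the gateway machine config carries this document, compute nodes never race to claim the routing role — which is exactly the failure mode described in the Tailscale lesson below.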
See landing page for complete technical implementation details.
Why Talos? It's CozySummit and CozyStack is built on it. End of justification! 🎯
What makes it compelling:
- Immutable OS: Fewer binaries = smaller attack surface
- Kubernetes-first: No SSH, no shell, just API-driven infrastructure
- ARM64 native: First-class support, not an afterthought
- Security by design: Minimal surface area, everything locked down
Real talk: We're not here to justify Talos vs. other distros. It's proven, it works, and it's what CozyStack uses. Moving on.
Why CozyStack over vanilla Kubernetes? Because it looks like something I'd build if I had unlimited time, and I want that to exist.
The compelling architecture:
- Helm-first design: Platform built for teams that demand "Helm only"
- Flux integration: GitOps workflows that actually work
- Cloud-native foundation: CNCF projects with (hopefully) spectacular ARM64 support
- Platform-as-code: Infrastructure that scales with your team, not against it
Author's note: As a Flux maintainer, I've seen enough infrastructure built on Helm to know this is the right abstraction level. CozyStack delivers that vision.
Why WebAssembly? Faster, cheaper, architecture-independent. Perfect for ARM64 validation.
The Spin advantage:
- Cold start performance: Sub-millisecond startup vs. container seconds
- Scale-to-zero efficiency: Actually works, unlike most "serverless" promises
- Local registry caching: Artifact caching that makes cold starts even faster
- Architecture portability: Same binary runs on x86 home lab and ARM64 cloud
Real-world impact: We've been demoing Spin for years. The performance story is proven - now we're validating it on ARM64 at cloud scale before hardware investment.
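A deployment like the one being demoed could be declared as a SpinApp resource, assuming SpinKube's CRDs are installed on the cluster. The app name and image below are placeholders; `containerd-shim-spin` is SpinKube's standard executor name:

```yaml
# Hedged sketch of a SpinKube workload for the ARM64 validation run.
apiVersion: core.spinkube.dev/v1alpha1
kind: SpinApp
metadata:
  name: hello-arm64            # placeholder name
spec:
  image: ghcr.io/example/hello-spin:latest  # placeholder; a multi-arch OCI artifact
  executor: containerd-shim-spin
  replicas: 1
```

The same Wasm artifact serving both the x86 home lab and Graviton is what makes the "architecture portability" bullet above concrete: the image reference never changes, only the node underneath it.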
Why Graviton? It's ARM64 that's available in the cloud today, and currently free under the AWS free tier.
The pragmatic choice:
- Virtualization extensions: Hopefully has what Raspberry Pi lacks for advanced CozyStack features
- Known platform: AWS is familiar territory for cloud validation
- Risk mitigation: Test architecture before $650+ hardware investment
- Uncertain alternatives: Ampere? Chinese Raspberry Pi clones? Unknown landscape.
Honest assessment: We think Graviton has the virtualization support that consumer ARM64 hardware might lack. We'll find out! But we'd rather discover limitations in the cloud than after buying hardware.
The Problem: Adding Tailscale to ALL cluster nodes breaks everything.
What we learned (the hard way):
- Kubernetes Ready condition: Nodes wait for ALL configured extensions to become active
- Multiple subnet routers: Every node tries to configure as Tailscale subnet router
- Configuration conflicts: Multiple nodes compete for same routing role
- Cluster formation failure: Nodes hang indefinitely, never reach Ready state
The Solution: Role-based image architecture
- Compute nodes (spin-only): WebAssembly runtime only, quick Ready state
- Gateway nodes (spin-tailscale): WebAssembly + Tailscale subnet router, one per cluster
Discovery method: "Walking the grounds and tilling the soil" - not systematic testing, but real-world cluster building experience on AMD64 that informed our ARM64 strategy.
Impact: This architectural insight is why our ARM64 validation will work. We've already solved the hard problems.
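The two role images could be described as separate Talos Image Factory schematics, one per role. A hedged sketch — `siderolabs/tailscale` is a published official extension, but verify the exact Spin extension name for your Talos version before using this:

```yaml
# Sketch: compute-node schematic (spin-only)
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/spin        # assumed extension name, verify per Talos release
---
# Sketch: gateway-node schematic (spin-tailscale)
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/spin        # assumed extension name, verify per Talos release
      - siderolabs/tailscale
```

Submitting each schematic yields a distinct image ID, so the "one subnet router per cluster" rule is enforced at image-selection time rather than by runtime configuration.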
```
Baseline Infrastructure (no experiments):
  Bastion (t4g.small, 5hrs/day):   $0.00 (free tier)
  EBS volumes (during runtime):    $0.04/month
  NAT Gateway (minimal usage):     $0.04/month
  -------------------------------------------------
  Baseline cost:                   $0.08/month

Validation Phase (5 experiments, 2-3 hours each):
  3x Talos nodes (t4g.small):      $0.00 (free tier < 750hrs/month)
  4x EBS volumes (8GB each):       $0.25-0.50/session
  NAT Gateway (active egress):     $0.15-0.35/session
  -------------------------------------------------
  Per experiment session:          $0.40-0.85
  Target validation budget:        <$15/month
```
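The per-session range above is just the two metered line items summed; a quick sanity-check sketch of the arithmetic (the 10-sessions-per-month figure is an assumption for illustration, not a project commitment):

```shell
# Values taken from the cost table above.
ebs_low=0.25;  ebs_high=0.50   # 4x EBS volumes, per session
nat_low=0.15;  nat_high=0.35   # NAT Gateway egress, per session

low=$(awk "BEGIN {printf \"%.2f\", $ebs_low + $nat_low}")
high=$(awk "BEGIN {printf \"%.2f\", $ebs_high + $nat_high}")
echo "Per-session: \$$low-\$$high"

# Assumed ~10 sessions/month on top of the $0.08 baseline:
monthly=$(awk "BEGIN {printf \"%.2f\", 10 * $high + 0.08}")
echo "Worst-case month: \$$monthly"
```

Even the worst case lands well under the <$15/month validation budget, which is what justifies renting before buying.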
Break-even Analysis:
- Home lab power consumption: $30-50/month
- Cloud validation phase: Target <$15/month
- Production cloud cost: $25-70/month (estimated)
- Decision point: When cloud exceeds $40/month, efficient ARM64 home lab wins
Strategy: Validate in cloud for less than the cost of buying wrong hardware ($500+ Raspberry Pi mistake), then deploy with confidence.
This project follows the Test-Driven Generation methodology created by Chanwit Kaewkasi.
Principle: Write tests FIRST, then generate code to make them pass.
- 📖 Chanwit's Article: "I Was Wrong About Test-Driven Generation"
- 🧰 TDG Skill (Open Source)
- 📋 Our TDG Plan
| Test Category | Status | Details |
|---|---|---|
| Patch Validation | ✅ PASSING | 4/5 tests passing (validate-complete.sh) |
| Image Build Tests | 🚧 PARTIAL | 1/3 passing (need published images) |
| Cost Tracking | ✅ PASSING | AWS cost validation working |
| Phase | Tests | Status |
|---|---|---|
| Network Foundation | 1-3 | 📝 DEFINED (TDG-PLAN.md) |
| Bastion & Netboot | 4-6 | 📝 DEFINED (TDG-PLAN.md) |
| CozyStack Deployment | 7-9 | 📝 DEFINED (TDG-PLAN.md) |
| Integration Tests | 10-12 | 📝 DEFINED (SpinApp + KubeVirt + Moonlander) |
Run current tests: ./validate-complete.sh and ./tests/run-all-custom-image-tests.sh
Next: Implement TDG infrastructure tests from TDG-PLAN.md
Integration Test Highlights:
- ✨ Test 10: SpinApp GitOps deployment with MetalLB external access
- 🚀 Test 11: KubeVirt + Cluster-API nested Kubernetes clusters
- 🌙 Test 12: Moonlander + Harvey cross-cluster management via Crossplane
- 🎨 Genesis Design Doc - Original vision
- 🧪 TDG Plan - Test-driven development roadmap
- 🗺️ Repository Overview - Full constellation map
- 📖 README - This README.md, generated with Claude Desktop
- 💰 COST
This project integrates with 8+ repositories:
| Repo | Purpose | Status |
|---|---|---|
| urmanac/aws-accounts | Infrastructure Terraform | ✅ Active |
| kingdon-ci/cozy-fleet | Flux GitOps | ✅ Active |
| cozystack/talm | GitOps Talos Management | 🎯 Core Tool |
| kingdon-ci/kaniko-builder | Custom image builds | 🔧 Tool |
| kingdon-ci/time-tracker | Session tracking | ⚙️ Optional |
| kingdonb/mecris | MCP server patterns | 📚 Reference |
| kingdon-ci/noclaude | Self-hosted AI | 🤖 Future |
| chanwit/tdg | TDG Methodology | 📖 Methodology |
See: docs/REPO-OVERVIEW.md for full dependency graph.
1. Home Lab Reality Check 🔥
   - Temperature monitoring
   - Power consumption
   - The space heater problem
2. AWS Economics 💰
   - Live cost explorer query
   - $0.04/month current state
   - Free tier breakdown
3. Netboot Magic ⚡
   - Launch t4g.small instance
   - Watch Talos netboot (< 5 min)
   - CozyStack dashboard
4. SpinKube on ARM64 🎯
   - Deploy demo app
   - Show running workload
   - Verify ARM64 architecture
5. The Exit 🚪
   - Terminate instance
   - Return to $0.04/month
   - Compare to home lab costs
- 📺 YouTube: @yebyen/streams
- 🎥 CozyStack Speed Runs: Previous demos and validation runs
```bash
# AWS CLI with MFA-authenticated profile
aws configure --profile sb-terraform-mfa-session

# Terraform (or OpenTofu)
brew install opentofu

# kubectl + talosctl
brew install kubectl
brew install siderolabs/tap/talosctl

# Flux CLI
brew install fluxcd/tap/flux
```

```bash
# Clone this repo
git clone https://github.com/urmanac/cozystack-moon-and-back.git
cd cozystack-moon-and-back

# Review TDG tests
./tests/run-all.sh --dry-run

# Deploy network foundation (Test 1)
cd terraform/network
terraform init
terraform plan
terraform apply

# Deploy bastion (Test 2-3)
cd ../bastion
terraform apply

# Verify netboot infrastructure (Test 3)
ssh [email protected] "docker ps"

# Launch Talos node (Test 4)
# (Manual for now, see docs/BOOTSTRAP.md)
```

```bash
# Get talos config
talosctl -n 10.20.13.x config

# Bootstrap cluster
talosctl -n 10.20.13.x bootstrap

# Install CozyStack
# (See docs/COZYSTACK.md for detailed steps)
```

This project demonstrates:
- ✨ Hybrid Cloud Economics - When cloud makes sense vs. home lab
- 🏗️ Infrastructure Replication - Exact topology in AWS and at home
- 🔧 ARM64 Validation - Test before bare-metal deployment
- 🌐 Network Architecture - Private-first, GDPR-safe design
- 📦 Custom Talos Images - Extensions for Spin + Tailscale
- 🔄 GitOps with Flux - Including new ExternalArtifact features
- 💰 Cost Optimization - Free tier strategies and monitoring
- 🧪 TDG Methodology - Test-driven infrastructure generation
- Tests 1-6 passing (Network β Demo workload)
- Live netboot < 5 minutes
- SpinKube demo runs on ARM64
- Cost stays under $0.10/month
- Audience can replicate in their own AWS account
- Home lab transitions to Raspberry Pi CM3 modules
- Office temperature drops 15°F
- Power bill decreases measurably
- Wife's approval rating improves 📈
Speaker: Kingdon Barrett
Flux Maintainer, DevOps Engineer at Navteca, LLC
Working on Science Cloud for NASA Goddard Space Flight Center
Methodology: Chanwit Kaewkasi
TDG Innovator
Platform: Andrei Kvapil
CozyStack Creator
Built with:
- 🤖 Claude (Anthropic) - Infrastructure design & TDG implementation
- 🧰 CozyStack - Kubernetes platform for bare metal
- 🐧 Talos Linux - Immutable Kubernetes OS
- ☁️ AWS - Free tier cloud validation
- 🔄 Flux - GitOps toolkit
- 🌐 SpinKube - WebAssembly on Kubernetes
| Date | Milestone |
|---|---|
| Nov 16 | 🎬 Project kickoff, TDG tests defined |
| Nov 23 | 🏗️ Network foundation + bastion deployed |
| Nov 30 | 🐧 First Talos node netboots successfully |
| Dec 3 | 🎤 Live demo at CozySummit Virtual 2025 |
| Dec 31 | 🎉 Home lab transitions to Raspberry Pi |
Free tier expires: December 2025 (t4g instances)
This is a conference talk demo, but if you want to replicate or improve:
- Follow TDG - Write tests first
- Reference, don't duplicate - Reuse existing repos
- Document your journey - Others can learn from your experience
- Share costs - Transparency helps everyone
Open issues for questions, PRs for improvements!
Apache 2.0 - See LICENSE for details.
- 🎤 CozySummit Virtual 2025
- 📺 YouTube: @yebyen/streams
- 🐦 Follow updates on Twitter (add your handle)
- 💬 Join CozyStack Community (add Discord/Slack)
"It's 2025 - If you're running a cluster, why not host it in the cloud first?"
🏠 → ☁️ → 🌙 → 🥧
From basement to cloud and back to Raspberry Pi