|
| 1 | +--- |
| 2 | +title: "Announcing AKS Automatic managed system node pools (preview) and the Pod readiness SLA" |
| 3 | +description: "Learn how AKS Automatic now offers managed system node pools to ship apps faster. The Pod readiness SLA guarantees your apps are serving users, beyond a healthy control plane." |
| 4 | +date: 2025-11-26 |
| 5 | +authors: ["ahmed-sabbour"] |
| 6 | +tags: |
| 7 | + - aks-automatic |
| 8 | +--- |
| 9 | + |
| 10 | +In Azure Kubernetes Service (AKS), nodes with the same configuration (operating system and VM size) are grouped into *node pools*. AKS clusters use two node pool modes: *system node pools* host critical platform components that keep your cluster running, while *user node pools* run your application workloads. Traditionally, you manage both types yourself. You select VM sizes, set node counts, configure autoscaling, and plan capacity for system components. As your cluster grows or workload requirements change, you must revisit these settings to maintain resiliency. |
| 11 | + |
| 12 | +AKS Automatic simplifies this by enabling teams to ship applications with production-grade defaults from day one. With **managed system node pools (preview)**, AKS takes this further. The system pool is now fully managed by Microsoft. Core cluster components run on Microsoft-owned infrastructure, so you no longer provision, patch, or scale system nodes. You focus on your apps while AKS handles the operational overhead of keeping the cluster healthy. |
| 13 | + |
| 14 | +Automatic clusters with managed system node pools also introduce the **Pod readiness Service Level Agreement (SLA)**. Beyond API server uptime, AKS now guarantees your pods reach readiness and serve users. |
| 15 | + |
| 16 | +<!-- truncate --> |
| 17 | + |
| 18 | +:::info |
| 19 | + |
| 20 | +Learn more in the official documentation: [Managed system node pools on AKS Automatic (preview)](https://learn.microsoft.com/azure/aks/automatic/aks-automatic-managed-system-node-pools-about) |
| 21 | + |
| 22 | +::: |
| 23 | + |
| 24 | +## Why it matters |
| 25 | + |
| 26 | +- **Reduced operational overhead:** AKS handles provisioning, patching, upgrades, and scaling for the system pool, so you spend less time on infrastructure maintenance. |
| 27 | +- **Managed add-on hosting at lower cost:** Core services like Azure Monitor collectors, CoreDNS, KEDA, VPA, Konnectivity, Eraser, and Metrics Server run on Microsoft-owned infrastructure. Some add-ons and DaemonSets still run on nodes in your subscription. |
| 28 | +- **Built-in security policies:** Deployment Safeguards enforce pod security standards, restrict access to platform namespaces, and block risky configurations by default. |
| 29 | +- **Automatic upgrades:** AKS keeps platform components current, reducing the risk of running outdated or vulnerable system software. |
| 30 | +- **Pod readiness SLA:** A financially backed guarantee that your pods reach readiness and serve traffic, not just that your cluster is healthy. Refer to the [SLA](https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services) for details. |
| 31 | + |
| 32 | + |
| 33 | + |
| 34 | +## Components running on managed system node pools |
| 35 | + |
| 36 | +AKS manages the following platform components on the managed system node pool. You don't need to provision capacity for these services. |
| 37 | + |
| 38 | +| Component | Description | |
| 39 | +| --- | --- | |
| 40 | +| [Azure Monitor](https://learn.microsoft.com/azure/aks/monitor-aks) | Collects container logs, scrapes Prometheus metrics, and gathers Kubernetes object state for observability and alerting | |
| 41 | +| [CoreDNS](https://learn.microsoft.com/azure/aks/coredns-custom) | Provides cluster DNS resolution for service discovery | |
| 42 | +| [Eraser](https://learn.microsoft.com/azure/aks/image-cleaner) | Removes unused and vulnerable container images from nodes | |
| 43 | +| [KEDA](https://learn.microsoft.com/azure/aks/keda-about) | Scales workloads based on event-driven metrics such as queue length or HTTP traffic | |
| 44 | +| Konnectivity | Maintains secure connectivity between the control plane and nodes | |
| 45 | +| [Metrics Server](https://learn.microsoft.com/azure/aks/monitor-aks-reference) | Exposes resource metrics for Horizontal Pod Autoscaler and `kubectl top` |
| 46 | +| [VPA](https://learn.microsoft.com/azure/aks/vertical-pod-autoscaler) | Recommends and applies optimal CPU and memory requests for pods | |
| 47 | +| [Workload Identity webhook](https://learn.microsoft.com/azure/aks/workload-identity-overview) | Injects Azure environment variables and projected service account tokens into pods for Microsoft Entra ID authentication | |
| 48 | + |
| 49 | +Add-ons and extensions that are not in this list run on `aks-system-surge` nodes, with scaling handled by [Node Auto-Provisioning (NAP)](https://learn.microsoft.com/azure/aks/node-auto-provisioning). `DaemonSets` run on both managed system node pools and nodes in your subscription. |
| 50 | + |
| 51 | +## How managed system node pools differ from traditional system node pools |
| 52 | + |
| 53 | +| Aspect | AKS Standard system pool | AKS Automatic managed system pool | |
| 54 | +| --- | --- | --- | |
| 55 | +| **Provisioning** | You create the pool, select VM SKUs, set node count, and configure OS disk size | AKS provisions and sizes the pool for you automatically | |
| 56 | +| **Capacity planning** | You [estimate headroom for system components](https://learn.microsoft.com/azure/aks/use-system-pools?tabs=azure-cli#system-and-user-node-pools) like CoreDNS, Konnectivity, metrics-server, and any add-ons, then scale manually or configure the cluster autoscaler with min/max counts | AKS right-sizes capacity for platform components and scales automatically when add-ons need more room, without consuming quota in your subscription |
| 57 | +| **Cost** | System nodes are billed to your subscription as standard VMs, so you pay for system pool capacity | System nodes do not run in your subscription |
| 58 | +| **Service Level Agreements (SLAs)** | API server uptime SLA | API server uptime SLA and pod readiness SLA | |
| 59 | + |
| 60 | + |
| 61 | + |
| 62 | +## Guardrails for security and reliability |
| 63 | + |
| 64 | +Security misconfigurations are a leading cause of container breaches. AKS Automatic addresses this by enforcing [Deployment Safeguards](https://learn.microsoft.com/azure/aks/deployment-safeguards) that validate every workload against the [Kubernetes Pod Security Standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/) before it reaches your cluster. Baseline policies block dangerous privilege escalations while restricted policies enforce maximum hardening. Compliance flows into Azure Policy dashboards automatically. |
| 65 | + |
| 66 | +These policies also improve workload reliability. Resource limits prevent runaway containers from starving neighbors. Health probes ensure traffic reaches only healthy pods. Anti-affinity rules spread replicas across failure domains. PodDisruptionBudget validation keeps node maintenance on schedule. |
| 67 | + |
| 68 | +Since AKS manages the system node pool on your behalf, additional restrictions protect platform stability. User workloads cannot run on the managed system node pool. Because Microsoft hosts the pool outside of your subscription, all create, update, and delete operations on managed system pool resources are denied, as are pod `exec`, `attach`, and `kubectl debug` operations. |
| 69 | + |
| 70 | +**Preventing container escapes:** Blocking privileged containers, host namespaces, host ports, and hostPath volumes keeps workloads isolated from the underlying node, in line with security best practices. |
| 71 | + |
| 72 | +**Reducing attack surface:** Restricting Linux capabilities to a minimal set means processes run with only the permissions they need. Fewer capabilities translate directly to fewer exploitation opportunities. |
| 73 | + |
| 74 | +**Enforcing least privilege:** Requiring containers to run as non-root and disabling privilege escalation limits the blast radius of any vulnerability. |
| 75 | + |
| 76 | +**Maintaining kernel protections:** Seccomp, AppArmor, and SELinux profiles filter system calls and confine container behavior. Policies ensure these protections stay active. |
| 77 | + |
| 78 | +**Enabling safe cluster operations:** Limiting `sysctls` to safe parameters and protecting node objects ensures platform components run undisturbed and node drains proceed smoothly. |
| 79 | + |
| 80 | +For detailed specifications, see the [Deployment Safeguards documentation](https://learn.microsoft.com/azure/aks/deployment-safeguards). |
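To make the guardrails above concrete, here is a minimal sketch of a Deployment manifest that follows the restricted Pod Security Standard and sets resource limits and health probes. It is an illustration under stated assumptions, not the exact set of rules Deployment Safeguards checks: the file name, the `web` name, the `example.azurecr.io/web:1.0` image, port `8080`, and the `/healthz` path are all hypothetical placeholders, and anti-affinity and PodDisruptionBudget settings are left out for brevity.

```bash
# Write a sample Deployment whose pod spec follows the restricted Pod Security
# Standard and sets resource limits and health probes.
# The image, names, port, and probe path are hypothetical placeholders.
MANIFEST="${MANIFEST:-hardened-deployment.yaml}"
cat > "$MANIFEST" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      securityContext:
        runAsNonRoot: true            # least privilege: no root user
        seccompProfile:
          type: RuntimeDefault        # keep syscall filtering active
      containers:
        - name: web
          image: example.azurecr.io/web:1.0
          ports:
            - containerPort: 8080
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]           # minimal Linux capabilities
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
EOF
echo "Wrote $MANIFEST"
```

Once you are connected to a cluster, a server-side dry run such as `kubectl apply --dry-run=server -f hardened-deployment.yaml` is one way to see whether admission policies would accept the manifest without creating anything.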
| 81 | + |
| 82 | +## Pod readiness SLA for AKS Automatic |
| 83 | + |
| 84 | + |
| 85 | + |
| 86 | +Uptime means more than a healthy control plane; it means your applications are actually serving users. The [Pod readiness SLA](https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services) guarantees that pods reach readiness targets, closing the gap between "the cluster is healthy" and "my app is ready." |
| 87 | + |
| 88 | +- **Faster recovery during failures:** Node failures and scale events trigger remediation so pods return to a ready state within defined thresholds. |
| 89 | +- **Predictable reliability:** Availability planning aligns with measurable guarantees instead of best-effort behavior. |
| 90 | +- **Reduced operational overhead:** Platform automation handles remediation, eliminating manual firefighting during disruptions. |
| 91 | +- **Business continuity at scale:** Mission-critical services experience minimal disruption even during infrastructure events. |
| 92 | + |
| 93 | +## Pricing |
| 94 | + |
| 95 | +AKS Automatic pricing includes a fixed monthly cluster fee and per-vCPU charges on top of standard VM compute costs. This pricing includes financially backed SLAs for both API server uptime and pod readiness. For current rates and a full breakdown by VM category, see the [Azure Kubernetes Service pricing page](https://azure.microsoft.com/pricing/details/kubernetes-service#pricing). |
| 96 | + |
| 97 | +## Getting started |
| 98 | + |
| 99 | +### Prerequisites |
| 100 | + |
| 101 | +- Azure CLI 2.77.0 or later. |
| 102 | +- `aks-preview` extension 19.0.0b15 or later. |
| 103 | + |
| 104 | +```bash |
| 105 | +# Install or update the aks-preview extension |
| 106 | +az extension add --name aks-preview |
| 107 | +az extension update --name aks-preview |
| 108 | +``` |
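Before continuing, it can help to confirm that the installed Azure CLI meets the 2.77.0 minimum. The sketch below is one way to do that; `version_ge` and `check_cli_version` are hypothetical helper names, and the comparison assumes GNU `sort -V` and plain `x.y.z` version strings (so it is not used for the pre-release `aks-preview` version).

```bash
# Return success when version $1 is greater than or equal to version $2.
# Relies on GNU `sort -V` (version sort); plain x.y.z versions only.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compare the installed Azure CLI version with the 2.77.0 minimum.
check_cli_version() {
  local installed
  installed="$(az version --query '"azure-cli"' --output tsv)"
  if version_ge "$installed" "2.77.0"; then
    echo "Azure CLI $installed meets the minimum"
  else
    echo "Azure CLI $installed is older than 2.77.0; run 'az upgrade'" >&2
    return 1
  fi
}
```

Call `check_cli_version` after defining both functions. To see the installed extension version, `az extension show --name aks-preview --query version --output tsv` prints it directly.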
| 109 | + |
| 110 | +### Register the preview feature |
| 111 | + |
| 112 | +```bash |
| 113 | +az feature register --name AKS-AutomaticHostedSystemProfilePreview --namespace Microsoft.ContainerService |
| 114 | +``` |
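Feature registration is asynchronous and can take several minutes. As a minimal sketch, the helper below polls the feature state and then refreshes the resource provider once the state is `Registered`. The function name `wait_for_feature` and the 30-second default interval are assumptions chosen for illustration.

```bash
# Poll the feature flag until it reports "Registered", then refresh the
# resource provider so the registration takes effect.
wait_for_feature() {
  local name="$1" namespace="$2" interval="${3:-30}" state
  while true; do
    state="$(az feature show --name "$name" --namespace "$namespace" \
      --query properties.state --output tsv)"
    [ "$state" = "Registered" ] && break
    echo "Feature state: ${state:-unknown}; checking again in ${interval}s"
    sleep "$interval"
  done
  az provider register --namespace "$namespace"
}
```

For this preview, the call would be `wait_for_feature AKS-AutomaticHostedSystemProfilePreview Microsoft.ContainerService`.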
| 115 | + |
| 116 | +### Create the cluster |
| 117 | + |
| 118 | +Select a region where managed system node pools are available. Check the [supported regions for managed system node pools](https://aka.ms/aks/automatic/managed-systempool-regions). |
| 119 | + |
| 120 | +#### Set your variables |
| 121 | + |
| 122 | +```bash |
| 123 | +RESOURCE_GROUP="myResourceGroup" |
| 124 | +CLUSTER_NAME="myAKSCluster" |
| 125 | +LOCATION="westcentralus" # Choose a supported region (see: https://aka.ms/aks/automatic/managed-systempool-regions) |
| 126 | +``` |
| 127 | + |
| 128 | +#### Create the resource group |
| 129 | + |
| 130 | +```bash |
| 131 | +az group create --name "$RESOURCE_GROUP" --location "$LOCATION" |
| 132 | +``` |
| 133 | + |
| 134 | +#### Create an Automatic cluster with a managed system node pool |
| 135 | + |
| 136 | +```bash |
| 137 | +az aks create \ |
| 138 | +  --resource-group "$RESOURCE_GROUP" \ |
| 139 | +  --name "$CLUSTER_NAME" \ |
| 140 | +  --location "$LOCATION" \ |
| 141 | +  --sku automatic \ |
| 142 | +  --enable-hosted-system |
| 143 | +``` |
| 144 | + |
| 145 | +The output includes `"hostedSystemProfile": { "enabled": true }` confirming the feature is active. |
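If the creation output has scrolled away, you can check the same property later. This is a sketch, assuming only that `hostedSystemProfile` appears somewhere in the cluster's JSON description as shown above. It filters with `grep` rather than a `--query` path, because the property's exact location in the JSON is not stated here. The function name is hypothetical.

```bash
# Print the hostedSystemProfile fragment of the cluster's JSON description.
# Arguments: resource group name, cluster name.
show_hosted_system_profile() {
  az aks show --resource-group "$1" --name "$2" --output json \
    | grep -A 2 '"hostedSystemProfile"'
}
```

Invoke it as `show_hosted_system_profile "$RESOURCE_GROUP" "$CLUSTER_NAME"` and look for `"enabled": true` in the result.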
| 146 | + |
| 147 | +### Connect to the cluster and deploy an application |
| 148 | + |
| 149 | +Get credentials for your cluster and deploy the [AKS Store demo application](https://github.com/Azure-Samples/aks-store-demo): |
| 150 | + |
| 151 | +```bash |
| 152 | +az aks get-credentials --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" |
| 153 | + |
| 154 | +kubectl create ns aks-store-demo |
| 155 | +kubectl apply -n aks-store-demo -f https://aka.ms/aks/quickstarts/store.yaml |
| 156 | +``` |
| 157 | + |
| 158 | +Check the ingress address and open it in your browser once an IP is assigned: |
| 159 | + |
| 160 | +```bash |
| 161 | +kubectl get ingress store-front -n aks-store-demo --watch |
| 162 | +``` |
| 163 | + |
| 164 | + |
| 165 | + |
| 166 | +Your workload runs on user node pools in your subscription, which Node Auto-Provisioning (NAP) creates, while system services stay on the managed system node pool. |
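To confirm the demo pods actually reach readiness and to see which nodes they were scheduled on, a small helper like the following may be useful. It is a sketch: the function name `wait_for_store_pods` and the 300-second default timeout are assumptions, and new nodes can take a few minutes to come up before pods become Ready.

```bash
# Block until every pod in the namespace is Ready, then show where pods landed.
# Arguments: namespace (default aks-store-demo), timeout (default 300s).
wait_for_store_pods() {
  local namespace="${1:-aks-store-demo}" timeout="${2:-300s}"
  kubectl wait --for=condition=Ready pods --all \
    --namespace "$namespace" --timeout="$timeout"
  # The NODE column shows which node each pod was scheduled on.
  kubectl get pods --namespace "$namespace" --output wide
}
```

Run `wait_for_store_pods` with no arguments to use the defaults for the store demo.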
| 167 | + |
| 168 | + |
| 169 | + |
| 170 | +:::tip |
| 171 | + |
| 172 | +Prefer a graphical experience? [AKS Desktop](https://learn.microsoft.com/azure/aks/aks-desktop-overview) lets you manage clusters, view workloads, and troubleshoot issues without leaving your desktop. |
| 173 | + |
| 174 | +::: |
| 175 | + |
| 176 | +The managed system nodes do not run in your Azure subscription. |
| 177 | + |
| 178 | + |
| 179 | + |
| 180 | +## Looking ahead |
| 181 | + |
| 182 | +Upcoming improvements include custom virtual network support, optimized platform components with reduced resource overhead, faster cluster provisioning, and a streamlined path to Deployment Safeguards compliance. Longer term, managed system node pools will extend to all existing AKS Automatic clusters. |
| 183 | + |
| 184 | +Follow the [AKS public roadmap](https://aka.ms/aks/roadmap) for updates on these features. |
| 185 | + |
| 186 | +## Next steps |
| 187 | + |
| 188 | +Ready to get started? |
| 189 | + |
| 190 | +1. **Try it now:** Follow the [managed system node pools quickstart](https://learn.microsoft.com/azure/aks/automatic/aks-automatic-managed-system-node-pools). |
| 191 | +2. **Share feedback:** Open issues or ideas in [AKS GitHub Issues](https://github.com/Azure/AKS/issues). |
| 192 | +3. **Join the community:** Subscribe to the [AKS Community YouTube](https://www.youtube.com/@theakscommunity) and follow [@theakscommunity](https://x.com/theakscommunity) on X. |
| 193 | + |
| 194 | +Share your experience with how managed system node pools simplify your operations and where the service can continue to improve. |