Commit c92f65d

Automatic managed system nodes blog post (#5466)
Added AKS Automatic managed system node pool and pod readiness SLA post
1 parent 7c9b1f9 commit c92f65d

8 files changed

+200
-0
lines changed

website/blog/2025-11-26-aks-automatic-managed-system-node-pools/aks-managed-arch.svg

Lines changed: 1 addition & 0 deletions
Lines changed: 194 additions & 0 deletions
@@ -0,0 +1,194 @@
---
title: "Announcing AKS Automatic managed system node pools (preview) and the Pod readiness SLA"
description: "Learn how AKS Automatic now offers managed system node pools to ship apps faster. The Pod readiness SLA guarantees your apps are serving users, beyond a healthy control plane."
date: 2025-11-26
authors: ["ahmed-sabbour"]
tags:
  - aks-automatic
---

In Azure Kubernetes Service (AKS), nodes with the same configuration (operating system and VM size) are grouped into *node pools*. AKS clusters use two node pool modes: *system node pools* host critical platform components that keep your cluster running, while *user node pools* run your application workloads. Traditionally, you manage both types yourself. You select VM sizes, set node counts, configure autoscaling, and plan capacity for system components. As your cluster grows or workload requirements change, you must revisit these settings to maintain resiliency.

AKS Automatic simplifies this by enabling teams to ship applications with production-grade defaults from day one. With **managed system node pools (preview)**, AKS takes this further. The system pool is now fully managed by Microsoft. Core cluster components run on Microsoft-owned infrastructure, so you no longer provision, patch, or scale system nodes. You focus on your apps while AKS handles the operational overhead of keeping the cluster healthy.

Automatic clusters with managed system node pools also introduce the **Pod readiness Service Level Agreement (SLA)**. Beyond API server uptime, AKS now guarantees your pods reach readiness and serve users.

<!-- truncate -->

:::info

Learn more in the official documentation: [Managed system node pools on AKS Automatic (preview)](https://learn.microsoft.com/azure/aks/automatic/aks-automatic-managed-system-node-pools-about)

:::

## Why it matters

- **Reduced operational overhead:** AKS handles provisioning, patching, upgrades, and scaling for the system pool, so you spend less time on infrastructure maintenance.
- **Managed add-on hosting at lower cost:** Core services like Azure Monitor collectors, CoreDNS, KEDA, VPA, Konnectivity, Eraser, and Metrics Server run on Microsoft-owned infrastructure. Some add-ons and DaemonSets still run on nodes in your subscription.
- **Built-in security policies:** Deployment Safeguards enforce pod security standards, restrict access to platform namespaces, and block risky configurations by default.
- **Automatic upgrades:** AKS keeps platform components current, reducing the risk of running outdated or vulnerable system software.
- **Pod readiness SLA:** A financially backed guarantee that your pods reach readiness and serve traffic, not just that your cluster is healthy. Refer to the [SLA](https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services) for details.

![Architecture diagram showing managed system node pools hosted on Microsoft infrastructure with platform components separated from user workloads](aks-managed-arch.svg)

## Components running on managed system node pools

AKS manages the following platform components on the managed system node pool. You don't need to provision capacity for these services.

| Component | Description |
| --- | --- |
| [Azure Monitor](https://learn.microsoft.com/azure/aks/monitor-aks) | Collects container logs, scrapes Prometheus metrics, and gathers Kubernetes object state for observability and alerting |
| [CoreDNS](https://learn.microsoft.com/azure/aks/coredns-custom) | Provides cluster DNS resolution for service discovery |
| [Eraser](https://learn.microsoft.com/azure/aks/image-cleaner) | Removes unused and vulnerable container images from nodes |
| [KEDA](https://learn.microsoft.com/azure/aks/keda-about) | Scales workloads based on event-driven metrics such as queue length or HTTP traffic |
| Konnectivity | Maintains secure connectivity between the control plane and nodes |
| [Metrics Server](https://learn.microsoft.com/azure/aks/monitor-aks-reference) | Exposes resource metrics for Horizontal Pod Autoscaler and `kubectl top` |
| [VPA](https://learn.microsoft.com/azure/aks/vertical-pod-autoscaler) | Recommends and applies optimal CPU and memory requests for pods |
| [Workload Identity webhook](https://learn.microsoft.com/azure/aks/workload-identity-overview) | Injects Azure environment variables and projected service account tokens into pods for Microsoft Entra ID authentication |

Add-ons and extensions that are not in this table run on `aks-system-surge` nodes, with scaling handled by [Node Auto-Provisioning (NAP)](https://learn.microsoft.com/azure/aks/node-auto-provisioning). DaemonSets run on both managed system node pools and nodes in your subscription.

## How managed system node pools differ from traditional system node pools

| Aspect | AKS Standard system pool | AKS Automatic managed system pool |
| --- | --- | --- |
| **Provisioning** | You create the pool, select VM SKUs, set node count, and configure OS disk size | AKS provisions and sizes the pool for you automatically |
| **Capacity planning** | You [estimate headroom for system components](https://learn.microsoft.com/azure/aks/use-system-pools?tabs=azure-cli#system-and-user-node-pools) like CoreDNS, Konnectivity, Metrics Server, and any add-ons, then scale manually or configure the cluster autoscaler with min/max counts | AKS right-sizes capacity for platform components and scales automatically when add-ons need more room, without taking up quota in your subscription |
| **Cost** | System nodes are billed as standard VMs to your subscription; you pay for system pool capacity | System nodes do not run in your subscription |
| **Service Level Agreements (SLAs)** | API server uptime SLA | API server uptime SLA and Pod readiness SLA |

![Comparison diagram showing AKS Standard requiring manual system pool management versus AKS Automatic with fully managed system pools](aks-standard-automatic.png)

## Guardrails for security and reliability

Security misconfigurations are a leading cause of container breaches. AKS Automatic addresses this by enforcing [Deployment Safeguards](https://learn.microsoft.com/azure/aks/deployment-safeguards) that validate every workload against the [Kubernetes Pod Security Standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/) before it reaches your cluster. Baseline policies block dangerous privilege escalations, while restricted policies enforce maximum hardening. Compliance flows into Azure Policy dashboards automatically.

These policies also improve workload reliability. Resource limits prevent runaway containers from starving neighbors. Health probes ensure traffic reaches only healthy pods. Anti-affinity rules spread replicas across failure domains. PodDisruptionBudget validation keeps node maintenance on schedule.

Since AKS manages the system node pool on your behalf, additional restrictions protect platform stability. User workloads cannot run on the managed system node pool. Because Microsoft hosts the system node pool outside of your subscription, all create, update, and delete operations on managed system pool resources are denied, as are pod `exec`, `attach`, and `kubectl debug` operations.

**Preventing container escapes:** Blocking privileged containers, host namespaces, host ports, and hostPath volumes keeps workloads isolated from the node, in line with security best practices.

**Reducing attack surface:** Restricting Linux capabilities to a minimal set means processes run with only the permissions they need. Fewer capabilities translate directly to fewer exploitation opportunities.

**Enforcing least privilege:** Requiring containers to run as non-root and disabling privilege escalation limits the blast radius of any vulnerability.

**Maintaining kernel protections:** Seccomp, AppArmor, and SELinux profiles filter system calls and confine container behavior. Policies ensure these protections stay active.

**Enabling safe cluster operations:** Limiting `sysctls` to safe parameters and protecting node objects ensures platform components run undisturbed and node drains proceed smoothly.

For detailed specifications, see the [Deployment Safeguards documentation](https://learn.microsoft.com/azure/aks/deployment-safeguards).
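
To make the guardrail categories above concrete, here is a minimal sketch of a pod manifest written along the lines of the restricted Pod Security Standard, with resource limits and health probes added. The helper name `render_hardened_pod`, the pod name, the image, the port, and the resource values are all illustrative assumptions; whether a given manifest passes Deployment Safeguards on your cluster depends on the policies in effect there.

```bash
#!/usr/bin/env bash
# Sketch only: prints a Pod manifest hardened along the lines of the
# "restricted" Pod Security Standard. Names, image, port, and resource
# values are assumptions chosen for illustration.
render_hardened_pod() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: hardened-demo
spec:
  securityContext:
    runAsNonRoot: true            # least privilege: refuse to start as root
    seccompProfile:
      type: RuntimeDefault        # keep the runtime's syscall filter active
  containers:
    - name: web
      image: nginxinc/nginx-unprivileged:stable   # assumed non-root image
      ports:
        - containerPort: 8080
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]           # minimal Linux capabilities
      resources:
        requests: { cpu: 100m, memory: 128Mi }
        limits: { cpu: 250m, memory: 256Mi }
      readinessProbe:
        httpGet: { path: /, port: 8080 }
      livenessProbe:
        httpGet: { path: /, port: 8080 }
EOF
}

# Usage (needs cluster access): validate server-side without creating the pod.
# render_hardened_pod | kubectl apply --dry-run=server -f -
```

The manifest avoids privileged mode, host namespaces, host ports, and hostPath volumes entirely. A server-side dry run is one way to see admission feedback before deploying anything.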

## Pod readiness SLA for AKS Automatic

![Diagram showing two SLA guarantees for AKS Automatic: 99.95% API server uptime and 99.9% pod readiness within 5 minutes](automatic-slas.png)

Uptime means more than a healthy control plane; it means your applications are actually serving users. The [Pod readiness SLA](https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services) guarantees that pods reach readiness targets, closing the gap between "the cluster is healthy" and "my app is ready."

- **Faster recovery during failures:** Node failures and scale events trigger remediation so pods return to a ready state within defined thresholds.
- **Predictable reliability:** Availability planning aligns with measurable guarantees instead of best-effort behavior.
- **Reduced operational overhead:** Platform automation handles remediation, eliminating manual firefighting during disruptions.
- **Business continuity at scale:** Mission-critical services experience minimal disruption even during infrastructure events.

## Pricing

AKS Automatic pricing includes a fixed monthly cluster fee and per-vCPU charges on top of standard VM compute costs. This pricing includes financially backed SLAs for both API server uptime and pod readiness. For current rates and a full breakdown by VM category, see the [Azure Kubernetes Service pricing page](https://azure.microsoft.com/pricing/details/kubernetes-service#pricing).

## Getting started

### Prerequisites

- Azure CLI 2.77.0 or later.
- `aks-preview` extension 19.0.0b15 or later.

```bash
# Install or update the aks-preview extension
az extension add --name aks-preview
az extension update --name aks-preview
```
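
If you want to confirm the prerequisites before continuing, a small check along these lines may help. It is a sketch, not part of the official instructions: `version_ge` and `check_prerequisites` are hypothetical helper names, and the comparison relies on `sort -V`, which assumes GNU coreutils or a compatible `sort`. The extension version is only printed, since pre-release version strings do not compare reliably this way.

```bash
#!/usr/bin/env bash
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Relies on `sort -V` (version sort); assumes GNU coreutils or compatible.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_prerequisites: hypothetical helper. Compares the installed Azure CLI
# version with the minimum named in this post and prints the aks-preview
# extension version for a manual check.
check_prerequisites() {
  local cli_version ext_version
  cli_version="$(az version --query '"azure-cli"' --output tsv)"
  ext_version="$(az extension show --name aks-preview --query version --output tsv)"
  if version_ge "$cli_version" "2.77.0"; then
    echo "Azure CLI $cli_version: OK"
  else
    echo "Azure CLI $cli_version: upgrade to 2.77.0 or later"
    return 1
  fi
  echo "aks-preview $ext_version installed (need 19.0.0b15 or later)"
}

# Usage (requires the Azure CLI): check_prerequisites
```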

### Register the preview feature

```bash
az feature register --name AKS-AutomaticHostedSystemProfilePreview --namespace Microsoft.ContainerService
```
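
Feature registration is typically asynchronous, so it can be useful to wait until the state reads `Registered` and then refresh the resource provider. The sketch below assumes that flow; `wait_for_feature` is a hypothetical helper name and the 30-second polling interval is an arbitrary choice.

```bash
#!/usr/bin/env bash
# wait_for_feature NAME [NAMESPACE]: hypothetical helper. Polls the feature
# state until it reads "Registered", then re-registers the resource provider
# so the change takes effect. The polling interval is an assumption.
wait_for_feature() {
  local name="$1" namespace="${2:-Microsoft.ContainerService}" state
  while true; do
    state="$(az feature show \
      --namespace "$namespace" \
      --name "$name" \
      --query properties.state \
      --output tsv)"
    echo "Feature $name: $state"
    if [ "$state" = "Registered" ]; then
      break
    fi
    sleep "${POLL_SECONDS:-30}"
  done
  az provider register --namespace "$namespace"
}

# Usage (requires the Azure CLI and a signed-in session):
# wait_for_feature AKS-AutomaticHostedSystemProfilePreview
```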

### Create the cluster

Select a region where managed system node pools are available. Check the [supported regions for managed system node pools](https://aka.ms/aks/automatic/managed-systempool-regions).

#### Set your variables

```bash
RESOURCE_GROUP="myResourceGroup"
CLUSTER_NAME="myAKSCluster"
LOCATION="westcentralus" # Choose a supported region (see: https://aka.ms/aks/automatic/managed-systempool-regions)
```

#### Create the resource group

```bash
az group create --name "$RESOURCE_GROUP" --location "$LOCATION"
```

#### Create an Automatic cluster with a managed system node pool

```bash
az aks create \
  --resource-group "$RESOURCE_GROUP" \
  --name "$CLUSTER_NAME" \
  --location "$LOCATION" \
  --sku automatic \
  --enable-hosted-system
```

The output includes `"hostedSystemProfile": { "enabled": true }`, confirming the feature is active.
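
To re-check this later without scrolling through the creation output, you could query the cluster resource directly. This is a sketch under one assumption: that `hostedSystemProfile` sits at the top level of the `az aks show` output, as the snippet above suggests. `hosted_system_enabled` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# hosted_system_enabled RESOURCE_GROUP CLUSTER_NAME: hypothetical helper.
# Prints hostedSystemProfile.enabled for the cluster and succeeds only when
# the value is true. The JSON path is an assumption based on the create output.
hosted_system_enabled() {
  local resource_group="$1" cluster_name="$2" enabled
  enabled="$(az aks show \
    --resource-group "$resource_group" \
    --name "$cluster_name" \
    --query "hostedSystemProfile.enabled" \
    --output tsv)"
  echo "hostedSystemProfile.enabled=$enabled"
  case "$enabled" in
    true|True) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (requires the Azure CLI):
# hosted_system_enabled "$RESOURCE_GROUP" "$CLUSTER_NAME"
```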

### Connect to the cluster and deploy an application

Get credentials for your cluster and deploy the [AKS Store demo application](https://github.com/Azure-Samples/aks-store-demo):

```bash
az aks get-credentials --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME"

kubectl create ns aks-store-demo
kubectl apply -n aks-store-demo -f https://aka.ms/aks/quickstarts/store.yaml
```

Check the ingress address and open it in your browser once an IP is assigned:

```bash
kubectl get ingress store-front -n aks-store-demo --watch
```

![Screenshot of the deployed application on an AKS Automatic cluster](contoso-pet-store.png)

Your workload runs on user node pools in your subscription, which Node Auto-Provisioning (NAP) creates for you, while system services stay on the managed pool.
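
One way to watch this from the command line is to wait for the demo pods to become ready and then list the nodes the cluster reports. The sketch below is illustrative only: `wait_for_ready_pods` is a hypothetical helper name, and the 300-second default timeout is an arbitrary choice that happens to mirror the 5-minute figure in the SLA diagram. It is not a measurement of the SLA itself.

```bash
#!/usr/bin/env bash
# wait_for_ready_pods [NAMESPACE] [TIMEOUT]: hypothetical helper. Blocks until
# every pod in the namespace reports the Ready condition, or fails when the
# timeout expires, then lists the node names the cluster reports.
wait_for_ready_pods() {
  local namespace="${1:-aks-store-demo}" timeout="${2:-300s}"
  kubectl wait pod --all \
    --namespace "$namespace" \
    --for=condition=Ready \
    --timeout="$timeout" || return 1
  kubectl get nodes --output name
}

# Usage (requires kubectl configured for the cluster):
# wait_for_ready_pods aks-store-demo 300s
```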

![Screenshot of AKS desktop application showing the nodes in the cluster](aks-desktop-nodes.png)

:::tip

Prefer a graphical experience? [AKS Desktop](https://learn.microsoft.com/azure/aks/aks-desktop-overview) lets you manage clusters, view workloads, and troubleshoot issues without leaving your desktop.

:::

The managed system nodes do not run in your Azure subscription, so they do not appear among your virtual machines in the Azure portal.

![Screenshot of the Azure portal showing that the managed system nodes are not there](portal-vms.png)

## Looking ahead

Upcoming improvements include custom virtual network support, optimized platform components with reduced resource overhead, faster cluster provisioning, and a streamlined path to Deployment Safeguards compliance. Longer term, managed system node pools will extend to all existing AKS Automatic clusters.

Follow the [AKS public roadmap](https://aka.ms/aks/roadmap) for updates on these features.

## Next steps

Ready to get started?

1. **Try it now:** Follow the [managed system node pools quickstart](https://learn.microsoft.com/azure/aks/automatic/aks-automatic-managed-system-node-pools).
2. **Share feedback:** Open issues or ideas in [AKS GitHub Issues](https://github.com/Azure/AKS/issues).
3. **Join the community:** Subscribe to the [AKS Community YouTube](https://www.youtube.com/@theakscommunity) and follow [@theakscommunity](https://x.com/theakscommunity) on X.

Share your experience with how managed system node pools simplify your operations and where the service can continue to improve.
website/blog/tags.yml

Lines changed: 5 additions & 0 deletions
@@ -27,6 +27,11 @@ airflow:
   permalink: /airflow
   description: Using Apache Airflow for orchestrating data and machine learning workflows on AKS.
 
+aks-automatic:
+  label: AKS Automatic
+  permalink: /aks-automatic
+  description: AKS Automatic for simplified, fully managed Kubernetes cluster operations.
+
 aks-mcp:
   label: AKS-MCP
   permalink: /aks-mcp