23 changes: 22 additions & 1 deletion blog/_data/authors.yml
@@ -193,6 +193,15 @@ Vaibhav Arora:
icon: "fab fa-fw fa-linkedin"
url: "https://www.linkedin.com/in/varora24/"

Wilson Darko:
name : "Wilson Darko"
bio : "Product Manager for Azure Kubernetes Service"
avatar : "https://avatars.githubusercontent.com/u/172273315?v=4"
links:
- label: "LinkedIn"
icon: "fab fa-fw fa-linkedin"
url: "https://www.linkedin.com/in/wilsondarko/"

Mitch Shao:
name : "Mitch Shao"
bio : "Senior Software Engineer for Azure Kubernetes Service."
@@ -353,4 +362,16 @@ Steve Womack:
url: "https://github.com/stwomack"
- label: "LinkedIn"
icon: "fab fa-fw fa-linkedin"
url: "https://www.linkedin.com/in/steve-womack-4725ba2/"
url: "https://www.linkedin.com/in/steve-womack-4725ba2/"

Wilson Darko:
name : "Wilson Darko"
bio : "Product Manager at Microsoft"
avatar : https://avatars.githubusercontent.com/u/172273315
links:
- label: "LinkedIn"
icon: "fab fa-fw fa-linkedin"
url: "https://www.linkedin.com/in/wilsondarko/"
- label: "GitHub"
icon: "fab fa-fw fa-github"
url: "https://github.com/wdarko1"
76 changes: 76 additions & 0 deletions blog/_posts/2025-08-14-node-autoprovisioning.md
@@ -0,0 +1,76 @@
---
title: "Announcement - Node Auto Provisioning"
description: "Learn about Node Auto Provisioning for Azure Kubernetes Service, based on Karpenter, and how it can simplify the scaling experience for your workloads on AKS."
date: 2025-08-14
author: Wilson Darko # must match the authors.yml in the _data folder
categories:
- general
- add-ons
- compute
---

We’re excited to announce the General Availability (GA) of Node Auto Provisioning (NAP) for Azure Kubernetes Service (AKS)!

Review comment (Contributor): Start or lead with the problem statement for why NAP came into being: what was the status before this? Which customers or personas were requesting it, and why?


Managing dynamic workloads in Kubernetes often leads to overprovisioning, idle resources, and operational overhead from maintaining pre-configured node pools. [Karpenter](https://karpenter.sh/), the open-source CNCF project, enables compute-optimized node provisioning that offers flexibility, cost savings, and simplicity.

With node auto provisioning (NAP), our managed add-on for the open-source [Karpenter](https://karpenter.sh/) project on AKS, AKS now automatically provisions single-instance nodes (VMs) in response to pending pod requirements, and optimizes the scaling experience by provisioning right-sized compute in the most efficient and cost-effective manner. NAP brings a new approach to autoscaling and provisioning that outperforms cluster autoscaler in compute efficiency, cost, and ease of use. NAP also offers greater flexibility: many VM SKU sizes, spot and on-demand capacity, and multiple architectures such as AMD64 and ARM64 can coexist in the same cluster without the need for separate node pools.
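
To make the "without separate node pools" point concrete, the following is a minimal sketch of a single Karpenter v1 `NodePool` that admits both amd64 and arm64 VMs on either spot or on-demand capacity. It is an illustration under assumptions, not a recommended configuration: the pool name `mixed-workloads` is hypothetical, it assumes an `AKSNodeClass` named `default` exists in the cluster, and the requirement keys are the upstream Karpenter well-known labels.

```yaml
# Hypothetical NodePool: one pool covering two CPU architectures and
# two capacity types, instead of one fixed node pool per combination.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: mixed-workloads          # hypothetical name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: default            # assumes an AKSNodeClass named "default" exists
      requirements:
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]       # both architectures allowed
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]    # both capacity types allowed
```

When a NodePool allows both capacity types, upstream Karpenter generally prefers spot offerings and falls back to on-demand. Check the [NodePool documentation](https://learn.microsoft.com/azure/aks/node-autoprovision-node-pools) for the values NAP supports.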

![nap-ga-announcement](/assets/images/aks-nap/nap-ga-announcement.jpg)

## What’s New with NAP since Preview Launch

Review comment (Contributor): Perhaps worth a deep dive into one or two key items from this announcement in your blog. For example, the improved upgrade experience and observability could be deep-dive topics?

- More networking options: Azure CNI (with or without Overlay) and the Cilium dataplane are supported, bring-your-own (BYO) CNI is allowed, and custom virtual networks (VNets) are supported. For more on networking, visit our [NAP networking documentation](https://learn.microsoft.com/azure/aks/node-autoprovision-networking).
- A new node bootstrapping method that improves reliability through compatibility with AKS VM image releases and AKS bootstrapping configuration updates.
- Improved upgrade experience, including support for [AKS Maintenance Windows](https://learn.microsoft.com/azure/aks/planned-maintenance) and [Karpenter disruption budgets](https://learn.microsoft.com/azure/aks/node-autoprovision-disruption#disruption-budgets), which let users control the pace of disruption in the cluster.
- New Karpenter core capabilities integrated and supported: the v1 NodePool API and expanded [disruption](https://learn.microsoft.com/en-us/azure/aks/node-autoprovision-disruption) features such as `terminationGracePeriod`, `consolidateAfter`, forceful expiration, node repair, and more. [NodePools](https://learn.microsoft.com/azure/aks/node-autoprovision-node-pools) now have status conditions that indicate whether they are ready. See the Karpenter core releases for details.
- New Azure Karpenter Provider capabilities: support for Azure Linux (including v3), ephemeral disk placement, Linux admin username, custom kubelet configuration, tagging of Azure resources, artifact streaming, non-zonal regions and VM SKUs, zone constraints in NodePool requirements (and a generally cleaner set of selectors), node auto-repair, network interface garbage collection, support for NVMe-only VM SKUs, AKS/Kubernetes 1.30 through 1.33, readiness status in AKSNodeClass, and more. See [Azure Karpenter provider releases](https://github.com/Azure/karpenter-provider-azure/releases) for details.
- Improved observability via a rich set of metrics (accessible through [Managed Prometheus](https://learn.microsoft.com/azure/azure-monitor/metrics/prometheus-metrics-overview)), improved logs accessible through [Azure Monitor](https://learn.microsoft.com/azure/azure-monitor/metrics/data-platform-metrics), NodePool / AKSNodeClass / NodeClaim status conditions (including reasons for drift or provisioning failures), and events.
- Default NAP node pools are now optional, and [disabling NAP is supported](https://learn.microsoft.com/azure/aks/node-autoprovision#disabling-node-autoprovisioning).
- Better performance, reliability, error handling (including handling of provisioning failures), and security, plus numerous bug fixes and resolved issues.
- [Extensive test coverage](https://github.com/Azure/karpenter-provider-azure/tree/main/test): eight new E2E test suites, ~100 total scenarios, and 95% unit test coverage.
- GitHub contributions welcome! A GitHub Codespaces-based dev/test environment can be set up in 5 minutes.
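
Several of the items above (disruption budgets, `consolidateAfter`, `terminationGracePeriod`, and expiration) are configured on the NodePool itself. The sketch below shows where each field lives in the Karpenter v1 API. The pool name `general-purpose` and every value shown are hypothetical examples chosen for illustration, not recommended defaults, and it assumes an `AKSNodeClass` named `default` exists.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose            # hypothetical name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: default              # assumes an AKSNodeClass named "default" exists
      expireAfter: 720h            # replace each node after roughly 30 days
      terminationGracePeriod: 48h  # max time a draining node may block deletion
      requirements:
        - key: karpenter.azure.com/sku-family
          operator: In
          values: ["D"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m           # wait after pod churn before consolidating a node
    budgets:
      - nodes: "10%"               # at most 10% of this pool's nodes disrupted at once
      - nodes: "0"                 # no voluntary disruption during this window
        schedule: "0 9 * * mon-fri"
        duration: 8h
```

Budget schedules use cron syntax and, in upstream Karpenter, are evaluated in UTC. See the [disruption budgets documentation](https://learn.microsoft.com/azure/aks/node-autoprovision-disruption#disruption-budgets) for the full set of options.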

Review comment (Contributor): Could we have a section on how the Preview metrics and adoption have looked so far?

## How node auto provisioning works

Node auto provisioning takes the requirements set by the user in the workload's deployment manifest (for example, resource requests and node selectors) and in custom resources such as [NodePool](https://learn.microsoft.com/azure/aks/node-autoprovision-node-pools) and [AKSNodeClass](https://learn.microsoft.com/azure/aks/node-autoprovision-aksnodeclass), and provisions nodes that meet these criteria.
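
As a rough illustration of where those requirements come from, here is a sketch pairing an `AKSNodeClass` (Azure-specific node settings such as the OS image family and OS disk size) with a workload whose resource requests and node selector NAP would try to satisfy. The names `azurelinux` and `sample-app`, the `nginx:stable` image, and all field values are hypothetical, and the `AKSNodeClass` `apiVersion` is an assumption; confirm it against the [AKSNodeClass documentation](https://learn.microsoft.com/azure/aks/node-autoprovision-aksnodeclass) for your NAP version.

```yaml
# Hypothetical AKSNodeClass; a NodePool selects it through
# spec.template.spec.nodeClassRef.name: azurelinux
apiVersion: karpenter.azure.com/v1beta1   # assumed apiVersion
kind: AKSNodeClass
metadata:
  name: azurelinux
spec:
  imageFamily: AzureLinux     # node OS image family
  osDiskSizeGB: 128           # OS disk size for provisioned VMs
---
# Hypothetical workload: its requests and nodeSelector are the
# pod-level requirements NAP reads when replicas are Pending.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      nodeSelector:
        kubernetes.io/arch: arm64   # needs a NodePool that allows arm64
      containers:
        - name: app
          image: nginx:stable       # multi-arch public image
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
```

If the replicas cannot be scheduled on existing nodes, they stay Pending, and those pending pods are what NAP acts on. The `arm64` selector only succeeds if at least one NodePool's requirements allow `arm64`.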

![nap-how-it-works](/assets/images/aks-nap/nap-how-it-works-image.png)

NAP works at the infrastructure level and adjusts the quantity and sizes of VMs. NAP can be paired with application-level scalers, which affect CPU/memory resource allocation and pod replica count, such as [KEDA](https://learn.microsoft.com/azure/aks/keda-about), [Horizontal Pod Autoscaler](https://learn.microsoft.com/azure/aks/concepts-scale#horizontal-pod-autoscaler), or [Vertical Pod Autoscaler](https://learn.microsoft.com/en-us/azure/aks/vertical-pod-autoscaler). For more on this, check out our [AKS Scalers Deep Dive on YouTube](https://www.youtube.com/watch?v=oILHg5hsZQ0).
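
For example, a CPU-based Horizontal Pod Autoscaler, sketched below for a hypothetical Deployment named `sample-app`, adds replicas under load. Replicas that do not fit on existing nodes stay Pending, which is the signal NAP responds to by provisioning VMs; when HPA later scales in, NAP's consolidation (depending on the NodePool's consolidation policy) can remove nodes that have become empty or underutilized. The replica bounds and utilization target are illustrative only.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sample-app               # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app             # hypothetical Deployment
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # add replicas above ~70% of requested CPU
```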

## Node auto provisioning vs. cluster autoscaler

Review comment (Contributor, @kevinkrp93, Aug 19, 2025): Maybe a couple of examples (i.e. scenario-based, since it is a blog post)?

Review comment (Contributor): As a customer, when should I use CAS and when should I use NAP? Can you give me some real-world examples with suggestions?


Cluster autoscaler, the standard Kubernetes autoscaler, scales pre-existing node pools up or down, and each node pool uses a single VM size. NAP instead provisions and manages individual VMs directly, choosing among multiple VM sizes and architectures at once rather than adding instances to a fixed-size node pool. This allows for better bin-packing, cost savings, and performance than cluster autoscaler.
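
To illustrate the difference: a cluster autoscaler setup that needs several VM sizes uses one node pool per size, whereas with NAP a single NodePool can describe a range of acceptable SKUs and let NAP pick a VM size that fits the pending pods. The sketch below is hypothetical: the pool name `flexible-sizes` and the values are illustrative, it assumes an `AKSNodeClass` named `default` exists, and the `karpenter.azure.com/sku-family` and `karpenter.azure.com/sku-cpu` label keys are taken from the Azure provider's well-known labels as we understand them, so confirm them in the [NodePool documentation](https://learn.microsoft.com/azure/aks/node-autoprovision-node-pools).

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: flexible-sizes           # hypothetical name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: default            # assumes an AKSNodeClass named "default" exists
      requirements:
        - key: karpenter.azure.com/sku-family
          operator: In
          values: ["D", "E", "F"]   # several VM families in one pool
        - key: karpenter.azure.com/sku-cpu
          operator: Lt
          values: ["33"]            # VM sizes with 32 vCPUs or fewer
  limits:
    cpu: 1000                    # cap on total vCPUs this pool may provision
```
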

Review comment (Contributor, @kevinkrp93, Aug 19, 2025): CAS also works at the cluster level, but we can enable or disable multiple node pools at a time. Can we rephrase it? (It might confuse some customers.)


## NAP vs Self-Hosted Karpenter

Review comment (Contributor): Again, could you give a more specific opinion on why you should go with Karpenter, and when or where NAP is better?

Review comment (Contributor): What are the challenges with the OSS version?

Karpenter is the open-source project that provisions right-sized nodes for pending workloads to make efficient use of compute. Our [Azure Karpenter Provider (self-hosted)](https://github.com/Azure/karpenter-provider-azure) makes Karpenter available on Azure. Node auto provisioning (NAP) is our managed add-on for Karpenter on AKS, which manages certain aspects of the Karpenter experience on Azure for you. NAP is the recommended mode for most users for a few reasons.

Review comment: "on Azure available" sounds like there was more to be said here.


NAP manages:

Review comment (Contributor): Provide links for all of these items; always assume a reader may not be familiar with some of them.

- Node image upgrades (Linux)
- Kubernetes version upgrades

Review comment (Contributor): Can you give some description of how each of these is managed, and potentially a deep dive into the more salient or frequently asked ones?

- Karpenter version updates
- VM OS disk updates
- Karpenter logs (through Azure Monitor)
- Metrics (through Managed Prometheus)

In self-hosted Karpenter, users are responsible for managing these processes. Self-hosted mode is useful for advanced users who want to customize or experiment with Karpenter's deployment. The managed add-on NAP simplifies this experience and allows you to focus on your workloads rather than infrastructure.

## Getting Started with NAP

Review comment (Contributor): This should be at the top alongside the prerequisites. It comes way too late.


To get started with NAP, visit the [Node Auto Provisioning documentation](https://learn.microsoft.com/azure/aks/node-autoprovision). It includes resources, requirements, and everything you need to enable node auto provisioning in your cluster today.

## Roadmap + Next Steps

Review comment (Contributor): Like Sean suggested, blogs are a funnel into how the AKS PG is thinking. You can tell the audience why we are prioritizing one thing over another; be more descriptive and provide a behind-the-scenes view. Keep power users in mind, as they are the audience here.


We’re continuing to expand NAP's capabilities with additional feature support and performance improvements. Our upcoming roadmap of feature support includes:

Review comment (Contributor): Provide a little more color: what are we doing first, tentative milestones, and GH roadmap links for these?

- Sovereign Cloud + Air-Gapped Cloud Support
- FIPS compliant node image support
- Disk Encryption Sets
- Custom CA Certificates
- Windows support
- Private Cluster Support
- and more...

Review comment (Contributor): Can we briefly say how we are differentiated from the competition here?

Review comment (Member): Link to the AKS / Karpenter roadmap too?


## Get Involved

Contribute to the open-source [Azure Karpenter Provider](https://github.com/Azure/karpenter-provider-azure), on which Node Auto Provisioning is based. The provider features over 37 releases, 1000+ PRs, 14,000 CI runs, and a growing community of contributors.

Review comment (Contributor): Provide a link to our AKS me series here and your YT video too. This is a good place for an external segue.
