Conversation

@wdarko1 wdarko1 commented Aug 15, 2025

Blog post announcing NAP GA

@wdarko1 wdarko1 requested review from a team and palma21 as code owners August 15, 2025 00:59
@wdarko1 wdarko1 requested a review from thomas1206 August 15, 2025 00:59

@sabbour sabbour left a comment


Reviewed a portion


## Node auto provisioning vs. cluster autoscaler

Cluster autoscaler, the standard Kubernetes autoscaler solution, requires node pools where all VMs are the same size, and scales pre-existing node pools up or down. NAP instead works at the cluster level, managing single-instance VMs and handling provisioning for multiple VM sizes and architectures at once. NAP allows for better bin-packing, cost savings, and performance than cluster autoscaler.
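
(Suggestion for the post: a short sketch could make "multiple VM sizes and architectures at once" concrete. The field names below follow the upstream Karpenter v1 `NodePool` API; the `karpenter.azure.com` keys and the `AKSNodeClass` reference are assumptions based on the Azure provider, so verify against the docs before publishing.)

```yaml
# Sketch only: one NodePool spanning multiple SKU families and CPU architectures.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.azure.com   # assumed group/kind for the Azure provider
        kind: AKSNodeClass
        name: default
      requirements:
        - key: karpenter.azure.com/sku-family
          operator: In
          values: ["D", "E"]          # NAP may pick from D- and E-family SKUs...
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]  # ...across both architectures
```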

@kevinkrp93 kevinkrp93 Aug 19, 2025


CAS also works at the cluster level, but we can disable or enable multiple node pools at a time. Can we rephrase it? (Because it might confuse some customers.)


@kevinkrp93 kevinkrp93 left a comment


Rephrasing required


NAP works at the infrastructure level, adjusting the quantity and sizes of VMs. NAP can be paired with application-level scalers, which affect CPU/memory resource allocation and pod replica count, such as [KEDA](https://learn.microsoft.com/azure/aks/keda-about), [Horizontal Pod Autoscaler](https://learn.microsoft.com/azure/aks/concepts-scale#horizontal-pod-autoscaler), or [Vertical Pod Autoscaler](https://learn.microsoft.com/en-us/azure/aks/vertical-pod-autoscaler). For more on this, check out our [AKS Scalers Deep Dive on YouTube](https://www.youtube.com/watch?v=oILHg5hsZQ0).
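
(For the scaler pairing above, a minimal example could help readers: the HPA below scales pod replicas on CPU utilization, and NAP then right-sizes the VM fleet to fit the resulting pods. All names here are illustrative, not from the post.)

```yaml
# Illustrative HPA: scales the "web" Deployment's replicas on CPU;
# NAP independently provisions/removes VMs to fit the scheduled pods.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```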

## Node auto provisioning vs. cluster autoscaler

@kevinkrp93 kevinkrp93 Aug 19, 2025


Maybe a couple of examples (i.e., scenario-based, since it is a blog post)?


@kevinkrp93 kevinkrp93 left a comment


Left comments


@kaarthis kaarthis left a comment


PTAL. Let's add more color and behind-the-scenes detail for the power user, even for an introductory blog.

![nap-ga-announcement](/assets/images/aks-nap/nap-ga-announcement.jpg)

## What’s New with NAP since Preview Launch

Contributor


Perhaps worth deep diving into a section on one or two key bits from the new announcement. For example: improved upgrade experience and observability could be deep dive topics?


NAP works at the infrastructure level, adjusting the quantity and sizes of VMs. NAP can be paired with application-level scalers, which affect CPU/memory resource allocation and pod replica count, such as [KEDA](https://learn.microsoft.com/azure/aks/keda-about), [Horizontal Pod Autoscaler](https://learn.microsoft.com/azure/aks/concepts-scale#horizontal-pod-autoscaler), or [Vertical Pod Autoscaler](https://learn.microsoft.com/en-us/azure/aks/vertical-pod-autoscaler). For more on this, check out our [AKS Scalers Deep Dive on YouTube](https://www.youtube.com/watch?v=oILHg5hsZQ0).

## Node auto provisioning vs. cluster autoscaler
Contributor


As a customer, when should I be using CAS and when should I be using NAP? Can you give me some real-world examples with suggestions?

Cluster autoscaler, the standard Kubernetes autoscaler solution, requires node pools where all VMs are the same size, and scales pre-existing node pools up or down. NAP instead works at the cluster level, managing single-instance VMs and handling provisioning for multiple VM sizes and architectures at once. NAP allows for better bin-packing, cost savings, and performance than cluster autoscaler.
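
(If a scenario-based example is wanted here, the bin-packing claim can be illustrated with a toy model. This is purely illustrative first-fit packing, not the actual NAP/Karpenter algorithm: with a single fixed 8-core VM size, a few small pods can strand a nearly empty VM, while mixing in a 2-core size avoids the waste.)

```python
# Toy first-fit-decreasing packing of pod CPU requests onto VMs.
# Illustration only; real NAP/Karpenter scheduling is far more sophisticated.

def pack(pods, vm_size):
    """Pack pod CPU requests into VMs of one fixed size (CAS-style node pool)."""
    vms = []  # remaining free capacity per VM
    for pod in sorted(pods, reverse=True):
        for i, free in enumerate(vms):
            if free >= pod:
                vms[i] = free - pod
                break
        else:
            vms.append(vm_size - pod)  # start a new VM
    return vms

def pack_mixed(pods, vm_sizes):
    """Pack into a mix of VM sizes, picking the smallest size that fits."""
    vms = []  # (remaining free capacity, VM size) pairs
    for pod in sorted(pods, reverse=True):
        for i, (free, size) in enumerate(vms):
            if free >= pod:
                vms[i] = (free - pod, size)
                break
        else:
            size = min(s for s in vm_sizes if s >= pod)
            vms.append((size - pod, size))
    return vms

pods = [7, 1, 1, 1]                # CPU cores requested per pod
single = pack(pods, 8)             # one fixed 8-core VM size
mixed = pack_mixed(pods, [2, 8])   # mix of 2- and 8-core VM sizes
wasted_single = sum(single)
wasted_mixed = sum(free for free, _ in mixed)
print(wasted_single, wasted_mixed)  # prints: 6 0
```

The fixed-size pool burns a second 8-core VM for three 1-core pods; the mixed pool fits them onto one small VM with zero idle cores.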

## NAP vs Self-Hosted Karpenter

Contributor


Again, could you be more specific and give us an opinion to have: when/why should you go with self-hosted Karpenter, and when/where is NAP better?

Contributor


What are the challenges with the OSS version?

Karpenter is the OSS project that schedules workloads for efficient compute usage. Our [AKS Karpenter Provider (self-hosted)](https://github.com/Azure/karpenter-provider-azure) makes Karpenter available on Azure. Node Auto Provisioning (NAP) is our managed add-on for Karpenter on AKS, handling certain aspects of the Karpenter experience on Azure. NAP is the recommended mode for most users, for a few reasons.

NAP manages:

Contributor


Provide links to all these sections; always assume a reader may not be familiar with some of these things.

NAP manages:

- Node Image upgrades (Linux)
- Kubernetes version upgrades
Contributor


Can you give some description of how each of these is managed, and potentially deep dive into the more salient / frequently asked ones?


To get started with NAP, you can visit the [Node Auto Provisioning documentation](https://learn.microsoft.com/azure/aks/node-autoprovision). The documentation includes resources, requirements, and all the info needed to enable node auto provisioning in your cluster today.
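
(Since the post points here for getting started, it might be worth inlining the one-liner. The flag below is an assumption taken from the Node Auto Provisioning docs; readers should verify it against the current docs before use.)

```shell
# Sketch: enable node auto provisioning on an existing AKS cluster.
# Flag name assumed from the NAP documentation; verify before copying.
az aks update \
  --resource-group my-rg \
  --name my-cluster \
  --node-provisioning-mode Auto
```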

## Roadmap + Next Steps
Contributor


Like Sean suggested, blogs are a funnel into how the AKS PG is thinking. You can tell the audience why we are prioritizing one thing over another; be more descriptive and provide a behind-the-scenes view. Think of the power user, as they are the audience here.

## Roadmap + Next Steps

We’re continuing to expand NAP’s capabilities with additional feature support and performance improvements. Our upcoming roadmap of feature support includes:

Contributor


Provide a little bit more color: what are we doing first, tentative milestones, GH roadmap links to these?


## Get Involved

Contribute to the open-source [Azure Karpenter Provider](https://github.com/Azure/karpenter-provider-azure), which Node Auto Provisioning is based on. The provider features over 37 releases, 1000+ PRs, 14,000 CI runs, and a growing community of contributors.
Contributor


Provide a link to our AKS me series here, and your YT video too. This is a good place for an external segue.

- general, add-ons, compute
---

We’re excited to announce the General Availability (GA) of Node Auto Provisioning (NAP) for Azure Kubernetes Service (AKS)!
Contributor


Start / lead with the problem statement for why NAP came into being. What was the status before this? Who were the customers / personas requesting this, and why?

- Better performance, reliability, error handling (including handling of provisioning failures), and security. Numerous bugfixes and issues resolved.
- [Extensive test coverage](https://github.com/Azure/karpenter-provider-azure/tree/main/test). Eight new E2E test suites, ~100 total scenarios. 95% unit test coverage.
- GitHub contributions welcome! (GitHub Codespaces-based dev/test environment in 5 min)

Contributor


Could we have a section on how preview metrics / adoption have looked so far?

- Custom CA Certificates
- Windows support
- Private Cluster Support
- and more...
Contributor


Can we briefly say here how we are differentiated from the competition?

Member


Link to the AKS / Karpenter roadmap too?


## NAP vs Self-Hosted Karpenter

Karpenter is the OSS project that schedules workloads for efficient compute usage. Our [AKS Karpenter Provider (self-hosted)](https://github.com/Azure/karpenter-provider-azure) makes Karpenter available on Azure. Node Auto Provisioning (NAP) is our managed add-on for Karpenter on AKS, handling certain aspects of the Karpenter experience on Azure. NAP is the recommended mode for most users, for a few reasons.


"on Azure available" sounds like there was more to be said here
