
Commit efc6f6f

added the documentation for workflow with Windows prefix delegation (#263)
Prefix delegation for Windows nodes is a new feature in which the controller allocates IPv4 prefixes instead of secondary IPv4 addresses. This commit adds the related documentation, including the HLD, a troubleshooting guide, and the available configuration options.
1 parent 6589feb commit efc6f6f

7 files changed

+228
-2
lines changed

README.md

Lines changed: 15 additions & 0 deletions
@@ -26,8 +26,23 @@ Note: The SecurityGroupPolicy CRD only supports up to 5 security groups per cust
2626

2727
The controller manages the IPv4 addresses for all the Windows nodes in the EKS cluster and allocates IPv4 addresses to Windows Pods. The networking on the host is set up by [amazon-vpc-cni-plugins](https://github.com/aws/amazon-vpc-cni-plugins).
2828

29+
The controller supports the following modes for IPv4 address management on Windows-
30+
- **Secondary IPv4 address mode** → Secondary private IPv4 addresses are assigned to the primary instance ENI and are then allocated to the Windows pods.
31+
<br/><br/>
32+
For more details about the high level workflow, please visit our documentation [here](docs/windows/secondary_ip_mode_workflow.md).
33+
34+
35+
- **Prefix delegation mode** &rarr; /28 IPv4 prefixes are assigned to the primary instance ENI and the IP addresses from the prefix are allocated to the Windows pods.
36+
<br/><br/>
37+
For more details about the configuration options with *prefix delegation*, please visit our documentation [here](docs/windows/prefix_delegation_config_options.md).
38+
39+
For more details about the high level workflow, please visit our documentation [here](docs/windows/prefix_delegation_hld_workflow.md).
40+
2941
Please follow this [guide](https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html) for enabling Windows Support on your EKS cluster.
3042

43+
## Troubleshooting
44+
For troubleshooting issues related to Security group for pods or Windows IPv4 address management, please visit our troubleshooting guide [here](docs/troubleshooting.md).
45+
3146
## License
3247

3348
This library is licensed under the Apache 2.0 License.

docs/troubleshooting.md

Lines changed: 122 additions & 0 deletions
@@ -1,3 +1,33 @@
# Troubleshooting Guide

## Table of Contents
- [Troubleshooting Windows](#troubleshooting-windows)
  - [Verify if your EKS Cluster is on the required Platform Version](#verify-if-your-eks-cluster-is-on-the-required-platform-version)
  - [Verify Windows IPAM is enabled in the ConfigMap](#verify-windows-ipam-is-enabled-in-the-configmap)
  - [Verify Node has the Resource Capacity](#verify-node-has-the-resource-capacity)
  - [Verify Pod has the resource limits](#verify-pod-has-the-resource-limit)
  - [Verify Pod has the IPv4 Address Annotation](#verify-pod-has-the-ipv4-address-annotation)
  - [Look for Issues on the Windows Host](#look-for-issues-on-the-windows-host)
- [Troubleshooting Security Group for Pods](#troubleshooting-security-group-for-pods)
  - [Verify ENI Trunking is Enabled](#verify-eni-trunking-is-enabled)
  - [Verify Trunk ENI is created](#verify-trunk-eni-is-created)
  - [Verify Pod has the resource limit](#verify-pod-has-the-resource-limit)
  - [Verify Pod has the pod-eni annotation](#verify-pod-has-the-pod-eni-annotation)
  - [Check Issues with VPC CNI](#check-issues-with-vpc-cni)
- [Troubleshooting Prefix Delegation for Windows](#troubleshooting-prefix-delegation-for-windows)
  - [Verify Windows prefix delegation is enabled in the ConfigMap](#verify-windows-prefix-delegation-is-enabled-in-the-configmap)
  - [Check both pod events and node events for any specific error](#check-both-pod-events-and-node-events-for-any-specific-error)
  - [Verify Node has the required Resource Capacity](#verify-node-has-the-required-resource-capacity)
  - [Verify Pod has the required resource limits](#verify-pod-has-the-required-resource-limits)
  - [Verify Pod has the required IPv4 Address Annotation](#verify-pod-has-the-required-ipv4-address-annotation)
  - [Verify the configuration options set for windows prefix delegation](#verify-the-configuration-options-set-for-windows-prefix-delegation)
  - [Look for networking issues on the Windows Host](#look-for-networking-issues-on-the-windows-host)
- [List of Common Issues](#list-of-common-issues)
  - [PSP Blocking Controller Annotations](#psp-blocking-controller-annotations)
  - [Missing IAM Permissions on the Cluster Role](#missing-iam-permissions-on-the-cluster-role)
  - [ENI/IP Exhaustion](#eniip-exhaustion)
  - [Disable prefix delegation feature for Windows](#disable-prefix-delegation-feature-for-windows)

131
## Troubleshooting Windows
232

333
Please follow the troubleshooting guide in the chronological order to debug issues with Windows Node and Pods.
@@ -227,6 +257,80 @@ If the Pod is still stuck in `ContainerCreating` you can,
227257
- Check the CNI Logs from the collected logs.
228258
- Open an [Issue](https://github.com/aws/amazon-vpc-resource-controller-k8s/issues/new/choose) in this repository if the problem still persists.
229259

260+
## Troubleshooting Prefix Delegation for Windows
261+
Please follow the troubleshooting steps here for issues with Windows Node and Pods when using `prefix delegation` mode.
262+
263+
Work through the following steps in sequence to identify any issue with the workflow.
264+
### Verify Windows prefix delegation is enabled in the ConfigMap
265+
266+
To get the ConfigMap and the data field
267+
268+
```bash
kubectl get configmaps -n kube-system amazon-vpc-cni -o custom-columns=":data"
```
271+
272+
The output should contain the following data:
```
enable-windows-ipam:true enable-windows-prefix-delegation:true
```
276+
277+
**Resolution**
278+
279+
If the ConfigMap is missing or doesn't have the above fields, create or update the `amazon-vpc-cni` ConfigMap with the required fields:
```
enable-windows-ipam: "true"
enable-windows-prefix-delegation: "true"
```
284+
285+
**Note**: Windows IPAM needs to be enabled in order to use windows prefix delegation feature.
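
The check and the resolution above can be scripted. The following is a minimal sketch, not part of the controller: the function names and the `KUBECTL` override variable are hypothetical, and the check assumes the `key:value` output shape shown above.

```bash
# Returns 0 when the ConfigMap data string (first argument) has both flags set to true.
prefix_delegation_enabled() {
  case "$1" in
    *enable-windows-ipam:true*) ;;
    *) return 1 ;;
  esac
  case "$1" in
    *enable-windows-prefix-delegation:true*) return 0 ;;
    *) return 1 ;;
  esac
}

# Sets both flags on the existing amazon-vpc-cni ConfigMap with a merge patch.
# KUBECTL is a hypothetical override so the command can be dry-run (KUBECTL=echo).
enable_prefix_delegation() {
  "${KUBECTL:-kubectl}" patch configmap amazon-vpc-cni -n kube-system --type merge \
    -p '{"data":{"enable-windows-ipam":"true","enable-windows-prefix-delegation":"true"}}'
}
```

Passing the output of the `kubectl get configmaps` command above to `prefix_delegation_enabled` reports whether both flags are set. The patch only updates an existing ConfigMap; it does not create a missing one.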
286+
287+
### Check both pod events and node events for any specific error
288+
If the controller encounters an error during its prefix delegation workflow that requires action from the customer, it emits the error as a pod event and/or a node event. Checking these events is therefore a good starting point for finding the root cause.
289+
290+
You can obtain the pod events using the following command.
291+
```bash
kubectl get events --all-namespaces
```
294+
295+
If the events report an explicit error, investigate that error first.
296+
297+
For example, if the error states that there is insufficient space in the subnet to carve out a /28 prefix, check the subnet to ensure that it has contiguous /28 ranges available to be allocated as prefixes.
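
On a busy cluster the full event list can be long. The sketch below narrows it with standard `kubectl` field selectors. It assumes, without confirmation from the controller's source, that these errors are emitted as `Warning` events; the function names and the `KUBECTL` override are hypothetical.

```bash
# Lists only Warning events across all namespaces.
warning_events() {
  "${KUBECTL:-kubectl}" get events --all-namespaces --field-selector type=Warning
}

# Lists events whose involved object is the node named by the first argument.
node_events() {
  "${KUBECTL:-kubectl}" get events --all-namespaces \
    --field-selector "involvedObject.kind=Node,involvedObject.name=$1"
}
```

If the filtered list is empty, fall back to the unfiltered command above, since the event type is an assumption.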
298+
299+
### Verify Node has the required Resource Capacity
300+
Same as [Verify Node has the Resource Capacity](#verify-node-has-the-resource-capacity)
301+
302+
### Verify Pod has the required resource limits
303+
Same as [Verify Pod has the resource limits](#verify-pod-has-the-resource-limit)
304+
305+
### Verify Pod has the required IPv4 Address Annotation
306+
Same as [Verify Pod has the IPv4 Address Annotation](#verify-pod-has-the-ipv4-address-annotation)
307+
308+
### Verify the configuration options set for windows prefix delegation
309+
Configuration options can be used to fine-tune the behaviour of prefix delegation on Windows. The details about the options are available [here](windows/prefix_delegation_config_options.md).
310+
311+
To get the ConfigMap and the data field
312+
313+
```bash
kubectl get configmaps -n kube-system amazon-vpc-cni -o custom-columns=":data"
```
316+
317+
If you see any of the following keys in the data:
```
minimum-ip-target
warm-ip-target
warm-prefix-target
```
then the configuration options have been set.
324+
325+
**Resolution**
326+
327+
Verify if the configuration is correct as mentioned in the [documentation](windows/prefix_delegation_config_options.md).
328+
329+
Alternatively, to isolate the issue, try removing the above keys from the config map.
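
A minimal sketch of both steps follows: detecting which options are set, and removing one of them to isolate the issue. The function names and the `KUBECTL` override are hypothetical, and the detection assumes the `key:value` output shape of the command above.

```bash
# Prints each prefix delegation option found in the ConfigMap data string (first argument).
pd_options_set() {
  for key in minimum-ip-target warm-ip-target warm-prefix-target; do
    case "$1" in
      *"$key:"*) echo "$key" ;;
    esac
  done
}

# Removes one option (first argument) from the ConfigMap with a JSON patch.
# The patch fails if the key is not present, so check with pd_options_set first.
remove_pd_option() {
  "${KUBECTL:-kubectl}" patch configmap amazon-vpc-cni -n kube-system --type json \
    -p "[{\"op\":\"remove\",\"path\":\"/data/$1\"}]"
}
```

Note the current value of a key before removing it so that it can be restored once the issue is isolated.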
330+
331+
### Look for networking issues on the Windows Host
332+
Same as [Look for Issues on the Windows Host](#look-for-issues-on-the-windows-host)
333+
230334
## List of Common Issues
231335

232336
### PSP Blocking Controller Annotations
@@ -265,3 +369,21 @@ aws ec2 describe-subnets --subnet-id subnet-id-here
265369
```
266370

267371
From the response, you can find how many IPv4 addresses are available in the subnet in the `AvailableIpAddressCount` field.
372+
373+
### Disable prefix delegation feature for Windows
374+
375+
You should check if the feature is enabled via ConfigMap. To get the ConfigMap and the data field
376+
377+
```bash
kubectl get configmaps -n kube-system amazon-vpc-cni -o custom-columns=":data"
```
380+
381+
If the output contains the following data,
```
enable-windows-prefix-delegation:true
```
then the feature is enabled.
386+
387+
**Resolution**
388+
389+
You can disable the feature by editing your ConfigMap and setting `enable-windows-prefix-delegation` to `"false"`.
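
As an alternative to editing the ConfigMap by hand, the same change can be applied with a merge patch. This is a sketch; the function name and the `KUBECTL` override are hypothetical.

```bash
# Sets enable-windows-prefix-delegation to "false" on the amazon-vpc-cni ConfigMap.
# KUBECTL is a hypothetical override so the command can be dry-run (KUBECTL=echo).
disable_prefix_delegation() {
  "${KUBECTL:-kubectl}" patch configmap amazon-vpc-cni -n kube-system --type merge \
    -p '{"data":{"enable-windows-prefix-delegation":"false"}}'
}
```

The merge patch leaves every other key in the ConfigMap untouched, including `enable-windows-ipam`.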
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
1+
# Configuration options with Prefix Delegation mode on Windows
2+
3+
We provide multiple configuration options that let you fine-tune the pre-scaling and dynamic scaling behaviour. These configuration options can be set in the `amazon-vpc-cni` config map.
4+
5+
* **warm-ip-target** &rarr; The number of IP addresses to be allocated in excess of current need. When used with prefix delegation, the controller allocates a new prefix to the ENI if the number of free IP addresses from the existing prefixes is less than this value on the node.
6+
7+
For example, consider that we set warm-ip-target to 15. Initially when the node starts, the ENI has 1 prefix i.e. 16 IP addresses allocated to it. When we launch 2 pods, then the number of available IP addresses becomes 14 and therefore, a new prefix will be allocated to the ENI, which brings the total count of available IP addresses to 30.
8+
9+
10+
* **warm-prefix-target** &rarr; The number of prefixes to be allocated in excess of current need. This will cause a new prefix (/28) to be allocated even if only a single IP from the existing prefix is used. Therefore, use this configuration only if needed after careful consideration. A good use case is a scenario where sudden spikes can lead to the scheduling of a substantially high number of pods.
11+
12+
For example, consider that we set warm-prefix-target to 2. Initially when the node starts, 2 prefixes will be allocated to the ENI. Since there won’t be any running pods, the current need would be 0 and therefore, both the prefixes would be unused. If we run even a single pod then the current need would be 1 IP address which would come from 1 prefix. Therefore, only 1 prefix would be in excess of the current need. This would lead to 1 more prefix being allocated to the ENI bringing the total count of prefixes on the ENI to 3.
13+
14+
15+
* **minimum-ip-target** &rarr; The minimum number of IP addresses to be available at any time. It is similar to warm-ip-target, except that instead of a target number of free IP addresses to keep available, it sets a floor on the total number of IP addresses allocated to the node.
16+
17+
For example, consider that we set minimum-ip-target to 20. This means that the total number of IP addresses (free and allocated to pods) should be at least 20. Therefore, even before the pods are scheduled, there should be at least 20 IP addresses available. Since 1 prefix has 16 IP addresses, the controller would allocate 2 prefixes, bringing the total count of available IP addresses on the node to 32, which is greater than the set value of 20.
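
The three worked examples above can be reproduced with a little arithmetic. The sketch below is a simplified model inferred from those examples, not the controller's actual code: it assumes pods are packed densely into prefixes, applies the documented defaults of 1 and 3 when an IP target is not given, and uses hypothetical function names.

```bash
PREFIX_SIZE=16  # a /28 prefix holds 16 IPv4 addresses

# Prefixes attached for a pod count under warm-ip-target and minimum-ip-target.
# Arguments: pods [warm-ip-target] [minimum-ip-target]; unset or empty values use 1 and 3.
prefixes_for_ip_targets() {
  pods=$1; warm=${2:-1}; minimum=${3:-3}
  needed=$(( pods + warm ))
  if [ "$needed" -lt "$minimum" ]; then needed=$minimum; fi
  echo $(( (needed + PREFIX_SIZE - 1) / PREFIX_SIZE ))
}

# Prefixes attached for a pod count under warm-prefix-target alone.
# Arguments: pods warm-prefix-target
prefixes_for_prefix_target() {
  echo $(( ($1 + PREFIX_SIZE - 1) / PREFIX_SIZE + $2 ))
}

prefixes_for_ip_targets 2 15      # warm-ip-target=15, 2 pods: prints 2
prefixes_for_ip_targets 0 "" 20   # minimum-ip-target=20, no pods: prints 2
prefixes_for_prefix_target 1 2    # warm-prefix-target=2, 1 pod: prints 3
```

The model ignores the per-node capacity limit, so results for very large targets will overshoot what a real node can hold.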
18+
19+
### Considerations while using the above configuration options
20+
- These configuration options work only with the prefix delegation mode.
21+
- The settings for these values would depend upon your use case. If set, `warm-ip-target` and/or `minimum-ip-target` will take precedence over `warm-prefix-target`.
22+
- The default values used by the controller when these configurations have not been explicitly specified are:
```
warm-ip-target: "1"
minimum-ip-target: "3"
```
27+
- Setting either `warm-prefix-target` or both `warm-ip-target` and `minimum-ip-target` to zero/negative values is not supported. In such cases, default values as above will be used by the controller.
28+
- If only `minimum-ip-target` is set, `warm-ip-target` defaults to 1. If only `warm-ip-target` is set, `minimum-ip-target` defaults to 3.
29+
- If the values of `warm-prefix-target`, `warm-ip-target` or `minimum-ip-target` are set such that the max node IPv4 capacity is exceeded, then the maximum allocated IP addresses would be limited to the max node IPv4 capacity. For example, if a node has 14 secondary IP slots and we set `warm-prefix-target` to 20, then only 14 prefixes will be allocated on node startup.
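
Two of the considerations above lend themselves to a short sketch: the fallback to defaults when only one IP target is set, and the cap at the node's capacity. The function names are hypothetical, and the handling of zero or negative values is deliberately left out.

```bash
# Default values quoted in the considerations above.
DEFAULT_WARM_IP_TARGET=1
DEFAULT_MINIMUM_IP_TARGET=3

# Prints the effective "warm-ip-target minimum-ip-target" pair.
# Arguments: warm-ip-target minimum-ip-target (pass "" for an option that is not set).
effective_ip_targets() {
  echo "${1:-$DEFAULT_WARM_IP_TARGET} ${2:-$DEFAULT_MINIMUM_IP_TARGET}"
}

# Caps a requested prefix count at the number of secondary IP slots on the node.
# Arguments: requested-prefixes secondary-ip-slots
cap_prefixes() {
  if [ "$1" -gt "$2" ]; then echo "$2"; else echo "$1"; fi
}

effective_ip_targets "" 20   # only minimum-ip-target set: prints "1 20"
cap_prefixes 20 14           # warm-prefix-target=20 on 14 slots: prints 14
```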
30+
31+
### Examples
32+
The following table shows the various parameters when the configuration options for prefix delegation are set. You can fine-tune these values as per your expected workload.

|warm-prefix-target|warm-ip-target|minimum-ip-target|Pods|Attached Prefixes|Pods per Prefix|Unused Prefixes|Unused IPs|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|-|-|-|0|1|0|1|16|
|-|-|-|15|1|15|0|1|
|-|-|-|16|2|16,0|1|16|
|1|-|-|0|1|0|1|16|
|3|-|-|0|3|0|3|48|
|1|-|-|5|2|5|1|27|
|1|-|-|17|3|16,1|1|31|
|-|-|1|0|1|0|1|16|
|-|-|1|5|1|5|0|11|
|-|-|1|15|1|15|0|1|
|-|-|1|16|2|16,0|1|16|
|-|5|-|0|1|0|1|16|
|-|5|-|5|1|5|0|11|
|-|5|-|11|1|11|0|5|
|-|5|-|12|2|12,0|1|20|
|-|1|1|0|1|0|1|16|
|-|1|1|5|1|5|0|11|
|-|1|1|17|2|16,1|0|15|
|-|3|1|14|2|14,0|1|18|
|-|7|20|0|2|0,0|2|32|
|-|7|20|5|2|5,0|1|27|
|-|7|20|17|2|16,1|0|15|
|-|7|20|26|3|16,10,0|1|22|
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
# Windows Event Workflows in IPv4 Prefix Delegation mode
2+
This document presents high-level workflow diagrams for the events associated with Windows Nodes and Pods when using the IPv4 prefix delegation mode.
3+
4+
## Adding a Windows Node to the Cluster
5+
6+
<img alt="New Windows Node Create Event Diagram" height="50%" src="../images/windows-prefix-delegation-node-create-events.jpg" width="50%"/>
7+
8+
1. Controller watches for Node Event from the Kube API server.
9+
2. User Adds a Windows Node to the Cluster with the label `kubernetes.io/os: windows`.
10+
3. The resource controller starts managing an IP address warm pool for the Windows node. It invokes EC2 APIs on behalf of the customer to allocate /28 prefixes to the primary ENI. Internally, it deconstructs each prefix into IP addresses, and the pods are later assigned one of the IP addresses from the prefix range.
11+
12+
In order to reduce latency after pod creation, controller would warm up a prefix beforehand. Customer can control the pre-scaling/warm settings using configuration options as specified [here](prefix_delegation_config_options.md).
13+
4. Controller updates the resource capacity on this node to `vpc.amazonaws.com/PrivateIPv4Address: # (Secondary IP per interface -1)*16`. This limits the number of Windows Pods that can be scheduled on the Windows node based on the number of available IPv4 addresses.
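
The capacity formula in step 4 can be evaluated directly. In the sketch below the function name is hypothetical and the input of 15 addresses per interface is an illustrative figure, not a value taken from this document.

```bash
# Computes (Secondary IP per interface - 1) * 16.
# Argument: the per-interface IP count used in the formula above.
node_ipv4_capacity() {
  echo $(( ($1 - 1) * 16 ))
}

node_ipv4_capacity 15   # prints 224
```

Each remaining address slot holds one /28 prefix of 16 addresses, which is where the factor of 16 comes from.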
14+
15+
## Creating a new Windows Pod
16+
17+
<img alt="New Windows Pod Create Event Diagram" height="50%" src="../images/windows-prefix-delegation-pod-create-events.jpg" width="50%"/>
18+
19+
1. User Creates a new Windows Pod with the nodeSelector `kubernetes.io/os: windows`.
20+
2. Webhook mutates the Create Pod request by adding the following resource limit and capacity `vpc.amazonaws.com/PrivateIPv4Address: 1`. This tells the scheduler that the Pod has to be scheduled on a Node with 1 available IPv4 Address.
21+
3. Controller receives the Pod Create event and allocates an IPv4 address from the Prefix Warm Pool. The IP address assigned to the pod is taken from the range of one of the prefixes assigned to the primary ENI on the node.
22+
23+
It is worthwhile to note that the controller would assign the IP address to the pods such that the prefix with the fewest remaining IP addresses would be consumed first. This means that if there are 2 prefixes on the node such that 10 IP addresses from the second prefix are yet to be allocated and 5 from the first, then newer pods will be allocated the IP addresses from the first prefix while it has unassigned IP addresses.
24+
25+
4. Controller annotates the Pod with `vpc.amazonaws.com/PrivateIPv4Address: IPv4 Address`.
26+
5. VPC CNI Plugin Binary on the Windows host reads the IPv4 address present in the annotation from the API Server and sets up the networking for the Pod.
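
The selection rule described in step 3, where the prefix with the fewest remaining addresses is consumed first, can be sketched as follows. This is an illustration of the rule only, not the controller's implementation; the function name, the `<prefix>=<free count>` argument format, and the example CIDRs are hypothetical.

```bash
# Prints the prefix with the fewest free addresses that still has at least one free.
# Arguments: entries of the form <prefix>=<free address count>.
pick_prefix() {
  best=""
  best_free=0
  for entry in "$@"; do
    name=${entry%%=*}
    free=${entry##*=}
    if [ "$free" -le 0 ]; then continue; fi   # skip fully consumed prefixes
    if [ -z "$best" ] || [ "$free" -lt "$best_free" ]; then
      best=$name
      best_free=$free
    fi
  done
  if [ -n "$best" ]; then echo "$best"; else return 1; fi
}

# First prefix has 5 free addresses, second has 10: the first is chosen.
pick_prefix 10.0.0.16/28=5 10.0.0.32/28=10   # prints 10.0.0.16/28
```

Filling the fullest prefix first keeps the other prefixes as empty as possible, which makes it more likely that a whole prefix can later be released.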
27+
28+
29+
## Delete events
30+
31+
When pods are terminated, their IP addresses are released back into the warm pool. If the warm pool then holds more available IP addresses than required, the controller releases the free prefixes. A prefix is released only when every IP address in its range has been unallocated from pods.
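
The release condition can be sketched as a filter over the attached prefixes. This covers only the rule that a prefix must be entirely free; whether it is actually released also depends on the warm pool targets. The function name, argument format, and example CIDRs are hypothetical.

```bash
# Prints the prefixes whose 16 addresses are all free; only these are release candidates.
# Arguments: entries of the form <prefix>=<free address count>.
releasable_prefixes() {
  for entry in "$@"; do
    if [ "${entry##*=}" -eq 16 ]; then echo "${entry%%=*}"; fi
  done
}

releasable_prefixes 10.0.0.16/28=16 10.0.0.32/28=15   # prints 10.0.0.16/28
```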

docs/windows/workflow.md renamed to docs/windows/secondary_ip_mode_workflow.md

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
1-
# Windows Event Workflows
2-
This document presents high level workflow diagram for Events associated with Windows Nodes and Pods.
1+
# Windows Event Workflows in Secondary IPv4 address mode
2+
This document presents high level workflow diagram for Events associated with Windows Nodes and Pods when using the secondary IPv4 address mode.
33

44
## Adding a Windows Node to the Cluster
55
![New Windows Node Create Event Diagram](../images/windows-node-create-event.png)
