Commit 6a9a405

Implement the network plugin as a parameter.
1 parent 8dee07d commit 6a9a405

File tree

3 files changed

+62
-20
lines changed

Cluster.bicep

Lines changed: 10 additions & 0 deletions
@@ -42,6 +42,14 @@ param maxPodsPerNode int = 40
4242
// @description('Optional. The address prefix (CIDR) for the vnet')
4343
// param nodeSubnetPrefix string = '10.100.10.0/24'
4444

45+
@description('Optional. If not set, you must install your own CNI before the cluster will be functional (See README)')
46+
@allowed(['none', 'azure'])
47+
param networkPlugin string = 'none'
48+
49+
@description('Optional. Only takes effect if the networkPlugin is set to "azure". Only the azure dataplane supports Windows containers, but this defaults to cilium.')
50+
@allowed(['cilium', 'azure'])
51+
param networkDataplane string = 'cilium'
52+
4553
@description('Optional. Service CIDR for this cluster. Defaults to our shared service CIDR: 10.100.0.0/16')
4654
param serviceCidr string = '10.100.0.0/16'
4755

@@ -279,6 +287,8 @@ module aks 'modules/managedCluster.bicep' = {
279287
AutoscaleProfile: AutoscaleProfile
280288
maxPodsPerNode: maxPodsPerNode
281289
// logAnalyticsWorkspaceResourceID: logAnalytics.id
290+
networkPlugin: networkPlugin
291+
networkDataplane: networkDataplane
282292
serviceCidr: serviceCidr
283293
podCidr: podCidr
284294
systemNodePoolOption: systemNodePoolOption
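
As an illustration of how the two new parameters might be supplied, here is a minimal sketch of a parameters file. It assumes a `.bicepparam` file sitting next to `Cluster.bicep`; the file name `windows.bicepparam` is hypothetical, and any other required parameters of `Cluster.bicep` (not all of them are visible in this diff) would need values as well.

```bicep
// windows.bicepparam (hypothetical file name, not part of this commit)
using './Cluster.bicep'

// Use Azure CNI instead of the default 'none' (bring your own CNI).
param networkPlugin = 'azure'

// Only the 'azure' dataplane supports Windows containers;
// leaving this unset would fall back to the 'cilium' default in Cluster.bicep.
param networkDataplane = 'azure'

// Assumption: any remaining required parameters of Cluster.bicep
// must also be assigned here before the file will compile.
```

Leaving both parameters out keeps the defaults shown above, which means `networkPlugin` is `'none'` and a CNI has to be installed by hand.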

README.md

Lines changed: 40 additions & 16 deletions
@@ -60,36 +60,61 @@ flux bootstrap github --owner PoshCode --repository cluster --path=clusters/posh
6060

6161
If you need to customize workload identity, it can get a bit more complex, but Workload Identity is now supported for access to [Azure DevOps](https://fluxcd.io/flux/components/source/gitrepositories/#azure) and [GitHub](https://fluxcd.io/flux/components/source/gitrepositories/#github), at least.
6262

63-
## CURRENT STATUS WARNING
63+
## ⚠️ CURRENT STATUS _WARNING_ ⚠️
6464

65-
I'm playing with Cilium Gateway API, so I've set the network plugin to "none" so that I can take control of the cilium install.
65+
I am testing the Cilium Gateway API. Gateway API support in Cilium is part of _their_ Service Mesh functionality, and Azure's AKS team doesn't seem terribly keen on letting Cilium take over all of networking. Although they offer "Azure CNI powered by Cilium", it doesn't look like I can get the Gateway API from Cilium if I install it with Azure CNI. So, in order to use Cilium fully...
6666

67-
Since the Gateway API in Cilium is part of their Service Mesh, it doesn't seem Azure's AKS team is too keen on getting it working out of the box, so I have to install it manually.
67+
**IMPORTANT**: Upgrading Cilium is currently a process that is fraught with peril. _Look, it's my duty as a knight to sample as much peril as I can_. You may feel differently. [Check at least one version of their upgrade guides before you decide to use Cilium](https://docs.cilium.io/en/stable/operations/upgrade/).
6868

69-
NOTE: in order to use the Gateway API, you need to install the Gateway CRDs. That's handled after the cluster install by Flux.
70-
However, Cilium CNI has to be installed _before the nodes can even connect_, so it's basically a multipart install, which I have not automated.
69+
Additionally, if you need to use Windows containers, you must use the Azure networking CNI.
7170

72-
1. Install the cluster with the network plugin set to "none"
73-
2. Install Cilium CNI, and then the nodes will come up.
74-
3. Install the Gateway API CRDs, and then the Gateway API will be available.
75-
4. "Upgrade" cilium to enable the gateway API.
71+
If you set the 'networkPlugin' parameter to 'azure' you'll get Azure CNI powered by Cilium. If you need to use Windows containers, also set the 'networkDataplane' to 'azure' (otherwise, Azure CNI powered by Cilium is clearly the fastest network available out of the box in AKS).
7672

77-
Installing the cilium tools is as simple as downloading the right release from their GitHub release pages and unzipping.
73+
### I have set the network plugin to "none"
74+
75+
NOTE: The easiest way to install cilium is to use the cilium CLI (it actually includes helm, and the helm chart). But to do this, it needs to discover details about your cluster using the `az` CLI tool _and the `aks-preview` extension_.
76+
77+
Make sure you have the latest version of those installed, and if you can run the equivalent of this command, the cilium install will work:
78+
79+
```powershell
80+
az aks show --resource-group rg-poshcode --name aks-poshcode
81+
```
82+
83+
### Installing Cilium
84+
85+
Installing the cilium CLI tool locally is as simple as downloading the right release from their GitHub release pages and unzipping.
7886

7987
```PowerShell
8088
Install-GitHubRelease cilium cilium-cli
81-
Install-GitHubRelease cilium hubble
8289
```
8390

84-
And installing it into the AKS cluster is just this, using the same `"rg-$name"` value as the resource group deployment:
91+
Installing cilium into the AKS cluster can be done _part of the way through the Bicep deployment_. With "none" as the network plugin, the nodes won't come up "ready" and the flux deployment will time out. If you run the cilium install while ARM is still trying to install Flux, it will succeed in a single pass.
92+
93+
1. You want to `Import-AzAksCredential` as soon as the cluster shows up in Azure.
94+
2. Try `kubectl get nodes` until it shows your nodes (they won't come up ready, because they won't have a network)
95+
3. Then run the `cilium install` command, using the correct value for the resourceGroup name
8596

8697
```PowerShell
87-
cilium install --version 1.17.0 --set azure.resourceGroup="rg-$name" --set kubeProxyReplacement=true --set gatewayAPI.enabled=true
98+
cilium install --version 1.17.0 --set azure.resourceGroup="rg-$name" --set kubeProxyReplacement=true
8899
```
89100

90-
If you want to complete the deployment in a single pass, you have to `Import-AzAksCredential` as soon as the cluster shows up in Azure, and then once `kubectl get nodes` shows all your nodes (they won't come up ready, because they won't have a network), you can run the `cilium install` while Azure is showing the Flux deployment is still running (it won't complete successfully until after cilium is installed, so if you don't run the install, it will fail after the time-out, and you'll have to re-run the deployment).
101+
If you are not fast enough, it is not a big deal -- the deployment will fail after the time-out, but you can just re-run the deployment after you finish the cilium install.
102+
103+
### Configuring the Cilium Gateway API
104+
105+
In order to use the Gateway API, we need to [install the Gateway CRDs](https://gateway-api.sigs.k8s.io/guides/). That's handled (after the cluster install) by Flux. Of course that means that we have to re-configure Cilium _after_ the initial deployment. I haven't automated this part yet (because I didn't want to make the GitOps deployment _depend_ on Cilium), but it's pretty straightforward:
106+
107+
First install the Gateway API CRDs (in my deployment, this is handled by Flux)
108+
109+
```PowerShell
110+
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.2.0/config/crd/standard/gateway.networking.k8s.io_gatewayclasses.yaml
111+
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.2.0/config/crd/standard/gateway.networking.k8s.io_gateways.yaml
112+
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.2.0/config/crd/standard/gateway.networking.k8s.io_httproutes.yaml
113+
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.2.0/config/crd/standard/gateway.networking.k8s.io_referencegrants.yaml
114+
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.2.0/config/crd/standard/gateway.networking.k8s.io_grpcroutes.yaml
115+
```
91116

92-
Once the cluster is up, and you've installed the Gateway API CRDs, you can run the `cilium upgrade` command to enable the Gateway API. I'm _also_ enabling hubble and prometheus:
117+
Then [redeploy the cilium chart, to enable the gateway API](https://docs.cilium.io/en/stable/network/servicemesh/gateway-api/gateway-api/). I'm _also_ enabling hubble and prometheus:
93118

94119
```PowerShell
95120
cilium install --version 1.17.0 --set azure.resourceGroup="rg-$name" `
@@ -102,4 +127,3 @@ cilium install --version 1.17.0 --set azure.resourceGroup="rg-$name" `
102127
--set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,httpV2:exemplars=true;labelsContext=source_ip\,source_namespace\,source_workload\,destination_ip\,destination_namespace\,destination_workload\,traffic_direction}"
103128
```
104129

105-
Given it's been more than a year, and Azure's "CNI powered by Cilium" still lists L7 policy enforcement as a limitation, I still have not tried to use that _and_ cilium gateway, so I should probably go ahead and get the Cilium Helm Chart into my GitOps repo 😒

modules/managedCluster.bicep

Lines changed: 12 additions & 4 deletions
@@ -10,6 +10,13 @@ param location string = resourceGroup().location
1010
@description('Optional. Tags for this resource. Defaults to resourceGroup().tags')
1111
param tags object = resourceGroup().tags
1212

13+
@description('Optional. If not set, you must install your own CNI before the cluster will be functional (See README)')
14+
@allowed(['none', 'azure'])
15+
param networkPlugin string = 'none'
16+
17+
@description('Optional. If the networkPlugin is azure, you can specify the dataplane as either cilium or azure. Only azure supports Windows.')
18+
@allowed(['cilium', 'azure'])
19+
param networkDataplane string = 'azure'
1320

1421
@description('Required. The base version of Kubernetes to use')
1522
param kubernetesVersion string
@@ -283,10 +290,11 @@ resource cluster 'Microsoft.ContainerService/managedClusters@2024-10-01' = {
283290
}
284291
networkProfile: {
285292
// Going to try BYOCNI to use Cilium as the Gateway
286-
networkPlugin: 'none'
287-
// networkPluginMode: 'overlay'
288-
// networkDataplane: 'cilium'
289-
// networkPolicy: 'cilium'
293+
networkPlugin: networkPlugin
294+
// networkPlugin: 'azure'
295+
networkPluginMode: networkPlugin == 'none' ? null : 'Overlay'
296+
networkDataplane: networkPlugin == 'none' ? null : networkDataplane
297+
networkPolicy: networkPlugin == 'none' ? null : networkDataplane == 'cilium' ? 'cilium' : 'azure'
290298
outboundType: 'loadBalancer'
291299
// This is the cluster load balancer, not the outbound
292300
loadBalancerSku: 'Standard'
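
To make the conditional logic in the new `networkProfile` lines easier to follow, the sketch below repeats the same ternary expressions in a standalone Bicep file that deploys no resources and only emits an output. The variable `bringYourOwnCni` and the output name `networkProfile` are hypothetical helpers for illustration, not part of the commit.

```bicep
// Standalone sketch: no resources are declared, only an output is computed.
@allowed(['none', 'azure'])
param networkPlugin string = 'none'

@allowed(['cilium', 'azure'])
param networkDataplane string = 'azure'

// Hypothetical helper variable: true when the cluster is created without a CNI.
var bringYourOwnCni = networkPlugin == 'none'

output networkProfile object = {
  networkPlugin: networkPlugin
  // Overlay mode only applies when Azure CNI is in use.
  networkPluginMode: bringYourOwnCni ? null : 'Overlay'
  // The dataplane is passed through only when Azure CNI is in use.
  networkDataplane: bringYourOwnCni ? null : networkDataplane
  // The policy engine follows the dataplane: 'cilium' with cilium, otherwise 'azure'.
  networkPolicy: bringYourOwnCni ? null : (networkDataplane == 'cilium' ? 'cilium' : 'azure')
}
```

With `networkPlugin` set to `'none'`, the three dependent properties resolve to `null` regardless of `networkDataplane`. With `'azure'` and `'cilium'`, they resolve to `'Overlay'`, `'cilium'`, and `'cilium'`; with `'azure'` and `'azure'`, to `'Overlay'`, `'azure'`, and `'azure'`.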
