diff --git a/README.md b/README.md
index cf9a60ef..4b75e349 100644
--- a/README.md
+++ b/README.md
@@ -383,6 +383,49 @@ Finally, you can run the following to cleanup your environment and delete the
 ./demo/delete-cluster.sh
 ```
 
+## Installing the example driver on a GKE cluster
+It is also possible to run the example driver on a GKE cluster. For this, we
+will use the pre-built image for the kubelet plugin, so there is no need
+to build anything. All that is needed is a Google Cloud Platform account,
+the gcloud CLI and Helm.
+
+To keep things simple and identical to the Kind example, we will use a
+single-node GKE cluster.
+
+CDI must be enabled in containerd for the DRA driver to work. CDI is
+enabled by default in GKE since 1.32.1-gke.1489001, so we will create
+a cluster in the rapid channel to make sure we get a recent version.
+
+Since DRA is still a beta feature, we need to explicitly enable its APIs
+when the cluster is created.
+
+First, create a GKE cluster with gcloud.
+```bash
+gcloud container clusters create dra-example-driver-cluster \
+  --location=us-central1-c \
+  --release-channel=rapid \
+  --num-nodes=1 \
+  --enable-kubernetes-unstable-apis=resource.k8s.io/v1beta1/deviceclasses,resource.k8s.io/v1beta1/resourceclaims,resource.k8s.io/v1beta1/resourceclaimtemplates,resource.k8s.io/v1beta1/resourceslices
+```
+
+Once the cluster is ready, we can install the DRA driver using Helm.
+
+The kubelet plugin in the example driver is set up to run with priority class
+`system-node-critical`. On GKE, pods are by default restricted from running
+with this priority class, so we need a ResourceQuota that allows it. The
+Helm chart can create one; we just have to set `resourcequota.enabled=true`.
+
+```bash
+helm upgrade -i \
+  --create-namespace \
+  --namespace dra-example-driver \
+  --set=resourcequota.enabled=true \
+  dra-example-driver \
+  deployments/helm/dra-example-driver
+```
+
+The examples in `demo/gpu-test{1,2,3,4,5}.yaml` work just like they do with Kind.
+
 ## Anatomy of a DRA resource driver
 
 TBD
diff --git a/deployments/helm/dra-example-driver/Chart.yaml b/deployments/helm/dra-example-driver/Chart.yaml
index eb1baf6e..ec554ffb 100644
--- a/deployments/helm/dra-example-driver/Chart.yaml
+++ b/deployments/helm/dra-example-driver/Chart.yaml
@@ -25,4 +25,6 @@ version: 0.0.0-dev
 # It is recommended to use it with quotes.
 appVersion: "v0.1.0"
 
-kubeVersion: "1.32.x"
+# The "-0" suffix makes the constraint also accept pre-release versions. GKE clusters
+# report versions such as 1.32.1-gke.1234567, which semver treats as pre-releases.
+kubeVersion: "1.32.x-0"
diff --git a/deployments/helm/dra-example-driver/templates/resourcequota.yaml b/deployments/helm/dra-example-driver/templates/resourcequota.yaml
new file mode 100644
index 00000000..0153f338
--- /dev/null
+++ b/deployments/helm/dra-example-driver/templates/resourcequota.yaml
@@ -0,0 +1,15 @@
+{{- if .Values.resourcequota.enabled }}
+apiVersion: v1
+kind: ResourceQuota
+metadata:
+  name: {{ include "dra-example-driver.fullname" . }}-resourcequota
+  namespace: {{ include "dra-example-driver.namespace" . }}
+spec:
+  hard:
+    pods: {{ .Values.resourcequota.pods }}
+  {{- with .Values.resourcequota.scopeSelector.matchExpressions }}
+  scopeSelector:
+    matchExpressions:
+    {{- toYaml . | nindent 4 }}
+  {{- end }}
+{{- end }}
diff --git a/deployments/helm/dra-example-driver/values.yaml b/deployments/helm/dra-example-driver/values.yaml
index 9e12a810..0faaca54 100644
--- a/deployments/helm/dra-example-driver/values.yaml
+++ b/deployments/helm/dra-example-driver/values.yaml
@@ -87,3 +87,14 @@ webhook:
     # The name of the service account to use.
     # If not set and create is true, a name is generated using the fullname template
     name: ""
+
+resourcequota:
+  enabled: false
+  pods: 10
+  scopeSelector:
+    matchExpressions:
+    - operator: In
+      scopeName: PriorityClass
+      values:
+        - system-node-critical
+        - system-cluster-critical
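The `-0` suffix added to `kubeVersion` in `Chart.yaml` is needed because Helm checks the constraint with the Masterminds semver library. That library skips pre-release versions unless the constraint itself has a pre-release part, and a GKE version such as `v1.32.1-gke.1489001` parses as a pre-release of 1.32.1. The following is a simplified bash model of that rule, not Helm's code. `matches_kube_version` is a hypothetical helper that only understands the `MAJOR.MINOR.x` form used in this chart.

```bash
#!/usr/bin/env bash
# Simplified model (hypothetical helper, not Helm code) of how a kubeVersion
# constraint of the form "MAJOR.MINOR.x[-0]" is checked against a cluster version.
matches_kube_version() {
  local constraint="$1"
  local version="${2#v}"              # drop a leading "v": v1.32.1 -> 1.32.1
  local core="${version%%-*}"         # drop any pre-release: 1.32.1-gke.1 -> 1.32.1
  local series="${constraint%%.x*}"   # "1.32.x" or "1.32.x-0" -> "1.32"

  # The version must belong to the constraint's MAJOR.MINOR series.
  [[ "$core" == "$series".* ]] || return 1

  # Pre-release versions only match constraints that carry a pre-release
  # part themselves, which is what the "-0" suffix provides.
  if [[ "$version" == *-* && "$constraint" != *-* ]]; then
    return 1
  fi
  return 0
}

for v in v1.32.1-gke.1489001 v1.32.1; do
  for c in "1.32.x" "1.32.x-0"; do
    if matches_kube_version "$c" "$v"; then result="match"; else result="no match"; fi
    echo "kubeVersion \"$c\" vs $v: $result"
  done
done
```

Only the plain `1.32.x` constraint rejects the GKE-style version; with `-0`, both the GKE-style and the plain version match.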
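The GKE instructions depend on CDI being enabled in containerd, which the README says is the default from 1.32.1-gke.1489001 onward. When reusing an existing cluster rather than creating one in the rapid channel, it can help to check the node version first. The sketch below assumes GNU `sort -V`, which compares the numeric parts of strings like `1.32.1-gke.1489001` numerically. `gke_version_at_least` is a hypothetical helper, and the version strings in the loop are illustrative, not real GKE releases.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if GKE version $1 is at least version $2.
# Assumes GNU `sort -V`, which orders the numeric parts of strings such as
# 1.32.1-gke.1489001 numerically (so 1.32.10 sorts after 1.32.9).
gke_version_at_least() {
  local have="${1#v}" want="${2#v}"
  local lowest
  lowest="$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)"
  # If the required version sorts first (or both are equal), $have is new enough.
  [[ "$lowest" == "$want" ]]
}

# Minimum GKE version with CDI enabled by default in containerd (from the README).
CDI_MIN_GKE_VERSION="1.32.1-gke.1489001"

# Illustrative version strings only.
for v in 1.31.5-gke.1000000 1.32.1-gke.1200000 1.32.1-gke.1489001 1.32.2-gke.1000; do
  if gke_version_at_least "$v" "$CDI_MIN_GKE_VERSION"; then
    echo "$v: CDI enabled by default"
  else
    echo "$v: older than $CDI_MIN_GKE_VERSION, CDI may not be enabled"
  fi
done
```

Against a real cluster, the node version can be read with gcloud, for example `gke_version_at_least "$(gcloud container clusters describe dra-example-driver-cluster --location=us-central1-c --format='value(currentNodeVersion)')" 1.32.1-gke.1489001`.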