Commit 34fdba2

Adding a GCP Gemma on TPU Example. This uses Project as external reference (#537)
1 parent 8aebe07 commit 34fdba2

2 files changed

+440
-0
lines changed

Lines changed: 236 additions & 0 deletions
@@ -0,0 +1,236 @@
# GemmaOnTPUServer

A **Platform Administrator** wants to give end users in their organization self-service access to deploy Gemma on TPU in a GKE cluster. The platform administrator creates a kro ResourceGraphDefinition called *gemmaontpuserver.kro.run* that defines the required Kubernetes resources and a CRD called *GemmaOnTPUServer* that exposes only the options they want to be configurable by end users. The ResourceGraphDefinition defines the following resources, using [KCC](https://github.com/GoogleCloudPlatform/k8s-config-connector) to provide the mappings from K8s CRDs to Google Cloud APIs:

* GCP Project (external reference)
* IAMServiceAccount
* IAMPolicyMember
* IAMPartialPolicy
* StorageBucket

It also defines these Kubernetes resources that use the GCP resources:
* ServiceAccount (annotation)
* Job
* Deployment
* Service

Everything related to these resources is hidden from the end user, simplifying their experience.

## End User: GemmaOnTPUServer

The end user creates a `GemmaOnTPUServer` resource similar to this:

```yaml
apiVersion: kro.run/v1alpha1
kind: GemmaOnTPUServer
metadata:
  name: gemma-tpu
  namespace: config-connector
spec:
  kaggelSecret: kaggle-secret
  replicas: 1
```
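
For scripted use, a manifest of this shape can be generated from a few parameters. The following is a minimal sketch and not part of the example itself: `render_gemma_server` is a hypothetical helper, and the field names are simply copied from the sample manifest.

```bash
# Hypothetical helper (not part of the example): prints a GemmaOnTPUServer
# manifest. Field names are copied from the sample manifest in this README.
render_gemma_server() {
  local name="$1" namespace="$2" secret="$3" replicas="${4:-1}"
  cat <<EOF
apiVersion: kro.run/v1alpha1
kind: GemmaOnTPUServer
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  kaggelSecret: ${secret}
  replicas: ${replicas}
EOF
}

# Usage (assumes kubectl already targets the cluster):
#   render_gemma_server gemma-tpu config-connector kaggle-secret 1 | kubectl apply -f -
```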

They can then check the status of the applied resource:

```
kubectl get gemmaontpuservers -n config-connector
kubectl get gemmaontpuservers gemma-tpu -n config-connector -o yaml
```

Once done, the user can delete the `GemmaOnTPUServer` instance:

```
kubectl delete gemmaontpuserver gemma-tpu -n config-connector
```

## Administrator

### 1. Set environment variables

```bash
export PROJECT_ID=k8sai-${USERNAME?}
export REGION=us-central1 # << CHANGE region here
```
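
The `${USERNAME?}` expansion makes the shell report an error when `USERNAME` is unset, instead of silently producing `k8sai-`. The same kind of guard can be applied to every variable the later steps depend on. A minimal sketch, where `require_env` is a hypothetical helper name:

```bash
# Hypothetical helper: returns non-zero and names the first variable
# that is unset or empty.
require_env() {
  local var_name var_value
  for var_name in "$@"; do
    # Indirect lookup of the variable named by $var_name.
    eval "var_value=\${$var_name:-}"
    if [ -z "$var_value" ]; then
      echo "missing required variable: $var_name" >&2
      return 1
    fi
  done
}

# Usage:
#   require_env PROJECT_ID REGION || exit 1
```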

### 2. GKE Autopilot Cluster with KCC and KRO

#### Create GKE Cluster

```bash
export CLUSTER_NAME="inference-cluster" # name for the admin cluster
export CHANNEL="rapid" # or "regular"

# Create an Autopilot cluster; KCC is installed from manifests in a later step
gcloud container clusters create-auto ${CLUSTER_NAME} \
  --project ${PROJECT_ID} \
  --release-channel ${CHANNEL} \
  --location=${REGION}
```

Set up kubectl to target the cluster:

```bash
gcloud container clusters get-credentials ${CLUSTER_NAME} --project ${PROJECT_ID} --location ${REGION}
```

#### Install KCC

Install KCC from manifests:
```bash
gcloud storage cp gs://configconnector-operator/latest/release-bundle.tar.gz release-bundle.tar.gz
tar zxvf release-bundle.tar.gz
kubectl apply -f operator-system/autopilot-configconnector-operator.yaml

# wait for the pods to be ready
kubectl wait -n configconnector-operator-system --for=condition=Ready pod --all
```

#### Give KCC permissions to manage GCP project

Create a Google service account (GSA) and bind it to the KCC Kubernetes service account (KSA):

```bash
# Instructions from here: https://cloud.google.com/config-connector/docs/how-to/install-manually#identity

# Create the KCC operator Google service account (GSA)
gcloud iam service-accounts create kcc-operator

# Add GCP IAM role bindings and use Workload Identity (WI) to bind with the KSA

## project owner role
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:kcc-operator@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/owner"

## storage admin role
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:kcc-operator@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/storage.admin"

## allow the KCC controller KSA to act as the GSA through Workload Identity
gcloud iam service-accounts add-iam-policy-binding kcc-operator@${PROJECT_ID}.iam.gserviceaccount.com \
  --member="serviceAccount:${PROJECT_ID}.svc.id.goog[cnrm-system/cnrm-controller-manager]" \
  --role="roles/iam.workloadIdentityUser"
```
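
The `--member` value in the last command follows the Workload Identity format `serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]`. As a sketch, a small helper can build that string for any Kubernetes service account; the name `wi_member` is hypothetical and not part of gcloud or KCC:

```bash
# Hypothetical helper: prints the Workload Identity member string for a
# Kubernetes service account (KSA) in a given project and namespace.
wi_member() {
  local project_id="$1" namespace="$2" ksa="$3"
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$project_id" "$namespace" "$ksa"
}

# Usage:
#   gcloud iam service-accounts add-iam-policy-binding kcc-operator@${PROJECT_ID}.iam.gserviceaccount.com \
#     --member="$(wi_member "${PROJECT_ID}" cnrm-system cnrm-controller-manager)" \
#     --role="roles/iam.workloadIdentityUser"
```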

Create the `ConfigConnector` object that sets up the KCC controller:

```bash
# from here: https://cloud.google.com/config-connector/docs/how-to/install-manually#addon-configuring

kubectl apply -f - <<EOF
apiVersion: core.cnrm.cloud.google.com/v1beta1
kind: ConfigConnector
metadata:
  name: configconnector.core.cnrm.cloud.google.com
spec:
  mode: cluster
  googleServiceAccount: "kcc-operator@${PROJECT_ID?}.iam.gserviceaccount.com"
  stateIntoSpec: Absent
EOF
```

#### Set up team namespace

Create a namespace for KCC resources:
```bash
export NAMESPACE=config-connector # or team-a
# from here: https://cloud.google.com/config-connector/docs/how-to/install-manually#specify
kubectl create namespace ${NAMESPACE}

# associate the gcp project with this namespace
kubectl annotate namespace ${NAMESPACE} cnrm.cloud.google.com/project-id=${PROJECT_ID?}
```

Verify the KCC installation:
```bash
# check that the KCC controller pods have been created (re-run until they appear)
kubectl get pods -n cnrm-system

# wait for the KCC controller pods to be ready
kubectl wait -n cnrm-system --for=condition=Ready pod --all
```
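
`kubectl wait` can fail immediately when the pods have not been created yet, so it may help to retry for a while instead of re-running the command by hand. A minimal sketch of a retry helper follows; `retry` is a hypothetical name, and the attempt count and delay in the usage line are arbitrary:

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while ! "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      echo "giving up after $i attempts: $*" >&2
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Usage:
#   retry 10 15 kubectl wait -n cnrm-system --for=condition=Ready pod --all --timeout=60s
```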

#### Create KCC Project object

Create the `Project` object that is used as an external reference in the RGD.

```bash
export GCP_PROJECT_PARENT_TYPE=$(gcloud projects describe ${PROJECT_ID} --format json | jq -r ".parent.type")
export GCP_PROJECT_PARENT_ID=$(gcloud projects describe ${PROJECT_ID} --format json | jq -r ".parent.id")

# pick the parent reference field that matches the project's parent type
parentRefKey=$(if [[ "$GCP_PROJECT_PARENT_TYPE" == "organization" ]]; then echo "organizationRef"; else echo "folderRef"; fi)

kubectl apply -f - <<EOF
apiVersion: resourcemanager.cnrm.cloud.google.com/v1beta1
kind: Project
metadata:
  annotations:
    cnrm.cloud.google.com/auto-create-network: "false"
  name: acquire-namespace-project
  namespace: ${NAMESPACE}
spec:
  name: ""
  resourceID: ${PROJECT_ID}
  ${parentRefKey}:
    external: "${GCP_PROJECT_PARENT_ID}"
EOF
```
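
The one-line `if` above treats every parent type other than `organization` as a folder, including the `null` that `jq -r` prints for a project with no parent. A slightly more defensive sketch rejects unexpected values instead; the function name `parent_ref_key` is hypothetical:

```bash
# Hypothetical helper: maps the project's parent type to the matching
# reference field of the KCC Project spec, failing on anything unexpected.
parent_ref_key() {
  case "$1" in
    organization) echo "organizationRef" ;;
    folder)       echo "folderRef" ;;
    *)
      echo "unsupported project parent type: '$1'" >&2
      return 1
      ;;
  esac
}

# Usage:
#   parentRefKey=$(parent_ref_key "$GCP_PROJECT_PARENT_TYPE") || exit 1
```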

#### Install KRO

Install KRO following [instructions here](https://kro.run/docs/getting-started/Installation/)

```bash
export KRO_VERSION=$(curl -sL \
    https://api.github.com/repos/kro-run/kro/releases/latest | \
    jq -r '.tag_name | ltrimstr("v")'
  )
echo $KRO_VERSION

helm install kro oci://ghcr.io/kro-run/kro/kro \
  --namespace kro \
  --create-namespace \
  --version=${KRO_VERSION}

helm -n kro list

kubectl wait -n kro --for=condition=Ready pod --all
```

### 3. Model Registry access

#### Kaggle API access
* **Kaggle Account:** You need a Kaggle account.
* **Accept Gemma License:** You must accept the Gemma model license terms and usage policy on Kaggle for the specific model version you intend to use.
* **Kaggle API Credentials:**
    * You will need your Kaggle username and a Kaggle API key.
    * To get these, download your `kaggle.json` API token from your Kaggle account page (typically `https://www.kaggle.com/YOUR_USERNAME/account`, navigate to the "API" section, and click "Create New Token").
    * The downloaded `kaggle.json` file contains your username and key. You will use these individual values for Kubernetes secret literals.

#### Create Kubernetes Secret for Kaggle

```bash
export KAGGLE_USERNAME=$(jq -r .username kaggle.json) # username from kaggle.json
export KAGGLE_KEY=$(jq -r .key kaggle.json) # key from kaggle.json
kubectl create secret generic kaggle-secret \
  --namespace=${NAMESPACE} \
  --from-literal=username="${KAGGLE_USERNAME}" \
  --from-literal=key="${KAGGLE_KEY}"
```
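
`jq -r` prints the literal string `null` when a key is missing from `kaggle.json`, and that value would end up in the secret unnoticed. A small guard can catch this before `kubectl create secret` runs. This is a sketch only, and `valid_credential` is a hypothetical helper name:

```bash
# Hypothetical helper: succeeds only for a non-empty value that is not the
# "null" string jq -r prints for a missing key.
valid_credential() {
  [ -n "$1" ] && [ "$1" != "null" ]
}

# Usage:
#   if ! valid_credential "$KAGGLE_USERNAME" || ! valid_credential "$KAGGLE_KEY"; then
#     echo "kaggle.json is missing username or key" >&2
#     exit 1
#   fi
```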

### 4. Install the KRO RGDs

```bash
kubectl apply -f rgd.yaml
```

Validate the RGD is installed correctly:

```
kubectl get rgd gemmaontpuserver.kro.run
```

## Cleanup

Once all user-created instances are deleted, the administrator can choose to delete the RGD.
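
That order of operations can be sketched as a single function. It assumes the RGD name `gemmaontpuserver.kro.run` used earlier and that instances may live in any namespace; `delete_rgd_if_unused` is a hypothetical helper, not something the example ships:

```bash
# Hypothetical helper: deletes the RGD only when no GemmaOnTPUServer
# instances remain in any namespace.
delete_rgd_if_unused() {
  local remaining
  # --no-headers leaves stdout empty when there are no instances.
  remaining="$(kubectl get gemmaontpuservers --all-namespaces --no-headers)" || return 1
  if [ -n "$remaining" ]; then
    echo "GemmaOnTPUServer instances still exist; delete them first:" >&2
    echo "$remaining" >&2
    return 1
  fi
  kubectl delete rgd gemmaontpuserver.kro.run
}

# Usage:
#   delete_rgd_if_unused
```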
