Commit 78caf16

Modifying the tpu v6e files to add HdB support

1 parent c17a052 commit 78caf16

2 files changed, +118 -0 lines changed

community/examples/gke-tpu-v6/README.md

Lines changed: 45 additions & 0 deletions
@@ -248,3 +248,48 @@ Once deployed, the `Lustre` filesystem is available to the cluster as a `Persist
```

The pod will start, and the Managed Lustre filesystem will be available inside the container at `/mnt/lustre`.

### Understanding Hyperdisk Balanced Integration

The blueprint also supports [Hyperdisk Balanced](https://docs.cloud.google.com/compute/docs/disks/hyperdisks), Google Cloud's high-performance, persistent block storage solution.

To enable Hyperdisk Balanced integration, make these changes before deploying:

1. Ensure the GKE cluster is configured to support Persistent Disks (the Hyperdisk CSI driver runs automatically once enabled). Verify that the `gke-tpu-v6-cluster` module sets `enable_persistent_disk_csi: true`.

2. Find the section commented `--- HYPERDISK BALANCED ADDITIONS ---` and uncomment the entire block containing the following two modules:

* `hyperdisk-balanced-setup`: creates a **StorageClass** and a **PersistentVolumeClaim (PVC)** that dynamically provision a Hyperdisk Balanced volume in your cluster.
* `fio-bench-job-hyperdisk`: a job pre-configured to mount the Hyperdisk volume and run performance tests.
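For orientation, dynamic Hyperdisk Balanced provisioning rests on a StorageClass/PVC pair along these lines. This is a minimal sketch, not the module's exact output: the resource names are hypothetical, though the GKE Persistent Disk CSI driver (`pd.csi.storage.gke.io`) and the `hyperdisk-balanced` disk type are real.

```yaml
# Illustrative StorageClass + PVC pair for Hyperdisk Balanced.
# Resource names are hypothetical; the module generates its own.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hyperdisk-balanced-sc       # hypothetical name
provisioner: pd.csi.storage.gke.io  # GKE Persistent Disk CSI driver
volumeBindingMode: Immediate
reclaimPolicy: Delete
parameters:
  type: hyperdisk-balanced
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hyperdisk-balanced-pvc-0    # hypothetical name
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: hyperdisk-balanced-sc
  resources:
    requests:
      storage: 100Gi                # matches capacity_gb: 100
```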
After making these changes, run the `gcluster deploy` command as usual.
#### Testing the Hyperdisk Balanced Mount

1. Connect to your cluster:

```sh
gcloud container clusters get-credentials DEPLOYMENT_NAME --region=REGION --project=PROJECT_ID
```

Replace `DEPLOYMENT_NAME`, `REGION`, and `PROJECT_ID` with the values used in the blueprint.
2. Apply the generated FIO Job manifest, whose path is provided in the final deployment instructions:

```sh
kubectl apply -f <path/to/fio-benchmark.yaml>
```

The job created in the cluster will be named `fio-benchmark`.
3. Monitor the job until it completes and list the pods:

```bash
kubectl get jobs
kubectl get pods
```
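If you prefer not to poll, `kubectl wait` can block until the job finishes; the 600-second timeout here is an arbitrary choice, not something the blueprint specifies:

```sh
kubectl wait --for=condition=complete job/fio-benchmark --timeout=600s
```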
4. View the logs of the completed pod to check the benchmark results:

```bash
kubectl logs <pod-name>
```

The pod's logs verify that the disk mounted successfully and report a mixed I/O test that validates the disk's provisioned performance.
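fio's `--group_reporting` summary includes per-direction `IOPS=` figures. A quick way to pull them out of captured logs is sketched below; the sample log is synthetic (the numbers are made up for illustration) but mimics fio's summary format:

```shell
# Write a synthetic fio summary (values are illustrative only).
cat > /tmp/fio-summary.log <<'EOF'
hyperdisk-balanced-iops: (groupid=0, jobs=16): err= 0
  read: IOPS=7996, BW=31.2MiB/s (32.7MB/s)(9375MiB/300001msec)
  write: IOPS=8004, BW=31.3MiB/s (32.8MB/s)(9384MiB/300001msec)
EOF

# Extract direction/IOPS pairs from the IOPS= fields.
awk '/IOPS=/ {
  for (i = 1; i <= NF; i++)
    if ($i ~ /^IOPS=/) {
      v = $i; sub(/^IOPS=/, "", v); sub(/,$/, "", v)
      n = $1; sub(/:$/, "", n)
      print n, v
    }
}' /tmp/fio-summary.log
```

In practice you would feed it the `kubectl logs <pod-name>` output instead of the sample file.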

community/examples/gke-tpu-v6/gke-tpu-v6-advanced.yaml

Lines changed: 73 additions & 0 deletions
@@ -170,6 +170,7 @@ deployment_groups:
configure_workload_identity_sa: true
enable_gcsfuse_csi: true
enable_managed_lustre_csi: true
enable_persistent_disk_csi: true # enable Hyperdisk for the cluster
master_authorized_networks:
- cidr_block: $(vars.authorized_cidr) # Allows your machine to run the kubectl command. Required for multi network setup.
  display_name: "kubectl-access-network"
@@ -397,3 +398,75 @@ deployment_groups:
rm -rf /{training,checkpoint}-data/fio-benchmarks-${TAG}

outputs: [instructions]
# # --- HYPERDISK BALANCED ADDITIONS ---
# # To enable Hyperdisk Balanced support, uncomment the hyperdisk-balanced-setup and fio-bench-job-hyperdisk modules below.
# # Set up a storage class and persistent volume claim for Hyperdisk
# - id: hyperdisk-balanced-setup
#   source: modules/file-system/gke-storage
#   use: [gke-tpu-v6-cluster]
#   settings:
#     storage_type: Hyperdisk-balanced
#     access_mode: ReadWriteOnce
#     sc_volume_binding_mode: Immediate
#     sc_reclaim_policy: Delete
#     sc_topology_zones: [$(vars.zone)]
#     pvc_count: 1
#     capacity_gb: 100

# # This is an example job that installs and runs an `fio` benchmark against the Hyperdisk volumes.
# # For more FIO tests, see https://cloud.google.com/compute/docs/disks/benchmark-hyperdisk-performance
# - id: fio-bench-job-hyperdisk
#   source: modules/compute/gke-job-template
#   use:
#   - gke-tpu-v6-pool
#   - hyperdisk-balanced-setup
#   settings:
#     name: fio-benchmark
#     image: ubuntu:latest
#     security_context: # ensure the job has enough permissions to install the fio packages
#     - key: runAsUser
#       value: 0
#     - key: runAsGroup
#       value: 100
#     - key: fsGroup
#       value: 100
#     command:
#     - bash
#     - -c
#     - |
#       set -eux

#       cleanup() {
#         # This function is called on script exit
#         if [ -n "${TAG:-}" ]; then
#           echo "--- Cleaning up temporary directories for tag ${TAG} ---"
#           rm -rf "/data/hyperdisk-balanced-pvc-0/fio-benchmarks-${TAG}"
#         fi
#       }
#       trap cleanup EXIT

#       export DEBIAN_FRONTEND=noninteractive

#       # Install fio
#       apt update -y && apt install -y fio

#       # Use a tag to create a unique path for tests
#       TAG=$(date +%s)

#       # Verify mountpoints
#       df -h
#       mountpoint /data/hyperdisk-balanced-pvc-0

#       # Create a temporary directory for fio benchmarks
#       mkdir -p "/data/hyperdisk-balanced-pvc-0/fio-benchmarks-${TAG}"

#       # Perform the Hyperdisk Balanced performance (mixed IOPS) test
#       fio --name=hyperdisk-balanced-iops --ioengine=libaio --iodepth=256 --rw=randrw \
#         --bs=4k --direct=1 --size=10G --numjobs=16 --group_reporting --time_based --runtime=300s \
#         --ramp_time=10s --iodepth_batch_submit=256 --iodepth_batch_complete_max=256 \
#         --directory="/data/hyperdisk-balanced-pvc-0/fio-benchmarks-${TAG}" --filename_format=fiotest-balanced-iops
#     node_count: 1

#   outputs: [instructions]
