This automation deploys a full environment on Oracle Kubernetes Engine (OKE) to demonstrate data locality between Apache Cassandra and Apache Spark using pod affinity and node labeling. The setup ensures Spark reads data from colocated Cassandra pods, reducing cross-node traffic.
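As a rough sketch of the mechanism, the worker nodes carry a spark-locality label and both workloads are constrained to those nodes (the node names below are placeholders, and the exact label value used by the automation may differ):
# Label the nodes that should host the colocated Cassandra and Spark pods
kubectl label node <worker-node-1> spark-locality=true
kubectl label node <worker-node-2> spark-locality=true
# Pods whose affinity rules match this label are scheduled only onto these nodes
kubectl get nodes -l spark-locality=true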
Using Terraform, the stack provisions:
Network Module
- VCN (Virtual Cloud Network) (unless using an existing one)
- CIDR block configurable via VCN_CIDR
- Internet Gateway, NAT Gateway, Service Gateway
- Subnets
- Public subnet (edge) for the bastion host
- Private subnet for worker nodes
- Route Tables
- Public subnet with route to Internet Gateway
- Private subnet with route to NAT and Service Gateway
- Security lists
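Once provisioned, the network pieces can be double-checked from the bastion with the OCI CLI if you want to confirm them (OCIDs are placeholders):
oci network vcn list --compartment-id <compartment-ocid>
oci network subnet list --compartment-id <compartment-ocid> --vcn-id <vcn-ocid>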
OKE Module
- OKE cluster
- Configurable Kubernetes version (e.g. v1.33.1)
- Private control plane by default
- OKE Node Pool
- 3 worker nodes
- Flex shape support (configurable OCPUs and memory)
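Cluster access is configured on the bastion automatically; if a kubeconfig ever needs to be regenerated there, something along these lines should work (cluster OCID and region are placeholders; the private endpoint is used because the control plane is private by default):
oci ce cluster create-kubeconfig --cluster-id <cluster-ocid> --file $HOME/.kube/config --region <region> --kube-endpoint PRIVATE_ENDPOINT --auth instance_principal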
Bastion Module
- Compute instance
- Public IP for SSH access
- Automatically installs:
- kubectl, Helm, OCI CLI, Python 3.9 (via venv)
- Cloud-native tools configured for instance principal auth
- Cloud-init script executes the full demo:
- Installs K8ssandra Operator (v1.7.1)
- Deploys a 2-node Cassandra cluster
- Applies node affinity (spark-locality) to place Cassandra on labeled nodes
- Initializes test data in Cassandra
- Deploys Spark master + 2 workers
- Runs a Spark job that reads from Cassandra and outputs results (see the quick status checks after this list)
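While cloud-init is still working, a few quick checks from the bastion show how far the demo has progressed (the namespace and job name below are the ones used elsewhere in this README):
kubectl get k8ssandraclusters -A                  # Cassandra cluster status
kubectl get pods -n spark -o wide                 # Spark master and workers
kubectl get job spark-read-cassandra -n spark     # shows COMPLETIONS once the read job has run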
Prerequisites
- An OCI tenancy and a compartment
- Policies for Instance Principal access, for example:
Allow dynamic-group <your-dynamic-group> to manage cluster-family in compartment <compartment>
Allow dynamic-group <your-dynamic-group> to manage virtual-network-family in compartment <compartment>
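The dynamic group referenced in these policies is assumed to exist already; a typical matching rule that captures the instances in the target compartment looks like this (compartment OCID is a placeholder):
ALL {instance.compartment.id = '<compartment-ocid>'}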
Option 1: Deploy via OCI Resource Manager
Click below to open the stack in the OCI Console:
Follow the guided flow to:
- Select your compartment
- Configure the VCN, cluster name, and node shapes
- Launch the stack
Option 2: Deploy via Terraform CLI
- Install prerequisites: Terraform CLI and OCI API credentials for your user.
- Prepare your variables file: rename the provided terraform.tfvars.template to terraform.tfvars (an example is shown after the variable list below), then edit terraform.tfvars and fill in your own values for:
- user_ocid
- fingerprint
- tenancy_ocid
- region
- compartment_ocid
- ssh_provided_public_key
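For reference, a filled-in terraform.tfvars has roughly this shape (all values are placeholders):
user_ocid               = "ocid1.user.oc1..<placeholder>"
fingerprint             = "<api-key-fingerprint>"
tenancy_ocid            = "ocid1.tenancy.oc1..<placeholder>"
region                  = "<region-identifier>"
compartment_ocid        = "ocid1.compartment.oc1..<placeholder>"
ssh_provided_public_key = "ssh-rsa AAAA... <placeholder>"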
- Run Terraform:
terraform init
terraform plan
terraform apply
Note: The initial infrastructure provisioning completes within about 15 minutes, but the full setup (via cloud-init on the bastion) takes about 20 minutes to install Helm, deploy Cassandra and Spark, and run the read job.
To monitor the process, SSH into the bastion and run:
tail -f /var/log/oke-automation.log
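If the log is quiet, cloud-init's own status shows whether the first-boot run is still in progress:
sudo cloud-init status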
After deployment completes:
- SSH into the bastion (public IP available in the OCI Console)
- Run kubectl get nodes and kubectl get pods -A -o wide to observe:
- 2 Cassandra pods scheduled on the 2 labeled nodes
- 2 Spark workers colocated on the same nodes as Cassandra
- The 3rd OKE node remains unused (no Spark/Cassandra workload)
- Run this to see the Spark read output:
kubectl logs job/spark-read-cassandra -n spark
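If the job has not finished yet, you can wait for it to complete before reading the logs:
kubectl wait --for=condition=complete job/spark-read-cassandra -n spark --timeout=300s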
To confirm that the demo is working as expected:
- VCN Flow Logs
- Enable Flow Logs on the worker subnet (via OCI Console)
- Check the Cassandra pod traffic. You should not see inter-node traffic to the unused node, since Spark is reading from Cassandra pods on the same nodes.
- kubectl output
Check pod placement:
kubectl get pods -A -o wide
kubectl get nodes --show-labels
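To narrow the placement view to a single node (node name is a placeholder), a field selector also works:
kubectl get pods -A -o wide --field-selector spec.nodeName=<labeled-node-name>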
Implementation details
- Cassandra is deployed using the K8ssandra Operator v1.7.1
- Data is written to PVCs via the OCI Block Volume storage class
- Spark reads via the DataStax Spark Cassandra Connector with token-aware routing
- The Spark job includes:
- A PySpark script reading from testks.users
- Packaged via a ConfigMap and run as a Kubernetes Job
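The submit command for the read job has roughly this shape; the master URL, script path, Cassandra service host, and connector version are illustrative placeholders rather than the exact values baked into the automation:
spark-submit \
  --master spark://<spark-master-service>:7077 \
  --packages com.datastax.spark:spark-cassandra-connector_2.12:<connector-version> \
  --conf spark.cassandra.connection.host=<cassandra-service> \
  <path-to-read-script>.py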
Before destroying the stack, it's recommended to clean up Kubernetes resources to ensure no pods or CRDs block the node pool or namespace deletion:
# Uninstall Helm releases
helm uninstall k8ssandra-operator -n k8ssandra-operator || true
helm uninstall cert-manager -n cert-manager || true
# Delete namespaces (and wait for resources to terminate)
kubectl delete namespace spark k8ssandra-operator cert-manager --ignore-not-found --wait=true
# Delete CRDs to avoid lingering finalizers
kubectl delete crd k8ssandraclusters.k8ssandra.io --ignore-not-found
Once cleanup completes, you can safely destroy the stack:
terraform destroy
Or use OCI Resource Manager to destroy the stack from the console.