Run:ai on Brev

Ansible playbooks to deploy self-hosted Run:ai on a GPU-accelerated Kubernetes cluster on Brev instances.

Instance Selection

You need at least 24 total CPUs across your cluster. Example configurations:

| Setup   | Control Plane | Workers        | Total CPUs |
|---------|---------------|----------------|------------|
| 3 nodes | 1x 8 CPU      | 2x 8 CPU       | 24         |
| 2 nodes | 1x 12 CPU     | 1x 12 CPU      | 24         |
| Minimal | 1x 8 CPU      | 2x 8 CPU (GPU) | 24         |

Recommended: Use the same cloud platform and region for all nodes to ensure network compatibility.
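
Before moving on, it can help to confirm each instance actually reports the expected CPU count. A minimal check over SSH (the IPs below are placeholders, and it assumes the default ubuntu user):

# Print the CPU count reported by each node (replace the placeholder IPs)
for ip in <control-plane-ip> <worker-1-ip> <worker-2-ip>; do
  echo -n "$ip: "
  ssh ubuntu@"$ip" nproc
done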

Network Setup

Open these ports in Brev UI before setup:

| Node          | Ports         |
|---------------|---------------|
| Control Plane | 6443, 80, 443 |
| Workers       | 10250         |
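
After opening the ports, a quick reachability check can catch firewall mistakes early. A sketch using netcat (assumes nc is available and that you substitute the real IPs):

# From a worker (or your laptop), check the control plane's API and HTTPS ports
nc -vz <control-plane-ip> 6443
nc -vz <control-plane-ip> 443
# From the control plane, check a worker's kubelet port
nc -vz <worker-ip> 10250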

Quick Start

1. Install Dependencies (on Control Plane)

ssh ubuntu@<control-plane-ip>
sudo apt update && sudo apt install -y git ansible python3-pip
sudo snap install yq
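
A quick version check confirms everything landed on the PATH before continuing:

# Verify the required tooling installed correctly
git --version
ansible --version
yq --version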

2. Clone Repository

git clone https://github.com/chloecrozier/runai_on_brev.git
cd runai_on_brev/

3. Configure

cp config.yaml.example config.yaml
nano config.yaml

Fill in these required values:

Tip: Run brev shell <instance-name> then hostname -I to get internal IPs.

all:
  hosts:
    runai-control-plane:
      ansible_host: ""      # Control plane internal IP
      external_ip: ""       # Control plane external IP listed in the Brev UI
      node_role: control
    runai-worker-01:
      ansible_host: ""      # Worker internal IP
      node_role: worker
    runai-worker-02:
      ansible_host: ""      # Worker internal IP
      node_role: worker
  vars:
    runai_jfrog_token: ""   # Your JFrog token
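
Before deploying, it can help to confirm the file parses and nothing was left blank. A sketch using the snap-installed yq (v4 syntax):

# List the configured hosts and flag any with an empty ansible_host
yq '.all.hosts | keys' config.yaml
yq '.all.hosts[] | select(.ansible_host == "")' config.yaml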

4. Deploy

ansible-playbook -i config.yaml deployment/bring_up_cluster.yaml
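
If the playbook fails to reach a host, a connectivity check can separate SSH problems from playbook problems (this assumes key-based SSH as the ubuntu user):

# Confirm Ansible can reach every host defined in config.yaml
ansible -i config.yaml all -m ping -u ubuntu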

5. Access Run:ai UI

On your local computer, add the hosts entry (the sed -i '' syntax below is for macOS; on Linux, drop the empty quotes):

sudo sed -i '' '/runai.local/d' /etc/hosts && sudo bash -c 'echo "<EXTERNAL_IP> runai.local" >> /etc/hosts'

Then visit: https://runai.local

Default credentials: [email protected] / Abcd!234
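
If the page doesn't load, two quick checks from your local machine usually narrow it down (the -k flag skips TLS verification, assuming the control plane serves a certificate your machine does not yet trust):

# Confirm the hosts entry exists and the ingress answers
grep runai.local /etc/hosts
curl -k -s -o /dev/null -w '%{http_code}\n' https://runai.local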

6. Register the Cluster

In the Run:ai UI:

  1. Go to Clusters → New Cluster
  2. Enter a cluster name
  3. Select "Run:ai control plane is on the same cluster"
  4. Enter cluster URL: https://runai.local
  5. Copy the Helm command and add --set global.customCA.enabled=true:
helm upgrade -i runai-cluster runai/runai-cluster -n runai \
  --set controlPlane.url=runai.local \
  --set controlPlane.clientSecret=<FROM_UI> \
  --set cluster.uid=<FROM_UI> \
  --set cluster.url=runai.local \
  --version="<FROM_UI>" \
  --create-namespace \
  --set global.customCA.enabled=true  # <-- ADD THIS LINE
  6. Watch pods come up:
watch -n 2 'kubectl get pods -n runai'
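
Once the pods are running, you can confirm Kubernetes sees the GPUs. This sketch assumes the NVIDIA device plugin or GPU Operator exposes the standard nvidia.com/gpu resource:

# Show per-node GPU capacity as reported to Kubernetes
kubectl get nodes -o custom-columns='NODE:.metadata.name,GPUS:.status.capacity.nvidia\.com/gpu'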

Adding More Workers

  1. Add to config.yaml:

    runai-worker-03:
      ansible_host: <new-ip>
      node_role: worker
  2. Generate a fresh join token on the control plane:

    sudo kubeadm token create --print-join-command | sudo tee /root/kubeadm_join.sh
  3. Run the deployment for just the new worker:

    ansible-playbook -i config.yaml deployment/bring_up_cluster.yaml --limit runai-worker-03
  4. Restart Run:ai to detect new GPUs:

    kubectl rollout restart deployment -n runai runai-agent
    kubectl rollout restart deployment -n runai metrics-exporter
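
After the playbook and restarts finish, a quick check confirms the new worker joined and the Run:ai pods came back up:

# The new worker should appear as Ready and the runai pods should be Running
kubectl get nodes
kubectl get pods -n runai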
