Commit 39ed040

update tests (#111)
Signed-off-by: Dmitry Shmulevich <[email protected]>
1 parent bf6f029 commit 39ed040

10 files changed: +18 -27 lines changed

docs/examples/kueue/kueue.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 Install `kueue` by following these [instructions](https://kueue.sigs.k8s.io/docs/installation/):
 
 ```bash
-KUEUE_VERSION=v0.8.0
+KUEUE_VERSION=v0.9.0
 kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/${KUEUE_VERSION}/manifests.yaml
 
 kubectl apply -f charts/overrides/kueue/priority.yaml

resources/benchmarks/README.md

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ To run the benchmark test for Run:ai
 
 ## Scaling Benchmark Test
 
-The scaling benchmark workflow operates on 500 virtual GPU nodes with tho workflows. The first [workflow](scaling/workflows/run-test-multi.yaml) submits is a job with 500 replicas, the second [workflow](scaling/workflows/run-test-single.yaml) submits a batch of 500 single-node jobs.
+The scaling benchmark workflow operates on 700 virtual GPU nodes with tho workflows. The first [workflow](scaling/workflows/run-test-multi.yaml) submits is a job with 700 replicas, the second [workflow](scaling/workflows/run-test-single.yaml) submits a batch of 700 single-node jobs.
 
 ### Example
 

resources/benchmarks/nwtopo/templates/runai/mpijob.yaml

Lines changed: 0 additions & 9 deletions
@@ -43,15 +43,6 @@ spec:
 affinity:
 podAffinity:
 preferredDuringSchedulingIgnoredDuringExecution:
-- weight: 20
-podAffinityTerm:
-labelSelector:
-matchExpressions:
-- key: app
-operator: In
-values:
-- {{._NAME_}}
-topologyKey: net-layer-3
 - weight: 70
 podAffinityTerm:
 labelSelector:

resources/benchmarks/scaling/workflows/config-nodes.yaml

Lines changed: 2 additions & 2 deletions
@@ -13,14 +13,14 @@
 # limitations under the License.
 
 name: config-nodes
-description: create 500 virtual GPU nodes
+description: create 700 virtual GPU nodes
 tasks:
 - id: configure
   type: Configure
   params:
     nodes:
     - type: dgxa100.80g
-      count: 500
+      count: 700
       labels:
         nvidia.com/gpu.count: "8"
     timeout: 5m

resources/benchmarks/scaling/workflows/config-yunikorn.yaml

Lines changed: 1 addition & 1 deletion
@@ -40,5 +40,5 @@ tasks:
 submitacl: '*'
 resources:
 max:
-{memory: 360Gi, vcore: 50000m, nvidia.com/gpu: 4000}
+{memory: 360Gi, vcore: 70000m, nvidia.com/gpu: 5600}
 timeout: 1m
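
The new queue limits track the node count set in `config-nodes.yaml`: 700 nodes labelled `nvidia.com/gpu.count: "8"` give 5600 GPUs, just as 500 nodes gave 4000. The `vcore` limit works out to 100m per node in both versions (50000m / 500 and 70000m / 700), while `memory` is unchanged. A minimal sketch of that arithmetic follows; the function names are hypothetical and not part of the repository, and the per-node figures are inferred from the values in this diff rather than stated by the commit.

```bash
# Hypothetical helpers (not in the repository) that reproduce the
# quota values in this diff from the node count.

# Total GPUs = nodes * GPUs per node.
gpu_quota() {
  echo $(( $1 * $2 ))
}

# Total vcore in millicores = nodes * millicores per node.
# The 100m-per-node figure used below is inferred from the diff.
vcore_quota_millis() {
  echo "$(( $1 * $2 ))m"
}

gpu_quota 700 8            # prints 5600
vcore_quota_millis 700 100 # prints 70000m
```

Running the same helpers with 500 nodes gives the previous limits, 4000 and 50000m.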

resources/benchmarks/scaling/workflows/run-test-multi.yaml

Lines changed: 3 additions & 3 deletions
@@ -13,13 +13,13 @@
 # limitations under the License.
 
 name: test-scaling-multi-node-job
-description: deploy a 500-replicas job
+description: deploy a 700-replicas job
 tasks:
 - id: job
   type: SubmitObj
   params:
     refTaskId: register
     count: 1
     params:
-      replicas: 500
-      ttl: 2m
+      replicas: 700
+      ttl: 5m

resources/benchmarks/scaling/workflows/run-test-single.yaml

Lines changed: 3 additions & 3 deletions
@@ -13,13 +13,13 @@
 # limitations under the License.
 
 name: test-scaling-single-node-jobs
-description: deploy 500 single-replica jobs
+description: deploy 700 single-replica jobs
 tasks:
 - id: job
   type: SubmitObj
   params:
     refTaskId: register
-    count: 500
+    count: 700
     params:
       replicas: 1
-      ttl: 2m
+      ttl: 5m

resources/benchmarks/scaling/workflows/runai-test-multi.yaml

Lines changed: 3 additions & 3 deletions
@@ -13,13 +13,13 @@
 # limitations under the License.
 
 name: test-scaling
-description: deploy a 500-replicas job
+description: deploy a 700-replicas job
 tasks:
 - id: job
   type: SubmitObj
   params:
     refTaskId: register-mpi
     count: 1
     params:
-      workers: 499
-      ttl: 2m
+      workers: 699
+      ttl: 5m
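
The description says 700 replicas while `workers` is set to 699 (and 499 for the earlier 500-replica job). One plausible reading, which the commit itself does not state, is that the MPI job's launcher accounts for the remaining replica. Under that assumption, the worker count could be derived as in this sketch; `mpi_workers` is a hypothetical helper, not part of the repository.

```bash
# Hypothetical helper, not part of the repository.
# Assumption: a job of N replicas runs 1 launcher plus N-1 workers.
mpi_workers() {
  if [ "$1" -lt 1 ]; then
    echo "replica count must be at least 1" >&2
    return 1
  fi
  echo $(( $1 - 1 ))
}

mpi_workers 700   # prints 699
mpi_workers 500   # prints 499
```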

resources/benchmarks/scaling/workflows/runai-test-single.yaml

Lines changed: 3 additions & 3 deletions
@@ -13,12 +13,12 @@
 # limitations under the License.
 
 name: test-scaling
-description: deploy 500 single-replica jobs
+description: deploy 700 single-replica jobs
 tasks:
 - id: job
   type: SubmitObj
   params:
     refTaskId: register-trainingworkload
-    count: 500
+    count: 700
     params:
-      ttl: 2m
+      ttl: 5m

scripts/env.sh

Lines changed: 1 addition & 1 deletion
@@ -131,7 +131,7 @@ function deploy_jobset() {
 }
 
 # https://github.com/kubernetes-sigs/kueue
-KUEUE_VERSION=v0.8.1
+KUEUE_VERSION=v0.9.0
 
 function deploy_kueue() {
   printGreen Deploying kueue
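
Before this commit, `docs/examples/kueue/kueue.md` pinned `v0.8.0` while `scripts/env.sh` pinned `v0.8.1`; both now pin `v0.9.0`. A check along the following lines could catch that kind of drift. It is only a sketch: both function names are hypothetical, and it assumes each file carries a plain `KUEUE_VERSION=<value>` assignment at the start of a line, as the two files in this diff do.

```bash
# Hypothetical helpers, not part of the repository.

# Print the value of the first KUEUE_VERSION=<value> assignment in a file.
kueue_version_in() {
  sed -n 's/^[[:space:]]*KUEUE_VERSION=\([^[:space:]]*\).*$/\1/p' "$1" | head -n 1
}

# Succeed only if every file given pins the same version.
# The body runs in a subshell so its variables do not leak to the caller.
same_kueue_version() (
  first=""
  for f in "$@"; do
    v=$(kueue_version_in "$f")
    if [ -z "$v" ]; then
      echo "no KUEUE_VERSION found in $f" >&2
      exit 1
    fi
    if [ -z "$first" ]; then
      first=$v
    elif [ "$v" != "$first" ]; then
      echo "mismatch: $f pins $v, expected $first" >&2
      exit 1
    fi
  done
)
```

From the repository root it might be invoked as `same_kueue_version docs/examples/kueue/kueue.md scripts/env.sh`.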
