16 changes: 2 additions & 14 deletions resources/benchmarks/README.md
@@ -29,28 +29,16 @@ To run the benchmark test for Kueue:
./scripts/benchmarks/gang-scheduling/run-kueue.sh
```

To run the benchmark test for Run:ai

```bash
./scripts/benchmarks/gang-scheduling/run-runai.sh
```

## Scaling Benchmark Test

The scaling benchmark workflow operates on 700 virtual GPU nodes with tho workflows. The first [workflow](scaling/workflows/run-test-multi.yaml) submits is a job with 700 replicas, the second [workflow](scaling/workflows/run-test-single.yaml) submits a batch of 700 single-node jobs.
The scaling benchmark workflow operates on 700 virtual GPU nodes. The [workflow](scaling/workflows/run-test.yaml) submits a batch of 700 single-node jobs.

### Example

To run the benchmark test for Volcano:

```bash
./bin/knavigator -workflow 'resources/benchmarks/scaling/workflows/{config-nodes.yaml,config-volcano.yaml,run-test-multi.yaml}'
```

To run the benchmark test for Run:ai

```bash
./bin/knavigator -workflow 'resources/benchmarks/scaling/workflows/{config-nodes.yaml,config-runai.yaml,runai-test-single.yaml}'
./scripts/benchmarks/scaling/run-volcano.sh
```

## Network Topology Benchmark Test
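The knavigator commands in the README and in the `run-*.sh` scripts both quote the `-workflow` argument (single quotes in the README, double quotes in the scripts). Because of the quotes, bash hands the whole `{...}` pattern to knavigator as one literal string instead of expanding it, so the braces are resolved by knavigator rather than by the shell. The short bash sketch below illustrates the difference; the `count_args` helper is illustrative only and is not part of the repository.

```shell
#!/usr/bin/env bash
# Shows why the -workflow pattern is quoted: quoted, the brace pattern reaches
# the program as ONE literal argument; unquoted, bash expands it into three.
set -e

# Illustrative helper (not in the repository): prints how many arguments it got.
count_args() { echo "$#"; }

dir=resources/benchmarks/scaling/workflows

# Quoted, as in the README and the run-*.sh scripts: braces are not expanded.
quoted=$(count_args "$dir/{config-nodes.yaml,config-volcano.yaml,run-test.yaml}")

# Unquoted: bash brace expansion produces three separate paths.
unquoted=$(count_args "$dir"/{config-nodes.yaml,config-volcano.yaml,run-test.yaml})

echo "quoted: $quoted argument(s)"     # quoted: 1 argument(s)
echo "unquoted: $unquoted argument(s)" # unquoted: 3 argument(s)
```

Brace expansion is a bash feature; a POSIX `sh` such as dash leaves the unquoted pattern unexpanded too, so the sketch should be run with `bash`.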
@@ -39,6 +39,6 @@ tasks:
- name: sandbox
submitacl: '*'
resources:
max:
{memory: 36Gi, vcore: 8000m, nvidia.com/gpu: 256}
guaranteed: {memory: 36Gi, vcore: 8000m, nvidia.com/gpu: 256}
max: {memory: 36Gi, vcore: 8000m, nvidia.com/gpu: 256}
timeout: 1m
2 changes: 1 addition & 1 deletion resources/benchmarks/scaling/workflows/config-runai.yaml
@@ -15,7 +15,7 @@
name: config-runai
description: register, deploy and configure run:ai custom resources
tasks:
- id: register-trainingworkload
- id: register
type: RegisterObj
params:
template: "resources/benchmarks/templates/runai/trainingworkload.yaml"
25 changes: 0 additions & 25 deletions resources/benchmarks/scaling/workflows/run-test-multi.yaml

This file was deleted.

@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

name: test-scaling-single-node-jobs
name: test-scaling
description: deploy 700 single-replica jobs
tasks:
- id: job
25 changes: 0 additions & 25 deletions resources/benchmarks/scaling/workflows/runai-test-multi.yaml

This file was deleted.

24 changes: 0 additions & 24 deletions resources/benchmarks/scaling/workflows/runai-test-single.yaml

This file was deleted.

2 changes: 1 addition & 1 deletion scripts/benchmarks/scaling/run-kai.sh
@@ -18,4 +18,4 @@ set -e

REPO_HOME=$(readlink -f $(dirname $(readlink -f "$0"))/../../../)

$REPO_HOME/bin/knavigator -workflow "$REPO_HOME/resources/benchmarks/scaling/workflows/{config-nodes.yaml,config-kai.yaml,run-test-single.yaml}"
$REPO_HOME/bin/knavigator -workflow "$REPO_HOME/resources/benchmarks/scaling/workflows/{config-nodes.yaml,config-kai.yaml,run-test.yaml}"
2 changes: 1 addition & 1 deletion scripts/benchmarks/scaling/run-kueue.sh
@@ -18,4 +18,4 @@ set -e

REPO_HOME=$(readlink -f $(dirname $(readlink -f "$0"))/../../../)

$REPO_HOME/bin/knavigator -workflow "$REPO_HOME/resources/benchmarks/scaling/workflows/{config-nodes.yaml,config-kueue.yaml,run-test-single.yaml}"
$REPO_HOME/bin/knavigator -workflow "$REPO_HOME/resources/benchmarks/scaling/workflows/{config-nodes.yaml,config-kueue.yaml,run-test.yaml}"
2 changes: 1 addition & 1 deletion scripts/benchmarks/scaling/run-volcano.sh
@@ -18,4 +18,4 @@ set -e

REPO_HOME=$(readlink -f $(dirname $(readlink -f "$0"))/../../../)

$REPO_HOME/bin/knavigator -workflow "$REPO_HOME/resources/benchmarks/scaling/workflows/{config-nodes.yaml,config-volcano.yaml,run-test-single.yaml}"
$REPO_HOME/bin/knavigator -workflow "$REPO_HOME/resources/benchmarks/scaling/workflows/{config-nodes.yaml,config-volcano.yaml,run-test.yaml}"
2 changes: 1 addition & 1 deletion scripts/benchmarks/scaling/run-yunikorn.sh
@@ -18,4 +18,4 @@ set -e

REPO_HOME=$(readlink -f $(dirname $(readlink -f "$0"))/../../../)

$REPO_HOME/bin/knavigator -workflow "$REPO_HOME/resources/benchmarks/scaling/workflows/{config-nodes.yaml,config-yunikorn.yaml,run-test-single.yaml}"
$REPO_HOME/bin/knavigator -workflow "$REPO_HOME/resources/benchmarks/scaling/workflows/{config-nodes.yaml,config-yunikorn.yaml,run-test.yaml}"
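After this change the four scaling scripts (`run-kai.sh`, `run-kueue.sh`, `run-volcano.sh`, `run-yunikorn.sh`) all use `run-test.yaml` and differ only in the scheduler-specific config file. As a minimal sketch, assuming a hypothetical `list_scaling_workflows` helper that is not part of the repository, the three workflow files each script refers to can be printed with bash brace expansion:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): prints the three workflow files
# that scripts/benchmarks/scaling/run-<scheduler>.sh refers to after this change.
set -e

list_scaling_workflows() {
  local repo_home="$1" scheduler="$2"
  local dir="$repo_home/resources/benchmarks/scaling/workflows"
  # Unquoted braces: bash expands them here, one path per line, in listed order.
  printf '%s\n' "$dir"/{config-nodes.yaml,"config-$scheduler.yaml",run-test.yaml}
}

# Print the file list for every scheduler that has a scaling script.
for scheduler in kai kueue volcano yunikorn; do
  echo "== $scheduler"
  list_scaling_workflows /path/to/knavigator "$scheduler"
done
```

The helper only prints paths and never invokes knavigator. Unlike the scripts, which derive `REPO_HOME` from `$0` with `readlink -f`, it takes the repository root as an argument.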