-
Notifications
You must be signed in to change notification settings - Fork 124
[ci] Refactor CI files to use new CI pipeline generator #246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| name: vllm_omni_ci | ||
| github_repo_name: vllm-project/vllm-omni | ||
| job_dirs: | ||
| - ".buildkite/jobs" | ||
| run_all_patterns: [] | ||
| run_all_exclude_patterns: [] | ||
| registries: public.ecr.aws/q9t5s3a7 | ||
| repositories: | ||
| main: "vllm-ci-postmerge-repo" | ||
| premerge: "vllm-ci-test-repo" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| group: Build | ||
| steps: | ||
| - label: ":docker: build vllm-omni-ci image" | ||
| key: image-build | ||
| depends_on: [] | ||
| commands: | ||
| - "aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7" | ||
| - "docker build --file docker/Dockerfile.ci -t vllm-omni-ci ." | ||
| - "docker tag vllm-omni-ci public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT" | ||
| - "docker push public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,30 @@ | ||
| group: Tests | ||
| depends_on: | ||
| - image-build | ||
| steps: | ||
| - label: "Simple Unit Test" | ||
| commands: | ||
| - ".buildkite/scripts/simple_test.sh" | ||
| no_gpu: true | ||
| no_plugin: true | ||
|
|
||
| - label: "Diffusion Model Test" | ||
| timeout_in_minutes: 15 | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is the timeout applied per label or per command?
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's per job/label |
||
| commands: | ||
| - pytest -s -v tests/single_stage/test_diffusion_model.py | ||
|
Comment on lines
+11
to
+14
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The GPU test steps are now plain command invocations without the docker or Kubernetes plugins that previously ran them inside Useful? React with 👍 / 👎. |
||
|
|
||
| - label: "Omni Model Test" | ||
| timeout_in_minutes: 15 | ||
| num_gpus: 4 | ||
| commands: | ||
| - export VLLM_LOGGING_LEVEL=DEBUG | ||
| - export VLLM_WORKER_MULTIPROC_METHOD=spawn | ||
| - pytest -s -v tests/multi_stages/ | ||
|
|
||
| - label: "Omni Model Test with H100" | ||
| timeout_in_minutes: 20 | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We used to set the timeout to 15 minutes. @ywang96 Do you agree we set 20 minutes for testing on H100?
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The timeout is already 20 minutes on main branch https://github.com/vllm-project/vllm-omni/blob/main/.buildkite/pipeline.yml#L55
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Oops, my mistake! Thanks for catching that. |
||
| gpu: h100 | ||
| num_gpus: 2 | ||
| commands: | ||
| - export VLLM_WORKER_MULTIPROC_METHOD=spawn | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we also need to set the logging level here, align with the Omni Model Test? |
||
| - pytest -s -v tests/multi_stages_h100/ | ||
This file was deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this empty file mandatory for the Buildkite test?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya it's used as an indicator whether a branch has the new refactored changes or not, to route CI bootstrap step to use the correct pipeline generator. The new pipeline generator wouldn't work with the old yaml file, and vice versa.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the elaboration, very clear.