Empty file added .buildkite/.pipeline_gen_v2
Contributor

Is this empty file mandatory for the Buildkite test?

Collaborator Author
@khluu, Dec 9, 2025

Yes, it's used as an indicator of whether a branch has the new refactored changes, so the CI bootstrap step can route to the correct pipeline generator. The new pipeline generator wouldn't work with the old YAML file, and vice versa.

Contributor
@congw729, Dec 10, 2025

> Yes, it's used as an indicator of whether a branch has the new refactored changes, so the CI bootstrap step can route to the correct pipeline generator. The new pipeline generator wouldn't work with the old YAML file, and vice versa.

Thanks for the elaboration, very clear.
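
For readers following the thread, the marker-file routing described above could look roughly like the sketch below. This is only an illustration: the function name `choose_pipeline_generator` and the `v2`/`legacy` labels are hypothetical and are not taken from the actual bootstrap script, which is not part of this diff.

```shell
#!/bin/sh
# Sketch only: choose_pipeline_generator and the "v2"/"legacy" labels are
# hypothetical names, not taken from the real CI bootstrap step.
choose_pipeline_generator() {
    repo_root="${1:-.}"
    # The empty marker file signals that the branch carries the refactored layout.
    if [ -f "$repo_root/.buildkite/.pipeline_gen_v2" ]; then
        echo "v2"
    else
        echo "legacy"
    fi
}

# Example: report which generator the current checkout would use.
choose_pipeline_generator "."
```

Checking for a file rather than parsing the pipeline definition keeps the decision cheap and lets old and new branches coexist while the migration is in progress.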

Empty file.
10 changes: 10 additions & 0 deletions .buildkite/ci_config.yaml
@@ -0,0 +1,10 @@
name: vllm_omni_ci
github_repo_name: vllm-project/vllm-omni
job_dirs:
  - ".buildkite/jobs"
run_all_patterns: []
run_all_exclude_patterns: []
registries: public.ecr.aws/q9t5s3a7
repositories:
  main: "vllm-ci-postmerge-repo"
  premerge: "vllm-ci-test-repo"
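
As an illustration of how the `registries` and `repositories` values might combine into a full image reference, here is a minimal sketch. It assumes that builds on `main` use the `main` repository and all other branches use `premerge`; that mapping, and the helper name `image_ref`, are assumptions rather than something this diff confirms.

```shell
#!/bin/sh
# Sketch only: image_ref is a hypothetical helper; how the real generator
# maps branches to repositories is an assumption.
REGISTRY="public.ecr.aws/q9t5s3a7"

image_ref() {
    branch="$1"
    commit="$2"
    if [ "$branch" = "main" ]; then
        repo="vllm-ci-postmerge-repo"   # repositories.main
    else
        repo="vllm-ci-test-repo"        # repositories.premerge
    fi
    printf '%s/%s:%s\n' "$REGISTRY" "$repo" "$commit"
}

# Example using Buildkite's standard environment variables
# (the fallback values are placeholders).
image_ref "${BUILDKITE_BRANCH:-feature-branch}" "${BUILDKITE_COMMIT:-abc123}"
```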
10 changes: 10 additions & 0 deletions .buildkite/jobs/build.yaml
@@ -0,0 +1,10 @@
group: Build
steps:
  - label: ":docker: build vllm-omni-ci image"
    key: image-build
    depends_on: []
    commands:
      - "aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7"
      - "docker build --file docker/Dockerfile.ci -t vllm-omni-ci ."
      - "docker tag vllm-omni-ci public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT"
      - "docker push public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT"
30 changes: 30 additions & 0 deletions .buildkite/jobs/tests.yaml
@@ -0,0 +1,30 @@
group: Tests
depends_on:
  - image-build
steps:
  - label: "Simple Unit Test"
    commands:
      - ".buildkite/scripts/simple_test.sh"
    no_gpu: true
    no_plugin: true

  - label: "Diffusion Model Test"
    timeout_in_minutes: 15
Collaborator

Is the timeout applied per label or per command?

Collaborator Author

It's per job (that is, per label), not per command.

    commands:
      - pytest -s -v tests/single_stage/test_diffusion_model.py
Comment on lines +11 to +14

[P1] GPU tests no longer run in built container

The GPU test steps are now plain command invocations, without the Docker or Kubernetes plugins that previously ran them inside public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT with HF cache mounts. With this change they execute directly on the host, even though the build job still builds and pushes the container. Any GPU agent that lacks the full Python environment or a pre-seeded Hugging Face cache (common in these pipelines) will therefore fail as soon as pytest starts, because the required dependencies and models are missing.
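
For comparison, running a test step inside the built image with a mounted Hugging Face cache might look roughly like the sketch below. The helper name `container_test_cmd` and both cache paths are assumptions for illustration; the mounts and plugin configuration used by the previous pipeline are not shown in this diff. The function only prints the `docker run` command, so it is safe to try on a machine without GPUs.

```shell
#!/bin/sh
# Sketch only: container_test_cmd is a hypothetical helper, and the cache
# paths are assumptions, not the pipeline's actual mounts.
container_test_cmd() {
    commit="$1"
    shift
    image="public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$commit"
    hf_cache="${HF_HOME:-$HOME/.cache/huggingface}"
    # Print the command instead of running it (dry run).
    printf '%s\n' "docker run --rm --gpus all -v $hf_cache:/root/.cache/huggingface $image $*"
}

# Example: the command that would wrap the Diffusion Model Test.
container_test_cmd "${BUILDKITE_COMMIT:-abc123}" pytest -s -v tests/single_stage/test_diffusion_model.py
```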


  - label: "Omni Model Test"
    timeout_in_minutes: 15
    num_gpus: 4
    commands:
      - export VLLM_LOGGING_LEVEL=DEBUG
      - export VLLM_WORKER_MULTIPROC_METHOD=spawn
      - pytest -s -v tests/multi_stages/

  - label: "Omni Model Test with H100"
    timeout_in_minutes: 20
Contributor
@congw729, Dec 9, 2025

We used to set the timeout to 15 minutes. @ywang96, do you agree with setting it to 20 minutes for testing on H100?

Collaborator Author

Contributor

The timeout is already 20 minutes on main branch https://github.com/vllm-project/vllm-omni/blob/main/.buildkite/pipeline.yml#L55

Oops, my mistake! Thanks for catching that.

    gpu: h100
    num_gpus: 2
    commands:
      - export VLLM_WORKER_MULTIPROC_METHOD=spawn
Contributor

Do we also need to set the logging level here, to align with the Omni Model Test?

      - pytest -s -v tests/multi_stages_h100/
87 changes: 0 additions & 87 deletions .buildkite/pipeline.yml

This file was deleted.