Nightly CI that builds CPU-only vLLM
wheels for Linux x86_64 and macOS arm64 and publishes them to PyPI as
vllm-cpu-nightly.
The Python import name is unchanged — consumers still write import vllm.
vLLM upstream does not publish prebuilt CPU wheels for either Linux or macOS.
Building from source on every CI run takes 30–40 minutes per platform and
burns CI minutes; this repo does the build once a day and downstream
projects (e.g. LMCache CPU device tests) just pip install the wheel.
-
Daily schedule — every day at
06:00 UTC(~14:00 Beijing). -
Manual — via the Actions → Nightly build & publish vllm-cpu → Run workflow button, or:
gh workflow run nightly.yml -R opendataio/vllm-cpu # optional inputs: gh workflow run nightly.yml -R opendataio/vllm-cpu \ -f vllm_git_ref=main \ -f skip_upload=0
Each run produces a wheel whose version is
<latest-stable-tag-of-vllm-bumped>.devN<YYYYMMDDHHMM>
e.g. 0.11.1.dev202506110600. The minute stamp is monotonically
increasing, so pip install --upgrade vllm-cpu-nightly always picks up
the newest one. Older versions stay installable on PyPI for rollback by
pinning the exact version string.
The workflow needs a single repository secret:
| Name | Value |
|---|---|
PYPI_TOKEN |
A PyPI API token with upload permission to vllm-cpu-nightly. |
pip install --upgrade vllm-cpu-nightly \
--extra-index-url https://download.pytorch.org/whl/cpuThe --extra-index-url is required on Linux because the wheel depends on
torch==2.11.0 (CPU build), which only lives on the PyTorch index.
On macOS arm64 the default PyPI torch wheel is already CPU-only so the
extra index is harmless.
.
├── .github/workflows/nightly.yml # the cron + dispatch workflow
└── scripts/
├── build_and_publish_vllm_cpu_nightly.sh # ubuntu/x86_64 build
└── build_and_publish_vllm_cpu_nightly_macos.sh # macos/arm64 build
Both scripts are also runnable locally for debugging — see the header
comments in each script for env knobs (PYPI_TOKEN_FILE, VLLM_GIT_REF,
SKIP_UPLOAD, WORK_DIR, …).