[DO NOT MERGE]: swap vLLM wheel to upstream v0.21.0 for LoRA + EP by JohannesHa · Pull Request #2620 · PrimeIntellect-ai/prime-rl

JohannesHa · 2026-05-24T03:00:11Z

The PrimeIntellect custom wheel (0.21.0+cu129.r42434.pr39568.a106aa6) was branched off vLLM main before PR #40867 ("Initial EP support for LoRA", merged 2026-05-09) landed, so it still asserts against base_layer.use_ep in FusedMoEWithLoRA and rejects MoE LoRA + EP. Upstream v0.21.0 (cut 2026-05-15) carries the merge and serves EP + LoRA cleanly. Verified locally on 2 GPUs with Qwen3-30B-A3B + jeeejeee/qwen3-moe-text2sql-spider at tp=2, EP=true.

Side effect: the routed-experts / NIXL P/D path that the fork wheel carried is no longer available, so prime-rl's `enable_return_routed_experts` reads are guarded with `getattr` to keep the patches/serving-tokens imports working on the upstream wheel.

Note

Medium Risk
Swapping the x86_64 vllm wheel from a custom fork to upstream may change inference/runtime behavior and performance, especially around P/D (NIXL) and LoRA/EP paths. The code changes are small but sit in inference patching/serving logic and could affect routed-experts response shaping.

Overview
Moves x86_64 installs off the PrimeIntellect custom vllm wheel to the upstream v0.21.0 CUDA 12.9 wheel (with corresponding uv.lock updates), primarily to pick up upstream LoRA + expert-parallel support.

Updates prime-rl’s routed-experts/NIXL integration to be compatible with upstream vLLM by guarding enable_return_routed_experts reads with getattr(..., False) in both the vLLM config __post_init__ patch and the /inference/v1/generate non-streaming response post-processing path.
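The guard described above can be sketched as follows. This is a minimal illustration, not the code in the PR: the helper name `routed_experts_enabled`, the `shape_response` function, the `routed_experts` response key, and the stand-in config objects are all hypothetical, and which vLLM config object actually carries the attribute is an assumption here. Only the `getattr(..., False)` pattern itself comes from the description.

```python
from types import SimpleNamespace


def routed_experts_enabled(config) -> bool:
    """Read the flag defensively: a missing attribute means 'disabled'."""
    # On a wheel without the routed-experts feature, the attribute may not
    # exist at all, so fall back to False instead of raising AttributeError.
    return bool(getattr(config, "enable_return_routed_experts", False))


def shape_response(response: dict, config) -> dict:
    """Drop the (hypothetical) routed_experts field unless the flag is on."""
    if routed_experts_enabled(config):
        return response
    return {k: v for k, v in response.items() if k != "routed_experts"}


# Stand-ins for config objects (assumed shapes, for illustration only).
fork_config = SimpleNamespace(enable_return_routed_experts=True)
upstream_config = SimpleNamespace()  # attribute absent

raw = {"token_ids": [1, 2, 3], "routed_experts": [[0, 4], [2, 7], [1, 3]]}
fork_out = shape_response(raw, fork_config)
upstream_out = shape_response(raw, upstream_config)
```

With the attribute absent, the read falls back to `False` and the routed-experts field is simply omitted, so the same code path can run against either wheel.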

Reviewed by Cursor Bugbot for commit 80137f3. Bugbot is set up for automated code reviews on this repo.

The PrimeIntellect custom wheel (0.21.0+cu129.r42434.pr39568.a106aa6) was branched off vLLM main before PR #40867 ("Initial EP support for LoRA", merged 2026-05-09) landed, so it still asserts against `base_layer.use_ep` in `FusedMoEWithLoRA` and rejects MoE LoRA + EP. Upstream v0.21.0 (cut 2026-05-15) carries the merge and serves EP + LoRA cleanly. Verified locally on 2 GPUs with Qwen3-30B-A3B + jeeejeee/qwen3-moe-text2sql-spider at tp=2, EP=true.

Side effect: the routed-experts / NIXL P/D path that the fork wheel carried is no longer available, so prime-rl's `enable_return_routed_experts` reads are guarded with `getattr` to keep the patches/serving-tokens imports working on the upstream wheel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

JannikSt

great job

JannikSt requested changes May 24, 2026


Merge branch 'main' into feat/vllm-upstream-lora-ep

80137f3

JohannesHa changed the title ~~feat(inference): swap vLLM wheel to upstream v0.21.0 for LoRA + EP~~ [DO NOT MERGE]: swap vLLM wheel to upstream v0.21.0 for LoRA + EP May 24, 2026

JohannesHa wants to merge 2 commits into main from feat/vllm-upstream-lora-ep
