Enable Xeon optimizations like Tensor Parallel and AMX from vLLM 0.9.2 #2106

louie-tsai · 2025-07-01T01:27:18Z

Description

Added an additional compose.perf.yaml file. Users who want more vLLM optimizations apply it as one more YAML file when running docker compose:
docker compose -f compose.yaml -f compose.perf.yaml up

It includes most of the Xeon optimizations from the public vLLM 0.9.2 release, which is planned for this week.

The settings assume a system with 2 NUMA nodes and AMX support.
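
The hardware assumption above can be checked on the host before applying the extra file. This is a minimal sketch and not part of the PR. The helper names `numa_nodes` and `has_amx` are hypothetical. It assumes a Linux host where `lscpu` prints a `NUMA node(s):` line and where `/proc/cpuinfo` lists the `amx_tile` flag on AMX-capable CPUs.

```shell
#!/bin/sh
# Hypothetical helpers (not from the PR) to check the assumed host setup.

# Print the NUMA node count parsed from lscpu-style output on stdin.
numa_nodes() {
    awk -F: '/NUMA node\(s\)/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Succeed if the given cpuinfo file (default: /proc/cpuinfo) lists amx_tile.
has_amx() {
    grep -qw 'amx_tile' "${1:-/proc/cpuinfo}"
}

# Usage on a live host:
#   lscpu | numa_nodes                 # node count; the PR assumes 2
#   has_amx && echo "AMX available"
```

If either check does not match the assumption, the values in compose.perf.yaml would likely need adjusting for the target machine.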

Issues

[#2045]
[#2044]

Type of change

List the type of change like below. Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds new functionality)
Breaking change (fix or feature that would break existing design and interface)
Others (enhancement, documentation, validation, etc.)

Dependencies

vLLM 0.9.2 release
https://github.com/vllm-project/vllm/releases/tag/v0.9.2
https://gallery.ecr.aws/q9t5s3a7/vllm-cpu-release-repo

Tests

github-actions · 2025-07-01T01:27:29Z

Dependency Review

✅ No vulnerabilities or license issues found.

Scanned Files

None

ChatQnA/kubernetes/helm/cpu-values.yaml

ChatQnA/docker_compose/intel/cpu/xeon/compose.perf.yaml

Signed-off-by: Tsai, Louie <[email protected]>

CICD-at-OPEA · 2025-08-06T22:46:48Z

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

louie-tsai requested review from lvliang-intel and letonghan as code owners July 1, 2025 01:27

This was referenced Jul 1, 2025

[Feature] enable AMX support for vLLM on GNR/EMR/SPR #2045

Open

[Feature] Enable vLLM V1 feature and Tensor/Pipeline Parallel to improve Performance #2044

Closed

louie-tsai requested a review from chensuyue July 1, 2025 01:31

louie-tsai force-pushed the vllm-optimize branch 2 times, most recently from cccd778 to 7072d2c on July 1, 2025 01:34

chensuyue reviewed Jul 1, 2025

ChatQnA/kubernetes/helm/cpu-values.yaml (outdated, resolved)

chensuyue reviewed Jul 1, 2025

ChatQnA/docker_compose/intel/cpu/xeon/compose.perf.yaml (resolved)

louie-tsai force-pushed the vllm-optimize branch from 4c2b923 to e93933e on July 3, 2025 17:33

louie-tsai requested a review from chensuyue July 3, 2025 17:34

louie-tsai force-pushed the vllm-optimize branch 2 times, most recently from c4d6f5e to 3740497 on July 7, 2025 18:20

louie-tsai added 2 commits July 7, 2025 11:24

changes to enable optimization from vLLM 0.9.2

b45a99b

Signed-off-by: Tsai, Louie <[email protected]>

adding CI test and new cpu-value-perf.yaml to address review feedback

2ee36fc

Signed-off-by: Tsai, Louie <[email protected]>

louie-tsai force-pushed the vllm-optimize branch from 3740497 to 2ee36fc on July 7, 2025 18:28

This was linked to issues Jul 9, 2025

[Feature] enable AMX support for vLLM on GNR/EMR/SPR #2045

Open

[Feature] Enable vLLM V1 feature and Tensor/Pipeline Parallel to improve Performance #2044

Closed

CICD-at-OPEA added the Stale label Aug 6, 2025
