2xB60+GRR perf is much lower than that of Xeon-SP for DeepSeek-R1-Distill-Qwen-14B 2xB60 1K-512 FP8 #143

@dukelee111

Hi,
We are seeing significantly lower performance with the B60 card on the Grandridge (GRR) CPU than on the Xeon-SP. Could you please help look into it? Thank you very much.

The detailed configuration is as follows:

Grandridge CPU (GRR): Intel Atom(R) P6962 processor CPU @ 2.6GHz
Xeon-SP CPU: Intel(R) Xeon(R) Gold 6438N CPU @ 2.0GHz (CPU frequency locked to 2.6GHz)
GPU: 1x dual-core B60 (exposed as 2 B60 devices)
llm-scaler docker image: intel/llm-scaler-vllm:0.10.0-b2
OS/Kernel: Ubuntu 25.04 / 6.14.0

We ran ze_peer, oneCCL allreduce, and Torch-level P2P tests on the two platforms. Please see the attachment for the performance differences and test results. Thanks.

2xB60+GRR perf is much lower than that of Xeon-SP.pdf
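
For anyone who wants to reproduce the Torch-level P2P comparison, here is a minimal sketch of what such a test could look like. It is not the script used for the attached results. The function names `bandwidth_gbps` and `measure_p2p_copy`, the buffer size, and the iteration counts are all hypothetical, and it assumes a PyTorch build that exposes the `torch.xpu` backend with both B60 devices visible as `xpu:0` and `xpu:1`. Whether the copy goes over a direct peer path or is staged through host memory depends on the driver and platform, so the number is only a relative indicator between the two hosts.

```python
import time


def bandwidth_gbps(num_bytes: int, seconds: float) -> float:
    """Convert num_bytes moved in `seconds` to GB/s (1 GB = 1e9 bytes)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes / seconds / 1e9


def measure_p2p_copy(size_mb: int = 256, iters: int = 20, warmup: int = 5) -> float:
    """Time device-to-device copies from xpu:0 to xpu:1 and return GB/s.

    Hypothetical helper: sizes and iteration counts are illustrative only.
    """
    import torch  # imported lazily so bandwidth_gbps works without PyTorch

    if not (hasattr(torch, "xpu") and torch.xpu.is_available()):
        raise RuntimeError("no XPU backend available in this PyTorch build")
    if torch.xpu.device_count() < 2:
        raise RuntimeError("need at least two XPU devices")

    num_bytes = size_mb * 1024 * 1024
    src = torch.empty(num_bytes, dtype=torch.uint8, device="xpu:0")
    dst = torch.empty(num_bytes, dtype=torch.uint8, device="xpu:1")

    # Warm-up copies so one-time allocation and setup costs are not timed.
    for _ in range(warmup):
        dst.copy_(src)
    torch.xpu.synchronize(0)
    torch.xpu.synchronize(1)

    start = time.perf_counter()
    for _ in range(iters):
        dst.copy_(src)
    # Wait for both devices so all queued copies have completed.
    torch.xpu.synchronize(0)
    torch.xpu.synchronize(1)
    elapsed = time.perf_counter() - start

    return bandwidth_gbps(num_bytes * iters, elapsed)
```

Running `print(measure_p2p_copy())` inside the same container on the GRR host and on the Xeon-SP host gives two figures that can be compared directly with the ze_peer numbers.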
