
llm-scaler-vllm pre-production release 0.2.0

Pre-release

@glorysdj released this 04 Jul 02:25 · 112 commits to main since this release

Highlights

Resources

What’s new

  • oneCCL reduces its buffer size; an official release is now published on GitHub.
  • The GQA kernel brings up to 30% performance improvement for supported models.
  • Bugfixes for OOM issues exposed by stress testing (more tests are ongoing).
  • Support for 70B FP8 with TP4 in offline mode (see the sketch after this list).
  • Accuracy fix for DeepSeek-V2-Lite.
  • Other bugfixes.
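
As a rough illustration of the 70B FP8 TP4 offline-mode item above, the sketch below uses the standard vLLM offline Python API (which llm-scaler-vllm is built on). The model path is a placeholder, not an official reference, and the exact arguments used in the release testing are assumptions.

```python
# Minimal sketch: offline FP8 inference with tensor parallelism 4.
# Assumes the llm-scaler-vllm build exposes the standard vLLM Python API;
# the model path below is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/models/DeepSeek-R1-Distill-Llama-70B",  # placeholder path
    quantization="fp8",       # FP8 quantization
    tensor_parallel_size=4,   # TP4 across 4 cards
)

params = SamplingParams(temperature=0.0, max_tokens=128)
outputs = llm.generate(["What is tensor parallelism?"], params)
for out in outputs:
    print(out.outputs[0].text)
```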

Verified Features

  • Refreshed KPI functionality and performance on 4x and 8x BMG e211 systems. All KPI models now meet their goals. Added FP8 performance of the DS-Distilled-LLaMA 70B model, measured on 4x BMG with TP4 in offline mode.
  • FP8 functionality test at 32K/8K (ISL/OSL) for the DS-Distilled-Qwen32B model on 4x BMG with TP4 (see the sketch after this list).
  • Verified model list for FP8 functionality.
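
A minimal sketch of how the 32K-input / 8K-output (ISL/OSL) FP8 functionality check could be reproduced with the vLLM offline API. The model path, `max_model_len` value, and prompt construction are assumptions for illustration, not taken from the release's test scripts.

```python
# Sketch: 32K-in / 8K-out FP8 functionality check with TP4.
# max_model_len must cover prompt + generated tokens.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/models/DeepSeek-R1-Distill-Qwen-32B",  # placeholder path
    quantization="fp8",
    tensor_parallel_size=4,
    max_model_len=32768 + 8192,  # room for a 32K prompt plus 8K generation
)

long_prompt = "word " * 32000    # rough stand-in for a ~32K-token prompt
outputs = llm.generate(
    [long_prompt],
    SamplingParams(temperature=0.0, max_tokens=8192),
)
print(len(outputs[0].outputs[0].text))
```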