Releases: jd-opensource/xllm
v0.7.1
Highlights
Model Support
- Support GLM-4.5-Air.
- Support Qwen3-VL-MoE.
Feature
- Support scheduler overlap when chunked prefill and MTP are enabled.
- Enable multi-process mode when running VLM models.
- Support AclGraph for GLM-4.5.
Bugfix
- Resolve core dump in the Qwen embedding 0.6B model.
- Resolve duplicate content in multi-turn tool call conversations.
- Support sampler parameters for MTP.
- Enable MTP and scheduler overlap to work simultaneously.
- Resolve google.protobuf.Struct parsing failures that broke tool_call and the think toggle (see the request sketch after this list).
- Fix a precision issue in the Qwen2 model caused by model_type not being assigned.
- Fix core dump in GLM-4.5 when MTP is enabled.
- Temporarily use heap allocation for the VLM backend.
- Resolve core dump on streaming chat completion requests for VLM.
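The tool_call and think-toggle fixes in this list all sit on the chat request path. Below is a minimal sketch of the kind of request they restore, assuming an OpenAI-compatible /v1/chat/completions endpoint; the host/port, model name, tool schema, and the Qwen-style enable_thinking toggle are assumptions for illustration, not confirmed xLLM defaults.

```python
# Hypothetical request; host/port, model name, tool schema, and the
# enable_thinking toggle are assumptions, not confirmed xLLM defaults.
import requests

payload = {
    "model": "qwen3",
    "messages": [{"role": "user", "content": "What is the weather in Beijing?"}],
    # The tool schema arrives as free-form JSON (google.protobuf.Struct on
    # the server side), which is the parsing path the fix above repairs.
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool
            "description": "Look up the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    # Assumed Qwen-style toggle for reasoning ("think") output.
    "chat_template_kwargs": {"enable_thinking": False},
}

resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
print(resp.json()["choices"][0]["message"])
```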
v0.7.0
Highlights
Model Support
- Support GLM-4.5.
- Support Qwen3-Embedding.
- Support Qwen3-VL.
- Support FluxFill.
Feature
- Support MLU backend, currently supports Qwen3 series models.
- Support dynamic disaggregated PD, with strategy-driven switching between the prefill and decode phases.
- Support multi-stream parallel overlap optimization.
- Support beam-search capability in generative models.
- Support contiguous KV cache backed by virtual memory.
- Support ACL graph executor.
- Support unified online-offline co-location scheduling in disaggregated PD scenarios.
- Support PrefillOnly Scheduler.
- Support the v1/rerank model service interface (see the request sketch after this list).
- Support communication between devices via shared memory instead of RPC on a single machine.
- Support function call.
- Support reasoning output in chat interface.
- Support top-k+add fusion in the router component of MoE models.
- Support offline inference for LLM, VLM, and Embedding models.
- Optimize runtime performance in several areas.
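As referenced above, here is a minimal sketch of a v1/rerank call. It assumes the common query-plus-documents request shape and a Jina/Cohere-style response (results entries with index and relevance_score); host/port, model name, and exact field names are assumptions, not confirmed xLLM behavior.

```python
# Hypothetical rerank request; endpoint shape, model name, and response
# fields are assumptions following the common /v1/rerank convention.
import requests

payload = {
    "model": "qwen3-reranker",
    "query": "What is the capital of France?",
    "documents": [
        "Paris is the capital and most populous city of France.",
        "Berlin is the capital of Germany.",
    ],
}

resp = requests.post("http://localhost:8000/v1/rerank", json=payload)
for item in resp.json()["results"]:
    print(item["index"], item["relevance_score"])
```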
Bugfix
- Skip cancelled requests when processing stream output.
- Resolve segmentation fault during qwen3 quantized inference.
- Fix the alignment of monitoring metrics format for Prometheus.
- Clear outdated tensors to save memory when loading model weights.
- Fix attention mask to support long sequence requests.
- Fix bugs caused by enabling scheduler overlap.
v0.6.1
Highlights
Bugfix
- Skip cancelled requests when processing stream output.
- Resolve segmentation fault during qwen3 quantized inference.
- Fix the alignment of monitoring metrics format for Prometheus.
- Clear outdated tensors to save memory when loading model weights.
Release Images
x86 image
quay.io/jd_xllm/xllm-ai:xllm-0.6.1-release-hb-rc2-x86
ARM a2 device image
quay.io/jd_xllm/xllm-ai:xllm-0.6.1-release-hb-rc2-arm
ARM a3 device image
quay.io/jd_xllm/xllm-ai:xllm-0.6.1-release-hc-rc2-arm
v0.6.0
Highlights
Model Support
- Support DeepSeek-V3/R1.
- Support DeepSeek-R1-Distill-Qwen.
- Support Kimi-k2.
- Support Llama2/3.
- Support Qwen2/2.5/QwQ.
- Support Qwen3/Qwen3-MoE.
- Support MiniCPM-V.
- Support MiMo-VL.
- Support Qwen2.5-VL.
Feature
- Support KV cache store.
- Support Expert Parallelism Load Balance.
- Support a multi-priority online/offline scheduler.
- Support latency-aware scheduler.
- Support serving early stop.
- Optimize ppmatmul kernel.
- Support image URL input for VLM (see the request sketch after this list).
- Support disaggregated prefill and decoding.
- Support large-scale EP parallelism.
- Support Hash-based PrefixCache matching (see the conceptual sketch after this list).
- Support Multi-Token Prediction for DeepSeek.
- Support asynchronous scheduling, allowing the scheduling and computation pipelines to execute in parallel.
- Support EP, DP, and TP model parallelism.
- Support multi-process and multi-node deployment.
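For the image URL input mentioned above, a minimal sketch assuming the OpenAI-style multimodal message format; host/port, model name, and the example URL are assumptions.

```python
# Hypothetical multimodal request; the image_url content part follows the
# OpenAI chat format, and host/port and model name are assumptions.
import requests

payload = {
    "model": "qwen2.5-vl",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }],
}

resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
print(resp.json()["choices"][0]["message"]["content"])
```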
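For Hash-based PrefixCache matching, a conceptual sketch of the general technique rather than xLLM's implementation: each fixed-size token block is keyed by the hash of its tokens chained with the parent block's hash, so a lookup walks the prompt block by block and reuses every cached block along the longest matching prefix. The block size, hash scheme, and cache layout are illustrative assumptions.

```python
# Conceptual sketch only; block size, hash scheme, and cache layout are
# illustrative assumptions, not xLLM's actual data structures.
import hashlib

BLOCK_SIZE = 16  # tokens per KV-cache block (assumed)

def block_hash(parent_hash: str, tokens: list) -> str:
    # Chain the parent hash so identical blocks at different positions
    # get different keys.
    data = parent_hash.encode() + b"|" + ",".join(map(str, tokens)).encode()
    return hashlib.sha256(data).hexdigest()

def match_prefix(cache: dict, prompt: list) -> list:
    """Return cached KV-block ids covering the longest prefix of prompt."""
    blocks, parent = [], ""
    full = len(prompt) - len(prompt) % BLOCK_SIZE  # whole blocks only
    for i in range(0, full, BLOCK_SIZE):
        h = block_hash(parent, prompt[i:i + BLOCK_SIZE])
        if h not in cache:
            break  # first miss ends the reusable prefix
        blocks.append(cache[h])
        parent = h
    return blocks
```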
Docs
- Add getting started docs.
- Add features docs.
Release Images
x86 image
quay.io/jd_xllm/xllm-ai:xllm-0.6.0-release-hb-rc2-py3.11-oe24.03-lts-x86
ARM a2 device image
quay.io/jd_xllm/xllm-ai:xllm-0.6.0-release-hb-rc2-py3.11-oe24.03-lts-arm
ARM a3 device image
quay.io/jd_xllm/xllm-ai:xllm-0.6.0-release-hc-rc2-py3.11-oe24.03-lts-arm