
Commit 54b7230 (parent: d7bf13f)

bump version to v0.4.2 (#1644)

* bump version to v0.4.2
* update latest news

File tree: 3 files changed, +7 −5 lines

README.md

Lines changed: 3 additions & 2 deletions
@@ -26,7 +26,8 @@ ______________________________________________________________________
 <details open>
 <summary><b>2024</b></summary>
 
-- \[2024/05\] Support VLMs quantization, such as InternVL v1.5, LLaVa, InternLMXComposer2.
+- \[2024/05\] Balance vision model when deploying VLMs with multiple GPUs
+- \[2024/05\] Support 4-bit weight-only quantization and inference on VLMs, such as InternVL v1.5, LLaVa, InternLMXComposer2
 - \[2024/04\] Support Llama3 and more VLMs, such as InternVL v1.1, v1.2, MiniGemini, InternLMXComposer2.
 - \[2024/04\] TurboMind adds online int8/int4 KV cache quantization and inference for all supported devices. Refer [here](docs/en/quantization/kv_quant.md) for detailed guide
 - \[2024/04\] TurboMind latest upgrade boosts GQA, rocketing the [internlm2-20b](https://huggingface.co/internlm/internlm2-20b) model inference to 16+ RPS, about 1.8x faster than vLLM.
@@ -122,6 +123,7 @@ For detailed inference benchmarks in more devices and more settings, please refer
 <li>Gemma (2B - 7B)</li>
 <li>Dbrx (132B)</li>
 <li>Phi-3-mini (3.8B)</li>
+<li>StarCoder2 (3B - 15B)</li>
 </ul>
 </td>
 <td>
@@ -133,7 +135,6 @@ For detailed inference benchmarks in more devices and more settings, please refer
 <li>DeepSeek-VL (7B)</li>
 <li>InternVL-Chat (v1.1-v1.5)</li>
 <li>MiniGeminiLlama (7B)</li>
-<li>StarCoder2 (3B - 15B)</li>
 </ul>
 </td>
 </tr>
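The 2024/04 news entry above refers to TurboMind's online int8/int4 KV cache quantization, which is selected through the `quant_policy` field of `TurbomindEngineConfig`. Below is a minimal sketch of that setting via the `lmdeploy` pipeline API; the model name and prompt are illustrative placeholders and are not part of this commit.

```python
# Minimal sketch: enable TurboMind's online KV cache quantization.
# quant_policy=8 selects int8 KV cache, quant_policy=4 selects int4,
# and 0 disables it. The model below is only an example.
from lmdeploy import pipeline, TurbomindEngineConfig

engine_config = TurbomindEngineConfig(quant_policy=8)
pipe = pipeline('internlm/internlm2-chat-7b', backend_config=engine_config)

responses = pipe(['Briefly introduce TurboMind.'])
print(responses[0])
```

The 4-bit weight-only VLM support mentioned in the 2024/05 entry quantizes weights offline beforehand; the sketch above only covers the online KV cache path.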

README_zh-CN.md

Lines changed: 3 additions & 2 deletions
@@ -26,7 +26,8 @@ ______________________________________________________________________
 <details open>
 <summary><b>2024</b></summary>
 
-- \[2024/05\] 支持 InternVL v1.5, LLaVa, InternLMXComposer2 等 VLM 模型的量化与推理。
+- \[2024/05\] 在多 GPU 上部署 VLM 模型时,支持把视觉部分的模型均分到多卡上
+- \[2024/05\] 支持 InternVL v1.5, LLaVa, InternLMXComposer2 等 VLM 模型的 4bit 权重量化和推理
 - \[2024/04\] 支持 Llama3 和 InternVL v1.1, v1.2,MiniGemini,InternLM-XComposer2 等 VLM 模型
 - \[2024/04\] TurboMind 支持 kv cache int4/int8 在线量化和推理,适用已支持的所有型号显卡。详情请参考[这里](docs/zh_cn/quantization/kv_quant.md)
 - \[2024/04\] TurboMind 引擎升级,优化 GQA 推理。[internlm2-20b](https://huggingface.co/internlm/internlm2-20b) 推理速度达 16+ RPS,约是 vLLM 的 1.8 倍
@@ -123,6 +124,7 @@ LMDeploy TurboMind 引擎拥有卓越的推理能力,在各种规模的模型
 <li>Gemma (2B - 7B)</li>
 <li>Dbrx (132B)</li>
 <li>Phi-3-mini (3.8B)</li>
+<li>StarCoder2 (3B - 15B)</li>
 </ul>
 </td>
 <td>
@@ -134,7 +136,6 @@ LMDeploy TurboMind 引擎拥有卓越的推理能力,在各种规模的模型
 <li>DeepSeek-VL (7B)</li>
 <li>InternVL-Chat (v1.1-v1.5)</li>
 <li>MiniGeminiLlama (7B)</li>
-<li>StarCoder2 (3B - 15B)</li>
 </ul>
 </td>
 </tr>

lmdeploy/version.py

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # Copyright (c) OpenMMLab. All rights reserved.
 from typing import Tuple
 
-__version__ = '0.4.1'
+__version__ = '0.4.2'
 short_version = __version__
 
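As a quick sanity check after installing this release, the two names shown in the hunk above can be imported directly; the install command in the comment is an assumption about the usual distribution channel and is not part of the commit.

```python
# Verify the bump locally (e.g. after `pip install lmdeploy==0.4.2`).
# __version__ and short_version are the names defined in
# lmdeploy/version.py, as shown in the hunk above.
from lmdeploy.version import __version__, short_version

assert __version__ == '0.4.2'
print(short_version)  # expected to print '0.4.2'
```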
