LMDeploy Release V0.0.3
What's Changed
🚀 Features
- Support tensor parallelism without offline splitting model weights by @grimoire in #158
- Add script to split HuggingFace model to the smallest sharded checkpoints by @LZHgrla in #199
- Add non-stream inference API for chatbot by @lvhan028 in #200
💥 Improvements
- Add issue/pr templates by @lvhan028 in #184
- Remove unused code to reduce binary size by @lzhangzz in #181
- Support serving with gradio without communicating to TIS by @AllentDan in #162
- Improve postprocessing in TIS serving by applying incremental de-tokenizing by @lvhan028 in #197
- Support multi-session chat by @wangruohui in #178
🐞 Bug fixes
- Fix build test error and move turbomind csrc test cases to tests/csrc by @lvhan028 in #188
- Fix launching client error by moving lmdeploy/turbomind/utils.py to lmdeploy/utils.py by @lvhan028 in #191
📚 Documentation
- Update README.md by @tpoisonooo in #187
- Translate turbomind.md by @xin-li-67 in #173
New Contributors
Full Changelog: v0.0.2...v0.0.3