diff --git a/README.md b/README.md
index 2ced0b52..020ec2ce 100644
--- a/README.md
+++ b/README.md
@@ -101,6 +101,7 @@ This is the first work to correct hallucination in multimodal large language mod
 |:--------|:--------:|:--------:|:--------:|:--------:|
 | ![Star](https://img.shields.io/github/stars/deepseek-ai/DeepSeek-VL2.svg?style=social&label=Star) [**DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding**](https://arxiv.org/pdf/2412.10302) | arXiv | 2024-12-13 | [Github](https://github.com/deepseek-ai/DeepSeek-VL2) | - |
 | [**Apollo: An Exploration of Video Understanding in Large Multimodal Models**](https://arxiv.org/pdf/2412.10360) | arXiv | 2024-12-13 | - | - |
+| ![Star](https://img.shields.io/github/stars/dvlab-research/Lyra.svg?style=social&label=Star) [**Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition**](https://arxiv.org/pdf/2412.09501) | arXiv | 2024-12-12 | [Github](https://github.com/dvlab-research/Lyra?tab=readme-ov-file) | [Demo](https://103.170.5.190:17860) |
 | ![Star](https://img.shields.io/github/stars/InternLM/InternLM-XComposer.svg?style=social&label=Star) [**InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions**](https://arxiv.org/pdf/2412.09596) | arXiv | 2024-12-12 | [Github](https://github.com/InternLM/InternLM-XComposer/tree/main/InternLM-XComposer-2.5-OmniLive) | Local Demo |
 | [**StreamChat: Chatting with Streaming Video**](https://arxiv.org/pdf/2412.08646) | arXiv | 2024-12-11 | Coming soon | - |
 | [**CompCap: Improving Multimodal Large Language Models with Composite Captions**](https://arxiv.org/pdf/2412.05243) | arXiv | 2024-12-06 | - | - |