From b8cc0cbe16bd3e960784c69090c3b5dc66a4b7b5 Mon Sep 17 00:00:00 2001
From: Chengyu Fang <36543092+cnyvfang@users.noreply.github.com>
Date: Fri, 20 Dec 2024 14:47:57 +0800
Subject: [PATCH] Add Lyra

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
https://arxiv.org/pdf/2412.09501
https://github.com/dvlab-research/Lyra?tab=readme-ov-file
---
README.md | 1 +
1 file changed, 1 insertion(+)
diff --git a/README.md b/README.md
index 2ced0b52..020ec2ce 100644
--- a/README.md
+++ b/README.md
@@ -101,6 +101,7 @@ This is the first work to correct hallucination in multimodal large language mod
|:--------|:--------:|:--------:|:--------:|:--------:|
| [**DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding**](https://arxiv.org/pdf/2412.10302) | arXiv | 2024-12-13 | [Github](https://github.com/deepseek-ai/DeepSeek-VL2) | - |
| [**Apollo: An Exploration of Video Understanding in Large Multimodal Models**](https://arxiv.org/pdf/2412.10360) | arXiv | 2024-12-13 | - | - |
+| [**Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition**](https://arxiv.org/pdf/2412.09501) | arXiv | 2024-12-12 | [Github](https://github.com/dvlab-research/Lyra?tab=readme-ov-file) | [Demo](https://103.170.5.190:17860) |
| [**InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions**](https://arxiv.org/pdf/2412.09596) | arXiv | 2024-12-12 | [Github](https://github.com/InternLM/InternLM-XComposer/tree/main/InternLM-XComposer-2.5-OmniLive) | Local Demo |
| [**StreamChat: Chatting with Streaming Video**](https://arxiv.org/pdf/2412.08646) | arXiv | 2024-12-11 | Coming soon | - |
| [**CompCap: Improving Multimodal Large Language Models with Composite Captions**](https://arxiv.org/pdf/2412.05243) | arXiv | 2024-12-06 | - | - |