
Commit 7cb95ce (parent: a9bd1fe)

Author: 蒄骰
Commit message: fix

2 files changed (+2, −2 lines)

examples/contextual_asr/README.md (2 additions, 2 deletions)
```diff
@@ -8,7 +8,7 @@
 
 We use the WavLM-Large model, pre-trained on 94,000 hours of data and fine-tuned on 960 hours of LibriSpeech data with CTC loss, as our speech encoder. We use the public Vicuna 7B as our large language model decoder, and a simple-structured linear projector, consisting of a 1-D convolution layer and two linear layers, as our adapter. Refer to our [paper](https://arxiv.org/pdf/2411.06437) for more details.
 
-![](docs/model.pdf)
+![](docs/model.png)
 
 ## Checkpoints
 We only train the linear projector in this recipe.
@@ -17,7 +17,7 @@ Encoder | Projector | LLM
 [CTC Fine-tuned WavLM-Large](https://drive.google.com/file/d/12ZmSSbDvx73W0eK1wpUgajapCLhqh5DI/view?usp=drive_link)(~315.45M) | [Linear](https://drive.google.com/file/d/1Zlbsnz1YUWtYtt-yNyoPK5OhR30kwLfS/view?usp=drive_link)(~15.74M) | [vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5)(~6.7B)
 
 ## Performance
-![](docs/performanc.png)
+![](docs/performance.png)
 
 
 ## Data preparation
```
The second changed file is a binary asset (856 KB); its diff is not rendered.
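The README describes the adapter as a simple-structured linear projector: a 1-D convolution layer followed by two linear layers, mapping speech-encoder frames into the LLM's embedding space. A minimal PyTorch sketch of that shape is below; the dimensions, kernel size, and stride are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn


class LinearProjector(nn.Module):
    """Hypothetical sketch of the projector described in the README:
    one 1-D convolution plus two linear layers. All hyperparameters
    here are assumptions for illustration."""

    def __init__(self, enc_dim=1024, llm_dim=4096, kernel_size=5, stride=2):
        super().__init__()
        # Conv1d downsamples the encoder frame sequence along time.
        self.conv = nn.Conv1d(enc_dim, enc_dim, kernel_size=kernel_size,
                              stride=stride, padding=kernel_size // 2)
        # Two linear layers project to the LLM embedding width.
        self.fc1 = nn.Linear(enc_dim, llm_dim)
        self.fc2 = nn.Linear(llm_dim, llm_dim)

    def forward(self, x):
        # x: (batch, time, enc_dim) -> conv expects (batch, channels, time)
        x = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return self.fc2(torch.relu(self.fc1(x)))


proj = LinearProjector()
out = proj(torch.randn(2, 100, 1024))  # 2 utterances, 100 encoder frames
print(out.shape)  # time axis downsampled by the conv stride, width = llm_dim
```

Since only this projector is trained in the recipe, freezing the encoder and LLM while optimizing these few parameters (~15.74M in the released checkpoint) keeps fine-tuning cheap.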
