
Commit 7cb95ce (parent: a9bd1fe)

Author: 蒄骰
Commit message: fix

2 files changed (+2, −2 lines)

examples/contextual_asr/README.md (2 additions, 2 deletions)
```diff
@@ -8,7 +8,7 @@
 
 We use the WavLM-Large model, pre-trained on 94,000 hours of data and fine-tuned on 960 hours of LibriSpeech data with CTC loss, as our speech encoder. We use the public Vicuna 7B as our large language model decoder, and a simple-structured linear projector, consisting of a 1-D convolution layer and two linear layers, as our adapter. Refer to our [paper](https://arxiv.org/pdf/2411.06437) for more details.
 
-![](docs/model.pdf)
+![](docs/model.png)
 
 ## Checkpoints
 We only train the linear projector in this recipe.
@@ -17,7 +17,7 @@ Encoder | Projector | LLM
 [CTC Fine-tuned WavLM-Large](https://drive.google.com/file/d/12ZmSSbDvx73W0eK1wpUgajapCLhqh5DI/view?usp=drive_link)(~315.45M) | [Linear](https://drive.google.com/file/d/1Zlbsnz1YUWtYtt-yNyoPK5OhR30kwLfS/view?usp=drive_link)(~15.74M) | [vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5)(~6.7B)
 
 ## Performance
-![](docs/performanc.png)
+![](docs/performance.png)
 
 
 ## Data preparation
```
The second changed file is a binary asset (856 KB); its diff is not rendered.
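The README describes the adapter as a simple-structured linear projector: a 1-D convolution layer followed by two linear layers, mapping speech-encoder frames into the LLM's embedding space. A minimal PyTorch sketch of that shape is below; the dimensions, kernel size, and stride are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn


class LinearProjector(nn.Module):
    """Hypothetical sketch of the projector described in the README:
    one 1-D convolution plus two linear layers. All hyperparameters
    here are assumptions for illustration."""

    def __init__(self, enc_dim=1024, llm_dim=4096, kernel_size=5, stride=2):
        super().__init__()
        # Conv1d downsamples the encoder frame sequence along time.
        self.conv = nn.Conv1d(enc_dim, enc_dim, kernel_size=kernel_size,
                              stride=stride, padding=kernel_size // 2)
        # Two linear layers project to the LLM embedding width.
        self.fc1 = nn.Linear(enc_dim, llm_dim)
        self.fc2 = nn.Linear(llm_dim, llm_dim)

    def forward(self, x):
        # x: (batch, time, enc_dim) -> conv expects (batch, channels, time)
        x = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return self.fc2(torch.relu(self.fc1(x)))


proj = LinearProjector()
out = proj(torch.randn(2, 100, 1024))  # 2 utterances, 100 encoder frames
print(out.shape)  # time axis downsampled by the conv stride, width = llm_dim
```

Since only this projector is trained in the recipe, freezing the encoder and LLM while optimizing these few parameters (~15.74M in the released checkpoint) keeps fine-tuning cheap.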
