
Commit b8dbc12: SLAM-AAC
1 parent: 56fb822

3 files changed: +7 additions, −9 deletions


examples/slam_aac/README.md (3 additions, 5 deletions)

@@ -54,8 +54,8 @@ You can also fine-tune the model without loading any pre-trained weights, though
 
 
 ### Note
-In the current version of SLAM-LLM, the `peft_ckpt` parameter is no longer required. However, if you are using the checkpoint provided by us, which was trained with an earlier version, please keep the `peft_ckpt` parameter in your configuration to ensure compatibility.
-
+- In the current version of SLAM-LLM, the `peft_ckpt` parameter is no longer required. However, if you are using the checkpoint provided by us, which was trained with an earlier version, please keep the `peft_ckpt` parameter in your configuration to ensure compatibility.
+- Due to differences in dependency versions, there may be slight variations in the performance of the SLAM-AAC model.
 
 ## Inference
 To perform inference with the trained models, you can use the following commands to decode using the common beam search method:
@@ -67,7 +67,7 @@ bash scripts/inference_audiocaps_bs.sh
 bash scripts/inference_clotho_bs.sh
 ```
 
-For improved inference results, you can use the CLAP-Refine strategy, which utilizes multiple beam search decoding. Note that this method may take longer to run, but it can provide better quality outputs. You can execute the following commands:
+For improved inference results, you can use the CLAP-Refine strategy, which utilizes multiple beam search decoding. To use this method, you need to download and use our pre-trained [CLAP](https://drive.google.com/drive/folders/1X4NYE08N-kbOy6s_Itb0wBR_3X8oZF56?usp=sharing) model. Note that CLAP-Refine may take longer to run, but it can provide better quality outputs. You can execute the following commands:
 ```bash
 # Inference on AudioCaps (CLAP-Refine)
 bash scripts/inference_audiocaps_CLAP_Refine.sh
@@ -86,5 +86,3 @@ You can refer to the paper for more results.
 ```
 
 ``` -->
-
-<!-- [CLAP](https://drive.google.com/drive/folders/1X4NYE08N-kbOy6s_Itb0wBR_3X8oZF56?usp=sharing) model for post-processing (CLAP-refine) -->
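The CLAP-Refine strategy in the README diff reranks candidates from multiple beam-search runs using a CLAP model: the candidate caption whose text embedding is closest to the audio embedding wins. Below is a minimal conceptual sketch of that reranking step, not the repository's implementation; the `clap_refine` function, the cosine scorer, and the hand-made embeddings are illustrative stand-ins for the pre-trained CLAP audio/text encoders.

```python
# Conceptual sketch of CLAP-Refine reranking (illustrative only):
# given an audio embedding and candidate captions from several beam
# widths, keep the caption whose text embedding is most similar to
# the audio embedding. In the real pipeline both embeddings come
# from the downloaded pre-trained CLAP model.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def clap_refine(audio_emb: np.ndarray,
                candidates: list,
                text_embs: list) -> str:
    # Score each candidate caption against the audio embedding and
    # return the best-matching caption.
    scores = [cosine(audio_emb, t) for t in text_embs]
    return candidates[int(np.argmax(scores))]

# Toy example: the second caption's embedding points in nearly the
# same direction as the audio embedding, so it is selected.
audio = np.array([1.0, 0.0, 0.0])
caps = ["a dog barks", "rain falls on a roof", "a car passes by"]
embs = [np.array([0.0, 1.0, 0.0]),
        np.array([0.9, 0.1, 0.0]),
        np.array([0.2, 0.2, 0.9])]
print(clap_refine(audio, caps, embs))  # rain falls on a roof
```

This is also why the method costs more time than plain beam search: every extra beam width adds a full decoding pass before the single reranking step.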

examples/slam_aac/scripts/clap_refine.sh (2 additions, 2 deletions)

@@ -6,8 +6,8 @@ cd $run_dir
 code_dir=examples/slam_aac
 
 clap_dir=/data/xiquan.li/models/clap
-inference_data_path=/data/wenxi.chen/data/clotho/evaluation_single.jsonl
-output_dir=/data/wenxi.chen/cp/wavcaps_pt_v7_epoch4-clotho_ft-seed10086_btz4_lr8e-6-short_prompt_10w/aac_epoch_1_step_4500
+inference_data_path=/data/wenxi.chen/data/audiocaps/new_test.jsonl
+output_dir=/data/wenxi.chen/cp/aac_epoch_2_step_182_audiocaps_seed42
 
 echo "Running CLAP-Refine"
 

src/slam_llm/models/CLAP/feature_extractor.py (2 additions, 2 deletions)

@@ -27,10 +27,10 @@ def __init__(self, audio_config):
             fmin=audio_config["f_min"],
             fmax=audio_config["f_max"],
             ref=1.0,
-            amin=1e-6,
+            amin=audio_config.get("amin", 1e-6),
             top_db=None,
             freeze_parameters=True)
-
+
    def forward(self, input):
        # input: waveform [bs, wav_length]
        mel_feats = self.mel_trans(input)
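The `amin` change above makes the dB-conversion floor configurable through `audio_config` instead of hard-coding `1e-6`. The floor matters because the log-mel transform takes log10 of the mel power, and a silent frame would otherwise hit log10(0) = -inf. The sketch below shows the effect under a simplified librosa-style power-to-dB conversion; it is not the torchlibrosa implementation the repo actually uses.

```python
# Minimal sketch of why the amin floor matters in log-mel extraction
# (simplified; not the repo's torchlibrosa spectrogram code).
import numpy as np

def power_to_db(power: np.ndarray, ref: float = 1.0,
                amin: float = 1e-6) -> np.ndarray:
    # Clamp the mel power at `amin` before taking the log so that
    # silent frames never produce log10(0) = -inf.
    power = np.maximum(power, amin)
    return 10.0 * np.log10(power / ref)

silent = np.zeros(4)  # an all-silent frame of mel power
print(power_to_db(silent))               # every bin floored at -60.0 dB
print(power_to_db(silent, amin=1e-10))   # -100.0 dB: amin sets the floor
```

Making `amin` a config key lets the floor (and hence the feature dynamic range) match whatever the CLAP checkpoint was trained with, rather than being fixed at -60 dB.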
