Commit accd0e5

Merge pull request #233 from teamtee/main
Fixed the join bug caused by Deepspeed adaptation and improved the model saving of Deepspeed
2 parents 440ece6 + 856f430 commit accd0e5

15 files changed: 46 additions and 97 deletions

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ data/
 jobs/
 debug/
 audio/
-
+exp/
 examples/s2s/scripts/debug
 examples/vsr_LRS3/scripts/decode_avhubert_vo_vicuna_7b_noself.sh
 examples/asr_librispeech/scripts/decode_hubert_xtralarge_linear_vicuna_7b_copy.sh

examples/aispeech_asr/README.md

Lines changed: 2 additions & 24 deletions
@@ -66,7 +66,6 @@ dev_scp_file_path= # Path to validation data
 train_max_frame_length=1500 # Maximum frame length for training
 eval_max_frame_length=1000 # Maximum frame length for evaluation
 multitask_prompt_path= # Path to multitask.jsonl
-prompt_style="\{\}" # Prompt style, e.g., "<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n" or "USER: {}\n ASSISTANT:"
 projector=linear # Type of projector
 encoder_name=whisper # Name of the encoder
 llm_name=Qwen2.5-7B-Instruct # Name of the LLM
@@ -86,7 +85,7 @@ For LoRA training, set (with `ckpt_path` pointing to the model saved in the prev
 ```bash
 use_peft=true
 if [[ $use_peft == "true" ]]; then
-ckpt_path= # For DDP training, provide the path to the saved pt file; for DeepSpeed training, convert mp_rank_00_model_states.pt to model.pt using the `scripts/transcribe_deepspeed_to_pt.py` script
+ckpt_path=
 fi
 ```
 ### Deepspeed
@@ -113,28 +112,7 @@ When using `bf16`/`fp16` for training, deepspeed saves about 20GB of GPU memory
 }
 }
 ```
-
-Note that when using `zero-0`/`1`/`2`, the DeepSpeed model is saved in a format that requires a script to convert `mp_rank_00_model_states.pt` to `model.pt`, such as `python scripts/transcribe_deepspeed_to_pt.py mp_rank_00_model_states.pt output_dir`.
-
-```
-global_step1000
-global_step1000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
-...
-global_step1000/mp_rank_00_model_states.pt
-latest
-zero_to_fp32.py
-```
-
-If training with `Zero-3`, the model is saved in a different format and can be converted using `python zero_to_fp32.py global_step50 outputdir`.
-
-```
-global_step50
-global_step50/zero_pp_rank_0_mp_rank_00_model_states.pt
-global_step50/zero_pp_rank_0_mp_rank_00_optim_states.pt
-...
-latest
-zero_to_fp32.py
-```
+Note that when using `zero-0`/`1`/`2`/`3`, the DeepSpeed model is saved as `pytorch_model.bin`
 If you use bf16/fp16 training in DeepSpeed and encounter NaN in train/eval loss, check the autocast in `src/slam_llm/utils/deepspeed_utils.py`:
 
 ```python

examples/aispeech_asr/README_zh.md

Lines changed: 2 additions & 2 deletions
@@ -82,11 +82,11 @@ deepspeed_config= # DeepSpeed configuration file path
 use_peft=false
 ```
 
-For LoRA training, set the following (`ckpt_path` is the path to the model saved in the previous training step):
+For LoRA training, set the following (`ckpt_path` is the path to the model saved in the previous training step, `pytorch_model.bin/model.pt`):
 ```bash
 use_peft=true
 if [[ $use_peft == "true" ]]; then
-ckpt_path= # For DDP training, provide the path to the saved pt file directly; for DeepSpeed training, convert the mp_rank_00_model_states.pt file to model.pt, for example with the `scripts/transcribe_deepspeed_to_pt.py` script
+ckpt_path=
 fi
 ```
 
examples/aispeech_asr/aispeech_asr_config.py

Lines changed: 3 additions & 1 deletion
@@ -91,8 +91,10 @@ class DataConfig:
 dataset: str = "multitask_dataset"
 train_max_frame_length: int = 1500
 eval_max_frame_length: int = 1000
+audio_sample_rate: int = 16000
+max_audio_length: int = 30
 multitask_prompt_path: str = "/aistor/aispeech/hpc_stor01/home/fangyangui/workingspace/data/multiprompt.jsonl"
-prompt_style: str = "\{\}" #
+prompt_style: str = "{}" # "<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n" | "USER: {}\n ASSISTANT:" Comment:Changed it in aispeech_asr_config.py
 append_info_tasks : List = field(default_factory=lambda: [ "hotword"])
 file: str = "examples/aispeech_asr/slam_llm/datasets/speech_dataset_large.py:get_speech_dataset"
 train_scp_file_path: str = ""
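
The new `prompt_style` default contains a single `{}` placeholder, and its trailing comment lists two alternative templates. A minimal sketch of how such a template could be applied, assuming the placeholder is filled with Python's `str.format` (this diff does not show the call site, so `apply_prompt_style`, the constant names, and the sample prompt are all hypothetical):

```python
# Hypothetical helper: the commit does not show where prompt_style is consumed,
# so this sketch assumes the single `{}` slot is filled via str.format.
PASSTHROUGH_STYLE = "{}"  # the new default in DataConfig
USER_ASSISTANT_STYLE = "USER: {}\n ASSISTANT:"
IM_START_STYLE = "<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n"


def apply_prompt_style(prompt: str, prompt_style: str = PASSTHROUGH_STYLE) -> str:
    """Insert the task prompt into the single `{}` slot of the template."""
    return prompt_style.format(prompt)


if __name__ == "__main__":
    task_prompt = "Transcribe speech to text."  # illustrative prompt only
    for style in (PASSTHROUGH_STYLE, USER_ASSISTANT_STYLE, IM_START_STYLE):
        print(repr(apply_prompt_style(task_prompt, style)))
```

With the default `"{}"` the prompt passes through unchanged, which may be why the scripts in this commit stop overriding `prompt_style` on the command line.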

examples/aispeech_asr/scripts/decode.sh

Lines changed: 2 additions & 3 deletions
@@ -1,10 +1,9 @@
 #!/bin/bash
 set -e
-run_dir=/aistor/aispeech/hpc_stor01/home/fangyangui/workingspace/github/SLAM-LLM-NPU
+run_dir=/aistor/aispeech/hpc_stor01/home/fangyangui/workingspace/github/SLAM-LLM
 cd $run_dir
 code_dir=examples/aispeech_asr
 
-prompt_style="\{\}" # "<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n" | "USER: {}\n ASSISTANT:"
 projector=linear
 encoder_name=whisper
 llm_name=Qwen2.5-7B-Instruct
@@ -15,6 +14,7 @@ encoder_projector_ds_rate=5
 eval_max_frame_length=1000
 ckpt_path=/aistor/aispeech/hpc_stor01/home/fangyangui/workingspace/project/aispeech_asr/exp/librispeech/20250322/whisper_linear_Qwen2.5-7B-Instruct_lorafalse_padtrue_normal_asr_speedfalse_specaugfalse-1121/mala_asr_epoch_2_step_25000_best
 test_scp_file_path=/aistor/aispeech/hpc_stor01/home/fangyangui/workingspace/data/aishell-1/asr/test
+# prompt_style="\{\}" # "<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n" | "USER: {}\n ASSISTANT:" Comment:Changed it in aispeech_asr_config.py
 
 
 # Choose Encoder
@@ -69,7 +69,6 @@ python \
 ++model_config.encoder_path=$speech_encoder_path \
 ++model_config.encoder_dim=$encoder_dim \
 ++model_config.encoder_projector=$projector \
-++dataset_config.prompt_style=$prompt_style \
 ++dataset_config.dataset=$dataset \
 ++dataset_config.pad_or_trim=$pad_or_trim \
 ++dataset_config.test_scp_file_path=$test_scp_file_path \

examples/aispeech_asr/scripts/decode_deepspeed.sh

Lines changed: 2 additions & 2 deletions
@@ -1,10 +1,9 @@
 #!/bin/bash
 set -e
-run_dir=/aistor/aispeech/hpc_stor01/home/fangyangui/workingspace/github/SLAM-LLM-NPU
+run_dir=/aistor/aispeech/hpc_stor01/home/fangyangui/workingspace/github/SLAM-LLM
 cd $run_dir
 code_dir=examples/aispeech_asr
 
-prompt_style="\{\}" # "<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n" | "USER: {}\n ASSISTANT:"
 projector=linear
 encoder_name=whisper
 llm_name=Qwen2.5-7B-Instruct
@@ -15,6 +14,7 @@ encoder_projector_ds_rate=5
 eval_max_frame_length=1000
 ckpt_path=/aistor/aispeech/hpc_stor01/home/fangyangui/workingspace/project/aispeech_asr/exp/librispeech/20250322/whisper_linear_Qwen2.5-7B-Instruct_lorafalse_padtrue_normal_asr_speedfalse_specaugfalse-1121/mala_asr_epoch_2_step_25000_best
 test_scp_file_path=/aistor/aispeech/hpc_stor01/home/fangyangui/workingspace/data/aishell-1/asr/test
+# prompt_style="\{\}" # "<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n" | "USER: {}\n ASSISTANT:" Comment:Changed it in aispeech_asr_config.py
 
 
 # Choose Encoder

examples/aispeech_asr/scripts/finetune_deepspeed.sh

Lines changed: 6 additions & 7 deletions
@@ -9,16 +9,16 @@ export OMP_NUM_THREADS=1
 
 
 
-run_dir=/aistor/aispeech/hpc_stor01/home/fangyangui/workingspace/github/SLAM-LLM-NPU
+run_dir=/aistor/aispeech/hpc_stor01/home/fangyangui/workingspace/github/SLAM-LLM
 cd $run_dir
 code_dir=examples/aispeech_asr
 
 train_scp_file_path=/aistor/aispeech/hpc_stor01/home/fangyangui/workingspace/data/aishell-1/asr/test
 dev_scp_file_path=/aistor/aispeech/hpc_stor01/home/fangyangui/workingspace/data/aishell-1/asr/test
-train_max_frame_length=500
-eval_max_frame_length=500
+train_max_frame_length=2000
+eval_max_frame_length=2500
 multitask_prompt_path="/aistor/aispeech/hpc_stor01/home/fangyangui/workingspace/data/multiprompt.jsonl"
-prompt_style="\{\}" # "<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n" | "USER: {}\n ASSISTANT:"
+# prompt_style="\{\}" # "<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n" | "USER: {}\n ASSISTANT:" Comment:Changed it in aispeech_asr_config.py
 projector=linear
 encoder_name=whisper
 llm_name=Qwen2.5-7B-Instruct
@@ -30,7 +30,7 @@ pad_or_trim=true # For whisper
 deepspeed_config=examples/aispeech_asr/conf/ds_config.json
 
 if [[ $use_peft == "true" || $freeze_encoder == false ]];then
-ckpt_path=/aistor/aispeech/hpc_stor01/home/fangyangui/workingspace/project/aispeech_asr/exp/slidespeech/20250414/whisper_linear_Qwen2.5-7B-Instruct_lorafalse_padtrue_normal_asr_speedfalse_specaugfalse-1515_slidespeech_text/mala_asr_epoch_2_step_7000
+ckpt_path=/aistor/aispeech/hpc_stor01/home/fangyangui/workingspace/project/aispeech_asr/exp/slidespeech/20250414/whisper_linear_Qwen2.5-7B-Instruct_lorafalse_padtrue_normal_asr_speedfalse_specaugfalse-1515_slidespeech_text/mala_asr_epoch_2_step_7000/pytorch_model.bin
 fi
 
 # Choose Encoder
@@ -86,7 +86,6 @@ hydra.run.dir=$output_dir \
 ++model_config.encoder_path=$speech_encoder_path \
 ++model_config.encoder_dim=$encoder_dim \
 ++model_config.encoder_projector=$projector \
-++dataset_config.prompt_style=$prompt_style \
 ++dataset_config.train_max_frame_length=$train_max_frame_length \
 ++dataset_config.eval_max_frame_length=$eval_max_frame_length \
 ++dataset_config.multitask_prompt_path=$multitask_prompt_path \
@@ -107,7 +106,7 @@ hydra.run.dir=$output_dir \
 ++metric=acc \
 "
 if [[ $use_peft == "true" || $freeze_encoder == false ]];then
-hydra_args+="++ckpt_path=$ckpt_path/model.pt"
+hydra_args+="++ckpt_path=$ckpt_path"
 fi
 
 
examples/aispeech_asr/scripts/finetune_torchrun.sh

Lines changed: 8 additions & 9 deletions
@@ -9,16 +9,16 @@ export OMP_NUM_THREADS=1
 
 
 
-run_dir=/aistor/aispeech/hpc_stor01/home/fangyangui/workingspace/github/SLAM-LLM-NPU
+run_dir=/aistor/aispeech/hpc_stor01/home/fangyangui/workingspace/github/SLAM-LLM
 cd $run_dir
 code_dir=examples/aispeech_asr
 
 train_scp_file_path=/aistor/aispeech/hpc_stor01/home/fangyangui/workingspace/data/aishell-1/asr/test
 dev_scp_file_path=/aistor/aispeech/hpc_stor01/home/fangyangui/workingspace/data/aishell-1/asr/test
-train_max_frame_length=1500
-eval_max_frame_length=3000
+train_max_frame_length=1400
+eval_max_frame_length=2000
 multitask_prompt_path="/aistor/aispeech/hpc_stor01/home/fangyangui/workingspace/data/multiprompt.jsonl"
-prompt_style="\{\}" # "<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n" | "USER: {}\n ASSISTANT:"
+# prompt_style="\{\}" # "<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n" | "USER: {}\n ASSISTANT:" Comment:Changed it in aispeech_asr_config.py
 projector=linear
 encoder_name=whisper
 llm_name=Qwen2.5-1.5B-Instruct
@@ -29,7 +29,7 @@ pad_or_trim=true # For whisper
 
 
 if [[ $use_peft == "true" || $freeze_encoder == false ]];then
-ckpt_path=/aistor/aispeech/hpc_stor01/home/fangyangui/workingspace/project/aispeech_asr/exp/slidespeech/20250414/whisper_linear_Qwen2.5-7B-Instruct_lorafalse_padtrue_normal_asr_speedfalse_specaugfalse-1515_slidespeech_text/mala_asr_epoch_2_step_7000
+ckpt_path=/aistor/aispeech/hpc_stor01/home/fangyangui/workingspace/project/aispeech_asr/exp/slidespeech/20250414/whisper_linear_Qwen2.5-7B-Instruct_lorafalse_padtrue_normal_asr_speedfalse_specaugfalse-1515_slidespeech_text/mala_asr_epoch_2_step_7000/model.pt
 fi
 
 # Choose Encoder
@@ -89,7 +89,6 @@ hydra.run.dir=$output_dir \
 ++model_config.encoder_path=$speech_encoder_path \
 ++model_config.encoder_dim=$encoder_dim \
 ++model_config.encoder_projector=$projector \
-++dataset_config.prompt_style=$prompt_style \
 ++dataset_config.train_max_frame_length=$train_max_frame_length \
 ++dataset_config.eval_max_frame_length=$eval_max_frame_length \
 ++dataset_config.multitask_prompt_path=$multitask_prompt_path \
@@ -104,18 +103,18 @@ hydra.run.dir=$output_dir \
 ++train_config.freeze_llm=true \
 ++train_config.use_peft=$use_peft \
 ++train_config.batching_strategy=dynamic \
-++train_config.validation_interval=10 \
+++train_config.validation_interval=1000 \
 ++train_config.num_workers_dataloader=8 \
 ++train_config.output_dir=$output_dir \
 ++metric=acc \
 "
 if [[ $use_peft == "true" || $freeze_encoder == false ]];then
-hydra_args+="++ckpt_path=$ckpt_path/model.pt"
+hydra_args+="++ckpt_path=$ckpt_path"
 fi
 
 torchrun \
 --nnodes 1 \
---nproc_per_node 2 \
+--nproc_per_node 8 \
 --master_port=29505 \
 $code_dir/finetune_torchrun.py \
 --config-path "conf" \

examples/aispeech_asr/scripts/transcribe_deepspeed_to_pt.py

Lines changed: 0 additions & 9 deletions
This file was deleted.

examples/asr_librispeech/README.md

Lines changed: 2 additions & 22 deletions
@@ -79,27 +79,7 @@ If you're interested in training with DeepSpeed, refer to the script `finetune_w
 }
 ```
 
-Note that when using `zero-0`/`1`/`2`, the DeepSpeed model is saved in a format that requires a script to convert `mp_rank_00_model_states.pt` to `model.pt`, such as `python transcribe_deepspeed_to_pt.py mp_rank_00_model_states.pt output_dir`.
-
-```
-global_step1000
-global_step1000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
-...
-global_step1000/mp_rank_00_model_states.pt
-latest
-zero_to_fp32.py
-```
-
-If training with `Zero-3`, the model is saved in a different format and can be converted using `python zero_to_fp32.py global_step50 outputdir`.
-
-```
-global_step50
-global_step50/zero_pp_rank_0_mp_rank_00_model_states.pt
-global_step50/zero_pp_rank_0_mp_rank_00_optim_states.pt
-...
-latest
-zero_to_fp32.py
-```
+Note that when using `zero-0`/`1`/`2`/`3`, the DeepSpeed model is saved as `pytorch_model.bin`, and you should change "++ckpt_path=$ckpt_path/model.pt" to " ++ckpt_path=$ckpt_path/pytorch_model.bin" in the script to use the model during inference.
 If you use bf16/fp16 training in DeepSpeed and encounter NaN in train/eval loss, check the autocast in `src/slam_llm/utils/deepspeed_utils.py`:
 
 ```python
@@ -116,4 +96,4 @@ You can refer to the paper for more results.
 journal={arXiv preprint arXiv:2402.08846},
 year={2024}
 }
-```
+```
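
After this commit, `ckpt_path` in the fine-tuning scripts names the checkpoint file itself: `pytorch_model.bin` in the DeepSpeed script and `model.pt` in the torchrun script. A minimal sketch of picking the right file from a checkpoint directory, assuming the file sits directly inside that directory; `resolve_ckpt_path` and the lookup order are hypothetical and not part of SLAM-LLM:

```python
from pathlib import Path

# File names are taken from the diff; trying pytorch_model.bin first is an assumption.
CANDIDATE_NAMES = ("pytorch_model.bin", "model.pt")


def resolve_ckpt_path(ckpt_dir: str) -> Path:
    """Return the first known checkpoint file found directly under ckpt_dir."""
    base = Path(ckpt_dir)
    for name in CANDIDATE_NAMES:
        candidate = base / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"none of {CANDIDATE_NAMES} found in {base}")
```

The returned path could then be passed to the training or decoding script as `++ckpt_path=<path>`, which avoids hard-coding either file name in the shell scripts.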
