[AISPEECH_ASR] Finetuning script does not load dev_scp_file_path (validation data)

### System Info

1x NVIDIA L40S GPU (Ada Lovelace)

### Information

- [x] The official example scripts
- [ ] My own modified scripts

### 🐛 Describe the bug

Hi! Thank you for creating the AISPEECH_ASR recipe.

I compiled separate `multitask.jsonl` files for my training and validation data and ran `scripts/finetune_torchrun.sh`. In the `.sh` file, I set `train_scp_file_path` and `dev_scp_file_path` to the directories containing the respective `multitask.jsonl` files:

```sh
train_scp_file_path=/multitask/train/
dev_scp_file_path=/multitask/dev/

...
++dataset_config.train_scp_file_path=$train_scp_file_path \
++dataset_config.dev_scp_file_path=$dev_scp_file_path \
```
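
To rule out a path problem on my side, a small pre-flight check like the one below can confirm that both directories actually contain `multitask.jsonl`. This is only a sketch: the helper name `check_multitask_dir` is hypothetical (not part of SLAM-LLM), and it assumes the file sits directly inside each configured directory under the name `multitask.jsonl`.

```python
import os


def check_multitask_dir(scp_dir: str, filename: str = "multitask.jsonl") -> str:
    """Return the full path to the multitask file in scp_dir, or raise if it is missing.

    Hypothetical helper; assumes the file lives directly inside scp_dir.
    """
    if not scp_dir:
        # An empty directory setting would otherwise surface much later as open('')
        raise ValueError("scp directory path is empty")
    path = os.path.join(scp_dir, filename)
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    return path


if __name__ == "__main__":
    # Directories taken from the finetune_torchrun.sh settings above
    for d in ("/multitask/train/", "/multitask/dev/"):
        try:
            print("ok:", check_multitask_dir(d))
        except (ValueError, FileNotFoundError) as exc:
            print("problem:", d, exc)
```

In my case both directories pass a check like this, which is why the error during validation looks like a config-routing issue rather than a missing file.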

Training runs normally but fails as soon as it reaches the validation stage, raising a `FileNotFoundError` even though the validation `multitask.jsonl` file exists. Note that the path reported in the error message is empty (`''`).

However, training runs successfully once I replace `dev_scp_file_path` with `test_scp_file_path` in the `.sh` file.

### Error logs

```
File "/SLAM-LLM/src/slam_llm/utils/train_utils.py", line 186, in train
[rank0]: eval_ppl, eval_epoch_loss, *rest = evaluation(model, train_config, eval_dataloader, local_rank, tokenizer)
[rank0]: File "/SLAM-LLM/src/slam_llm/utils/train_utils.py", line 424, in evaluation
[rank0]: for step, batch in enumerate(eval_dataloader):
[rank0]: File "/home/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 701, in __next__
[rank0]: data = self._next_data()
[rank0]: File "/home/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1465, in _next_data
[rank0]: return self._process_data(data)
[rank0]: File "/home/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1491, in _process_data
[rank0]: data.reraise()
[rank0]: File "/home/.local/lib/python3.10/site-packages/torch/_utils.py", line 715, in reraise
[rank0]: raise exception
[rank0]: FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
[rank0]: Original Traceback (most recent call last):
[rank0]: File "/home/.local/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 351, in _worker_loop
[rank0]: data = fetcher.fetch(index) # type: ignore[possibly-undefined]
[rank0]: File "/home/.local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 42, in fetch
[rank0]: data = next(self.dataset_iter)
[rank0]: File "/SLAM-LLM/src/slam_llm/datasets/speech_dataset_large.py", line 245, in __iter__
[rank0]: for elem in self.dp:
[rank0]: File "/SLAM-LLM/src/slam_llm/datasets/speech_dataset_large.py", line 84, in __iter__
[rank0]: with open(multitask_task_path) as f_task:
[rank0]: FileNotFoundError: [Errno 2] No such file or directory: ''
```
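
The last line of the traceback can be reproduced in isolation: in Python, opening an empty path raises exactly this `FileNotFoundError`, which suggests the validation dataset received an empty string instead of the dev path. The wrapper `open_multitask` below is hypothetical and only mirrors the `open(multitask_task_path)` call shown in the traceback; it is not the actual SLAM-LLM code.

```python
import errno


def open_multitask(multitask_task_path: str) -> str:
    # Hypothetical stand-in for the failing line in speech_dataset_large.py:
    #     with open(multitask_task_path) as f_task:
    with open(multitask_task_path) as f_task:
        return f_task.readline()


def reproduce_empty_path_error() -> FileNotFoundError:
    """Return the exception raised when the task path is an empty string."""
    try:
        open_multitask("")
    except FileNotFoundError as exc:
        return exc
    raise AssertionError("expected FileNotFoundError for an empty path")


if __name__ == "__main__":
    err = reproduce_empty_path_error()
    print(err)  # [Errno 2] No such file or directory: ''
    print(err.errno == errno.ENOENT)
```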

### Expected behavior

The script should load the validation data from the validation file path (`dev_scp_file_path`) when running validation during training.
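
For illustration, here is a minimal sketch of the behavior I would expect, assuming the dataset chooses its path from a split-to-key mapping. The function `resolve_scp_path`, the `SPLIT_TO_KEY` table, and the split name `"val"` are hypothetical and not taken from the SLAM-LLM code; only the three config key names come from the recipe. The sketch also fails fast with the missing config key instead of calling `open('')` inside a DataLoader worker.

```python
# HYPOTHETICAL sketch: the split names and this mapping are assumptions;
# only the three config key names come from the recipe's .sh file.
SPLIT_TO_KEY = {
    "train": "train_scp_file_path",
    "val": "dev_scp_file_path",  # validation during training should use the dev data
    "test": "test_scp_file_path",
}


def resolve_scp_path(dataset_config: dict, split: str) -> str:
    """Return the scp directory configured for `split`, failing fast if it is unset."""
    key = SPLIT_TO_KEY[split]
    path = dataset_config.get(key) or ""
    if not path:
        # Report the missing config key instead of letting open('') fail later
        # inside a DataLoader worker process.
        raise ValueError(f"dataset_config.{key} is not set (needed for split {split!r})")
    return path


if __name__ == "__main__":
    config = {
        "train_scp_file_path": "/multitask/train/",
        "dev_scp_file_path": "/multitask/dev/",
    }
    print(resolve_scp_path(config, "val"))  # /multitask/dev/
```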
