Open
Description
System Info
1 NVIDIA Ada Lovelace L40S GPU
Information
- The official example scripts
- My own modified scripts
🐛 Describe the bug
Hi! Thank you for creating the AISPEECH_ASR recipe.
I compiled separate multitask.jsonl files for my training and validation data and tried to run scripts/finetune_torchrun.sh. In the .sh file, I set train_scp_file_path and dev_scp_file_path to the directories containing the multitask.jsonl files:
train_scp_file_path=/multitask/train/
dev_scp_file_path=/multitask/dev/
...
++dataset_config.train_scp_file_path=$train_scp_file_path \
++dataset_config.dev_scp_file_path=$dev_scp_file_path \
Training progresses but fails on reaching the validation stage with a "FileNotFoundError", even though the validation multitask.jsonl file exists.
However, it runs once I replace "dev_scp_file_path" with "test_scp_file_path" in the .sh file.
Error logs
File "/SLAM-LLM/src/slam_llm/utils/train_utils.py", line 186, in train
[rank0]: eval_ppl, eval_epoch_loss, *rest = evaluation(model, train_config, eval_dataloader, local_rank, tokenizer)
[rank0]: File "/SLAM-LLM/src/slam_llm/utils/train_utils.py", line 424, in evaluation
[rank0]: for step, batch in enumerate(eval_dataloader):
[rank0]: File "/home/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 701, in __next__
[rank0]: data = self._next_data()
[rank0]: File "/home/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1465, in _next_data
[rank0]: return self._process_data(data)
[rank0]: File "/home/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1491, in _process_data
[rank0]: data.reraise()
[rank0]: File "/home/.local/lib/python3.10/site-packages/torch/_utils.py", line 715, in reraise
[rank0]: raise exception
[rank0]: FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
[rank0]: Original Traceback (most recent call last):
[rank0]: File "/home/.local/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 351, in _worker_loop
[rank0]: data = fetcher.fetch(index) # type: ignore[possibly-undefined]
[rank0]: File "/home/.local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 42, in fetch
[rank0]: data = next(self.dataset_iter)
[rank0]: File "/SLAM-LLM/src/slam_llm/datasets/speech_dataset_large.py", line 245, in __iter__
[rank0]: for elem in self.dp:
[rank0]: File "/SLAM-LLM/src/slam_llm/datasets/speech_dataset_large.py", line 84, in __iter__
[rank0]: with open(multitask_task_path) as f_task:
[rank0]: FileNotFoundError: [Errno 2] No such file or directory: ''
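The path in the last line of the traceback is an empty string, which suggests the validation dataset read a path that was never set rather than the one I passed. A minimal sketch of that failure mode follows. It assumes, without confirmation from the SLAM-LLM source, that unset path fields default to an empty string and that the validation split reads test_scp_file_path. The function `open_multitask` and the plain dict config are hypothetical stand-ins, not the recipe's actual code.

```python
import os


def open_multitask(dataset_config: dict, split_key: str) -> str:
    """Hypothetical stand-in for the dataset's path lookup."""
    # Assumption: a path field that was never overridden defaults to "".
    scp_dir = dataset_config.get(split_key, "")
    # Assumption: an empty directory yields an empty task path.
    multitask_task_path = os.path.join(scp_dir, "multitask.jsonl") if scp_dir else ""
    with open(multitask_task_path) as f_task:
        return f_task.readline()


# Only the train and dev keys are set, as in my .sh file.
config = {
    "train_scp_file_path": "/multitask/train/",
    "dev_scp_file_path": "/multitask/dev/",
}

caught = None
try:
    # Reading the key that was never set reproduces the empty-path error.
    open_multitask(config, "test_scp_file_path")
except FileNotFoundError as err:
    caught = err

print(caught)
```

If this is what happens, it would also explain why passing the validation directory as test_scp_file_path makes the run succeed.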
Expected behavior
The script should load the validation data from the validation file path (dev_scp_file_path) when running validation during training.
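Until the key handling is clarified, a pre-flight check along these lines could catch an unset or wrong path before training starts, instead of at the first validation step. This is only a sketch: `keys_without_multitask` is a hypothetical helper, not part of SLAM-LLM, and it assumes each configured directory should contain a file named multitask.jsonl.

```python
import tempfile
from pathlib import Path


def keys_without_multitask(paths: dict[str, str], filename: str = "multitask.jsonl") -> list[str]:
    """Return the config keys whose directory is unset or lacks `filename`."""
    missing = []
    for key, directory in paths.items():
        # An empty string means the key was never overridden on the command line.
        if not directory or not (Path(directory) / filename).is_file():
            missing.append(key)
    return missing


# Demo: a temporary directory stands in for the validation directory.
with tempfile.TemporaryDirectory() as dev_dir:
    (Path(dev_dir) / "multitask.jsonl").write_text("{}\n")
    result = keys_without_multitask(
        {"dev_scp_file_path": dev_dir, "test_scp_file_path": ""}
    )

print(result)  # ['test_scp_file_path']
```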