[AISPEECH_ASR] Finetuning script does not load dev_scp_file_path (validation data) #230

@helloseraphina

System Info

1 NVIDIA Ada Lovelace L40S GPU

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

Hi! Thank you for creating the AISPEECH_ASR recipe.

I compiled separate multitask.jsonl files for my training and validation data and tried to run scripts/finetune_torchrun.sh. In the .sh file, I set train_scp_file_path and dev_scp_file_path to the directories containing the respective multitask.jsonl files:

train_scp_file_path=/multitask/train/
dev_scp_file_path=/multitask/dev/

...
++dataset_config.train_scp_file_path=$train_scp_file_path \
++dataset_config.dev_scp_file_path=$dev_scp_file_path \

Training progresses but fails at the validation stage with a FileNotFoundError, even though the validation multitask.jsonl file exists.

However, it runs once I replace "dev_scp_file_path" with "test_scp_file_path" in the .sh file.
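
To rule out a genuinely missing file before a long run, a quick pre-flight check along these lines may help. It is only a sketch: it assumes, as described above, that each configured directory should contain a file named multitask.jsonl, and the helper name check_scp_dirs is hypothetical, not part of SLAM-LLM.

```python
import os


def check_scp_dirs(paths, filename="multitask.jsonl"):
    """Map each config key to True if <directory>/<filename> exists.

    `paths` is a dict such as {"dev_scp_file_path": "/multitask/dev/"}.
    An empty or unset directory is reported as False instead of raising.
    """
    results = {}
    for key, directory in paths.items():
        results[key] = bool(directory) and os.path.isfile(
            os.path.join(directory, filename)
        )
    return results


if __name__ == "__main__":
    # Paths copied from the .sh file in this issue; adjust to your setup.
    report = check_scp_dirs(
        {
            "train_scp_file_path": "/multitask/train/",
            "dev_scp_file_path": "/multitask/dev/",
        }
    )
    for key, found in report.items():
        print(f"{key}: {'found' if found else 'MISSING'}")
```

If both entries are reported as found and validation still fails, the problem is more likely in which config key is read at validation time than in the files themselves.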

Error logs

File "/SLAM-LLM/src/slam_llm/utils/train_utils.py", line 186, in train
[rank0]: eval_ppl, eval_epoch_loss, *rest = evaluation(model, train_config, eval_dataloader, local_rank, tokenizer)
[rank0]: File "/SLAM-LLM/src/slam_llm/utils/train_utils.py", line 424, in evaluation
[rank0]: for step, batch in enumerate(eval_dataloader):
[rank0]: File "/home/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 701, in __next__
[rank0]: data = self._next_data()
[rank0]: File "/home/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1465, in _next_data
[rank0]: return self._process_data(data)
[rank0]: File "/home/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1491, in _process_data
[rank0]: data.reraise()
[rank0]: File "/home/.local/lib/python3.10/site-packages/torch/_utils.py", line 715, in reraise
[rank0]: raise exception
[rank0]: FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
[rank0]: Original Traceback (most recent call last):
[rank0]: File "/home/.local/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 351, in _worker_loop
[rank0]: data = fetcher.fetch(index) # type: ignore[possibly-undefined]
[rank0]: File "/home/.local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 42, in fetch
[rank0]: data = next(self.dataset_iter)
[rank0]: File "/SLAM-LLM/src/slam_llm/datasets/speech_dataset_large.py", line 245, in __iter__
[rank0]: for elem in self.dp:
[rank0]: File "/SLAM-LLM/src/slam_llm/datasets/speech_dataset_large.py", line 84, in __iter__
[rank0]: with open(multitask_task_path) as f_task:
[rank0]: FileNotFoundError: [Errno 2] No such file or directory: ''
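
Note that the last line of the traceback reports an empty path (''), not the dev directory. The sketch below reproduces that error in isolation and shows a hypothetical guard that fails with a more descriptive message; open_task_file and its config_key argument are illustrative names, not SLAM-LLM functions. This is consistent with, but does not prove, the validation stage reading a path setting that was never populated.

```python
def reproduce_empty_path_error():
    """Open an empty path and return (errno, filename) from the error."""
    try:
        with open("") as f_task:
            f_task.read()
    except FileNotFoundError as err:
        return err.errno, err.filename
    return None


def open_task_file(path, config_key):
    """Open `path`, but reject an empty value with a clearer error.

    `config_key` is only used to name the offending setting in the message.
    """
    if not path:
        raise ValueError(
            f"{config_key} is empty; check which path option the "
            "validation dataset actually reads"
        )
    return open(path)


if __name__ == "__main__":
    print(reproduce_empty_path_error())
    try:
        open_task_file("", "multitask_task_path")
    except ValueError as err:
        print(err)
```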

Expected behavior

The script should load the validation data from the validation file path (dev_scp_file_path) when running validation during training.
