When I start a job using clusterduck with Slurm and an error is raised, I only see the following stack trace:
Error executing job with overrides: ['seed=0', '+experiment/deformable_plate=ltsgns_mesh_eval', '+platform=kluster_1_gpu']
submitit ERROR (2023-11-10 13:02:46,649) - Submitted job triggered an exception
Traceback (most recent call last):
  File "/home/i53/mitarbeiter/philipp_dahlinger/mambaforge/envs/ltsgns_mp/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/i53/mitarbeiter/philipp_dahlinger/mambaforge/envs/ltsgns_mp/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/i53/mitarbeiter/philipp_dahlinger/mambaforge/envs/ltsgns_mp/lib/python3.10/site-packages/submitit/core/_submit.py", line 11, in <module>
    submitit_main()
  File "/home/i53/mitarbeiter/philipp_dahlinger/mambaforge/envs/ltsgns_mp/lib/python3.10/site-packages/submitit/core/submission.py", line 76, in submitit_main
    process_job(args.folder)
  File "/home/i53/mitarbeiter/philipp_dahlinger/mambaforge/envs/ltsgns_mp/lib/python3.10/site-packages/submitit/core/submission.py", line 69, in process_job
    raise error
  File "/home/i53/mitarbeiter/philipp_dahlinger/mambaforge/envs/ltsgns_mp/lib/python3.10/site-packages/submitit/core/submission.py", line 55, in process_job
    result = delayed.result()
  File "/home/i53/mitarbeiter/philipp_dahlinger/mambaforge/envs/ltsgns_mp/lib/python3.10/site-packages/submitit/core/utils.py", line 133, in result
    self._result = self.function(*self.args, **self.kwargs)
  File "/home/i53/mitarbeiter/philipp_dahlinger/mambaforge/envs/ltsgns_mp/lib/python3.10/site-packages/hydra_plugins/clusterduck_launcher/clusterduck_launcher.py", line 116, in run_workers
    exceptions = [
  File "/home/i53/mitarbeiter/philipp_dahlinger/mambaforge/envs/ltsgns_mp/lib/python3.10/site-packages/hydra_plugins/clusterduck_launcher/clusterduck_launcher.py", line 117, in <listcomp>
    result.return_value
  File "/home/i53/mitarbeiter/philipp_dahlinger/mambaforge/envs/ltsgns_mp/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
IndexError: too many indices for tensor of dimension 5
srun: error: node2: task 0: Exited with exit code 1
I would like to see the stack trace from my own code, i.e. the place where the IndexError in this example was actually raised, so that I can track down the problem.
Is that possible?
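As a possible stopgap until the launcher reports this itself, the task function could print its own traceback before the exception leaves user code. The following is only a minimal sketch in plain Python: `log_full_traceback` and `run_experiment` are hypothetical names invented for illustration, not part of clusterduck, Hydra, or submitit, and it assumes the user-code frames are lost somewhere after the exception leaves the task function and that stderr ends up in the job's log.

```python
import functools
import traceback


def log_full_traceback(task):
    """Hypothetical helper: print the full traceback at the point of failure, then re-raise."""

    @functools.wraps(task)
    def wrapper(*args, **kwargs):
        try:
            return task(*args, **kwargs)
        except Exception:
            # print_exc() writes the traceback of the exception currently being
            # handled to stderr, including the frames inside `task`.
            traceback.print_exc()
            # Re-raise so the caller still sees the job as failed.
            raise

    return wrapper


@log_full_traceback
def run_experiment(values, index):
    # Hypothetical stand-in for the real task function.
    return values[index]
```

With Hydra, the assumption would be to place such a decorator directly beneath `@hydra.main(...)` so that it wraps the user's function itself. Whether the output then appears next to the launcher's trace depends on how the job's stderr is captured.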