Closed
Description
System Info
On a basic-tier Google Colab instance (no GPU or TPU), with:
accelerate 1.12.0
Python 3.12.12
PyTorch 2.9.0+cu126
Information
- The official example scripts
- My own modified scripts
Tasks
- One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
- My own task or dataset (give details below)
Reproduction
Run inside a Google Colab Jupyter notebook cell:
from accelerate import notebook_launcher

def dummy_training_function():
    print("Inside training function")

notebook_launcher(dummy_training_function, num_processes=1)
It raises:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
/tmp/ipython-input-3351436335.py in <cell line: 0>()
4 print("Inside training function")
5
----> 6 notebook_launcher(dummy_training_function, num_processes=1)
/usr/local/lib/python3.12/dist-packages/accelerate/launchers.py in notebook_launcher(function, args, num_processes, mixed_precision, use_port, master_addr, node_rank, num_nodes, rdzv_backend, rdzv_endpoint, rdzv_conf, rdzv_id, max_restarts, monitor_interval, log_line_prefix_template)
150 print("Launching a training on TPU cores.")
151 xmp.spawn(launcher, args=args, start_method="fork")
--> 152 elif in_colab and get_gpu_info()[1] < 2:
153 # No need for a distributed launch otherwise as it's either CPU or one GPU.
154 if torch.cuda.is_available():
/usr/local/lib/python3.12/dist-packages/accelerate/utils/environment.py in get_gpu_info()
171 """
172 # Returns as list of `n` GPUs and their names
--> 173 output = subprocess.check_output(
174 [_nvidia_smi(), "--query-gpu=count,name", "--format=csv,noheader"], universal_newlines=True
175 )
/usr/lib/python3.12/subprocess.py in check_output(timeout, *popenargs, **kwargs)
464 kwargs['input'] = empty
465
--> 466 return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
...
-> 1955 raise child_exception_type(errno_num, err_msg, err_filename)
1956 else:
1957 raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'nvidia-smi'
Expected behavior
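In Colab, nvidia-smi is not installed on CPU/TPU instances, so notebook_launcher should not rely on nvidia-smi to detect GPUs. When the binary is missing, the launcher should treat the instance as having no GPU instead of raising FileNotFoundError.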
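To illustrate the expected behavior, here is a minimal sketch of a GPU count that tolerates a missing nvidia-smi. It is not the code in accelerate and not a proposed patch: `count_nvidia_gpus` is a hypothetical helper name, and it assumes that "no nvidia-smi on PATH" can safely be read as "zero GPUs".

```python
import shutil
import subprocess


def count_nvidia_gpus(executable: str = "nvidia-smi") -> int:
    """Return the number of GPUs listed by nvidia-smi, or 0 if it cannot be used.

    Hypothetical helper for illustration only; not part of accelerate.
    """
    # shutil.which returns None when the binary is not on PATH,
    # which is the situation on CPU/TPU Colab instances.
    path = shutil.which(executable)
    if path is None:
        return 0
    try:
        output = subprocess.check_output(
            [path, "--query-gpu=name", "--format=csv,noheader"],
            universal_newlines=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # The binary exists but failed to run or exited with an error.
        return 0
    # nvidia-smi prints one line per GPU with these flags.
    return sum(1 for line in output.splitlines() if line.strip())


if __name__ == "__main__":
    print(count_nvidia_gpus())
```

With a guard like this, a check such as `in_colab and count_nvidia_gpus() < 2` would fall through to the single-process path on a CPU-only instance instead of raising.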