Send an alarm email when your GPUs have been idle for a long time. This may be useful if you rent GPUs from cloud services to train AI models.
- Install dependencies with `pip install click nvidia-ml-py pyyaml` or `pip install -r requirements.txt`.
- Prepare an email account, generate an authorization token and fill these values in `smtp.yaml`.
- Run `python alarm.py`. For more configurations, run `python alarm.py --help`.
- By default, the program will check your GPU utilization rates every 10 seconds and send an alarm email if some of the rates are below 20% for more than 30 minutes.
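
The default behavior described above can be illustrated with a small sketch. It is not taken from `alarm.py`; the class name `IdleTracker`, its parameters, and the choice to track each GPU separately are assumptions made for illustration. The defaults mirror the documented 20% threshold and 30-minute window.

```python
from typing import Dict, List, Sequence


class IdleTracker:
    """Hypothetical helper: remembers when each GPU first dropped below the threshold."""

    def __init__(self, threshold: float = 20.0, max_idle_seconds: float = 30 * 60):
        self.threshold = threshold                # utilization percentage
        self.max_idle_seconds = max_idle_seconds  # allowed idle time before an alarm
        self._idle_since: Dict[int, float] = {}   # GPU index -> time it became idle

    def update(self, rates: Sequence[float], now: float) -> List[int]:
        """Record one sample and return the GPUs idle for longer than the limit."""
        overdue = []
        for index, rate in enumerate(rates):
            if rate < self.threshold:
                # Keep the earliest idle timestamp for this GPU.
                start = self._idle_since.setdefault(index, now)
                if now - start > self.max_idle_seconds:
                    overdue.append(index)
            else:
                # The GPU is busy again, so its idle timer is reset.
                self._idle_since.pop(index, None)
        return overdue
```

A polling loop would call `update` roughly every 10 seconds with the latest utilization rates and a monotonic timestamp (for example `time.monotonic()`), and send the email once the returned list is non-empty.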
The program only supports tracking utilization rates of NVIDIA GPUs.
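
For reference, utilization rates of NVIDIA GPUs can be read through the `pynvml` module that the `nvidia-ml-py` package installs. The sketch below is one possible way to do this and is not necessarily how `alarm.py` does it; the function name `read_gpu_utilization` and its optional `nvml` parameter (which allows substituting a stand-in module on machines without a GPU) are assumptions.

```python
from typing import List


def read_gpu_utilization(nvml=None) -> List[int]:
    """Return the current utilization percentage of every visible NVIDIA GPU."""
    if nvml is None:
        import pynvml as nvml  # provided by the nvidia-ml-py package

    nvml.nvmlInit()
    try:
        rates = []
        for index in range(nvml.nvmlDeviceGetCount()):
            handle = nvml.nvmlDeviceGetHandleByIndex(index)
            # The .gpu field is the percentage of time the GPU was busy
            # over the driver's most recent sampling period.
            rates.append(nvml.nvmlDeviceGetUtilizationRates(handle).gpu)
        return rates
    finally:
        # Release NVML even if a query fails.
        nvml.nvmlShutdown()
```

On a machine with an NVIDIA driver installed, calling `read_gpu_utilization()` with no arguments returns one integer per GPU.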