Error:
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda')
I am running on a newly set up H100. Everything seems to be working as the miner downloads the shards, but I am seeing this error in the logs.
I am also getting a loss of 7-8.
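For reference, here is a minimal sketch of the load order the warning asks for, assuming the Hugging Face Transformers API: initialize the model on CPU, then move it to GPU before any forward pass. The model id and helper name below are placeholders, not taken from the miner's actual code.

```python
def load_model_for_flash_attention(model_id: str):
    """Hypothetical helper: initialize on CPU, then move to GPU,
    which is the order the Flash Attention 2.0 warning asks for."""
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,            # FA2 requires fp16/bf16 weights
        attn_implementation="flash_attention_2",
    )
    # Move to GPU *before* running inference so the FA2 kernels see CUDA tensors.
    return model.to("cuda")
```

If the miner loads the model itself, the equivalent fix would be making sure `model.to("cuda")` runs before the first forward pass, as the warning suggests.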