Error:
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda')
I am running on a newly set up H100. Everything seems to be working as the miner downloads the shards, but I am seeing this error in the logs.
I am also getting a loss of 7-8.
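For reference, here is a minimal sketch of the load order the warning asks for, assuming the Hugging Face Transformers API: initialize the model on CPU, then move it to GPU before any forward pass. The model id and helper name below are placeholders, not taken from the miner's actual code.

```python
def load_model_for_flash_attention(model_id: str):
    """Hypothetical helper: initialize on CPU, then move to GPU,
    which is the order the Flash Attention 2.0 warning asks for."""
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,            # FA2 requires fp16/bf16 weights
        attn_implementation="flash_attention_2",
    )
    # Move to GPU *before* running inference so the FA2 kernels see CUDA tensors.
    return model.to("cuda")
```

If the miner loads the model itself, the equivalent fix would be making sure `model.to("cuda")` runs before the first forward pass, as the warning suggests.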