Hello!
I am using the following logic in my codebase, inspired by this:
self.agent_inference = type(self.policy)(**inference_kwargs)
self.agent_inference_p = from_module(self.policy).data
self.agent_inference_p.to_module(self.agent_inference)
This works well for every policy class in my project, but as soon as the policy contains an LSTM, the agent stops learning.
I have verified that these lines are the cause: when I comment out the 3 lines above and use self.policy instead of self.agent_inference, the LSTM-equipped agents learn well.
What is going on?
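A diagnostic sketch that might help narrow this down, using plain PyTorch only. It rests on an assumption that is not confirmed anywhere above: that the parameter swap writes into the module's parameter dict without going through the LSTM's own attribute setter. nn.LSTM keeps a private cached list of its weights (`_flat_weights`, named by `_flat_weights_names`) and uses that list in its forward pass, so a swap that bypasses the setter could leave the LSTM running on the old tensors. Both attributes are private PyTorch internals and may change between versions. The helper name `stale_flat_weights` is hypothetical.

```python
import torch
from torch import nn


def stale_flat_weights(lstm: nn.LSTM) -> list:
    """Return the names of weights whose cached copy in the private
    `_flat_weights` list is not the parameter currently registered
    on the module under that name."""
    stale = []
    for name, cached in zip(lstm._flat_weights_names, lstm._flat_weights):
        if cached is not getattr(lstm, name):
            stale.append(name)
    return stale


lstm = nn.LSTM(input_size=4, hidden_size=8)
stale_before = stale_flat_weights(lstm)

# Simulate a swap that bypasses nn.LSTM.__setattr__ by writing directly
# into the parameter dict (an assumption about what might be happening,
# not a statement about how to_module is implemented).
lstm._parameters["weight_ih_l0"] = nn.Parameter(
    torch.zeros_like(lstm.weight_ih_l0)
)
stale_after = stale_flat_weights(lstm)

print(stale_before)
print(stale_after)
```

Running the same helper on every nn.LSTM submodule of self.agent_inference right after the to_module call would show whether the inference copy is still reading cached weights. An empty list would rule this explanation out.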