Hey, thanks for the great repo and a cool paper.
There is a possible typo in POLOAgent.action_taken and Agent.step.
The call self.polo_buf.update(prev_obs, obs, rew, done) does not pass the action, which the update method defined in the replay buffer class expects. As a result, the done flags are written to the actions column, and the done column itself never gets updated.
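To illustrate what I mean, here is a minimal sketch. The ReplayBuffer class below is a hypothetical stand-in, not the repo's code: I am assuming update takes the action before an optional done argument, which is what would produce the behaviour described above. The field names and the action value 3.0 are made up for the example.

```python
import numpy as np


class ReplayBuffer:
    """Hypothetical stand-in for the replay buffer (assumed layout)."""

    def __init__(self, size, obs_dim):
        self.prev_obs = np.zeros((size, obs_dim))
        self.obs = np.zeros((size, obs_dim))
        self.rew = np.zeros(size)
        self.act = np.zeros(size)
        self.done = np.zeros(size, dtype=bool)
        self.size = size
        self.ptr = 0

    # Assumed signature: the action comes before an optional done flag.
    def update(self, prev_obs, obs, rew, act, done=False):
        i = self.ptr
        self.prev_obs[i] = prev_obs
        self.obs[i] = obs
        self.rew[i] = rew
        self.act[i] = act
        self.done[i] = done
        self.ptr = (self.ptr + 1) % self.size


buf = ReplayBuffer(size=2, obs_dim=1)

# Call without the action: done lands in the action slot,
# and the done column keeps its default value.
buf.update([0.0], [1.0], 0.5, True)
buggy_act, buggy_done = buf.act[0], buf.done[0]

# Call with the action passed explicitly (3.0 is a made-up action).
buf.update([0.0], [1.0], 0.5, 3.0, True)
fixed_act, fixed_done = buf.act[1], buf.done[1]

print("without action:", buggy_act, buggy_done)
print("with action:   ", fixed_act, fixed_done)
```

If the real signature looks anything like this, passing the arguments by keyword at the call site would probably make this kind of slip fail loudly instead of silently.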
Cheers.