Typo in POLOAgent

Hey, thanks for the great repo and a cool paper.

There may be a typo in ```POLOAgent.action_taken``` and ```Agent.step```.

The call ```self.polo_buf.update(prev_obs, obs, rew, done)``` does not pass the action, although the update method defined in the replay buffer class takes one. As a result, the done flags are written into the actions column, and the ```done``` column itself never gets updated.
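
To illustrate what I think is happening, here is a minimal sketch with a toy buffer. Everything in it is an assumption for illustration only: ```ToyBuffer```, its field names, and the argument order ```(prev_obs, obs, rew, act, done=False)``` are hypothetical and may not match the actual replay buffer in the repo.

```python
import numpy as np


class ToyBuffer:
    """Hypothetical stand-in for the replay buffer, not the repo's class."""

    def __init__(self, size, obs_dim):
        self.prev_obs = np.zeros((size, obs_dim))
        self.obs = np.zeros((size, obs_dim))
        self.rew = np.zeros(size)
        self.act = np.zeros(size)
        self.done = np.zeros(size, dtype=bool)
        self.ptr = 0
        self.size = size

    # Assumed signature: the action comes before an optional done flag.
    def update(self, prev_obs, obs, rew, act, done=False):
        i = self.ptr
        self.prev_obs[i] = prev_obs
        self.obs[i] = obs
        self.rew[i] = rew
        self.act[i] = act
        self.done[i] = done
        self.ptr = (i + 1) % self.size


buf = ToyBuffer(size=2, obs_dim=3)
prev_obs, obs = np.zeros(3), np.ones(3)

# Call shaped like the one in the issue: no action is passed, so the
# done flag binds to the `act` parameter and `done` keeps its default.
buf.update(prev_obs, obs, 1.0, True)

# Call with the action included: each value lands in its own column.
buf.update(prev_obs, obs, 1.0, 2.0, True)

print(buf.act)   # row 0 holds the done flag cast to a float
print(buf.done)  # row 0 is still False
```

If the real method looks anything like this, passing the action explicitly (or using keyword arguments for both ```act``` and ```done```) should fix it.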

Cheers.