Implement the actor-critic methods

Hello,
In the [asynchronous DQN paper](http://arxiv.org/pdf/1602.01783v1.pdf), the authors also describe an on-policy method, asynchronous advantage actor-critic (A3C), which achieved better results than the other methods. Do you currently have any plans to include this method in this repo as well?
I am using this repo as a starting point and attempting to reproduce the results of the A3C method in the continuous action domain, but I am still trying to figure out the network model they used for the physical-state case when applied to MuJoCo, and how the policy gradient is accumulated.
