Skip to content

The implementation seems to be different from the method in the paper #70

@zmccmzty

Description

@zmccmzty

It looks like the policy and the discriminator are trained together at the same rate with single optimizer and combined loss (https://github.com/nv-tlabs/ASE/blob/21257078f0c6bf75ee4f02626260d7cf2c48fee0/ase/learning/ase_agent.py#L280C1-L280C1). It seems to be different from the pseudocode in the paper, where they were trained separately. Any idea about what's the reason for this? Or am I missing something?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions