Create a Notebook that demonstrates overfitting on a robotics control task, to manually/visually verify the masking/sequencing logic. #2

When you run a training loop with the current cycle of dataloaders, the model learns to predict "forward" for every action in FourRooms, regardless of what the robot sees. I assume the cause is simply that "forward" comes up so often that always predicting it scores well, and that the model might learn more with enough training and the right hyperparameters and optimizer/scheduler. But it would be nice to have a demo that deliberately overfits a small amount of data, so we can verify that the masking/sequencing logic is not the real problem.
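As a starting point for the notebook, here is a minimal sketch of the check it could make: compare the model's accuracy on unmasked action positions against the "always predict the most common action" baseline. Everything in it is hypothetical and invented for illustration (the helper names `majority_baseline` and `masked_accuracy`, the toy labels, and the mask layout); it does not use the project's real dataloaders or model.

```python
from collections import Counter


def majority_baseline(actions):
    """Return (most_common_action, fraction_of_labels) for a list of labels."""
    counts = Counter(actions)
    action, n = counts.most_common(1)[0]
    return action, n / len(actions)


def masked_accuracy(preds, targets, mask):
    """Accuracy over positions where mask is truthy (e.g. action tokens only)."""
    pairs = [(p, t) for p, t, m in zip(preds, targets, mask) if m]
    if not pairs:
        raise ValueError("mask selects no positions")
    return sum(p == t for p, t in pairs) / len(pairs)


# Hypothetical toy episode: these labels and this mask are made up.
targets = ["forward", "forward", "left", "forward",
           "right", "forward", "pad", "pad"]
mask = [1, 1, 1, 1, 1, 1, 0, 0]  # 1 = real action token, 0 = padding

# Baseline computed only over positions the mask keeps.
real_targets = [t for t, m in zip(targets, mask) if m]
baseline_action, baseline_acc = majority_baseline(real_targets)

# Stand-in for a collapsed model that predicts "forward" everywhere.
always_forward = ["forward"] * len(targets)
model_acc = masked_accuracy(always_forward, targets, mask)

print(f"baseline: always '{baseline_action}' -> {baseline_acc:.3f}")
print(f"model accuracy on masked positions: {model_acc:.3f}")
```

If a model trained on a handful of episodes cannot push its masked accuracy above the baseline toward 1.0, that points at the masking/sequencing logic rather than at class imbalance or hyperparameters.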
