Taking up memory on the primary GPU loading from checkpoint

Hi,
I'm wondering what is taking up memory on the main GPU when I resume training from a checkpoint. Because of this extra usage, I run out of memory and cannot train with a larger patch size.
![Screenshot from 2020-01-27 22-50-26](https://user-images.githubusercontent.com/18147024/73234301-7048c980-4157-11ea-91a5-8ce5963caada.png)

I think this is caused by the distributed training system. Is there any way to either avoid this memory cost or distribute it evenly across the other GPUs?
Many thanks!

Taking up memory on the primary GPU loading from checkpoint #36
