Graceful exit from HPC #12

@m-bossart

If a training run hits the HPC time limit, the job is terminated and no data is saved. We need to save the parameters at regular checkpoints so that training can be restarted from the last one.
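One possible shape for this, as a minimal sketch rather than the project's actual training loop: write the state to disk every few steps, write it atomically so a kill in the middle of a save cannot corrupt the file, and resume from the file on the next launch. All names here (`save_checkpoint`, `load_checkpoint`, `train`, `sigterm_flag`, the `step`/`params` state layout, and the placeholder parameter update) are hypothetical. The `sigterm_flag` helper assumes the scheduler sends `SIGTERM` shortly before killing the job, which depends on the cluster configuration.

```python
import os
import pickle
import signal
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the checkpoint.

    The rename is atomic on the same filesystem, so an interrupted save
    leaves the previous checkpoint intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def sigterm_flag():
    """Install a SIGTERM handler and return a callable that reports it.

    Assumption: the scheduler sends SIGTERM before the hard kill.
    Must be called from the main thread.
    """
    flag = {"stop": False}

    def handler(signum, frame):
        flag["stop"] = True

    signal.signal(signal.SIGTERM, handler)
    return lambda: flag["stop"]


def train(path, total_steps, checkpoint_every=10, should_stop=lambda: False):
    """Resume from `path` if it exists, checkpoint periodically, and save on exit."""
    # Hypothetical state layout: a step counter plus a list of parameters.
    state = load_checkpoint(path) or {"step": 0, "params": [0.0]}
    while state["step"] < total_steps and not should_stop():
        # Placeholder for the real parameter update.
        state["params"] = [p + 0.1 for p in state["params"]]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    # Final save covers both normal completion and a requested stop.
    save_checkpoint(path, state)
    return state
```

On a cluster the loop would be started as `train("ckpt.pkl", total_steps, should_stop=sigterm_flag())`. The periodic save is what protects against a hard kill with no warning; the signal handler only reduces the amount of work lost when a warning does arrive.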
