[WIP] Async checkpointing support #12

Draft
zhenghh04 wants to merge 2 commits into main from checkpoints

zhenghh04 commented May 10, 2024

This PR adds support for asynchronous checkpointing in Megatron-DeepSpeed through multiprocessing.

This is still a work in progress.

--num-checkpoint-workers: Number of background workers used to perform checkpointing. If zero, checkpointing is performed synchronously.
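
For readers unfamiliar with the pattern, here is a minimal sketch of how a worker-count flag like this could switch between synchronous and background checkpointing. It is not the code in this PR. Only the flag name `--num-checkpoint-workers` comes from the description above; the default of 0, the `CheckpointWriter` class, the `_write_checkpoint` helper, and the use of `pickle` with a `ProcessPoolExecutor` are assumptions made for illustration.

```python
import argparse
import copy
import os
import pickle
from concurrent.futures import ProcessPoolExecutor


def build_parser():
    # Flag name taken from the PR description; the default of 0 is an assumption.
    parser = argparse.ArgumentParser()
    parser.add_argument("--num-checkpoint-workers", type=int, default=0,
                        help="Number of background checkpoint workers; 0 means synchronous.")
    return parser


def _write_checkpoint(state, path):
    # Hypothetical helper: write to a temporary file, then rename it into place
    # so a partially written checkpoint is never visible under the final name.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)
    return path


class CheckpointWriter:
    """Hypothetical wrapper: synchronous if num_workers == 0, else background processes."""

    def __init__(self, num_workers):
        self._pool = ProcessPoolExecutor(max_workers=num_workers) if num_workers > 0 else None
        self._pending = []

    def save(self, state, path):
        if self._pool is None:
            # Synchronous path: block until the checkpoint is on disk.
            _write_checkpoint(state, path)
            return
        # Snapshot first so later in-place updates by the training loop
        # cannot leak into a checkpoint that has not been serialized yet.
        snapshot = copy.deepcopy(state)
        self._pending.append(self._pool.submit(_write_checkpoint, snapshot, path))

    def wait(self):
        # Block until all outstanding writes finish; re-raises worker errors.
        for future in self._pending:
            future.result()
        self._pending.clear()

    def close(self):
        self.wait()
        if self._pool is not None:
            self._pool.shutdown()
```

In a training loop, one would construct `CheckpointWriter(args.num_checkpoint_workers)` once, call `save(state, path)` at each checkpoint interval, and call `close()` before exit so no write is lost. The snapshot copy trades extra host memory for not stalling the training step while the file is written.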

saforem2 added a commit that referenced this pull request Oct 10, 2025