Pull upstream into `saforem2/fix-formatting` by saforem2 · Pull Request #90 · argonne-lcf/Megatron-DeepSpeed

saforem2 · 2025-07-16T14:03:40Z

Pull upstream commits into saforem2/fix-formatting branch:

argonne-lcf/Megatron-DeepSpeed @ saforem2/fix-formatting ← deepspeedai/Megatron-DeepSpeed @ main

Copilot Summary

This pull request primarily updates repository links across multiple files and extends the fine-tuning setup for Hugging Face Llama models. The changes make repository references consistent and simplify the fine-tuning workflow.

Repository Link Updates:

README.md: Updated links from microsoft to deepspeedai for the rebase folder and backup branch references.
examples_deepspeed/bert_with_pile/README.md: Modified links to point to deepspeedai for BERT examples and Megatron-LM arguments.
examples_deepspeed/rebase/README.md: Adjusted repository links for scripts and contributions related to activation checkpointing and performance comparisons.
examples_deepspeed/deepspeed4science/megatron_long_seq_support/README.md: Changed clone commands to use deepspeedai repositories.
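The link updates amount to swapping the `microsoft` org for `deepspeedai` in GitHub URLs. As a rough illustration only, a hypothetical helper like the one below could perform that rewrite; it assumes that only repositories whose names start with `DeepSpeed` or `Megatron-DeepSpeed` moved, and it is not how this PR was actually produced.

```python
import re

# Hypothetical helper (not part of the repo): point DeepSpeed-family GitHub links
# at the new `deepspeedai` org. Assumption: only repos named DeepSpeed* or
# Megatron-DeepSpeed* moved; any other microsoft/* link is left untouched.
_OLD_ORG = re.compile(r"github\.com/microsoft/((?:Megatron-)?DeepSpeed[\w.-]*)")


def update_org(text: str) -> str:
    """Return `text` with matching repository links rewritten to deepspeedai."""
    return _OLD_ORG.sub(r"github.com/deepspeedai/\1", text)


if __name__ == "__main__":
    # Example clone command like those in the long-sequence README.
    print(update_org("git clone https://github.com/microsoft/Megatron-DeepSpeed.git"))
```

The pattern stops at the first `/` after the repository name, so deeper paths such as `/tree/main/examples_deepspeed/rebase` are preserved.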

Fine-Tuning Enhancements:

examples_deepspeed/finetune_hf_llama/README.md: Updated fine-tuning commands to include `convert_hf2mds` and `convert_mds2hf` options for converting between model formats.
examples_deepspeed/finetune_hf_llama/ds_config.json: Added BF16 support and a ZeRO optimization stage setting for more efficient training.
examples_deepspeed/finetune_hf_llama/ds_config_empty.json: Introduced a minimal configuration file for specific conversion tasks.
examples_deepspeed/finetune_hf_llama/finetune_llama.sh: Made the script more flexible by selecting the configuration file based on task type and adding fine-tuning arguments.
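A DeepSpeed config that enables BF16 and sets a ZeRO stage, plus per-task config selection, might look roughly like the sketch below. The keys `bf16`, `zero_optimization`, `stage`, `train_batch_size`, `train_micro_batch_size_per_gpu`, and `gradient_accumulation_steps` are standard DeepSpeed config fields, but the values here are placeholders and not the ones in this PR's `ds_config.json`. The task-to-file mapping in `pick_config` is a hypothetical reading of "selecting configuration files based on task type"; the real logic lives in `finetune_llama.sh`.

```python
import json

# Option names taken from the README change; treating them as the "conversion"
# tasks that use the minimal config is an assumption.
CONVERSION_TASKS = {"convert_hf2mds", "convert_mds2hf"}


def make_ds_config(zero_stage: int = 1, micro_batch: int = 1,
                   grad_accum: int = 1, world_size: int = 1) -> dict:
    """Sketch of a DeepSpeed config with BF16 and ZeRO enabled (placeholder values)."""
    return {
        "train_micro_batch_size_per_gpu": micro_batch,
        "gradient_accumulation_steps": grad_accum,
        # DeepSpeed expects train_batch_size = micro batch * grad accum * data-parallel size.
        "train_batch_size": micro_batch * grad_accum * world_size,
        "bf16": {"enabled": True},
        "zero_optimization": {"stage": zero_stage},
    }


def pick_config(task: str) -> str:
    """Hypothetical mapping: conversions get the minimal config, training the full one."""
    return "ds_config_empty.json" if task in CONVERSION_TASKS else "ds_config.json"


if __name__ == "__main__":
    print(pick_config("convert_hf2mds"))
    print(json.dumps(make_ds_config(zero_stage=2, micro_batch=2, grad_accum=4, world_size=8), indent=2))
```

Keeping the batch-size fields mutually consistent matters because DeepSpeed rejects a config whose `train_batch_size` does not equal the product of the other two fields and the data-parallel world size.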

* pass batch_dim_idx to deepspeed sequence parallel distributed attention for supporting batch size larger than 1
* add FPDT support; add Ulysses rotary position embedding support
* remove unnecessary files
* set the warmup length to be FPDT chunk size if enabled

---------

Co-authored-by: Jinghan Yao <yjhmitweb@ascend-rw02.ten.osc.edu>
Co-authored-by: Jinghan Yao <yjhmitweb@ascend-rw01.ten.osc.edu>
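Ulysses-style sequence parallelism re-shards attention inputs with an all-to-all: each rank starts with one slice of the sequence and all heads, and ends with the full sequence for one slice of the heads. To do this the code must know which dimension holds the sequence. With a batch size of 1, the `[s, 1, h, d]` and `[1, s, h, d]` layouts store the same elements in the same order, but with larger batches they do not, which is where a batch dimension index comes in. The pure-Python simulation below is a sketch under stated assumptions: layout `[s, b, h, d]` when `batch_dim_idx == 1` and `[b, s, h, d]` when `batch_dim_idx == 0`, sequence and head counts divisible by the world size, and no real communication. It is not DeepSpeed's `DistributedAttention` code.

```python
from itertools import product


def full_tensor(shape):
    """Global tensor as {index: value}; each value is its own global index, so moves are traceable."""
    return {idx: idx for idx in product(*(range(n) for n in shape))}


def local_shards(tensor, shape, seq_dim, world):
    """Split along the sequence dim: rank r holds sequence chunk r, stored with local indices."""
    s_loc = shape[seq_dim] // world
    shards = [dict() for _ in range(world)]
    for idx, v in tensor.items():
        r = idx[seq_dim] // s_loc
        loc = list(idx)
        loc[seq_dim] -= r * s_loc
        shards[r][tuple(loc)] = v
    return shards


def ulysses_all_to_all(shards, shape, seq_dim, head_dim, world):
    """Simulated all-to-all: scatter heads across ranks, gather the full sequence on each."""
    s_loc = shape[seq_dim] // world
    h_loc = shape[head_dim] // world
    out = [dict() for _ in range(world)]
    for src, shard in enumerate(shards):
        for idx, v in shard.items():
            dst = idx[head_dim] // h_loc   # rank that owns this head chunk
            new = list(idx)
            new[head_dim] -= dst * h_loc   # head index local to dst
            new[seq_dim] += src * s_loc    # sequence chunk lands at its global offset
            out[dst][tuple(new)] = v
    return out


if __name__ == "__main__":
    shape = (4, 2, 4, 3)          # [s, b, h, d], so batch_dim_idx = 1
    batch_dim_idx = 1
    seq_dim = 1 - batch_dim_idx   # assumption: the sequence dim is the other of dims 0 and 1
    out = ulysses_all_to_all(local_shards(full_tensor(shape), shape, seq_dim, 2), shape, seq_dim, 2, 2)
    print(len(out[0]), len(out[1]))  # 48 48
```

After the exchange, every rank holds all 4 sequence positions and both batch entries for its 2 of the 4 heads, which is the layout attention needs for those heads.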

…nabled (#479)

* pass batch_dim_idx to deepspeed sequence parallel distributed attention for supporting batch size larger than 1
* add fused_rms_norm support on XPU device (#431)
* [LLaMa] Adding support converting checkpoint from mds to hf (#432)
  * add support converting checkpoint from hf to mds
  * Fix PP issue
  * update
* add device check when import ipex (#436)
* fix TFLOPs calculation (#371)
  * fix TFLOPs calculation: when GQA is used, we observe the right TFLOPs after this fix. When GQA is not used, the huge difference in TFLOPs is solved with selective recompute. Some other minor differences will also be observed because logits MACs are now added.
  * add copyrights
* fix nan issue when running megatron-deepspeed (#434)
* enable empty cache on XPU device (#438)
* [wandb] disable wandb more gracefully (#422)
* [Bug] Fix crash when logging optimizer state to tb (#417)
* add FPDT support; add Ulysses rotary position embedding support
* remove unnecessary files
* set the warmup length to be FPDT chunk size if enabled
* Enable Sequence Parallelism (#429)
* grad_wei can't be NoneType when running with DeepSpeed, for zero3 will divided the gradient (#428)
* fix init issue for rms_norm in squence_parallel (#448)
* enable profiler for specific ranks (#451)
* fix init issue for silently ignoring the deepspeed config (#452)
* fix moe tflops (#445)
* [tool]GQA convert support (#454)
  * [tools]GQA convert support
  * fix readme
* Fix import error in `deepspeed_to_megatron.py` (#455): Previously, `deepspeed_to_megatron.py` would raise an import error due to the relative import. This commit fixes this issue by changing from the relative import to the absolute import like in `deepspeed_to_transformers.py`.
* Update references to new GitHub org (deepspeedai) (#462)
* add sequence_parallel in layernorm init to enable 3D parallelism can run successfully with DeepSpeed (#468)
* fix bug when FPDT is disabled but with original Ulysses

---------

Signed-off-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>
Signed-off-by: Logan Adams <loadams@microsoft.com>
Signed-off-by: yisheng <yi.sheng@intel.com>
Signed-off-by: jinghan yao yjhmitweb@gmail.com
Co-authored-by: Jinghan Yao <yjhmitweb@ascend-rw02.ten.osc.edu>
Co-authored-by: YiSheng5 <syhm@mail.ustc.edu.cn>
Co-authored-by: billishyahao <yahao.he@gmail.com>
Co-authored-by: Polisetty V R K Jyothendra Varma <jvarma@habana.ai>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Jinghan Yao <yjhmitweb@ascend-rw01.ten.osc.edu>
Co-authored-by: ranzhejiang <zhejiang.ran@intel.com>
Co-authored-by: Xinyu Lian <lian7@illinois.edu>
Co-authored-by: inkcherry <mingzhi.liu@intel.com>
Co-authored-by: hotsuyuki <hotsuyuki.kawanishi@gmail.com>
Co-authored-by: Jinghan Yao <yjhmitweb@cardinal-rw02.ten.osc.edu>

YJHMITWEB and others added 8 commits December 4, 2024 17:34

[tool]GQA convert support (#454)

c3df187

* [tools]GQA convert support * fix readme

Fix import error in deepspeed_to_megatron.py (#455)

f4157be

Previously, `deepspeed_to_megatron.py` would raise an import error due to the relative import. This commit fixes this issue by changing from the relative import to the absolute import like in `deepspeed_to_transformers.py`.
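Why the relative import breaks: a file run directly as a script (`python script.py`) is loaded as `__main__` with no parent package, so `from .module import ...` fails, while an absolute import resolves because the script's own directory is placed on `sys.path`. The self-contained sketch below reproduces that behavior with two throwaway files; the names `helper.py`, `convert.py`, and `VALUE` are hypothetical stand-ins, not the actual modules in `tools/`.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path


def run_script(body: str) -> subprocess.CompletedProcess:
    """Write a hypothetical helper module and a script next to it, then run the script directly."""
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "helper.py").write_text("VALUE = 42\n")
        script = d / "convert.py"
        script.write_text(textwrap.dedent(body))
        # subprocess.run waits for completion, so the temp dir still exists while the script runs.
        return subprocess.run([sys.executable, str(script)], capture_output=True, text=True)


# Relative import: fails because a directly executed script has no parent package.
relative = run_script("""
    from .helper import VALUE
    print(VALUE)
""")

# Absolute import: works because the script's directory is sys.path[0].
absolute = run_script("""
    from helper import VALUE
    print(VALUE)
""")

if __name__ == "__main__":
    print(relative.returncode != 0, absolute.stdout.strip())
```

The relative version exits with `ImportError: attempted relative import with no known parent package`, which matches the class of failure the commit describes.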

Update references to new GitHub org (deepspeedai) (#462)

3e1da1f

Signed-off-by: Logan Adams <loadams@microsoft.com>

add sequence_parallel in layernorm init to enable 3D parallelism can run successfully with DeepSpeed (#468)

4efb479

Signed-off-by: yisheng <yi.sheng@intel.com>

fix the error issue for q/k/v stride is not match (#469)

8860868

Signed-off-by: yisheng <yi.sheng@intel.com>

add instruction document for huggingface UCP (#477)

1d71682

Signed-off-by: Schwidola0607 <khoadangpham82944@gmail.com>

saforem2 wants to merge 8 commits into argonne-lcf:saforem2/fix-formatting from deepspeedai:main
