-
Notifications
You must be signed in to change notification settings - Fork 999
Support other NVLink scenarios #218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
fzyzcjy
wants to merge
74
commits into
deepseek-ai:main
Choose a base branch
from
fzyzcjy:feat/deepep_normal_update
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
74 commits
Select commit
Hold shift + click to select a range
be2eed8
more
fzyzcjy 9ecb941
more
fzyzcjy 9683d94
more
fzyzcjy 8cf6bd8
more
fzyzcjy acf108a
more
fzyzcjy 3e2cede
Revert "more"
fzyzcjy 45fa1af
Revert "more"
fzyzcjy 443bfa8
more
fzyzcjy b986cce
more
fzyzcjy 3ea6f58
more
fzyzcjy 5d3513b
more
fzyzcjy bda5695
more
fzyzcjy 3740762
more
fzyzcjy ad4aee8
more
fzyzcjy b5e4aad
more
fzyzcjy 240d058
more
fzyzcjy 5379d59
more
fzyzcjy 4fc8e79
more
fzyzcjy 2e90afe
more
fzyzcjy 3639a57
more
fzyzcjy 4ef8f05
more
fzyzcjy 047656e
more
fzyzcjy c21f36d
more
fzyzcjy 7f3e4c0
more
fzyzcjy 92fb573
more
fzyzcjy 29f86f3
more
fzyzcjy 5557e70
more
fzyzcjy 9fd34e7
more
fzyzcjy 6417393
more
fzyzcjy faaeaad
more
fzyzcjy c38dbed
more
fzyzcjy dc74c0a
more
fzyzcjy 61dea30
more
fzyzcjy 7d4bc93
more
fzyzcjy 5b78f22
more
fzyzcjy 75351cd
more
fzyzcjy 7bb12d4
more
fzyzcjy 0e5a155
more
fzyzcjy 87b3980
more
fzyzcjy 4398b5c
more
fzyzcjy d7e9ce3
more
fzyzcjy 5b83cb8
more
fzyzcjy f024df5
more
fzyzcjy 5a7b2f2
more
fzyzcjy 6052379
more
fzyzcjy befcd27
more
fzyzcjy df598ea
more
fzyzcjy 5b23a8a
more
fzyzcjy 210e499
more
fzyzcjy 379ac24
more
fzyzcjy 43999dc
more
fzyzcjy 7916011
more
fzyzcjy 2f90c2d
Merge branch 'feat/cu_mem_api' into feat/deepep_normal_update
fzyzcjy 0525f8f
more
fzyzcjy 3032ede
Merge branch 'feat/cu_mem_api' into feat/deepep_normal_update
fzyzcjy dc652ea
more
fzyzcjy 151993b
more
fzyzcjy 06169d5
more
fzyzcjy 4b54c98
more
fzyzcjy dec3315
more
fzyzcjy 04f6a5b
more
fzyzcjy 0613b1f
more
fzyzcjy b0ba0ea
Revert "more"
fzyzcjy 01f0f90
more
fzyzcjy b80e0d4
more
fzyzcjy 26130b2
more
fzyzcjy e395621
moew
fzyzcjy 5b7e55a
more
fzyzcjy a8c6df8
temp
fzyzcjy e895366
more
fzyzcjy af060e6
more
fzyzcjy 378f9b2
more
fzyzcjy 0fc2a30
more
fzyzcjy 1b14ad6
more
fzyzcjy File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we increase it to 72?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wondering the use case of it - it seems large scale EP on prefill with 72 gpus does not have benefits iirc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's for training in NVL72.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that looks pretty reasonable! I think it is implementable, but since there are already a lot of PRs pending waiting for LyricZhao to have time to review and merge, I may continue this PR a bit later.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong. NVL72 is 18 nodes of 4-GPU, so the intra-node nvlink peer number is no more than 4, while the inter-node nvshmem can itself find cross-node nvlink. Why do we need extend the intra-node nvlink peer to 24 or larger?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vinjn Thanks for the reply. Wondering without changes here , how did sglang run DeepEP with EP48 on nvlink-only NVL72 ?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For SGLang Decoding, we can get performance gain with large EP size, such as EP48. It uses low latency dispatch/combine, which already support NVL72 for any EP size.
For SGLang Prefill, it uses intranode/internode dispatch/combine, which is the kernels we are talking about.
Without the pr-218, intranode dispatch/combine cannot support EP size larger than 4. Internode dispatch/combine supports any EP size, but it uses two hops transition, so it is not the best solution for NVL72.
With the pr-218, intranode dispatch/combine can expand to EP24.