Use compact reshaping in cast #1281
Open
thomasywang wants to merge 3 commits into meta-pytorch:main from thomasywang:export-D82831565
+1,276
−6
@thomasywang has exported this pull request. If you are a Meta employee, you can view the originating diff in D82831565.
thomasywang added a commit to thomasywang/monarch-1 that referenced this pull request on Sep 19, 2025
Summary: Use the algorithm from D82697968 Differential Revision: D82831565
785bfb4 to e4fdee4
Summary: ActorMesh's shape might have large extents on some dimensions. Those dimensions would cause large fanout in our comm actor implementation. To avoid that, we reshape it by increasing the dimensionality and limiting the extent of each dimension. Note: the reshape is visible only to the internal algorithm. The shape that the user sees remains intact.

For example, a typical shape is [hosts=1024, gpus=8]. With a limit of 8, it becomes [8, 8, 8, 2, 8] during casting. In other words, it adds 3 extra layers to the comm actor tree, while keeping the fanout in each layer at 8 or smaller.

The limit for cast fanouts is configured by the key `CASTING_FANOUT_SIZE`, which currently defaults to 0, disabling the feature.

Differential Revision: D82320948
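The reshape described above can be illustrated with a minimal sketch. The function name `reshape_with_limit` is hypothetical and is not taken from this PR. The sketch assumes a greedy strategy that repeatedly peels off the largest divisor not exceeding the limit, and it assumes that an extent with no such divisor is left unsplit; the actual implementation may choose factors differently.

```rust
/// Hypothetical helper (not Monarch's API): split every extent into factors
/// that are each no larger than `limit`, preserving the product of extents.
fn reshape_with_limit(extents: &[usize], limit: usize) -> Vec<usize> {
    // A limit of 0 is treated as "feature disabled", mirroring the default
    // described in the summary above.
    if limit == 0 {
        return extents.to_vec();
    }
    let mut out = Vec::new();
    for &extent in extents {
        let mut remaining = extent;
        while remaining > limit {
            // Largest divisor of `remaining` in 2..=limit, if one exists.
            match (2..=limit).rev().find(|f| remaining % f == 0) {
                Some(f) => {
                    out.push(f);
                    remaining /= f;
                }
                // Assumption: a remainder with no small divisor (for example
                // a prime larger than the limit) is kept as a single extent.
                None => break,
            }
        }
        out.push(remaining);
    }
    out
}
```

Under these assumptions, `reshape_with_limit(&[1024, 8], 8)` yields `[8, 8, 8, 2, 8]`, matching the example in the summary, and a limit of 0 returns the extents unchanged.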
Summary: When we fold a rectangle along a dimension with a step size of 1, we can factor it into 3 components: a start (blue), a middle (green), and an end (red) that can be unioned together.

[25] -> [5 x 5]
{F1982079295}

For step sizes larger than 1, the middle sections must also be factored into multiple components until they cycle.
{F1982079301}

It turns out the logic for going from 1D to 2D is the same as for going from ND to (N+1)D. All other cases are trivial.

Speedups have large variance, but compression is consistently large: about two orders of magnitude in the first run below (196.84x). When running the benchmark we often get a stack overflow from the old implementation (we know this because we never overflow the stack when running only the new one).

```
Starting reshape_selection fuzzing with parameters:
Max dimensions: 3
Max dimension size: 128
Fanout limit: 8
Iterations: 100

=== BENCHMARK AVERAGES (over 100 iterations) ===
Average new approach (reshape_selection): 1.489µs
Average new approach eval: 83.473µs
Average new selection string length: 158.8
Average old approach (Selection::of_ranks): 118.151µs
Average old approach eval: 26.252816ms
Average old selection string length: 31258.3
Average total new time (Reshape + Eval): 84.962µs
Average total old time (Reshape + Eval): 26.370967ms
Average reshape speedup (reshape_selection vs Selection::of_ranks): 79.35x
Average eval speedup (new eval vs old eval): 314.51x
Average total speedup (Reshape + Eval): 310.39x
Average compression: 196.84x
```

```
Starting reshape_selection fuzzing with parameters:
Max dimensions: 3
Max dimension size: 128
Fanout limit: 8
Iterations: 100000

=== NEW APPROACH BENCHMARK AVERAGES (over 100000 iterations) ===
Average reshape_selection time: 1.27µs
Average eval time: 82.275µs
Average selection string length: 175.8
Average total time (Reshape + Eval): 83.545µs
```

===============================================

*Sophons are proton-sized supercomputers created by the alien Trisolarans by "unfolding" protons from their normal eleven-dimensional state into lower dimensions, etching circuits onto the larger, two-dimensional form, and then refolding them into their original eleven dimensions*

Reviewed By: shayne-fletcher
Differential Revision: D82697968
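The step-size-1 fold described in the D82697968 summary can be sketched as follows. This is an illustration only: `Block` and `fold_range` are hypothetical names, not functions from this PR, and the sketch assumes a contiguous 1D range folded into row-major rows of a fixed width. It does not cover step sizes larger than 1 or the ND to (N+1)D generalization.

```rust
use std::ops::Range;

/// A rectangular block in the folded 2D layout (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
struct Block {
    rows: Range<usize>,
    cols: Range<usize>,
}

/// Fold a contiguous 1D range (step 1) into row-major rows of `width`
/// columns, returning at most three blocks: start, middle, and end.
fn fold_range(range: Range<usize>, width: usize) -> Vec<Block> {
    let mut blocks = Vec::new();
    if range.start >= range.end || width == 0 {
        return blocks; // empty selection
    }
    let first_row = range.start / width;
    let last_row = (range.end - 1) / width;
    let start_col = range.start % width;
    let end_col = (range.end - 1) % width + 1; // exclusive

    // The whole range sits inside a single row: one block is enough.
    if first_row == last_row {
        blocks.push(Block { rows: first_row..first_row + 1, cols: start_col..end_col });
        return blocks;
    }

    // Start: a partial first row, if the range does not begin at column 0.
    let mut mid_start = first_row;
    if start_col != 0 {
        blocks.push(Block { rows: first_row..first_row + 1, cols: start_col..width });
        mid_start = first_row + 1;
    }

    // End: a partial last row, if the range does not finish at a row boundary.
    let has_tail = end_col != width;
    let mid_end = if has_tail { last_row } else { last_row + 1 };

    // Middle: all complete rows between the start and the end.
    if mid_start < mid_end {
        blocks.push(Block { rows: mid_start..mid_end, cols: 0..width });
    }
    if has_tail {
        blocks.push(Block { rows: last_row..last_row + 1, cols: 0..end_col });
    }
    blocks
}
```

For the [25] -> [5 x 5] case, folding the range `3..22` with width 5 gives a start block (row 0, columns 3..5), a middle block (rows 1..4, all columns), and an end block (row 4, columns 0..2). The union of three rectangles replaces an enumeration of 19 individual ranks, which is the kind of compression the benchmark measures.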
Summary: Use the algorithm from D82697968 Differential Revision: D82831565
e4fdee4 to 122d463