thomasywang
Contributor

Summary: Use the algorithm from D82697968

Differential Revision: D82831565

@meta-cla bot added the CLA Signed label Sep 19, 2025
@facebook-github-bot
Contributor

@thomasywang has exported this pull request. If you are a Meta employee, you can view the originating diff in D82831565.

thomasywang added a commit to thomasywang/monarch-1 that referenced this pull request Sep 19, 2025

Summary:

ActorMesh's shape might have large extents on some dimensions. Those dimensions would cause large fanout in our comm actor implementation. To avoid that, we reshape it by increasing dimensionality and limiting the extent of each dimension. Note: the reshape is only visible to the internal algorithm. The shape that the user sees remains intact.

For example, a typical shape is [hosts=1024, gpus=8]. With a limit of 8, it becomes [8, 8, 8, 2, 8] during casting (1024 = 8 × 8 × 8 × 2). In other words, it adds 3 extra layers to the comm actor tree while keeping the fanout in each layer at 8 or smaller.

The cast fanout limit is configured by the key `CASTING_FANOUT_SIZE`, which currently defaults to 0, disabling the feature.
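A minimal sketch of the reshape described above, assuming each extent larger than the limit is split greedily by its largest divisor that does not exceed the limit. The function name `reshape_with_fanout_limit` is hypothetical, and keeping an extent unsplit when it has no such divisor (for example, a large prime) is an assumption; the summary does not say how such extents are handled. A limit of 0 leaves the shape unchanged, mirroring the `CASTING_FANOUT_SIZE` default.

```rust
/// Hypothetical sketch (not the diff's implementation): split every extent
/// into factors no larger than `limit`, preserving the total number of ranks.
/// `limit == 0` disables the reshape, mirroring `CASTING_FANOUT_SIZE = 0`.
fn reshape_with_fanout_limit(extents: &[usize], limit: usize) -> Vec<usize> {
    if limit == 0 {
        return extents.to_vec();
    }
    let mut out = Vec::new();
    for &extent in extents {
        let mut remaining = extent;
        // Peel off the largest divisor <= limit while the extent is too large.
        while remaining > limit {
            let factor = (2..=limit)
                .rev()
                .find(|&d| remaining % d == 0)
                .unwrap_or(1);
            if factor == 1 {
                // Assumption: with no divisor in 2..=limit, keep the extent as-is.
                break;
            }
            out.push(factor);
            remaining /= factor;
        }
        out.push(remaining);
    }
    out
}

fn main() {
    // [hosts=1024, gpus=8] with a limit of 8.
    let shape = reshape_with_fanout_limit(&[1024, 8], 8);
    println!("{:?}", shape); // prints [8, 8, 8, 2, 8]
}
```

The greedy choice keeps the leading layers at the full fanout and leaves any remainder (here, 2) for the last layer of that dimension.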

Differential Revision: D82320948
Summary:

When we fold a rectangle along a dimension with a step size of 1, we can factor it into 3 components: a start (blue), a middle (green), and an end (red), which can be unioned together.

[25] -> [5 x 5]

 {F1982079295} 
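The step-1 case can be sketched as follows. The sketch assumes a half-open range `r` in a dimension of size `outer * inner` that is folded row-major into `[outer, inner]`, so index `i` maps to row `i / inner` and column `i % inner`. `Rect`, `fold_contiguous`, and `expand` are hypothetical names for illustration, not the API from the diff. The partial first row is the start, the run of full rows is the middle, and the partial last row is the end. A full first or last row is absorbed into the middle.

```rust
use std::ops::Range;

/// Hypothetical 2-D rectangle in the folded [outer, inner] shape.
#[derive(Debug, Clone, PartialEq)]
struct Rect {
    rows: Range<usize>,
    cols: Range<usize>,
}

/// Hypothetical sketch: fold a step-1 range into at most three rectangles
/// (start, middle, end) whose union covers exactly the same indices.
fn fold_contiguous(r: Range<usize>, inner: usize) -> Vec<Rect> {
    assert!(inner > 0 && r.start < r.end);
    let first_row = r.start / inner;
    let last_row = (r.end - 1) / inner;
    if first_row == last_row {
        // The whole range fits in a single row.
        let cols = r.start % inner..(r.end - 1) % inner + 1;
        return vec![Rect { rows: first_row..first_row + 1, cols }];
    }
    let mut out = Vec::new();
    let mut mid_start = first_row;
    if r.start % inner != 0 {
        // Start: the partial first row.
        out.push(Rect { rows: first_row..first_row + 1, cols: r.start % inner..inner });
        mid_start += 1;
    }
    let mut mid_end = last_row + 1;
    let mut end = None;
    if r.end % inner != 0 {
        // End: the partial last row.
        end = Some(Rect { rows: last_row..last_row + 1, cols: 0..r.end % inner });
        mid_end = last_row;
    }
    if mid_start < mid_end {
        // Middle: all full rows in between.
        out.push(Rect { rows: mid_start..mid_end, cols: 0..inner });
    }
    out.extend(end);
    out
}

/// Expand rectangles back to sorted flat indices (row * inner + col).
fn expand(rects: &[Rect], inner: usize) -> Vec<usize> {
    let mut v: Vec<usize> = rects
        .iter()
        .flat_map(move |rc| {
            rc.rows.clone().flat_map(move |row| rc.cols.clone().map(move |c| row * inner + c))
        })
        .collect();
    v.sort();
    v
}

fn main() {
    // [25] -> [5 x 5]: fold the range 3..22.
    let rects = fold_contiguous(3..22, 5);
    for r in &rects {
        println!("{:?}", r);
    }
    assert_eq!(expand(&rects, 5), (3..22).collect::<Vec<usize>>());
}
```

For the range 3..22, this yields a start at row 0 (columns 3..5), a middle of full rows 1..4, and an end at row 4 (columns 0..2).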

For step sizes larger than 1, the middle sections must also be factored into multiple components until they cycle.

 {F1982079301} 
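For larger steps, here is a sketch under the same row-major assumption. The full rows strictly between the first and last row repeat their column pattern every `step / gcd(step, inner)` rows, because a row's pattern depends only on `row * inner mod step`. The middle therefore splits into that many row-strided components, and some of them may be empty when `step > inner`. `Slice`, `row_cols`, `fold_strided`, and `expand` are hypothetical names. This illustrates the "until they cycle" idea and is not the diff's implementation.

```rust
/// Hypothetical half-open strided slice: begin, begin + step, ... < end.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Slice {
    begin: usize,
    end: usize,
    step: usize,
}

impl Slice {
    fn indices(self) -> impl Iterator<Item = usize> {
        (self.begin..self.end).step_by(self.step)
    }
}

/// Hypothetical 2-D rectangle: a strided set of rows times a strided set of columns.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect {
    rows: Slice,
    cols: Slice,
}

fn gcd(a: usize, b: usize) -> usize {
    if b == 0 { a } else { gcd(b, a % b) }
}

/// Columns of `row` hit by `s` (whose last element is `last`), or None if empty.
fn row_cols(s: Slice, last: usize, row: usize, inner: usize) -> Option<Slice> {
    let lo = s.begin.max(row * inner);
    // First element of the progression at or after `lo`.
    let first = s.begin + (lo - s.begin + s.step - 1) / s.step * s.step;
    let hi = last.min((row + 1) * inner - 1);
    if first > hi {
        return None;
    }
    Some(Slice { begin: first - row * inner, end: hi - row * inner + 1, step: s.step })
}

/// Fold a strided 1-D slice into [outer, inner]: start row, cycling middle, end row.
fn fold_strided(s: Slice, inner: usize) -> Vec<Rect> {
    assert!(inner > 0 && s.step > 0 && s.begin < s.end);
    let last = s.begin + (s.end - 1 - s.begin) / s.step * s.step;
    let (first_row, last_row) = (s.begin / inner, last / inner);
    let one_row = |row: usize| Slice { begin: row, end: row + 1, step: 1 };
    let mut out = Vec::new();
    if let Some(cols) = row_cols(s, last, first_row, inner) {
        out.push(Rect { rows: one_row(first_row), cols }); // start
    }
    if first_row == last_row {
        return out;
    }
    // Middle rows repeat their column pattern every `period` rows.
    let period = s.step / gcd(s.step, inner);
    for phase in 0..period {
        let r0 = first_row + 1 + phase;
        if r0 >= last_row {
            break;
        }
        if let Some(cols) = row_cols(s, last, r0, inner) {
            out.push(Rect { rows: Slice { begin: r0, end: last_row, step: period }, cols });
        }
    }
    if let Some(cols) = row_cols(s, last, last_row, inner) {
        out.push(Rect { rows: one_row(last_row), cols }); // end
    }
    out
}

/// Expand rectangles back to sorted flat indices (row * inner + col).
fn expand(rects: &[Rect], inner: usize) -> Vec<usize> {
    let mut v: Vec<usize> = rects
        .iter()
        .flat_map(move |rc| rc.rows.indices().flat_map(move |row| rc.cols.indices().map(move |c| row * inner + c)))
        .collect();
    v.sort();
    v
}

fn main() {
    // [25] -> [5 x 5] with step 2: 0, 2, 4, ..., 24.
    let rects = fold_strided(Slice { begin: 0, end: 25, step: 2 }, 5);
    for r in &rects {
        println!("{:?}", r);
    }
}
```

With `step = 2` and `inner = 5`, the middle rows alternate between two column patterns (columns 1 and 3 on rows 1 and 3, columns 0, 2, and 4 on row 2), so the fold produces four components: the start, two middle components, and the end.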

It turns out that the logic for folding from 1D to 2D is the same as for folding from ND to (N+1)D.
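One way to read the ND to (N+1)D claim can be sketched as follows. The sketch assumes that a rectangle is a product of one half-open range per dimension. Folding dimension `k` only touches that dimension's range, so each 2-D component from the 1-D fold becomes an (N+1)-D rectangle with every other range copied unchanged. To keep the example short, only the step-1 fold is included. `fold_1d`, `fold_dim`, and `expand` are hypothetical helpers, not the diff's API.

```rust
use std::ops::Range;

/// Hypothetical N-D rectangle: one half-open range per dimension.
type Rect = Vec<Range<usize>>;

/// Step-1 fold of a 1-D range into (row range, column range) pairs:
/// start row, full middle rows, end row.
fn fold_1d(r: Range<usize>, inner: usize) -> Vec<(Range<usize>, Range<usize>)> {
    let (first, last) = (r.start / inner, (r.end - 1) / inner);
    if first == last {
        return vec![(first..first + 1, r.start % inner..(r.end - 1) % inner + 1)];
    }
    let mut out = vec![(first..first + 1, r.start % inner..inner)];
    if first + 1 < last {
        out.push((first + 1..last, 0..inner));
    }
    out.push((last..last + 1, 0..(r.end - 1) % inner + 1));
    out
}

/// Fold dimension `k` into [extent / inner, inner]. Every other dimension is
/// copied unchanged, so the N-D case reuses the 1-D -> 2-D logic as-is.
fn fold_dim(rect: &Rect, k: usize, inner: usize) -> Vec<Rect> {
    fold_1d(rect[k].clone(), inner)
        .into_iter()
        .map(|(rows, cols)| {
            let mut out = rect[..k].to_vec();
            out.push(rows);
            out.push(cols);
            out.extend_from_slice(&rect[k + 1..]);
            out
        })
        .collect()
}

/// Row-major flat indices of every point in `rect` within `shape`.
fn expand(rect: &Rect, shape: &[usize]) -> Vec<usize> {
    let mut flat = vec![0usize];
    for (r, &extent) in rect.iter().zip(shape) {
        flat = flat
            .iter()
            .flat_map(move |&base| r.clone().map(move |i| base * extent + i))
            .collect();
    }
    flat
}

fn main() {
    // A [4, 25] rectangle whose second dimension is folded into [5, 5].
    let rect: Rect = vec![1..3, 3..22];
    let folded = fold_dim(&rect, 1, 5);
    let mut got: Vec<usize> = folded.iter().flat_map(|r| expand(r, &[4, 5, 5])).collect();
    got.sort();
    // The folded pieces cover exactly the same flat indices as the original.
    assert_eq!(got, expand(&rect, &[4, 25]));
    println!("{:?}", folded);
}
```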

All other cases are trivial.

Speedups vary widely between runs, but compression is consistently about two orders of magnitude (196.84x on average in the first run below). When running the benchmark, we often hit a stack overflow in the old implementation (we know it comes from the old implementation because we never overflow when running only the new one).
```
Starting reshape_selection fuzzing with parameters:
  Max dimensions: 3
  Max dimension size: 128
  Fanout limit: 8
  Iterations: 100

=== BENCHMARK AVERAGES (over 100 iterations) ===
  Average new approach (reshape_selection): 1.489µs
  Average new approach eval: 83.473µs
  Average new selection string length: 158.8
  Average old approach (Selection::of_ranks): 118.151µs
  Average old approach eval: 26.252816ms
  Average old selection string length: 31258.3
  Average total new time (Reshape + Eval): 84.962µs
  Average total old time (Reshape + Eval): 26.370967ms
  Average reshape speedup (reshape_selection vs Selection::of_ranks): 79.35x
  Average eval speedup (new eval vs old eval): 314.51x
  Average total speedup (Reshape + Eval): 310.39x
  Average compression: 196.84x
```
```
Starting reshape_selection fuzzing with parameters:
  Max dimensions: 3
  Max dimension size: 128
  Fanout limit: 8
  Iterations: 100000

=== NEW APPROACH BENCHMARK AVERAGES (over 100000 iterations) ===
  Average reshape_selection time: 1.27µs
  Average eval time: 82.275µs
  Average selection string length: 175.8
  Average total time (Reshape + Eval): 83.545µs
```
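The derived rows of the first run are consistent with sums and ratios of the reported averages. This is an inference from the numbers, since the benchmark harness is not shown. The last line checks the second run's total.

$$
\begin{aligned}
\text{total}_{\text{new}} &= 1.489\,\mu\text{s} + 83.473\,\mu\text{s} = 84.962\,\mu\text{s} \\
\text{total}_{\text{old}} &= 118.151\,\mu\text{s} + 26252.816\,\mu\text{s} = 26370.967\,\mu\text{s} = 26.370967\,\text{ms} \\
\text{reshape speedup} &= \tfrac{118.151}{1.489} \approx 79.35, \qquad
\text{eval speedup} = \tfrac{26252.816}{83.473} \approx 314.51 \\
\text{total speedup} &= \tfrac{26370.967}{84.962} \approx 310.39, \qquad
\text{compression} = \tfrac{31258.3}{158.8} \approx 196.84 \\
\text{total}_{\text{new, 100000 iters}} &= 1.27\,\mu\text{s} + 82.275\,\mu\text{s} = 83.545\,\mu\text{s}
\end{aligned}
$$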
===============================================
*Sophons are proton-sized supercomputers created by the alien Trisolarans by "unfolding" protons from their normal eleven-dimensional state into lower dimensions, etching circuits onto the larger, two-dimensional form, and then refolding them into their original eleven dimensions*

Reviewed By: shayne-fletcher

Differential Revision: D82697968

Labels: CLA Signed, fb-exported, meta-exported