[Sunmmio] Support Layout Inference for Fill and Reduce Operators #101
weizzh wants to merge 5 commits into SUNMMIO:tilelang_mesh_main
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 09bee55503
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```cpp
// Look up the registered blockwise ZZ layout constructor and apply it to acc.
const auto make_zz =
    ffi::Function::GetGlobal("tl.layout.make_blockwise_zz_layout");
auto acc_layout = Downcast<Layout>((*make_zz)(acc));
local_layout_map.Set(acc, acc_layout);
```
Avoid applying ZZ layout to 1D accumulator buffers
This unconditionally assigns `acc` a blockwise ZZ layout, but `tl.layout.make_blockwise_zz_layout` only supports 2D buffers, so rank-1 Sunmmio reductions, which produce a 1D `acc`, fail during lowering when this call executes. The regression is user-visible for valid 1D reduce kernels (e.g., shape `(N,)`, `dim=0`) that previously lowered without this extra layout-annotation step. Please branch on `acc->shape.size()` and use a linear layout for 1D buffers (consistent with the new Fill/Reduce inference logic).
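A minimal sketch of the branch the review asks for, under a simplified model: `LayoutKind` and `ChooseAccLayout` are hypothetical stand-ins (the real code builds TVM `Layout` objects and calls `tl.layout.make_blockwise_zz_layout`), and only the layout family chosen from the accumulator's rank is modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a TileLang Layout: only records which layout
// family was chosen, not the actual index mapping.
enum class LayoutKind { kLinear, kBlockwiseZZ };

// Hypothetical helper: pick the accumulator layout from its rank instead of
// always calling the 2D-only blockwise ZZ constructor.
LayoutKind ChooseAccLayout(const std::vector<int64_t>& acc_shape) {
  assert(!acc_shape.empty() && "accumulator must have at least one dim");
  if (acc_shape.size() == 1) {
    // Rank-1 accumulator (e.g. from reducing an (N,) buffer along dim=0):
    // fall back to a linear layout.
    return LayoutKind::kLinear;
  }
  // Multi-dimensional accumulator: the blockwise ZZ layout is applicable.
  return LayoutKind::kBlockwiseZZ;
}
```

Keeping the rank check right next to the layout call means a 1D `acc` never reaches the 2D-only constructor.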
You are right. In fact, the same problem also exists in `fill.cc`.
Branch updated from `c58695b` to `829b300`.
Previously, Sunmmio layout inference enforced a strict shape constraint: only tensors whose trailing dimensions were divisible by 32 could use the hardware-optimized `blockwise_zz` layout. Irregular shapes had to fall back to the `linear` layout, which degraded performance. This commit relaxes the validation rules:
- C++ `CanUseBlockwiseZZ` (`fill.cc` and `reduce.cc`): now permits all multi-dimensional shapes (e.g., `[2, 64]`, `[1, 4, 64]`) to use `blockwise_zz` layouts. It filters out only 1D vectors and degenerate 1D shapes (e.g., `[1, 64]`, `[1, 1, 128]`), which structurally require the standard linear layout.
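The relaxed rule can be sketched as a standalone predicate. This is a reading of the examples in the commit message, not the actual `CanUseBlockwiseZZ` in `fill.cc`/`reduce.cc`: `CanUseBlockwiseZZSketch` is a hypothetical name, and a shape is assumed to be degenerate 1D when every dimension except the last is 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the relaxed CanUseBlockwiseZZ rule (not the code in
// fill.cc / reduce.cc). Assumption: a shape is "degenerate 1D" when every
// dimension except the last is 1, e.g. [1, 64] or [1, 1, 128].
bool CanUseBlockwiseZZSketch(const std::vector<int64_t>& shape) {
  assert(!shape.empty() && "shape must have at least one dimension");
  // Any leading (non-last) dimension other than 1 makes the shape truly
  // multi-dimensional, so blockwise_zz is allowed. No divisibility-by-32
  // check is applied.
  for (std::size_t i = 0; i + 1 < shape.size(); ++i) {
    if (shape[i] != 1) return true;
  }
  // Plain 1D vectors ([64]) and degenerate shapes ([1, 64], [1, 1, 128])
  // end up here and keep the linear layout.
  return false;
}
```

Under this reading, `[2, 48]` now qualifies even though 48 is not divisible by 32, while `[1, 64]` still keeps the linear layout.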
Summary:
This PR implements hardware-specific layout inference for the fill and reduce operators on the Sunmmio target. It ensures that all buffers involved in these operations (including intermediate ones) are assigned the correct hardware-optimized layouts in the `shared.rsram` scope.
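To illustrate "all buffers, including intermediate ones", here is a hedged sketch of a per-operator inference pass over a simplified buffer model. `BufferInfo`, `PrefersZZ`, and `InferRsramLayouts` are hypothetical names; it assumes that only buffers in the `shared.rsram` scope receive a layout and that the shape rule matches the relaxed `CanUseBlockwiseZZ` described in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified view of a buffer touched by a fill/reduce op.
struct BufferInfo {
  std::string name;
  std::string scope;  // e.g. "shared.rsram"
  std::vector<int64_t> shape;
};

enum class LayoutKind { kLinear, kBlockwiseZZ };

// Assumed shape rule (mirrors the relaxed CanUseBlockwiseZZ as read here):
// blockwise_zz unless the shape is 1D or [1, ..., 1, N].
bool PrefersZZ(const std::vector<int64_t>& shape) {
  for (std::size_t i = 0; i + 1 < shape.size(); ++i) {
    if (shape[i] != 1) return true;
  }
  return false;
}

// Assign a layout to every buffer of the op (sources, destinations and
// intermediates such as a reduce accumulator) that lives in shared.rsram.
// Buffers in other scopes are left out of the map.
std::map<std::string, LayoutKind> InferRsramLayouts(
    const std::vector<BufferInfo>& buffers) {
  std::map<std::string, LayoutKind> layout_map;
  for (const BufferInfo& buf : buffers) {
    assert(!buf.shape.empty() && "buffers must have at least one dimension");
    if (buf.scope != "shared.rsram") continue;
    layout_map[buf.name] =
        PrefersZZ(buf.shape) ? LayoutKind::kBlockwiseZZ : LayoutKind::kLinear;
  }
  return layout_map;
}
```

For a reduce of a `[4, 64]` source into a `[4]` accumulator, both in `shared.rsram`, this yields `blockwise_zz` for the source and `linear` for the accumulator, which is the 1D case the review comment flagged.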
Key Changes:
Infrastructure & Metadata:
Operator Implementations: