### Suggestion Description

Implement a single kernel that fuses the all-gather collective with the GEMM computation (or with a fused MLP), for collective use cases.

### Operating System

_No response_

### GPU

_No response_

### ROCm Component

_No response_