AMDGPU: Handle V->A MFMA copy from case with immediate src2 #153023

Open
arsenm wants to merge 1 commit into base: users/arsenm/amdgpu/handle-mfma-copy-from-agpr

Conversation

arsenm (Contributor) commented Aug 11, 2025

Handle a special case for copies from AGPR to VGPR on the MFMA inputs.
If the "input" is really a subregister def, we will not see the
usual copy to VGPR for src2, only the read of the subregister def.
Not sure if this pattern appears in practice.
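
To illustrate why a subregister def counts as a read, here is a minimal standalone sketch. `ToyOperand`, `collectUses`, `collectReaders`, and `exampleOperands` are hypothetical names made up for this illustration and are not LLVM APIs. `ToyOperand::readsReg` is assumed to mirror the rule behind `MachineOperand::readsReg()`: the operand is not undef, not an internal read, and is either a use or carries a subregister index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for llvm::MachineOperand; only the fields needed here.
struct ToyOperand {
  unsigned Reg = 0;
  unsigned SubReg = 0;         // 0 means "no subregister index"
  bool IsDef = false;
  bool IsUndef = false;
  bool IsInternalRead = false;

  // Assumed to mirror MachineOperand::readsReg(): a def with a subregister
  // index leaves the other lanes intact, so it also reads the register.
  bool readsReg() const {
    return !IsUndef && !IsInternalRead && (!IsDef || SubReg != 0);
  }
};

// Indices of plain use operands of Reg (a use-only walk).
std::vector<std::size_t> collectUses(const std::vector<ToyOperand> &Ops,
                                     unsigned Reg) {
  std::vector<std::size_t> Out;
  for (std::size_t I = 0; I < Ops.size(); ++I)
    if (Ops[I].Reg == Reg && !Ops[I].IsDef)
      Out.push_back(I);
  return Out;
}

// Indices of every operand of Reg that reads it, including subregister defs.
std::vector<std::size_t> collectReaders(const std::vector<ToyOperand> &Ops,
                                        unsigned Reg) {
  std::vector<std::size_t> Out;
  for (std::size_t I = 0; I < Ops.size(); ++I)
    if (Ops[I].Reg == Reg && Ops[I].readsReg())
      Out.push_back(I);
  return Out;
}

// Operands shaped like the updated test: a full def by COPY, a subregister
// def by the MFMA, and a full use by the store. Register 3 is arbitrary.
std::vector<ToyOperand> exampleOperands() {
  ToyOperand CopyDef;
  CopyDef.Reg = 3;
  CopyDef.IsDef = true;

  ToyOperand MfmaSubregDef;
  MfmaSubregDef.Reg = 3;
  MfmaSubregDef.SubReg = 1;    // stands in for sub0_sub1
  MfmaSubregDef.IsDef = true;

  ToyOperand StoreUse;
  StoreUse.Reg = 3;

  return {CopyDef, MfmaSubregDef, StoreUse};
}
```

In this toy model, `collectUses(exampleOperands(), 3)` finds only the store, while `collectReaders(exampleOperands(), 3)` also finds the MFMA's subregister def. The patch makes the analogous switch from `MRI.use_instructions` to `MRI.reg_nodbg_operands` filtered by `readsReg()`.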

arsenm (Contributor Author) commented Aug 11, 2025

llvmbot (Member) commented Aug 11, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

Handle a special case for copies from AGPR to VGPR on the MFMA inputs.
If the "input" is really a subregister def, we will not see the
usual copy to VGPR for src2, only the read of the subregister def.
Not sure if this pattern appears in practice.


Full diff: https://github.com/llvm/llvm-project/pull/153023.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp (+6-5)
  • (modified) llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir (+2-2)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp b/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
index b71c70db5e6b3..4e0d64a20690e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
@@ -375,13 +375,14 @@ bool AMDGPURewriteAGPRCopyMFMAImpl::tryFoldCopiesFromAGPR(
     Register CopyDstReg = UseMI.getOperand(0).getReg();
     if (!CopyDstReg.isVirtual())
       continue;
+    for (MachineOperand &CopyUseMO : MRI.reg_nodbg_operands(CopyDstReg)) {
+      if (!CopyUseMO.readsReg())
+        continue;
 
-    for (MachineInstr &CopyUseMI : MRI.use_instructions(CopyDstReg)) {
+      MachineInstr &CopyUseMI = *CopyUseMO.getParent();
       if (isRewriteCandidate(CopyUseMI)) {
-        const MachineOperand *Op =
-            CopyUseMI.findRegisterUseOperand(CopyDstReg, /*TRI=*/nullptr);
-        if (tryReassigningMFMAChain(CopyUseMI, Op->getOperandNo(),
-                                    VRM.getPhys(Op->getReg())))
+        if (tryReassigningMFMAChain(CopyUseMI, CopyUseMO.getOperandNo(),
+                                    VRM.getPhys(CopyUseMO.getReg())))
           MadeChange = true;
       }
     }
diff --git a/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir b/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir
index 632401b6128c5..17a72110767bb 100644
--- a/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir
+++ b/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir
@@ -187,8 +187,8 @@ body:             |
     ; CHECK-NEXT: [[COPY1:%[0-9]+]]:av_64_align2 = COPY $vgpr0_vgpr1
     ; CHECK-NEXT: [[COPY2:%[0-9]+]]:av_64_align2 = COPY $vgpr2_vgpr3
     ; CHECK-NEXT: [[GLOBAL_LOAD_DWORDX4_:%[0-9]+]]:areg_128_align2 = GLOBAL_LOAD_DWORDX4 [[COPY]], 0, 0, implicit $exec :: (load (s128), addrspace 1)
-    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:vreg_128_align2 = COPY [[GLOBAL_LOAD_DWORDX4_]]
-    ; CHECK-NEXT: [[COPY3:%[0-9]+]].sub0_sub1:vreg_128_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 [[COPY1]], [[COPY2]], 0, 0, 0, 0, implicit $mode, implicit $exec
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:areg_128_align2 = COPY [[GLOBAL_LOAD_DWORDX4_]]
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]].sub0_sub1:areg_128_align2 = V_MFMA_F64_4X4X4F64_e64 [[COPY1]], [[COPY2]], 0, 0, 0, 0, implicit $mode, implicit $exec
     ; CHECK-NEXT: GLOBAL_STORE_DWORDX4 [[COPY]], [[COPY3]], 0, 0, implicit $exec :: (store (s128), addrspace 1)
     ; CHECK-NEXT: SI_RETURN
     %0:vreg_64_align2 = COPY $vgpr4_vgpr5

perlfu (Contributor) left a comment

LGTM

arsenm force-pushed the users/arsenm/amdgpu/handle-subreg-def-read-mfma-copy-from-agpr branch from 3bd59fe to c73ac5e on August 18, 2025 15:31
arsenm force-pushed the users/arsenm/amdgpu/handle-mfma-copy-from-agpr branch from 5d234cc to 002114a on August 18, 2025 15:31
arsenm force-pushed the users/arsenm/amdgpu/handle-subreg-def-read-mfma-copy-from-agpr branch from c73ac5e to 90d2381 on August 20, 2025 23:23
arsenm force-pushed the users/arsenm/amdgpu/handle-mfma-copy-from-agpr branch from 002114a to 87bc565 on August 20, 2025 23:23
arsenm force-pushed the users/arsenm/amdgpu/handle-subreg-def-read-mfma-copy-from-agpr branch from 90d2381 to 22d2495 on August 21, 2025 00:11
arsenm force-pushed the users/arsenm/amdgpu/handle-mfma-copy-from-agpr branch from 87bc565 to 8a87d16 on August 21, 2025 00:11
arsenm force-pushed the users/arsenm/amdgpu/handle-subreg-def-read-mfma-copy-from-agpr branch from 22d2495 to 8735fbf on August 21, 2025 00:42
arsenm force-pushed the users/arsenm/amdgpu/handle-mfma-copy-from-agpr branch 2 times, most recently from 2a2778f to f2932c5, on August 21, 2025 01:41
arsenm force-pushed the users/arsenm/amdgpu/handle-subreg-def-read-mfma-copy-from-agpr branch from 8735fbf to db5f240 on August 21, 2025 01:41
arsenm force-pushed the users/arsenm/amdgpu/handle-mfma-copy-from-agpr branch from f2932c5 to 5d8dc9b on August 21, 2025 13:43
arsenm force-pushed the users/arsenm/amdgpu/handle-subreg-def-read-mfma-copy-from-agpr branch from db5f240 to 579e971 on August 21, 2025 13:43