AMDGPU: Handle V->A MFMA copy from case with immediate src2 #153023
base: users/arsenm/amdgpu/handle-mfma-copy-from-agpr
@llvm/pr-subscribers-backend-amdgpu
Author: Matt Arsenault (arsenm)
Changes
Handle a special case for copies from AGPR to VGPR on the MFMA inputs.
Full diff: https://github.com/llvm/llvm-project/pull/153023.diff
2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp b/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
index b71c70db5e6b3..4e0d64a20690e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
@@ -375,13 +375,14 @@ bool AMDGPURewriteAGPRCopyMFMAImpl::tryFoldCopiesFromAGPR(
       Register CopyDstReg = UseMI.getOperand(0).getReg();
       if (!CopyDstReg.isVirtual())
         continue;
+      for (MachineOperand &CopyUseMO : MRI.reg_nodbg_operands(CopyDstReg)) {
+        if (!CopyUseMO.readsReg())
+          continue;

-      for (MachineInstr &CopyUseMI : MRI.use_instructions(CopyDstReg)) {
+        MachineInstr &CopyUseMI = *CopyUseMO.getParent();
         if (isRewriteCandidate(CopyUseMI)) {
-          const MachineOperand *Op =
-              CopyUseMI.findRegisterUseOperand(CopyDstReg, /*TRI=*/nullptr);
-          if (tryReassigningMFMAChain(CopyUseMI, Op->getOperandNo(),
-                                      VRM.getPhys(Op->getReg())))
+          if (tryReassigningMFMAChain(CopyUseMI, CopyUseMO.getOperandNo(),
+                                      VRM.getPhys(CopyUseMO.getReg())))
             MadeChange = true;
         }
       }
diff --git a/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir b/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir
index 632401b6128c5..17a72110767bb 100644
--- a/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir
+++ b/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir
@@ -187,8 +187,8 @@ body: |
     ; CHECK-NEXT: [[COPY1:%[0-9]+]]:av_64_align2 = COPY $vgpr0_vgpr1
     ; CHECK-NEXT: [[COPY2:%[0-9]+]]:av_64_align2 = COPY $vgpr2_vgpr3
     ; CHECK-NEXT: [[GLOBAL_LOAD_DWORDX4_:%[0-9]+]]:areg_128_align2 = GLOBAL_LOAD_DWORDX4 [[COPY]], 0, 0, implicit $exec :: (load (s128), addrspace 1)
-    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:vreg_128_align2 = COPY [[GLOBAL_LOAD_DWORDX4_]]
-    ; CHECK-NEXT: [[COPY3:%[0-9]+]].sub0_sub1:vreg_128_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 [[COPY1]], [[COPY2]], 0, 0, 0, 0, implicit $mode, implicit $exec
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:areg_128_align2 = COPY [[GLOBAL_LOAD_DWORDX4_]]
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]].sub0_sub1:areg_128_align2 = V_MFMA_F64_4X4X4F64_e64 [[COPY1]], [[COPY2]], 0, 0, 0, 0, implicit $mode, implicit $exec
     ; CHECK-NEXT: GLOBAL_STORE_DWORDX4 [[COPY]], [[COPY3]], 0, 0, implicit $exec :: (store (s128), addrspace 1)
     ; CHECK-NEXT: SI_RETURN
     %0:vreg_64_align2 = COPY $vgpr4_vgpr5
LGTM
Handle a special case for copies from AGPR to VGPR on the MFMA inputs. If the "input" is really a subregister def, we will not see the usual copy to VGPR for src2, only the read of the subregister def. Not sure if this pattern appears in practice.
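As a rough illustration of why the loop moves from use instructions to all register operands filtered by `readsReg()`, here is a self-contained toy model. `ToyOperand`, `countUseOperands`, `countReadingOperands`, and `toyExample` are hypothetical names made up for this sketch. They are not LLVM types or APIs, and the model deliberately ignores details such as internal reads and debug operands. It only shows the idea that a subregister def without an undef flag keeps the untouched lanes and therefore counts as a read, so a walk over plain uses misses it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a machine operand (hypothetical, not llvm::MachineOperand).
struct ToyOperand {
  bool IsDef;      // operand writes the register
  bool IsUndef;    // previous register value is explicitly ignored
  unsigned SubReg; // nonzero: only part of the register is accessed

  // A plain use reads the register. A partial (subregister) def also reads
  // it, because the lanes it does not write must be preserved, unless the
  // operand is marked undef.
  bool readsReg() const { return !IsUndef && (!IsDef || SubReg != 0); }
};

// Old-style walk in this toy model: only plain use operands are visited.
std::size_t countUseOperands(const std::vector<ToyOperand> &Ops) {
  std::size_t N = 0;
  for (const ToyOperand &Op : Ops)
    if (!Op.IsDef)
      ++N;
  return N;
}

// New-style walk in this toy model: every operand that reads the register.
std::size_t countReadingOperands(const std::vector<ToyOperand> &Ops) {
  std::size_t N = 0;
  for (const ToyOperand &Op : Ops)
    if (Op.readsReg())
      ++N;
  return N;
}

// Small usage example with one operand of each kind.
void toyExample() {
  std::vector<ToyOperand> Ops = {
      {true, false, 1},  // subregister def: reads the other lanes
      {false, false, 0}, // plain use
      {true, false, 0},  // full def: no read
      {true, true, 1},   // undef subregister def: no read
  };
  assert(countUseOperands(Ops) == 1);     // the subregister def is missed
  assert(countReadingOperands(Ops) == 2); // the subregister def is counted
}
```

In this sketch the subregister def plays the role of the MFMA result written to `.sub0_sub1` in the updated test, which is the operand the old use-only walk would not reach.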