Skip to content

AMDGPU: Add some baseline test for mfma rewrite with subregister copies #153018

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

arsenm
Copy link
Contributor

@arsenm arsenm commented Aug 11, 2025

Currently only cases rooted at a full copy of an MFMA result are handled.
Prepare to relax that by testing more intricate subregister usage.

Currently only full copies are handled, add some tests to help work
towards handling subregisters.

@arsenm arsenm marked this pull request as ready for review August 11, 2025 14:38
@llvmbot
Copy link
Member

llvmbot commented Aug 11, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

Currently only cases rooted at a full copy of an MFMA result are handled.
Prepare to relax that by testing more intricate subregister usage.

Currently only full copies are handled, add some tests to help work
towards handling subregisters.


Patch is 40.26 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/153018.diff

3 Files Affected:

  • (added) llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-subreg-insert-extract.mir (+218)
  • (added) llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-subreg-src2-chain.mir (+327)
  • (modified) llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.ll (+69)
diff --git a/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-subreg-insert-extract.mir b/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-subreg-insert-extract.mir
new file mode 100644
index 0000000000000..73784fb405ce5
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-subreg-insert-extract.mir
@@ -0,0 +1,218 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -run-pass=greedy,amdgpu-rewrite-agpr-copy-mfma -o - %s | FileCheck %s
+
+# V-to-A copy is a subregister insert
+---
+name:  test_rewrite_mfma_copy_subreg_insert
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+
+    ; CHECK-LABEL: name: test_rewrite_mfma_copy_subreg_insert
+    ; CHECK: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:vreg_64_align2 = COPY $vgpr4_vgpr5
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:av_64_align2 = COPY $vgpr0_vgpr1
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:av_64_align2 = COPY $vgpr2_vgpr3
+    ; CHECK-NEXT: [[GLOBAL_LOAD_DWORDX2_:%[0-9]+]]:vreg_64_align2 = GLOBAL_LOAD_DWORDX2 [[COPY]], 0, 0, implicit $exec :: (load (s64), addrspace 1)
+    ; CHECK-NEXT: [[V_MFMA_F64_4X4X4F64_vgprcd_e64_:%[0-9]+]]:vreg_64_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 [[COPY1]], [[COPY2]], [[GLOBAL_LOAD_DWORDX2_]], 0, 0, 0, implicit $mode, implicit $exec
+    ; CHECK-NEXT: undef [[COPY3:%[0-9]+]].sub0_sub1:areg_128_align2 = COPY [[V_MFMA_F64_4X4X4F64_vgprcd_e64_]]
+    ; CHECK-NEXT: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 6291465 /* reguse:AReg_128_Align2 */, [[COPY3]]
+    ; CHECK-NEXT: GLOBAL_STORE_DWORDX4 [[COPY]], [[COPY3]], 0, 0, implicit $exec :: (store (s128), addrspace 1)
+    ; CHECK-NEXT: GLOBAL_STORE_DWORDX2 [[COPY]], [[COPY3]].sub2_sub3, 0, 0, implicit $exec :: (store (s128), addrspace 1)
+    ; CHECK-NEXT: SI_RETURN
+    %0:vreg_64_align2 = COPY $vgpr4_vgpr5
+    %1:av_64_align2 = COPY $vgpr0_vgpr1
+    %2:av_64_align2 = COPY $vgpr2_vgpr3
+    %3:vreg_64_align2 = GLOBAL_LOAD_DWORDX2 %0, 0, 0, implicit $exec :: (load (s64), addrspace 1)
+    %4:vreg_64_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 %1, %2, %3, 0, 0, 0, implicit $mode, implicit $exec
+    undef %5.sub0_sub1:areg_128_align2 = COPY %4
+    INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 6291465 /* reguse:AReg_128_Align2 */, %5
+    GLOBAL_STORE_DWORDX4 %0, %5, 0, 0, implicit $exec :: (store (s128), addrspace 1)
+    GLOBAL_STORE_DWORDX2 %0, %5.sub2_sub3, 0, 0, implicit $exec :: (store (s128), addrspace 1)
+    SI_RETURN
+...
+
+# V-to-A copy is a subregister extract
+---
+name:  test_rewrite_mfma_copy_subreg_extract
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+
+    ; CHECK-LABEL: name: test_rewrite_mfma_copy_subreg_extract
+    ; CHECK: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:vreg_64_align2 = COPY $vgpr4_vgpr5
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:av_64_align2 = COPY $vgpr0_vgpr1
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:av_64_align2 = COPY $vgpr2_vgpr3
+    ; CHECK-NEXT: [[GLOBAL_LOAD_DWORDX2_:%[0-9]+]]:vreg_64_align2 = GLOBAL_LOAD_DWORDX2 [[COPY]], 0, 0, implicit $exec :: (load (s64), addrspace 1)
+    ; CHECK-NEXT: [[V_MFMA_F64_4X4X4F64_vgprcd_e64_:%[0-9]+]]:vreg_64_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 [[COPY1]], [[COPY2]], [[GLOBAL_LOAD_DWORDX2_]], 0, 0, 0, implicit $mode, implicit $exec
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:agpr_32 = COPY [[V_MFMA_F64_4X4X4F64_vgprcd_e64_]].sub0
+    ; CHECK-NEXT: GLOBAL_STORE_DWORD [[COPY]], [[COPY3]], 0, 0, implicit $exec :: (store (s32), addrspace 1)
+    ; CHECK-NEXT: SI_RETURN
+    %0:vreg_64_align2 = COPY $vgpr4_vgpr5
+    %1:av_64_align2 = COPY $vgpr0_vgpr1
+    %2:av_64_align2 = COPY $vgpr2_vgpr3
+    %3:vreg_64_align2 = GLOBAL_LOAD_DWORDX2 %0, 0, 0, implicit $exec :: (load (s64), addrspace 1)
+    %4:vreg_64_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 %1, %2, %3, 0, 0, 0, implicit $mode, implicit $exec
+    %5:agpr_32 = COPY %4.sub0
+    GLOBAL_STORE_DWORD %0, %5, 0, 0, implicit $exec :: (store (s32), addrspace 1)
+    SI_RETURN
+...
+
+# V-to-A copy is a subregister-to-subregister copy
+---
+name:  test_rewrite_mfma_copy_subreg_insert_extract_same_subreg
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+
+    ; CHECK-LABEL: name: test_rewrite_mfma_copy_subreg_insert_extract_same_subreg
+    ; CHECK: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:vreg_64_align2 = COPY $vgpr4_vgpr5
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:av_64_align2 = COPY $vgpr0_vgpr1
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:av_64_align2 = COPY $vgpr2_vgpr3
+    ; CHECK-NEXT: [[GLOBAL_LOAD_DWORDX2_:%[0-9]+]]:vreg_64_align2 = GLOBAL_LOAD_DWORDX2 [[COPY]], 0, 0, implicit $exec :: (load (s64), addrspace 1)
+    ; CHECK-NEXT: [[V_MFMA_F64_4X4X4F64_vgprcd_e64_:%[0-9]+]]:vreg_64_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 [[COPY1]], [[COPY2]], [[GLOBAL_LOAD_DWORDX2_]], 0, 0, 0, implicit $mode, implicit $exec
+    ; CHECK-NEXT: undef [[COPY3:%[0-9]+]].sub0:areg_64_align2 = COPY [[V_MFMA_F64_4X4X4F64_vgprcd_e64_]].sub0
+    ; CHECK-NEXT: GLOBAL_STORE_DWORDX2 [[COPY]], [[COPY3]], 0, 0, implicit $exec :: (store (s64), addrspace 1)
+    ; CHECK-NEXT: SI_RETURN
+    %0:vreg_64_align2 = COPY $vgpr4_vgpr5
+    %1:av_64_align2 = COPY $vgpr0_vgpr1
+    %2:av_64_align2 = COPY $vgpr2_vgpr3
+    %3:vreg_64_align2 = GLOBAL_LOAD_DWORDX2 %0, 0, 0, implicit $exec :: (load (s64), addrspace 1)
+    %4:vreg_64_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 %1, %2, %3, 0, 0, 0, implicit $mode, implicit $exec
+    undef %5.sub0:areg_64_align2 = COPY %4.sub0
+    GLOBAL_STORE_DWORDX2 %0, %5, 0, 0, implicit $exec :: (store (s64), addrspace 1)
+    SI_RETURN
+...
+
+# V-to-A copy is a subregister-to-subregister copy
+---
+name:  test_rewrite_mfma_copy_subreg_insert_extract_different_subreg
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+
+    ; CHECK-LABEL: name: test_rewrite_mfma_copy_subreg_insert_extract_different_subreg
+    ; CHECK: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:vreg_64_align2 = COPY $vgpr4_vgpr5
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:av_64_align2 = COPY $vgpr0_vgpr1
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:av_64_align2 = COPY $vgpr2_vgpr3
+    ; CHECK-NEXT: [[GLOBAL_LOAD_DWORDX2_:%[0-9]+]]:vreg_64_align2 = GLOBAL_LOAD_DWORDX2 [[COPY]], 0, 0, implicit $exec :: (load (s64), addrspace 1)
+    ; CHECK-NEXT: [[V_MFMA_F64_4X4X4F64_vgprcd_e64_:%[0-9]+]]:vreg_64_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 [[COPY1]], [[COPY2]], [[GLOBAL_LOAD_DWORDX2_]], 0, 0, 0, implicit $mode, implicit $exec
+    ; CHECK-NEXT: undef [[COPY3:%[0-9]+]].sub0:areg_64_align2 = COPY [[V_MFMA_F64_4X4X4F64_vgprcd_e64_]].sub1
+    ; CHECK-NEXT: GLOBAL_STORE_DWORDX2 [[COPY]], [[COPY3]], 0, 0, implicit $exec :: (store (s64), addrspace 1)
+    ; CHECK-NEXT: SI_RETURN
+    %0:vreg_64_align2 = COPY $vgpr4_vgpr5
+    %1:av_64_align2 = COPY $vgpr0_vgpr1
+    %2:av_64_align2 = COPY $vgpr2_vgpr3
+    %3:vreg_64_align2 = GLOBAL_LOAD_DWORDX2 %0, 0, 0, implicit $exec :: (load (s64), addrspace 1)
+    %4:vreg_64_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 %1, %2, %3, 0, 0, 0, implicit $mode, implicit $exec
+    undef %5.sub0:areg_64_align2 = COPY %4.sub1
+    GLOBAL_STORE_DWORDX2 %0, %5, 0, 0, implicit $exec :: (store (s64), addrspace 1)
+    SI_RETURN
+...
+
+# V-to-A copy is a subregister extract, from a subregister def
+---
+name:  test_rewrite_mfma_copy_subreg_extract_from_subreg_def
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+
+    ; CHECK-LABEL: name: test_rewrite_mfma_copy_subreg_extract_from_subreg_def
+    ; CHECK: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:vreg_64_align2 = COPY $vgpr4_vgpr5
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:av_64_align2 = COPY $vgpr0_vgpr1
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:av_64_align2 = COPY $vgpr2_vgpr3
+    ; CHECK-NEXT: [[GLOBAL_LOAD_DWORDX2_:%[0-9]+]]:vreg_64_align2 = GLOBAL_LOAD_DWORDX2 [[COPY]], 0, 0, implicit $exec :: (load (s64), addrspace 1)
+    ; CHECK-NEXT: undef [[V_MFMA_F64_4X4X4F64_vgprcd_e64_:%[0-9]+]].sub2_sub3:vreg_128_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 [[COPY1]], [[COPY2]], [[GLOBAL_LOAD_DWORDX2_]], 0, 0, 0, implicit $mode, implicit $exec
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:agpr_32 = COPY [[V_MFMA_F64_4X4X4F64_vgprcd_e64_]].sub0
+    ; CHECK-NEXT: GLOBAL_STORE_DWORD [[COPY]], [[COPY3]], 0, 0, implicit $exec :: (store (s32), addrspace 1)
+    ; CHECK-NEXT: SI_RETURN
+    %0:vreg_64_align2 = COPY $vgpr4_vgpr5
+    %1:av_64_align2 = COPY $vgpr0_vgpr1
+    %2:av_64_align2 = COPY $vgpr2_vgpr3
+    %3:vreg_64_align2 = GLOBAL_LOAD_DWORDX2 %0, 0, 0, implicit $exec :: (load (s64), addrspace 1)
+    undef %4.sub2_sub3:vreg_128_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 %1, %2, %3, 0, 0, 0, implicit $mode, implicit $exec
+    %5:agpr_32 = COPY %4.sub0
+    GLOBAL_STORE_DWORD %0, %5, 0, 0, implicit $exec :: (store (s32), addrspace 1)
+    SI_RETURN
+...
+
+# V-to-A copy is a subregister insert from a subregister_def
+---
+name:  test_rewrite_mfma_copy_subreg_insert_from_subreg_def_tuple
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+
+    ; CHECK-LABEL: name: test_rewrite_mfma_copy_subreg_insert_from_subreg_def_tuple
+    ; CHECK: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:vreg_64_align2 = COPY $vgpr4_vgpr5
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:av_64_align2 = COPY $vgpr0_vgpr1
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:av_64_align2 = COPY $vgpr2_vgpr3
+    ; CHECK-NEXT: [[GLOBAL_LOAD_DWORDX2_:%[0-9]+]]:vreg_64_align2 = GLOBAL_LOAD_DWORDX2 [[COPY]], 0, 0, implicit $exec :: (load (s64), addrspace 1)
+    ; CHECK-NEXT: undef [[V_MFMA_F64_4X4X4F64_vgprcd_e64_:%[0-9]+]].sub2_sub3:vreg_128_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 [[COPY1]], [[COPY2]], [[GLOBAL_LOAD_DWORDX2_]], 0, 0, 0, implicit $mode, implicit $exec
+    ; CHECK-NEXT: undef [[COPY3:%[0-9]+]].sub0_sub1:areg_128_align2 = COPY [[V_MFMA_F64_4X4X4F64_vgprcd_e64_]].sub2_sub3
+    ; CHECK-NEXT: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 6291465 /* reguse:AReg_128_Align2 */, [[COPY3]]
+    ; CHECK-NEXT: GLOBAL_STORE_DWORDX4 [[COPY]], [[COPY3]], 0, 0, implicit $exec :: (store (s128), addrspace 1)
+    ; CHECK-NEXT: GLOBAL_STORE_DWORDX2 [[COPY]], [[COPY3]].sub2_sub3, 0, 0, implicit $exec :: (store (s128), addrspace 1)
+    ; CHECK-NEXT: SI_RETURN
+    %0:vreg_64_align2 = COPY $vgpr4_vgpr5
+    %1:av_64_align2 = COPY $vgpr0_vgpr1
+    %2:av_64_align2 = COPY $vgpr2_vgpr3
+    %3:vreg_64_align2 = GLOBAL_LOAD_DWORDX2 %0, 0, 0, implicit $exec :: (load (s64), addrspace 1)
+    undef %4.sub2_sub3:vreg_128_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 %1, %2, %3, 0, 0, 0, implicit $mode, implicit $exec
+    undef %5.sub0_sub1:areg_128_align2 = COPY %4.sub2_sub3
+    INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 6291465 /* reguse:AReg_128_Align2 */, %5
+    GLOBAL_STORE_DWORDX4 %0, %5, 0, 0, implicit $exec :: (store (s128), addrspace 1)
+    GLOBAL_STORE_DWORDX2 %0, %5.sub2_sub3, 0, 0, implicit $exec :: (store (s128), addrspace 1)
+    SI_RETURN
+...
+
+# V-to-A copy is a subregister insert of a subregister from a
+# subregister_def
+---
+name:  test_rewrite_mfma_copy_subreg_insert_from_subreg_def_subreg
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+
+    ; CHECK-LABEL: name: test_rewrite_mfma_copy_subreg_insert_from_subreg_def_subreg
+    ; CHECK: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:vreg_64_align2 = COPY $vgpr4_vgpr5
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:av_64_align2 = COPY $vgpr0_vgpr1
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:av_64_align2 = COPY $vgpr2_vgpr3
+    ; CHECK-NEXT: [[GLOBAL_LOAD_DWORDX2_:%[0-9]+]]:vreg_64_align2 = GLOBAL_LOAD_DWORDX2 [[COPY]], 0, 0, implicit $exec :: (load (s64), addrspace 1)
+    ; CHECK-NEXT: undef [[V_MFMA_F64_4X4X4F64_vgprcd_e64_:%[0-9]+]].sub2_sub3:vreg_128_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 [[COPY1]], [[COPY2]], [[GLOBAL_LOAD_DWORDX2_]], 0, 0, 0, implicit $mode, implicit $exec
+    ; CHECK-NEXT: undef [[COPY3:%[0-9]+]].sub1:areg_128_align2 = COPY [[V_MFMA_F64_4X4X4F64_vgprcd_e64_]].sub2
+    ; CHECK-NEXT: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 6291465 /* reguse:AReg_128_Align2 */, [[COPY3]]
+    ; CHECK-NEXT: GLOBAL_STORE_DWORDX4 [[COPY]], [[COPY3]], 0, 0, implicit $exec :: (store (s128), addrspace 1)
+    ; CHECK-NEXT: GLOBAL_STORE_DWORDX2 [[COPY]], [[COPY3]].sub2_sub3, 0, 0, implicit $exec :: (store (s128), addrspace 1)
+    ; CHECK-NEXT: SI_RETURN
+    %0:vreg_64_align2 = COPY $vgpr4_vgpr5
+    %1:av_64_align2 = COPY $vgpr0_vgpr1
+    %2:av_64_align2 = COPY $vgpr2_vgpr3
+    %3:vreg_64_align2 = GLOBAL_LOAD_DWORDX2 %0, 0, 0, implicit $exec :: (load (s64), addrspace 1)
+    undef %4.sub2_sub3:vreg_128_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 %1, %2, %3, 0, 0, 0, implicit $mode, implicit $exec
+    undef %5.sub1:areg_128_align2 = COPY %4.sub2
+    INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 6291465 /* reguse:AReg_128_Align2 */, %5
+    GLOBAL_STORE_DWORDX4 %0, %5, 0, 0, implicit $exec :: (store (s128), addrspace 1)
+    GLOBAL_STORE_DWORDX2 %0, %5.sub2_sub3, 0, 0, implicit $exec :: (store (s128), addrspace 1)
+    SI_RETURN
+...
diff --git a/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-subreg-src2-chain.mir b/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-subreg-src2-chain.mir
new file mode 100644
index 0000000000000..38b13052246c1
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-subreg-src2-chain.mir
@@ -0,0 +1,327 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -run-pass=greedy,amdgpu-rewrite-agpr-copy-mfma -o - %s | FileCheck %s
+
+---
+name:  test_rewrite_mfma_src2_is_subreg_0
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+
+    ; CHECK-LABEL: name: test_rewrite_mfma_src2_is_subreg_0
+    ; CHECK: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:vreg_64_align2 = COPY $vgpr4_vgpr5
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:av_64_align2 = COPY $vgpr0_vgpr1
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:av_64_align2 = COPY $vgpr2_vgpr3
+    ; CHECK-NEXT: [[GLOBAL_LOAD_DWORDX4_:%[0-9]+]]:vreg_128_align2 = GLOBAL_LOAD_DWORDX4 [[COPY]], 0, 0, implicit $exec :: (load (s128), addrspace 1)
+    ; CHECK-NEXT: [[V_MFMA_F64_4X4X4F64_vgprcd_e64_:%[0-9]+]]:vreg_64_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 [[COPY1]], [[COPY2]], [[GLOBAL_LOAD_DWORDX4_]].sub0_sub1, 0, 0, 0, implicit $mode, implicit $exec
+    ; CHECK-NEXT: undef [[COPY3:%[0-9]+]].sub0_sub1:areg_128_align2 = COPY [[V_MFMA_F64_4X4X4F64_vgprcd_e64_]]
+    ; CHECK-NEXT: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 6291465 /* reguse:AReg_128_Align2 */, [[COPY3]]
+    ; CHECK-NEXT: GLOBAL_STORE_DWORDX4 [[COPY]], [[COPY3]], 0, 0, implicit $exec :: (store (s128), addrspace 1)
+    ; CHECK-NEXT: SI_RETURN
+    %0:vreg_64_align2 = COPY $vgpr4_vgpr5
+    %1:av_64_align2 = COPY $vgpr0_vgpr1
+    %2:av_64_align2 = COPY $vgpr2_vgpr3
+    %3:vreg_128_align2 = GLOBAL_LOAD_DWORDX4 %0, 0, 0, implicit $exec :: (load (s128), addrspace 1)
+    %4:vreg_64_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 %1, %2, %3.sub0_sub1, 0, 0, 0, implicit $mode, implicit $exec
+    undef %5.sub0_sub1:areg_128_align2 = COPY %4
+    INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 6291465 /* reguse:AReg_128_Align2 */, %5
+    GLOBAL_STORE_DWORDX4 %0, %5, 0, 0, implicit $exec :: (store (s128), addrspace 1)
+    SI_RETURN
+...
+
+---
+name:  test_rewrite_mfma_src2_is_subreg_1
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+
+    ; CHECK-LABEL: name: test_rewrite_mfma_src2_is_subreg_1
+    ; CHECK: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:vreg_64_align2 = COPY $vgpr4_vgpr5
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:av_64_align2 = COPY $vgpr0_vgpr1
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:av_64_align2 = COPY $vgpr2_vgpr3
+    ; CHECK-NEXT: [[GLOBAL_LOAD_DWORDX4_:%[0-9]+]]:vreg_128_align2 = GLOBAL_LOAD_DWORDX4 [[COPY]], 0, 0, implicit $exec :: (load (s128), addrspace 1)
+    ; CHECK-NEXT: [[V_MFMA_F64_4X4X4F64_vgprcd_e64_:%[0-9]+]]:vreg_64_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 [[COPY1]], [[COPY2]], [[GLOBAL_LOAD_DWORDX4_]].sub2_sub3, 0, 0, 0, implicit $mode, implicit $exec
+    ; CHECK-NEXT: undef [[COPY3:%[0-9]+]].sub0_sub1:areg_128_align2 = COPY [[V_MFMA_F64_4X4X4F64_vgprcd_e64_]]
+    ; CHECK-NEXT: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 6291465 /* reguse:AReg_128_Align2 */, [[COPY3]]
+    ; CHECK-NEXT: GLOBAL_STORE_DWORDX4 [[COPY]], [[COPY3]], 0, 0, implicit $exec :: (store (s128), addrspace 1)
+    ; CHECK-NEXT: SI_RETURN
+    %0:vreg_64_align2 = COPY $vgpr4_vgpr5
+    %1:av_64_align2 = COPY $vgpr0_vgpr1
+    %2:av_64_align2 = COPY $vgpr2_vgpr3
+    %3:vreg_128_align2 = GLOBAL_LOAD_DWORDX4 %0, 0, 0, implicit $exec :: (load (s128), addrspace 1)
+    %4:vreg_64_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 %1, %2, %3.sub2_sub3, 0, 0, 0, implicit $mode, implicit $exec
+    undef %5.sub0_sub1:areg_128_align2 = COPY %4
+    INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 6291465 /* reguse:AReg_128_Align2 */, %5
+    GLOBAL_STORE_DWORDX4 %0, %5, 0, 0, implicit $exec :: (store (s128), addrspace 1)
+    SI_RETURN
+...
+
+---
+name:  test_rewrite_mfma_src2_is_subreg_chain_mfma_full_def
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+
+    ; CHECK-LABEL: name: test_rewrite_mfma_src2_is_subreg_chain_mfma_full_def
+    ; CHECK: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:vreg_64_align2 = COPY $vgpr4_vgpr5
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:av_64_align2 = COPY $vgpr0_vgpr1
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:av_64_align2 = COPY $vgpr2_vgpr3
+    ; CHECK-NEXT: [[GLOBAL_LOAD_DWORDX2_:%[0-9]+]]:vreg_64_align2 = GLOBAL_LOAD_DWORDX2 [[COPY]], 0, 0, implicit $exec :: (load (s64), addrspace 1)
+    ; CHECK-NEXT: undef [[V_MFMA_F64_4X4X4F64_vgprcd_e64_:%[0-9]+]].sub0_sub1:vreg_128_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 [[COPY1]], [[COPY2]], [[GLOBAL_LOAD_DWORDX2_]], 0, 0, 0, implicit $mode, implicit $exec
+    ; CHECK-NEXT: dead %other_use:vreg_64_align2 = COPY [[V_MFMA_F64_4X4X4F64_vgprcd_e64_]].sub0_sub1
+    ; CHECK-NEXT: [[V_MFMA_F64_4X4X4F64_vgprcd_e64_1:%[0-9]+]]:vreg_64_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 [[COPY1]], [[COPY2]], [[V_MFMA_F64_4X4X4F64_vgprcd_e64_]].sub0_sub1, 0, 0, 0, implicit $mode, implicit $exec
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:areg_64_align2 = COPY [[V_MFMA_F64_4X4X4F64_vgprcd_e64_1]]
+    ; CHECK-NEXT: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 3735561 /* reguse:AReg_64_Align2 */, [[COPY3]]
+    ; CHECK-NEXT: GLOBAL_STORE_DWORDX2 [[COPY]], [[COPY3]], 0, 0, implicit $exec :: (store (s64), addrspace 1)
+    ; CHECK-NEXT: SI_RETURN
+    %0:vreg_64_align2 = COPY $vgpr4_vgpr5
+    %1:av_64_align2 = COPY $vgpr0_vgpr1
+    %2:av_64_align2 = COPY $vgpr2_vgpr3
+    %3:vreg_64_align2 = GLOBAL_LOAD_DWORDX2 %0, 0, 0, implicit $exec :: (load (s64), addrspace 1)
+    undef %4.sub0_sub1:vreg_128_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 %1, %2, %3, 0, 0, 0, implicit $mode, implicit $exec
+    %other_use:vreg_64_align2 = COPY %4.sub0_sub1
+    %5:vreg_64_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 %1, %2, %4.sub0_sub1, 0, 0, 0, implicit $mode, implicit $exec
+    %6:areg_64_align2 = COPY %5
+    INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 3735561 /* reguse:AReg_...
[truncated]

Copy link
Contributor

@perlfu perlfu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - test additions seems logical

I notice a lot of use of partially defined registers in final store instructions of tests. Are these intentional or otherwise likely to create issues later?

@arsenm
Copy link
Contributor Author

arsenm commented Aug 13, 2025

LGTM - test additions seems logical

I notice a lot of use of partially defined registers in final store instructions of tests. Are these intentional or otherwise likely to create issues later?

That's kind of the point. The only problem would be use of undefined lanes, but the only interest there is the machine verifier doesn't fail

@arsenm arsenm force-pushed the users/arsenm/amdgpu/add-tests-rewrite-mfma-agpr-with-subreg-copies branch from 3776f12 to 69cd540 Compare August 18, 2025 15:31
@arsenm arsenm force-pushed the users/arsenm/amdgpu/handle-rewrite-vgpr-mfma-with-imm-src2-2 branch from 4e9c227 to 58720b8 Compare August 18, 2025 15:31
@arsenm arsenm force-pushed the users/arsenm/amdgpu/add-tests-rewrite-mfma-agpr-with-subreg-copies branch from 69cd540 to dd011f2 Compare August 20, 2025 23:23
@arsenm arsenm force-pushed the users/arsenm/amdgpu/handle-rewrite-vgpr-mfma-with-imm-src2-2 branch from 58720b8 to 25b15ea Compare August 20, 2025 23:23
Base automatically changed from users/arsenm/amdgpu/handle-rewrite-vgpr-mfma-with-imm-src2-2 to main August 21, 2025 00:09
Currently only cases rooted at a full copy of an MFMA result are handled.
Prepare to relax that by testing more intricate subregister usage.

Currently only full copies are handled, add some tests to help work
towards handling subregisters.
@arsenm arsenm force-pushed the users/arsenm/amdgpu/add-tests-rewrite-mfma-agpr-with-subreg-copies branch from dd011f2 to 8b45c13 Compare August 21, 2025 00:11
@arsenm arsenm enabled auto-merge (squash) August 21, 2025 00:15
@arsenm arsenm merged commit 744cd8a into main Aug 21, 2025
9 checks passed
@arsenm arsenm deleted the users/arsenm/amdgpu/add-tests-rewrite-mfma-agpr-with-subreg-copies branch August 21, 2025 00:39
@mikaelholmen
Copy link
Collaborator

I see

Failed Tests (2):
  LLVM :: CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-subreg-insert-extract.mir
  LLVM :: CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-subreg-src2-chain.mir

with EXPENSIVE_CHECKS.
E.g.

# Machine code for function test_rewrite_mfma_src2_chain_different_subregs_same_reg_full_copy_use: NoPHIs, TracksLiveness, TracksDebugUserValues

0B	bb.0:
	  liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
16B	  %0:vreg_64_align2 = COPY $vgpr4_vgpr5
32B	  %1:av_64_align2 = COPY $vgpr0_vgpr1
48B	  %2:av_64_align2 = COPY $vgpr2_vgpr3
64B	  %3:vreg_128_align2 = GLOBAL_LOAD_DWORDX4 %0:vreg_64_align2, 0, 0, implicit $exec :: (load (s128), addrspace 1)
80B	  undef %4.sub2_sub3:vreg_128_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 %1:av_64_align2, %2:av_64_align2, %3.sub0_sub1:vreg_128_align2, 0, 0, 0, implicit $mode, implicit $exec
96B	  dead %other_use0:vreg_64_align2 = COPY %4.sub0_sub1:vreg_128_align2
112B	  %4.sub0_sub1:vreg_128_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 %1:av_64_align2, %2:av_64_align2, %3.sub2_sub3:vreg_128_align2, 0, 0, 0, implicit $mode, implicit $exec
128B	  dead %other_use1:vreg_64_align2 = COPY %4.sub2_sub3:vreg_128_align2
144B	  dead %other_use2:vreg_64 = COPY %4.sub1_sub2:vreg_128_align2
160B	  %8:areg_128_align2 = COPY %4:vreg_128_align2
176B	  INLINEASM &"; use $0" [sideeffect] [attdialect], $0:[reguse:AReg_128_Align2], %8:areg_128_align2
192B	  GLOBAL_STORE_DWORDX4 %0:vreg_64_align2, %8:areg_128_align2, 0, 0, implicit $exec :: (store (s128), addrspace 1)
208B	  SI_RETURN

# End machine code for function test_rewrite_mfma_src2_chain_different_subregs_same_reg_full_copy_use.

*** Bad machine code: No live subrange at use ***
- function:    test_rewrite_mfma_src2_chain_different_subregs_same_reg_full_copy_use
- basic block: %bb.0  (0x557c337490d8) [0B;224B)
- instruction: 96B	dead %other_use0:vreg_64_align2 = COPY %4.sub0_sub1:vreg_128_align2
- operand 1:   %4.sub0_sub1:vreg_128_align2
- interval:    %4 [80r,112r:1)[112r,160r:0) 0@112r 1@80r  L000000000000000C [112r,160r:0) 0@112r  L0000000000000030 [80r,160r:0) 0@80r  L00000000000000C0 [80r,160r:0) 0@80r  L0000000000000003 [112r,160r:0) 0@112r  weight:1.472917e-02
- at:          96B
LLVM ERROR: Found 1 machine code errors.

@arsenm
Copy link
Contributor Author

arsenm commented Aug 21, 2025

I see

Failed Tests (2):
  LLVM :: CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-subreg-insert-extract.mir
  LLVM :: CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-subreg-src2-chain.mir

with EXPENSIVE_CHECKS. E.g.

Fixed in b614975

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants