Description
Repro Repo:
https://github.com/damageboy/coreclr-pdep-mask-flaky-cse
Relevant piece of code:
```csharp
ulong t64;
t64 = P.AsUInt64().GetElement(0);
var p0 = ParallelBitDeposit(t64, 0x0707070707070707);
var p1 = ParallelBitDeposit(t64 >> 32, 0x0707070707070707);
t64 = P.AsUInt64().GetElement(1);
var p2 = ParallelBitDeposit(t64, 0x0707070707070707);
var p3 = ParallelBitDeposit(t64 >> 32, 0x0707070707070707);
var tmp128 = ExtractVector128(P, 1);
t64 = tmp128.AsUInt64().GetElement(0);
var p4 = ParallelBitDeposit(t64, 0x0707070707070707);
var p5 = ParallelBitDeposit(t64 >> 32, 0x0707070707070707);
t64 = tmp128.AsUInt64().GetElement(1);
var p6 = ParallelBitDeposit(t64, 0x0707070707070707);
var p7 = ParallelBitDeposit(t64 >> 32, 0x0707070707070707);
```
Generated asm:
```asm
; t64 = P.AsUInt64().GetElement(0);
00007FC3A6AB07F5 C5FC28C8             vmovaps ymm1,ymm0
00007FC3A6AB07F9 C4E1F97EC8           vmovq rax,xmm1
; var p0 = ParallelBitDeposit(t64, 0x0707070707070707);
00007FC3A6AB07FE 48BF0707070707070707 mov rdi,707070707070707h
00007FC3A6AB0808 C4E2FBF5FF           pdep rdi,rax,rdi
; var p1 = ParallelBitDeposit(t64 >> 32, 0x0707070707070707);
00007FC3A6AB080D 48C1E820             shr rax,20h
00007FC3A6AB0811 48BE0707070707070707 mov rsi,707070707070707h
00007FC3A6AB081B C4E2FBF5F6           pdep rsi,rax,rsi
; t64 = P.AsUInt64().GetElement(1);
00007FC3A6AB0820 C5FC28C8             vmovaps ymm1,ymm0
00007FC3A6AB0824 C4E3F916C801         vpextrq rax,xmm1,1
; var p2 = ParallelBitDeposit(t64, 0x0707070707070707);
00007FC3A6AB082A 48BA0707070707070707 mov rdx,707070707070707h
00007FC3A6AB0834 C4E2FBF5D2           pdep rdx,rax,rdx
; var p3 = ParallelBitDeposit(t64 >> 32, 0x0707070707070707);
00007FC3A6AB0839 48C1E820             shr rax,20h
00007FC3A6AB083D 48B90707070707070707 mov rcx,707070707070707h
00007FC3A6AB0847 C4E2FBF5C9           pdep rcx,rax,rcx
; var tmp128 = ExtractVector128(P, 1);
00007FC3A6AB084C C4E37D39C001         vextracti128 xmm0,ymm0,1
; t64 = tmp128.AsUInt64().GetElement(0);
00007FC3A6AB0852 C4E1F97EC0           vmovq rax,xmm0
; var p4 = ParallelBitDeposit(t64, 0x0707070707070707);
00007FC3A6AB0857 49B80707070707070707 mov r8,707070707070707h
00007FC3A6AB0861 C442FBF5C0           pdep r8,rax,r8
; var p5 = ParallelBitDeposit(t64 >> 32, 0x0707070707070707);
00007FC3A6AB0866 48C1E820             shr rax,20h
00007FC3A6AB086A 49B90707070707070707 mov r9,707070707070707h
00007FC3A6AB0874 C442FBF5C9           pdep r9,rax,r9
; t64 = tmp128.AsUInt64().GetElement(1);
00007FC3A6AB0879 C4E3F916C001         vpextrq rax,xmm0,1
; var p6 = ParallelBitDeposit(t64, 0x0707070707070707);
00007FC3A6AB087F 49BA0707070707070707 mov r10,707070707070707h
00007FC3A6AB0889 C442FBF5D2           pdep r10,rax,r10
; var p7 = ParallelBitDeposit(t64 >> 32, 0x0707070707070707);
00007FC3A6AB088E 48C1E820             shr rax,20h
00007FC3A6AB0892 49BB0707070707070707 mov r11,707070707070707h
00007FC3A6AB089C C4C2FBF5C3           pdep rax,rax,r11
```
Issue
This is a very minor variation on the bug I opened yesterday: #442
Somehow, just moving a few of these expressions around stops the JIT from dependably performing CSE on the mask parameter for PDEP: in the listing above, the constant 0x0707070707070707 is reloaded with its own `mov` before each of the eight `pdep` instructions (unlike the code I posted in the previous issue).
I'm not sure why such a trivial change from the previous listing triggers this.
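
For readers less familiar with PDEP, here is a minimal, self-contained sketch of what each call computes and of what "CSE on the mask" would amount to at the source level: a single shared mask value reused by every call. The class name `PdepMaskSketch`, the helpers `SoftwarePdep` and `Pdep`, and the input value are hypothetical and not part of the repro. Whether hoisting the mask into a local like this changes the JIT's output is an assumption I have not verified.

```csharp
using System;
using System.Runtime.Intrinsics.X86;

public static class PdepMaskSketch
{
    // Portable reference for PDEP: the low-order bits of src are deposited,
    // in order, into the positions where mask has a 1 bit.
    public static ulong SoftwarePdep(ulong src, ulong mask)
    {
        ulong result = 0;
        for (ulong bit = 1; mask != 0; bit <<= 1)
        {
            ulong lowest = mask & (~mask + 1); // isolate the lowest set mask bit
            if ((src & bit) != 0)
                result |= lowest;
            mask &= mask - 1;                  // clear that mask bit
        }
        return result;
    }

    // Uses the BMI2 intrinsic when the hardware supports it,
    // otherwise falls back to the reference loop above.
    public static ulong Pdep(ulong src, ulong mask) =>
        Bmi2.X64.IsSupported
            ? Bmi2.X64.ParallelBitDeposit(src, mask)
            : SoftwarePdep(src, mask);

    public static void Main()
    {
        // One shared local for the mask (hand-hoisted; hypothetical workaround).
        ulong mask = 0x0707070707070707;
        ulong t64 = 0x0000003F0000003F; // illustrative input, not from the issue

        ulong p0 = Pdep(t64, mask);       // consumes the low 24 bits of t64
        ulong p1 = Pdep(t64 >> 32, mask); // consumes bits 32..55 of t64

        Console.WriteLine($"p0 = 0x{p0:X16}");
        Console.WriteLine($"p1 = 0x{p1:X16}");
    }
}
```

With this input, both lines print `0x0000000000000707`: the mask has three set bits per byte, so the six low source bits land in the low three bits of the first two bytes.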
category:cq
theme:cse
skill-level:intermediate
cost:medium
impact:small