-
Notifications
You must be signed in to change notification settings - Fork 371
[Microbenchmarks] Add benchmark for conditional scalar assignment autovec #295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[Microbenchmarks] Add benchmark for conditional scalar assignment autovec #295
Conversation
…ovec Benchmarks with vs. without autovec for a loop containing conditional scalar assignment (plus a little extra arithmetic as a 'work payload').
|
Microbenchmark for FindLast/CSA autovec, as requested on llvm/llvm-project#158088 With just the conditional assignment in the loop, there was no noticeable performance difference. However, when I added a small arithmetic payload I saw a noticeable difference, especially for uint8t. |
MacDue
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally seems reasonable to me (bar a few nits), but I've not added a benchmark before, so wait and see if there's any more comments.
| // Pick out-of-range default value. | ||
| T Result = 101; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a moment to see that "out-of-range" here was referring to the range of the input A values. Could you clarify that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| @@ -0,0 +1,118 @@ | |||
| #include <iostream> | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was going to comment about the license header, but it seems that's not done here (looking at other files).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I wondered about that too.
| run_csa_autovec(&A[0], &B[0], &C[0], Threshold); | ||
| benchmark::DoNotOptimize(A); | ||
| benchmark::DoNotOptimize(B); | ||
| benchmark::DoNotOptimize(C); | ||
| benchmark::ClobberMemory(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it makes a difference, but other benchmarks seem to do these first:
| run_csa_autovec(&A[0], &B[0], &C[0], Threshold); | |
| benchmark::DoNotOptimize(A); | |
| benchmark::DoNotOptimize(B); | |
| benchmark::DoNotOptimize(C); | |
| benchmark::ClobberMemory(); | |
| benchmark::DoNotOptimize(A); | |
| benchmark::DoNotOptimize(B); | |
| benchmark::DoNotOptimize(C); | |
| benchmark::ClobberMemory(); | |
| run_csa_autovec(&A[0], &B[0], &C[0], Threshold); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MathFunctions.cpp and RuntimeChecks.cpp seem to run the test function first as well, so there doesn't seem to be agreement on this.
| // for 'A' in init_data below. | ||
| T Result = 101; | ||
| for (unsigned i = 0; i < ITERATIONS; i++) { | ||
| // Do some work to make the difference noticeable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add a few more variations, like the minimal case with just a CAS and multiple independent CAS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
| } | ||
| } | ||
|
|
||
| // Add add auto-vectorized and disabled vectorization benchmarks for math |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment needs updating, currently passes only ty and Threshold, but it might be helpful to also pass a function if it helps to reduce the duplication for additional patterns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
Benchmarks with vs. without autovec for a loop containing conditional
scalar assignment (plus a little extra arithmetic as a 'work payload').