@huntergr-arm

Benchmarks with vs. without autovec for a loop containing conditional
scalar assignment (plus a little extra arithmetic as a 'work payload').

@huntergr-arm
Author

Microbenchmark for FindLast/CSA autovec, as requested on llvm/llvm-project#158088

With just the conditional assignment in the loop, there was no noticeable performance difference. However, once I added a small arithmetic payload I saw a clear difference, especially for uint8_t.
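For readers who have not seen the pattern, the loop shape described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the function name `csa_with_payload`, the `A[i] + B[i]` payload, and the `Default` parameter are all assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical kernel: a conditional scalar assignment (CSA) plus a small
// arithmetic payload. Names and the payload itself are illustrative only.
template <typename T>
T csa_with_payload(const std::vector<T> &A, const std::vector<T> &B,
                   std::vector<T> &C, T Threshold, T Default) {
  T Result = Default;
  for (std::size_t i = 0; i < A.size(); ++i) {
    // Payload: unconditional work done for every element.
    C[i] = static_cast<T>(A[i] + B[i]);
    // CSA: Result keeps the value from the last iteration whose condition
    // was true (the "find last" pattern).
    if (A[i] > Threshold)
      Result = A[i];
  }
  return Result;
}
```

If no element exceeds `Threshold`, the function returns `Default` unchanged.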

Member

@MacDue MacDue left a comment

Generally seems reasonable to me (bar a few nits), but I've not added a benchmark before, so wait and see if there are any more comments.

Comment on lines 14 to 15
// Pick out-of-range default value.
T Result = 101;
Member

Took a moment to see that "out-of-range" here was referring to the range of the input A values. Could you clarify that?

Author

done
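To make the "out-of-range" point concrete, here is a rough sketch of why a default outside the input range is useful. The input range of [0, 100] and the helper names `init_data_sketch` and `find_last_above` are assumptions for illustration; the PR's actual `init_data` is not shown in this conversation.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for the PR's init_data: fills A with values in
// [0, 100]. The real range is an assumption here.
std::vector<std::uint32_t> init_data_sketch(std::size_t N, unsigned Seed) {
  std::mt19937 Gen(Seed);
  std::uniform_int_distribution<std::uint32_t> Dist(0, 100);
  std::vector<std::uint32_t> A(N);
  for (auto &V : A)
    V = Dist(Gen);
  return A;
}

std::uint32_t find_last_above(const std::vector<std::uint32_t> &A,
                              std::uint32_t Threshold) {
  // Default chosen outside the assumed [0, 100] range of A, so it can never
  // be confused with a value copied out of A.
  std::uint32_t Result = 101;
  for (std::uint32_t V : A)
    if (V > Threshold)
      Result = V;
  return Result;
}
```

With this choice, a return value of 101 can only mean that no element satisfied the condition.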

@@ -0,0 +1,118 @@
#include <iostream>
Member

@MacDue MacDue Nov 13, 2025

Was going to comment about the license header, but it seems that's not done here (looking at other files).

Author

Yeah, I wondered about that too.

Comment on lines 78 to 82
run_csa_autovec(&A[0], &B[0], &C[0], Threshold);
benchmark::DoNotOptimize(A);
benchmark::DoNotOptimize(B);
benchmark::DoNotOptimize(C);
benchmark::ClobberMemory();
Member

Not sure if it makes a difference, but other benchmarks seem to do these first:

Suggested change
- run_csa_autovec(&A[0], &B[0], &C[0], Threshold);
- benchmark::DoNotOptimize(A);
- benchmark::DoNotOptimize(B);
- benchmark::DoNotOptimize(C);
- benchmark::ClobberMemory();
+ benchmark::DoNotOptimize(A);
+ benchmark::DoNotOptimize(B);
+ benchmark::DoNotOptimize(C);
+ benchmark::ClobberMemory();
+ run_csa_autovec(&A[0], &B[0], &C[0], Threshold);

Author

MathFunctions.cpp and RuntimeChecks.cpp seem to run the test function first as well, so there doesn't seem to be agreement on this.
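The two orderings discussed here can be compared side by side in a small sketch. To keep it buildable without Google Benchmark, it uses hand-rolled stand-ins (`do_not_optimize`, `clobber_memory`) written with GCC/Clang inline assembly; these helpers, `kernel_sketch`, and `one_iteration` are hypothetical and only approximate what `benchmark::DoNotOptimize` and `benchmark::ClobberMemory` do.

```cpp
#include <cstddef>
#include <vector>

// Hand-rolled compiler barriers (GCC/Clang only), standing in for the
// Google Benchmark helpers so this sketch has no external dependency.
template <typename T> inline void do_not_optimize(T *Ptr) {
  asm volatile("" : : "r"(Ptr) : "memory");
}
inline void clobber_memory() { asm volatile("" : : : "memory"); }

// Hypothetical kernel standing in for run_csa_autovec.
int kernel_sketch(const int *A, const int *B, int *C, std::size_t N,
                  int Threshold) {
  int Result = 101;
  for (std::size_t i = 0; i < N; ++i) {
    C[i] = A[i] + B[i];
    if (A[i] > Threshold)
      Result = A[i];
  }
  return Result;
}

// One benchmark iteration with the barriers placed either before the
// kernel (the suggested change) or after it (the ordering in the PR).
int one_iteration(std::vector<int> &A, std::vector<int> &B,
                  std::vector<int> &C, int Threshold, bool BarriersFirst) {
  auto Barriers = [&] {
    do_not_optimize(A.data());
    do_not_optimize(B.data());
    do_not_optimize(C.data());
    clobber_memory();
  };
  if (BarriersFirst)
    Barriers();
  int R = kernel_sketch(A.data(), B.data(), C.data(), A.size(), Threshold);
  if (!BarriersFirst)
    Barriers();
  return R;
}
```

Both orderings compute the same result. Whether they differ in measured time is not established in this thread, which matches the author's observation that existing benchmarks do not agree on a convention.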

// for 'A' in init_data below.
T Result = 101;
for (unsigned i = 0; i < ITERATIONS; i++) {
// Do some work to make the difference noticeable
Contributor

Could you add a few more variations, like the minimal case with just a CSA and a case with multiple independent CSAs?

Author

done.
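The two requested variations might look roughly like the sketch below. The names `csa_only` and `csa_two_independent` and the exact conditions are assumptions; the variants actually added to the PR are not shown in this conversation.

```cpp
#include <cstddef>
#include <utility>

// Minimal case: the loop body contains nothing but the conditional
// scalar assignment.
template <typename T>
T csa_only(const T *A, std::size_t N, T Threshold, T Default) {
  T Result = Default;
  for (std::size_t i = 0; i < N; ++i)
    if (A[i] > Threshold)
      Result = A[i];
  return Result;
}

// Multiple independent CSAs: two scalars, each updated under its own
// condition, with no dependence between them.
template <typename T>
std::pair<T, T> csa_two_independent(const T *A, const T *B, std::size_t N,
                                    T Threshold, T Default) {
  T LastA = Default;
  T LastB = Default;
  for (std::size_t i = 0; i < N; ++i) {
    if (A[i] > Threshold)
      LastA = A[i];
    if (B[i] > Threshold)
      LastB = B[i];
  }
  return {LastA, LastB};
}
```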

}
}

// Add add auto-vectorized and disabled vectorization benchmarks for math
Contributor

The comment needs updating. This currently passes only ty and Threshold, but it might be helpful to also pass a function if that reduces the duplication for additional patterns.

Author

done.
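One way to read the suggestion of passing a function is sketched below: a single harness parameterized on the element type, the threshold, and the kernel, so that each new pattern only adds a kernel. This is a plain-C++ illustration without Google Benchmark; `run_pattern`, `last_above`, `last_below`, and the `i % 101` input data are all hypothetical and do not reflect how the PR registers its benchmarks.

```cpp
#include <cstddef>
#include <vector>

// Two hypothetical kernels sharing one signature.
template <typename T>
T last_above(const T *A, std::size_t N, T Threshold) {
  T Result = 101; // default outside the 0..100 data generated below
  for (std::size_t i = 0; i < N; ++i)
    if (A[i] > Threshold)
      Result = A[i];
  return Result;
}

template <typename T>
T last_below(const T *A, std::size_t N, T Threshold) {
  T Result = 101;
  for (std::size_t i = 0; i < N; ++i)
    if (A[i] < Threshold)
      Result = A[i];
  return Result;
}

// Shared harness: the setup code is written once and the kernel is passed
// in, instead of duplicating the setup for every pattern.
template <typename T, typename Kernel>
T run_pattern(Kernel K, std::size_t N, T Threshold) {
  std::vector<T> A(N);
  for (std::size_t i = 0; i < N; ++i)
    A[i] = static_cast<T>(i % 101); // values cycle through 0..100
  return K(A.data(), N, Threshold);
}
```

A call such as `run_pattern<int>(last_above<int>, 200, 50)` then exercises one pattern, and swapping in `last_below<int>` exercises another without touching the harness.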
