@ylpoonlg

Adds two SVE benchmarks making use of SVE2 APIs and interleaved load/store.
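For context on the interleaved layout, here is a minimal scalar sketch of a complex multiply. It assumes the complex values are stored as alternating real and imaginary parts, which is the layout that interleaved (de-interleaving) loads and stores are designed for. The function name `complex_multiply_interleaved` and the list-based layout are hypothetical and written in Python for brevity; they are not taken from the benchmark code in this PR.

```python
def complex_multiply_interleaved(a, b):
    """Element-wise complex multiply of two interleaved arrays.

    Assumed layout (not confirmed by the PR): [re0, im0, re1, im1, ...].
    """
    if len(a) != len(b) or len(a) % 2 != 0:
        raise ValueError("inputs must have equal, even length")

    out = [0.0] * len(a)
    for i in range(0, len(a), 2):
        ar, ai = a[i], a[i + 1]
        br, bi = b[i], b[i + 1]
        # (ar + ai*i) * (br + bi*i) = (ar*br - ai*bi) + (ar*bi + ai*br)*i
        out[i] = ar * br - ai * bi
        out[i + 1] = ar * bi + ai * br
    return out
```

A vectorized version would typically split the real and imaginary parts into separate registers on load and re-interleave them on store, instead of stepping through the pairs one at a time.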

Performance Results

Run on Neoverse-V2

| Method | Size | Mean | Error | StdDev | Median | Min | Max | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Scalar | 15 | 13.212 ns | 0.0076 ns | 0.0068 ns | 13.214 ns | 13.198 ns | 13.222 ns | - |
| Vector128PairwiseAdd | 15 | 6.525 ns | 0.0117 ns | 0.0109 ns | 6.529 ns | 6.503 ns | 6.534 ns | - |
| SvePairwiseAdd | 15 | 5.243 ns | 0.0067 ns | 0.0056 ns | 5.244 ns | 5.227 ns | 5.251 ns | - |
| Sve2PairwiseAdd | 15 | 5.040 ns | 0.0093 ns | 0.0087 ns | 5.037 ns | 5.027 ns | 5.053 ns | - |
| Scalar | 127 | 100.121 ns | 0.2256 ns | 0.2111 ns | 100.175 ns | 99.688 ns | 100.390 ns | - |
| Vector128PairwiseAdd | 127 | 24.714 ns | 0.0784 ns | 0.0612 ns | 24.711 ns | 24.632 ns | 24.820 ns | - |
| SvePairwiseAdd | 127 | 32.348 ns | 0.0650 ns | 0.0608 ns | 32.385 ns | 32.258 ns | 32.416 ns | - |
| Sve2PairwiseAdd | 127 | 29.272 ns | 0.0282 ns | 0.0263 ns | 29.279 ns | 29.207 ns | 29.296 ns | - |
| Scalar | 527 | 391.076 ns | 0.4614 ns | 0.4090 ns | 391.211 ns | 390.138 ns | 391.601 ns | - |
| Vector128PairwiseAdd | 527 | 89.575 ns | 0.1429 ns | 0.1267 ns | 89.619 ns | 89.281 ns | 89.726 ns | - |
| SvePairwiseAdd | 527 | 127.174 ns | 0.0962 ns | 0.0900 ns | 127.148 ns | 127.001 ns | 127.321 ns | - |
| Sve2PairwiseAdd | 527 | 119.669 ns | 0.1557 ns | 0.1457 ns | 119.690 ns | 119.427 ns | 119.920 ns | - |
| Scalar | 10015 | 7,389.023 ns | 66.0503 ns | 55.1550 ns | 7,381.598 ns | 7,333.689 ns | 7,513.358 ns | - |
| Vector128PairwiseAdd | 10015 | 2,328.763 ns | 33.0594 ns | 30.9238 ns | 2,341.812 ns | 2,271.219 ns | 2,365.432 ns | - |
| SvePairwiseAdd | 10015 | 2,406.691 ns | 2.8076 ns | 2.4889 ns | 2,406.969 ns | 2,401.705 ns | 2,411.131 ns | - |
| Sve2PairwiseAdd | 10015 | 2,202.091 ns | 2.4197 ns | 2.2634 ns | 2,202.101 ns | 2,198.067 ns | 2,206.706 ns | - |

| Method | Size | Mean | Error | StdDev | Median | Min | Max | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Scalar | 15 | 18.665 ns | 0.0076 ns | 0.0071 ns | 18.663 ns | 18.657 ns | 18.681 ns | - |
| Vector128ComplexMultiply | 15 | 7.547 ns | 0.0015 ns | 0.0011 ns | 7.547 ns | 7.545 ns | 7.549 ns | - |
| SveComplexMultiply | 15 | 5.893 ns | 0.0093 ns | 0.0087 ns | 5.896 ns | 5.879 ns | 5.904 ns | - |
| Sve2ComplexMultiply | 15 | 5.033 ns | 0.0015 ns | 0.0013 ns | 5.034 ns | 5.031 ns | 5.035 ns | - |
| Scalar | 127 | 148.051 ns | 0.0958 ns | 0.0850 ns | 148.031 ns | 147.947 ns | 148.224 ns | - |
| Vector128ComplexMultiply | 127 | 43.807 ns | 0.1016 ns | 0.0849 ns | 43.784 ns | 43.683 ns | 44.004 ns | - |
| SveComplexMultiply | 127 | 47.935 ns | 0.0558 ns | 0.0495 ns | 47.945 ns | 47.863 ns | 48.015 ns | - |
| Sve2ComplexMultiply | 127 | 37.890 ns | 0.0082 ns | 0.0068 ns | 37.887 ns | 37.880 ns | 37.904 ns | - |
| Scalar | 527 | 595.884 ns | 0.7120 ns | 0.5946 ns | 595.605 ns | 595.460 ns | 597.372 ns | - |
| Vector128ComplexMultiply | 527 | 179.198 ns | 0.4771 ns | 0.4463 ns | 179.116 ns | 178.661 ns | 180.130 ns | - |
| SveComplexMultiply | 527 | 199.454 ns | 0.0967 ns | 0.0905 ns | 199.454 ns | 199.283 ns | 199.590 ns | - |
| Sve2ComplexMultiply | 527 | 158.460 ns | 0.3893 ns | 0.3642 ns | 158.689 ns | 157.934 ns | 158.831 ns | - |
| Scalar | 10015 | 11,261.033 ns | 6.9704 ns | 5.4421 ns | 11,260.642 ns | 11,253.823 ns | 11,269.318 ns | - |
| Vector128ComplexMultiply | 10015 | 3,417.121 ns | 6.5946 ns | 5.8460 ns | 3,416.762 ns | 3,410.709 ns | 3,426.948 ns | - |
| SveComplexMultiply | 10015 | 3,775.434 ns | 1.9476 ns | 1.7265 ns | 3,775.040 ns | 3,773.440 ns | 3,779.061 ns | - |
| Sve2ComplexMultiply | 10015 | 3,035.012 ns | 4.2198 ns | 3.9472 ns | 3,033.787 ns | 3,031.124 ns | 3,044.965 ns | - |
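
To make the results easier to compare, the snippet below converts the reported mean times into speedups relative to the scalar baseline. It is only a reading aid: the mean values are copied from the Size = 10015 rows above, while the dictionary keys (shortened method names) and the `speedups` helper are illustrative names that do not appear in the PR.

```python
# Mean times in nanoseconds, copied from the Size = 10015 rows of the tables.
MEANS_NS = {
    "PairwiseAdd": {
        "Scalar": 7389.023,
        "Vector128": 2328.763,
        "Sve": 2406.691,
        "Sve2": 2202.091,
    },
    "ComplexMultiply": {
        "Scalar": 11261.033,
        "Vector128": 3417.121,
        "Sve": 3775.434,
        "Sve2": 3035.012,
    },
}


def speedups(means, baseline="Scalar"):
    """Return baseline_time / method_time for every method in `means`."""
    base = means[baseline]
    return {name: base / t for name, t in means.items()}


if __name__ == "__main__":
    for bench, means in MEANS_NS.items():
        for name, ratio in speedups(means).items():
            print(f"{bench:16s} {name:10s} {ratio:.2f}x")
```

A ratio above 1 means the method is faster than the scalar loop at that size. The same helper can be pointed at the other sizes by swapping in their mean values.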

cc @dotnet/arm64-contrib @SwapnilGaikwad @LoopedBard3

@ylpoonlg marked this pull request as ready for review on November 10, 2025, 15:55