@ylpoonlg ylpoonlg commented Nov 14, 2025

Adds four floating-point math routines to the SVE benchmarks: FastDivision, MultiplyPow2, FP64Overflow, and Exponent.

Performance Results

Run on Neoverse-V2

FastDivision

| Method | Size | Mean | Error | StdDev | Median | Min | Max | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Scalar | 15 | 9.825 ns | 0.0037 ns | 0.0031 ns | 9.825 ns | 9.821 ns | 9.832 ns | - |
| Vector128FastDivision | 15 | 7.824 ns | 0.0092 ns | 0.0081 ns | 7.824 ns | 7.811 ns | 7.839 ns | - |
| SveFastDivision | 15 | 9.219 ns | 0.0172 ns | 0.0161 ns | 9.221 ns | 9.182 ns | 9.244 ns | - |
| Scalar | 127 | 94.367 ns | 0.0301 ns | 0.0251 ns | 94.365 ns | 94.329 ns | 94.422 ns | - |
| Vector128FastDivision | 127 | 77.715 ns | 0.0960 ns | 0.0898 ns | 77.762 ns | 77.590 ns | 77.858 ns | - |
| SveFastDivision | 127 | 79.684 ns | 0.0204 ns | 0.0171 ns | 79.682 ns | 79.657 ns | 79.716 ns | - |
| Scalar | 527 | 397.337 ns | 0.1914 ns | 0.1790 ns | 397.322 ns | 397.022 ns | 397.759 ns | - |
| Vector128FastDivision | 527 | 328.083 ns | 0.3779 ns | 0.3535 ns | 328.128 ns | 327.514 ns | 328.745 ns | - |
| SveFastDivision | 527 | 331.842 ns | 0.3459 ns | 0.2888 ns | 331.897 ns | 331.362 ns | 332.280 ns | - |
| Scalar | 10015 | 7,596.185 ns | 7.9107 ns | 7.3997 ns | 7,595.238 ns | 7,585.420 ns | 7,609.528 ns | - |
| Vector128FastDivision | 10015 | 6,270.509 ns | 6.7028 ns | 5.5972 ns | 6,269.118 ns | 6,262.699 ns | 6,281.886 ns | - |
| SveFastDivision | 10015 | 6,355.315 ns | 9.5852 ns | 8.9660 ns | 6,358.352 ns | 6,343.016 ns | 6,369.762 ns | - |

MultiplyPow2

| Method | Size | Mean | Error | StdDev | Median | Min | Max | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Scalar | 15 | 10.068 ns | 0.0067 ns | 0.0052 ns | 10.066 ns | 10.063 ns | 10.078 ns | - |
| Vector128MultiplyPow2 | 15 | 5.770 ns | 0.0045 ns | 0.0042 ns | 5.769 ns | 5.764 ns | 5.778 ns | - |
| SveMultiplyPow2 | 15 | 5.651 ns | 0.0015 ns | 0.0012 ns | 5.651 ns | 5.649 ns | 5.653 ns | - |
| SveTail | 15 | 5.721 ns | 0.0136 ns | 0.0106 ns | 5.718 ns | 5.714 ns | 5.754 ns | - |
| Scalar | 127 | 88.694 ns | 0.0922 ns | 0.0862 ns | 88.680 ns | 88.570 ns | 88.855 ns | - |
| Vector128MultiplyPow2 | 127 | 29.605 ns | 0.0284 ns | 0.0252 ns | 29.605 ns | 29.565 ns | 29.649 ns | - |
| SveMultiplyPow2 | 127 | 40.121 ns | 0.3650 ns | 0.3414 ns | 40.308 ns | 39.560 ns | 40.363 ns | - |
| SveTail | 127 | 39.014 ns | 0.0256 ns | 0.0214 ns | 39.010 ns | 38.997 ns | 39.082 ns | - |
| Scalar | 527 | 349.414 ns | 0.1524 ns | 0.1351 ns | 349.381 ns | 349.260 ns | 349.670 ns | - |
| Vector128MultiplyPow2 | 527 | 120.288 ns | 0.0819 ns | 0.0726 ns | 120.312 ns | 120.120 ns | 120.362 ns | - |
| SveMultiplyPow2 | 527 | 168.595 ns | 0.0843 ns | 0.0747 ns | 168.618 ns | 168.467 ns | 168.723 ns | - |
| SveTail | 527 | 161.587 ns | 0.0874 ns | 0.0818 ns | 161.559 ns | 161.505 ns | 161.748 ns | - |
| Scalar | 10015 | 6,573.934 ns | 17.9883 ns | 15.0211 ns | 6,579.797 ns | 6,547.476 ns | 6,586.919 ns | - |
| Vector128MultiplyPow2 | 10015 | 2,587.071 ns | 0.9635 ns | 0.8541 ns | 2,586.873 ns | 2,586.034 ns | 2,588.692 ns | - |
| SveMultiplyPow2 | 10015 | 3,158.436 ns | 24.2145 ns | 22.6503 ns | 3,164.490 ns | 3,096.195 ns | 3,180.436 ns | - |
| SveTail | 10015 | 3,039.080 ns | 1.4834 ns | 1.2387 ns | 3,039.081 ns | 3,037.630 ns | 3,041.798 ns | - |

FP64Overflow

| Method | Size | Mean | Error | StdDev | Median | Min | Max | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Scalar | 15 | 19.399 ns | 0.0096 ns | 0.0090 ns | 19.396 ns | 19.387 ns | 19.417 ns | - |
| Vector128FP64Overflow | 15 | 8.756 ns | 0.0063 ns | 0.0056 ns | 8.755 ns | 8.748 ns | 8.766 ns | - |
| SveFP64Overflow | 15 | 8.147 ns | 0.0066 ns | 0.0062 ns | 8.145 ns | 8.139 ns | 8.159 ns | - |
| Sve2FP64Overflow | 15 | 7.619 ns | 0.0123 ns | 0.0109 ns | 7.616 ns | 7.608 ns | 7.644 ns | - |
| Scalar | 127 | 164.961 ns | 0.2280 ns | 0.2021 ns | 164.916 ns | 164.576 ns | 165.303 ns | - |
| Vector128FP64Overflow | 127 | 66.245 ns | 0.0188 ns | 0.0157 ns | 66.245 ns | 66.220 ns | 66.275 ns | - |
| SveFP64Overflow | 127 | 64.313 ns | 0.8416 ns | 0.7872 ns | 64.410 ns | 62.749 ns | 65.260 ns | - |
| Sve2FP64Overflow | 127 | 77.453 ns | 0.0143 ns | 0.0127 ns | 77.456 ns | 77.437 ns | 77.479 ns | - |
| Scalar | 527 | 663.966 ns | 0.9327 ns | 0.8724 ns | 663.922 ns | 662.648 ns | 665.514 ns | - |
| Vector128FP64Overflow | 527 | 271.640 ns | 0.0580 ns | 0.0484 ns | 271.639 ns | 271.543 ns | 271.743 ns | - |
| SveFP64Overflow | 527 | 254.105 ns | 0.0560 ns | 0.0497 ns | 254.092 ns | 254.048 ns | 254.190 ns | - |
| Sve2FP64Overflow | 527 | 327.546 ns | 0.0940 ns | 0.0785 ns | 327.538 ns | 327.438 ns | 327.724 ns | - |
| Scalar | 10015 | 12,511.545 ns | 37.5932 ns | 35.1647 ns | 12,500.521 ns | 12,474.141 ns | 12,588.912 ns | - |
| Vector128FP64Overflow | 10015 | 5,216.800 ns | 13.8224 ns | 12.9294 ns | 5,215.184 ns | 5,188.232 ns | 5,232.808 ns | - |
| SveFP64Overflow | 10015 | 5,068.319 ns | 16.6997 ns | 15.6210 ns | 5,074.020 ns | 5,042.855 ns | 5,089.636 ns | - |
| Sve2FP64Overflow | 10015 | 6,263.925 ns | 5.0281 ns | 4.4572 ns | 6,263.132 ns | 6,257.081 ns | 6,273.254 ns | - |

Exponent

| Method | Size | Mean | Error | StdDev | Median | Min | Max | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Scalar | 15 | 49.909 ns | 0.0501 ns | 0.0418 ns | 49.912 ns | 49.858 ns | 49.994 ns | - |
| Vector128Exponent | 15 | 16.389 ns | 0.0150 ns | 0.0125 ns | 16.385 ns | 16.376 ns | 16.418 ns | - |
| SveExponent | 15 | 6.889 ns | 0.0074 ns | 0.0070 ns | 6.889 ns | 6.877 ns | 6.903 ns | - |
| Scalar | 127 | 432.017 ns | 0.2920 ns | 0.2588 ns | 431.993 ns | 431.621 ns | 432.565 ns | - |
| Vector128Exponent | 127 | 73.328 ns | 0.0658 ns | 0.0549 ns | 73.319 ns | 73.248 ns | 73.465 ns | - |
| SveExponent | 127 | 55.834 ns | 0.1106 ns | 0.1035 ns | 55.779 ns | 55.687 ns | 55.970 ns | - |
| Scalar | 527 | 1,784.395 ns | 0.5001 ns | 0.4176 ns | 1,784.531 ns | 1,783.495 ns | 1,784.844 ns | - |
| Vector128Exponent | 527 | 277.828 ns | 0.1142 ns | 0.0954 ns | 277.825 ns | 277.660 ns | 277.971 ns | - |
| SveExponent | 527 | 230.463 ns | 0.5259 ns | 0.4919 ns | 230.461 ns | 229.614 ns | 231.321 ns | - |
| Scalar | 10015 | 33,839.271 ns | 26.9486 ns | 22.5033 ns | 33,833.955 ns | 33,818.360 ns | 33,897.961 ns | - |
| Vector128Exponent | 10015 | 5,141.024 ns | 6.0805 ns | 5.3902 ns | 5,142.541 ns | 5,129.059 ns | 5,146.362 ns | - |
| SveExponent | 10015 | 4,373.064 ns | 10.5089 ns | 9.3159 ns | 4,368.532 ns | 4,364.087 ns | 4,387.380 ns | - |
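
For readers who want a quick comparison, the sketch below computes the speedup of each vectorized variant relative to `Scalar`, using the Mean values reported above for Size 10015. It is not part of the PR. The dictionary layout, the shortened variant labels (`Vector128`, `Sve`, `Sve2`, `SveTail`), and the `speedups` helper are illustrative assumptions, and the figures are single-machine means with no further statistical treatment.

```python
# Mean times in ns at Size = 10015, copied from the result tables in this PR.
# The short variant labels are illustrative, not the benchmark method names.
means_ns = {
    "FastDivision": {"Scalar": 7596.185, "Vector128": 6270.509, "Sve": 6355.315},
    "MultiplyPow2": {"Scalar": 6573.934, "Vector128": 2587.071,
                     "Sve": 3158.436, "SveTail": 3039.080},
    "FP64Overflow": {"Scalar": 12511.545, "Vector128": 5216.800,
                     "Sve": 5068.319, "Sve2": 6263.925},
    "Exponent": {"Scalar": 33839.271, "Vector128": 5141.024, "Sve": 4373.064},
}


def speedups(results):
    """Return scalar_mean / variant_mean for every non-scalar variant."""
    base = results["Scalar"]
    return {name: base / t for name, t in results.items() if name != "Scalar"}


for bench, results in means_ns.items():
    line = ", ".join(f"{name}: {ratio:.2f}x"
                     for name, ratio in speedups(results).items())
    print(f"{bench}: {line}")
```

A ratio above 1 means the variant is faster than the scalar loop at that size. The same helper can be pointed at the other sizes by swapping in their Mean values.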

cc @dotnet/arm64-contrib @SwapnilGaikwad @LoopedBard3
