Description
I prototyped a portable `split_mul` implementation that uses the AVX IFMA instructions for acceleration. I believe I optimized it decently, the code is well-documented, the benchmark is easy to run, and the README explains the challenges moving forward. Unfortunately, despite observing a 20-40% improvement in the benchmark when compiled exclusively for the native CPU, I was unable to see any real-world performance benefit when I patched my `crypto-bigint` dependency to a fork which used this `split_mul` implementation where reasonable.
https://github.com/kayabaNerve/avx-ifma-mul
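For readers unfamiliar with IFMA, here is a minimal scalar sketch of the arithmetic involved. It is not the code from the repository. The helper names (`madd52lo`, `madd52hi`, `split_mul_u64`) are hypothetical, and it assumes `split_mul` means a widening multiplication that returns the low and high halves of the product. The two `madd52*` helpers model a single 64-bit lane of the `vpmadd52luq` and `vpmadd52huq` instructions, which multiply the low 52 bits of each operand and add the low or high 52 bits of the 104-bit product to an accumulator.

```rust
/// Mask selecting the low 52 bits, the operand width IFMA multiplies.
const MASK52: u64 = (1u64 << 52) - 1;

/// Scalar model of one lane of `vpmadd52luq` (hypothetical helper name):
/// adds the low 52 bits of the 104-bit product to the accumulator.
fn madd52lo(acc: u64, a: u64, b: u64) -> u64 {
    let p = ((a & MASK52) as u128) * ((b & MASK52) as u128);
    acc.wrapping_add((p as u64) & MASK52)
}

/// Scalar model of one lane of `vpmadd52huq` (hypothetical helper name):
/// adds the high 52 bits of the 104-bit product to the accumulator.
fn madd52hi(acc: u64, a: u64, b: u64) -> u64 {
    let p = ((a & MASK52) as u128) * ((b & MASK52) as u128);
    acc.wrapping_add((p >> 52) as u64)
}

/// Assumed shape of a single-word `split_mul`: the full 128-bit product
/// of two 64-bit words, returned as (low half, high half).
fn split_mul_u64(a: u64, b: u64) -> (u64, u64) {
    let p = (a as u128) * (b as u128);
    (p as u64, (p >> 64) as u64)
}
```

Because IFMA works on 52-bit limbs while `crypto-bigint` stores 64-bit (or 32-bit) limbs, a vectorized implementation has to convert between the two radices around the multiplication, which is one plausible place for a benchmark gain to be lost in real-world use.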
I'd love to see further discussion and potential upstreaming (again, as explained in the README). I would have filed this purely as an issue had I not wanted to provide the implementation and benchmark as an artifact (hence the repository).