
AVX IFMA Accelerated split_mul #885

@kayabaNerve


I prototyped a portable split_mul implementation that uses the AVX IFMA instructions for acceleration. I believe I optimized it decently, the code is well documented, the benchmark is easy to run, and the README explains the challenges moving forward. Unfortunately, although I observed a 20-40% improvement when compiling exclusively for the native CPU, I saw no real-world performance benefit after patching my crypto-bigint dependency to a fork that used this split_mul implementation where reasonable.

https://github.com/kayabaNerve/avx-ifma-mul
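For readers unfamiliar with IFMA, here is a minimal scalar sketch of the arithmetic involved. It is not the code from the repository. The IFMA instructions (`vpmadd52luq` / `vpmadd52huq`) multiply 52-bit unsigned integers and add the low or high 52 bits of the 104-bit product to a 64-bit accumulator, so operands have to be re-expressed in radix 2^52 first. The helper names below (`madd52lo`, `madd52hi`, `to_limbs52`, `split_mul_u128`) are hypothetical. The sketch also assumes that split_mul means a widening multiply returning `(lo, hi)`, and it models one lane at a time where the real instructions process several lanes at once.

```rust
/// Mask selecting the low 52 bits of a word.
const MASK52: u64 = (1u64 << 52) - 1;

/// Scalar model of one lane of `vpmadd52luq` (hypothetical helper):
/// acc + low 52 bits of (a * b), using only the low 52 bits of a and b.
fn madd52lo(acc: u64, a: u64, b: u64) -> u64 {
    let p = u128::from(a & MASK52) * u128::from(b & MASK52);
    acc.wrapping_add((p as u64) & MASK52)
}

/// Scalar model of one lane of `vpmadd52huq` (hypothetical helper):
/// acc + bits 52..104 of (a * b).
fn madd52hi(acc: u64, a: u64, b: u64) -> u64 {
    let p = u128::from(a & MASK52) * u128::from(b & MASK52);
    acc.wrapping_add((p >> 52) as u64)
}

/// Split a 128-bit value into three radix-2^52 limbs (little-endian).
fn to_limbs52(x: u128) -> [u64; 3] {
    let m = u128::from(MASK52);
    [(x & m) as u64, ((x >> 52) & m) as u64, (x >> 104) as u64]
}

/// Widening 128 x 128 -> 256-bit multiply returning (lo, hi),
/// computed with schoolbook multiplication over 52-bit limbs.
fn split_mul_u128(a: u128, b: u128) -> (u128, u128) {
    let (a, b) = (to_limbs52(a), to_limbs52(b));
    let mut out = [0u64; 6];

    // Each partial product contributes its low half to column i + j and
    // its high half to column i + j + 1. A column receives at most six
    // addends below 2^52, so the 64-bit accumulators cannot overflow.
    for i in 0..3 {
        for j in 0..3 {
            out[i + j] = madd52lo(out[i + j], a[i], b[j]);
            out[i + j + 1] = madd52hi(out[i + j + 1], a[i], b[j]);
        }
    }

    // Carry propagation: reduce every column back below 2^52.
    for k in 0..5 {
        let carry = out[k] >> 52;
        out[k] &= MASK52;
        out[k + 1] += carry;
    }

    // Repack the radix-2^52 limbs into two 128-bit halves.
    // Limb k starts at bit 52 * k; the boundary at bit 128 falls inside limb 2.
    let lo = u128::from(out[0])
        | (u128::from(out[1]) << 52)
        | (u128::from(out[2]) << 104);
    let hi = u128::from(out[2] >> 24)
        | (u128::from(out[3]) << 28)
        | (u128::from(out[4]) << 80);
    (lo, hi)
}
```

The conversion into and out of radix 2^52 and the separate carry pass are overhead that a plain 64-bit limb multiplication does not pay, so any gain has to come from the vectorized multiply-accumulate step itself.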

I'd love to see further discussion and potential upstreaming (again, as explained in the README). I would have filed this purely as an issue had I not wanted to provide the implementation and benchmark as an artifact (hence the repository).
