Hi Emil,
first of all, thanks a lot for releasing these nice UMAAL-based implementations of the P256 base field under a permissive license! I'm building a Rust implementation of P256 ECDH/ECDSA around them, using your speed-optimized P256_{add,sub,mul,sqr}mod routines as the computational core.
I now find myself in the strange situation where the entire ephemeral (public) point calculation in ECDSA is twice as fast as inverting the ephemeral scalar k, which I do with Euler's theorem and my own Barrett reduction-based Rust implementation 🤦‍♂️. Did you ever put thought into speeding up the scalar field arithmetic, or do you know of an existing UMAAL-based implementation for it? I think any "N256_mulmod" assembly routine would give a major speed bump, even if not completely optimized. My lack of assembly skills is currently preventing me from adapting your P256_mulmod to n = 0xffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551.
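For reference, here is a minimal sketch of the inversion I mean: k^(n-2) mod n by square-and-multiply, which equals k^(-1) mod n because n is prime. It is not my actual Barrett-based code. The names `add_mod_n`, `mul_mod_n` and `inv_mod_n` are made up for illustration, and `mul_mod_n` is a deliberately naive double-and-add that only marks the spot where an "N256_mulmod" routine would go. It is variable-time, so it is only suitable as a test oracle, not for handling a real secret k.

```rust
/// Order n of the P256 base point, as four little-endian 64-bit limbs.
const N: [u64; 4] = [
    0xf3b9cac2fc632551,
    0xbce6faada7179e84,
    0xffffffffffffffff,
    0xffffffff00000000,
];

/// Returns true if a >= b. Variable-time comparison, fine for a test oracle only.
fn geq(a: &[u64; 4], b: &[u64; 4]) -> bool {
    for i in (0..4).rev() {
        if a[i] != b[i] {
            return a[i] > b[i];
        }
    }
    true
}

/// (a + b) mod n, assuming a, b < n.
fn add_mod_n(a: &[u64; 4], b: &[u64; 4]) -> [u64; 4] {
    let mut sum = [0u64; 4];
    let mut carry = false;
    for i in 0..4 {
        let (s, c1) = a[i].overflowing_add(b[i]);
        let (s, c2) = s.overflowing_add(carry as u64);
        sum[i] = s;
        carry = c1 || c2;
    }
    // a + b < 2n, so at most one subtraction of n is needed.
    if carry || geq(&sum, &N) {
        let mut borrow = false;
        for i in 0..4 {
            let (d, b1) = sum[i].overflowing_sub(N[i]);
            let (d, b2) = d.overflowing_sub(borrow as u64);
            sum[i] = d;
            borrow = b1 || b2;
        }
    }
    sum
}

/// (a * b) mod n by double-and-add, assuming a, b < n.
/// Hypothetical placeholder for a fast "N256_mulmod" routine.
fn mul_mod_n(a: &[u64; 4], b: &[u64; 4]) -> [u64; 4] {
    let mut acc = [0u64; 4];
    for bit in (0..256).rev() {
        acc = add_mod_n(&acc, &acc);
        if (b[bit / 64] >> (bit % 64)) & 1 == 1 {
            acc = add_mod_n(&acc, a);
        }
    }
    acc
}

/// k^(n-2) mod n, i.e. the inverse of k modulo n for 0 < k < n.
fn inv_mod_n(k: &[u64; 4]) -> [u64; 4] {
    // The exponent n - 2 is public; only the lowest limb differs from n.
    let mut e = N;
    e[0] -= 2;
    let mut acc = [1u64, 0, 0, 0];
    for bit in (0..256).rev() {
        acc = mul_mod_n(&acc, &acc);
        if (e[bit / 64] >> (bit % 64)) & 1 == 1 {
            acc = mul_mod_n(&acc, k);
        }
    }
    acc
}
```

The inversion costs 256 squarings plus one multiplication per set bit of n - 2, so its run time is dominated almost entirely by the modular multiplication, which is why a fast mulmod for n would help so much. A quick sanity check is that `mul_mod_n(&k, &inv_mod_n(&k))` gives `[1, 0, 0, 0]` for any nonzero k < n.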