Skip to content

Integer divexact with precomputed inverse #2375

@fredrik-johansson

Description

@fredrik-johansson

In #2371 fmpz_vec there is code for nx1-limb divexact with precomputed inverse which could be moved to ulong_extras and mpn_extras.

For multilimb divexact, one could be lazy and use flint_mpn_divrem_preinvn and it turns out this beats mpn_divexact above roughly 1000 limbs. Even better is to do a single flint_mpn_mulhigh_n or flint_mpn_mullow_n and exploit the fact that the quotient is known to be exact. Better yet is the bidirectional algorithm: compute the high n/2 limbs using mulhigh and the low n/2 limbs using mullow (one might need n/2+1 limbs to glue things together correctly, not sure).

Quick benchmark, timings relative to mpn_divexact (lower is better):

         n     divrem_preinvn   mullow   bidirectional
         2     1.562            0.102    0.182
         4     1.469            0.151    0.151
         8     1.281            0.307    0.191
        16     1.747            0.618    0.285
        32     1.943            0.824    0.429
        64     1.867            0.871    0.483
       128     1.483            0.716    0.446
       256     1.350            0.663    0.421
       512     1.256            0.612    0.404
      1024     0.761            0.376    0.389
      2048     0.634            0.309    0.283
      4096     0.528            0.252    0.243
      8192     0.450            0.217    0.210
     16384     0.410            0.202    0.193
     32768     0.384            0.190    0.185
     65536     0.339            0.167    0.161
    131072     0.350            0.171    0.161
    262144     0.298            0.145    0.134
    524288     0.268            0.132    0.126
   1048576     0.276            0.130    0.118
   2097152     0.282            0.139    0.118
   4194304     0.267            0.130    0.119
   8388608     0.273            0.135    0.124
  16777216     0.287            0.144    0.127

Of course, that is just the basic idea; there will be some more overhead for adjustments, and it gets a bit more tricky to do all unbalanced cases optimally.

Once some divexact_preinv algorithms are implemented, one could also implement a divexact which computes the inverse on the fly (with different tuning since this will also have to account for the cost of computing different inverses).

Benchmark code:

#include <gmp.h>
#include "mpn_extras.h"
#include "profiler.h"

void __gmpn_divexact(mp_ptr, mp_srcptr, mp_size_t, mp_srcptr, mp_size_t);

int main()
{
    nn_ptr C, D, A, B, AB, Binv;
    slong n;
    double __, tgmp, tinv, tpreinvn, tmullow, tbidirectional;

    flint_rand_t state;
    flint_rand_init(state);

    flint_printf("         n     divrem_preinvn   mullow   bidirectional\n");

    for (n = 2; ; n *= 2)
    {
        A = flint_malloc(n * sizeof(mp_limb_t));
        B = flint_malloc(n * sizeof(mp_limb_t));
        AB = flint_malloc(2 * n * sizeof(mp_limb_t));
        C = flint_malloc(2 * n * sizeof(mp_limb_t));
        D = flint_malloc(2 * n * sizeof(mp_limb_t));
        Binv = flint_malloc(n * sizeof(mp_limb_t));

        flint_mpn_urandomb(A, state, n * FLINT_BITS);
        flint_mpn_urandomb(B, state, n * FLINT_BITS);

        A[n - 1] |= UWORD(3) << (FLINT_BITS - 2);
        B[n - 1] |= UWORD(3) << (FLINT_BITS - 2);
        A[0] |= 1;
        B[0] |= 1;

        flint_mpn_mul_n(AB, A, B, n);

        TIMEIT_START
        __gmpn_divexact(C, AB, 2 * n, B, n);
        TIMEIT_STOP_VALUES(__, tgmp)

        flint_mpn_preinvn(Binv, B, n);
        TIMEIT_START
        flint_mpn_divrem_preinvn(C, D, AB, 2 * n, B, n, Binv);
        TIMEIT_STOP_VALUES(__, tpreinvn)

        TIMEIT_START
        flint_mpn_mullow_n(C, AB, B, n);
        TIMEIT_STOP_VALUES(__, tmullow)

        TIMEIT_START
        flint_mpn_mullow_n(C, AB, B, n / 2);
        flint_mpn_mulhigh_n(C, AB, B + n / 2, n / 2);
        TIMEIT_STOP_VALUES(__, tbidirectional)

        flint_printf("%10wd     %.3f            %.3f    %.3f\n", n, tpreinvn / tgmp, tmullow / tgmp, tbidirectional / tgmp);

        flint_free(A);
        flint_free(B);
        flint_free(AB);
        flint_free(C);
        flint_free(D);
        flint_free(Binv);
    }
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions