Integer divexact with precomputed inverse

In #2371 ``fmpz_vec`` there is code for nx1-limb divexact with precomputed inverse which could be moved to ``ulong_extras`` and ``mpn_extras``.

For multilimb divexact, one could be lazy and use ``flint_mpn_divrem_preinvn`` and it turns out this beats ``mpn_divexact`` above roughly 1000 limbs. Even better is to do a single ``flint_mpn_mulhigh_n`` or ``flint_mpn_mullow_n`` and exploit the fact that the quotient is known to be exact. Better yet is the bidirectional algorithm: compute the high ``n/2`` limbs using mulhigh and the low ``n/2`` limbs using mullow (one might need ``n/2+1`` limbs to glue things together correctly, not sure).

Quick benchmark, timings relative to ``mpn_divexact`` (lower is better):

```
         n     divrem_preinvn   mullow   bidirectional
         2     1.562            0.102    0.182
         4     1.469            0.151    0.151
         8     1.281            0.307    0.191
        16     1.747            0.618    0.285
        32     1.943            0.824    0.429
        64     1.867            0.871    0.483
       128     1.483            0.716    0.446
       256     1.350            0.663    0.421
       512     1.256            0.612    0.404
      1024     0.761            0.376    0.389
      2048     0.634            0.309    0.283
      4096     0.528            0.252    0.243
      8192     0.450            0.217    0.210
     16384     0.410            0.202    0.193
     32768     0.384            0.190    0.185
     65536     0.339            0.167    0.161
    131072     0.350            0.171    0.161
    262144     0.298            0.145    0.134
    524288     0.268            0.132    0.126
   1048576     0.276            0.130    0.118
   2097152     0.282            0.139    0.118
   4194304     0.267            0.130    0.119
   8388608     0.273            0.135    0.124
  16777216     0.287            0.144    0.127
```

Of course, that is just the basic idea; there will be some more overhead for adjustments, and it gets a bit more tricky to do all unbalanced cases optimally.

Once some ``divexact_preinv`` algorithms are implemented, one could also implement a ``divexact`` which computes the inverse on the fly (with different tuning since this will also have to account for the cost of computing different inverses).

Benchmark code:

```
#include <gmp.h>
#include "mpn_extras.h"
#include "profiler.h"

void __gmpn_divexact(mp_ptr, mp_srcptr, mp_size_t, mp_srcptr, mp_size_t);

int main()
{
    nn_ptr C, D, A, B, AB, Binv;
    slong n;
    double __, tgmp, tinv, tpreinvn, tmullow, tbidirectional;

    flint_rand_t state;
    flint_rand_init(state);

    flint_printf("         n     divrem_preinvn   mullow   bidirectional\n");

    for (n = 2; ; n *= 2)
    {
        A = flint_malloc(n * sizeof(mp_limb_t));
        B = flint_malloc(n * sizeof(mp_limb_t));
        AB = flint_malloc(2 * n * sizeof(mp_limb_t));
        C = flint_malloc(2 * n * sizeof(mp_limb_t));
        D = flint_malloc(2 * n * sizeof(mp_limb_t));
        Binv = flint_malloc(n * sizeof(mp_limb_t));

        flint_mpn_urandomb(A, state, n * FLINT_BITS);
        flint_mpn_urandomb(B, state, n * FLINT_BITS);

        A[n - 1] |= UWORD(3) << (FLINT_BITS - 2);
        B[n - 1] |= UWORD(3) << (FLINT_BITS - 2);
        A[0] |= 1;
        B[0] |= 1;

        flint_mpn_mul_n(AB, A, B, n);

        TIMEIT_START
        __gmpn_divexact(C, AB, 2 * n, B, n);
        TIMEIT_STOP_VALUES(__, tgmp)

        flint_mpn_preinvn(Binv, B, n);
        TIMEIT_START
        flint_mpn_divrem_preinvn(C, D, AB, 2 * n, B, n, Binv);
        TIMEIT_STOP_VALUES(__, tpreinvn)

        TIMEIT_START
        flint_mpn_mullow_n(C, AB, B, n);
        TIMEIT_STOP_VALUES(__, tmullow)

        TIMEIT_START
        flint_mpn_mullow_n(C, AB, B, n / 2);
        flint_mpn_mulhigh_n(C, AB, B + n / 2, n / 2);
        TIMEIT_STOP_VALUES(__, tbidirectional)

        flint_printf("%10wd     %.3f            %.3f    %.3f\n", n, tpreinvn / tgmp, tmullow / tgmp, tbidirectional / tgmp);

        flint_free(A);
        flint_free(B);
        flint_free(AB);
        flint_free(C);
        flint_free(D);
        flint_free(Binv);
    }
}
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Integer divexact with precomputed inverse #2375

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Integer divexact with precomputed inverse #2375

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions