Skip to content

Binary Exponentiation by Factoring

Consider a problem of computing axy(mod2d), given integers a, x, y and d3, where x is odd.

The algorithm below allows to solve this problem with O(d) additions and binary operations and a single multiplication by y.

Due to the structure of the multiplicative group modulo 2d, any number x such that x1(mod4) can be represented as

xbL(x)(mod2d),

where b5(mod8). Without loss of generality we assume that x1(mod4), as we can reduce x3(mod4) to x1(mod4) by substituting xx and a(1)ya. In this notion, axy is represented as

axyabyL(x)(mod2d).

The core idea of the algorithm is to simplify the computation of L(x) and byL(x) using the fact that we're working modulo 2d. For reasons that will be apparent later on, we'll be working with 4L(x) rather than L(x), but taken modulo 2d instead of 2d2.

In this article, we will cover the implementation for 32-bit integers. Let

  • mbin_log_32(r, x) be a function that computes r+4L(x)(mod2d);
  • mbin_exp_32(r, x) be a function that computes rbx4(mod2d);
  • mbin_power_odd_32(a, x, y) be a function that computes axy(mod2d).

Then mbin_power_odd_32 is implemented as follows:

uint32_t mbin_power_odd_32(uint32_t rem, uint32_t base, uint32_t exp) {
    if (base & 2) {
        /* divider is considered negative */
        base = -base;
        /* check if result should be negative */
        if (exp & 1) {
            rem = -rem;
        }
    }
    return (mbin_exp_32(rem, mbin_log_32(0, base) * exp));
}

Computing 4L(x) from x

Let x be an odd number such that x1(mod4). It can be represented as

x(2a1+1)(2ak+1)(mod2d),

where 1<a1<<ak<d. Here L() is well-defined for each multiplier, as they're equal to 1 modulo 4. Hence,

4L(x)4L(2a1+1)++4L(2ak+1)(mod2d).

So, if we precompute tk=4L(2n+1) for all 1<k<d, we will be able to compute 4L(x) for any number x.

For 32-bit integers, we can use the following table:

const uint32_t mbin_log_32_table[32] = {
    0x00000000, 0x00000000, 0xd3cfd984, 0x9ee62e18,
    0xe83d9070, 0xb59e81e0, 0xa17407c0, 0xce601f80,
    0xf4807f00, 0xe701fe00, 0xbe07fc00, 0xfc1ff800,
    0xf87ff000, 0xf1ffe000, 0xe7ffc000, 0xdfff8000,
    0xffff0000, 0xfffe0000, 0xfffc0000, 0xfff80000,
    0xfff00000, 0xffe00000, 0xffc00000, 0xff800000,
    0xff000000, 0xfe000000, 0xfc000000, 0xf8000000,
    0xf0000000, 0xe0000000, 0xc0000000, 0x80000000,
};

On practice, a slightly different approach is used than described above. Rather than finding the factorization for x, we will consequently multiply x with 2n+1 until we turn it into 1 modulo 2d. In this way, we will find the representation of x1, that is

x(2a1+1)(2ak+1)1(mod2d).

To do this, we iterate over n such that 1<n<d. If the current x has n-th bit set, we multiply x with 2n+1, which is conveniently done in C++ as x = x + (x << n). This won't change bits lower than n, but will turn the n-th bit to zero, because x is odd.

With all this in mind, the function mbin_log_32(r, x) is implemented as follows:

uint32_t mbin_log_32(uint32_t r, uint32_t x) {
    uint8_t n;

    for (n = 2; n < 32; n++) {
        if (x & (1 << n)) {
            x = x + (x << n);
            r -= mbin_log_32_table[n];
        }
    }

    return r;
}

Note that 4L(x)=4L(x1), so instead of adding 4L(2n+1), we subtract it from r, which initially equates to 0.

Computing x from 4L(x)

Note that for k1 it holds that

(a2k+1)2=a222k+a2k+1+1=b2k+1+1,

from which (by repeated squaring) we can deduce that

(2a+1)2b1(mod2a+b).

Applying this result to a=2n+1 and b=dk we deduce that the multiplicative order of 2n+1 is a divisor of 2dn.

This, in turn, means that L(2n+1) must be divisible by 2n, as the order of b is 2d2 and the order of by is 2d2v, where 2v is the highest power of 2 that divides y, so we need

2dk0(mod2d2v),

thus v must be greater or equal than k2. This is a bit ugly and to mitigate this we said in the beginning that we multiply L(x) by 4. Now if we know 4L(x), we can uniquely decomposing it into a sum of 4L(2n+1) by consequentially checking bits in 4L(x). If the n-th bit is set to 1, we will multiply the result with 2n+1 and reduce the current 4L(x) by 4L(2n+1).

Thus, mbin_exp_32 is implemented as follows:

uint32_t mbin_exp_32(uint32_t r, uint32_t x) {
    uint8_t n;

    for (n = 2; n < 32; n++) {
        if (x & (1 << n)) {
            r = r + (r << n);
            x -= mbin_log_32_table[n];
        }
    }

    return r;
}

Further optimizations

It is possible to halve the number of iterations if you note that 4L(2d1+1)=2d1 and that for 2kd it holds that

(2n+1)222n+2n+1+12n+1+1(mod2d),

which allows to deduce that 4L(2n+1)=2n for 2nd. So, you could simplify the algorithm by only going up to d2 and then use the fact above to compute the remaining part with bitwise operations:

uint32_t mbin_log_32(uint32_t r, uint32_t x) {
    uint8_t n;

    for (n = 2; n != 16; n++) {
        if (x & (1 << n)) {
            x = x + (x << n);
            r -= mbin_log_32_table[n];
        }
    }

    r -= (x & 0xFFFF0000);

    return r;
}

uint32_t mbin_exp_32(uint32_t r, uint32_t x) {
    uint8_t n;

    for (n = 2; n != 16; n++) {
        if (x & (1 << n)) {
            r = r + (r << n);
            x -= mbin_log_32_table[n];
        }
    }

    r *= 1 - (x & 0xFFFF0000);

    return r;
}

Computing logarithm table

To compute log-table, one could modify the Pohlig–Hellman algorithm for the case when modulo is a power of 2.

Our main task here is to compute x such that gxy(mod2d), where g=5 and y is a number of kind 2n+1.

Squaring both parts k times we arrive to

g2kxy2k(mod2d).

Note that the order of g is not greater than 2d (in fact, than 2d2, but we will stick to 2d for convenience), hence using k=d1 we will have either g1 or g0 on the left hand side which allows us to determine the smallest bit of x by comparing y2k to g. Now assume that x=x0+2kx1, where x0 is a known part and x1 is not yet known. Then

gx0+2kx1y(mod2d).

Multiplying both parts with gx0, we get

g2kx1(gx0y)(mod2d).

Now, squaring both sides dk1 times we can obtain the next bit of x, eventually recovering all its bits.

References