Montgomery Multiplication¶
Many algorithms in number theory, like prime testing or integer factorization, and in cryptography, like RSA, require lots of operations modulo a large number.
A multiplications like
The Montgomery (modular) multiplication is a method that allows computing such multiplications faster.
Instead of dividing the product and subtracting
Montgomery representation¶
However the Montgomery multiplication doesn't come for free. The algorithm works only in the Montgomery space. And we need to transform our numbers into that space, before we can start multiplying.
For the space we need a positive integer
The representative
Notice, the transformation is actually such a multiplication that we want to optimize.
So this is still an expensive operation.
However you only need to transform a number once into the space.
As soon as you are in the Montgomery space, you can perform as many operations as you want efficiently.
And at the end you transform the final result back.
So as long as you are doing lots of operations modulo
Inside the Montgomery space you can still perform most operations as usual.
You can add two elements (
However this is not the case for multiplication.
We expect the result to be:
But the normal multiplication will give us:
Therefore the multiplication in the Montgomery space is defined as:
Montgomery reduction¶
The multiplication of two numbers in the Montgomery space requires an efficient computation of
Because
Both
Using this identity we can write
The equivalences hold for any arbitrary integer
This gives us the following algorithm to compute
function reduce(x):
q = (x mod r) * n' mod r
a = (x - q * n) / r
if a < 0:
a += n
return a
Since
As we see, we can perform the Montgomery reduction without any heavy modulo operations.
If we choose
A second application of the Montgomery reduction is to transfer a number back from the Montgomery space into the normal space.
Fast inverse trick¶
For computing the inverse
This can easily be proven.
If we have
This means we can start with
Implementation¶
Using the GCC compiler we can compute __int128
and __uint128
.
long long result = (__int128)x * y % n;
However there is no type for 256 bit integer. Therefore we will here show an implementation for a 128 bit multiplication.
using u64 = uint64_t;
using u128 = __uint128_t;
using i128 = __int128_t;
struct u256 {
u128 high, low;
static u256 mult(u128 x, u128 y) {
u64 a = x >> 64, b = x;
u64 c = y >> 64, d = y;
// (a*2^64 + b) * (c*2^64 + d) =
// (a*c) * 2^128 + (a*d + b*c)*2^64 + (b*d)
u128 ac = (u128)a * c;
u128 ad = (u128)a * d;
u128 bc = (u128)b * c;
u128 bd = (u128)b * d;
u128 carry = (u128)(u64)ad + (u128)(u64)bc + (bd >> 64u);
u128 high = ac + (ad >> 64u) + (bc >> 64u) + (carry >> 64u);
u128 low = (ad << 64u) + (bc << 64u) + bd;
return {high, low};
}
};
struct Montgomery {
Montgomery(u128 n) : mod(n), inv(1) {
for (int i = 0; i < 7; i++)
inv *= 2 - n * inv;
}
u128 init(u128 x) {
x %= mod;
for (int i = 0; i < 128; i++) {
x <<= 1;
if (x >= mod)
x -= mod;
}
return x;
}
u128 reduce(u256 x) {
u128 q = x.low * inv;
i128 a = x.high - u256::mult(q, mod).high;
if (a < 0)
a += mod;
return a;
}
u128 mult(u128 a, u128 b) {
return reduce(u256::mult(a, b));
}
u128 mod, inv;
};
Fast transformation¶
The current method of transforming a number into Montgomery space is pretty slow. There are faster ways.
You can notice the following relation:
Transforming a number into the space is just a multiplication inside the space of the number with
In the following code we initialize r2
with -n % n
, which is equivalent to
struct Montgomery {
Montgomery(u128 n) : mod(n), inv(1), r2(-n % n) {
for (int i = 0; i < 7; i++)
inv *= 2 - n * inv;
for (int i = 0; i < 4; i++) {
r2 <<= 1;
if (r2 >= mod)
r2 -= mod;
}
for (int i = 0; i < 5; i++)
r2 = mul(r2, r2);
}
u128 init(u128 x) {
return mult(x, r2);
}
u128 mod, inv, r2;
};