Extended Euclidean Algorithm - Algorithmica
Extended Euclidean Algorithm

Extended Euclidean Algorithm

Fermat’s theorem allows us to calculate modular multiplicative inverses through binary exponentiation in O(logn)O(\log n) operations, but it only works with prime modula. There is a generalization of it, Euler’s theorem, stating that if mm and aa are coprime, then aϕ(m)1(modm) a^{\phi(m)} \equiv 1 \pmod m

where ϕ(m)\phi(m) is Euler’s totient function defined as the number of positive integers x<mx < m that are coprime with mm. In the special case when mm is a prime, then all the m1m - 1 residues are coprime and ϕ(m)=m1\phi(m) = m - 1, yielding the Fermat’s theorem.

This lets us calculate the inverse of aa as aϕ(m)1a^{\phi(m) - 1} if we know ϕ(m)\phi(m), but in turn, calculating it is not so fast: you usually need to obtain the factorization of mm to do it. There is a more general method that works by modifying the the Euclidean algorthm.

#Algorithm

Extended Euclidean algorithm, apart from finding g=gcd(a,b)g = \gcd(a, b), also finds integers xx and yy such that

ax+by=g a \cdot x + b \cdot y = g which solves the problem of finding modular inverse if we substitute bb with mm and gg with 11: a1a+km=1 a^{-1} \cdot a + k \cdot m = 1

Note that, if aa is not coprime with mm, there is no solution since no integer combination of aa and mm can yield anything that is not a multiple of their greatest common divisor.

The algorithm is also recursive: it calculates the coefficients xx’ and yy’ for gcd(b,amodb)\gcd(b, a \bmod b) and restores the solution for the original number pair. If we have a solution (x,y)(x’, y’) for the pair (b,amodb)(b, a \bmod b)

bx+(amodb)y=g b \cdot x' + (a \bmod b) \cdot y' = g then, to get the solution for the initial input, we can rewrite the expression (amodb)(a \bmod b) as (aabb)(a - \lfloor \frac{a}{b} \rfloor \cdot b) and subsitute it into the aforementioned equation: bx+(aabb)y=g b \cdot x' + (a - \Big \lfloor \frac{a}{b} \Big \rfloor \cdot b) \cdot y' = g Now we rearrange the terms grouping by aa and bb to get ayx+b(xaby)y=g a \cdot \underbrace{y'}_x + b \cdot \underbrace{(x' - \Big \lfloor \frac{a}{b} \Big \rfloor \cdot y')}_y = g

Comparing it with the initial expression, we infer that we can just use coefficients of aa and bb for the initial xx and yy.

#Implementation

We implement the algorithm as a recursive function. Since its output is not one but three integers, we pass the coefficients to it by reference:

int gcd(int a, int b, int &x, int &y) {
    if (a == 0) {
        x = 0;
        y = 1;
        return b;
    }
    int x1, y1;
    int d = gcd(b % a, a, x1, y1);
    x = y1 - (b / a) * x1;
    y = x1;
    return d;
}

To calculate the inverse, we simply pass aa and mm and return the xx coefficient the algorithm finds. Since we pass two positive numbers, one of the coefficient will be positive and the other one is negative (which one depends on whether the number of iterations is odd or even), so we need to optionally check if xx is negative and add mm to get a correct residue:

int inverse(int a) {
    int x, y;
    gcd(a, M, x, y);
    if (x < 0)
        x += M;
    return x;
}

It works in ~160ns — 10ns faster than inverting numbers with binary exponentiation. To optimize it further, we can similarly turn it iterative ­— which takes 135ns:

int inverse(int a) {
    int b = M, x = 1, y = 0;
    while (a != 1) {
        y -= b / a * x;
        b %= a;
        swap(a, b);
        swap(x, y);
    }
    return x < 0 ? x + M : x;
}

Note that, unlike binary exponentiation, the running time depends on the value of aa. For example, for this particular value of mm (109+710^9 + 7), the worst input happens to be 564400443, for which the algorithm performs 37 iterations and takes 250ns.

Exercise. Try to adapt the same technique for the binary GCD (it won’t give performance speedup though unless you are better than me at optimization).