I’ve talked a few times about ElGamal encryption, where a participant *A* has a private key *a* and a public key *g*^{a} mod *p* for some public prime *p* and generator *g*.

Now let’s suppose we’re an attacker and we’re after *A*‘s private key. We know her public key *g*^{a}, we know *g*, and we know *p*. Can we figure out *a* = log_{g} *g*^{a} mod *p*? This is called the “discrete logarithm problem.”

## Brute force

If *p* is extremely small or we are extremely patient we can do it very easily. Start with 1 = *g*^{0}. Multiply by *g* to get *g*. Multiply again to get *g*^{2} mod *p*. Multiply a third time to get *g*^{3} mod *p*. Keep going until you get *g*^{a}. Then *a* is just the number of times you had to multiply.

For example, consider implementing ElGamal encrypt, decrypt, sign, and verify signature. In this toy example *p* = 101, *g* = 2, *A*‘s private key was 57, and her public key was 2^{57} mod 101 = 74. Since we (as the attacker) know *g* and *p* we can just calculate, doubling as we go, and subtracting 101 when we can:

- 2
^{0} mod 101 = 1 - 2
^{1} mod 101 = 2 - 2
^{2} mod 101 = 4 - …
- 2
^{6} mod 101 = 64 - 2
^{7} mod 101 = 27 - 2
^{8} mod 101 = 54 - …
- 2
^{56} mod 101 = 37 - 2
^{57} mod 101 = 74

So log_{2} 74 = 57 mod 101 and now we know *A*‘s private key.

This approach takes O(*p*) time so it quickly becomes impractical as *p* gets big.

## Baby-step giant-step

I blogged earlier about a speed improvement: David Shanks’ baby-step/giant-step algorithm. The basic idea is to precompute *g*^{i} mod *p* only up to *s* = ⌈√*p*⌉ and then store them in a lookup table. Then use a little algebra to query the lookup table up to *s* times. This takes O(√*p*) time and space so it can let us squeeze out a little more juice but as √*p* gets big too this also becomes impractical.

## Pohlig-Hellman

Stephen Pohlig and Martin Hellman (yes, *that* Hellman) published their algorithm for taking discrete logarithms which works by analyzing the group *Z*_{p}^{x}.

The group has *p* – 1 elements {1, 2, …, *p* – 1}. (Note that 0 is not included.) This means that the order of the subgroup generated by any given element divides *p* – 1.

Factor *p* – 1 as a product of powers of primes: *p* – 1 = *p*_{1}^{k1} *p*_{2}^{k2} … *p*_{n}^{kn}.

To calculate *y* = log_{g} *x* mod *p*:

- Calculate
*y* mod *p*_{1}^{k1} (more on this later) - Calculate
*y* mod *p*_{2}^{k2} - …
- Calculate
*y* mod *p*_{n}^{kn} - Use the Chinese Remainder Theorem to reassemble
*y* from its various modular residues.

That’s all well and good, but how do we calculate *y* mod *p*_{i}^{ki}?

Let *q* = *p*_{i} and let *k* = *k*_{i}. The trick to the algorithm is to realize that *y* mod *q*^{k} can be written in base *q* as *y*_{0} + *y*_{1} *q* + *y*_{2} *q*^{2} + … + *y*_{k – 1} *q*^{k – 1}. Each of the *y*_{i} is a number in {0, 1, …, *q* – 1} and can be evaluated in O(√*q*) time and space using a little algebra, e.g. the fact that *g*^{(p – 1)/q} is an element of order *q*.

So even if *p* is big, **if ***p* – 1 has all small prime factors, private keys can be extracted from public keys very quickly!

I wrote up an implementation of Pohlig-Hellman in PowerShell and tested it out.

First I found a (*p* – 1, *p*) pair where *p* was reasonably big and *p* – 1 had all small prime factors:

> cd pohlig-hellman
> .\find-unsafe-prime.ps1 -n 10000000
Sieving primes up to 10000000 + 1...
Finding largest small prime and the product of small primes from 100000000000000 to 100000020000000
Finding the prime between 100000000000000 and 100000020000000 with the smoothest p - 1
Prime 100000008359681 is one more than a number whose biggest prime factor is 53
> $p = [bigint]::Parse("100000008359681")
> $p
100000008359681

100000008359681 is prime, and 100000008359680 = 2^{8} 5 11^{2} 13 19 31 37 43 53 has only small prime factors.

Then I found a generator:

> .\find-generator.ps1 -p 100000008359681
2^((100000008359681 - 1)/2) mod 100000008359681 = 1, rejecting 2
3^((100000008359681 - 1)/5) mod 100000008359681 = 1, rejecting 3
4^((100000008359681 - 1)/2) mod 100000008359681 = 1, rejecting 4
5^((100000008359681 - 1)/2) mod 100000008359681 = 1, rejecting 5
6 passes all checks
> $g = [bigint]::Parse("6")
> $g
6

Then I (arbitrarily) picked public key 6^{a} = 2 mod 100000008359681. Now I have enough to calculate the discrete logarithm to find the private key *a*.

Actually, √*p* is still small enough that baby-step giant-step *can* crack this logarithm directly, but it is slow. On my machine it finds log_{6} 2 = 67708279093016 mod 100000008359681 in a little over eleven minutes:

> cd baby-step-giant-step
> Measure-Command { .\baby-step-giant-step.ps1 -p $p -g $g -x 2 }
Find y so that 6^y = 2 mod 100000008359681
log_6 2 mod 100000008359681 = 67708279093016
6^67708279093016 mod 100000008359681 = 2
...
TotalMinutes : 11.3536922983333
...

By comparison, Pohlig-Hellman finds the same logarithm on the same machine **in less than one-tenth of a second**:

> cd pohlig-hellman
> Measure-Command { .\pohlig-hellman.ps1 -p $p -g $g -x 2 }
log_6 2 = 67708279093016 mod 100000008359681
6^67708279093016 = 2 mod 100000008359681
...
TotalMilliseconds : 74.797
...