Hacking private keys from unsafe primes using the Pohlig-Hellman discrete logarithm

I’ve talked a few times about ElGamal encryption, where a participant A has a private key a and a public key ga mod p for some public prime p and generator g.

Now let’s suppose we’re an attacker and we’re after A‘s private key. We know her public key ga, we know g, and we know p. Can we figure out a = logg ga mod p? This is called the “discrete logarithm problem.”

Brute force

If p is extremely small or we are extremely patient we can do it very easily. Start with 1 = g0. Multiply by g to get g. Multiply again to get g2 mod p. Multiply a third time to get g3 mod p. Keep going until you get ga. Then a is just the number of times you had to multiply.

For example, consider implementing ElGamal encrypt, decrypt, sign, and verify signature. In this toy example p = 101, g = 2, A‘s private key was 57, and her public key was 257 mod 101 = 74. Since we (as the attacker) know g and p we can just calculate, doubling as we go, and subtracting 101 when we can:

  • 20 mod 101 = 1
  • 21 mod 101 = 2
  • 22 mod 101 = 4
  • 26 mod 101 = 64
  • 27 mod 101 = 27
  • 28 mod 101 = 54
  • 256 mod 101 = 37
  • 257 mod 101 = 74

So log2 74 = 57 mod 101 and now we know A‘s private key.

This approach takes O(p) time so it quickly becomes impractical as p gets big.

Baby-step giant-step

I blogged earlier about a speed improvement: David Shanks’ baby-step/giant-step algorithm. The basic idea is to precompute gi mod p only up to s = ⌈√p⌉ and then store them in a lookup table. Then use a little algebra to query the lookup table up to s times. This takes O(√p) time and space so it can let us squeeze out a little more juice but as √p gets big too this also becomes impractical.

Pohlig-Hellman

Stephen Pohlig and Martin Hellman (yes, that Hellman) published their algorithm for taking discrete logarithms which works by analyzing the group Zpx.

The group has p – 1 elements {1, 2, …, p – 1}. (Note that 0 is not included.) This means that the order of the subgroup generated by any given element divides p – 1.

Factor p – 1 as a product of powers of primes: p – 1 = p1k1 p2k2pnkn.

To calculate y = logg x mod p:

  1. Calculate y mod p1k1 (more on this later)
  2. Calculate y mod p2k2
  3. Calculate y mod pnkn
  4. Use the Chinese Remainder Theorem to reassemble y from its various modular residues.

That’s all well and good, but how do we calculate y mod piki?

Let q = pi and let k = ki. The trick to the algorithm is to realize that y mod qk can be written in base q as y0 + y1 q + y2 q2 + … + yk – 1 qk – 1. Each of the yi is a number in {0, 1, …, q – 1} and can be evaluated in O(√q) time and space using a little algebra, e.g. the fact that g(p – 1)/q is an element of order q.

So even if p is big, if p – 1 has all small prime factors, private keys can be extracted from public keys very quickly!

I wrote up an implementation of Pohlig-Hellman in PowerShell and tested it out.

First I found a (p – 1, p) pair where p was reasonably big and p – 1 had all small prime factors:

> cd pohlig-hellman
> .\find-unsafe-prime.ps1 -n 10000000
Sieving primes up to 10000000 + 1...
Finding largest small prime and the product of small primes from 100000000000000 to 100000020000000
Finding the prime between 100000000000000 and 100000020000000 with the smoothest p - 1
Prime 100000008359681 is one more than a number whose biggest prime factor is 53
> $p = [bigint]::Parse("100000008359681")
> $p
100000008359681 

100000008359681 is prime, and 100000008359680 = 28 5 112 13 19 31 37 43 53 has only small prime factors.

Then I found a generator:

> .\find-generator.ps1 -p 100000008359681
2^((100000008359681 - 1)/2) mod 100000008359681 = 1, rejecting 2
3^((100000008359681 - 1)/5) mod 100000008359681 = 1, rejecting 3
4^((100000008359681 - 1)/2) mod 100000008359681 = 1, rejecting 4
5^((100000008359681 - 1)/2) mod 100000008359681 = 1, rejecting 5
6 passes all checks
> $g = [bigint]::Parse("6")
> $g
6

Then I (arbitrarily) picked public key 6a = 2 mod 100000008359681. Now I have enough to calculate the discrete logarithm to find the private key a.

Actually, √p is still small enough that baby-step giant-step can crack this logarithm directly, but it is slow. On my machine it finds log6 2 = 67708279093016 mod 100000008359681 in a little over eleven minutes:

> cd baby-step-giant-step
> Measure-Command { .\baby-step-giant-step.ps1 -p $p -g $g -x 2 }
Find y so that 6^y = 2 mod 100000008359681
log_6 2 mod 100000008359681 = 67708279093016
6^67708279093016 mod 100000008359681 = 2
...
TotalMinutes      : 11.3536922983333
...

By comparison, Pohlig-Hellman finds the same logarithm on the same machine in less than one-tenth of a second:

> cd pohlig-hellman
> Measure-Command { .\pohlig-hellman.ps1 -p $p -g $g -x 2 }
log_6 2 = 67708279093016 mod 100000008359681
6^67708279093016 = 2 mod 100000008359681
...
TotalMilliseconds : 74.797
...

How safe is the prime 2²⁵⁵ – 19?

A while back we talked about safe primes and unsafe primes and we gave some small examples of safe primes and unsafe primes.

Recap: A has a secret key a and a public key (ga mod p) with a known prime p and generator g. It is crucial that an attacker not be able to take the discrete logarithm a = logg ga mod p. If they could do that, the attacker would have A‘s private key, without even having to eavesdrop on any communications!

The Pohlig–Hellman algorithm allows an attacker to perform this discrete logarithm. It takes time proportional to the largest prime factor of p – 1. So if p – 1’s prime factors are all small, the attacker can extract private keys easily from public keys. But if there is at least one large prime factor of p – 1, then we’re OK; the larger, the better.

So the unsafest primes are the Fermat primes where p = 2n + 1 for some n. The largest prime factor of p – 1 is only 2!

The safest primes are those of the form p = 2q + 1 where q is a Sophie Germain prime. The largest prime factor of p – 1 is (p – 1) / 2 = q.

How safe is the prime p = 2255 – 19 = 57896044618658097711785492504343953926634992332820282019728792003956564819949? Let’s factor p – 1 and see.

p – 1 = 2255 – 19 – 1 = 2255 – 20 is divisible by 2 – and even by 4 – but not by 8. So we have p – 1 = 22(2253 – 5). So far so good.

What about 2253 – 5? It’s NOT divisible by 5. What about by 3?

With a little bit of thought we can see that it is divisible by 3. 2 = -1 mod 3 and 5 = -1 mod 3, so substituting we have 2253 – 5 = (-1)253 – (-1) = -1 – (-1) = 0 mod 3. (2253 – 5)/3 doesn’t seem to have a nice concise form but we can just write it out in full.

So far we have p – 1 = 22 3 4824670384888174809315457708695329493886249361068356834977399333663047068329. What about 4824670384888174809315457708695329493886249361068356834977399333663047068329?

A quick poke at it with Miller-Rabin shows that it is NOT prime but does not offer a hint as to any factors. So we turn to brute-force trial division and after some time we find that it is divisible by 65147.

So far we have p – 1 = 22 3 65147 74058212732561358302231226437062788676166966415465897661863160754340907. What about 74058212732561358302231226437062788676166966415465897661863160754340907?

Miller-Rabin suggests that this COULD BE prime. In fact, it is prime. Here’s a proof.

The largest prime factor of p – 1 is pretty big. So p is pretty safe. Good.

We also have to take care when choosing a generator not to fall into any of the small cycles of length 2a 3b 65147c with a ∈ {0, 1, 2}, b ∈ {0, 1}, c ∈ {0, 1}.

ElGamal signing and signature verification but with bigger numbers

Last time we looked at ElGamal encryption and decryption but with bigger numbers. This time let’s look at signing and signature verification.

As before we choose the prime p = 2255 – 19 and the generator g = 2. We’ll let A keep her private key x = 37472449428228375763830797714320932193767211994174601506224685902244496215115 and its associated public key gx mod p = 15440027450603317705548401356286425311467546169515779468886789260083908558514.

We’ll choose a different message this time:

> cd random-biginteger
> .\random-biginteger.ps1 -min 0 -max $p
3271986818346674557032810660246111638748594301259911249641768028517949236833
> $m = [bigint]::Parse("3271986818346674557032810660246111638748594301259911249641768028517949236833")
> $m
3271986818346674557032810660246111638748594301259911249641768028517949236833

A wants to sign the message. She passes the signing routine the prime p, the generator g, her private key x, and the message m. It returns the signature (r, s).

> cd elgamal
> .\sign.ps1 -prime $p -generator $g -signerPrivateKey $x -message $m
Signer chooses a random number k = 24618296997275480932053592551920025862416126779629427099702631482415856443479 such that gcd(k = 24618296997275480932053592551920025862416126779629427099702631482415856443479, p - 1 = 57896044618658097711785492504343953926634992332820282019728792003956564819948) = 1
Signer calculates the signature (r, s) = (3025781994294452178840475154004769894024031449788332246545481203211043144236, 20885180761607031118214763642348557339838023745006310985614125237008647415483)
These are calculated so g^m = y^r r^s mod p
Take r = g^k
Now we have g^m = g^(x r) g^(k s) mod p
Which is to say m = x r + k s mod (p - 1)
We want to solve for s - this is why we needed gcd(k, p - 1) is 1
    * r = (g = 2)^(k = 24618296997275480932053592551920025862416126779629427099702631482415856443479) mod (p = 57896044618658097711785492504343953926634992332820282019728792003956564819949) = 3025781994294452178840475154004769894024031449788332246545481203211043144236
    * s which makes (m = 3271986818346674557032810660246111638748594301259911249641768028517949236833) = (x = 37472449428228375763830797714320932193767211994174601506224685902244496215115) * (r = 3025781994294452178840475154004769894024031449788332246545481203211043144236) + (k = 24618296997275480932053592551920025862416126779629427099702631482415856443479) * (s = 20885180761607031118214763642348557339838023745006310985614125237008647415483) mod (p - 1 = 57896044618658097711785492504343953926634992332820282019728792003956564819948)
Signature is (r, s) = (3025781994294452178840475154004769894024031449788332246545481203211043144236, 20885180761607031118214763642348557339838023745006310985614125237008647415483)
> $signature = @([bigint]::Parse("3025781994294452178840475154004769894024031449788332246545481203211043144236"), [bigint]::Parse("20885180761607031118214763642348557339838023745006310985614125237008647415483"))
> $signature
3025781994294452178840475154004769894024031449788332246545481203211043144236
20885180761607031118214763642348557339838023745006310985614125237008647415483

Now B, the reader of the message, wants to verify that A really wrote it. He passes the signature verification routine the prime p, the generator g, A‘s public key gx, the message m, and the signature (r, s). It returns whether the signature is valid.

It should be valid…

> .\verify-signature.ps1 -prime $p -generator $g -signerPublicKey $g_x -message $m -signature $signature
Signature verifier calculates (g = 2)^(m = 3271986818346674557032810660246111638748594301259911249641768028517949236833) = 14784288759268166937333969698279404318706460490042691327601374955158440934277
Signature verifier calculates = (y = 15440027450603317705548401356286425311467546169515779468886789260083908558514)^(r = 3025781994294452178840475154004769894024031449788332246545481203211043144236) (r = 3025781994294452178840475154004769894024031449788332246545481203211043144236)^(s = 20885180761607031118214763642348557339838023745006310985614125237008647415483) mod (p = 57896044618658097711785492504343953926634992332820282019728792003956564819949) = 14784288759268166937333969698279404318706460490042691327601374955158440934277
Signature is valid
Signature is valid.

… and it is!

Now suppose someone had tampered with the message, e.g. replacing m with m – 1, and tried to pass it off as the original, with A‘s signature. The signature should now be invalid…

> .\verify-signature.ps1 -prime $p -generator $g -signerPublicKey $g_x -message ($m - 1) -signature $signature
Signature verifier calculates (g = 2)^(m = 3271986818346674557032810660246111638748594301259911249641768028517949236832) = 36340166688963132324559731101311679122670726411431486673665083479557502877113
Signature verifier calculates = (y = 15440027450603317705548401356286425311467546169515779468886789260083908558514)^(r = 3025781994294452178840475154004769894024031449788332246545481203211043144236) (r = 3025781994294452178840475154004769894024031449788332246545481203211043144236)^(s = 20885180761607031118214763642348557339838023745006310985614125237008647415483) mod (p = 57896044618658097711785492504343953926634992332820282019728792003956564819949) = 14784288759268166937333969698279404318706460490042691327601374955158440934277
Signature is invalid
Signature is invalid. 

… and it is!

So the same routines all work even if the numbers are big.

ElGamal encryption and decryption but with bigger numbers

Earlier I blogged about implementing ElGamal encrypt, decrypt, sign, and verify signature. I had coded up a toy example which used small numbers to show how the math worked.

Now let’s do the same thing but with bigger numbers. We’ll just do encryption and decryption this time.

First we need a prime p. Before we used p = 101. This time we’ll use p = 2255 – 19:

> $p = [bigint]::pow(2, 255) - 19
> $p
5789604461865809771178549250434395392663499233282028201972879200395656481994

That’s nice and big.

Now we need a generator g. Last time we used g = 2. We’ll use it again, but make sure using Tonelli-Shanks that 2 is not a square mod p.

> cd tonelli-shanks
> .\tonelli-shanks.ps1 -p $p -n 2
Looking for r so that r^2 = 2 mod 57896044618658097711785492504343953926634992332820282019728792003956564819949
Skipping primality check since 57896044618658097711785492504343953926634992332820282019728792003956564819949 is large
r^2 = 2 mod 57896044618658097711785492504343953926634992332820282019728792003956564819949 has no solutions
> $g = [bigint]2
> $g
2

It’s important to do this check. If we had chosen g = 3, for example, the check would have failed, because 3 = 150298394704333910222651756369397732876262961010368454990880792759863347428352 mod 57896044618658097711785492504343953926634992332820282019728792003956564819949.

> .\tonelli-shanks.ps1 -p $p -n 3
Looking for r so that r^2 = 3 mod 57896044618658097711785492504343953926634992332820282019728792003956564819949
Skipping primality check since 57896044618658097711785492504343953926634992332820282019728792003956564819949 is large
57896044618658097711785492504343953926634992332820282019728792003956564819949 - 1 = 14474011154664524427946373126085988481658748083205070504932198000989141204987 * 2^2
2 is a nonsquare modulo 57896044618658097711785492504343953926634992332820282019728792003956564819949
Initial: m = 2, c = 19681161376707505956807079304988542015446066515923890162744021073123829784752, t = 1, r = 15029839470433391022265175636939773287626296101036845499088079275986334742835
Solutions:
15029839470433391022265175636939773287626296101036845499088079275986334742835^2 = 3 mod 57896044618658097711785492504343953926634992332820282019728792003956564819949
42866205148224706689520316867404180639008696231783436520640712727970230077114^2 = 3 mod 57896044618658097711785492504343953926634992332820282019728792003956564819949

So we use g = 2.

Now we need a recipient A. A needs a private key x which determines her public key gx mod p. Generate x at random between 0 (inclusive) and p (exclusive).

> cd random-biginteger
> .\random-biginteger.ps1 -min 0 -max $p
37472449428228375763830797714320932193767211994174601506224685902244496215115
> $x = [bigint]::Parse("37472449428228375763830797714320932193767211994174601506224685902244496215115")
> $x
37472449428228375763830797714320932193767211994174601506224685902244496215115
> $g_x = [bigint]::ModPow($g, $x, $p)
> $g_x
15440027450603317705548401356286425311467546169515779468886789260083908558514

We want to encrypt a message so that only A can read it, using her public key. So we need a plaintext message m. Generate m at random in similar fashion.

> .\random-biginteger.ps1 -min 0 -max $p
22594335182736179043379580066191528397421893149471409561964986373640506607991
> $m = [bigint]::Parse("22594335182736179043379580066191528397421893149471409561964986373640506607991")
> $m
22594335182736179043379580066191528397421893149471409561964986373640506607991

Now we’re ready to encrypt! Give the encryption routine the prime p, the generator g, the recipient’s public key gx, and the plaintext message m. It will return the ciphertext (c1, c2).

> cd elgamal
> .\encrypt.ps1 -prime $p -generator $g -recipientPublicKey $g_x -clearText $m
Encryptor chooses a random number k = 6095530983035662394415401229300894157247477933650052462284854503095419296799
Encryptor calculates the key K = (y = 15440027450603317705548401356286425311467546169515779468886789260083908558514)^(k = 6095530983035662394415401229300894157247477933650052462284854503095419296799) mod (p = 57896044618658097711785492504343953926634992332820282019728792003956564819949) = 652128534616280580041338757470258024490635838501926874091259885745358830160
Encryptor calculates the encrypted message (c1, c2) = (25051380001384562314959307721043701048687850352094011138061884132819862724667, 46849668020527693475159053259226199454750762501012280228897524972061871412373)
    * c1 = (g = 2)^(k = 6095530983035662394415401229300894157247477933650052462284854503095419296799) mod (p = 57896044618658097711785492504343953926634992332820282019728792003956564819949) = 25051380001384562314959307721043701048687850352094011138061884132819862724667
    * c2 = (K = 652128534616280580041338757470258024490635838501926874091259885745358830160)(m = 22594335182736179043379580066191528397421893149471409561964986373640506607991) mod (p = 57896044618658097711785492504343953926634992332820282019728792003956564819949) = 46849668020527693475159053259226199454750762501012280228897524972061871412373
Encryption is (c1, c2) = (25051380001384562314959307721043701048687850352094011138061884132819862724667, 46849668020527693475159053259226199454750762501012280228897524972061871412373)
> $c = @([bigint]::Parse("25051380001384562314959307721043701048687850352094011138061884132819862724667"), [bigint]::Parse("46849668020527693475159053259226199454750762501012280228897524972061871412373"))
> $c
25051380001384562314959307721043701048687850352094011138061884132819862724667
46849668020527693475159053259226199454750762501012280228897524972061871412373

We pass the ciphertext (c1, c2) to A and now she wants to read it. She gives the decryption routine the prime p, the generator g, her private key x, and the ciphertext (c1, c2). It had better return the plaintext m

> .\decrypt.ps1 -prime $p -generator $g -recipientPrivateKey $x -cipherText $c
Recipient wants to decrypt the message
Recipient recovers K = (c1 = 25051380001384562314959307721043701048687850352094011138061884132819862724667)^(x = 37472449428228375763830797714320932193767211994174601506224685902244496215115) mod (p = 57896044618658097711785492504343953926634992332820282019728792003956564819949) = 652128534616280580041338757470258024490635838501926874091259885745358830160
Recipient calculates m = (c2 = 46849668020527693475159053259226199454750762501012280228897524972061871412373)/(K = 652128534616280580041338757470258024490635838501926874091259885745358830160) mod (p = 57896044618658097711785492504343953926634992332820282019728792003956564819949) = 22594335182736179043379580066191528397421893149471409561964986373640506607991
Decrypted message 22594335182736179043379580066191528397421893149471409561964986373640506607991
> $m
22594335182736179043379580066191528397421893149471409561964986373640506607991

… and it does! Yay.

Implementing ElGamal encrypt, decrypt, sign, and verify signature

In 1985, Taher ElGamal wrote A Public key Cryptosystem and A Signature Scheme based on discrete Logarithms (PDF). This lays out the math for:

  1. Encrypting a message so that only a particular person can read it
  2. If you are that particular person, decrypting the message so you can read it
  3. Signing a message so that anyone can convince themselves you wrote it
  4. Verifying that a message was really written by the person who signed it

I read through the paper and wrote a toy implementation in PowerShell.

In the toy implementation, A wants to send private message “92” to B. She encrypts it to the encrypted message “(33, 76)” using B’s public key 69. B uses his private key 55 to decrypt the message back to 92, so he can read it, but no-one else can.

Then A wants to sign a different message “28”. She uses her private key 57 to generate the signature “(29, 25)”. B uses A’s public key 74 to confirm that both the message 28 and the signature (29, 25) give the same value 80, so only A could have generated the signature.

Here is the output of the toy implementation. If you run it multiple times, the encryption (33, 76) and the signature (29, 25) will vary, but the messages are always the same, and the signature check value 80 is always the same.

-- AGREE ON COMMON PARAMETERS --
Everyone agrees to use prime field GF(p = 101) and generator g = 2

-- GENERATE KEYS --
A's private key is 57 and public key is 74
B's private key is 55 and public key is 69

-- ENCRYPT --
A wants to send B a message so that only B can read it
The first block of the message is m = 92
A chooses a random number k = 82
A calculates the key K = (y_B = 69)^(k = 82) mod (p = 101) = 14
A calculates and sends the encrypted message (c1, c2) = (33, 76)
    * c1 = (g = 2)^(k = 82) mod (p = 101) = 33
    * c2 = (K = 14)(m = 92) mod (p = 101) = 76

-- DECRYPT --
B wants to read the message that A sent B
B recovers K = (c1 = 33)^(x_B = 55) mod (p = 101) = 14
B calculates m = (c2 = 76)/(K = 14) mod (p = 101) = 92

-- SIGN --
A wants to sign a message so that anyone can prove that A wrote it
The first block of the message is m = 28
A chooses a random number k = 91 such that gcd(k = 91, p - 1 = 100) = 1
A calculates and sends the signature (r, s) = (29, 25)
These are calculated so g^m = y_A^r r^s mod p
Take r = g^k
Now we have g^m = g^(x_A r) g^(k s) mod p
Which is to say m = x_A r + k s mod (p - 1)
We want to solve for s - this is why we needed gcd(k, p - 1) is 1
    * r = (g = 2)^(k = 91) mod (p = 101) = 29
    * s which makes (m = 28) = (x = 57) * (r = 29) + (k = 91) * (s = 25) mod (p - 1 = 100)

-- VERIFY SIGNATURE --
B - or anyone - wants to verify A's signature
If the signature is valid, (g = 2)^(m = 28) = (y_A = 74)^(r = 29) (r = 29)^(s = 25) mod (p = 101)
This checks because both sides are equal to 80

Safe primes and unsafe primes

Secure communications like HTTPS, SSH, SFTP and others rely on cryptographic handshakes like the Diffie-Hellman key exchange to negotiate a shared key.

There are lots of ways to set up a Diffie-Hellman key exchange. The Logjam security vulnerability starts with a flaw in the negotiation process to ensure a relatively weak one is used – finite-field with a small prime – and ends up with a man-in-the-middle attack.

The paper uses the term “safe prime” and thought I would dig a little into exactly what that meant.

Refresher: a given prime p defines the Galois field GF(p). This contains p elements: 0 through p – 1 inclusive. Addition and multiplication are both modulo p. We can start with any one of these elements – call it g, for “generator” – and form the sequence 1, g, g2, g3, and so on, until we eventually come back to 1. Call the “period” (length) of this sequence x. It is a law of mathematics that x always evenly divides p – 1.

The vulnerable Diffie-Hellman key exchange goes something like this, skipping over some details:

  1. Alice has a secret key a; Bob has a secret key b.
  2. Alice and Bob start a conversation where the attacker Eve can hear them – they publicly agree to a particular prime p and a generator g.
  3. Alice calculates ga mod p and shares it; Bob calculates gb mod p and shares it.
  4. Alice calculates (gb)a mod p and Bob calculates (ga)b mod p. These are necessarily equal, and their shared value is used as a secret symmetric key for the rest of the conversation.

For cryptographic reasons, we want g’s period in GF(p) to be reasonably large. If the period is small, that gives Eve a reasonable chance to guess the shared secret, which is even worse than her man-in-the-middle attack from Logjam.

So how do we ensure the period is small? One way to do it (not necessarily a good way to do it) is to arrange for p – 1 to have very few divisors.

Could p – 1 be prime? Well, yes, there are two consecutive primes – namely, (2, 3) – but choosing p = 3 has some other cryptographic drawbacks which outweigh the benefit 🙂

OK, well, then, if p – 1 can’t be prime, can it be a semiprime?

Yes, it can. In fact, it can be of the form 2q where q is also prime. In this case, the prime pair (q, p = 2q + 1) are called a (Sophie Germain prime, safe prime) pair. This is the technical definition of “safe prime.”

To illustrate this, let’s choose a couple of primes and see what their generator cycles look like. In particular, let’s look at the safe prime 59 (59 – 1 = 2 * 29 so the corresponding Sophie Germain prime is 29) and the “unsafe prime” 61 (61 – 1 = 60 is a smooth number with quite a lot of factors.)

For each prime I made a table where the rows are all the possible generators g and the columns are the exponents i in gi. I highlighted the columns x where x is a factor of p – 1

Here’s the safe prime 59:

Generator cycles in GF(59)

And here’s the unsafe prime 61:

Generator cycles in GF(61)

Notice that there are only four periods for 59, in particular x = 1, 2, 29, and 58. These are the four factors of (59 – 1). When choosing a generator, periods 1 and 2 are of course to be avoided; this is easy to do. Period 58 is as good as possible. Period 29 might be accepted as good enough; but if you wanted to go the extra mile, you could detect and correct as follows.

  • Calculate g(p – 1)/2. This will always be -1 or 1.
  • If it’s -1, g has period p – 1, which is as good as it gets.
  • If it’s 1, g has period (p – 1)/2. Use –g instead, which has period p – 1, unless (q, p) = (2, 5), which we reject for the same reasons as (2, 3).

By contrast, there are no fewer than 12 periods for 61, in particular x = 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, 60. These are the factors of (61 – 1). It is not so easy to decide where the threshold of acceptability lies (30? 12?), and it is not so easy to correct generators with small periods, or even to detect them (other than 0, 1, and p – 1, which are still trivial to avoid.)

Real cryptographic primes are of course much, much bigger than 61, but this serves as an illustration of why safe primes are called “safe”.

Perl scripts to encrypt and decrypt text using Rijndael

I talked about Rijndael in a few previous posts: Expressing a function f: GF(2⁸) → GF(2⁸) as a polynomial using a Lagrange polynomial, Generating the Rijndael S-box, Efficient multiplication and division in GF(28), Sieving irreducible monic polynomials over a finite field, Addition and multiplication table for GF(22).

Today I wrote a couple of Perl scripts which use off-the-shelf CPAN modules to encrypt and decrypt text using a passphrase. For ease of transport, at the cost of some storage space, the encrypted payload is Base64 encoded.

The particular CPAN modules in question are Crypt::CBC and Crypt::Rijndael.

encrypt.pl and decrypt.pl source in my GitHub repository. They operate on STDIN and produce output on STDOUT.

Note that if you pass an incorrect passphrase to decrypt.pl you get garbage output, rather than an error.

Here is a secret message which is encoded with the passphrase “hunter2”: encrypted.txt (right-click the link and save, rather than opening it in the browser.)

Expressing a function f: GF(2⁸) → GF(2⁸) as a polynomial using a Lagrange polynomial

I talked about Rijndael in a couple of previous posts: Generating the Rijndael S-box, Efficient multiplication and division in GF(28), Sieving irreducible monic polynomials over a finite field, Addition and multiplication table for GF(22).

I’m going to talk some more about it today.

The Rijndael non-linear S-box S(x) is a composition of two invertible functions f(g(x)). Last time we showed how to generate these, and their inverses, as 256-entry tables.

As Daemen and Rijmen point out, any function from a finite field to itself can be expressed as a polynomial. In fact, given a tabular form of the function, it is possible to generate the Lagrange polynomial and then simplify.

They also give the polynomial for S(x):

SRD[x] = 05·x255 + 09·x253 + F9·x251 + 25·x247 + F4·x239 + 01·x223 + B5·x191 + 8F·x127 + 63

Well, let’s check this.
And while we’re at it, let’s find the polynomials for g(x), f(x) and even S-1(x) too.
First of all, let’s start with the Lagrange polynomial.

Given a table of entries { (00, y00), (01, y01), …, (ff, yff) }, there is a polynomial L(x) which gives the same output, namely:

L(x) = Σi ∈ { 00, 01, …, ff } yi pi(x)
where pi(x) = Πj ∈ { 00, 01, …, ff }, ji (xj) / (ij)

Can we simplify this?
Yes. Note that (ij)-1 term varies over all the non-zero elements of the finite field. Since this is a field, every non-zero element has an inverse, which might or might not be itself.
If the inverse is not itself, we can pair the two inverse elements together, and we get 01, which is the multiplicative identity, so we can ignore it.
What are we left with? The product of those non-zero elements which are their own inverses.
In the case of GF(28), there is only one such element, namely 01.
Great! We can ignore the (ij)-1 terms altogether:

L(x) = Σi ∈ { 00, 01, …, ff } yi Πj ∈ { 00, 01, …, ff }, ji (xj)

What is this? A sum of 256 different polynomials.
Each of these summands is a product of 255 polynomials of degree 1, and so is a polynomial of degree 255.
But in GF(28), x255 = 01 for all x. So each of the summands is actually a polynomial of degree 254, or less.
So the sum is also a polynomial of degree 254, or less; terms with an exponent >= 255 go away and just contribute to lower-order terms.

Great. We have { (00, y00), (01, y01), …, (ff, yff) } for f, g, S, and S-1. Let’s plug them in and see what happens!

I wrote a Perl script to do this; the script is attached. Here’s the output. (It’s quite slow; it takes about two and a half minutes to run.)

g(x) = 01 x^254

f(x) =
63 + 05 x + 09 x^2 + f9 x^4 + 25 x^8 + f4 x^16 + 01 x^32 + b5 x^64 +
8f x^128

S(x) =
63 + 8f x^127 + b5 x^191 + 01 x^223 + f4 x^239 + 25 x^247 + f9 x^251 + 09 x^253 +
05 x^254

S^(-1)(x) =
52 + f3 x + 7e x^2 + 1e x^3 + 90 x^4 + bb x^5 + 2c x^6 + 8a x^7 +
1c x^8 + 85 x^9 + 6d x^10 + c0 x^11 + b2 x^12 + 1b x^13 + 40 x^14 + 23 x^15 +
f6 x^16 + 73 x^17 + 29 x^18 + d9 x^19 + 39 x^20 + 21 x^21 + cf x^22 + 3d x^23 +
9a x^24 + 8a x^25 + 2f x^26 + cf x^27 + 7b x^28 + 04 x^29 + e8 x^30 + c8 x^31 +
85 x^32 + 7b x^33 + 7c x^34 + af x^35 + 86 x^36 + 2f x^37 + 13 x^38 + 65 x^39 +
75 x^40 + d3 x^41 + 6d x^42 + d4 x^43 + 89 x^44 + 8e x^45 + 65 x^46 + 05 x^47 +
ea x^48 + 77 x^49 + 50 x^50 + a3 x^51 + c5 x^52 + 01 x^53 + 0b x^54 + 46 x^55 +
bf x^56 + a7 x^57 + 0c x^58 + c7 x^59 + 8e x^60 + f2 x^61 + b1 x^62 + cb x^63 +
e5 x^64 + e2 x^65 + 10 x^66 + d1 x^67 + 05 x^68 + b0 x^69 + f5 x^70 + 86 x^71 +
e4 x^72 + 03 x^73 + 71 x^74 + a6 x^75 + 56 x^76 + 03 x^77 + 9e x^78 + 3e x^79 +
19 x^80 + 18 x^81 + 52 x^82 + 16 x^83 + b9 x^84 + d3 x^85 + 38 x^86 + d9 x^87 +
04 x^88 + e3 x^89 + 72 x^90 + 6b x^91 + ba x^92 + e8 x^93 + bf x^94 + 9d x^95 +
1d x^96 + 5a x^97 + 55 x^98 + ff x^99 + 71 x^100 + e1 x^101 + a8 x^102 + 8e x^103 +
fe x^104 + a2 x^105 + a7 x^106 + 1f x^107 + df x^108 + b0 x^109 + 03 x^110 + cb x^111 +
08 x^112 + 53 x^113 + 6f x^114 + b0 x^115 + 7f x^116 + 87 x^117 + 8b x^118 + 02 x^119 +
b1 x^120 + 92 x^121 + 81 x^122 + 27 x^123 + 40 x^124 + 2e x^125 + 1a x^126 + ee x^127 +
10 x^128 + ca x^129 + 82 x^130 + 4f x^131 + 09 x^132 + aa x^133 + c7 x^134 + 55 x^135 +
24 x^136 + 6c x^137 + e2 x^138 + 58 x^139 + bc x^140 + e0 x^141 + 26 x^142 + 37 x^143 +
ed x^144 + 8d x^145 + 2a x^146 + d5 x^147 + ed x^148 + 45 x^149 + c3 x^150 + ec x^151 +
1c x^152 + 3e x^153 + 2a x^154 + b3 x^155 + 9e x^156 + b7 x^157 + 38 x^158 + 82 x^159 +
23 x^160 + 2d x^161 + 87 x^162 + ea x^163 + da x^164 + 45 x^165 + 24 x^166 + 03 x^167 +
e7 x^168 + 19 x^169 + e3 x^170 + d3 x^171 + 4e x^172 + dd x^173 + 11 x^174 + 4e x^175 +
81 x^176 + 91 x^177 + 91 x^178 + 59 x^179 + a3 x^180 + 80 x^181 + 92 x^182 + 7e x^183 +
db x^184 + c4 x^185 + 20 x^186 + ec x^187 + db x^188 + 55 x^189 + 7f x^190 + a8 x^191 +
c1 x^192 + 64 x^193 + ab x^194 + 1b x^195 + fd x^196 + 60 x^197 + 05 x^198 + 13 x^199 +
2c x^200 + a9 x^201 + 76 x^202 + a5 x^203 + 1d x^204 + 32 x^205 + 8e x^206 + 1e x^207 +
c0 x^208 + 65 x^209 + cb x^210 + 8b x^211 + 93 x^212 + e4 x^213 + ae x^214 + be x^215 +
5f x^216 + 2c x^217 + 3b x^218 + d2 x^219 + 0f x^220 + 9f x^221 + 42 x^222 + cc x^223 +
6c x^224 + 80 x^225 + 68 x^226 + 43 x^227 + 09 x^228 + 23 x^229 + c5 x^230 + 6d x^231 +
1d x^232 + 18 x^233 + bd x^234 + 5e x^235 + 1b x^236 + b4 x^237 + 85 x^238 + 49 x^239 +
bc x^240 + 0d x^241 + 1f x^242 + a6 x^243 + 6b x^244 + d8 x^245 + 22 x^246 + 01 x^247 +
7a x^248 + c0 x^249 + 55 x^250 + 16 x^251 + b3 x^252 + cf x^253 + 05 x^254

Daemen and Rijmen’s polynomial rendition of S(x) is confirmed.

Also note how simple g(x) is, and how complex S-1(x) is.

Finally, I must confess that none of this is actually useful. The tabular form is much more convenient for applications. This is just a fun theoretical exercise.

(Well, I had fun.)

EDIT: 2015-10-31 moved script to https://github.com/mvaneerde/blog/blob/master/rijndael/lagrange-interpolation.pl

Generating the Rijndael S-box

I talked about Rijndael in a couple of previous posts: Efficient multiplication and division in GF(28), Sieving irreducible monic polynomials over a finite field, Addition and multiplication table for GF(22).

I’m going to talk some more about it today.

One of the more interesting steps used in the Rijndael transformation is the non-linear S-box. This is a function S(a) that takes an element of GF(28) (which could be represented as a byte) and returns another one. It is invertible but non-linear.

The spec defines S(a) in terms of two other invertible functions g(a) and f(a). In particular S(a) = f(g(a)). It follows that S-1(a) = g-1(f-1(a)).

g(a) is the non-linear piece, and is quite simple:

g(00) is defined as 00.
g(a) = a-1 for all other a in GF(28). In particular, if a = 03b, then g(a) = 03255 – b.

This is clearly invertible; in fact, it is its own inverse.

Daemen and Rijmen seem to have been almost embarrassed by the simplicity of g so they introduced f. To quote from section 3.4.1:

By definition, g has a very simple algebraic expression. This could allow algebraic manipulations that can be used to mount attacks such as interpolation attacks. Therefore, we built the S-box as the sequence of g and an invertible affine transformation f.

I don’t know if I buy the “simplicity” argument. It seems to me that if Rijndael, without f, is robust, then it’s robust. And if you add f to a non-robust scheme, I don’t understand how that makes it robust; I do see that it complicates analysis, but that seems like a drawback rather than an advantage.

But I’ll play along for now. What is f?

b = f(a) is defined using the following matrix multiplication over GF(2) (each entry can be represented as a bit; a row or column as a byte.) Multiplication can be implemented as bitwise AND, and addition by bitwise XOR.

S-box-f

In particular, note that f(00) = 63. So S(00) = f(g(00)) = f(00) = 63. So S-1(63) = 00.

The reference implementation in the book uses hardcoded 256-byte lookup tables for S(a) and S-1(a). I wrote a Perl script which generates these lookup tables and prints them out. This script is attached.

Here’s its output, which matches the listing in the book:

>perl -w s-box.pl
S(xy)
| x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xa xb xc xd xe xf
—+————————————————
0y | 63 7c 77 7b f2 6b 6f c5 30 01 67 2b fe d7 ab 76
1y | ca 82 c9 7d fa 59 47 f0 ad d4 a2 af 9c a4 72 c0
2y | b7 fd 93 26 36 3f f7 cc 34 a5 e5 f1 71 d8 31 15
3y | 04 c7 23 c3 18 96 05 9a 07 12 80 e2 eb 27 b2 75
4y | 09 83 2c 1a 1b 6e 5a a0 52 3b d6 b3 29 e3 2f 84
5y | 53 d1 00 ed 20 fc b1 5b 6a cb be 39 4a 4c 58 cf
6y | d0 ef aa fb 43 4d 33 85 45 f9 02 7f 50 3c 9f a8
7y | 51 a3 40 8f 92 9d 38 f5 bc b6 da 21 10 ff f3 d2
8y | cd 0c 13 ec 5f 97 44 17 c4 a7 7e 3d 64 5d 19 73
9y | 60 81 4f dc 22 2a 90 88 46 ee b8 14 de 5e 0b db
ay | e0 32 3a 0a 49 06 24 5c c2 d3 ac 62 91 95 e4 79
by | e7 c8 37 6d 8d d5 4e a9 6c 56 f4 ea 65 7a ae 08
cy | ba 78 25 2e 1c a6 b4 c6 e8 dd 74 1f 4b bd 8b 8a
dy | 70 3e b5 66 48 03 f6 0e 61 35 57 b9 86 c1 1d 9e
ey | e1 f8 98 11 69 d9 8e 94 9b 1e 87 e9 ce 55 28 df
fy | 8c a1 89 0d bf e6 42 68 41 99 2d 0f b0 54 bb 16

S^(-1)(xy)
| x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xa xb xc xd xe xf
—+————————————————
0y | 52 09 6a d5 30 36 a5 38 bf 40 a3 9e 81 f3 d7 fb
1y | 7c e3 39 82 9b 2f ff 87 34 8e 43 44 c4 de e9 cb
2y | 54 7b 94 32 a6 c2 23 3d ee 4c 95 0b 42 fa c3 4e
3y | 08 2e a1 66 28 d9 24 b2 76 5b a2 49 6d 8b d1 25
4y | 72 f8 f6 64 86 68 98 16 d4 a4 5c cc 5d 65 b6 92
5y | 6c 70 48 50 fd ed b9 da 5e 15 46 57 a7 8d 9d 84
6y | 90 d8 ab 00 8c bc d3 0a f7 e4 58 05 b8 b3 45 06
7y | d0 2c 1e 8f ca 3f 0f 02 c1 af bd 03 01 13 8a 6b
8y | 3a 91 11 41 4f 67 dc ea 97 f2 cf ce f0 b4 e6 73
9y | 96 ac 74 22 e7 ad 35 85 e2 f9 37 e8 1c 75 df 6e
ay | 47 f1 1a 71 1d 29 c5 89 6f b7 62 0e aa 18 be 1b
by | fc 56 3e 4b c6 d2 79 20 9a db c0 fe 78 cd 5a f4
cy | 1f dd a8 33 88 07 c7 31 b1 12 10 59 27 80 ec 5f
dy | 60 51 7f a9 19 b5 4a 0d 2d e5 7a 9f 93 c9 9c ef
ey | a0 e0 3b 4d ae 2a f5 b0 c8 eb bb 3c 83 53 99 61
fy | 17 2b 04 7e ba 77 d6 26 e1 69 14 63 55 21 0c 7d

And here’s the interesting part, the implementation of g(a) and f(a):

sub g($) {
my ($a) = shift;

# g(0) we define to be 0
return 0 unless $a;

# otherwise g(a) = a^(-1)
# a = (x + 1)^loga
# so a^(-1) = (x + 1)^(-loga) = (x + 1)^(255 – loga)
my $loga = $log_xplusone_of[$a];
my $logb = 255 – $loga;
$logb -= 255 if $logb >= 255;

return $xplusone_to[$logb];
}

# f(a) = b is defined as follows:
#
# [ b7 ]   ( [ 1 1 1 1 1 0 0 0 ] [ a7 ] )   [ 0 ]
# [ b6 ]   ( [ 0 1 1 1 1 1 0 0 ] [ a6 ] )   [ 1 ]
# [ b5 ]   ( [ 0 0 1 1 1 1 1 0 ] [ a5 ] )   [ 1 ]
# [ b4 ] = ( [ 0 0 0 1 1 1 1 1 ] [ a4 ] ) + [ 0 ]
# [ b3 ]   ( [ 1 0 0 0 1 1 1 1 ] [ a3 ] )   [ 0 ]
# [ b2 ]   ( [ 1 1 0 0 0 1 1 1 ] [ a2 ] )   [ 0 ]
# [ b1 ]   ( [ 1 1 1 0 0 0 1 1 ] [ a1 ] )   [ 1 ]
# [ b0 ]   ( [ 1 1 1 1 0 0 0 1 ] [ a0 ] )   [ 1 ]
#
# where the + is XOR
sub f($) {
my ($a) = @_;

# start with the addition
my $b = 0x63; # 0b01100011;

# do the matrix multiplication
# one matrix column at a time
for (
my ($c, $i) = (0x8f, 0x80); # 0b10001111, 0b10000000
$i;
$c = rotate_byte_right($c), $i >>= 1
) {
# i is used to select a bit out of the a column
# c is the matrix column which is multiplied by that bit
# the resulting product influences the eventual b column

# printf(“%02x %02xn”, $c, $i);

# if this bit in the a column is 0, all of the products are 0, so don’t bother
next unless $a & $i;

# this bit in the a column is 1
# so XOR b with the matrix column
$b ^= $c;
}

return $b;
}

EDIT 2015-10-31: moved script to https://github.com/mvaneerde/blog/blob/master/rijndael/s-box.pl

Efficient multiplication and division in GF(2⁸)

I talked about Rijndael in a couple of previous posts: Sieving irreducible monic polynomials over a finite field, Addition and multiplication table for GF(2²)

I’m going to talk some more about it today.

Rijndael does a lot of arithmetic operations (addition and multiplication) on elements of GF(28)4.

These are represented as polynomials b0 + b1 x + b2 x2 + b3 x3.

The individual coefficients bi are elements of GF(28).

These are themselves polynomials a0 + a1 x + … + a7 x7 where the individual coefficients ai are bits (0 or 1).

Multiplication bi bj is made into a closed operation by taking the modulus of the product under the reduction polynomial m = x8 + x4 + x3 + x + 1.

The reduction polynomial is prime, which is important in establishing that this representation of the Galois field really is a field.

For convenience we will represent the elements of GF(28) as hexadecimal numbers 0x00, 0x01, … 0xfe, 0xff.

By contrast, we will represent the elements of GF(28)4 as vectors (b0 b1 b2 b3); for example, (0x00 0x01 0x02 0x03).

Multiplying vectors in GF(28)4 induces a bunch of multiplications in GF(28). It would be good to come up with a way to make this fast. Doing a polynomial reduction every time is not fast.

Of course, one way to do this is to create a multiplication table with 256 rows and 256 columns, do each multiplication the slow way once, and compile in the results. This would require on the order of 64 kB. But there’s a better way.

The trick that is used in Daemen and Rijmen’s implementation is to find a primitive element e in GF(28) so that the orbit of e (that is to say, the set {e0, e1, …, e254}) spans all of GF(28) – {0x00}.

If we find such an e, then we can save a cheat sheet which is almost as fast as the multiplication table, but requires storing only on the order of 256 bytes.

Well, let’s calculate the orbits of all the elements in order until we find a primitive element.

Orbit(0x00) = { 0x000, 0x001, 0x002, … } = { 0x01, 0x00, 0x00, … } = { 0x00, 0x01 }. Nope.
Orbit(0x01) = { 0x010, 0x011, 0x012, … } = { 0x01, 0x01, 0x01, … } = { 0x01 }. Even worse.
Orbit(0x02) = { 0x020, 0x021, 0x022, … } = { 0x01, 0x02, 0x04, …, 0x8d }. This looks promising at first but we end up with an orbit of only 51 out of the necessary 255 elements (note that 51 divides 255:)

0x020 = 0x01
0x021 = 0x02
0x022 = 0x04
0x023 = 0x08

0x0245 = 0x1d
0x0246 = 0x3a
0x0247 = 0x74
0x0248 = 0xe8
0x0249 = 0xcb
0x0250 = 0x8d
And then we circle back around to…
0x0251 = 0x01

Well, let’s keep looking. What about 0x03?

Orbit(0x03) = { 0x030, 0x031, 0x032, … } = { 0x01, 0x03, 0x05, …, 0xf6 }. Bingo, we hit all 255 non-0x00 elements!

0x030 = 0x01
0x031 = 0x03
0x032 = 0x05
0x033 = 0x0f

0x03250 = 0x6c
0x03251 = 0xb4
0x03252 = 0xc7
0x03253 = 0x52
0x03254 = 0xf6
… and only now do we circle around to…
0x03255 = 0x01

This gives us a one-to-one mapping between the 255 powers 0 through 254 and the 255 non-zero bytes 0x01 through 0xff.

Here is the Perl script I used to generate these

Now that we have this orbit, we use it to construct two lookup tables:

// 255 entries xPlusOne_ToThe[0] = 0x01 to xPlusOne_ToThe[254] = 0xf6
xPlusOne_ToThe[] = { 0x01, 0x03, 0x05, ..., 0x52, 0xf6 };

And the inverse mapping:

// 255 entries logXPlusOne_Of[0x01] = 0 to logXPlusOne_Of[0xff] = 7
logXPlusOne_Of[] = { UNDEFINED, 0, 25, 1, 50, ..., 112, 7 };

Now that we have paid for these 512-ish bytes, we can do multiplication very cheaply:

multiply(b1, b2) {
    if (0x00 == b1 || 0x00 == b2) { return 0x00; }

    logAns = logXPlusOne_Of[b1] + logXPlusOne_Of[b2];
    if (logAns >= 255) { logAns -= 255; }
    return xPlusOne_ToThe[logAns];
}

EDIT 2020-10-22: moved Perl script to GitHub