# Efficient multiplication and division in GF(2⁸)

I talked about Rijndael in a couple of previous posts: Sieving irreducible monic polynomials over a finite field, Addition and multiplication table for GF(2²)

I’m going to talk some more about it today.

Rijndael does a lot of arithmetic operations (addition and multiplication) on elements of GF(28)4.

These are represented as polynomials b0 + b1 x + b2 x2 + b3 x3.

The individual coefficients bi are elements of GF(28).

These are themselves polynomials a0 + a1 x + … + a7 x7 where the individual coefficients ai are bits (0 or 1).

Multiplication bi bj is made into a closed operation by taking the modulus of the product under the reduction polynomial m = x8 + x4 + x3 + x + 1.

The reduction polynomial is prime, which is important in establishing that this representation of the Galois field really is a field.

For convenience we will represent the elements of GF(28) as hexadecimal numbers 0x00, 0x01, … 0xfe, 0xff.

By contrast, we will represent the elements of GF(28)4 as vectors (b0 b1 b2 b3); for example, (0x00 0x01 0x02 0x03).

Multiplying vectors in GF(28)4 induces a bunch of multiplications in GF(28). It would be good to come up with a way to make this fast. Doing a polynomial reduction every time is not fast.

Of course, one way to do this is to create a multiplication table with 256 rows and 256 columns, do each multiplication the slow way once, and compile in the results. This would require on the order of 64 kB. But there’s a better way.

The trick that is used in Daemen and Rijmen’s implementation is to find a primitive element e in GF(28) so that the orbit of e (that is to say, the set {e0, e1, …, e254}) spans all of GF(28) – {0x00}.

If we find such an e, then we can save a cheat sheet which is almost as fast as the multiplication table, but requires storing only on the order of 256 bytes.

Well, let’s calculate the orbits of all the elements in order until we find a primitive element.

Orbit(0x00) = { 0x000, 0x001, 0x002, … } = { 0x01, 0x00, 0x00, … } = { 0x00, 0x01 }. Nope.
Orbit(0x01) = { 0x010, 0x011, 0x012, … } = { 0x01, 0x01, 0x01, … } = { 0x01 }. Even worse.
Orbit(0x02) = { 0x020, 0x021, 0x022, … } = { 0x01, 0x02, 0x04, …, 0x8d }. This looks promising at first but we end up with an orbit of only 51 out of the necessary 255 elements (note that 51 divides 255:)

0x020 = 0x01
0x021 = 0x02
0x022 = 0x04
0x023 = 0x08

0x0245 = 0x1d
0x0246 = 0x3a
0x0247 = 0x74
0x0248 = 0xe8
0x0249 = 0xcb
0x0250 = 0x8d
And then we circle back around to…
0x0251 = 0x01

Well, let’s keep looking. What about 0x03?

Orbit(0x03) = { 0x030, 0x031, 0x032, … } = { 0x01, 0x03, 0x05, …, 0xf6 }. Bingo, we hit all 255 non-0x00 elements!

0x030 = 0x01
0x031 = 0x03
0x032 = 0x05
0x033 = 0x0f

0x03250 = 0x6c
0x03251 = 0xb4
0x03252 = 0xc7
0x03253 = 0x52
0x03254 = 0xf6
… and only now do we circle around to…
0x03255 = 0x01

This gives us a one-to-one mapping between the 255 powers 0 through 254 and the 255 non-zero bytes 0x01 through 0xff.

Here is the Perl script I used to generate these

Now that we have this orbit, we use it to construct two lookup tables:

```// 255 entries xPlusOne_ToThe[0] = 0x01 to xPlusOne_ToThe[254] = 0xf6
xPlusOne_ToThe[] = { 0x01, 0x03, 0x05, ..., 0x52, 0xf6 };```

And the inverse mapping:

```// 255 entries logXPlusOne_Of[0x01] = 0 to logXPlusOne_Of[0xff] = 7
logXPlusOne_Of[] = { UNDEFINED, 0, 25, 1, 50, ..., 112, 7 };```

Now that we have paid for these 512-ish bytes, we can do multiplication very cheaply:

```multiply(b1, b2) {
if (0x00 == b1 || 0x00 == b2) { return 0x00; }

logAns = logXPlusOne_Of[b1] + logXPlusOne_Of[b2];
if (logAns >= 255) { logAns -= 255; }
return xPlusOne_ToThe[logAns];
}```

EDIT 2020-10-22: moved Perl script to GitHub