I talked about Rijndael in a couple of previous posts: Sieving irreducible monic polynomials over a finite field, Addition and multiplication table for GF(2²)

I’m going to talk some more about it today.

Rijndael does a lot of arithmetic operations (addition and multiplication) on elements of GF(2^{8})^{4}.

These are represented as polynomials *b*_{0} + *b*_{1} *x* + *b*_{2} *x*^{2} + *b*_{3} *x*^{3}.

The individual coefficients *b*_{i} are elements of GF(2^{8}).

These are themselves polynomials *a*_{0} + *a*_{1} *x* + … + *a*_{7} *x*^{7} where the individual coefficients *a*_{i} are bits (0 or 1).

Multiplication *b*_{i} *b*_{j} is made into a closed operation by taking the modulus of the product under the reduction polynomial *m* = *x*^{8} + *x*^{4} + *x*^{3} + *x* + 1.

The reduction polynomial is prime, which is important in establishing that this representation of the Galois field really is a field.

For convenience we will represent the elements of GF(2^{8}) as hexadecimal numbers **0x00**, **0x01**, … **0xfe**, **0xff**.

By contrast, we will represent the elements of GF(2^{8})^{4} as vectors (*b*_{0} *b*_{1} *b*_{2} *b*_{3}); for example, (**0x00** **0x01** **0x02** **0x03**).

Multiplying vectors in GF(2^{8})^{4} induces a bunch of multiplications in GF(2^{8}). It would be good to come up with a way to make this fast. Doing a polynomial reduction every time is not fast.

Of course, one way to do this is to create a multiplication table with 256 rows and 256 columns, do each multiplication the slow way once, and compile in the results. This would require on the order of 64 kB. But there’s a better way.

The trick that is used in Daemen and Rijmen’s implementation is to find a primitive element *e* in GF(2^{8}) so that the *orbit* of *e* (that is to say, the set {*e*^{0}, *e*^{1}, …, *e*^{254}}) spans all of GF(2^{8}) – {**0x00**}.

If we find such an *e*, then we can save a cheat sheet which is almost as fast as the multiplication table, but requires storing only on the order of 256 bytes.

Well, let’s calculate the orbits of all the elements in order until we find a primitive element.

Orbit(**0x00**) = { **0x00**^{0}, **0x00**^{1}, **0x00**^{2}, … } = { **0x01**, **0x00**, **0x00**, … } = { **0x00**, **0x01** }. Nope.

Orbit(**0x01**) = { **0x01**^{0}, **0x01**^{1}, **0x01**^{2}, … } = { **0x01**, **0x01**, **0x01**, … } = { **0x01** }. Even worse.

Orbit(**0x02**) = { **0x02**^{0}, **0x02**^{1}, **0x02**^{2}, … } = { **0x01**, **0x02**, **0x04**, …, **0x8d** }. This looks promising at first but we end up with an orbit of only 51 out of the necessary 255 elements (note that 51 divides 255:)

**0x02**^{0} = **0x01****0x02**^{1} = **0x02****0x02**^{2} = **0x04****0x02**^{3} = **0x08**

…**0x02**^{45} = **0x1d****0x02**^{46} = **0x3a****0x02**^{47} = **0x74****0x02**^{48} = **0xe8****0x02**^{49} = **0xcb****0x02**^{50} = **0x8d**

And then we circle back around to…**0x02**^{51} = **0x01**

Well, let’s keep looking. What about **0x03**?

Orbit(**0x03**) = { **0x03**^{0}, **0x03**^{1}, **0x03**^{2}, … } = { **0x01**, **0x03**, **0x05**, …, **0xf6** }. Bingo, we hit all 255 non-**0x00** elements!

**0x03**^{0} = **0x01****0x03**^{1} = **0x03****0x03**^{2} = **0x05****0x03**^{3} = **0x0f**

…**0x03**^{250} = **0x6c****0x03**^{251} = **0xb4****0x03**^{252} = **0xc7****0x03**^{253} = **0x52****0x03**^{254} = **0xf6**

… and only now do we circle around to…**0x03**^{255} = **0x01**

This gives us a one-to-one mapping between the 255 powers 0 through 254 and the 255 non-zero bytes **0x01** through **0xff**.

Here is the Perl script I used to generate these

Now that we have this orbit, we use it to construct two lookup tables:

// 255 entries xPlusOne_ToThe[0] = 0x01 to xPlusOne_ToThe[254] = 0xf6 xPlusOne_ToThe[] = { 0x01, 0x03, 0x05, ..., 0x52, 0xf6 };

And the inverse mapping:

// 255 entries logXPlusOne_Of[0x01] = 0 to logXPlusOne_Of[0xff] = 7 logXPlusOne_Of[] = { UNDEFINED, 0, 25, 1, 50, ..., 112, 7 };

Now that we have paid for these 512-ish bytes, we can do multiplication very cheaply:

multiply(b1, b2) { if (0x00 == b1 || 0x00 == b2) { return 0x00; } logAns = logXPlusOne_Of[b1] + logXPlusOne_Of[b2]; if (logAns >= 255) { logAns -= 255; } return xPlusOne_ToThe[logAns]; }

EDIT 2020-10-22: moved Perl script to GitHub

## 3 thoughts on “Efficient multiplication and division in GF(2⁸)”