Solving x = r₁ mod m₁, x = r₂ mod m₂, … x = rₙ mod mₙ using the Chinese Remainder Theorem

Suppose we have an unknown number of objects. When counted in threes, 2 are left over, when counted in fives, 3 are left over, and when counted in sevens, 2 are left over. How many objects are there?

Sunzi Suanjing, 3rd to 5th century CE

I coded up a PowerShell implementation of the Chinese Remainder Theorem which solves sets of simultaneous modular equations like this. The above story problem can be translated as:

Find x that satisfies the following modular equations:

  • x = 2 mod 3
  • x = 3 mod 5
  • x = 2 mod 7

The answer it comes up with is 23:

> .\chinese-remainder-theorem.ps1 -moduli (3, 5, 7) -remainders (2, 3, 2)
23

Checking, this does indeed meet the criteria:

  • 23 = 3(7) + 2 = 2 mod 3
  • 23 = 5(4) + 3 = 3 mod 5
  • 23 = 7(3) + 2 = 2 mod 7

Actually, combining the equations only tells us that x = 23 mod 105. There are infinitely many such x, in particular 23, 128, 233, 338, …. The general term is 23 + 105n.
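
Under the hood, the standard constructive approach is to combine the equations one modulus at a time. Here's a minimal sketch in PowerShell – illustrative only, not necessarily how my script does it, and assuming the moduli are pairwise coprime and small enough for 64-bit arithmetic:

# CRT sketch: solve x = r_i mod m_i for pairwise-coprime moduli m_i.
function Get-ChineseRemainder([long[]]$moduli, [long[]]$remainders) {
    $m = [long]1
    foreach ($mi in $moduli) { $m *= $mi }   # combined modulus

    $x = [long]0
    for ($i = 0; $i -lt $moduli.Count; $i++) {
        $mi = $moduli[$i]
        $others = $m / $mi                   # product of the other moduli
        # Find the inverse of $others mod $mi by brute force
        $inverse = [long]1
        while (($others % $mi) * $inverse % $mi -ne 1) { $inverse++ }
        $x = ($x + $remainders[$i] * $others % $m * $inverse) % $m
    }
    return $x
}

Get-ChineseRemainder -moduli (3, 5, 7) -remainders (2, 3, 2)   # 23

The brute-force modular inverse is fine for a sketch; a real implementation would use the extended Euclidean algorithm.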

Generating random numbers, part 4 – generate a random System.Numerics.BigInteger using System.Random

This is a PowerShell implementation of the two algorithms described in part 3. System.Random is the source of entropy. I return a System.Numerics.BigInteger X such that min <= X < max for a given min and max. To simplify things I define N = max – min, generate a random number such that 0 <= X < N, and then return X + min at the end.

Here’s an example invocation:

.\random-biginteger.ps1 -min 0 -max ([System.Numerics.BigInteger]::Pow(2, 255) - 19)
5694168000071045347676722039165259190054180319143041473488317085119130071360

Helper function Get-RandomBit keeps a reservoir of 8 random bits from System.Random and hands them out one at a time, replenishing it as necessary.
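
A reservoir like that might look like the following sketch (the names and details here are my guesses at an implementation, not necessarily the script's actual code):

# Illustrative bit reservoir fed by System.Random.
$script:random = New-Object System.Random
$script:reservoir = 0   # byte holding leftover random bits
$script:bitsLeft = 0    # how many bits of the reservoir are still unused

function Get-RandomBit {
    if ($script:bitsLeft -eq 0) {
        # Replenish: draw a fresh byte's worth of entropy
        $script:reservoir = $script:random.Next(0, 256)
        $script:bitsLeft = 8
    }
    $bit = $script:reservoir -band 1
    $script:reservoir = $script:reservoir -shr 1
    $script:bitsLeft--
    return $bit
}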

Get-RandomBigIntegerNaive uses the “naïve algorithm” of building the whole X and then comparing it to N at the end.

Get-RandomBigInteger uses the “better algorithm” of building X a bit at a time, comparing against the corresponding bit of N as we go, until it’s clear whether the ultimate X will be < N or >= N.

As a performance optimization, Get-RandomBigInteger switches from operating on individual bits to operating on whole byte sequences as soon as it can. Helper function Get-RandomBytes simplifies this. Unfortunately PowerShell does not support assigning to array slices. I didn’t bother with this performance optimization in the naïve algorithm.

Let d = ⌊log2 N⌋ so 2^d <= N <= 2^(d + 1) – 1. Note that d + 1 is the number of bits in N. We expect the following entropy usage from each algorithm:

N               Bits of entropy used
                Naïve    Better
2^d             2d + 2   d
2^d + 1         2d + 2   d + 4
2^(d + 1) – 1   d + 1    d + 1

It is possible to improve the performance of the naïve algorithm to use only d bits of entropy in the N = 2^d case, instead of 2d + 2. I didn’t bother.

Get-BitsOfEntropyUsed and Clear-BitsOfEntropyUsed allow measuring this in testing. Here are some results with N = 2^6 = 64 to 2^7 = 128, showing the transition from d = 6 to d = 7. Other than the powers of 2 at the beginning and the end, we expect the naïve algorithm to gradually ramp down from 2(6) + 2 = 14 to 6 + 1 = 7, and we expect the better algorithm to gradually ramp down from 6 + 4 = 10 to 6 + 1 = 7.

Flips per integer 64: naive 58.919, better 6
Flips per integer 65: naive 13.832, better 9.816
Flips per integer 66: naive 13.755, better 9.571
Flips per integer 67: naive 13.349, better 9.616
Flips per integer 68: naive 13.286, better 9.575
Flips per integer 69: naive 12.719, better 9.506
Flips per integer 70: naive 12.782, better 9.299
Flips per integer 71: naive 12.138, better 9.117
Flips per integer 72: naive 12.803, better 8.978
Flips per integer 73: naive 12.033, better 8.966
Flips per integer 74: naive 12.138, better 9.085
Flips per integer 75: naive 12.187, better 8.855
Flips per integer 76: naive 12.026, better 8.725
Flips per integer 77: naive 11.564, better 8.592
Flips per integer 78: naive 11.62, better 8.595
Flips per integer 79: naive 11.347, better 8.362
Flips per integer 80: naive 10.857, better 8.476
Flips per integer 81: naive 11.361, better 8.793
Flips per integer 82: naive 10.955, better 8.59
Flips per integer 83: naive 10.325, better 8.444
Flips per integer 84: naive 10.64, better 8.433
Flips per integer 85: naive 10.276, better 8.334
Flips per integer 86: naive 10.682, better 8.244
Flips per integer 87: naive 10.059, better 8.12
Flips per integer 88: naive 10.276, better 8.024
Flips per integer 89: naive 10.087, better 8.183
Flips per integer 90: naive 9.905, better 8.017
Flips per integer 91: naive 10.024, better 8.015
Flips per integer 92: naive 10.087, better 8.006
Flips per integer 93: naive 9.723, better 7.909
Flips per integer 94: naive 9.716, better 7.804
Flips per integer 95: naive 9.107, better 7.81
Flips per integer 96: naive 8.939, better 7.668
Flips per integer 97: naive 9.373, better 8.196
Flips per integer 98: naive 9.072, better 7.969
Flips per integer 99: naive 9.086, better 8.208
Flips per integer 100: naive 8.939, better 8.085
Flips per integer 101: naive 8.792, better 8.095
Flips per integer 102: naive 8.687, better 7.929
Flips per integer 103: naive 8.743, better 7.849
Flips per integer 104: naive 8.575, better 7.785
Flips per integer 105: naive 8.421, better 7.769
Flips per integer 106: naive 8.498, better 7.72
Flips per integer 107: naive 8.435, better 7.66
Flips per integer 108: naive 8.288, better 7.656
Flips per integer 109: naive 8.26, better 7.577
Flips per integer 110: naive 8.148, better 7.507
Flips per integer 111: naive 8.057, better 7.497
Flips per integer 112: naive 8.127, better 7.432
Flips per integer 113: naive 7.931, better 7.7
Flips per integer 114: naive 7.791, better 7.541
Flips per integer 115: naive 7.805, better 7.523
Flips per integer 116: naive 7.721, better 7.479
Flips per integer 117: naive 7.693, better 7.461
Flips per integer 118: naive 7.567, better 7.386
Flips per integer 119: naive 7.518, better 7.323
Flips per integer 120: naive 7.336, better 7.272
Flips per integer 121: naive 7.35, better 7.297
Flips per integer 122: naive 7.364, better 7.327
Flips per integer 123: naive 7.322, better 7.145
Flips per integer 124: naive 7.273, better 7.155
Flips per integer 125: naive 7.168, better 7.143
Flips per integer 126: naive 7.133, better 7.084
Flips per integer 127: naive 7.042, better 7.042
Flips per integer 128: naive 16.208, better 7

.\Tests.ps1

And here are some results with N = 2^30 = 1073741824 to 2^31 = 2147483648. We expect the naïve algorithm to gradually ramp down from 2(30) + 2 = 62 to 30 + 1 = 31, and we expect the better algorithm to gradually ramp down from 30 + 4 = 34 to 30 + 1 = 31.

Flips per integer 1073741824: naive 60.605, better 30
Flips per integer 1073741825: naive 62.031, better 33.81
Flips per integer 1073741826: naive 62, better 34.251
Flips per integer 1073741827: naive 61.597, better 33.896
Flips per integer 1073741828: naive 61.132, better 33.944
Flips per integer 1073741829: naive 62.899, better 33.932

Flips per integer 2147483643: naive 31, better 31
Flips per integer 2147483644: naive 31, better 31
Flips per integer 2147483645: naive 31, better 31
Flips per integer 2147483646: naive 31, better 31
Flips per integer 2147483647: naive 31, better 31
Flips per integer 2147483648: naive 63.552, better 31

.\Tests.ps1

Testing primeness of really big numbers with the Miller-Rabin algorithm

Cryptography deals with very large primes quite a bit, for example 2^255 – 19 = 57896044618658097711785492504343953926634992332820282019728792003956564819949.

There are lots of ways to test whether a number p is prime.

One of the techniques that sees a lot of use in practice is the Miller–Rabin primality test. I wrote an implementation in PowerShell. Here’s some example output, testing the prime 2^255 – 19 and the composite 2^255 – 63:

.\miller-rabin.ps1 -prime ([System.Numerics.BigInteger]::Pow(2, 255) - 19) -rounds 10000
57896044618658097711785492504343953926634992332820282019728792003956564819949 = 2^2 * 14474011154664524427946373126085988481658748083205070504932198000989141204987 + 1
All witnesses state that 57896044618658097711785492504343953926634992332820282019728792003956564819949 COULD BE PRIME

.\miller-rabin.ps1 -prime ([System.Numerics.BigInteger]::Pow(2, 255) - 63) -rounds 10000
57896044618658097711785492504343953926634992332820282019728792003956564819905 = 2^6 * 904625697166532776746648320380374280103671755200316906558262375061821325311 + 1
11328271172437154995197519604260009402971642463559414996414256118294343278719^(2^r * 904625697166532776746648320380374280103671755200316906558262375061821325311) with r = 0 to (6 - 1) has no 1 or -1 so 57896044618658097711785492504343953926634992332820282019728792003956564819905 is DEFINITELY COMPOSITE

The way Miller-Rabin works is kind of like Among Us. We have a player p who claims to be an innocent prime, but might actually be an evil composite.

So we surround p with witnesses who keep an eye on her activities. If we see her vent or kill, then AH-HAH! She is a composite! Vote her off!

If we don’t see her vent or kill, even though we have lots of witnesses around her, then there are two possibilities: either she’s legitimately an innocent crewmate, or she’s a very sneaky impostor who is able to pull off her killing and venting in such a way as to evade the gaze of our witnesses.

So Miller-Rabin will never tell you for sure that a number is prime. It will only tell you a number is DEFINITELY COMPOSITE or PROBABLY PRIME.
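
For reference, here's a minimal sketch of a single Miller–Rabin round using System.Numerics.BigInteger – the names are mine, not necessarily the script's:

# One Miller–Rabin round (illustrative sketch).
# Returns $false if witness $w proves $p is DEFINITELY COMPOSITE,
# $true if $p COULD BE PRIME as far as $w can tell.
function Test-MillerRabinRound([System.Numerics.BigInteger]$p, [System.Numerics.BigInteger]$w) {
    # Write p - 1 = 2^s * d with d odd
    $d = $p - 1
    $s = 0
    while ($d % 2 -eq 0) { $d = $d / 2; $s++ }

    $x = [System.Numerics.BigInteger]::ModPow($w, $d, $p)
    if ($x -eq 1 -or $x -eq $p - 1) { return $true }

    # Square up to s - 1 more times, looking for -1 (that is, p - 1)
    for ($r = 1; $r -lt $s; $r++) {
        $x = [System.Numerics.BigInteger]::ModPow($x, 2, $p)
        if ($x -eq $p - 1) { return $true }
    }
    return $false   # no 1 or -1 appeared: definitely composite
}

Running many rounds with independently chosen random witnesses w, and getting $true every time, is what justifies reporting COULD BE PRIME.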

Generating random numbers, part 3 – generate a random integer using random bits

So far we’ve looked at generating a random choice out of seven, or out of five. How do we generalize this?

Problem statement

Given a positive integer N, choose a random non-negative integer X uniformly distributed among {0, 1, … N – 1} = {x: 0 <= x < N}, using only flips of a fair coin. If F is the number of flips required, try to minimize E(F).

The first blog post dealt with the case where N = 7, and the second where N = 5.

Write N in base 2 as n_d 2^d + n_(d – 1) 2^(d – 1) + … + n_1 2^1 + n_0 2^0 = Σ_(i = 0 to d) n_i 2^i, where d = ⌊log2 N⌋ (the largest integer <= log2 N) and each n_i ∈ {0, 1}.

Note that by construction, n_d = 1.

For ease of notation, we can write N = 0b n_d n_(d – 1) … n_1 n_0. We will put a ` before every n_(4k + 3) as a separator, in much the same way as , is used as a separator in decimal numbers.

Some examples:

N     Binary        d   n_i
7     0b111         2   n_2 = 1, n_1 = 1, n_0 = 1
5     0b101         2   n_2 = 1, n_1 = 0, n_0 = 1
128   0b1000`0000   7   n_7 = 1, n_6 = 0, n_5 = 0, n_4 = 0, n_3 = 0, n_2 = 0, n_1 = 0, n_0 = 0
129   0b1000`0001   7   n_7 = 1, n_6 = 0, n_5 = 0, n_4 = 0, n_3 = 0, n_2 = 0, n_1 = 0, n_0 = 1
204   0b1100`1100   7   n_7 = 1, n_6 = 1, n_5 = 0, n_4 = 0, n_3 = 1, n_2 = 1, n_1 = 0, n_0 = 0
255   0b1111`1111   7   n_7 = 1, n_6 = 1, n_5 = 1, n_4 = 1, n_3 = 1, n_2 = 1, n_1 = 1, n_0 = 1

We wish to choose X = 0b x_d x_(d – 1) … x_1 x_0 = Σ_(i = 0 to d) x_i 2^i where each x_i ∈ {0, 1} and X < N.

If N = 2^d, this is easy. x_d must be 0, and the other x_i can be assigned arbitrarily. We can do this in a single pass with exactly d coin flips. But what if N is not a power of 2?

A naïve algorithm

The naïve algorithm used in the first blog post is:

  1. Start with X ← 0.
  2. Repeat the following steps exactly d + 1 times.
    • Let X ← 2X.
    • Flip a coin. This is x_i.
    • If it comes up Tails, let X ← X + 1.
  3. If X >= N then start over with step 1.
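
As a sketch in PowerShell, assuming a Get-RandomBit helper that returns a random 0 or 1 (one "coin flip"):

# Naïve algorithm sketch: build all d + 1 bits of X, then accept or reject.
# $d = ⌊log2 N⌋, so d + 1 is the number of bits in N.
function Get-RandomNaiveSketch([System.Numerics.BigInteger]$n, [int]$d) {
    while ($true) {
        $x = [System.Numerics.BigInteger]::Zero
        for ($i = 0; $i -le $d; $i++) {
            $x = $x * 2 + (Get-RandomBit)   # X ← 2X, plus the new bit
        }
        if ($x -lt $n) { return $x }        # otherwise start over
    }
}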

Performance of the naïve algorithm

How many flips does this take?

Let r be the chance that the algorithm produces a result on the first pass. Let F1 be the number of flips it takes on the first pass, and let F be the number of flips it takes overall.

Then E(F) = E(F1) + r(0) + (1 – r) E(F) – with probability r the first pass succeeds and we need no further flips, and with probability 1 – r we are back where we started. Solving for E(F) we have E(F) = E(F1) / r.

  • r = N/2^(d + 1). This is somewhere between 1/2 and 1.
  • E(F1) = d + 1.
  • So E(F) = (d + 1) 2^(d + 1) / N.

In the best case, N = 2^(d + 1) – 1 and all the n_i are 1. E(F) = (d + 1) 2^(d + 1) / (2^(d + 1) – 1) = (d + 1) / (1 – 2^(–(d + 1))). This is basically d + 1, divided by a corrective term that gets closer and closer to 1 as d increases.

In the worst case, N = 2^d + 1 and all the n_i are 0 except n_d = n_0 = 1. E(F) = (d + 1) 2^(d + 1) / (2^d + 1) = 2(d + 1) / (1 + 2^(–d)). This is basically 2d + 2, divided by a corrective term that gets closer and closer to 1 as d increases.

So if we blur our eyes a little bit, the number of flips for the naïve algorithm varies from d + 1 to 2d + 2 (with a small corrective factor) depending on the density of 1s in the n_i.

A better algorithm

The key to the better algorithm hinted at in part 2 is to do much the same thing, but stop flipping as soon as the current pass is doomed. This lowers E(F1).

Find the most significant “on” bit, n_d = 1. Also find the least significant “on” bit, n_s = 1 with n_i = 0 for all i < s.

These may be the same, in which case N is a power of 2.

N     Binary        Most significant “on” bit   Least significant “on” bit
7     0b111         0b100                       0b001
5     0b101         0b100                       0b001
128   0b1000`0000   0b1000`0000                 0b1000`0000
129   0b1000`0001   0b1000`0000                 0b0000`0001
204   0b1100`1100   0b1000`0000                 0b0000`0100
255   0b1111`1111   0b1000`0000                 0b0000`0001

If N is a power of 2, the algorithm is simple. Set x_d = 0, and each of the other x_i to a random bit. This can be done in d coin flips. Call this running the table on x_(d – 1) through x_0.

Otherwise, generate random bits one at a time, starting at x_d. If your x_i < n_i, run the table and you’re done. If your x_i > n_i, you’re doomed; abort the pass and start over. If your x_i = n_i then you’re on the edge; move on to the next bit and repeat until you’ve exhausted x_s.

More explicitly:

  1. Start with i ← d.
  2. Generate a random bit x_i.
  3. If x_i < n_i, we have won. Run the table on x_(i – 1) through x_0. We have chosen X = 0b x_d x_(d – 1) … x_1 x_0.
  4. If x_i > n_i, we’re doomed, because X > N, which is too big. No result. Discard all the x_i generated so far and go back to step 1.
  5. If x_i = n_i, we’re on the edge. See if we have any “on” bits remaining:
    • If i > s, we still have a chance – move on to the next bit. Save x_i, set i ← i – 1, and loop around to step 2 again.
    • If i = s, we’re doomed, because X >= N, which is too big. No result. Discard all the x_i generated so far and go back to step 1.
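
Here's a sketch of this loop in PowerShell, again assuming a Get-RandomBit helper, and assuming N is not a power of 2 (that case is handled separately with exactly d flips):

# Better-algorithm sketch: compare X against N bit by bit, bail out early.
function Get-RandomBetterSketch([System.Numerics.BigInteger]$n, [int]$d) {
    while ($true) {                           # each iteration is one pass
        $x = [System.Numerics.BigInteger]::Zero
        for ($i = $d; $i -ge 0; $i--) {
            $ni = ($n / [System.Numerics.BigInteger]::Pow(2, $i)) % 2  # bit i of N
            $xi = Get-RandomBit
            $x = $x * 2 + $xi
            if ($xi -lt $ni) {
                # Won: any completion of X is < N, so run the table
                for ($j = $i - 1; $j -ge 0; $j--) { $x = $x * 2 + (Get-RandomBit) }
                return $x
            }
            if ($xi -gt $ni) { break }        # doomed: X would be > N; retry
            # $xi equals bit i of N: on the edge, keep going
        }
        # Fell off the end (X = N, too big) or doomed: discard and start over
    }
}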

Performance of the better algorithm

How many coin flips does this take?

As before, if N = 2^d then E(F) = d.

As before, if N is not a power of 2:

  • E(F) = E(F1) / r
  • r = N / 2^(d + 1)

As before, in the best case, N = 2^(d + 1) – 1 and all the n_i are 1. E(F1) = d + 1 and E(F) = (d + 1) / (1 – 2^(–(d + 1))).

What about the worst case, N = 2^d + 1, where n_d = 1, n_0 = 1, and all the rest of the n_i are 0?

As before, r = N / 2^(d + 1) = (2^d + 1) / 2^(d + 1) = 2^(–1) + 2^(–(d + 1)). This is basically 1/2 plus a corrective factor that gets smaller as d increases.

But E(F1) is lower now. Let’s calculate it:

Half of the time, x_d = 0 and we get a result in d + 1 flips.

1/4 of the time, x_d = 1 and x_(d – 1) = 1 and we get no result in 2 flips.

1/8 of the time, x_d = 1, x_(d – 1) = 0, and x_(d – 2) = 1 so we get no result in 3 flips.

1/2^i of the time, x_d = 1, x_(d – 1) = 0 through x_(d – i + 2) = 0, and x_(d – i + 1) = 1 so we get no result in i flips.

1/2^d of the time, x_d = 1, x_(d – 1) = 0 through x_2 = 0, and x_1 = 1 so we get no result in d flips.

1/2^(d + 1) of the time, x_d = 1, x_(d – 1) = 0 through x_1 = 0, and x_0 = 1 so we get no result in d + 1 flips.

1/2^(d + 1) of the time, x_d = 1, and x_(d – 1) = 0 through x_0 = 0, so we get a result of X = 2^d = N – 1 in d + 1 flips.

Adding these up, we have:

E(F1) = (d + 1)/2 + (2/4 + 3/8 + … + i/2^i + … + d/2^d + (d + 1)/2^(d + 1)) + (d + 1)/2^(d + 1)

= (d + 1)(2^(–1) + 2^(–(d + 1))) + Σ_(i = 2 to d + 1) i/2^i

= (d + 1)(2^(–1) + 2^(–(d + 1))) – 1/2 + Σ_(i = 1 to d + 1) i/2^i

So E(F) = E(F1) / r = ((d + 1)(2^(–1) + 2^(–(d + 1))) – 1/2 + Σ_(i = 1 to d + 1) i/2^i) / (2^(–1) + 2^(–(d + 1)))

= d + 1 – 1/(1 + 2^(–d)) + 2(Σ_(i = 1 to d + 1) i/2^i) / (1 + 2^(–d))

As d gets large, 2^(–d) gets close to 0, and Σ_(i = 1 to d + 1) i/2^i gets close to 2. (For a derivation of this fact see Solution: baby boys and baby girls.) So:

E(F) approaches d + 1 – 1/(1 + 0) + 2(2) / (1 + 0) = d + 4.

So the number of flips for the better algorithm varies from d + 1 to d + 4 (with a small corrective factor) depending on the density of 1s in ni. This is a vast improvement over the naïve algorithm which varied from d + 1 to 2d + 2.

Next time, we’ll take a look at an implementation of this algorithm in PowerShell which generates a System.Numerics.BigInteger using System.Random.

Generating random numbers, part 2 – using a coin to choose a weekday

Last time we talked about how to use a fair coin to generate a random day of the week.

What if instead of generating a day of the week, we want to generate a weekday? That is, Saturday and Sunday shouldn’t count.

We can, of course, use exactly the same approach as before:

Coin flip               Weekday
Heads, Heads, Heads     Monday
Heads, Heads, Tails     Tuesday
Heads, Tails, Heads     Wednesday
Heads, Tails, Tails     Thursday
Tails, Heads, Heads     Friday
Tails, Heads, Tails     No result
Tails, Tails, Heads     No result
Tails, Tails, Tails     No result

This time, E(L) = 1 + (3/8) E(L) so E(L) = 8/5 = 1.6 so E(F) = 24/5 = 4.8. Curiously, it takes more flips to choose an element of a smaller set!

Suppose that entropy is expensive. For example, suppose that the coin is one of the larger Rai stones used by the Yapese. We would like to flip this as few times as possible.

Rai stone

Can we design a strategy that does better than E(F) = 4.8 without sacrificing randomness?

KEY INSIGHT: If the first two of the coin flips come up (Tails, Tails), then it doesn’t matter what the result of the third flip is – either way, we’re guaranteed a “no result”. So don’t bother to flip the coin the third time.

NEW STRATEGY: Flip the coin twice and perform the appropriate action below.

  • (Heads, Heads): flip the coin once more and perform the appropriate action below.
    • Heads: Monday.
    • Tails: Tuesday.
  • (Heads, Tails): flip the coin once more and perform the appropriate action below.
    • Heads: Wednesday.
    • Tails: Thursday.
  • (Tails, Heads): flip the coin and perform the appropriate action below.
    • Heads: Friday.
    • Tails: no result.
  • (Tails, Tails): no result.
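
A quick sketch of this strategy in PowerShell (Get-CoinFlip here is a stand-in for whatever entropy source you have):

function Get-CoinFlip { Get-Random -Minimum 0 -Maximum 2 }   # 0 = Heads, 1 = Tails

$days = 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'

function Get-RandomWeekday {
    while ($true) {
        $first = Get-CoinFlip
        $second = Get-CoinFlip
        # (Tails, Tails): guaranteed no result, so skip the third flip
        if ($first -eq 1 -and $second -eq 1) { continue }
        $index = 4 * $first + 2 * $second + (Get-CoinFlip)
        if ($index -lt 5) { return $days[$index] }   # 5 = (Tails, Heads, Tails): no result
    }
}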

As before, E(L) = 1 + (3/8) E(L) so E(L) = 8/5 = 1.6. So we expect to have to repeat the procedure the same number of times.

But this time, F = 3L no longer holds! It’s cheaper than that.

Let F1 be the random variable for the number of flips ONLY in the first pass. What is E(F1)?

If the first two flips are (Heads, Tails) then F1 = 2. For all other results, F1 = 3. So E(F1) = (3/4) 3 + (1/4) 2 = 11/4.

Putting it together, E(F) = E(F1) E(L) = (11/4) (8/5) = 22/5 = 4.4. We’re saving 0.4 of a flip every time! This may not seem like a lot, but it adds up.

Exercise: we have designed strategies to choose elements from sets of size 7 and size 5. How do we generalize this approach?

Generating random numbers, part 1 – using a coin to choose a day of the week

Suppose you have a fair coin that you can flip. By “fair” I mean that it lands Heads or Tails with equal probability. And suppose you want to pick a day of the week at random, where each day of the week is chosen with equal probability.

  1. Is this possible?
  2. Assuming it is possible, how many times would you expect to flip the coin, on average?

Here’s the approach that I came up with:

Flip the coin three times.

This will give you one of eight different results, with equal probability. See the table below:

Coin flip               Day of the week
Heads, Heads, Heads     Sunday
Heads, Heads, Tails     Monday
Heads, Tails, Heads     Tuesday
Heads, Tails, Tails     Wednesday
Tails, Heads, Heads     Thursday
Tails, Heads, Tails     Friday
Tails, Tails, Heads     Saturday
Tails, Tails, Tails     No result

If you get “no result,” flip the coin three more times and consult the table again. Repeat as necessary until you get a result.
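
As a sketch in PowerShell (Get-CoinFlip is a stand-in for a fair coin):

function Get-CoinFlip { Get-Random -Minimum 0 -Maximum 2 }   # 0 = Heads, 1 = Tails

$days = 'Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'

function Get-RandomDayOfWeek {
    while ($true) {
        # Three flips give an index from 0 (Heads, Heads, Heads) to 7 (Tails, Tails, Tails)
        $index = 4 * (Get-CoinFlip) + 2 * (Get-CoinFlip) + (Get-CoinFlip)
        if ($index -lt 7) { return $days[$index] }   # 7 means “no result”: flip again
    }
}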

How many times would you expect to flip the coin, on average?

Let F be the random variable which is the number of times we flip the coin before getting a result. We’re trying to find the expectation of F, which is written E(F).

To simplify things a little, let L be the random variable which is the number of times we look at the table before getting a result. For example, if we get Heads, Heads, Tails on the first time, L = 1; if we get Tails, Tails, Tails and then Heads, Tails, Heads then L = 2. Since we flip the coin three times for each time we consult the table, F = 3L.

So what is E(L)?

We have to consult the table at least once. And there’s a 1/8 chance that we’ll have to consult it more than once.

KEY INSIGHT: this 1/8 of the time, we’re exactly back where we started.

That is to say, E(L) = 1 + (1/8) E(L).

Solving for E(L) we have E(L) = 8/7 ≈ 1.143. This means E(F) = E(3L) = 3 E(L) = 24/7 ≈ 3.429.

So with this approach, you would expect to flip the coin, on average, about 3.429 times.

Exercise: what if we want to limit the options to Monday, Tuesday, Wednesday, Thursday, or Friday?

Calculating discrete logarithms modulo a prime, using Shanks’ baby-step/giant-step algorithm

Suppose you’re working in a prime field GF(p); you have a generator g; a desired non-zero value x; and you want to find the power y that gives you g^y = x mod p.

We can write this as y = log_g x mod p – this is called the “discrete logarithm” problem.

As a toy example, suppose you’re working in GF(17) and you have generator 3 and desired value 12. You want to find y = log_3 12 mod 17. Spoiler alert: y = 13, because 3^13 = 12 mod 17.

The most straightforward solution is to start with g0 = 1 and repeatedly multiply by g until you get the desired x. Then y is just the number of times you had to multiply.

Using our toy example:

  • 3^0 = 1 mod 17
  • 3^1 = 1 * 3 = 3 mod 17
  • 3^2 = 3 * 3 = 9 mod 17
  • 3^3 = 3 * 9 = 10 mod 17
  • 3^4 = 3 * 10 = 13 mod 17
  • …
  • 3^12 = 3 * 7 = 4 mod 17
  • 3^13 = 3 * 4 = 12 mod 17

So y = 13 is the desired logarithm.

This takes O(p) time and constant space. If you like you can use this approach to generate a lookup table, which will take O(p) space, but will allow you to perform future logarithms in constant time. Start with an empty array and hop around filling it as you go until you get this:

x = 3^y mod 17   0    1   2    3   4    5   6    7    8    9   10   11   12   13   14   15   16
y                N/A  0   14   1   12   5   15   11   10   2   3    7    13   4    9    6    8

If p is large (for example 2^255 – 19) then this approach fails miserably because there just isn’t enough time in the universe (and if you want a lookup table, there isn’t enough space in the universe either.)

Mathematician Daniel Shanks (who we met last time in Calculating square roots modulo a prime, using the Tonelli-Shanks algorithm) found a faster algorithm which runs in O(√p) time at the cost of also requiring O(√p) space; it is called the baby-step giant-step algorithm.

The idea is to find the smallest integer whose square is at least p – call this s. (For p = 17, s = 5.)

Then we can write every number from 0 to p – 1 as a two-digit number in base s:

0 = (0, 0) 5 = (1, 0) 10 = (2, 0) 15 = (3, 0)
1 = (0, 1) 6 = (1, 1) 11 = (2, 1) 16 = (3, 1)
2 = (0, 2) 7 = (1, 2) 12 = (2, 2)
3 = (0, 3) 8 = (1, 3) 13 = (2, 3)
4 = (0, 4) 9 = (1, 4) 14 = (2, 4)

In general, we write (i, j) = is + j where 0 ≤ i, j < s.

The “baby step” part is to precompute the logarithms for all (0, j) and store them in a lookup table. This takes O(√p) time and O(√p) space:

  • 3^(0, 0) = 1 mod 17
  • 3^(0, 1) = 3 mod 17
  • 3^(0, 2) = 9 mod 17
  • 3^(0, 3) = 10 mod 17
  • 3^(0, 4) = 13 mod 17

(We need the lookup table to be able to answer the questions “do we know the logarithm of x” and “if so, what is the logarithm of x” in constant time.)

If the lookup table contains our desired x, then return the corresponding y = (0, j) and we’re done.

Alas for our toy example, the lookup table does not contain 3^(0, j) = 12 mod 17. So we keep going.

Take a “giant step”. Instead of targeting x, target g^(–s) x. If we find such a y′ = (0, j) in our lookup table, then return y′ + s = (1, j). This works because g^y′ = g^(–s) x mod p, which means x = g^(y′ + s) mod p, which means log_g x = y′ + s mod p, which means y = y′ + s mod p.

FIRST GIANT STEP: 3^(–5) = 7 mod 17 so our new target is 12 * 7 = 16 mod 17. Alas, the lookup table does not contain this either. So we keep going.

Repeat the “giant step” process until the lookup table contains the logarithm. Return y′ + is = (i, j) = is + j mod p where i is the number of giant steps you ultimately had to take. This works for the same reason taking the first giant step worked.

SECOND GIANT STEP: new target is 16 * 7 = 10 mod 17. The lookup table tells us 3^(0, 3) = 10 mod 17. Yay! But we had to take two giant steps to reach this, which means instead of (0, 3) being the answer, it’s actually (2, 3). So the final answer is 2 * 5 + 3 = 13 mod 17, which checks.

Of course, if p is large enough, even this faster algorithm is too slow.

Here’s an implementation in PowerShell:

> .\baby-step-giant-step.ps1 -p 17 -g 3 -x 12
Find y so that 3^y = 12 mod 17
log_3 12 mod 17 = 13
3^13 mod 17 = 12
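
For reference, here's a minimal sketch of the algorithm – illustrative only, not necessarily the script's actual code, and limited to moduli that fit in a long:

# Baby-step giant-step sketch: find y with g^y = x mod p, for prime p.
function Get-DiscreteLogSketch([long]$p, [long]$g, [long]$x) {
    $s = [long][Math]::Ceiling([Math]::Sqrt($p))   # smallest s with s^2 >= p

    # Baby steps: table[g^j mod p] = j for j = 0 to s - 1
    $table = @{}
    $power = [long]1
    for ($j = 0; $j -lt $s; $j++) {
        if (-not $table.ContainsKey($power)) { $table[$power] = $j }
        $power = ($power * $g) % $p
    }

    # g^(-s) mod p, via Fermat's little theorem: g^(p - 1 - s) mod p
    $giant = [long]([System.Numerics.BigInteger]::ModPow($g, $p - 1 - $s, $p))

    # Giant steps: target x, then g^(-s) x, then g^(-2s) x, ...
    $target = $x % $p
    for ($i = 0; $i -lt $s; $i++) {
        if ($table.ContainsKey($target)) { return $i * $s + $table[$target] }
        $target = ($target * $giant) % $p
    }
    return $null   # x is not a power of g
}

Get-DiscreteLogSketch -p 17 -g 3 -x 12   # 13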

Calculating square roots modulo a prime, using the Tonelli-Shanks algorithm

Suppose you are working in a prime field GF(p) and you want to know whether a particular element n is a square – that is to say, whether there is some other element r that satisfies r^2 = n mod p.

Sometimes it’s trivial. 0 is always a square, regardless of p, because 0^2 = 0 mod p. Or if p is the prime 2, then both of the options for n are squares, because 0^2 = 0 mod 2 and 1^2 = 1 mod 2.

Other than these trivial cases, there is a quick way to tell whether a given n is a square – raise it to the (p – 1) / 2 power, mod p. This will either be the number 1 or the number p – 1.

If n^((p – 1) / 2) = 1 mod p then n is a square mod p. If n^((p – 1) / 2) = p – 1 mod p, then n is not a square.

This is called Euler’s criterion.

Some quick examples:

  • 159^((257 – 1) / 2) = 159^128 = 1 mod 257, so 159 IS a square mod 257
  • 160^((257 – 1) / 2) = 160^128 = 256 mod 257, so 160 IS NOT a square mod 257
  • 4855^((7883 – 1) / 2) = 4855^3941 = 1 mod 7883, so 4855 IS a square mod 7883
  • 4856^((7883 – 1) / 2) = 4856^3941 = 7882 mod 7883, so 4856 IS NOT a square mod 7883

So if all you care about is whether there is a square root, it’s fast and easy.
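
With System.Numerics.BigInteger, Euler's criterion is a one-liner – a sketch, with a function name of my own invention:

# Euler's criterion: for odd prime p and n nonzero mod p,
# n is a square mod p exactly when n^((p - 1) / 2) = 1 mod p.
function Test-IsSquareModP([System.Numerics.BigInteger]$n, [System.Numerics.BigInteger]$p) {
    return [System.Numerics.BigInteger]::ModPow($n, ($p - 1) / 2, $p) -eq 1
}

Test-IsSquareModP 159 257   # True
Test-IsSquareModP 160 257   # False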

But what if you want to know what the square root is?

Besides the trivial cases above, you have a decent chance to get lucky. If p = 3 mod 4 there is an easy way to find the square root – raise n to the (p + 1) / 4 power. This is one square root – call it r. The other square root is p – r.

Example:

  • 4855 is a square mod 7883, and 7883 mod 4 = 3. So the square roots are 4855^((7883 + 1) / 4) = 4855^1971 = 6047 and 7883 – 6047 = 1836. As a check, 6047^2 = 4855 mod 7883 and 1836^2 = 4855 mod 7883, as desired.
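
That shortcut is equally short in PowerShell (again a sketch with a hypothetical name):

# Square root mod p for p = 3 mod 4, given that n is a square mod p.
function Get-SquareRootP3Mod4([System.Numerics.BigInteger]$n, [System.Numerics.BigInteger]$p) {
    return [System.Numerics.BigInteger]::ModPow($n, ($p + 1) / 4, $p)
}

Get-SquareRootP3Mod4 4855 7883   # 6047; the other root is 7883 - 6047 = 1836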

But what if p = 1 mod 4, like the p = 257 example?

In that case, there’s no way to immediately write down the answer. But there is an algorithm, discovered by Alberto Tonelli in 1891, and rediscovered by Daniel Shanks in 1973. This is called the Tonelli-Shanks algorithm.

You start by dividing p – 1 into a power of 2 and an odd part. For example, 7883 – 1 = 3941 * 2^1 and 257 – 1 = 1 * 2^8.

Then you make an initial guess for r which is probably wrong.

Then you perform a series of operations, which leads you to another, better, guess.

You keep making guesses until you get it right. You may have to make a guess for every power of 2 that divides p – 1, but no more than that. This isn’t exactly immediate, but it is a heck of a lot better than trying different values of r until you eventually find one.
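
In outline, the algorithm looks like the following sketch (illustrative; the real script also prints its progress, as shown below):

# Tonelli–Shanks sketch: find r with r^2 = n mod p, for odd prime p
# and n a square mod p.
function Get-SquareRootModP([System.Numerics.BigInteger]$n, [System.Numerics.BigInteger]$p) {
    # p - 1 = q * 2^m with q odd
    $q = $p - 1
    $m = 0
    while ($q % 2 -eq 0) { $q = $q / 2; $m++ }

    # Find a nonsquare z by trial, using Euler's criterion
    $z = [System.Numerics.BigInteger]::new(2)
    while ([System.Numerics.BigInteger]::ModPow($z, ($p - 1) / 2, $p) -eq 1) { $z = $z + 1 }

    # Initial (probably wrong) guess r, plus bookkeeping values
    $c = [System.Numerics.BigInteger]::ModPow($z, $q, $p)
    $t = [System.Numerics.BigInteger]::ModPow($n, $q, $p)
    $r = [System.Numerics.BigInteger]::ModPow($n, ($q + 1) / 2, $p)

    while ($t -ne 1) {
        # Find the least i with t^(2^i) = 1 mod p
        $i = 0
        $t2i = $t
        while ($t2i -ne 1) { $t2i = ($t2i * $t2i) % $p; $i++ }

        # b = c^(2^(m - i - 1)) mod p, used to improve the guess
        $b = $c
        for ($j = 0; $j -lt $m - $i - 1; $j++) { $b = ($b * $b) % $p }

        $m = $i
        $c = ($b * $b) % $p
        $t = ($t * $c) % $p
        $r = ($r * $b) % $p
    }
    return $r   # the other square root is p - r
}

For p = 257 and n = 159 this sketch lands on r = 38, in agreement with the script output below.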

I implemented a PowerShell script that calculates the square root of a given n modulo a given prime p, using the Tonelli-Shanks algorithm. Here is an input and output for a rather high power of 2:

> .\tonelli-shanks.ps1 -p 257 -n 159
Looking for r so that r^2 = 159 mod 257
257 - 1 = 1 * 2^8
3 is a nonsquare modulo 257
Initial: m = 8, c = 3, t = 159, r = 159
159^(2^7) = 1 mod 257
3^(2^(8 - 7 - 1)) = 3 mod 257
Loop 1: m = 7, c = 9, t = 146, r = 220
146^(2^6) = 1 mod 257
9^(2^(7 - 6 - 1)) = 9 mod 257
Loop 2: m = 6, c = 81, t = 4, r = 181
4^(2^3) = 1 mod 257
81^(2^(6 - 3 - 1)) = 249 mod 257
Loop 3: m = 3, c = 64, t = 256, r = 94
256^(2^1) = 1 mod 257
64^(2^(3 - 1 - 1)) = 241 mod 257
Loop 4: m = 1, c = 256, t = 1, r = 38
Solutions:
38^2 = 159 mod 257
219^2 = 159 mod 257

Among Us, the Axiom of Choice, and f(x) = x^a

I was watching Among Us on Twitch yesterday and I noticed that there were two players called “aoc” and “ilhan”. That’s funny, I thought – there’s a mathematician playing Twitch who is calling themselves AOC, referring, of course, to the Axiom of Choice.

Gradually I came to realize that AOC actually stood for Alexandria Ocasio-Cortez – and then that the Twitch player in question actually was Alexandria Ocasio-Cortez – and the other was her Squad-mate Ilhan Omar.

But anyway, what is the Axiom of Choice?

Suppose you have a bunch of sets A_1, A_2, A_3, …, A_n, …. For example, every match of Among Us is a group of players.

And suppose you know that all of these sets are non-empty. This is true for the example – every match has at least four players.

It is natural to suppose that you can choose an element of each of these sets. Pick a player a_1 from match A_1, a player a_2 from match A_2, …, a player a_n from match A_n, … and eventually you have a bunch of players a_1, a_2, …, a_n, …

If the number of sets is finite, this is a plain truth. Even if the number of sets is infinite, it still seems reasonable intuitively – but this is the Axiom of Choice and it does not follow from the rules usually used to define set theory. You can believe it, or not, and either way the mathematical world that follows hangs together. If you believe it, this is called taking the Axiom of Choice. 

A lot of non-mathematicians and applied mathematicians are bothered by this. They want to know is the Axiom of Choice true or not? Well, it is, or it isn’t, depending on whether you want it to be. This is actually quite a common theme in pure mathematics. It’s about building worlds – systems of rules – and then using the tools available in that world to draw conclusions. Whether the world we built follows the same rules as the world we live in is often irrelevant, so long as you know at any given time which world you’re pretending to be in. 

Consider the function f(x) = x^a where x is a real number and a is some constant to be defined later. Is f(x) defined for x < 0? It is, or it isn’t, depending on what world we want to live in. Let’s build some worlds and see what I mean.

The simplest world is where a is always an integer. In that case, the answer is “yes, f(x) is defined for x < 0.” Here’s a picture showing a = -2, -1, 0, 1, and 2. (I’m not going to go into what happens at x = 0 for f(x) = x^0.*)

graph of x^(-2), x^(-1), x^0, x^1, x^2

Then there’s the world where a is always a rational number p/q. In that case the answer is “maybe: f(x) is defined for x < 0 if q is odd, and is not defined when q is even.**” Here’s a picture showing a = 1/2, 1/3, 2/3. Note that whether f is defined or not depends on whether the fraction is reduced! Consider 1/3 vs. 2/6.

graph of x^(1/2), x^(1/3), x^(2/3)

Finally there’s the world where a is any real number. In that case the answer is “no, f(x) is not defined for x < 0.” This allows us to consider f as a one-dimensional slice of the two-dimensional surface g(x, y) = x^y, which is continuous and nicely behaved (no, I’m still not going to go into what happens at (x, y) = (0, 0)*.) Here’s a picture of a = 0.5, 0.333…, 0.666…

graph of x^0.5, x^0.333…, x^0.666…

Each of these three worlds exists and is perfectly consistent within itself. Which world we want to visit and do our math in depends on the problem we are trying to solve – each has its own tools. 

End of blog post. 

Why are you still reading? That’s the end. I made my point. With examples and everything. 

Ugh, fine. Okay, there’s a fourth, unifying, world. 

Or we could live in the world where f(z) = z^a, where z is a complex number. Then yes, f(z) is defined for all z, and it remains defined even if we later restrict z to be real and negative. Here’s a picture of the complex map with a = 1/2:

complex map of f(z) = z^(1/2)

* 0^0 = 1. Fight me.

** Also note that, when defined, f(x) with x < 0 is positive if p is even and negative if p is odd. 

Iterating over permutations, part 3 – Steinhaus-Johnson-Trotter algorithm

We talked about Narayana Pandita’s method for iterating over permutations in 1300s India, and also B. R. Heap’s completely different algorithm from 1963 Britain.

Yet a third, also completely different, algorithm was discovered by American mathematician Hale Trotter in 1962, and, independently, American mathematician Selmer Johnson in 1963. This is called the Steinhaus–Johnson–Trotter algorithm. I have a PowerShell implementation of this too.

The key concept is to assign each element in the permutation a direction – either left or right. Use < to indicate that the element is looking left, and > to indicate that the element is looking right.

Example: <A <B <C <D

Then the algorithm is as follows:

  1. Start with the permutation in ascending order. All elements start looking left.
  2. Define a mobile element as one which is larger than the element it is looking at.
  3. Find the largest mobile element x. (If no elements are mobile, then you are done iterating.)
  4. Swap x with the element it is looking at. This is the next permutation.
  5. Reverse the direction for all elements greater than x.
  6. Repeat steps 2 through 5 until step 3 fails.
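
Here's a compact sketch of the algorithm in PowerShell (illustrative – not necessarily how my implementation does it):

# Steinhaus–Johnson–Trotter sketch. Directions: -1 = looking left, +1 = looking right.
function Get-Permutations([object[]]$items) {
    $perm = $items.Clone()
    $dir = @(-1) * $perm.Count           # all elements start looking left
    ,$perm.Clone()                       # emit the starting permutation

    while ($true) {
        # Find the largest mobile element: one larger than the element it is looking at
        $mobile = -1
        for ($i = 0; $i -lt $perm.Count; $i++) {
            $j = $i + $dir[$i]
            if ($j -ge 0 -and $j -lt $perm.Count -and
                $perm[$i] -gt $perm[$j] -and
                ($mobile -lt 0 -or $perm[$i] -gt $perm[$mobile])) {
                $mobile = $i
            }
        }
        if ($mobile -lt 0) { return }    # no mobile elements: done iterating

        # Swap the mobile element with the element it is looking at
        $x = $perm[$mobile]
        $j = $mobile + $dir[$mobile]
        $tmp = $perm[$mobile]; $perm[$mobile] = $perm[$j]; $perm[$j] = $tmp
        $tmp = $dir[$mobile]; $dir[$mobile] = $dir[$j]; $dir[$j] = $tmp

        # Reverse the direction of all elements greater than x
        for ($i = 0; $i -lt $perm.Count; $i++) {
            if ($perm[$i] -gt $x) { $dir[$i] = -$dir[$i] }
        }
        ,$perm.Clone()                   # emit the next permutation
    }
}

Get-Permutations ('A', 'B', 'C') | ForEach-Object { $_ -join ' ' }

For ('A', 'B', 'C') this emits the six permutations A B C, A C B, C A B, C B A, B C A, B A C – note the largest element bouncing back and forth.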

Working through the example:

  • <A <B <C <D – move D to the left
  • <A <B <D <C – move D to the left
  • <A <D <B <C – move D to the left
  • <D <A <B <C – move C to the left and reverse direction of D
  • D> <A <C <B – move D to the right
  • <A D> <C <B – move D to the right
  • <A <C D> <B – move D to the right
  • <A <C <B D> – move C to the left and reverse direction of D
  • <C <A <B <D – move D to the left
  • <C <A <D <B – move D to the left
  • <C <D <A <B – move D to the left
  • <D <C <A <B – move B to the left and reverse direction of C and D
  • D> C> <B <A – move D to the right
  • C> D> <B <A – move D to the right
  • C> <B D> <A – move D to the right
  • C> <B <A D> – move C to the right and reverse direction of D
  • <B C> <A <D – move D to the left
  • <B C> <D <A – move D to the left
  • <B <D C> <A – move D to the left
  • <D <B C> <A – move C to the right and reverse direction of D
  • D> <B <A C> – move D to the right
  • <B D> <A C> – move D to the right
  • <B <A D> C> – move D to the right
  • <B <A C> D> – no mobile elements, we’re done

Note that the largest element (in this case, D) goes back and forth, bouncing off of the ends of the permutation.

So (n – 1)/n of the time, the next permutation can be found very quickly (in constant time) by just keeping track of the largest element.

The other 1/n of the time, the largest element is stuck, and a scan is necessary to find the largest mobile element (or lack thereof,) and then to reverse the direction of larger elements. This takes time proportional to n and unsticks the largest element again.

So on average it takes constant time to find the next permutation: O(1) + (O(n)/n) = O(1).