How safe is the prime 2²⁵⁵ – 19?

A while back we talked about safe primes and unsafe primes and we gave some small examples of safe primes and unsafe primes.

Recap: A has a secret key a and a public key (ga mod p) with a known prime p and generator g. It is crucial that an attacker not be able to take the discrete logarithm a = logg ga mod p. If they could do that, the attacker would have A‘s private key, without even having to eavesdrop on any communications!

The Pohlig–Hellman algorithm allows an attacker to perform this discrete logarithm. It takes time proportional to the largest prime factor of p – 1. So if p – 1’s prime factors are all small, the attacker can extract private keys easily from public keys. But if there is at least one large prime factor of p – 1, then we’re OK; the larger, the better.

So the unsafest primes are the Fermat primes where p = 2n + 1 for some n. The largest prime factor of p – 1 is only 2!

The safest primes are those of the form p = 2q + 1 where q is a Sophie Germain prime. The largest prime factor of p – 1 is (p – 1) / 2 = q.

How safe is the prime p = 2255 – 19 = 57896044618658097711785492504343953926634992332820282019728792003956564819949? Let’s factor p – 1 and see.

p – 1 = 2255 – 19 – 1 = 2255 – 20 is divisible by 2 – and even by 4 – but not by 8. So we have p – 1 = 22(2253 – 5). So far so good.

What about 2253 – 5? It’s NOT divisible by 5. What about by 3?

With a little bit of thought we can see that it is divisible by 3. 2 = -1 mod 3 and 5 = -1 mod 3, so substituting we have 2253 – 5 = (-1)253 – (-1) = -1 – (-1) = 0 mod 3. (2253 – 5)/3 doesn’t seem to have a nice concise form but we can just write it out in full.

So far we have p – 1 = 22 3 4824670384888174809315457708695329493886249361068356834977399333663047068329. What about 4824670384888174809315457708695329493886249361068356834977399333663047068329?

A quick poke at it with Miller-Rabin shows that it is NOT prime but does not offer a hint as to any factors. So we turn to brute-force trial division and after some time we find that it is divisible by 65147.

So far we have p – 1 = 22 3 65147 74058212732561358302231226437062788676166966415465897661863160754340907. What about 74058212732561358302231226437062788676166966415465897661863160754340907?

Miller-Rabin suggests that this COULD BE prime. In fact, it is prime. Here’s a proof.

The largest prime factor of p – 1 is pretty big. So p is pretty safe. Good.

We also have to take care when choosing a generator not to fall into any of the small cycles of length 2a 3b 65147c with a ∈ {0, 1, 2}, b ∈ {0, 1}, c ∈ {0, 1}.

ElGamal signing and signature verification but with bigger numbers

Last time we looked at ElGamal encryption and decryption but with bigger numbers. This time let’s look at signing and signature verification.

As before we choose the prime p = 2255 – 19 and the generator g = 2. We’ll let A keep her private key x = 37472449428228375763830797714320932193767211994174601506224685902244496215115 and its associated public key gx mod p = 15440027450603317705548401356286425311467546169515779468886789260083908558514.

We’ll choose a different message this time:

> cd random-biginteger
> .\random-biginteger.ps1 -min 0 -max $p
3271986818346674557032810660246111638748594301259911249641768028517949236833
> $m = [bigint]::Parse("3271986818346674557032810660246111638748594301259911249641768028517949236833")
> $m
3271986818346674557032810660246111638748594301259911249641768028517949236833

A wants to sign the message. She passes the signing routine the prime p, the generator g, her private key x, and the message m. It returns the signature (r, s).

> cd elgamal
> .\sign.ps1 -prime $p -generator $g -signerPrivateKey $x -message $m
Signer chooses a random number k = 24618296997275480932053592551920025862416126779629427099702631482415856443479 such that gcd(k = 24618296997275480932053592551920025862416126779629427099702631482415856443479, p - 1 = 57896044618658097711785492504343953926634992332820282019728792003956564819948) = 1
Signer calculates the signature (r, s) = (3025781994294452178840475154004769894024031449788332246545481203211043144236, 20885180761607031118214763642348557339838023745006310985614125237008647415483)
These are calculated so g^m = y^r r^s mod p
Take r = g^k
Now we have g^m = g^(x r) g^(k s) mod p
Which is to say m = x r + k s mod (p - 1)
We want to solve for s - this is why we needed gcd(k, p - 1) is 1
    * r = (g = 2)^(k = 24618296997275480932053592551920025862416126779629427099702631482415856443479) mod (p = 57896044618658097711785492504343953926634992332820282019728792003956564819949) = 3025781994294452178840475154004769894024031449788332246545481203211043144236
    * s which makes (m = 3271986818346674557032810660246111638748594301259911249641768028517949236833) = (x = 37472449428228375763830797714320932193767211994174601506224685902244496215115) * (r = 3025781994294452178840475154004769894024031449788332246545481203211043144236) + (k = 24618296997275480932053592551920025862416126779629427099702631482415856443479) * (s = 20885180761607031118214763642348557339838023745006310985614125237008647415483) mod (p - 1 = 57896044618658097711785492504343953926634992332820282019728792003956564819948)
Signature is (r, s) = (3025781994294452178840475154004769894024031449788332246545481203211043144236, 20885180761607031118214763642348557339838023745006310985614125237008647415483)
> $signature = @([bigint]::Parse("3025781994294452178840475154004769894024031449788332246545481203211043144236"), [bigint]::Parse("20885180761607031118214763642348557339838023745006310985614125237008647415483"))
> $signature
3025781994294452178840475154004769894024031449788332246545481203211043144236
20885180761607031118214763642348557339838023745006310985614125237008647415483

Now B, the reader of the message, wants to verify that A really wrote it. He passes the signature verification routine the prime p, the generator g, A‘s public key gx, the message m, and the signature (r, s). It returns whether the signature is valid.

It should be valid…

> .\verify-signature.ps1 -prime $p -generator $g -signerPublicKey $g_x -message $m -signature $signature
Signature verifier calculates (g = 2)^(m = 3271986818346674557032810660246111638748594301259911249641768028517949236833) = 14784288759268166937333969698279404318706460490042691327601374955158440934277
Signature verifier calculates = (y = 15440027450603317705548401356286425311467546169515779468886789260083908558514)^(r = 3025781994294452178840475154004769894024031449788332246545481203211043144236) (r = 3025781994294452178840475154004769894024031449788332246545481203211043144236)^(s = 20885180761607031118214763642348557339838023745006310985614125237008647415483) mod (p = 57896044618658097711785492504343953926634992332820282019728792003956564819949) = 14784288759268166937333969698279404318706460490042691327601374955158440934277
Signature is valid
Signature is valid.

… and it is!

Now suppose someone had tampered with the message, e.g. replacing m with m – 1, and tried to pass it off as the original, with A‘s signature. The signature should now be invalid…

> .\verify-signature.ps1 -prime $p -generator $g -signerPublicKey $g_x -message ($m - 1) -signature $signature
Signature verifier calculates (g = 2)^(m = 3271986818346674557032810660246111638748594301259911249641768028517949236832) = 36340166688963132324559731101311679122670726411431486673665083479557502877113
Signature verifier calculates = (y = 15440027450603317705548401356286425311467546169515779468886789260083908558514)^(r = 3025781994294452178840475154004769894024031449788332246545481203211043144236) (r = 3025781994294452178840475154004769894024031449788332246545481203211043144236)^(s = 20885180761607031118214763642348557339838023745006310985614125237008647415483) mod (p = 57896044618658097711785492504343953926634992332820282019728792003956564819949) = 14784288759268166937333969698279404318706460490042691327601374955158440934277
Signature is invalid
Signature is invalid. 

… and it is!

So the same routines all work even if the numbers are big.

ElGamal encryption and decryption but with bigger numbers

Earlier I blogged about implementing ElGamal encrypt, decrypt, sign, and verify signature. I had coded up a toy example which used small numbers to show how the math worked.

Now let’s do the same thing but with bigger numbers. We’ll just do encryption and decryption this time.

First we need a prime p. Before we used p = 101. This time we’ll use p = 2255 – 19:

> $p = [bigint]::pow(2, 255) - 19
> $p
5789604461865809771178549250434395392663499233282028201972879200395656481994

That’s nice and big.

Now we need a generator g. Last time we used g = 2. We’ll use it again, but make sure using Tonelli-Shanks that 2 is not a square mod p.

> cd tonelli-shanks
> .\tonelli-shanks.ps1 -p $p -n 2
Looking for r so that r^2 = 2 mod 57896044618658097711785492504343953926634992332820282019728792003956564819949
Skipping primality check since 57896044618658097711785492504343953926634992332820282019728792003956564819949 is large
r^2 = 2 mod 57896044618658097711785492504343953926634992332820282019728792003956564819949 has no solutions
> $g = [bigint]2
> $g
2

It’s important to do this check. If we had chosen g = 3, for example, the check would have failed, because 3 = 150298394704333910222651756369397732876262961010368454990880792759863347428352 mod 57896044618658097711785492504343953926634992332820282019728792003956564819949.

> .\tonelli-shanks.ps1 -p $p -n 3
Looking for r so that r^2 = 3 mod 57896044618658097711785492504343953926634992332820282019728792003956564819949
Skipping primality check since 57896044618658097711785492504343953926634992332820282019728792003956564819949 is large
57896044618658097711785492504343953926634992332820282019728792003956564819949 - 1 = 14474011154664524427946373126085988481658748083205070504932198000989141204987 * 2^2
2 is a nonsquare modulo 57896044618658097711785492504343953926634992332820282019728792003956564819949
Initial: m = 2, c = 19681161376707505956807079304988542015446066515923890162744021073123829784752, t = 1, r = 15029839470433391022265175636939773287626296101036845499088079275986334742835
Solutions:
15029839470433391022265175636939773287626296101036845499088079275986334742835^2 = 3 mod 57896044618658097711785492504343953926634992332820282019728792003956564819949
42866205148224706689520316867404180639008696231783436520640712727970230077114^2 = 3 mod 57896044618658097711785492504343953926634992332820282019728792003956564819949

So we use g = 2.

Now we need a recipient A. A needs a private key x which determines her public key gx mod p. Generate x at random between 0 (inclusive) and p (exclusive).

> cd random-biginteger
> .\random-biginteger.ps1 -min 0 -max $p
37472449428228375763830797714320932193767211994174601506224685902244496215115
> $x = [bigint]::Parse("37472449428228375763830797714320932193767211994174601506224685902244496215115")
> $x
37472449428228375763830797714320932193767211994174601506224685902244496215115
> $g_x = [bigint]::ModPow($g, $x, $p)
> $g_x
15440027450603317705548401356286425311467546169515779468886789260083908558514

We want to encrypt a message so that only A can read it, using her public key. So we need a plaintext message m. Generate m at random in similar fashion.

> .\random-biginteger.ps1 -min 0 -max $p
22594335182736179043379580066191528397421893149471409561964986373640506607991
> $m = [bigint]::Parse("22594335182736179043379580066191528397421893149471409561964986373640506607991")
> $m
22594335182736179043379580066191528397421893149471409561964986373640506607991

Now we’re ready to encrypt! Give the encryption routine the prime p, the generator g, the recipient’s public key gx, and the plaintext message m. It will return the ciphertext (c1, c2).

> cd elgamal
> .\encrypt.ps1 -prime $p -generator $g -recipientPublicKey $g_x -clearText $m
Encryptor chooses a random number k = 6095530983035662394415401229300894157247477933650052462284854503095419296799
Encryptor calculates the key K = (y = 15440027450603317705548401356286425311467546169515779468886789260083908558514)^(k = 6095530983035662394415401229300894157247477933650052462284854503095419296799) mod (p = 57896044618658097711785492504343953926634992332820282019728792003956564819949) = 652128534616280580041338757470258024490635838501926874091259885745358830160
Encryptor calculates the encrypted message (c1, c2) = (25051380001384562314959307721043701048687850352094011138061884132819862724667, 46849668020527693475159053259226199454750762501012280228897524972061871412373)
    * c1 = (g = 2)^(k = 6095530983035662394415401229300894157247477933650052462284854503095419296799) mod (p = 57896044618658097711785492504343953926634992332820282019728792003956564819949) = 25051380001384562314959307721043701048687850352094011138061884132819862724667
    * c2 = (K = 652128534616280580041338757470258024490635838501926874091259885745358830160)(m = 22594335182736179043379580066191528397421893149471409561964986373640506607991) mod (p = 57896044618658097711785492504343953926634992332820282019728792003956564819949) = 46849668020527693475159053259226199454750762501012280228897524972061871412373
Encryption is (c1, c2) = (25051380001384562314959307721043701048687850352094011138061884132819862724667, 46849668020527693475159053259226199454750762501012280228897524972061871412373)
> $c = @([bigint]::Parse("25051380001384562314959307721043701048687850352094011138061884132819862724667"), [bigint]::Parse("46849668020527693475159053259226199454750762501012280228897524972061871412373"))
> $c
25051380001384562314959307721043701048687850352094011138061884132819862724667
46849668020527693475159053259226199454750762501012280228897524972061871412373

We pass the ciphertext (c1, c2) to A and now she wants to read it. She gives the decryption routine the prime p, the generator g, her private key x, and the ciphertext (c1, c2). It had better return the plaintext m

> .\decrypt.ps1 -prime $p -generator $g -recipientPrivateKey $x -cipherText $c
Recipient wants to decrypt the message
Recipient recovers K = (c1 = 25051380001384562314959307721043701048687850352094011138061884132819862724667)^(x = 37472449428228375763830797714320932193767211994174601506224685902244496215115) mod (p = 57896044618658097711785492504343953926634992332820282019728792003956564819949) = 652128534616280580041338757470258024490635838501926874091259885745358830160
Recipient calculates m = (c2 = 46849668020527693475159053259226199454750762501012280228897524972061871412373)/(K = 652128534616280580041338757470258024490635838501926874091259885745358830160) mod (p = 57896044618658097711785492504343953926634992332820282019728792003956564819949) = 22594335182736179043379580066191528397421893149471409561964986373640506607991
Decrypted message 22594335182736179043379580066191528397421893149471409561964986373640506607991
> $m
22594335182736179043379580066191528397421893149471409561964986373640506607991

… and it does! Yay.

My online identities

These are the places I hang out online, and the handle I use in each places. If you see someone claiming to be me, and it’s not one of the handles on this list, it’s probably not me.

Why did Trump lose?

It is now reasonably clear that Donald Trump has lost his bid for re-election as President of the United States.

Trump says he lost because Biden cheated. If that’s true, we should try to find out how, so we can make sure it doesn’t happen again.

But if it isn’t true – if the election fairly represented the will of the people – there must be another explanation for why Trump lost.

What is it?

In 1948, economist Duncan Black published an article On the Rationale of Group Decision-Making in the Journal of Political Economy.

The TL:DR is “a majority rule voting system will select the outcome most preferred by the median voter” – this is called the median voter theorem.

So how did Trump lose the median voter?

Trump is a bit to the right of center. Clinton was a bit to the left of center. So their election was a bit of a tossup.

Biden is ABSOLUTELY SMACK DAB IN THE MIDDLE OF THE CENTER. Like, if you look up “center” in the dictionary, there’s just a picture of Biden.

So the surprising thing to me is not that Biden won the general election. The surprising thing to me is that Biden won the Democratic nomination!

Okay, so how did Biden win the Democratic nomination?

BECAUSE TRUMP AGGRESSIVELY REJECTED CENTRIST REPUBLICANS.

Here’s a rough picture of the distribution of the electorate:

In his four years as President, Trump has consistently placed party loyalty over every other interest. If you’re a moderate or far-right Republican, this is attractive to you. But if you’re a centrist Republican, this puts you in his cross-hairs a lot. A lot of centrist Republicans toughed it out. But a lot didn’t, and left the Republican party, to become independents or Democrats.

THIS SHIFTED THE MAKEUP OF THE DEMOCRATIC PARTY TO THE CENTER.

Because the Democratic Party was now more centrist, they chose a more centrist candidate. And by the median voter theorem, the more centrist candidate won the election.

Love moderately. Long love doth so.
Too swift arrives as tardy as too slow.

William Shakespeare, Romeo and Juliet

Running node.js on Ubuntu on Windows 10 on ARM64

I have an ARM64 Windows 10 machine as my primary device (basically a Surface Pro X) and I occasionally contribute to open-source projects that are built on top of node.js. When making code changes it is very important to run all the tests, and usually add some new ones too.

Sometimes the tests just work by running node.js on Windows 10 directly. But sometimes there is some peculiarity in the test that makes it run differently on Linux. So I install Ubuntu using Windows Services for Linux (WSL) and run the tests there too.

To get node.js tests running under Ubuntu on Windows 10 on ARM64 here’s what I did:

  1. Install Windows Services for Linux (WSL) as described at Install Windows Subsystem for Linux (WSL) on Windows 10.
  2. Install a Linux distribution as described in the above link Step 6. I chose Ubuntu 20.04 LTS. At this point Ubuntu shows up in the Start menu, so I pin it.
Ubuntu in the Windows 10 Start menu

Ubuntu gives a prompt that looks like this – set up a username and password.

Installing, this may take a few minutes…
Please create a default UNIX user account. The username does not need to match your Windows username.
For more information visit: https://aka.ms/wslusers
Enter new UNIX username: mvaneerde
New password:
Retype new password:
passwd: password updated successfully
Installation successful!
To run a command as administrator (user "root"), use "sudo ".
See "man sudo_root" for details.
 
Welcome to Ubuntu 20.04.1 LTS (GNU/Linux 4.19.128-microsoft-standard aarch64)

  * Documentation:  https://help.ubuntu.com
  * Management:     https://landscape.canonical.com
  * Support:        https://ubuntu.com/advantage

 System information as of Sun Nov 29 07:23:07 PST 2020
 System load:  0.0                Processes:             8
   Usage of /:   [redacted]         Users logged in:       0
   Memory usage: 0%                 IPv4 address for eth0: [redacted]
   Swap usage:   0%

0 updates can be installed immediately.
0 of these updates are security updates.

The list of available updates is more than a week old.
To check for new updates run: sudo apt update


This message is shown once once a day. To disable it please create the
/home/mvaneerde/.hushlogin file.
mvaneerde@[redacted]:~$

Run these commands immediately. If all goes well the second update will tell you that you are up to date.

$ sudo apt update
$ sudo apt upgrade
$ sudo apt update

I store all of my repositories under ${env:userprofile}\source which points to something like C:\Users\mateer\source so to make all of this conveniently accessible in Linux I create a soft link:

$ ln -s /mnt/c/users/mateer/source ~/source

That’s basically it. I install node.js and npm:

$ sudo apt install nodejs
$ sudo apt install npm

And now I can cd to my repository of choice and run sudo npm install to get all the dependencies I need to build and run tests.

Generating random numbers, part 4 – generate a random System.Numerics.BigInteger using System.Random

Posts in this series:

This is a PowerShell implementation of the two algorithms described in part 3. System.Random is the source of entropy. I return a System.Numerics.BigInteger X such that min <= X < max for a given min and max. To simplify things I define N = maxmin, generate a random number such that 0 <= X < N, and then return X + min at the end.

Here’s an example invocation:

5694168000071045347676722039165259190054180319143041473488317085119130071360

.\random-biginteger.ps1 -min 0 -max ([System.Numerics.BigInteger]::Pow(2, 255) – 19)

Helper function Get-RandomBit keeps a reservoir of 8 random bits from System.Random and hands them out one at a time, replenishing it as necessary.

Get-RandomBigIntegerNaive uses the “naïve algorithm” of building the whole X and then comparing it to N at the end.

Get-RandomBigInteger uses the “better algorithm” of building X a bit at a time, comparing against the corresponding bit of N as we go, until it’s clear whether the ultimate X will be < N or >= N.

As a performance optimization, Get-RandomBigInteger switches from operating on individual bits to operating on whole byte sequences as soon as it can. Helper function Get-RandomBytes simplifies this. Unfortunately PowerShell does not support assigning to array slices. I didn’t bother with this performance optimization in the naïve algorithm.

Let d = ⌊ log2 N ⌋ so 2d <= N <= 2d + 1 – 1. Note that d + 1 is the number of bits in N. We expect the following entropy usage from each algorithm:

NBits of entropy used
NaïveBetter
2d2d + 2d
2d + 12d + 2d + 4
2d + 1 – 1d + 1d + 1

It is possible to improve the performance of the naïve algorithm to use only d bits of entropy in the 2d case, instead of 2d + 2. I didn’t bother.

Get-BitsOfEntropyUsed and Clear-BitsOfEntropyUsed allow measuring this in testing. Here are some results with N = 26 = 64 to 27 = 128, showing the transition from d = 6 to d = 7. Other than the power of 2 at the beginning and the end, we expect the naïve algorithm to gradually ramp down from 2(6) + 2 = 14 to 6 + 1 = 7, and we expect the better algorithm to gradually ramp down from 6 + 4 = 10 to 6 + 1 = 7.

Flips per integer 64: naive 58.919, better 6
Flips per integer 65: naive 13.832, better 9.816
Flips per integer 66: naive 13.755, better 9.571
Flips per integer 67: naive 13.349, better 9.616
Flips per integer 68: naive 13.286, better 9.575
Flips per integer 69: naive 12.719, better 9.506
Flips per integer 70: naive 12.782, better 9.299
Flips per integer 71: naive 12.138, better 9.117
Flips per integer 72: naive 12.803, better 8.978
Flips per integer 73: naive 12.033, better 8.966
Flips per integer 74: naive 12.138, better 9.085
Flips per integer 75: naive 12.187, better 8.855
Flips per integer 76: naive 12.026, better 8.725
Flips per integer 77: naive 11.564, better 8.592
Flips per integer 78: naive 11.62, better 8.595
Flips per integer 79: naive 11.347, better 8.362
Flips per integer 80: naive 10.857, better 8.476
Flips per integer 81: naive 11.361, better 8.793
Flips per integer 82: naive 10.955, better 8.59
Flips per integer 83: naive 10.325, better 8.444
Flips per integer 84: naive 10.64, better 8.433
Flips per integer 85: naive 10.276, better 8.334
Flips per integer 86: naive 10.682, better 8.244
Flips per integer 87: naive 10.059, better 8.12
Flips per integer 88: naive 10.276, better 8.024
Flips per integer 89: naive 10.087, better 8.183
Flips per integer 90: naive 9.905, better 8.017
Flips per integer 91: naive 10.024, better 8.015
Flips per integer 92: naive 10.087, better 8.006
Flips per integer 93: naive 9.723, better 7.909
Flips per integer 94: naive 9.716, better 7.804
Flips per integer 95: naive 9.107, better 7.81
Flips per integer 96: naive 8.939, better 7.668
Flips per integer 97: naive 9.373, better 8.196
Flips per integer 98: naive 9.072, better 7.969
Flips per integer 99: naive 9.086, better 8.208
Flips per integer 100: naive 8.939, better 8.085
Flips per integer 101: naive 8.792, better 8.095
Flips per integer 102: naive 8.687, better 7.929
Flips per integer 103: naive 8.743, better 7.849
Flips per integer 104: naive 8.575, better 7.785
Flips per integer 105: naive 8.421, better 7.769
Flips per integer 106: naive 8.498, better 7.72
Flips per integer 107: naive 8.435, better 7.66
Flips per integer 108: naive 8.288, better 7.656
Flips per integer 109: naive 8.26, better 7.577
Flips per integer 110: naive 8.148, better 7.507
Flips per integer 111: naive 8.057, better 7.497
Flips per integer 112: naive 8.127, better 7.432
Flips per integer 113: naive 7.931, better 7.7
Flips per integer 114: naive 7.791, better 7.541
Flips per integer 115: naive 7.805, better 7.523
Flips per integer 116: naive 7.721, better 7.479
Flips per integer 117: naive 7.693, better 7.461
Flips per integer 118: naive 7.567, better 7.386
Flips per integer 119: naive 7.518, better 7.323
Flips per integer 120: naive 7.336, better 7.272
Flips per integer 121: naive 7.35, better 7.297
Flips per integer 122: naive 7.364, better 7.327
Flips per integer 123: naive 7.322, better 7.145
Flips per integer 124: naive 7.273, better 7.155
Flips per integer 125: naive 7.168, better 7.143
Flips per integer 126: naive 7.133, better 7.084
Flips per integer 127: naive 7.042, better 7.042
Flips per integer 128: naive 16.208, better 7

.\Tests.ps1

And here are some results with N = 230 = 1073741824 to 231 = 2147483648. We expect the naïve algorithm to gradually ramp down from 2(30) + 2 = 62 to 30 + 1 = 31, and we expect the better algorithm to gradually ramp down from 30 + 4 = 34 to 30 + 1 = 31.

Flips per integer 1073741824: naive 60.605, better 30
Flips per integer 1073741825: naive 62.031, better 33.81
Flips per integer 1073741826: naive 62, better 34.251
Flips per integer 1073741827: naive 61.597, better 33.896
Flips per integer 1073741828: naive 61.132, better 33.944
Flips per integer 1073741829: naive 62.899, better 33.932

Flips per integer 2147483643: naive 31, better 31
Flips per integer 2147483644: naive 31, better 31
Flips per integer 2147483645: naive 31, better 31
Flips per integer 2147483646: naive 31, better 31
Flips per integer 2147483647: naive 31, better 31
Flips per integer 2147483648: naive 63.552, better 31

.\Tests.ps1

Testing primeness of really big numbers with the Miller-Rabin algorithm

Cryptography deals with very large primes quite a bit, for example 2255 – 19 = 57896044618658097711785492504343953926634992332820282019728792003956564819949.

There are lots of ways to test whether a number p is prime:

One of the techniques that sees a lot of use in practice is the Miller–Rabin primality test. I wrote an implementation in PowerShell. Here’s some example output, testing the prime 2255 – 19, and the composite 2255 – 63:

.\miller-rabin.ps1 -prime ([System.Numerics.BigInteger]::Pow(2, 255) - 19) -rounds 10000
57896044618658097711785492504343953926634992332820282019728792003956564819949 = 2^2 14474011154664524427946373126085988481658748083205070504932198000989141204987 + 1
All witnesses state that 57896044618658097711785492504343953926634992332820282019728792003956564819949 COULD BE PRIME

.\miller-rabin.ps1 -prime ([System.Numerics.BigInteger]::Pow(2, 255) - 63) -rounds 10000
57896044618658097711785492504343953926634992332820282019728792003956564819905 = 2^6 904625697166532776746648320380374280103671755200316906558262375061821325311 + 1
11328271172437154995197519604260009402971642463559414996414256118294343278719^(2^r 904625697166532776746648320380374280103671755200316906558262375061821325311) with r = 0 to (6 - 1) has no 1 or -1 so 57896044618658097711785492504343953926634992332820282019728792003956564819905 is DEFINITELY COMPOSITE

The way Miller-Rabin works is kind of like Among Us. We have a player p who claims to be an innocent prime, but might actually be an evil composite.

So we surround p with witnesses who keep an eye on her activities. If we see her vent or kill, then AH-HAH! She is a composite! Vote her off!

If we don’t see her vent or kill, even though we have lots of witnesses around her, then there are two possibilities: either she’s legitimately an innocent crewmate, or she’s a very sneaky impostor who is able to pull off her killing and venting in such a way as to evade the gaze of our witnesses.

So Miller-Rabin will never tell you for sure that a number is prime. It will only tell you a number is DEFINITELY COMPOSITE or PROBABLY PRIME.

Generating random numbers, part 3 – generate a random integer using random bits

Posts in this series:

So far we’ve looked at generating a random choice out of seven, or out of five. How do we generalize this?

Problem statement

Given a positive integer N, choose a random non-negative integer X uniformly distributed among {0, 1, … N – 1} = {x: 0 <= x < N}, using only flips of a fair coin. If F is the number of flips required, try to minimize E(F).

The first blog post dealt with the case where N = 7, and the second where N = 5.

Write N in base 2 as nd 2d + xd – 1 2d – 1 + … + n1 21 + x0 20 = Σi = 0 to d ni 2i where d is the smallest integer >= log2 N and each ni ∈ {0, 1}.

Note that by construction, nd = 1.

For ease of notation, we can write N = 0bndnd – 1n1n0. We will put a ` before every n4k + 3 as a separator, in much the same way as , is used as separator in decimal numbers.

Some examples:

N Binary d ni
7 0b111 2 n2 = 1, n1 = 1, n0 = 1
5 0b101 2 n2 = 1, n1 = 0, n0 = 1
128 0b1000`0000 7 n7 = 1, n6 = 0, n5 = 0, n4 = 0, n3 = 0, n2 = 0, n1 = 0, n0 = 0
129 0b1000`0001 7 n7 = 1, n6 = 0, n5 = 0, n4 = 0, n3 = 0, n2 = 0, n1 = 0, n0 = 1
204 0b1100`1100 7 n7 = 1, n6 = 1, n5 = 0, n4 = 0, n3 = 1, n2 = 1, n1 = 0, n0 = 0
255 0b1111`1111 7 n7 = 1, n6 = 1, n5 = 1, n4 = 1, n3 = 1, n2 = 1, n1 = 1, n0 = 1

We wish to choose X = 0bxdxd – 1x1x0 = Σi = 0 to d xi 2i where each xi ∈ {0, 1} and X < N.

If N = 2d, this is easy. xd must be 0, and the other xi can be assigned arbitrarily. We can do this a single pass with exactly d coin flips. But what if N is not a power of 2?

A naïve algorithm

The naïve algorithm used in the first blog post is:

  1. Start with X ← 0.
  2. Repeat the following steps exactly d + 1 times.
    • Let X ← 2X.
    • Flip a coin. This is xi.
    • If it comes up Tails, let XX + 1.
  3. If X >= N then start over with step 1.

Performance of the naïve algorithm

How many flips does this take?

Let r be the chance that the algorithm produces a result on the first pass. Let F1 be the number of flips it takes on the first pass, and let F be the number of flips it takes overall.

Then E(F) = E(F1) + r(0) + (1 – r) E(F). Solving for E(F) we have E(F) = E(F1) / r.

  • r = N/2d + 1. This is somewhere between 1/2 and 1.
  • E(F1) = d + 1.
  • So E(F) = (d + 1) 2d + 1 / N.

In the best case, N = 2d + 1 – 1 and all the ni are 1. E(F) = (d + 1) 2d + 1 / (2d + 1 – 1) = (d + 1) / (1 – 2-(d + 1)). This is basically d + 1, divided by a corrective term that gets closer and closer to 1 as d increases.

In the worst case, N = 2d + 1 and all the ni are 0 except nd = n0 = 1. E(F) = (d + 1) (2d + 1) / (2d + 1) = (d + 1) 2 / (1 – 2d). This is basically 2d + 2, divided by a corrective term that gets closer and closer to 1 as d increases.

So if we blur our eyes a little bit, the number of flips for the naïve algorithm varies from d + 1 to 2d + 2 (with a small corrective factor) depending on the density of 1s in ni.

A better algorithm

The key to the better algorithm hinted at in part 2 is to do much the same thing, but stop flipping as soon as the current pass is doomed. This lowers E(F1).

Find the most significant “on” bit nd = 1. Also find the least significant “on” bit ns = 1 with ni = 0 for all i < s.

These may be the same, in which case N is a power of 2.

N Binary Most significant “on” bit Least significant “on” bit
7 0b111 0b100 0b001
5 0b101 0b100 0b001
128 0b1000`0000 0b1000`0000 0b1000`0000
129 0b1000`0001 0b1000`0000 0b0000`0001
204 0b1100`1100 0b1000`0000 0b0000`0100
255 0b1111`1111 0b1000`0000 0b0000`0001

If N is a power of 2, the algorithm is simple. Set xd = 0, and each of the other xi to a random bit. This can be done in d coin flips. Call this running the table on xd – 1 through x0.

Otherwise, generate random bits one at a time, starting at xd. If your xi < ni, run the table and you’re done. If your xi > ni, you’re doomed; abort the pass and start over. If your xi = ni then you’re on the edge; move on to the next bit and repeat until you’ve exhausted xs.

More explicitly:

  1. Start with id.
  2. Generate a random bit xi.
  3. If xi < ni, we have won. Run the table on xi – 1 through x0. We have chosen X = 0bxdxd – 1x1x0.
  4. If xi > ni, we’re doomed, because X > N, which is too big. No result. Discard all the xi generated so far and go back to step 1.
  5. If xi = ni, we’re on the edge. See if we have any “on” bits remaining:
    • If i > s, we still have a chance – move on to the next bit. Save xi, set ii – 1, and loop around to step 2 again.
    • If i = s, we’re doomed, because X >= N, which is too big. No result. Discard all the xi generated so far and go back to step 1.

Performance of the better algorithm

How many coin flips does this take?

As before, if N = 2d then E(F) = d

As before, if N is not a power of 2:

  • E(F) = E(F1) / r
  • r = N / 2d + 1

As before, in the best case, N = 2d + 1 – 1 and all the ni are 1. E(F1) = d + 1 and E(F) = (d + 1) / (1 – 2-(d + 1)).

What about the worst case N = 2d + 1 where nd = 1, n0 = 1, and all the rest of the ni are 0?

As before, r = N / 2d + 1 = (2d + 1) / 2d + 1 = 2-1 + 2-(d + 1). This is basically 1/2 plus a corrective factor that gets smaller as d increases.

But E(F1) is lower now. Let’s calculate it:

Half of the time, xd = 0 and we get a result in d + 1 flips.

1/4 of the time, xd = 1 and xd – 1 = 1 and we get no result in 2 flips.

1/8 of the time, xd = 1, xd – 1 = 0, and xd – 2 = 1 so we get no result in 3 flips.

1/2i of the time, xd = 1, xd – 1 = 0 through xdi + 2 = 0, and xdi + 1 = 1 so we get no result in i flips.

1/2d of the time, xd = 1, xd – 1 = 0 through x2 = 0, and x1 = 1 so we get no result in d flips.

1/2d + 1 of the time, xd = 1, xd – 1 = 0 through x1 = 0, and x0 = 1 so we get no result in d + 1 flips.

1/2d + 1 of the time, xd = 1, and xd – 1 = 0 through x0 = 0, so we get a result of X = 2d = N – 1 in d + 1 flips.

Adding these up, we have:

E(F1) = (d + 1)/2 + (2/4 + 3/8 + … + i/2i + … + d/2d + (d + 1)/2d + 1) + (d + 1)/2d + 1

= (d + 1)(2-1 + 2-(d + 1)) + Σi = 2 to d + 1 i/2i

= (d + 1)(2-1 + 2-(d + 1)) – 1/2 + Σi = 1 to d + 1 i/2i

So E(F) = E(F1) / r = ((d + 1)(2-1 + 2-(d + 1)) – 1/2 + Σi = 1 to d + 1 i/2i) / (2-1 + 2-(d + 1))

= d + 1 – 1/(1 + 2d) + 2(Σi = 1 to d + 1 i/2i) / (1 + 2d)

As d gets large, 2d gets close to 0, and Σi = 1 to d + 1 i/2i gets close to 2. (For a derivation of this fact see Solution: baby boys and baby girls.) So:

E(F1) approaches d + 1 – 1/(1 + 0) + 2(2) / (1 + 0) = d + 4.

So the number of flips for the better algorithm varies from d + 1 to d + 4 (with a small corrective factor) depending on the density of 1s in ni. This is a vast improvement over the naïve algorithm which varied from d + 1 to 2d + 2.

Next time, we’ll take a look at an implementation of this algorithm in PowerShell which generates a System.Numerics.BigInteger using System.Random.

Generating random numbers, part 2 – using a coin to choose a weekday

Posts in this series:

Last time we talked about how to use a fair coin to generate a random day of the week.

What if instead of generating a day of the week, we want to generate a weekday? That is, Saturday and Sunday shouldn’t count.

We can, of course, use exactly the same approach as before:

Coin flipWeekday
Heads, Heads, HeadsMonday
Heads, Heads, TailsTuesday
Heads, Tails, HeadsWednesday
Heads, Tails, TailsThursday
Tails, Heads, HeadsFriday
Tails, Heads, TailsNo result
Tails, Tails, HeadsNo result
Tails, Tails, TailsNo result

This time, E(L) = 1 + (3/8) E(L) so E(L) = 8/5 = 1.6 so E(F) = 24/5 = 4.8. Curiously, it takes more flips to choose an element of a smaller set!

Suppose that entropy is expensive. For example, suppose that the coin is one of the larger Rai stones used by the Yapese. We would like to flip this as few times a possible.

Rai stone

Can we design a strategy that does better than E(F) = 4.8 without sacrificing randomness?

KEY INSIGHT: If the first two of the coin flips come up (Tails, Tails), then it doesn’t matter what the result of the third flip is – either way, we’re guaranteed a “no result”. So don’t bother to flip the coin the third time.

NEW STRATEGY: Flip the coin twice and perform the appropriate action below.

  • (Heads, Heads): flip the coin once more and perform the appropriate action below.
    • Heads: Monday.
    • Tails: Tuesday.
  • (Heads, Tails): flip the coin once more and perform the appropriate action below.
    • Heads: Wednesday.
    • Tails: Thursday.
  • (Tails, Heads): flip the coin and perform the appropriate action below.
    • Heads: Friday.
    • Tails: no result.
  • (Tails, Tails): no result.

As before, E(L) = 1 + (3/8) E(L) so E(L) = 8/5 = 1.6. So we expect to have to repeat the procedure the same number of times.

But this time, F = 3L no longer holds! It’s cheaper than that.

Let F1 be the random variable for the number of flips ONLY in the first pass. What is E(F1)?

If the first two flips are (Heads, Tails) then F1 = 2. For all other results, F1 = 3. So E(F1) = (3/4) 3 + (1/4) 2 = 11/4.

Putting it together, E(F) = E(F1) E(L) = (11/4) (8/5) = 22/5 = 4.5. We’re saving 0.3 of a flip every time! This may not seem like a lot, but it adds up.

Exercise: we have designed strategies to choose elements from sets of size 7 and size 5. How do we generalize this approach?