When solving Project Euler problems I frequently need to iterate over prime numbers less than a given n. A Sieve of Eratosthenes method quickly and easily finds the small prime numbers; there are more complicated methods that find larger prime numbers, but with a couple of tweaks the Sieve of Eratosthenes can get quite high.
A naive implementation for finding the set of primes below n will:
- Allocate an array of n booleans, initialized to false.
- Allocate an empty list
- For each i in the range 2 to n:
- If the boolean value at this index in the array is true, i is composite. Skip to the next value and check that.
- If the boolean value at this index in the array is false, i is prime!
- Add i to the list of primes
- For each multiple of i in the range 2i to n, set the boolean value at that index in the array to true
There are a handful of simple optimizations that can be made to this naive implementation:
- Step 3d) will have no effect until the multiple of i reaches i2, so the range can be changed to “i2 to n“
- As a direct consequence of this, step 3d) can be skipped entirely once i2 passes n.
- Instead of allocating an array of n booleans, an array of nbits will suffice.
- All the even-indexed bits are set to true on the first pass. Manually recognize that 2 is prime, and only allocate bits for odd-numbered values. Change the outer loop in 3) to “in the range 3 to n“, incrementing by two each time. Change the loop 3d) to increment by 2i each time.
- Storing the list of primes takes a lot of memory – more than the sieve. Don’t bother creating a list of primes, just write an enumerator that travels the sieve directly.
With these optimizations I can enumerate primes from 2 up to 5 billion (5 * 109) in about seven minutes. Source and binaries attached.
>primes 5000000000 Will enumerate primes <= 5000000000 = 5e+009 Memory for sieve: 298.023 MB Initialization complete: 983 milliseconds since start Sieving to 70711 Sieving complete: 4.70292 minutes since start Picking up the rest to 5000000000 Pickup complete: 6.12252 minutes since start Primes: 234954223 1: 2 23495423: 442876981 46990845: 920233121 70486267: 1410555607 93981689: 1909272503 117477111: 2414236949 140972533: 2924158169 164467955: 3438252577 187963377: 3955819157 211458799: 4476550979 234954221: 4999999883 Enumerating complete: 7.43683 minutes since start Freeing CPrimes object
There are more complicated sieves like the Sieve of Atkin which perform better but at the cost of being much more complex. So far I haven’t had to resort to any of those.