Prime numbers are ridiculously important in cryptography, and I recently found myself needing a way to generate them (for a toy implementation of the Diffie-Hellman key exchange algorithm). It keeps amazing me how few languages actually have libraries for generating prime numbers, or even just for primality testing.
Because I didn’t feel like thinking, and I didn’t need incredibly massive primes, I just wrote a quick implementation of the sieve of Eratosthenes, which is an interesting enough algorithm that I’d thought I’d share it (this is first-year stuff even for us, but I guess I have non-programmers in my audience).
The basic idea is that you start with a list of integers from 2 up to the number you want to find the largest prime smaller than. You then take the smallest number in this list (2), and cross off all multiples of this number (except for 2 itself). You repeat this for every other number, in order, and you’ll be left with just a list of primes.
Wikipedia has a visualisation which could be clearer but is still pretty good:
The implementation (in Python) is dead simple:
def sieve1(n):
l = range(n)
l[0], l[1] = None, None
for i in xrange(int(__import__('math').sqrt(n)) + 1):
if l[i] is not None:
j = i * 2
while j < n:
l[j] = None
j += i
return filter(None, l)
We could just use range(2, n) and then work with an offset, but the convenience of having the indices equal to the values at those indices is too useful to complain about the extra memory required for two additional integers. Note also the obvious optimisation of only going up to sqrt(n) + 1 (no apologies for the unorthodox import).
filter(None, l) gets rid of the Nones in our list. Passing None as the function argument just applies the identity function, which has the effect of getting rid of None, False, 0, and empty lists, tuples, and dictionaries (anything which would evaluate to False in an if statement).
This is nice, but it does have the disadvantage of having to generate the whole list every time you want more primes. You can’t just input a list of pregenerated primes and have it continue from there.
The fix is easy enough:
def sieve2(n, primes = [2, 3]):
if n <= primes[-1]:
return filter(lambda m: m <= n, primes)
offset = primes[-1] + 1
l = range(offset, n)
max = int(__import__('math').sqrt(n))
for p in primes:
if p > max:
break
i = p * 2 - offset
while i < 0:
i += p
while i < len(l):
l[i] = None
i += p
for k in xrange(max - offset + 1):
if l[k] is not None:
i = k * 2 + offset
while i < len(l):
l[i] = None
i += l[k]
l = filter(None, l)
primes.extend(l)
return primes
This function uses memoisation, and takes advantage of a feature of Python’s function defaults which some believe to be a bug: if mutable default arguments are altered, they will remain altered for the next time the function is called.
This is a side-effect of the fact that functions have a single field that holds the default arguments (a tuple called func_defaults in the function’s namespace), which isn’t duplicated into a working copy when the function is called. Most default arguments will be things like integers and strings, which are immutable, so most people never even run into this. Being able to use it for memoisation is really handy, though. Alternatively, you could just use a global variable and access it from inside the function.
The algorithm is basically the same, except that if we’ve already generated the primes requested, we can just return them. Then we construct a list of all integers larger than our largest prime, and sieve it with our existing primes (now we really do have to work with an offset). Then we just sieve the rest classically. At the end, we add our newly sieved list to our old primes, and return that.
Next time we invoke the function, primes will still have the new primes. You can access them from outside the function at sieve2.func_defaults[0].
Evaluating sieve2(100) and then sieve2(1000) will be slower than just evaluating sieve1(1000), but it will be faster than evaluating sieve1(100) and then sieve1(1000). And if you happen to have a large number of pregenerated primes, you can save some more time.
The sieve of Eratosthenes is just one of a number of sieve-based prime number generators, but it’s the oldest one, and it compares favorably to modern improvements.