Controversy!
If you follow these things at all, you’ve probably heard by now: creationists are once again inventing a controversy where there is none, this time by pretending that it matters at all whether Dawkins’ venerable Weasel program uses locking or not, and claiming that his “unwillingness” to dig up code written in the ’80s and release it means… well, something significant.
In case you’ve forgotten what the Weasel program is, here’s a video (using a different phrase, but the same concept):
If you’re a long-time reader and that looks familiar, it’s because I’ve talked about it twice before. The experiment is simple enough that any idiot can repeat it, but of course creationists are a very special kind of idiot.
So this time, let’s walk through writing our own Weasel program and settle this once again.
I’ll be doing this in a kind of “literate Python”, because everyone understands Python and it doesn’t require compilation, so even the most technologically inept don’t have any excuse not to follow along.
If you don’t have Python installed (or aren’t sure if you do), get it here. Get the 2.6.2 one (if you aren’t sure which you need, you’ll need this one). If that’s too hard already, you shouldn’t be on the Internet in the first place.
I’ll be preceding lines of Python code with > signs, façon literate Haskell. The Python interpreter doesn’t understand this style, so I’ll also be providing a link to the final script at the end.
To recap, we’ll start with a random string composed of symbols chosen from a specific alphabet, and a target string which we’re hoping to achieve.
For randomness, we’ll be using Python’s inbuilt random module, so let’s import it.
> import random
The genetic alphabet is just “CGTA” for DNA, but ours will be a bit longer:
> alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
Note that that includes the space at the end. We could make it even longer by including minuscules and punctuation, but it really makes no difference to the principle of the thing.
And here’s our target string:
> target = "METHINKS IT IS LIKE A WEASEL"
Of course it’s important that all symbols in the target string are also in the genetic alphabet, or we’ll never find it.
At the heart of every genetic algorithm, there’s a fitness function. In our case, this is just something that compares our string of “DNA” with the target string symbol by symbol, and just says how many symbols match. Strings with more matching symbols are obviously more like the target string, and will be selected to breed for the next generation.
> def fitness(child): > fit = 0 > > for a, b in zip(child, target): > if a == b: > fit += 1 > > return fit
The built-in zip function pairs items in a given list, creating a pair of the first character in both the “organism” and the target string, then the second, then the third, and so on. For each pair, it’s then going to check if both members of the pair are the same symbol. If they are, the fitness value is incremented by one. At the end, it’s returned.
This should be obvious.
Equally important, of course, is the reproduction function. Dawkins’ Weasel strings reproduce asexually, so there’s only one parent for each child. The parent copies his entire “DNA”, and each locus has a small chance of mutating.
How high the mutation rate is isn’t that important, because the point of the Weasel program isn’t to simulate real-life life perfectly, but just to demonstrate that descent with modification is more effective than a random search. Let’s give our strings a mutation rate of 1 chance in 50 at each locus. If that seems too high, remember that our genome will be 28 loci in length, so there’ll already be a lot of generations where no mutation will happen at all.
> def reproduce(parent): > child = "" > for gene in parent: > child += random.choice(alphabet) if random.randint(1, 50) == 1 \ > else gene > return child
Did you follow that? Our child starts off as an empty string, and then we iterate over the genes of the parent. There’s 1 chance in 50 that a mutation occurs, in which case we randomly select a gene from the alphabet to add to the genome. Otherwise, we use the parent’s gene.
At the end, the constructed child is returned, ready to have its fitness judged.
Now that we have those two important functions, the rest of the program is straightforward. First, we construct a completely random starting point:
> parent = [random.choice(alphabet) for _ in target]
Our Adam. Let’s put him and his fitness on display:
> generation = 1 > print "%d %s (%d)" % (generation, parent, fitness(parent))
The generation variable will keep track of how many generations have passed.
We’re just about ready to enter our reproduction cycle now. Every generation, our parent will have one child, and if this child is fitter than his parent, this child will be the next parent. Otherwise it is mercilessly discarded and the parent will parent the next generation as well.
> while True: > child = reproduce(parent) > generation += 1 > > child_fit, parent_fit = fitness(child), fitness(parent) > > if child_fit > parent_fit: > parent, parent_fit = child, child_fit > > print "%d %s (%d)" % (generation, parent, parent_fit) > > if parent_fit == len(target): > break
As you can undoubtedly tell, this loop will exit once the fitness of the mutating string equals the length of the target; that is, when both strings are equal.
The string "METHINKS IT IS LIKE A WEASEL" is 28 symbols long. With a 27-symbol alphabet, a random search would take on average 2728 / 2 generations to match it. That's 5,986,257,591,281,009,894,301,370,013,358,523,552,840 generations. Even if we're doing a trillion1 generations a second, that would take 189 million trillion2 years to finish.
Our little program will be doing a bit better than that. The final generation number will be displayed before the final generation genome, of course, but let's rub it in again just to be sure:
> print "Finished! Target reached in %d generations!" % generation
Running it a thousand times, I got an average of 8012.58 generations. Suck it, creationists. Descent with modification and selection really is faster. Just like the last time. And every other fucking time.
And as promised, the full code is here. Just save that somewhere and double-click it to run it (you'll need to chmod +x on sane platforms, but you know that).
And that really should be that. Dembski can wave giant-sleeve-clad arms about free lunches all he likes, but in the real world, not everyone is innumerate. It’s just sad that decent people have to waste time on his bullshit.
I doubt this will actually do much good (of course it won’t; even the creationists themselves (exceptionally dense specimens excepted, as usual) realise this time that there’s no controversy here, just a giant heap of time-wasting nonsense), but if nothing else, I hope I’ve demonstrated even an elementary school student could do this. If the original code is of any interest at all, it’s because of archaeological reasons, because old code is usually interesting, not because the algorithm is that fascinating or complicated.
1 Short scale. 1012.
2 1018. Yes, I know the proper name for that is a quintillion in the short scale and a trillion in the long scale.
Elp said,
August 30th, 2009 at 3:00 am
5964 generations for me. I find this number very… conceivable.
Haxus the Lesser said,
August 31st, 2009 at 7:27 pm
It seems rather counter-intuitive to me that python doesn’t come with a way of doing literate programming. Maybe Guido thinks it’s literate enough?
Dembski is an awesome troll though, he always makes me laugh.
Anonymous said,
May 2nd, 2010 at 8:20 pm
Thank you. I have now lost faith in humanity, theist and atheist alike. I simply cannot comprehend what pushes people to spawn and agree to such petty arguments on either side of the fence.
Human intelligence is dead. ;_;