2011 in books
Fine, let’s do this. Last year’s entry is here, though both it and that of the year before are still on the front page, too.
Apparently I finished 64 books in 2011. Specifically, these:
So that thing I hinted at earlier is about as done as I care to get it.

Officially the point of this project was for the school to have something with which to replace ClusterKnoppix for their Besturingssystemen II (Operating Systems II) class, but really I just wanted to have something nicer to make use of my ever-growing pile of old computers, which is why I finished it. The README explains. (HPC there stands for “high-performance computing” rather than “Hasty Pudding cipher”.)
What I need now is people to test it, ideally by building images and then trying them.
If you’d rather not put that kind of effort in, I’ve also pre-built an image (291 MB). It doesn’t come with X to save space, but it does come with (what else?) this Open MPI tripcode finder I wrote a while ago. It’s not particularly fast, but it reports its progress if you poke it with SIGUSR1 (as in pkill -USR1 tripfind).
The unprivileged user is called gjs, and the password for both him and root is t. I’ve also included the MPI-patched JtR tarball in the home directory for you to build, if that’s less pointless.
Feedback appreciated, even if you don’t find any problems. If you find this project useful at all, or if you have any suggestions, I’d like to hear about it.
```python
#!/usr/bin/python2
import sys
import bz2


def classify(text, langs=('english', 'german', 'french')):
    # For each language, measure how many extra bytes the input costs
    # when appended to that language's sample corpus (english.txt etc.).
    results = {}
    for lang in langs:
        with open(lang + '.txt') as f:
            corpus = f.read()
        compressed = len(bz2.compress(corpus))
        results[lang] = len(bz2.compress(corpus + text)) - compressed
    # Cheapest (most likely) language first.
    return sorted(results, key=results.__getitem__)


if __name__ == '__main__':
    print "Most likely %s." % classify(sys.stdin.read())[0].capitalize()
```
$ wget -qO - http://www.gutenberg.org/ebooks/31469.txt.utf8 | ./classific.py
Most likely English.
$ wget -qO - http://www.gutenberg.org/ebooks/22367.txt.utf8 | ./classific.py
Most likely German.
$ wget -qO - http://www.gutenberg.org/ebooks/4968.txt.utf8 | ./classific.py
Most likely French.
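The script above is Python 2, where `str` is already a byte string. Under Python 3, `bz2.compress` only accepts bytes, so a port has to read everything in binary mode. A minimal sketch of the same compression trick follows; passing the sample corpora in as a dict of bytes, rather than reading `english.txt` and friends inside the function, is an illustrative change and not how the original works.

```python
import bz2


def classify(text, corpora):
    """Rank languages by how cheaply `text` compresses after each corpus.

    `text` and every corpus must be bytes. The extra bytes bz2 needs for
    corpus + text, compared with the corpus alone, is a rough measure of
    how much `text` resembles that language.
    """
    cost = {}
    for lang, corpus in corpora.items():
        base = len(bz2.compress(corpus))
        cost[lang] = len(bz2.compress(corpus + text)) - base
    return sorted(cost, key=cost.get)  # cheapest (most likely) first
```

To reproduce the shell session above, build the dict with `open(lang + '.txt', 'rb').read()` for each language and pass `sys.stdin.buffer.read()` as the text.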
If you were a USB/PXE-bootable Linux distro for HPC, what kind of features would you want to have?
Let’s do this while I’m still sort of sober.
Looks like I didn’t blog a lot this year. Last year’s book post is still on the front page. Anyway, I didn’t finish quite as many books this year as I did then, but I surpassed my annual target of fifty; I finished fifty-eight:
Alright, we’ve covered search trees in some detail, and they work great for problems where we have clear states and rules of production to move from one state to the next. Sometimes that’s not a very convenient way to state a problem, though, and a more natural way to think about things is as a bunch of variables which can take values in a certain domain, and a number of constraints which describe the relationships of these variables to each other.
The canonical example here is the eight queens puzzle, which Dijkstra famously used to illustrate backtracking. However, that’s been done to death, so let’s instead have two queens and seven knights, and instead of the usual 8×8 chess board, let’s have a 6×6 one.
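To make the variables-and-constraints framing concrete, here is a minimal backtracking sketch in Python. The constraint it enforces is an assumption, since the rules aren't spelled out here: no piece may attack any other piece, and other pieces are not treated as blocking a queen's line (a simplification). Each piece is a variable whose domain is the squares of the board; identical pieces are placed in increasing square order so the same layout isn't retried in every permutation, and a simple forward check gives up early when too few squares remain. `place_pieces` and the piece letters are hypothetical names.

```python
from itertools import product


def queen_attacks(a, b):
    """Same row, column or diagonal (other pieces are NOT treated as blockers)."""
    (r1, c1), (r2, c2) = a, b
    return r1 == r2 or c1 == c2 or abs(r1 - r2) == abs(c1 - c2)


def knight_attacks(a, b):
    """One square one way and two the other."""
    return {abs(a[0] - b[0]), abs(a[1] - b[1])} == {1, 2}


ATTACKS = {"Q": queen_attacks, "N": knight_attacks}


def consistent(placed, kind, square):
    """Check every binary constraint between a new piece and the placed ones."""
    for other_kind, other in placed:
        if other == square:
            return False
        if ATTACKS[kind](square, other) or ATTACKS[other_kind](other, square):
            return False
    return True


def place_pieces(pieces, size):
    """Backtracking search; returns [(kind, (row, col)), ...] or None."""
    squares = list(product(range(size), repeat=2))
    placed = []

    def backtrack(i, start):
        if i == len(pieces):
            return list(placed)
        kind = pieces[i]
        # identical pieces are interchangeable: keep them in square order
        first = start if i > 0 and pieces[i - 1] == kind else 0
        needed = 1  # this piece plus any identical ones right after it
        while i + needed < len(pieces) and pieces[i + needed] == kind:
            needed += 1
        options = [k for k in range(first, len(squares))
                   if consistent(placed, kind, squares[k])]
        for j, k in enumerate(options):
            if len(options) - j < needed:
                break  # forward check: too few squares left for the rest
            placed.append((kind, squares[k]))
            result = backtrack(i + 1, k + 1)
            if result is not None:
                return result
            placed.pop()
        return None

    return backtrack(0, 0)
```

`place_pieces("QQ" + "N" * 7, 6)` returns a list of `(kind, (row, col))` pairs, or `None` if no placement satisfies the constraints. Identical pieces must be grouped together in the `pieces` string for the ordering trick to be valid.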

But people are fucking morons upon whom democracy is wasted.
Yes, we had federal elections yesterday. I was summoned as a bijzitter, which meant I had to get up at 7 to help open a polling station and then put 1,242 stamps onto 1,242 ballots (621 each for Kamer and Senaat) and about five hundred on as many election summons (compulsory voting, dontchaknow), and I didn’t even get to keep the stamp afterwards. Then at the end we all got to wait around for two more hours because apparently the problem with selecting members of the public to help manage elections is that the vast majority of the public is terminally innumerate, so our ballot count was off by one.
It eventually got resolved, but the upshot of this is that I was too tired to bother with writing this post last night, when the votes were counted (we count more quickly than the Americans or the British or even the Dutch).
You may be wondering why we’re having federal elections in 2010 when the last time I complained about federal elections was in 2007. A lot of countries have been having elections lately, but Belgium’s wasn’t planned for another year.
The reason for that is that Alexander De Croo felt he wasn’t getting enough media attention, so he took his ball (the Flemish liberal party) and went home, thereby imploding the Leterme government for, what, the eighth time now? Even the King couldn’t pretend there was no problem this time, so in the good tradition of Christian democrat majority governments, emergency elections had to be called.0
You can find all results here, as usual, and as usual I don’t know how long they’ll last, so I’ll steal some graphics.
Everyone knows BBCode is a pain to work with, and while WordPress supports limited HTML in user comments, it should be obvious HTML is no better. The unnecessary repetition of SGML-based languages and the insistence on the proper nesting of tags makes them all hideous and unnecessarily error-prone. We can do better.
The discussions of learned societies on the subject have been less than satisfactory, so I decided to just implement my own mark-up language, based on the venerable S-expression:
{b This} is {i {u expert} {o mark-up}}.
This will turn into:
This is expert mark-up.
The immediate effect is that nesting problems and text redundancy immediately disappear. The syntax also lends itself to easy function composition:
{b.i.o.u EXPERT}
EXPERT
Finally, for this first version,0 we also support function iteration:
{sup*3 To the moon}{sub*3 and back.}
To the moonand back.
It goes without saying this can be combined with function composition in arbitrarily complex expressions, with the iteration operator having a higher precedence than the function composition operator.
I’ve elected to use curly braces rather than the more typical parentheses, because curly braces barely see any use in natural language, which is where this mark-up would generally be used. If you do need literal curly braces, you can escape them with a backslash (and if you need a literal \{, you can escape your backslash with a backslash).
As a proof of concept, and because I eat my own dog food, I’ve written (and enabled) a WordPress plugin that enables this SexpCode in blog comments. For sanity, iteration doesn’t go beyond *3. Supported tags are b, i, u, s, o, sub, sup, code, spoiler, quote, blockquote, and m. If you want to use it yourself, adding more tags or changing their definitions should be straightforward.
Trying to use an unsupported or empty tag, or having unbalanced braces (except for closing braces at the end), makes the plugin assume you’re actually trying to post C-like code, and it disables SexpCode for your comment.
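For anyone curious how little machinery this needs, here is a minimal recursive-descent sketch in Python. It follows the rules above (composition with `.`, iteration with `*n` binding tighter than composition, backslash escapes, the `*3` cap and the listed tags), but the output format is an assumption: every tag simply becomes a literal `<tag>text</tag>` pair, which isn't valid HTML for things like `o` or `spoiler`. Unlike the plugin, it also rejects stray closing braces instead of tolerating them at the end. `render` and `SexpError` are hypothetical names.

```python
import html
import re

# Tag whitelist from the post; emitting <tag>...</tag> verbatim is an assumption.
TAGS = {"b", "i", "u", "s", "o", "sub", "sup", "code",
        "spoiler", "quote", "blockquote", "m"}
MAX_ITER = 3                                # iteration capped at *3
SPEC = re.compile(r"([a-z0-9.*]+) ")        # function spec plus one space
PART = re.compile(r"([a-z]+)(?:\*(\d+))?")  # one tag, optionally *n


class SexpError(ValueError):
    """Input this sketch refuses to treat as SexpCode."""


def expand(spec):
    """Turn e.g. 'b.i*2' into ['b', 'i', 'i'], outermost tag first."""
    tags = []
    for part in spec.split("."):
        m = PART.fullmatch(part)
        if not m or m.group(1) not in TAGS:
            raise SexpError("unsupported tag %r" % part)
        n = int(m.group(2) or 1)
        if not 1 <= n <= MAX_ITER:
            raise SexpError("bad iteration count %d" % n)
        tags.extend([m.group(1)] * n)  # *n binds tighter than '.'
    return tags


def _render(text, pos, nested):
    out = []
    while pos < len(text):
        ch = text[pos]
        if ch == "\\" and text[pos + 1:pos + 2] in ("{", "}", "\\"):
            out.append(html.escape(text[pos + 1], quote=False))  # escaped char
            pos += 2
        elif ch == "{":
            m = SPEC.match(text, pos + 1)
            if not m:
                raise SexpError("empty or malformed tag at %d" % pos)
            inner, pos = _render(text, m.end(), nested=True)
            for tag in reversed(expand(m.group(1))):  # wrap innermost first
                inner = "<%s>%s</%s>" % (tag, inner, tag)
            out.append(inner)
        elif ch == "}":
            if not nested:
                raise SexpError("unbalanced '}' at %d" % pos)
            return "".join(out), pos + 1
        else:
            out.append(html.escape(ch, quote=False))  # no raw HTML from users
            pos += 1
    if nested:
        raise SexpError("unclosed '{'")
    return "".join(out), pos


def render(text):
    return _render(text, 0, nested=False)[0]
```

For example, `render("{b.i*2 x}")` gives `<b><i><i>x</i></i></b>`: the iteration is expanded before the composition is applied.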
Ladies and gentlemen, BBCode was our COBOL. This is our Lisp.
Edit: People who want to implement this themselves should be following this document rather than this post.
Edit again: Play with it!
0 Future versions of the language are expected to add support for function arguments (for things like url, img, and colour) and the ability to define aliases (for example, {define exp b.i.o.u}, which would let you use a new exp function as if it were b.i.o.u).
I’ve written this, so I might as well share it.
In my post on the Mandelbrot set earlier, I mentioned the Julia sets of the quadratic polynomial $f_c(z) = z^2 + c$, where $c$ is a given (constant) complex number and $z$ ranges over the points of the complex plane. Because I wanted to visualise how those Julia sets changed as $c$ varied, I’ve written a short program to do that for me.
You can find it here. As usual, you’ll need Allegro, and the compilation instruction is on the first line.
What it does is take two complex numbers as parameters, plus the number of steps it should take to go from the first to the second. At each step, it will calculate and display the Julia set of the quadratic polynomial with that complex number as c, and hopefully your computer is fast enough that the successive Julia sets look like an animation.
For example, if you invoke it as:
./julia -0.8 -1 -0.8 1 200
you’ll see the following:

Though probably not at the same speed. I’ve made no effort to maintain a certain frame rate; the whole thing moves as quickly as your CPU can keep up, because I just wanted a visualisation of how Julia sets change, not a screensaver. If it’s moving too slowly for you, you can try reducing the number of steps, or lowering the numbers in the ZOOM or ITERS #defines, though the first one will make the image smaller and the second will make it darker. If you aren’t interested in the window title, you can also remove the snprintf and set_window_title steps for a significant speed-up.
If it’s too fast, you can do the reverse, or you can build in a delay with Allegro’s install_timer and rest, or POSIX’s usleep or nanosleep.
(Once it’s done, it will just pause at the last Julia set. Press any key to close it. If you want to close it before it’s done, you’ll have to kill it manually.)
The interesting points to explore are the ones inside the Mandelbrot set, as anything else will just be Fatou dust (though you’ll probably still be able to see it because of the grey). For those points, the salient area is the one within 2 unit lengths of the origin, which is why the field displayed ranges from (-2, 2) in the top left corner to (2, -2) in the bottom right (or probably (-2, -2) to (2, 2), I don’t remember). If you need a bigger plane, replace all instances of ZOOM * 4 with ZOOM * (bigger number), and all instances of ZOOM * 2 with ZOOM * (half of bigger number) (if you want to keep the origin in the center of the window).
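A minimal Python sketch of the escape-time test behind each frame, offered as an assumption about the approach rather than a transcription of the Allegro program: each point $z$ of the plane is iterated under $z \mapsto z^2 + c$ and shaded by how quickly it leaves the disc of radius 2. Here `max_iter` plays roughly the role of ITERS, `c_path` interpolates evenly between the two command-line values (whether the original includes both end points isn't stated), and the ASCII characters just stand in for the grey levels.

```python
def escape_time(z, c, max_iter=50, radius=2.0):
    """Iterations of z -> z*z + c before |z| exceeds radius (max_iter if never)."""
    for n in range(max_iter):
        if abs(z) > radius:
            return n
        z = z * z + c
    return max_iter


def c_path(c0, c1, steps):
    """steps + 1 evenly spaced values from c0 to c1, both ends included."""
    return [c0 + (c1 - c0) * k / steps for k in range(steps + 1)]


def julia_ascii(c, width=60, height=30, max_iter=50):
    """Text rendering of the square from -2+2i (top left) to 2-2i (bottom right)."""
    shades = " .:-=+*#%@"  # fast escape -> blank, never escapes -> '@'
    rows = []
    for y in range(height):
        imag = 2 - 4 * y / (height - 1)
        row = []
        for x in range(width):
            real = -2 + 4 * x / (width - 1)
            n = escape_time(complex(real, imag), c, max_iter)
            row.append(shades[min(n * len(shades) // max_iter, len(shades) - 1)])
        rows.append("".join(row))
    return "\n".join(rows)
```

Printing `julia_ascii(c)` for each `c` in `c_path(-0.8 - 1j, -0.8 + 1j, 200)` gives a slow, crude, text-mode approximation of the example invocation.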
If you actually want to save the animations, man 3alleg save_bitmap and assemble the images yourself in something like the GIMP. I initially started out doing it this way, but animated GIFs get really big really quickly, so I went with this instead.
Enjoy.
People who take an active interest in AI are quite unlikely to have very many friends, so it should come as no surprise that trying to get computers to play games has always been a popular subfield of AI. Traditionally that game has mostly been chess, but I feel chess has a grinding tedium to it, so we’re going to look at tic-tac-toe instead, because that at least has the benefit of being over quickly.
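Tic-tac-toe is small enough to search exhaustively, so plain minimax works: X tries to maximise the outcome and O tries to minimise it. A minimal Python sketch, assuming a board stored as a 9-character string (the representation and the names `value` and `best_move` are illustrative); memoising positions with `functools.lru_cache` keeps the full search down to a few thousand distinct states.

```python
from functools import lru_cache

# Board: a 9-character string, row by row, using "X", "O" and " ".
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def play(board, i, player):
    return board[:i] + player + board[i + 1:]


@lru_cache(maxsize=None)
def value(board, player):
    """Minimax value with `player` to move: +1 X wins, -1 O wins, 0 draw."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if " " not in board:
        return 0
    other = "O" if player == "X" else "X"
    scores = [value(play(board, i, player), other)
              for i, sq in enumerate(board) if sq == " "]
    return max(scores) if player == "X" else min(scores)


def best_move(board, player):
    """Index of a move that achieves the minimax value for `player`."""
    other = "O" if player == "X" else "X"
    moves = [i for i, sq in enumerate(board) if sq == " "]
    pick = max if player == "X" else min
    return pick(moves, key=lambda i: value(play(board, i, player), other))
```

`value(" " * 9, "X")` comes out as 0: with perfect play from both sides, tic-tac-toe is a draw.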

I was bored, so I made this.

Basic introduction to the Mandelbrot set and what this image represents follows.
Last time we looked at how to solve the eight puzzle using the hill climbing algorithm, which gave us a result much more quickly than a blind depth-first search did, but we wondered if the solution we found was the best we could do, and we asked if there was a way to use heuristics to find not just a solution, but the best solution. Today, we’ll see that there is, and it’s actually really straightforward.
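The usual answer is A* search: expand nodes in order of $f(n) = g(n) + h(n)$, the cost so far plus the heuristic estimate of the cost remaining. As long as $h$ never overestimates, the first time the goal comes off the priority queue its path is optimal. A minimal Python sketch for the eight puzzle, assuming the goal layout 1 to 8 with the blank (0) last and using the Manhattan distance as the heuristic; whether the original post used the same goal or heuristic isn't stated here.

```python
import heapq

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # assumed goal layout; 0 is the blank


def neighbours(state):
    """States reachable by sliding one tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            s = list(state)
            s[blank], s[r * 3 + c] = s[r * 3 + c], s[blank]
            yield tuple(s)


def manhattan(state):
    """Sum of each tile's distance from its goal square (never overestimates)."""
    total = 0
    for i, tile in enumerate(state):
        if tile:
            g = GOAL.index(tile)
            total += abs(i // 3 - g // 3) + abs(i % 3 - g % 3)
    return total


def a_star(start):
    """Return a shortest list of states from start to GOAL, or None."""
    frontier = [(manhattan(start), 0, start)]
    parent = {start: None}
    best_g = {start: 0}
    while frontier:
        f, g, state = heapq.heappop(frontier)
        if state == GOAL:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        if g > best_g[state]:
            continue  # stale queue entry; a cheaper route was found later
        for nxt in neighbours(state):
            if g + 1 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g + 1
                parent[nxt] = state
                heapq.heappush(frontier, (g + 1 + manhattan(nxt), g + 1, nxt))
    return None
```

Compared with the hill-climbing search, the only real change is the ordering: the frontier is sorted globally by $g + h$ instead of each node's children being sorted by $h$ alone.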
(This post assumes you read the previous one.)
Today we’ll be looking at the hill climbing algorithm, which is just a plain old depth-first search with heuristics added.
“Heuristics” is a fancy word (from the Greek εὑρίσκω, “I discover”) for a very simple concept. In the context of search trees, it simply means that at every node, you’re going to look at each possible branch, and take the one that looks the most promising first, instead of just one at random. “Most promising” can be a tricky concept, though.0
Our river-crossing example isn’t necessarily the best one to demonstrate the concept, so let’s go with another classic: the 8 puzzle.
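A minimal Python sketch of that idea on the 8 puzzle: an ordinary depth-first search with an explicit stack, except that each node's children are sorted so the one the heuristic likes best is expanded first. The heuristic here (the number of misplaced tiles), the goal layout and the visited set are assumptions for illustration; the original post may well use different ones.

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # assumed goal layout; 0 is the blank


def neighbours(state):
    """States reachable by sliding one tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            s = list(state)
            s[blank], s[r * 3 + c] = s[r * 3 + c], s[blank]
            yield tuple(s)


def misplaced(state):
    """Number of tiles (not counting the blank) that are not where GOAL wants them."""
    return sum(1 for tile, goal in zip(state, GOAL) if tile and tile != goal)


def hill_climb(start, heuristic=misplaced):
    """Depth-first search, most promising child first; returns a path or None."""
    parent = {start: None}
    stack = [start]
    while stack:
        state = stack.pop()
        if state == GOAL:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        children = [n for n in neighbours(state) if n not in parent]
        # push the best child last so it is popped (expanded) first
        children.sort(key=heuristic, reverse=True)
        for child in children:
            parent[child] = state
            stack.append(child)
    return None
```

It finds a solution, not necessarily the shortest one; from a well-scrambled start the path it returns can be far longer than necessary.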

A decent proportion of my readers are noobie programmers or people who aren’t in a position to receive a formal CS education, so I thought I’d cover the basics of a fundamental concept most people cover in their first semester of algorithms or AI today: search trees. The fact that my college considers this to be third-year material so advanced they cannot in good faith make the class compulsory is neither here nor there.
Consider the famous problem of the farmer who wants to cross a river with his fox, goose, and grain, though the only boat available can carry only himself and one of these three possessions. Ignore for a moment why a farmer would own a fox, and let’s stretch credibility a bit more by assuming that while the fox and goose are well-trained enough not to wander off in the absence of the farmer, they are not trained not to eat the goose or the grain, respectively, in said absence. How can he safely get to the other side without losing his goose or grain?
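As a sketch of how this becomes a search problem: each state records which bank the farmer, fox, goose and grain are on, a move is the farmer crossing alone or with one item from his own bank, and states where something gets eaten are pruned. This minimal Python version uses breadth-first search (swapped in for illustration, because it guarantees the fewest crossings), and `cross_river` is a hypothetical name.

```python
from collections import deque

# State: which bank (False = start, True = far side) each of
# farmer, fox, goose and grain is on, in that order.
START, GOAL = (False,) * 4, (True,) * 4


def safe(state):
    farmer, fox, goose, grain = state
    # fox eats goose, goose eats grain, unless the farmer is with them
    return not (fox == goose != farmer or goose == grain != farmer)


def successors(state):
    farmer = state[0]
    for i in range(4):  # i == 0 means the farmer crosses alone
        if state[i] == farmer:  # only cargo on the farmer's bank can come along
            new = list(state)
            new[0] = new[i] = not farmer
            new = tuple(new)
            if safe(new):
                yield new


def cross_river():
    """Breadth-first search; returns the list of states from START to GOAL."""
    parent = {START: None}
    queue = deque([START])
    while queue:
        state = queue.popleft()
        if state == GOAL:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None
```

The path it returns has seven crossings and starts, as it must, with the goose: every other first move leaves something to be eaten.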

Having finished another popsci book on chaos theory recently (Ian Stewart’s Does God Play Dice?), I thought it’d be an interesting exercise to visualise the Lorenz attractor, and since it’s been a while since I’ve done anything new in programming, to take the opportunity to get into Xlib, the X Window System C library. Results aren’t very encouraging.
I mean, I got something to work easily enough, but any attempt at introducing color beyond black and white for clarity fails miserably and in non-deterministic ways. Eventually I gave up and redid it using something I know.
Compare:

(It’s prettier animated, so do compile the code yourself and see.)
In both cases, the screen represents the Cartesian plane (X-axis horizontal, Y-axis vertical, origin right in the center; one unit is ten pixels). In the Xlib version (left) the Z component is ignored entirely (so it’s really a projection of the attractor onto the Cartesian plane), in the Allegro version (right) some attempt at representing it using shades of gray has been made, with z=0 being black and z=55 being white (though because it is drawn with no real care, it will happily scribble dark lines over light ones if it has to).
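For reference, the system being drawn is $\dot x = \sigma(y - x)$, $\dot y = x(\rho - z) - y$, $\dot z = xy - \beta z$. A minimal Python sketch of one way to step it and map it to the screen as described: forward Euler integration, one unit to ten pixels with the origin centred, and z mapped to a grey level from black at 0 to white at 55. The parameters σ = 10, ρ = 28, β = 8/3, the step size and the starting point are the textbook choices, assumed here rather than taken from the original code.

```python
SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0  # classic parameters (assumed)


def lorenz_step(x, y, z, dt=0.01):
    """One forward-Euler step of the Lorenz system."""
    dx = SIGMA * (y - x)
    dy = x * (RHO - z) - y
    dz = x * y - BETA * z
    return x + dx * dt, y + dy * dt, z + dz * dt


def to_screen(x, y, width, height, scale=10):
    """Cartesian (x, y) to pixels: origin centred, 1 unit = 10 px, y up."""
    return int(width / 2 + x * scale), int(height / 2 - y * scale)


def shade(z, z_max=55.0):
    """Grey level 0 (black) at z = 0 up to 255 (white) at z = z_max, clamped."""
    return max(0, min(255, round(255 * z / z_max)))


def trajectory(start=(0.1, 0.0, 0.0), steps=5000, dt=0.01):
    """List of (x, y, z) points, starting point included."""
    points = [start]
    for _ in range(steps):
        points.append(lorenz_step(*points[-1], dt=dt))
    return points
```

Swapping which coordinates go into `to_screen` and `shade` gives the other projections mentioned below.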
You can mess with the variables and starting condition to see how it behaves, or swap around some Xs and Ys and Zs to get different angles, and at least in the Allegro version, messing with color is trivial enough.
Which brings me to my question: does anyone know of decent introductions to Xlib? The Internet is full of tutorials, and as usual, all of them seem to suck. I know Xlib isn’t really supposed to be used directly, but I want to.
It’s probably unlikely that I’m going to finish another book in the five and a half hours left in 2009, so I’m going to post this while we’re waiting for snacks.
I don’t have any bad habits, so my only New Year’s resolution tends to be to read fifty books in a year. Last year I didn’t quite make it, but apparently I more than made up for it this year. I finished eighty-eight:
So I guess I only need 46 next year to maintain my average.
Reviews for a significant number of those can be found on the Facebook (or here, which is the same place). I’d use LibraryThing, but you need a paid account to have more than two hundred books on it, and I don’t trust them enough to give them credit card information.
If someone wants to send me money for it, though, you know my Paypal address.
Highlights:
Best fiction: probably Salman Rushdie’s Midnight’s Children, though Margaret Atwood is a good writer.
Worst fiction: Rand, obviously. CS Lewis is a close second. Even Neal Stephenson isn’t that shit.
Best non-fiction: Nothing earth-shattering this year. I guess John Allen Paulos’s A Mathematician Reads the Newspaper was pretty good.
Worst non-fiction: Mary Midgley’s Myths We Live By. I’m not sure it even deserves to be called non-fiction. Runner-up goes to Gould’s The Mismeasure of Man.
My social and academic environments aren’t exactly intellectually stimulating, so I get most of the programming problems I fill my days with—and of which the ones that are the most fun to talk about end up here—from books I read. Since I’ve already read every interesting sciencey non-fiction book available in Leuven, I’ve mostly been reading fiction lately, which doesn’t exactly inspire interesting algorithms, which is why I haven’t been bloggering as much.
In an effort not to let my programming skills get too rusty, I decided to write a thing that validates and parses ISBNs, extracting the publisher information and other things that are supposed to be in ISBNs. This turned out to be annoyingly non-trivial, so instead I’m just going to write about the numbers themselves.
As you probably know, ISBNs are a book numbering scheme standardised by ISO in 1970 (as ISO 2108), based on an earlier 9-digit scheme (SBN) used in the UK. It had ten digits until recently (January 2007), when it was expanded to 13. I assumed the expansion was because they were running out of numbers (which they were), but I also noticed every 13-digit ISBN started with 978, which was odd.
Old ten-digit ISBNs consist of a group identifier, which mostly identifies the language the work is in and is of variable length (it’s a prefix code,0 to avoid ambiguity; the 9-digit SBNs ISBN is based on didn’t have a group identifier, but prepending a 0 to them (one of the codes for English-language works) turns them into valid ISBNs), followed by a publisher code (again of variable length), followed by an item identifier, followed by a single check digit, used to make sure the other numbers were entered properly.1
New thirteen-digit ISBNs are basically the same thing with 978 prepended, and the check digit is calculated differently.
So hey, this doesn’t expand the number space. What’s the deal?
The deal turns out to be EAN, or European Article Numbers.
EANs are similar to North-American UPCs, with which they are compatible. It’s a barcoding technology intended to help track items in stores. UPC numbers are twelve digits long, and EANs thirteen.2
EANs start with a two- or three-digit GS1 prefix, which is basically a country code. Somewhere along the way someone realised that books are things that are sold too, and books have ISBNs, and let’s not waste a lot of disk space storing two numbers when one will do, so the GS1 prefix 978 was created, for Bookland, the magical land where all books are printed.
Because someone had the foresight to realise ISBN would run out of numbers eventually, they also reserved 979, and since the last digit of an EAN is also a checksum digit, people didn’t want to maintain two different methods of computing checksums, and the 13-digit ISBN was created. All of the old ISBNs map to new ones seamlessly, and new ones will mostly continue to be allocated in area 978 until that’s full, which is why 978 numbers are still by far the most common ISBN-13s.3
The term Bookland is now considered deprecated because people are boring twats and GS1 prefixes stopped being country codes and started being organisation codes, and 978 and 979 are registered to the International ISBN Agency, but it’s a cute bit of trivia.
Anyway, because I don’t want this post to be entirely worthless, here’s a tiny script that takes a 9-digit SBN or 10-digit ISBN as input and produces the new 13-digit equivalent.
(Incidentally, that image is the ISBN for Karl Popper’s Logik der Forschung. It should not be taken as an endorsement of that tedious asshole’s work, but rather as laziness on my part, because it’s the first picture in the Wikipedia article on ISBN.)
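The linked script isn't reproduced here, so here is a minimal Python sketch of the same conversion: strip separators, prepend 0 to a 9-digit SBN, verify the ISBN-10 check digit (weights 10 down to 2, modulo 11, with X standing for 10), then drop it, prepend 978 and compute the EAN-13 check digit (weights alternating 1 and 3, modulo 10). The function names are illustrative, and it makes no attempt to split out the group or publisher codes.

```python
def isbn10_check(first9):
    """ISBN-10 check character: weights 10..2, modulo 11, 10 written as X."""
    total = sum(w * int(d) for w, d in zip(range(10, 1, -1), first9))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def ean13_check(first12):
    """EAN-13 check digit: weights alternate 1, 3, 1, 3, ... modulo 10."""
    total = sum((3 if i % 2 else 1) * int(d) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def to_isbn13(code):
    """Convert a 9-digit SBN or 10-digit ISBN to a 13-digit ISBN."""
    digits = code.replace("-", "").replace(" ", "").upper()
    if len(digits) == 9:
        digits = "0" + digits  # SBN: prepend group identifier 0 (English)
    if len(digits) != 10 or not digits[:9].isdigit():
        raise ValueError("not an SBN or ISBN-10: %r" % code)
    if isbn10_check(digits[:9]) != digits[9]:
        raise ValueError("bad check digit in %r" % code)
    body = "978" + digits[:9]  # drop the old check digit, add the prefix
    return body + ean13_check(body)
```

For example, `to_isbn13("0-306-40615-2")` returns `"9780306406157"`; note how the check digit changes from 2 to 7 because the two schemes weight the digits differently.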
0 Meaning that no valid code is the prefix of another valid code. Like in Huffman coding.
1 Wikipedia claims it’s a modulo 11 affair, with X substituting for 10, but I don’t think I’ve ever actually seen X as a check digit. I’ll admit I haven’t been paying a lot of attention, though.
2 EAN-13, at least, which is the most common. There are others, but I’ve never seen them used. Apparently EAN-8 is common on cigarettes.
3 Something analogous happened with periodicals and their ISSN, with Unique Country Code 977, but that story is a bit more complicated because ISSNs are only eight digits long.
I’m in a class called Netwerkbeheer (Network Management), which spans two semesters and is a transparent excuse to peddle CCNA certifications. As a result, I spend a lot of time playing with Cisco routers and switches, and one of the many, many things that annoy me about Cisco’s IOS is their cavalier attitude towards security and cryptosystems. A particularly egregious example of this is Cisco’s type 7 encryption.
If you’ve ever configured a Cisco router, you’ve probably encountered it. When the misleadingly named service password-encryption is running, setting a password with the enable password command “encrypts” the password, so that when you issue the show running-config command, you’ll see a line like
enable password 7 08314940000A
instead of the plaintext password, which you’d see if the so-called “password-encryption” was turned off.
Type 7 “encryption” manifests itself in a few other places, including in FTP passwords and various routing protocol authentication passwords.
Type 7 has been known to be broken for a decade and a half now,0 but people continue to use it, almost always for bad reasons.1,2 To drive home just how broken type 7 is, let’s look at it in detail.
The general form of the type 7 “ciphertext” is (0[0-9]|1[0-5])([0-9A-F]{2})+. Some experimenting finds that the length of the “ciphertext” is always twice the length of the plaintext, plus two. Can you guess why?
The “encryption” key is always a number in the range 0-15, which would be easy enough to bruteforce, but that turns out to be unnecessary, since it’s provided (in decimal form) as the first two characters of the “ciphertext”.
That key determines the starting point in a table of twenty-six secondary keys (which, incidentally, is dsfd;kfoA,.iyewrkldJKDHSUB; I don’t know why the table has 26 entries instead of 16), which are XORed in turn with the characters in the plaintext. If the key is, say, 7, the first character in the plaintext is XORed with character 7 of the table (counting from zero, so the eighth character), the second character in the plaintext with character 8, the third with character 9, &c.
Each resulting character is then converted to two hexadecimal digits (the input can only be ASCII, of course) and appended to the ciphertext.
And that’s seriously all there is to it. The result is a “cipher” that’s either slightly less or slightly more secure than writing out your passwords in permanent marker on the outside of the door of the server room, depending on how you manage your configuration files.
Because I know this is going to be an issue at some point, I’ve written a simple utility that encrypts and decrypts passwords using type 7, which you can find here.
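The utility isn't reproduced here, but the scheme is short enough to sketch in Python. This version uses the 26-character table quoted above and indexes it from zero (which is what makes the `enable password 7` line above decode to readable ASCII). It wraps around past the end of the table; that wrapping is an assumption, since nothing above says what happens to long passwords. `type7_encode` and `type7_decode` are hypothetical names.

```python
import random

# The 26-entry table quoted above; wrapping past its end is an assumption.
XLAT = "dsfd;kfoA,.iyewrkldJKDHSUB"


def type7_decode(ciphertext):
    """Recover the plaintext: the key is simply the first two (decimal) digits."""
    key = int(ciphertext[:2])
    hex_part = ciphertext[2:]
    out = []
    for n, i in enumerate(range(0, len(hex_part), 2)):
        byte = int(hex_part[i:i + 2], 16)
        out.append(chr(byte ^ ord(XLAT[(key + n) % len(XLAT)])))
    return "".join(out)


def type7_encode(plaintext, key=None):
    """'Encrypt' with a random key 0-15 unless one is given."""
    if key is None:
        key = random.randint(0, 15)
    pairs = ("%02X" % (ord(ch) ^ ord(XLAT[(key + n) % len(XLAT)]))
             for n, ch in enumerate(plaintext))
    return "%02d%s" % (key, "".join(pairs))
```

For any key, `len(type7_encode(pw))` is `2 * len(pw) + 2`: two hex digits per character plus the two-digit key, which answers the question posed earlier.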
You’d think this would be a moot point because people should realise their configuration files are sensitive information, but people are, of course, idiots. In that sense, type 7 isn’t just worthless, but actively harmful, because it gives people a false sense of security.
0 http://insecure.org/sploits/cisco.passwords.html
1 The original intent of type 7 was apparently to foil shoulder-surfers, who might see your configuration file as it scrolls by on your screen. Cisco’s official stance (now) is that if security is an issue, the router configuration file itself should be treated as vulnerable data, not just the passwords that may or may not be displayed in it. That would be fair enough, if it wasn’t at odds with Cisco’s default way of saving and loading configuration files, which is through plain TFTP over the regular network, with no options for encryption of either the config or the passwords themselves. But, you know.
(The claim that type 7 is so weak because the router has to be able to reverse it is bullshit, of course. At most it’s true for PAP authentication, but anyone who considers PAP passwords secret information has no business being anywhere near a router.)
2 Cisco themselves now advise against using it, instead suggesting people use type 5, which isn’t encryption, but just hashing with MD5. Which is also broken, of course. The CCNA materials also state that at least type 7 is “better than no encryption”, but I’d argue that it’s worse, because its security is equivalent to plaintext, while also giving idiot network admins the impression that it’s not.
I’m told a type 6 exists now, which is based on AES and supposed to be better. AFAIK our routers don’t support it, and I’m not holding my breath either way.