ISBN and EAN
My social and academic environments aren’t exactly intellectually stimulating, so I get most of the programming problems I fill my days with—and of which the ones that are the most fun to talk about end up here—from books I read. Since I’ve already read every interesting sciencey non-fiction book available in Leuven, I’ve mostly been reading fiction lately, which doesn’t exactly inspire interesting algorithms, which is why I haven’t been bloggering as much.
In an effort not to let my programming skills get too rusty, I decided to write a thing that validates and parses ISBNs, extracting the publisher information and other things that are supposed to be in ISBNs. This turned out to be annoyingly non-trivial, so instead I’m just going to write about the numbers themselves.
As you probably know, ISBNs are a book numbering scheme standardised by ISO in 1970 (as ISO 2108), based on an earlier 9-digit scheme (SBN) used in the UK. It had ten digits until recently (January 2007), when it was expanded to 13. I assumed the expansion was because they were running out of numbers (which they were), but I also noticed every 13-digit ISBN started with 978, which was odd.
Old ten-digit ISBNs consist of a group identifier, which mostly identifies the language the work is in and is of variable length (it’s a prefix code,0 to avoid ambiguity; the 9-digit SBNs ISBN is based on didn’t have a group identifier, but prepending a 0 to them (one of the codes for English-language works) turns them into valid ISBNs), followed by a publisher code (again of variable length), followed by an item identifier, followed by a single check digit, used to make sure the other numbers were entered properly.1
New thirteen-digit ISBNs are basically the same thing with 978 prepended, and the check digit is calculated differently.
So hey, this doesn’t expand the number space. What’s the deal?
The deal turns out to be EAN, or European Article Numbers.
EANs are similar to North-American UPCs, with which they are compatible. It’s a barcoding technology intended to help track items in stores. UPC numbers are twelve digits long, and EANs thirteen.2
EANs start with a two- or three-digit GS1 prefix, which is basically a country code. Somewhere along the way someone realised that books are things that are sold too, and books have ISBNs, and let’s not waste a lot of disk space storing two numbers when one will do, so the GS1 prefix 978 was created, for Bookland, the magical land where all books are printed.
Because someone had the foresight to realise ISBN would run out of numbers eventually, they also reserved 979, and since the last digit of an EAN is also a checksum digit, people didn’t want to maintain two different methods of computing checksums, and the 13-digit ISBN was created. All of the old ISBNs map to new ones seamlessly, and new ones will mostly continue to be allocated in area 978 until that’s full, which is why 978 numbers are still by far the most common ISBN-13s.3
The term Bookland is now considered deprecated because people are boring twats and GS1 prefixes stopped being country codes and started being organisation codes, and 978 and 979 are registered to the International ISBN Agency, but it’s a cute bit of trivia.
Anyway, because I don’t want this post to be entirely worthless, here’s a tiny script that takes a 9-digit SBN or 10-digit ISBN as input and produces the new 13-digit equivalent.
(Incidentally, that image is the ISBN for Karl Popper’s Logik der Forschung. It should not be taken as an endorsement of that tedious asshole’s work, but rather as laziness on my part, because it’s the first picture in the Wikipedia article on ISBN.)
0 Meaning that no valid code is the prefix of another valid code. Like in Huffman coding.
1 Wikipedia claims it’s a modulo 11 affair, with X substituting for 10, but I don’t think I’ve ever actually seen X as a check digit. I’ll admit I haven’t been paying a lot of attention, though.
2 EAN-13, at least, which is the most common. There are others, but I’ve never seen them used. Apparently EAN-8 is common on cigarettes.
3 Something analogous happened with periodicals and their ISSN, with Unique Country Code 977, but that story is a bit more complicated because ISSNs are only eight digits long.

It is genuinely true that, if you measure the total variation in the human species and then partition it into a between-race component and a within-race component, the between-race component is a very small fraction of the total. Most of the variation among humans can be found within races as well as between them. Only a small admixture of extra variation distinguishes races from each other. That is all correct. What is not correct is the inference that race is therefore a meaningless concept. This point has been clearly made by the distinguished Cambridge geneticist A. W. F. Edwards in the recent paper called ‘Human genetic diversity: Lewontin’s fallacy’. R. C. Lewontin is an equally distinguished Cambridge (Mass.) geneticist, known for the strength of his political convictions and his weakness for dragging them into science at every possible opportunity. Lewontin’s view of race has become near-universal orthodoxy in scientific circles. He wrote, in a famous paper of 1972:
The obvious problem is that installing Vista on powerful hardware and putting people in front of it for five minutes isn’t the same thing as making people pay for that hardware (and for the OS), and then having them use it for a few weeks. What’s being evaluated here is the at-a-glance flashy wank, not the actual OS.





The example Akerlof used was of the used car market. Suppose that there are crappy used cars (“lemons”) worth $2,000, high-quality used cars worth $6,000, and everything in between, and that the buyer cannot reliably tell the difference between them before buying them.