My social and academic environments aren’t exactly intellectually stimulating, so I get most of the programming problems I fill my days with from books I read, and the ones that are the most fun to talk about end up here. Since I’ve already read every interesting sciencey non-fiction book available in Leuven, I’ve mostly been reading fiction lately, which doesn’t exactly inspire interesting algorithms, which is why I haven’t been bloggering as much.
In an effort not to let my programming skills get too rusty, I decided to write a thing that validates and parses ISBNs, extracting the publisher information and other things that are supposed to be in ISBNs. This turned out to be annoyingly non-trivial, so instead I’m just going to write about the numbers themselves.
As you probably know, ISBN is a book numbering scheme standardised by ISO in 1970 (as ISO 2108), based on an earlier 9-digit scheme (SBN) used in the UK. ISBNs had ten digits until recently (January 2007), when they were expanded to 13. I assumed the expansion was because they were running out of numbers (which they were), but I also noticed every 13-digit ISBN started with 978, which was odd.
Old ten-digit ISBNs consist of a group identifier, followed by a publisher code, followed by an item identifier, followed by a single check digit, used to make sure the other digits were entered properly.1 The group identifier mostly identifies the language the work is in and is of variable length (it’s a prefix code, to avoid ambiguity); the publisher code is again of variable length. The 9-digit SBNs that ISBN is based on didn’t have a group identifier, but prepending a 0 (one of the codes for English-language works) turns them into valid ISBNs.
New thirteen-digit ISBNs are basically the same thing with 978 prepended, and the check digit is calculated differently.
So hey, this doesn’t expand the number space. What’s the deal?
The deal turns out to be EAN, or European Article Numbers.
EANs are similar to North-American UPCs, with which they are compatible. EAN is a barcoding technology intended to help track items in stores. UPC numbers are twelve digits long, and EANs thirteen.2
EANs start with a two- or three-digit GS1 prefix, which is basically a country code. Somewhere along the way someone realised that books are things that are sold too, and books have ISBNs, and let’s not waste a lot of disk space storing two numbers when one will do, so the GS1 prefix 978 was created, for Bookland, the magical land where all books are printed.
Because someone had the foresight to realise ISBN would run out of numbers eventually, they also reserved 979. The last digit of an EAN is also a checksum digit, and people didn’t want to maintain two different methods of computing checksums, so the 13-digit ISBN was created, with the EAN check digit replacing the old one. All of the old ISBNs map to new ones seamlessly, and new ones will mostly continue to be allocated in area 978 until that’s full, which is why 978 numbers are still by far the most common ISBN-13s.3
The term Bookland is now considered deprecated because people are boring twats and GS1 prefixes stopped being country codes and started being organisation codes, and 978 and 979 are registered to the International ISBN Agency, but it’s a cute bit of trivia.
Anyway, because I don’t want this post to be entirely worthless, here’s a tiny script that takes a 9-digit SBN or 10-digit ISBN as input and produces the new 13-digit equivalent.
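A minimal sketch of such a conversion in Python, not necessarily the script referred to above. It assumes the standard EAN-13 check digit (weights alternating 1 and 3 from the left), and the function names `ean13_check_digit` and `to_isbn13` are made up for illustration. The old check digit is simply thrown away, not verified.

```python
def ean13_check_digit(first12: str) -> str:
    """Return the EAN-13 check digit for the first twelve digits."""
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    # The check digit brings the weighted sum up to a multiple of 10.
    return str((10 - total % 10) % 10)


def to_isbn13(number: str) -> str:
    """Convert a 9-digit SBN or 10-digit ISBN to its 13-digit form."""
    # Hyphens and spaces are only there for readability.
    digits = number.replace("-", "").replace(" ", "").upper()
    if len(digits) == 9:
        digits = "0" + digits  # an SBN becomes an ISBN-10 by prepending 0
    body9 = digits[:9]
    if len(digits) != 10 or not (body9.isascii() and body9.isdigit()):
        raise ValueError(f"not an SBN or ISBN-10: {number!r}")
    # Drop the old check digit (which may be X) without verifying it,
    # prepend 978, and compute a fresh EAN-13 check digit.
    body = "978" + body9
    return body + ean13_check_digit(body)


print(to_isbn13("0-306-40615-2"))
print(to_isbn13("306406152"))
```

Both calls should print 9780306406157: the second input is the same number as a 9-digit SBN, which picks up its leading 0 along the way.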
(Incidentally, that image is the ISBN for Karl Popper’s Logik der Forschung. It should not be taken as an endorsement of that tedious asshole’s work, but rather as laziness on my part, because it’s the first picture in the Wikipedia article on ISBN.)
1 Wikipedia claims it’s a modulo 11 affair, with X substituting for 10, but I don’t think I’ve ever actually seen X as a check digit. I’ll admit I haven’t been paying a lot of attention, though.
2 EAN-13, at least, which is the most common. There are others, but I’ve never seen them used. Apparently EAN-8 is common on cigarettes.
3 Something analogous happened with periodicals and their ISSN, with Unique Country Code 977, but that story is a bit more complicated because ISSNs are only eight digits long.
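For the curious, the modulo 11 check digit from footnote 1 can be sketched roughly as follows. This assumes the usual description of the scheme (the first nine digits are weighted 10 down to 2, and the check digit makes the total divisible by 11); `isbn10_check_digit` is a made-up name for illustration.

```python
def isbn10_check_digit(first9: str) -> str:
    """Return the ISBN-10 check character for the first nine digits."""
    # Weights run 10, 9, ..., 2 from left to right.
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    # The check value (weight 1) makes the full sum a multiple of 11.
    check = (11 - total % 11) % 11
    # A value of 10 does not fit in one decimal digit, so X stands in for it.
    return "X" if check == 10 else str(check)


print(isbn10_check_digit("030640615"))  # 2
print(isbn10_check_digit("080442957"))  # X
```

Since the check value can be anything from 0 to 10, roughly one in eleven ISBN-10s should end in X.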