Forced indentation of Vigenère
The Vigenère cipher is a more interesting cipher, in that it uses an actual key, and isn’t trivial to bruteforce. Central to the Vigenère cipher (and, in fact, most classical substitution ciphers) is the tabula recta:

Suppose that you have a message to encrypt starting with “Proletarians of Europe! The war has lasted more than a year.”, and a key “secret”. To encrypt your message, you line the plaintext and the ciphertext up like this:
PROLETARIANSOFEUROPETHEWARHASLASTEDMORETHANAYEAR
SECRETSECRETSECRETSECRETSECRETSECRETSECRETSECRETAnd then you just look up vertical pairs in the table. The first letter to encrypt is P, so you go to the row marked P. The first letter to encrypt it with is S, so you look in the column S of the row P. The resulting ciphertext letter is H. You repeat this for the rest of the letters.
The resulting ciphertext will be:
HVQCIMSVKRRLGJGLVHHIVYIPSVJRWESWVVHFGVGKLTFEAVEKDecrypting is the reverse of this: you go to the column corresponding to the key letter, find the letter H in that column, and then look at what row that letter is in.
Implementing this in Python turns out not to be very hard.
I actually constructed the tabula recta as a lookup table, mostly to see how hard it would be, but you don’t, of course, have to do that. All you need is a way to turn letters into numbers (A = 0, B = 1, &c.) and back, and encryption is as easy as:
ciphertext_letter =
to_letter((to_number(plaintext_letter) + to_number(key_letter)) % 26)
Decryption just changes the + to a -.
This is also how I constructed the tabula recta. The to_letter function is easily accomplished by treating a string of the letters in alphabetical order as an array, and the for the to_number function I just used a reverse dictionary lookup of the same.
The tabula recta then takes literally three lines of code:
lookup = dict([(a, 0) for a in alpha])
for a in lookup:
lookup[a] = dict([(b, alpha[(tonum[a] + tonum[b]) % 26]) for b in alpha])
You really don’t want to know how much longer that would be in Java.
A sample session:
$ ./vigenere.py
[E]ncrypt or [D]ecrypt? E
Phrase to encrypt? Proletarians of Europe! The war has lasted more than a year.
Strip whitespace and punctuation? (Y/n) Y
Key? secret
hvqcimsvkrrlgjglvhhivyipsvjrweswvvhfgvgkltfeavek
$ ./vigenere.py
[E]ncrypt or [D]ecrypt? D
Phrase to decrypt? hvqcimsvkrrlgjglvhhivyipsvjrweswvvhfgvg kltfeavek
Key? secret
proletariansofeuropethewarhaslastedmorethanayear
The Vigenère cipher is less easy to crack than the Caesar cipher, but not much.
If you know the length of the key, the ciphertext simply turns into so many Caesar-shifted ciphertexts, which may be subjected to frequency analysis. Though of course, we’ve already seen how great that turns out if you don’t have a massive amount of text to analyse.
If you don’t know the length of the key, but do have a huge ciphertext, you can look for repeating clusters of letters, which are likely to be bits of plaintext that happen to repeat at intervals equal to some multiple of the key length. If you have a bunch of different ones, you can find the largest common factor of the intervals, and this will likely be the key length. After this you can just use frequency analysis again. This method is known as the Kasiski examination, after the Prussian Friedrich Kasiski (though Charles Babbage also came up with it).
These things aren’t that hard to implement, but also not that interesting.
It should be clear, though, that the Vigenère cipher is a pretty weak one (it certainly doesn’t deserve the name «le chiffre indéchiffrable»), which is ironic, given that Blaise de Vigenère never actually came up with it, and the one he did come up with was actually quite a bit stronger.
Vigenère’s autokey cipher uses the same basic principle of the tabula recta, but instead of a repeating key, it’s going to use a short key to start with, and then for the rest of it use the plaintext.1 Vigenère himself suggested using a single letter of the alphabet for the key, but that’s pretty stupid. Let’s instead use a full word, like so:
PROLETARIANSOFEUROPETHEWARHASLASTEDMORETHANAYEAR
SECRETPROLETARIANSOFEUROPETHEWARHASLASTEDMORETHAAgain, not that hard to implement.
Rather than repeating, the key is now just the provided key plus the ciphertext, which saves a modulo operation in the lookup.
Decryption has an extra step, because every deciphered letter has to be added to the key while the decryption is in progress, but on the whole, it’s very similar to the other “Vigenère” cipher.
Breaking it is rather harder than breaking the other one, though. Frequency analysis isn’t really going to work, though if you use a single letter for the key, like Vigenère suggested, bruteforcing is trivial, and our own example is vulnerable to a dictionary attacks. Both of those are attacks against the key, though, not the cipher.
There are some relatively straightforward attacks possible, one of which is described in the Wikipedia article, and I’m working on those now.
More to come.
1 You’ll remember that this basic principle also returns in a slightly different form in certain block cipher modes of operation in modern cryptography. It’s also used for some types of self-synchronising stream ciphers.
steve said,
September 3rd, 2008 at 8:20 pm
You should do a post on the Hill cipher.
Cairnarvon said,
September 5th, 2008 at 3:40 am
I haven’t done matrix multiplications in years. I’ll see what I can come up with.