There’s been some interest in tripcode crackers lately, so I thought I’d write on in Haskell. I mentioned this before, but I’ve improved it a bit since.
I’ll be discussing the code step by step in this post. By the end, we should have a working application that takes a POSIX regex as an argument, and then outputs tripcodes that match it.
If everything goes right, this post should be literate Haskell, but I can’t promise that it’ll actually work, what with WordPress being what it is.
Let’s get started.
The tripcode algorithm is relatively straightforward: the input is converted to SJIS, there are some XML character entity substitutions, then a salt is calculated, and the whole thing is passed to Unix crypt.
We won’t be dealing with SJIS conversions, since our input will be ASCII only, which (with one exception) is a subset of SJIS anyway, and our target board will probably be Shiichan or Futallaby anyway, neither of which even does it. We also won’t be writing our own crypt implementation in Haskell, so we’ll have to get it from a C library. To do this, we use the Foreign Function Interface language extension, like so:
> {-# LANGUAGE ForeignFunctionInterface #-}
The usual boilerplate:
> module Main (main) where
> import Char (chr, ord)
> import Data.List (inits)
> import Foreign.C
> import System (getArgs, exitFailure)
> import System.IO.Unsafe (unsafePerformIO)
> import Text.Regex.Posix ((=~))
It’ll become clear why we need all of these as we go along.
Let’s import our C crypt:
> foreign import ccall unsafe "DES_crypt" crypt :: CString -> CString -> CString
We’re using OpenSSL’s implementation, because GNU crypt is slow as fuck, and this thing is going to be slow enough as it is. That bit of code is just saying to take a C function named DES_crypt from a linked library, and expose it as a Haskell function named crypt, with the listed type signature.
We said the tripcode algorithm involves some XML character entity substitutions, so let’s write a function to do that. The canonical algorithm just escapes ", <, and >. If yours escapes more (or fewer), just add (or remove) them here.
> xmlescape :: String -> String
> xmlescape [] = []
> xmlescape (x:xs) = case x of
> '"' -> (++) """ $ xmlescape xs
> '<' -> (++) "<" $ xmlescape xs
> '>' -> (++) ">" $ xmlescape xs
> otherwise -> (:) x $ xmlescape xs
Straightforward enough.
Next, we’ll need a function to generate the salt. crypt‘s salt is string of length 2 whose characters are in the range [a-zA-Z0-9.]. The tripcode function obtains this by appending H.. to the input and taking the second and third characters, performing some transformations to ensure they’re in the allowed range:
> salt :: String -> String
> salt t =
> map f . take 2 . tail $ t ++ "H.."
> where
> f c
> | notElem c ['.'..'z'] = '.'
> | elem c [':'..'@'] = chr $ ord c + 7
> | elem c ['['..'`'] = chr $ ord c + 6
> | otherwise = c
Now we’re ready for the actual tripcode.
This will happen in the IO monad, because we’re dealing with the FFI. We don’t have to use unsafePerformIO to escape from it, but to keep our algorithm conceptually cleaner, we will anyway.1
> tripcode :: String -> String
> tripcode tr = unsafePerformIO $ do
> trip <- newCString t
> salt <- newCString $ salt t
> peekCString (crypt trip salt) >>= return . drop 3
> where t = xmlescape tr
Great. Now that we have everything we need to calculate the tripcode for a given input, we need a way to generate inputs.
Let’s first start by specifying the characters we want to allow. We’re leaving out # and ! because they’re usually separator characters and as such illegal in tripcodes (though most boards will allow !), and \ because that’s the aforementioned ASCII/SJIS exception.
If you’re targetting a specific algorithm and you know which characters are fine and which aren’t, you can edit this. For Shiichan, for instance, the only forbidden character is #.
> allowedChars :: [Char]
> allowedChars = filter (\c -> c `notElem` "#!\\") [' '..'~']
You can avoid the call to xmlescape altogether by disallowing characters that would be escaped, of course; that’s what I did for my C implementation, too. That will obviously reduce your search space, though.
Now we can use these characters to generate our (infinite) list of inputs:
> ins :: [String]
> ins = (inits . repeat) allowedChars >>= sequence
This will generate all possible combinations, from "" to strings of infinite length. Since crypt disregards input over eight characters wide, it will generate far more combinations than you’ll ever need. Since it will take almost forever to even get up to that point, though, the issue is kind of moot.2
Now that we have our infinite input list, we can turn this into the input/output combinations we need; first we’ll team up each input with its corresponding output with zip, and then we’ll filter it with our regular expression. Obviously, this is where Haskell’s laziness is really handy:
> tripPairs :: String -> [(String, String)]
> tripPairs regex = filter (\(a, b) -> b =~ regex) $ zip ins $ map tripcode ins
It’s cute that (=~) is used for regular expressions. It’s also handy that it takes a string as its second argument and we don’t have to dick around with compiling regular expressions and what have you.
(=~) actually has a pretty complicated type signature, but its return value is a polymorphic value that converts into a boolean without complaints, so we don’t have to worry about that.
So what do we have now? We have our tripcode function, our list of inputs, and a function that filters this list based on a regex argument. We’re pretty much done. Now we only need the main function:
> main :: IO ()
> main = getArgs >>= f
> where
> f [] = putStrLn " USAGE: tripcode [regex]" >> exitFailure
> f (arg:xs) = mapM_ (putStrLn . show) $ tripPairs arg
When no argument is passed on the command line, we display a short usage note and return failure, otherwise we generate our list of matching pairs, turn them into displayable strings, and then print them to stdout. Try it, it works!3
It’ll be slow as fuck, though,4 for a number of reasons. The first is that OpenSSL’s crypt, while much faster than GNU crypt, still isn’t very fast. The second is that Haskell’s Text.Regex.Posix is very slow (if you know how, I suggest you use another regex library; I went with Text.Regex.Posix because most casual Haskellers (which I am too) aren’t likely to have any of the others). The third is that it can only use one processor at a time, whereas most others are multithreaded or multiprocess affairs. The fourth is that a high-level language like Haskell is obviously going to be slower than tripcode crackers written in C and Sepples by moon people. The fifth is that I suck at Haskell.
Still, at least it’s written in the world’s leading fictional programming language.
Edit: Just to make it absolutely clear, because this post gets a lot of hits: you’d have to be an idiot to actually use this to look for tripcodes. Write your own cracker in C. It’s easy and will be much faster. (Or if you don’t want to, I wrote one that was posted elsewhere in the thread you got this link from.)