Rosio Pavoris a blog

Optimal search is easy

Last time we looked at how to solve the eight puzzle using the hill climbing algorithm, which gave us a result much more quickly than a blind depth-first search did, but we wondered if the solution we found was the best we could do, and we asked if there was a way to use heuristics to find not just a solution, but the best solution. Today, we’ll see that there is, and it’s actually really straightforward.

Read the rest of this entry »

Permalink 6 Comments

Heuristics are easy

(This post assumes you read the previous one.)

Today we’ll be looking at the hill climbing algorithm, which is just a plain old depth-first search with heuristics added.
“Heuristics” is a fancy word (from the Greek εὑρίσκω, “I discover”) for a very simple concept. In the context of search trees, it simply means that at a every node, you’re going to look at each possible branch, and take the one that looks the most promising first, instead of just one at random. “Most promising” can be a tricky concept, though.0
Our river-crossing example isn’t necessarily the best one to demonstrate the concept, so let’s go with another classic: the 8 puzzle.

Read the rest of this entry »

Permalink 1 Comment

Search trees are easy

A decent proportion of my readers are noobie programmers or people who aren’t in a position to receive a formal CS education, so I thought I’d cover the basics of a fundamental concept most people cover in their first semester of algorithms or AI today: search trees. The fact that my college considers this to be third-year material so advanced they cannot in good faith make the class compulsory is neither here nor there.

Consider the famous problem of the farmer who wants to cross a river with his fox, goose, and grain, though the only boat can only carry himself and one of these three possesions. Ignore for a moment why a farmer would own a fox, and let’s stretch credibility a bit more by assuming that while the fox and goose are well-trained enough not to wander off in the absence of the farmer, they are not trained not to eat the goose or the grain, respectively, in said absence. How can he safely get to the other side without losing his goose or grain?

Read the rest of this entry »

Permalink 4 Comments

Xlib hates me

Having finished another popsci book on chaos theory recently (Ian Stewart’s Does God Play Dice?), I thought it’d be an interesting exercise to visualise the Lorenz attractor, and since it’s been a while since I’ve done anything new in programming, to take the opportunity to get into Xlib, the X Window System C library. Results aren’t very encouraging.

I mean, I got something to work easily enough, but any attempt at introducing color beyond black and white for clarity fails miserably and in non-deterministic ways. Eventually I gave up and redid it using something I know.
Compare:

Lorenz attractor (X11) Lorenz attractor (Allegro)

(It’s prettier animated, so do compile the code yourself and see.)
In both cases, the screen represents the Cartesian plane (X-axis horizontal, Y-axis vertical, origin right in the center; one unit is ten pixels). In the Xlib version (left) the Z component is ignored entirely (so it’s really a projection of the attractor onto the Cartesian plane), in the Allegro version (right) some attempt at representing it using shades of gray has been made, with z=0 being black and z=55 being white (though because it is drawn with no real care, it will happily scribble dark lines over light ones if it has to).
You can mess with the variables and starting condition to see how it behaves, or swap around some Xs and Ys and Zs to get different angles, and at least in the Allegro version, messing with color is trivial enough.

Which brings me to my question: does anyone know of decent introductions to Xlib? The Internet is full of tutorials, and as usual, all of them seem to suck. I know Xlib isn’t really supposed to be used directly, but I want to.

Permalink 3 Comments

Cisco sucks at crypto

I’m in a class called Netwerkbeheer (Network Management), which spans two semesters and is a transparent excuse to peddle CCNA certifications. As a result, I spend a lot of time playing with Cisco routers and switches, and one of the many, many things that annoy me about Cisco’s IOS is their cavalier attitude towards security and cryptosystems. A particularly egregious example of this is Cisco’s type 7 encryption.
If you’ve ever configured a Cisco router, you’ve probably encountered it. When the misleadingly named service password-encryption is running, setting a password with the enable password command “encrypts” the password, so that when you issue the show running-config command, you’ll see a line like

enable password 7 08314940000A

instead of the plaintext password, which you’d see if the so-called “password-encryption” was turned off.
Type 7 “encryption” manifests itself in a few other places, including in FTP passwords and various routing protocol authentication passwords.

Type 7 has been known to be broken for a decade and a half now,0 but people continue to use it, almost always for bad reasons.1,2 To drive home just how broken type 7 is, let’s look at it in detail.

The general form of the type 7 “ciphertext” is (0[0-9]|1[0-5])([0-9A-F]{2})+. Some experimenting finds that the length of the “ciphertext” is always twice the length of the plaintext, plus two. Can you guess why?

The “encryption” key is always a number in the range 0-15, which would be easy enough to bruteforce, but that turns out to be unnecessary, since it’s provided (in decimal form) as the first two characters of the “ciphertext”.
That key determines the starting point in a table of twenty-six secondary keys (which, incidentally, is dsfd;kfoA,.iyewrkldJKDHSUB; I don’t know why the table has 26 entries instead of 16), which are XORed in turn with the characters in the plaintext. If the key is, say, 7, the first character in the plaintext is XORed with the seventh character in the table, the second character in the plaintext is XORed with the eighth character in the table, the third with the ninth, &c.
Each resulting character is then converted to two hexadecimal digits (the input can only be ASCII, of course) and appended to the ciphertext.

And that’s seriously all there’s to it. The result is a “cipher” that’s either slightly less or slightly more secure than writing out your passwords in permanent marker on the outside of the door of the server room, depending on how you manage your configuration files.
Because I know this is going to be an issue at some point, I’ve written a simple utility that encrypts and decrypts passwords using type 7, which you can find here.

You’d think this would be a moot point because people should realise their configuration files are sensitive information, but people are, of course, idiots. In that sense, type 7 isn’t just worthless, but actively harmful, because it gives people a false sense of security.


0 http://insecure.org/sploits/cisco.passwords.html

1 The original intent of type 7 was apparently to foil shoulder-surfers, who might see your configuration file as it scrolls by on your screen. Cisco’s official stance (now) is that if security is an issue, the router configuration file itself should be treated as vulnerable data, not just the passwords that may or may not be displayed in it. That would be fair enough, if it wasn’t at odds with Cisco’s default way of saving and loading configuration files, which is through plain TFTP over the regular network, with no options for encryption of either the config or the passwords themselves. But, you know.
(The claim that type 7 is so weak because the router has to be able to reverse it is bullshit, of course. At most it’s true for PAP authentication, but anyone who considers PAP passwords secret information has no business being anywhere near a router.)

2 Cisco themselves now advise against using it, instead suggesting people use type 5, which isn’t encryption, but just hashing with MD5. Which is also broken, of course. The CCNA materials also state that at least type 7 is “better than no encryption”, but I’d argue that it’s worse, because its security is equivalent to plaintext, while also giving idiot network admins the impression that it’s not.
I’m told a type 6 exists now, which is based on AES and supposed to be better. AFAIK our routers don’t support it, and I’m not holding my breath either way.

Permalink 6 Comments

Valid Brainfuck code

                ++++                         +';cloolollllllooooddddoddddoollllcccccc:;;;;;;+          ++k0XXXXXXXXXXXNNNNNNNNNNNNNWWWWWWWWWWWWWWWWWWWWWWWWWWW
               cO0:+                    +~~~~;;;;;::[>c:::::+clooodoodddddooolccccccc;;''''~           ~:++XXXXXXXXXXXNNNNNNNNNNNNNNNNNNNNWWWWWWWWWWWWWWWWWWWW
               ~~             ~~~~+++''~~~~~~~  ~~~+';;;;;;'';;;:ccclloloollllccccccccc;;>~~           ++d0KKKKXKKXXXXXNNNNNNNNNNNNNNNNNNWWWWWWWWWWWWWWWWWWWWW
                     ~~~~~~~~~~~~~~                  ~~          ~~~~~~~~~~~~~~~~~';;;;'++              ;ok00KKKKKXXXXXXNXXXNNNXNNNNNNNNNNNNWWWWWWWWWWWWWWWWWW
                      ~                                                                                 ~;okO0KKKKKKKXXXXXXXNNNXXNNNNNNNNWWWNNWWWWWWWWWWWWWWWW
                                                                                                          ':odk00KKKXXXXXXXXXXNXXXXNNNNNNNNNNNNWWWWMMWWWWWWWWW
                                                                          ~~~~~~~~                         ++cdkO0KKKKKKXXXXXXXXXXXXXNNNNNNNNNNWWWWWWWWWWWWWWW
                                                                    ~~;;;;~                                 ++++O00KKKKKXXXXNXXKKKXXXXXXXNNNNNNWWWWWNNXXKKKKKX
                                                                  ~;::;'~                                    >;okO0KKKKKXXXXXXXXKKKKXXXXXNXXXNNNWWWNKK0OkkkkOO
                                                               ~;::;'~                                          +++0KKKXKKXXXXXXKKKKXXKKKXXXNNNWWWWNKK0OOOOOOO
                                                            ~::cl;~                                   ~~           >d0KKKXKXXXXXXXXXXXXXXXXXXXXNWWNXKK00KKXKKK
                                                          ~d0Okl;~                                   ;~oo~           +++KKXXXXXXXXXKXXXXXXXXXXNNWWNXKKKKXXNXXX
                                                        ~lKNWXk+;~           ~;            ~         :++N+       ~~ ~~+:++KKKKKKXXXKXXKKKKKKXNNNWNNX0KKXNNNNNX
                                                       ~xNWWN0<~ ~~   ~~'<~;co~         'l<o~~   ~   ;<k0Wd~    -':~~c:c;:]KK0KKKKKK>XXXXXXXXNNNWNNXKKKKNNNNNN
                                                      :KWMWX++';;'';';cdc;lOOl          .dK>do;   --;kkl0NWd- .'ckoo;k:Odx;'xK0<<00KKKKKKKKKKKXXNWNNK0OOOKXXXXX
                                                     lNMWNd;;+k+:+dddOK0o[X>>x~          ~OWNKK; ~ '++N++kNOl+ ;c<<x-;oOx]~;x00000>>KKKKKKKKKKXNNNX0OkkxxOOOOO
                                                    oNNN0ccx0KOkO0KO0XW0kWNNMX;;c..~'~ ~'':<<NNX;;~ ;dkk+++0dc :cONk;;:k[x:~:>>0000KKKKK00000KXNNNXK0OOkkkkkkk
                                                   lOOKOox0KK0O0XNKOKWWOKMNWMKcoxkl:cloxxdokXNKXx;:  ;+++ld0dx~;:oN<~~;<ddd-~d]0000KK0000O00KKXXNNNNXXXKKKKKK0
                                                  ;clkkoOKK0kk>NW>kOXXXNWNXNMO:kkOoc::OOOOdkdO0xO;::~.~:ldx0Od~:>cNN'~;lcod++c00000000000OO00KXXNNNNNNNNNNNNNN
                                                 ~;:dkk000kxk0XWWOoO0xkKOOkOXocOoxlo;cOxlocl:;llxl~;:~  ~cdO0c~;.cKX;~~;:ll'~;>O0000000O0OO00KXXNNNWWWNNNNWWNN
                                                ~~:okOkoxOkO0XNWKc;co;xc;;;okcloll~;'dKocc;:o:;;;c~ --   .;d+;++ lOO+  ;;l;+++oO00KKK0OOOOO00KXNNNNNNNNNNNNNNN
                                                ~;oxOx';dkOKXWWNo~~';~l'~~~lc:l:;~  ~xKddo;':l;;~;;  .   ~~;d;  ~:xc   ~:o:~~~c<00<00000OOO00<<NNNNNNNNNNNNNNN
                                                'odxd;~:xk0XWMMX;~~~~~'~ ~~c::::~~  ~xKkOOo'';;'~~;~      ~~c'   ~:    ~ol;~  :x0000OOOOOkkO00KXNNNNNNNNNNNNNN
                                                :oo+kc~lxkKXWWM0'  ~~~~~~~;:;c::~   ~+K0O0[:~;~~~~~'     ~' ;:         ~O;;~  'dOOOOOO>>kkkOO00KXXXXNNNNNNNNNN
                                               ~:o;dkd~;:d00KNNk~  ---~~~~:::olc'   ~<<KxOOd;''-  ~'     ~~ ~'~        ;];~   ~o>OOOO>xxxkkkOO000KKKXXXXXXXNNN
                                               ~;o'ccc~~~~;ldxkd-  .~~~~~~;;l++c;.   c00d>O>:~~~-.-;         --        :;-    .lxkkkkkxxxkkkkkOOOOO0KKKKKKKXXX
                                               ~'lllolcclllc:'';-- ~~~~~''-;cll::;.  'k0xoxd:~~~~ ~;~        ~~       ~c~     ~:<kkkkkxxxxxxkkkkkOO000K0KKKK00
                                              ~~~~~~~'~~;cxXWXxc;~ ~~~~''''~';'';;'~  ;oxolo:~~~~ ~:'         ~       ~;      ~;<xkkkkxxxxxxkkkkOO00000K00KKK0
                                           ~~ol';~~~~       ~:xKk;~~~~';''';'''~~~~    ~';';;+++  +c'          ~               +dxkkkkkkkkkkkkOOO00000O000KKKK
                                        ~;c:'kXlc:c;.          ~<O;';cl::;;cdxxkxxxdo<:'~++++~~   ~[~      ~    ~              ;o>xxxx>kkkkkkk>OOOOO00OOO00000
                                      ~;c;~'::::clo;;;;''co+     ld:xKXk;cddlll:;;;cdk0XWXd;+      '       '                   +<dxxx<kkkkkkkk<OOOOOO000000000
                                       ~  'odcldd:clkKXKO0X0l::;;'cod0Kd'~~~~~~  ~     ~'c0O;-    ~~       ~    ~              ;]oxxxx>>>xkkkkOOOOOOO00000000K
                                         'c;~ :odo0WMMMMMMWWNNWWK:lx00k:~~~~;~  ~.        ~x<~~    ~       ~      ~~           ;<o<+++kk+x[kkkOOOOOOO0O000000K
                                         '~    lxOWMMMMMMMMMWMMWxoxOWWOc';cxkxocod;         c:o:                               ;od>xx>xxx>kkOkkkOOOOOOOOOOOO00
                                                ;OWMMMMMMMMMMMXxx0XNNNxo:cNMMWNNNNXOdc:;;;'';;;;'~                   ---      ~;ld<xx<xxx<xkkkOOOOOOOOOkOOOO0O
                                                 'lk0KKXXXNX0dlxNMMMMMNko;KMMMMWWNWWMWWWNXKk:;:l'                  ~~~~~    -~;:]>>xxxxxxxxk>kOkkkkkOOOOOOOOOO
                                                 dXNNXKKK00OO00NMMMMMMWOollXMMMMMMMMMMMMMMMXk;;;'~~~~~                ~~~   .~;cxO<<<kxdddxkkkkxxxxxxxkkkkkkkO
                                                ~kWMMMMMMMMMWWNWWMMMWWKoll:;kWMMMMMMMMMMMMWOxd;~  ~~~~            ~      ~~+++;[o>>>NX>-----xkkkxxxxxxxxxkkkOO
                                                 dNWMMMMWXXNNkccdkOkdxO0KKo;:lxOKNWMMMMMMWXK0Oc~~  ~                       ~':<<0O<x<-]cclodxxxxxxxkxxxxxxxkOK
                                                 ;KNWMWWNNNWMXxl:;;'~~~~;c;l0X0kdddkO00KKKKOxl;~~                           ~~'>d>>do>ooddddxxxxkkkxxxxxxkkk0N
                                                 ~xXWWNNWMMMMMMMMWX0OkdolldKNNWMMWN0dl:;;;;;'~~                               ~~+.coooddoddd<<<<+++++++xx[kOKW
                                                  :0NWWWWMWWMWWMMMMMMMMWWWWWWMMMMMMMWX0ko:'~~~~                                   ;lllo>o>-xxxxxxxxxxxxxxxk0NM
                                                  ~dOKXXXK0ko:;cdOO0KNWNNXXXWWMMMMWWNK0Okd:'~~                                 ~~-;<<llooddxddxxxxxddddxxxolxX
                                                   'x0KKOd;~     ~~~~';oxOO0XNWWWWNXK0Oxoc:;~~~                                ~~-;c]lloodd>ddddddddddoddd;  ~
                                                    lk0000Ol~           ~';oOKKXXK00Oxdo:;'~~~                                  ~':c>.looooddddddddoooo<<o;+
                                              ':okO0Kxk0KNNKo;'~           ~lkO0Okxddoc:;~~~~                                  +;:cc[lllooo>>dddoolllllol;---
                                        ~':oxXMMWMMWWXkk0KK0xoodl;;;;;';;;lxkkkkkxdoc:;'~~~~~                            --~~~~;<l<llllllooooodolcc:;;::::;'-
                                     ~oKWWWWWNWMWWMMWNNOk00Okoccc:colcllooxkkkxdoo]:;'~~~~~                         ~;:>>llloc;.:<<lllllllooooolc:;;''++++;:c:
                                    cXNNNWMMMWWMMNWMMWNMOk000ko:;;;'';;;;;cloo[:;;'~~~~~                       ~' ~;>dxxO>NNNX0oxoc;:lclllllollc:;;'+++++~~~''
                                  ~xKXXKXNNNKKNMMWWMMMXMX0OKXNWNX0kdc:;;:cccc:;'~~~~                           d:~:>>00OkOKKX00XNX0o;cccllllllllc;;'++++~~~~~~
                                 ;0NNNN0KNWWNNNMMMMMWMKNWNOOKNWMMMMMNX0Okdol:'~~~~                            'Nclk0XXXK0KKKXKkXOXNOo;:<<llllllll::;;;;<dd<;~-
                                :N0NNMMWWMMXNMWWMMMWWWWXWXx;:xkxdxOKKK0Okol;~~~                             ~~]WoxkOKKKKXXX0>kKN0k00Okl>looooooollccc-.lONWN>O
                              ~lXW0XXWNXKNKoO0xOX0N0dxKX0Xo; ~~~~~~;cll:;~~~                               ;;>OWkWNNWNNXXOdl0KWNkOWNXOd++looooolloool+++dKWMMM
                             ~xXXNKOXKNNNWO:0xl0WNWWNWWMWWNXO'                                          ~++Ox0NNlX0OKXNWX0oO0ONNkxWKxdc;.lloooollooocclc-l0WMM
                            cOXNKKX0kXKKNNx'xo-cWWWMWWNMNNXKNO                                         o-coNKWWO-oocddkxcoK0c0WK:oOOKXx;-.lloooooooc;;;;;;<ONM
                           cNK0XK00K0OXkkk:~:l~~dWWMNNXWWXKkXX~                                       +NK+OWXWW0lOk++O0<<<0+dNWd;ONXKK0l+'+[oooool:~ ~~;;;:l>>
                          dKXNK000OxOOxxkKo~oOo~~koxd;;:o'~~::~                                      :>Xkk0NKNNWWWcxXkd+0O+;0Xk:l0NXOxxOo';loooll+~     ~;<<dO
                         cKXOKN0k0OOooOc~lc ;ol~~;  ~'~~';<lldxl;-                                    ~;'';xccdXNk;O0~c0k'~xNXx:o0K0KK00k;~cooo]>'        ~;>k
                         xNKK0KN0xkOOkOO;;o ;kk:ldolxxokOOkddc~:kd;~                         ~'~~   ~;c:~~ ~   ~'~~ oOxl~;KWWKc;kXNNNXkxo>+;oolll:.
                        oKk0000KXKkxkxdol'c 'kKd;'::'lxO0Oxk0;~lxkd;~~~~~                ~;:lo:;~  ~0NMWWNXNKKOkc~  ;K~ 'OWNKd;cKNKOkxO0kdl:coooolc;'~
                       cK00KO00Okk0kdollc;; 'ooclxkdcc0WMWNXkccc:dxd:~ ~~~            ~;lxkxodko'''~:o0xdO0WNNNMO;  ;W:~0XXXKo~:oxO0KKKK0olcllllllllllc'
                      lXkdkXKOOkkxdxkkl;:dl ;Oxxk0KKWOxNMMNkKXX0l:;coc;~            ~;ooc::coOx:~~ ~lc:;~'cod:;o;   :NO:0KXNk''d00KKKK0Oxdxoolllllllllc'
                     ;KXKK00KKkkOxxdo::'':~ co;:oxxXNNMMMMMMNdxNOod:';cl:~~         ~~~~';;;;::~~':;':oodxWNMNko'ok~xKWlkkxx:'lkKKKK0O0X00Ol:'cllllcccc~
                     oK0KXKKK00xdxxdo:'~ ' 'xd:o0K0WNNNWMMMNd~ONk0XOdc~~~~          col;~~~:llo;'oxxl;;dkoK0NM0lccc'l0N;;O0x::oxOO0KKKNXxoool;;cllcccc:~
                    l0OK00KK0kdlcdool;~~ ~ ;l'~lkdOMNXc;ddl~  dxkNXkkWl~~          oWWWNOdllc;;~~;;;c' :o:o0kWl  ~~cxOX~~kOo;cx0K00kxxOkxOOOx;'ccccccc;
                  ~kWWO0OXX0Okxl:;;;;;'~~''o;~co''xM0kKc~;~  ~NO;oxkXXlok;~~      :NW0xox0XNKxc:o0olkNO0NMNN0o ~:~~kNX0 ~doccxKKkkxoxOK00OxOOo:;cccccc~
                 ~xWMWNKXKOOOxdoc:;~     ~~~  ;;~'OWdoXO;~;oloOxxllcW0lcx;:Xc;'oc~:dk;;;kl;:xXoxxWxlkWl;ok0:~~;xkl;XXKx  o:~;xKKK0xok0K00xdOXo~~c:ccc:~
                 lNWWWNN0O0Okdll::~           ~  ~d0:lXd;~oOKkl:;lNc;;;:O:cK:''ko~~;d~ 'kl;:kWoddWk:cd~   :xcXKkdKdNXXO~ ~~'kNWN0xclkO0KKklxd' ~c::::;
                 dXXNNXK00xoodol:;'~              'o':O;';xkO0Oxo:ol'~ ;x~;Kl;;Oc~ 'l~ ~ko;:kNoOxMK;'~    ~''NXkdWKNO:;   'xXNKOdccOXNNXk:;;~~';';::;~
                :xkKXK0Okxdo:;~~~~~~~  ~~          ~ ;d;~;0XXWW0dl:'~~  ;''c~~~d;~'cd~~'0x;'xWxkdW0;:l       oxoc0;~'~   ~xKK0Oo~~d0xl;'~~~~~~~  ;::;~
               ;xOOkxxxdo:;;'~~~      ~:ol;~          ;~~ck0Oxx0Oc''~~~;:c;;~  :'~kWK~~'Ol~~xXlcc0c;xWl      ~~ ~d~      ~;ldl;':kxxkkdllodxdddc~;::;~
              ~lxkOOxc'~~~             ~'ckOx:~       ~~:OXXOxxdc'~~~~~ckko;   ~~ ;0K~ ~Ol'~dO;~~dlOOWk;        ~;         ~;;:OWNXOdcldOKKKK0Oo~;::;~
              ;oddol:;;;:::::;;'~~       ~ck00Oxxxdlcl0WWWOXKxkXKl~  ~cO0Od:~  ~;;oxxc;;xl;'ox;~;Oxolkdd:'       ~      ~;lkOOOxl:;:ok0KXXXXNX0x;;c:;~
              ;colldxO0KXXKXKK0Okxl:'~    ~';ldOKXNMMMMMMW0NKdd0MNNOOXXKkdc;~~:lo~cod:~~'';;:'~~~~~~  ~~ ~~             ~~'''';;loxOKK0KXNNNNXKx;'::;'
              ~:loxkO0KXXXXNXXXK00XNNKKXdlKKkdl;:ldNMMNKMWWW0kKNMMMMMMMWO:~ ~o00xodxKWWWMMMMMN0ko'       ~~~            ~':coxkOO0OOOOOO0KKK0xc~~;;;;;~
              ~;coxxkO0KXKKXXXXKNWMMMMWXWMMMMMMKc~ :XMMMM0k0OkXXWMMMMMMMNK;:dloXMW0xNMMMMMMMMMMMMWKl'' ;doc:::c;;~~    ~;clddddddddxkkkxoc;~     ;;;''~~
               ~;coddkO000K000KoNMMWWMMWMMMMMMMMMKcoxkXWMWMNOkWWMMMMMMMMMMNxxOdXMWkOWMMMMMMMMMMMN0WMXxkXkXMMMMMMMMW0o;~~;:clooooooc:;;;''''~~    loolc:;'~
                ~;:coxOd;dOKWWMMMMMMNNMMMMMMMMMMMMWKKXWWMMMNkxNMMMMMMMMMMWxc0NXWMNx0MMMMMMMMMMMMMMMMOcoK0XMMMMMMMMMMMMWOc~~~~~;;;:ldkOkxdol;~~  ~OKKK0ko:~
Kkkdlc;;'~'~~~   ~~;OWMX:dXWMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMXxxNMMMMMMNNNNOox00XWMKdKMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMNx;~~~cxOOO00Okxoc;~   ~XNNNNKxc~
MMMMMMMWWNNNXXK0k:~dNWMMX::NMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMKxkWMMMXo;:;';cxKO0XWWOdXMMMMMMMMMMNMMMMMMMMMMMMMMMMMMMMMMMMMMMMN0o;;:okkOkkxoc;~~   ;XNNNN0o;
MMMMMMMMMMMMMMMMNKKc~;xNMWKWMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM0x0MMMK;~:NN0d':XXXWMNkkNMMMMMMMMMMMMMMMMMMMMMMNNMMMMMMMMMMMMMMMMWX0kdddxxddo:;~~    ;KK0OOd;~
MMMMMMMMMMMMMMMMMWMWNMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMXKWMMWKOXMMMK;~:NMMWdOMMMMMWXNWMMMMMWXKWMMMMMMMMMMMMMWWMMMMMMMMMMMMMMMMMXOOkkxolc:;'~~~    ;ol:::;~
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMN' lMMMWWWMMMMKxOWMMMMMMMMMMMMMMMMMMMW0OWMMMMMMMMMMMMMMMMMMWMMMMWMMMMMMMMMOdllcc;'~~~~~     ~~~~~~
WMMMWWMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMWl  'x0MMMMMMMMMMMMMMMMMMMMMMMMMMMMMWOoKWXWMMMMMMMMMMMMMMMWXXl;dlcKWMMMMMMMXc''~~~~~        ~;~~~ ~
WWMMMWMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMWNWMMMW0:  ~oKNMMMMMMMMMMMMMMMMMMMMMMMMMMMMM0' ;c~dMMMMMMMMMMMMNc:'      ~kNOxNMMMMMl~~           ~:ol:;'~'~
MMMMMMMMMMWKWMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMXo;;;l:'~;dXMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMWNNWMMMMMMMMMMMMMMMX;~;:~   ~cOKOOWMMMM0;~           ~okkkxolc:'
MMMWWMMMMMXOWMMMMMMMMMMMMMMMWWMMMMMMMMMMMMMMMXo:loc~~~lKWMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMWNNMNx~ ~kkdOKMMMMMMWXd~           ~;okO0OOxo;'~~
MMMMMMMMMMMWWWMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMNkcdd:;;oWMMMMMMMMMMMMMMMMMMMMMMMMXXWMMMMMMMMMMMMMMMMMMMMMMWKKXNo;:''';xNMMMMMNd:    ~~';:lok0XWMMMMMMMMMMWWNXK
MMMMMMMMMMMNx:;;;;cNMMMMMMMMMMMMMMMMMMMMMMMMMMMMWl'lxOXWMMMMMMMMMMMMMMMMMMMMMMMNxdKWNNMMMMMMMMMMMMMMMMMMMMMMMWd;0W0l~ lWMMMMMx  :KXXNWMMMMMMMMMMMMMMMMMMMMMMMM
WMMMMMMMMMXl:c;   ~0MMMMMMMMMMMMMMMMMMMMMMMMMMMMMNkXMMMMMMMMMMMMMMMMMMMMMMMMMM0'~~dNWMMMMMMMMMMMMMMMMMMMMMMNWMOOWWMMWNNWMMMMMo 'KMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMK~ ~;;;lKMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMOlKMMMMMMMMMMMMMMMMMMMMMMMMMNWWkdXMMMMMMMMMMMW0oxWMMMMMWWMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMX;   ~;:lkXWMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMNWM0o0WMWXXWMWNOxoclx0OKNMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMN;       ~'dNMNKXNMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMNc;lxOx;~:olcc'~';cl;o0NWWWMWWMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMWO'       ~cKWNKkxxxxOXWMMMMMMNWMMMNWMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMX;      '''ckk:~   ;;~:cldkOXMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMK~        ~~;x0KXKOxxO0OkOxooOKXWNXWWWNNWMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMk         ~:c~     ;;;'';:cxXMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMN'             ~cdxl:dkl~~~~;cdxdl;::;;ckWMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMWNKd~                 ;kc:~~;;:ONMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMWo                          ~cko~~     'xXWMWWWMMMMMMMMMMMMMMMMMMMMWMMMMMMMWWWMMWMMMMMMMWXOd;~                   ~K0''~~~~~'lKWMMMMMMMMMMMMMMMMMMM
MMMMMMWNMMMMMW:                                     ~;XMMMMMMMMMMMMMMMMMMMMMMW0ddOWMXd;';coc;:c;:d0Ko~                       oMx~~~~;;;':dKWMMMMMMMMMMMMMMMMMM
MMMMMMWNWMMMMMx~                                      ;dO000XWMMMMMMMMMMMMMWXo~  ~;;~             'XN'                  ~;odkXWl~~~~';;;cokXWMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMM0c;~                                          ~'~':clllc:::;'~                      o0;               ~:kXNWWMWX;  ~~~'';:clkXWMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMX;                                                                               :WMW0~          ~;kXWMMMMMMM0~   ~~~';;:lkKWMMMMMMMMMMMMMMMM
MMMMMMMMMMMXXXMMMWKl~                                                                             :kdc~        :k0XWMMMMMMMMMWk~    ~~;;coxXWMMMMMMMMMMMMMMMMM
MMMMMMMMMMWOooXN00WM0;~          ~;:cxO:                                                                      dWMMMMMMMMMMMMMNl~    ~~';lkXWMMMMMMMMMMMMMMMMMM
MMMMMMMMMMWkc;;~ ;XMMMXo:~    ~lKWMMMMMXkkko~        ~;c;~~                           ~~~                   ~dWMMMMMMMMMMMMMWo ~~~ ~~';lkXWMMMMMMMMMMMMMWWMMMM
MMMMMMMMMMWkc'~   'oxOONMX;':d0WMMMMMMMMMMMMNOdl;~~;:l0WMWNX0Okkkkkdl;~~';'~  ~';:coOXWWNX0kxlc:;~        ~;0WMMMMMMMMMMMWN0c  ~;~~~;;okNMMMMMMMMMMMMMMMMWMMMM
MMMMMMMMMMNd:'~        'x00NMMMMMMMMMMMMMMMMMMMMMMWMMMMMMMMMMMMMMMMMMMMWMMMNXXWMMMMMMMMMMMMMMMWW0l;':c:cdOXWMMMMMMMMMMMMM0'~   ~:~~~;dONMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMK;~~           ~dXWMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMWNWMMMMMMMMMMMMMMMMXxxOk:    'oo~;o0NWMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMK:~              ~;:oXMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMWKc~       ~;00';XMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMWO'                 ~oXWNXXNMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMWWNOkkkkkkxlc;~       ~c0NK;:XMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMWO;                  ~   ~cxxddx0XWWW0cd0NWMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMWMMW0d:;'~                 ;dXWNx;lNMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMWKd;~                           ~~~    ~~'';lkO0NNNNWMMMMMMMMMMMWNKkdkKOo:;'';;~                    ;l0WMW0ll0WMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMWXXKx;                                         ~~ ~';;clooolcc:'~                               ;oKWMMNOodKMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMWNXXKKOdl;~                                                                                 'cdKWMWXOdoxKWMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMWOlxO0K0ko:~                                                                     ~~;cx0XWMNKOxdokKWMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMNK0OO0XNNX0ko:;~~                                                     ~~';;l0NWMMWX0xc'~:ONMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMWXXK0Okk0KNNNX0Oxdolc:;;'~~~~~~~~~~          ~~~~~~~~';;:::lodk0XNWWMWN0xxdolcclodOXWMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMWXKOOkO0KKXNWWMMMMWMWWWWWWWWNNXXXXXXXXXNNWWWWWWMMMMMMMWWXX0kdlc:;;:ldOKNWMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMWNXK000KK0OkO00KXXXNNNNNNNNNNXXKK0000OOkkxdxxkxoccloxk0KNWMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMWWNNNNNWNXXXKK0OkxxxkkkxxxxxxxxO0KXNWMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM

(Spoilers.)

Permalink 1 Comment

Controversy!

If you follow these things at all, you’ve probably heard by now: creationists are once again inventing a controversy where there is none, this time by pretending that it matters at all whether Dawkins’ venerable Weasel program uses locking or not, and claiming that his “unwillingness” to dig up code written in the ’80s and release it means… well, something significant.
In case you’ve forgotten what the Weasel program is, here’s a video (using a different phrase, but the same concept):



If you’re a long-time reader and that looks familiar, it’s because I’ve talked about it twice before. The experiment is simple enough that any idiot can repeat it, but of course creationists are a very special kind of idiot.
So this time, let’s walk through writing our own Weasel program and settle this once again.

I’ll be doing this in a kind of “literate Python”, because everyone understands Python and it doesn’t require compilation, so even the most technologically inept don’t have any excuse not to follow along.
If you don’t have Python installed (or aren’t sure if you do), get it here. Get the 2.6.2 one (if you aren’t sure which you need, you’ll need this one). If that’s too hard already, you shouldn’t be on the Internet in the first place.

I’ll be preceding lines of Python code with > signs, façon literate Haskell. The Python interpreter doesn’t understand this style, so I’ll also be providing a link to the final script at the end.

To recap, we’ll start with a random string composed of symbols chosen from a specific alphabet, and a target string which we’re hoping to achieve.
For randomness, we’ll be using Python’s inbuilt random module, so let’s import it.

> import random

The genetic alphabet is just “CGTA” for DNA, but ours will be a bit longer:

> alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

Note that that includes the space at the end. We could make it even longer by including minuscules and punctuation, but it really makes no difference to the principle of the thing.
And here’s our target string:

> target = "METHINKS IT IS LIKE A WEASEL"

Of course it’s important that all symbols in the target string are also in the genetic alphabet, or we’ll never find it.

At the heart of every genetic algorithm, there’s a fitness function. In our case, this is just something that compares our string of “DNA” with the target string symbol by symbol, and just says how many symbols match. Strings with more matching symbols are obviously more like the target string, and will be selected to breed for the next generation.

> def fitness(child):
>     fit = 0
>
>     for a, b in zip(child, target):
>         if a == b:
>             fit += 1
>
>     return fit

The built-in zip function pairs items in a given list, creating a pair of the first character in both the “organism” and the target string, then the second, then the third, and so on. For each pair, it’s then going to check if both members of the pair are the same symbol. If they are, the fitness value is incremented by one. At the end, it’s returned.
This should be obvious.

Equally important, of course, is the reproduction function. Dawkins’ Weasel strings reproduce asexually, so there’s only one parent for each child. The parent copies his entire “DNA”, and each locus has a small chance of mutating.
How high the mutation rate is isn’t that important, because the point of the Weasel program isn’t to simulate real-life life perfectly, but just to demonstrate that descent with modification is more effective than a random search. Let’s give our strings a mutation rate of 1 chance in 50 at each locus. If that seems too high, remember that our genome will be 28 loci in length, so there’ll already be a lot of generations where no mutation will happen at all.

> def reproduce(parent):
>     child = ""
>     for gene in parent:
>         child += random.choice(alphabet) if random.randint(1, 50) == 1 \
>                                          else gene
>     return child

Did you follow that? Our child starts off as an empty string, and then we iterate over the genes of the parent. There’s 1 chance in 50 that a mutation occurs, in which case we randomly select a gene from the alphabet to add to the genome. Otherwise, we use the parent’s gene.
At the end, the constructed child is returned, ready to have its fitness judged.

Now that we have those two important functions, the rest of the program is straightforward. First, we construct a completely random starting point:

> parent = [random.choice(alphabet) for _ in target]

Our Adam. Let’s put him and his fitness on display:

> generation = 1
> print "%d %s (%d)" % (generation, parent, fitness(parent))

The generation variable will keep track of how many generations have passed.
We’re just about ready to enter our reproduction cycle now. Every generation, our parent will have one child, and if this child is fitter than his parent, this child will be the next parent. Otherwise it is mercilessly discarded and the parent will parent the next generation as well.

> while True:
>     child = reproduce(parent)
>     generation += 1
>
>     child_fit, parent_fit = fitness(child), fitness(parent)
>
>     if child_fit > parent_fit:
>         parent, parent_fit = child, child_fit
>
>     print "%d %s (%d)" % (generation, parent, parent_fit)
>
>     if parent_fit == len(target):
>         break

As you can undoubtedly tell, this loop will exit once the fitness of the mutating string equals the length of the target; that is, when both strings are equal.
The string "METHINKS IT IS LIKE A WEASEL" is 28 symbols long. With a 27-symbol alphabet, a random search would take on average 2728 / 2 generations to match it. That's 5,986,​257,​591,​281,​009,​894,​301,​370,​013,​358,​523,​552,​840 generations. Even if we're doing a trillion1 generations a second, that would take 189 million trillion2 years to finish.

Our little program will be doing a bit better than that. The final generation number will be displayed before the final generation genome, of course, but let's rub it in again just to be sure:

> print "Finished! Target reached in %d generations!" % generation

Running it a thousand times, I got an average of 8012.58 generations. Suck it, creationists. Descent with modification and selection really is faster. Just like the last time. And every other fucking time.

And as promised, the full code is here. Just save that somewhere and double-click it to run it (you'll need to chmod +x on sane platforms, but you know that).

And that really should be that. Dembski can wave giant-sleeve-clad arms about free lunches all he likes, but in the real world, not everyone is innumerate. It’s just sad that decent people have to waste time on his bullshit.
I doubt this will actually do much good (of course it won’t; even the creationists themselves (exceptionally dense specimens excepted, as usual) realise this time that there’s no controversy here, just a giant heap of time-wasting nonsense), but if nothing else, I hope I’ve demonstrated even an elementary school student could do this. If the original code is of any interest at all, it’s because of archaeological reasons, because old code is usually interesting, not because the algorithm is that fascinating or complicated.


1 Short scale. 1012.
2 1018. Yes, I know the proper name for that is a quintillion in the short scale and a trillion in the long scale.

Permalink 2 Comments

Lol Github

I’ve never had much use for source control, but after that one thread broke /prog/scrape I realised it might be useful to have somewhere centralised to put the source code and recent diffs, instead of just having a possibly-up-to-date file hidden here somewhere and having people download that whenever their shit breaks without really knowing if I fixed it yet.
Github was an obvious choice, not least because bandwagons are awesome, but then I lost interest and forgot about it until now.

Incidentally, when you create a new repository on Github, it automatically generates a URL for you to push to, of the form git@github.com:​username/​projectname.git. The project name can’t have any characters outside [a-zA-Z0-9-], though, so when it does they’re replaced with dashes.
Whoever wrote that piece of code, though, forgot that project names apparently also can’t start with a dash, so when you try to push to it, it just gives you an unhelpful error message:

Invalid repository url. Make sure you include the .git, e.g. git@github.com:defunkt/ambition.git

And then it breaks the connection, which is particularly nice when your first experience with git involves pushing a project named -prog-scrape.
Anyway, easy enough to fix. I do like that it uses ssh, so I don’t have to type my password every time. I’ll probably be creating additional repos for my many, many other projects that people find useful.

tl;dr: http://github.com/Cairnarvon/progscrape/tree/master

Permalink 4 Comments

Langton’s Ant

As programming exercises go, this ranks somewhere between hello world and Conway’s Game of Life, but since most implementations I’ve seen on the internets use the basic form, I thought I’d write one capable of the generalised ant.

Recall: in the basic form, Langton’s ant moves forward on a two-dimensional field, flips the tile it’s currently on (which can be black or white), and turns left or right based on the color of said tile. This gives you a surprisingly chaotic pattern which degenerates into a repetitive “highway”.
In the generalised form, the tile can be any number of colors, and each color has a turning behavior. Instead of flipping a binary tile, it cycles through the list of colors in order.

You use this by compiling it (again, you’ll need Allegro; instruction is on the first line of the file again, and you’ll probably want to change the WIDTH and HEIGHT #defines again) and then invoking it like so:

ant LR

Where the “LR” is the pattern. LR is the basic ant, saying that it’ll turn left on the first color and right on the second. If you want more than one ant on the field at a time, specify that using the -n flag:

ant -n 3 LR

If you don’t want to start off with a black field, you can load a bitmapped image (.bmp, .pcx, .lbm, or .tga) with the -f flag. In principle you can determine the placement of the ant(s) with red pixels, but I can’t be bothered to figure out how Allegro works with palettes, so that’s not likely to work.

ant -n 3 -f penis.pcx LR

You can save a screenshot by hitting the S key while it’s running (it will always save to the same file, specified in a #define; unless you change it, that’s ant.pcx), and exit by hitting any other key.

Interesting patterns include LRRL (pictured), and patterns can be from one to fifteen tokens long (more if you add more colors yourself).
Enjoy.

Permalink Comments

Quadratic spline interpolation

You’ve had this problem before: you have a bunch of data points, and you want to interpolate between them.
For various reasons, higher order polynomial interpolation (where you try to find an nth-degree polynomial through n + 1 of your data points) can be a bad idea, so you decide that rather than using a simple equation, you’ll use a series of them to connect your data points. These equations are splines, and the simplest form of spline interpolation is just, well, connecting your data points directly:

That’s pretty ugly, though. Is there a way to achieve something like this:

instead?
Yes, obviously, and one of those ways is to use quadratic splines instead. Let’s use a simpler example, though. Suppose we only have four data points, (x0, y0) through (x3, y3):

linear spline interpolation

The black dots are our actual data points, the red lines are our linear splines. What we’d actually like, though, is this:

quadratic spline interpolation

Turns out that’s not that hard to do. As you can see, every spline is a quadratic equation, which obviously is of the form f(x) = ax² + bx + c. So each spline equation has three unknowns (a, b), and c, and there are three splines, for a total of nine unknowns (let’s call them a1 through a3 and so on).

Since two points are known for each spline equation, that gives us the following six equations:

To solve for nine unknowns, obviously we need nine equations. So what else do we know?

Well, the reason the linear spline interpolation looks like crap is because of the sharp breaks at the spline edges, so we would like our neighboring quadratic splines to have the same slope in the point that they share. In other words, if our spline equations are f, g, and h, we want the derivative of f to equal the derivative of g in the point (x1, y1), and we want the derivative of g to equal the derivative of h in the point (x2, y2).
The derivative is easy enough to find:

Filling in, this gives us two more equations:

Or equivalently:

Which brings our total to eight equations. We aren’t going to squeeze another legitimate equation out of this, so let’s just fill in one of the unknowns ourselves. If we make one of the as equal to 0, one of the quadratic splines becomes a linear spline, which is fine. Let’s take a1 for simplicity’s sake.
This enables us to construct the following matrix:

The first three columns are the as, the next three the bs, the next three the cs, and the final column will hold the solutions after reduction.
Filled in and solved for our particular dataset:

Which gives us the following equations for our splines:

Obviously this is a lot of work, but it’s mechanical work that doesn’t require a lot of judgement. Which is why I’ve written this Python script to do it for you. Feed it data points and it’ll produce gnuplot code to plot your splines:

$ python qsi.py < data.txt
plot 1.000000 <= x && x <= 3.000000 ? 0.000000 * x * x + 1.500000 * x + 1.500000 :\
     3.000000 <= x && x <= 5.000000 ? -1.250000 * x * x + 9.000000 * x + -9.750000 :\
     5.000000 <= x && x <= 9.000000 ? 1.125000 * x * x + -14.750000 * x + 49.625000 : 0/0 notitle

As you can tell, it’s not necessarily gorgeous, but it (probably) works, and it’s not like anyone has to see the code itself.
Format for the input file is as you’d expect: two numbers per line, first being x and second y, sorted. If gnuplot’s output is jagged, increase the sampling (set sample 1000).
And if it doesn’t work, fix it.

Edit: In light of overwhelming demand, this is a script that interpolates using a higher-order polynomial, as mentioned above. Here’s how the approaches compare for our sample dataset:

This script will fail if you only have one datapoint and its x value is 0, but everything else should work.

Permalink 5 Comments

Bézier curves are pretty

If you’ve ever used a vector drawing program you’ve probably come across them. It turns out they’re conceptually a lot simpler than I expected them to be.
Wikipedia has great pictures that should be self-explanatory:

This is a linear “curve”, with only two guide points:

A quadratic curve has three guide points:

A cubic curve has four:

And a quartic curve has five:

Most vector graphic applications only go up to cubic, and represent more complicated curves as grafts of simpler ones.
My last exam was yesterday, though, so I had some free time, and I’ve been playing with Allegro recently, so I thought I’d write something that draws Bézier curves of arbitrary complexity. Behold.

First line is how you compile it on a typical system. You’ll need Allegro, obviously, which for Debian/Ubanto users is the liballegro-dev package. Others can get it here.

The guide points it uses are passed as command line arguments, with the first argument being the x coordinate of the first point, the second being the y coordinate of the first point, the third being the x coordinate of the second point, &c.
The origin (0, 0) is at the top left of the screen.

./bezier 10 10 50 320 310 230 200 10

This should give you something like:

The program will pause after drawing your curve, until you press a key. If that key is s, it will save the screen to a .pcx file, the name of which you can change in the #defines (default bezier.pcx; if you don’t like .pcx, Allegro also supports .bmp and .tga, and will determine file type based on the extension).

Other things you can customise should mostly be obvious. If LINES is 0 (or undefined), it will just draw pixels instead of trying to connect points with lines. If GUIDES is 1 it will mark the guide points in the color specified by GUIDECOL. GUIDECOL, FOREGROUND, and BACKGROUND are just RGB values in the range 0-255.
WIDTH and HEIGHT are the dimensions of the drawing field. This doesn’t have to be your resolution, but probably shouldn’t be higher. If you want a 100×100 image, 100 and 100 are perfectly legal values.
GRAIN is how often divide() recurses while trying to divide lines into sections. Higher values should give more accurate representations, but usually aren’t needed. STEPS is how many points this will actually give you. Don’t touch STEPS.

This isn’t hugely interesting, but it’s a nice enough toy that I thought I’d share it.

Permalink 3 Comments

$ php -r "echo 0.15-0.05;"
0.0:

The actual result is 0.09999999999999999167332731531132594, courtesy of IEEE, but since PHP is user-friendly, it rounds before display.

ASCII round

Permalink 11 Comments

Literate Tripcrackers

There’s been some interest in tripcode crackers lately, so I thought I’d write on in Haskell. I mentioned this before, but I’ve improved it a bit since.
I’ll be discussing the code step by step in this post. By the end, we should have a working application that takes a POSIX regex as an argument, and then outputs tripcodes that match it.

If everything goes right, this post should be literate Haskell, but I can’t promise that it’ll actually work, what with WordPress being what it is.
Let’s get started.

The tripcode algorithm is relatively straightforward: the input is converted to SJIS, there are some XML character entity substitutions, then a salt is calculated, and the whole thing is passed to Unix crypt.
We won’t be dealing with SJIS conversions, since our input will be ASCII only, which (with one exception) is a subset of SJIS anyway, and our target board will probably be Shiichan or Futallaby anyway, neither of which even does it. We also won’t be writing our own crypt implementation in Haskell, so we’ll have to get it from a C library. To do this, we use the Foreign Function Interface language extension, like so:

> {-# LANGUAGE ForeignFunctionInterface #-}

The usual boilerplate:

> module Main (main) where

> import Char (chr, ord)
> import Data.List (inits)
> import Foreign.C
> import System (getArgs, exitFailure)
> import System.IO.Unsafe (unsafePerformIO)
> import Text.Regex.Posix ((=~))

It’ll become clear why we need all of these as we go along.
Let’s import our C crypt:

> foreign import ccall unsafe "DES_crypt" crypt :: CString -> CString -> CString

We’re using OpenSSL’s implementation, because GNU crypt is slow as fuck, and this thing is going to be slow enough as it is. That bit of code is just saying to take a C function named DES_crypt from a linked library, and expose it as a Haskell function named crypt, with the listed type signature.

We said the tripcode algorithm involves some XML character entity substitutions, so let’s write a function to do that. The canonical algorithm just escapes ", <, and >. If yours escapes more (or fewer), just add (or remove) them here.

> xmlescape :: String -> String
> xmlescape [] = []
> xmlescape (x:xs) = case x of
>     '"' -> (++) "&quot;" $ xmlescape xs
>     '<' -> (++) "&lt;"   $ xmlescape xs
>     '>' -> (++) "&gt;"   $ xmlescape xs
>     otherwise -> (:) x   $ xmlescape xs

Straightforward enough.
Next, we’ll need a function to generate the salt. crypt’s salt is string of length 2 whose characters are in the range [a-zA-Z0-9.]. The tripcode function obtains this by appending H.. to the input and taking the second and third characters, performing some transformations to ensure they’re in the allowed range:

> salt :: String -> String
> salt t  =
>     map f . take 2 . tail $ t ++ "H.."
>     where
>         f c
>             | notElem c ['.'..'z'] = '.'
>             | elem c [':'..'@'] = chr $ ord c + 7
>             | elem c ['['..'`'] = chr $ ord c + 6
>             | otherwise = c

Now we’re ready for the actual tripcode.
This will happen in the IO monad, because we’re dealing with the FFI. We don’t have to use unsafePerformIO to escape from it, but to keep our algorithm conceptually cleaner, we will anyway.1

> tripcode :: String -> String
> tripcode tr = unsafePerformIO $ do
>     trip <- newCString t
>     salt <- newCString $ salt t
>     peekCString (crypt trip salt) >>= return . drop 3
>     where t = xmlescape tr

Great. Now that we have everything we need to calculate the tripcode for a given input, we need a way to generate inputs.
Let’s first start by specifying the characters we want to allow. We’re leaving out # and ! because they’re usually separator characters and as such illegal in tripcodes (though most boards will allow !), and \ because that’s the aforementioned ASCII/SJIS exception.

If you’re targetting a specific algorithm and you know which characters are fine and which aren’t, you can edit this. For Shiichan, for instance, the only forbidden character is #.

> allowedChars :: [Char]
> allowedChars = filter (\c -> c `notElem` "#!\\") [' '..'~']

You can avoid the call to xmlescape altogether by disallowing characters that would be escaped, of course; that’s what I did for my C implementation, too. That will obviously reduce your search space, though.

Now we can use these characters to generate our (infinite) list of inputs:

> ins :: [String]
> ins = (inits . repeat) allowedChars >>= sequence

This will generate all possible combinations, from "" to strings of infinite length. Since crypt disregards input over eight characters wide, it will generate far more combinations than you’ll ever need. Since it will take almost forever to even get up to that point, though, the issue is kind of moot.2

Now that we have our infinite input list, we can turn this into the input/output combinations we need; first we’ll team up each input with its corresponding output with zip, and then we’ll filter it with our regular expression. Obviously, this is where Haskell’s laziness is really handy:

> tripPairs :: String -> [(String, String)]
> tripPairs regex = filter (\(a, b) -> b =~ regex) $ zip ins $ map tripcode ins

It’s cute that (=~) is used for regular expressions. It’s also handy that it takes a string as its second argument and we don’t have to dick around with compiling regular expressions and what have you.
(=~) actually has a pretty complicated type signature, but its return value is a polymorphic value that converts into a boolean without complaints, so we don’t have to worry about that.

So what do we have now? We have our tripcode function, our list of inputs, and a function that filters this list based on a regex argument. We’re pretty much done. Now we only need the main function:

> main :: IO ()
> main = getArgs >>= f
>     where
>         f []       = putStrLn " USAGE: tripcode [regex]" >> exitFailure
>         f (arg:xs) = mapM_ (putStrLn . show) $ tripPairs arg

When no argument is passed on the command line, we display a short usage note and return failure, otherwise we generate our list of matching pairs, turn them into displayable strings, and then print them to stdout. Try it, it works!3

It’ll be slow as fuck, though,4 for a number of reasons. The first is that OpenSSL’s crypt, while much faster than GNU crypt, still isn’t very fast. The second is that Haskell’s Text.Regex.Posix is very slow (if you know how, I suggest you use another regex library; I went with Text.Regex.Posix because most casual Haskellers (which I am too) aren’t likely to have any of the others). The third is that it can only use one processor at a time, whereas most others are multithreaded or multiprocess affairs. The fourth is that a high-level language like Haskell is obviously going to be slower than tripcode crackers written in C and Sepples by moon people. The fifth is that I suck at Haskell.

Still, at least it’s written in the world’s leading fictional programming language.


1 If we didn’t, the type signature would be String -> IO String, of course.

2 If you only want to check eight-character inputs, though, you can do something like ins = sequence . take 8 $ repeat allowedChars instead.

3 Copy/paste this post into a file named Tripfind.lhs, then compile it using ghc -lssl Tripfind.lhs

4 Slower than my C one almost by an order of magnitude, even after adjusting for the fact that my C one can use both of my processors. And since my C one is already slower than, say, Tripcode Explorer by an order of magnitude…

Permalink 3 Comments

Strange attractor

You know Sierpiński gaskets, right? I used one in my Christmas tree last December. They’re fractals created by taking a triangle, connecting the midpoints of the sides to divide it into four, removing the middle one, and then repeating that on the remaining triangles, ad infinitum (literally). They have an area of 0 and a Hausdorff dimension of log2(3).

There’s another, more interesting way of constructing them, though: take the three corner points of a triangle, and a random starting point x. Roll a three-sided die1 to select a random corner point, and mark the midpoint between that point and x. Then, this midpoint becomes x. Repeat forever.

It turns out the Sierpiński gasket is the attractor for this system. I’ve written a Python script to save you some paper and a large number of pencils.2 Here is the result I got after 10,000 iterations:

Sierpiński gasket attractor

To run the script, you’ll need to have the Python Imaging Library installed. It takes three optional arguments: the side of the triangle in pixels (defaults to 1,000), the number of iterations (default 10,000), and the output filename (default out.png).

Strange attractors are fun. Coming up: more of the same.

Edit: If you’d prefer something you can see on the screen to something that dumps to a file, this may interest you. You can change the WIDTH and HEIGHT #defines to your actual resolution if you like (or basically any value, really; it should produce an equilateral gasket for any realistic resolution, though it might not for odd ones, including ones that are higher than they are wide).
You’ll need (besides a C compiler) the Allegro libraries. If you’re using a Debian-based distro, the package is liballegro-dev, IIRC, or you can get them here.


1 Or a six-sided one where you divide the result by two, rounding down, if you like.

2 It’s not exactly the same thing: raster images don’t have an infinite resolution, IEEE floating point numbers don’t have infinite precision, and you (probably) don’t have the patience to let your computer run forever.3

3 If you do, consider cracking tripcodes instead.

Permalink 1 Comment

Forced indentation of Huffman encoding

Inspired by rmuser’s Youtube videos on information theory (and specifically the one about Huffman encoding), I wrote a Python script to calculate a Huffman encoding for text.

It reads input from stdin (preferably in ASCII), calculates a Huffman mapping, and shows it to you. It also calculates how long the text would be if encoded with that mapping, and how many bytes you’ve saved compared to ASCII1, which just uses a byte for each character, regardless of how frequently it’s used.

Here’s the result of running it on itself:

$ python huffman.py < huffman.py
Symbol	Freq	Encoding
' '	560	10
'e'	262	111
't'	138	1100
's'	134	1101
'r'	125	00001
'\n'	110	00011
'f'	105	00100
'n'	98	00110
'l'	96	00111
'a'	75	01100
'i'	72	01110
'o'	67	000000
'd'	61	000100
'.'	54	001010
'('	48	010000
')'	48	010001
'c'	46	010011
','	46	010100
'q'	45	010101
'm'	36	011011
'b'	35	011110
'u'	32	0000010
':'	30	0001010
'_'	26	0001011
'='	26	0010110
'y'	24	0010111
'p'	24	0100100
'h'	22	0101100
'g'	20	0101110
'['	11	01001011
'%'	11	01011010
'#'	11	01011011
'1'	10	01101000
'"'	10	01101001
'0'	9	01101010
"'"	9	01101011
']'	9	01111100
'F'	9	01111101
'\\'	8	000001110
'T'	5	010111100
'+'	5	010111101
'v'	5	010111110
'w'	5	010111111
'/'	4	0000011010
'R'	4	011111100
'3'	4	011111101
'x'	4	011111110
'k'	4	011111111
'I'	3	0100101000
'2'	3	0100101001
'j'	3	0100101010
'S'	3	0100101011
'>'	2	00000110000
'C'	2	00000110010
'A'	2	00000110011
'E'	2	00000111100
'B'	2	00000111101
'P'	2	00000111110
'H'	2	00000111111
'*'	1	000001100010
'!'	1	000001100011
'8'	1	000001101100
'-'	1	000001101101
'W'	1	000001101110
'L'	1	000001101111

Encoded message length: 12237 bits (1529.62 bytes)
This message contained 2634 characters. Huffman encoding saved 1104 bytes
compared to ASCII.

As you can see, the most frequently-used characters have the shortest encoding, while the rarest have the longest. I’m assuming that means it’s working the way it should.

Simple toy, but it beats paying attention in class.


1 If the input isn’t in ASCII, it should still come up with a correct mapping, but that last bit will be off by a bit.

Permalink 2 Comments

Incidentally

My blog now supports the use of tripcodes and secure tripcodes by commenters. The tripcode algorithm should be less broken than Shiichan’s while still not really entirely not being broken. The secure tripcode algorithm is crap and probably not portable.

The source for the plugin is here.

(I don’t approve of tripfaggery any more than the next guy. However, I have too much free time.)

Permalink 10 Comments

OFIOC

My copy of Real World Haskell still hasn’t arrived, so I’ve been making do with Learn You a Haskell, which doesn’t suck as much as you’d expect but obviously stops right before it gets to the important bits. I read most of YAHT at some point in the past, but it isn’t as useful.
My initial impression (that Haskell is elegant as shit in most places but has some warts (like typeclasses) and cops out for IO) is mostly confirmed, though unexpectedly, it turns out you can also do stuff with it.

To wit: tripcodes.
My reason for wanting to implement this has disappeared, so I probably won’t be using it to build a non-moonspeak equivalent to tripper+ or Tripcode Explorer, and it’s really only guaranteed to work for Shiichan tripcodes (whose implementation is broken for arguably negligible values of broken, but broken nonetheless), but as first attempts go, at least it’s a bit more interesting than hello, world or yet another factorial implementation.

It took longer than planned because it uses the Foreign Function Interface to call the C version of crypt, and it’s impossible to find a usable tutorial on how that works.
The upshot of this is that it sacrifices laziness and also isn’t guaranteed to work with every Haskell implementation, though it does work with GHC(i):

cairnarvon@feynman:~/code/haskell$ ghci -lcrypt Tripcode.hs
GHCi, version 6.8.2: http://www.haskell.org/ghc/  :&64; for help
Loading package base ... linking ... done.
Loading object (dynamic) crypt ... done
final link ... done
[1 of 1] Compiling Tripcode         ( Tripcode.hs, interpreted )
Ok, modules loaded: Tripcode.
*Tripcode> tripcode "1fXzap//"
Loading package old-locale-1.0.0.0 ... linking ... done.
Loading package old-time-1.0.0.0 ... linking ... done.
Loading package filepath-1.1.0.0 ... linking ... done.
Loading package directory-1.0.0.0 ... linking ... done.
Loading package random-1.0.0.0 ... linking ... done.
Loading package unix-2.3.0.0 ... linking ... done.
Loading package process-1.0.0.0 ... linking ... done.
Loading package array-0.1.0.0 ... linking ... done.
Loading package haskell98 ... linking ... done.
"MhMRSATORI"
*Tripcode> tripcode "tea"
"WokonZwxw2"
*Tripcode>

(I know my copy of GHC is out of date. Hurray for Debian.)

Don’t forget the -lcrypt flag to actually link the crypt library, and if you want to use this to actually write that tripper+ clone, feel free.

Edit: Vague testing indicates this is actually twenty times slower than tripper+ running under Wine (which is already much slower than many alternatives), so it’s probably not going to be useful as anything but a command-line tripcode converter.

Permalink 11 Comments

Χριστούγεννα

I gets presents! All of them came from ThinkGeek, because apparently nobody else ships to Belgium in less than three months.
I got an Einstein action figure, Tetris magnets (weakest magnets I’ve ever seen, though they’ll hold up a single piece of paper adequately), an RFID-blocking wallet thing, two of these (brain cell (which isn’t technically a microbe) and sleeping sickness), and these, which look really good on my copy of Origin but smell like death.

I was also supposed to get Real World Haskell, but apparently Barnes & Noble sucks and it hasn’t arrived “yet”. This only goes to confirm my suspicion that Haskell isn’t a real programming language, but rather a very elaborate running joke. You nearly had me with GHC, guys.

Anyway, I updated progscrape to address two nitpicking complaints and one marginally more serious one (though that post is mine), as well as fix the fact that it didn’t extract e-mails at all (despite claiming to).
The thread-parsing code could still do with a desucking, but I’m not going to do it.

The new code is here (diff), and since running that on your old database will cause all kinds of interesting errors, a new database is here (15.0 MB, 59.9 MB uncompressed). I could have written a script to just siphon data from the old database into the new one, but instead I rebuilt it from scratch so I could time how long it took:

real	123m19.681s
user	4m45.306s
sys	0m22.145s

That’s for 6,826 threads and 173,390 posts.

The main advantage is that it’s now possible to do things like

select * from posts where thread = 1228980536 and id = 50;

which used to require

select * from posts where thrid = 1228980536 order by time limit 1 offset 49;

which is considerably less secksy.

Ceterum censeo Rasmus Lerdorf is to be beaten to death with a hardcover copy of SICP.

Permalink 2 Comments

The Sieve of Eratosthenes

Prime numbers are ridiculously important in cryptography, and I recently found myself needing a way to generate them (for a toy implementation of the Diffie-Hellman key exchange algorithm). It keeps amazing me how few languages actually have libraries for generating prime numbers, or even just for primality testing.
Because I didn’t feel like thinking, and I didn’t need incredibly massive primes, I just wrote a quick implementation of the sieve of Eratosthenes, which is an interesting enough algorithm that I’d thought I’d share it (this is first-year stuff even for us, but I guess I have non-programmers in my audience).

The basic idea is that you start with a list of integers from 2 up to the number you want to find the largest prime smaller than. You then take the smallest number in this list (2), and cross off all multiples of this number (except for 2 itself). You repeat this for every other number, in order, and you’ll be left with just a list of primes.

Wikipedia has a visualisation which could be clearer but is still pretty good:

Sieve of Eratosthenes

The implementation (in Python) is dead simple:

def sieve1(n):
    l = range(n)
    l[0], l[1] = None, None
    for i in xrange(int(__import__('math').sqrt(n)) + 1):
        if l[i] is not None:
            j = i * 2
            while j < n:
                l[j] = None
                j += i
    return filter(None, l)

We could just use range(2, n) and then work with an offset, but the convenience of having the indices equal to the values at those indices is too useful to complain about the extra memory required for two additional integers. Note also the obvious optimisation of only going up to sqrt(n) + 1 (no apologies for the unorthodox import).
filter(None, l) gets rid of the Nones in our list. Passing None as the function argument just applies the identity function, which has the effect of getting rid of None, False, 0, and empty lists, tuples, and dictionaries (anything which would evaluate to False in an if statement).

This is nice, but it does have the disadvantage of having to generate the whole list every time you want more primes. You can’t just input a list of pregenerated primes and have it continue from there.
The fix is easy enough:

def sieve2(n, primes = [2, 3]):
    if n <= primes[-1]:
        return filter(lambda m: m <= n, primes)

    offset = primes[-1] + 1
    l = range(offset, n)

    max = int(__import__('math').sqrt(n))
    for p in primes:
        if p > max:
            break
        i = p * 2 - offset
        while i < 0:
            i += p
        while i < len(l):
            l[i] = None
            i += p

    for k in xrange(max - offset + 1):
        if l[k] is not None:
            i = k * 2 + offset
            while i < len(l):
                l[i] = None
                i += l[k]

    l = filter(None, l)
    primes.extend(l)

    return primes

This function uses memoisation, and takes advantage of a feature of Python’s function defaults which some believe to be a bug: if mutable default arguments are altered, they will remain altered for the next time the function is called.
This is a side-effect of the fact that functions have a single field that holds the default arguments (a tuple called func_defaults in the function’s namespace), which isn’t duplicated into a working copy when the function is called. Most default arguments will be things like integers and strings, which are immutable, so most people never even run into this. Being able to use it for memoisation is really handy, though. Alternatively, you could just use a global variable and access it from inside the function.

The algorithm is basically the same, except that if we’ve already generated the primes requested, we can just return them. Then we construct a list of all integers larger than our largest prime, and sieve it with our existing primes (now we really do have to work with an offset). Then we just sieve the rest classically. At the end, we add our newly sieved list to our old primes, and return that.
Next time we invoke the function, primes will still have the new primes. You can access them from outside the function at sieve2.func_defaults[0].

Evaluating sieve2(100) and then sieve2(1000) will be slower than just evaluating sieve1(1000), but it will be faster than evaluating sieve1(100) and then sieve1(1000). And if you happen to have a large number of pregenerated primes, you can save some more time.

The sieve of Eratosthenes is just one of a number of sieve-based prime number generators, but it’s the oldest one, and it compares favorably to modern improvements.

Permalink 2 Comments

/prog/scrape

Our power was out for eight hours this Thursday, so I spent most of the day being bored and drawing diagrams. One of those diagrams turned into this, which got me started on a project I’ve been meaning to get started on for a while now.
Of course, then I lost interest and it ended up just being this bit. The rest is just GUI twattery and crafting HTTP requests in ways that won’t trigger the auto-banners anyway.

What the thing I wrote (and which I guess I’ll call /prog/scrape) does now is open or create an SQLite database, pull subject.txt, compare subject.txt to the data in the database, load the thread pages for any thread that’s out of date, subject it to horrible and primitive pattern matching to pull the individual posts (and relevant metadata), and update the database.

The code is here (and should require no modules not present in a standard Python 2.5 install), but I strongly suggest you don’t try to run it. It will pull every single thing ever posted on /prog/, which takes something like two and a half hours.
Instead, download this (13.0 MB gzipped, 54.5 MB expanded), which is the SQLite database I built today. It’s up to date as of a few minutes ago, but to update it, just untar it into the same folder as the script (preserving the name prog.db), and run the script. It’ll find and use the existing database when determining what needs to be pulled.

Obviously this isn’t very useful by itself, but people interested in random statistics can have some fun with it.
To wit:

sqlite> select count(*) from posts;
168246
sqlite> select count(*) from (select distinct body from posts);
140893
sqlite> select count(*) from posts where author = 'Xarn';
28

Requires SQLite 3. Have fun.

(The scraper should work with any Shiichan board by just changing the variables at the top, but honestly, who really uses Shiichan?)

Edit (July 11, 2009): Yes, this thread broke it. Updated version deals with it (by ignoring broken threads altogether).

Edit (August 12, 2009): http://github.com/Cairnarvon/progscrape/tree/master

Permalink 13 Comments