Rosio Pavoris a blog

Statistics and shitty graphs

gnuplot is nice when you’re trying to do something it was designed to do, but kind of painful when you aren’t. Anyway, I made some graphs out of visitor data collected since July last year (all of this is just my blog, not rotahall.org as a whole).
The X axis is always time, running from July 2008 at the far left to today (March 2009) at the far right. The Y axis is the percentage of visitors using a given thing, stacked so that the full height adds up to 100%. The white area is always unknown and/or other.
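Each graph is a 100% stacked area chart: for every point in time, each category's count is divided by that period's total, and the shares are stacked so the top edge sits at 100%. Here is a minimal Python sketch of that normalization, assuming the raw data is a list of per-category visitor counts for one period. The function name and the sample numbers are hypothetical, not the data behind these graphs. The two boundary columns per category could then be written out and drawn with gnuplot's `filledcurves` style, which fills the area between two y columns.

```python
def stacked_bands(counts):
    """Turn one period's raw visitor counts into stacked percentage bands.

    counts: list of (label, visitors) pairs, in the order they are stacked.
    Returns a list of (label, lower %, upper %); the last upper bound is 100.
    """
    total = sum(visitors for _, visitors in counts)
    if total == 0:
        raise ValueError("no visitors recorded for this period")
    bands, lower = [], 0.0
    for label, visitors in counts:
        upper = lower + 100.0 * visitors / total
        bands.append((label, lower, upper))
        lower = upper
    return bands


# Hypothetical counts for a single month, not real data from these graphs.
for label, lo, hi in stacked_bands([("Linux", 30), ("Windows", 50), ("unknown", 20)]):
    print(f"{label:8} {lo:5.1f} {hi:5.1f}")
```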

The first graph shows the operating systems used.
The white is “unknown”. In reality it’s mostly phone browsers and spiders, but fuck it, I want an ethernet-enabled typewriter. Red is all of the BSDs combined. There seems to be exactly one person using FreeBSD, OpenBSD, and NetBSD.
The number of Windows users is higher than expected. I blame idiots googling for pictures of axolotls.

The second graph shows the distribution of Windows versions among Windows users. I expected more people to use XP than Vista, but it’s nice to see Vista isn’t even growing much anymore. A few people are still using Windows 98 (the red outline right above 2000) and 95. I’m pretty sure the one guy using Windows 3.1 is spoofing it, though.
I didn’t make one of these for the Linux users, because most of them end up being “other distro”. There are surprisingly few Ubanto users, though.

Next is browsers. I’m not sure what “Mozilla” is in this context, and I really don’t know what Netscape is doing there. Terras thinks people from 1996 are reading my blog, so: hi! Sell your stock in 2000, and don’t vote for Bush!

Finally, the IE versions used by IE users. This is abysmal. I’ve stopped showing content to IE versions older than 7, so with luck they’ll go away. Again, though, I blame random googlers, not regular readers.
The bit of blue at the bottom is IE 8. I’m surprised how early it started showing up, since the beta was only released earlier this month. Spoofed user agents, probably.
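Refusing old IE versions generally means reading the version number out of the User-Agent header. The post doesn't say how the blog does it, so this is only a sketch of one approach in Python; the function names and the `minimum` parameter are hypothetical. It trusts whatever the browser claims, so a spoofed user agent fools it in either direction.

```python
import re

# Matches the "MSIE <major>." token Internet Explorer puts in its User-Agent.
_MSIE_VERSION = re.compile(r"MSIE (\d+)\.")


def ie_major_version(user_agent):
    """Return the IE major version a User-Agent claims, or None for other browsers."""
    match = _MSIE_VERSION.search(user_agent)
    return int(match.group(1)) if match else None


def is_old_ie(user_agent, minimum=7):
    """True if the User-Agent claims an IE version older than `minimum`."""
    version = ie_major_version(user_agent)
    return version is not None and version < minimum
```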

Anyway, just in case anyone’s interested. Making these was fun, but too much effort to turn it into a regular thing.


Strange attractor

You know Sierpiński gaskets, right? I used one in my Christmas tree last December. They’re fractals created by taking a triangle, connecting the midpoints of the sides to divide it into four, removing the middle one, and then repeating that on the remaining triangles, ad infinitum (literally). They have an area of 0 and a Hausdorff dimension of log₂ 3 (about 1.585).
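Both numbers follow from simple counting, which this short Python sketch works through (the function name is just for illustration). After n steps there are 3ⁿ triangles covering (3/4)ⁿ of the original area, which tends to 0. The gasket is made of 3 copies of itself at half scale, so its similarity dimension d satisfies 2^d = 3; for the gasket, that coincides with the Hausdorff dimension.

```python
import math


def gasket_after(steps):
    """Return (triangle count, fraction of original area left) after `steps` subdivisions."""
    # Each step splits every triangle into 4 and keeps 3 of them,
    # and each kept triangle has 1/4 of its parent's area.
    return 3 ** steps, (3 / 4) ** steps


# Three copies of the whole, each scaled by 1/2: 2**d == 3, so d = log2(3).
dimension = math.log2(3)

for steps in (1, 5, 20, 50):
    count, area = gasket_after(steps)
    print(f"{steps:>2} steps: {count} triangles, {area:.2e} of the area left")
print(f"dimension = {dimension:.4f}")
```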

There’s another, more interesting way of constructing them, though: take the three corner points of a triangle and a random starting point x. Roll a three-sided die¹ to pick one of the corners at random, and mark the midpoint between that corner and x. That midpoint becomes the new x. Repeat forever.
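A minimal sketch of that procedure in Python, which just returns the visited points. It is not the script mentioned in this post; the function name, the unit side length, and the `seed` parameter are assumptions for the sake of the example. Because each step halves the distance to the triangle, the points end up inside it (up to floating-point rounding) after a few dozen iterations, wherever x started.

```python
import math
import random


def chaos_game(iterations=10_000, side=1.0, seed=None):
    """Return the points produced by the chaos game on an equilateral triangle."""
    rng = random.Random(seed)
    height = side * math.sqrt(3) / 2
    corners = [(0.0, 0.0), (side, 0.0), (side / 2, height)]
    # Random starting point x, anywhere in the triangle's bounding box.
    x, y = rng.uniform(0, side), rng.uniform(0, height)
    points = []
    for _ in range(iterations):
        cx, cy = rng.choice(corners)       # the three-sided die
        x, y = (x + cx) / 2, (y + cy) / 2  # the midpoint becomes the new x
        points.append((x, y))
    return points
```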

It turns out the Sierpiński gasket is the attractor for this system. I’ve written a Python script to save you some paper and a large number of pencils.² Here is the result I got after 10,000 iterations:

Sierpiński gasket attractor

To run the script, you’ll need to have the Python Imaging Library installed. It takes three optional arguments: the side length of the triangle in pixels (default 1,000), the number of iterations (default 10,000), and the output filename (default out.png).
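Without the Python Imaging Library, the same idea can be sketched using only the standard library by writing a plain-text PBM bitmap instead of a PNG. This is a substitute, not the original script: the function names and the out.pbm default filename are hypothetical, while the three optional arguments and their other defaults mirror the ones described above. Image viewers and converters such as ImageMagick or the Netpbm tools can open or convert PBM files.

```python
import math
import random


def chaos_pbm(side, iterations, seed=None):
    """Run the chaos game in pixel coordinates and return a plain PBM (P1) image."""
    height = max(1, round(side * math.sqrt(3) / 2))
    # Image y grows downward, so the apex sits at the top (y = 0).
    corners = [(0, height - 1), (side - 1, height - 1), ((side - 1) / 2, 0)]
    rng = random.Random(seed)
    x, y = rng.uniform(0, side - 1), rng.uniform(0, height - 1)
    pixels = [[0] * side for _ in range(height)]
    for _ in range(iterations):
        cx, cy = rng.choice(corners)
        x, y = (x + cx) / 2, (y + cy) / 2
        pixels[round(y)][round(x)] = 1  # in PBM, 1 means black
    lines = ["P1", f"{side} {height}"]
    for row in pixels:
        bits = "".join(map(str, row))
        # Plain PBM asks for lines of at most 70 characters.
        lines.extend(bits[i:i + 70] for i in range(0, len(bits), 70))
    return "\n".join(lines) + "\n"


def main(argv):
    """Parse the three optional positional arguments, then write the image."""
    side = int(argv[1]) if len(argv) > 1 else 1000
    iterations = int(argv[2]) if len(argv) > 2 else 10_000
    filename = argv[3] if len(argv) > 3 else "out.pbm"
    with open(filename, "w") as out:
        out.write(chaos_pbm(side, iterations))
```

To run it as a script, add `import sys` and `if __name__ == "__main__": main(sys.argv)` at the bottom.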

Strange attractors are fun. Coming up: more of the same.

Edit: If you’d rather see it on screen than have it dumped to a file, this may interest you. You can change the WIDTH and HEIGHT #defines to your actual resolution if you like (or to basically any value, really; it should produce an equilateral gasket for any realistic resolution, though it might not for odd ones, including ones that are taller than they are wide).
Besides a C compiler, you’ll need the Allegro libraries. On a Debian-based distro, the package is liballegro-dev, IIRC, or you can get them here.
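The post doesn't say how the C program sizes its gasket, so take this as a guess at why odd resolutions can break it. A program that derives the triangle's side from the screen height alone (side = 2·HEIGHT/√3) only fits on screen when WIDTH is at least about 1.155 times HEIGHT. That holds for 4:3, 5:4, and widescreen modes, but not for square or portrait screens. The Python below checks that condition and computes the largest equilateral triangle that does fit; all three function names are hypothetical.

```python
import math


def height_limited_side(height):
    """Side of an equilateral triangle (flat base) exactly as tall as the screen."""
    return height * 2 / math.sqrt(3)


def fits_if_height_limited(width, height):
    """True if a triangle sized from the height alone also fits horizontally."""
    return height_limited_side(height) <= width


def largest_fitting_side(width, height):
    """Largest equilateral triangle side that fits in width x height."""
    return min(width, height_limited_side(height))


for w, h in [(1024, 768), (1280, 1024), (1000, 1000), (768, 1024)]:
    print(w, h, fits_if_height_limited(w, h), round(largest_fitting_side(w, h), 1))
```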


1 Or a six-sided one where you divide the result by two, rounding up, if you like.

2 It’s not exactly the same thing: raster images don’t have infinite resolution, IEEE floating-point numbers don’t have infinite precision, and you (probably) don’t have the patience to let your computer run forever.³

3 If you do, consider cracking tripcodes instead.
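The six-sided trick in footnote 1 only gives fair odds if the halving rounds up: 1 to 6 then maps to 1, 1, 2, 2, 3, 3, so each corner is equally likely. Rounding down instead gives 0, 1, 1, 2, 2, 3, which makes the two outer results half as likely as the middle two. A quick Python check (the function name is made up for the example):

```python
from collections import Counter


def d6_to_d3(roll):
    """Map a six-sided roll (1 to 6) onto a three-sided one (1 to 3): halve, round up."""
    return (roll + 1) // 2


round_up = Counter(d6_to_d3(roll) for roll in range(1, 7))
round_down = Counter(roll // 2 for roll in range(1, 7))
print(sorted(round_up.items()))    # [(1, 2), (2, 2), (3, 2)]
print(sorted(round_down.items()))  # [(0, 1), (1, 2), (2, 2), (3, 1)]
```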


[.∴( ・)∴.( ・)∴]

My keyboard is broken. It’s hard to reform þe ſpelling of þe Eŋliʃ laŋguage wiþout a workiŋ Alt Gr kī.

(It’s even harder to write code in a C-like language, because AZERTY is retarded.)


Forced indentation of Huffman encoding

Inspired by rmuser’s YouTube videos on information theory (and specifically the one about Huffman encoding), I wrote a Python script to calculate a Huffman encoding for text.

It reads input from stdin (preferably in ASCII), calculates a Huffman mapping, and shows it to you. It also calculates how long the text would be if encoded with that mapping, and how many bytes you’ve saved compared to ASCII,¹ which just uses a byte for each character, regardless of how frequently it’s used.
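For reference, here is a minimal sketch of the standard heap-based Huffman construction in Python. It is not the huffman.py from this post, and the function names are hypothetical. Ties between equal frequencies may be broken differently, so individual codes can differ from the output below, although any Huffman code for the same input has the same minimal total length.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text):
    """Return {symbol: bit string} for a Huffman code built from text's frequencies."""
    freq = Counter(text)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {symbol: "0" for symbol in freq}
    order = count()  # tie-breaker so the heap never compares the dicts
    heap = [(f, next(order), {symbol: ""}) for symbol, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 and 1.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(order), merged))
    return heap[0][2] if heap else {}


def report(text):
    """Print a symbol table plus the size comparison with one byte per character."""
    code = huffman_code(text)
    print("Symbol\tFreq\tEncoding")
    for symbol, f in Counter(text).most_common():
        print(f"{symbol!r}\t{f}\t{code[symbol]}")
    bits = sum(len(code[ch]) for ch in text)
    print(f"Encoded message length: {bits} bits ({bits / 8:.2f} bytes)")
    print(f"This message contained {len(text)} characters. "
          f"Huffman encoding saved {len(text) - bits / 8:.0f} bytes compared to ASCII.")
```

To read from stdin like the original, add `import sys` and `report(sys.stdin.read())` at the bottom of the file.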

Here’s the result of running it on itself:

$ python huffman.py < huffman.py
Symbol	Freq	Encoding
' '	560	10
'e'	262	111
't'	138	1100
's'	134	1101
'r'	125	00001
'\n'	110	00011
'f'	105	00100
'n'	98	00110
'l'	96	00111
'a'	75	01100
'i'	72	01110
'o'	67	000000
'd'	61	000100
'.'	54	001010
'('	48	010000
')'	48	010001
'c'	46	010011
','	46	010100
'q'	45	010101
'm'	36	011011
'b'	35	011110
'u'	32	0000010
':'	30	0001010
'_'	26	0001011
'='	26	0010110
'y'	24	0010111
'p'	24	0100100
'h'	22	0101100
'g'	20	0101110
'['	11	01001011
'%'	11	01011010
'#'	11	01011011
'1'	10	01101000
'"'	10	01101001
'0'	9	01101010
"'"	9	01101011
']'	9	01111100
'F'	9	01111101
'\\'	8	000001110
'T'	5	010111100
'+'	5	010111101
'v'	5	010111110
'w'	5	010111111
'/'	4	0000011010
'R'	4	011111100
'3'	4	011111101
'x'	4	011111110
'k'	4	011111111
'I'	3	0100101000
'2'	3	0100101001
'j'	3	0100101010
'S'	3	0100101011
'>'	2	00000110000
'C'	2	00000110010
'A'	2	00000110011
'E'	2	00000111100
'B'	2	00000111101
'P'	2	00000111110
'H'	2	00000111111
'*'	1	000001100010
'!'	1	000001100011
'8'	1	000001101100
'-'	1	000001101101
'W'	1	000001101110
'L'	1	000001101111

Encoded message length: 12237 bits (1529.62 bytes)
This message contained 2634 characters. Huffman encoding saved 1104 bytes
compared to ASCII.

As you can see, the most frequently used characters get the shortest codes, while the rarest get the longest. I’m assuming that means it’s working the way it should.
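A slightly stronger sanity check than eyeballing needs only the Freq column and the code lengths from the output above. The Python below transcribes those, grouped by code length, and confirms four things. The frequencies add up to the 2634 characters reported, and the lengths give exactly the 12237 bits reported. The Kraft sum of 2^-length over all codes is exactly 1, as it must be for a complete prefix code (every node in the code tree has zero or two children, as in any Huffman tree). Finally, no symbol gets a longer code than a rarer one. None of this proves the code is optimal, but a broken implementation would likely fail at least one of these checks.

```python
from fractions import Fraction

# Frequencies from the output above, grouped by code length in bits.
FREQS_BY_LENGTH = {
    2: [560],
    3: [262],
    4: [138, 134],
    5: [125, 110, 105, 98, 96, 75, 72],
    6: [67, 61, 54, 48, 48, 46, 46, 45, 36, 35],
    7: [32, 30, 26, 26, 24, 24, 22, 20],
    8: [11, 11, 11, 10, 10, 9, 9, 9, 9],
    9: [8, 5, 5, 5, 5, 4, 4, 4, 4],
    10: [4, 3, 3, 3, 3],
    11: [2, 2, 2, 2, 2, 2, 2],
    12: [1, 1, 1, 1, 1, 1],
}

pairs = [(f, n) for n, fs in FREQS_BY_LENGTH.items() for f in fs]
characters = sum(f for f, _ in pairs)
bits = sum(f * n for f, n in pairs)
kraft = sum(Fraction(1, 2 ** n) for _, n in pairs)
# A more frequent symbol should never have a longer code than a rarer one.
monotone = all(n1 <= n2 for f1, n1 in pairs for f2, n2 in pairs if f1 > f2)

print(characters, bits, kraft, monotone)  # 2634 12237 1 True
```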

Simple toy, but it beats paying attention in class.


1 If the input isn’t in ASCII, it should still come up with a correct mapping, but that last bit will be off by a bit.
