Playing games is easy
People who take an active interest in AI are quite unlikely to have very many friends, so it should come as no surprise that trying to get computers to play games has always been a popular subfield of AI. Traditionally that game has mostly been chess, but I feel chess has a grinding tedium to it, so we’re going to look at tic-tac-toe instead, because that at least has the benefit of being over quickly.

So far in this series we’ve only seen beautiful, elegant algorithms with very nice provable properties. The one we’ll be discussing today isn’t one of these; in fact, it’s just a really stupid idea that happens to work because human players aren’t particularly bright either.
The idea is that we’re going to look at all possible moves from a given point in the game until the very end (or, more likely, up until a certain number of moves ahead), and select the best one. As such, our foray into heuristic search earlier will be reasonably helpful, but there are a few very crucial differences, the most important being, of course, our opponent. We could regard the game as a search tree and calculate the shortest path to a victorious state, but our opponent would be under no obligation to do his bit in his own defeat. In fact, it’s more reasonable to assume he will himself be picking the best possible move at every step as well, and to try to find a move that’s better than the others even when he does that.
Consider the following expart illustration of some game where at each point there are three possible moves, where we’re looking ahead two moves (or ply, if you’re Australian):

The values in the leaves are the values we get when we apply a certain heuristic function, which just looks at a given state and tell us how desirable it is, high values being good for us, and low ones are good for our opponent. In the blue nodes, it’s our turn to play. In the red ones, it’s our opponent’s.
We can assume that if we take the left-most branch, our opponent will then take the branch that ends up in the state with value 2, because that’s the most desirable to him0 (and least desirable to us). If we take the middle, he’ll probably take the one leading to 1. If we take the one on the right, we’ll end up in 1 again. Therefore, it makes sense that we’d want to take the left-most branch, because that will give us the best result, even though the middle branch has at least the potential of getting us into a state worth 8, which would be very good for us indeed.
This is looking suspiciously like an algorithm already! We generate the full tree, apply some evaluation at the leaves, and propagate those values upwards, alternately taking the minimum and the maximum of them, depending on whose turn it is. Maybe we’ll even get better results depending on how far we look ahead!
It sounds too dumb to work, but apparently it does. This alternating of minimising and maximising apparently isn’t the reason this algorithm is called minimax, but it should be.
Translating this to Python is easy enough. As usual, we first need our state representation and our rules of production. We’re just going to use a 3×3 list, each cell of which can be 'X', 'O', or ' '.
We’ll have two sets of rules, one for player X and one for player O. These could be tedious to define, because there are, of course, eighteen of them, but fortunately, Python is a pretty decent language.
> class Rule(): > def __init__(self, i, j, token): > self.i, self.j, self.token = i, j, token > > def can_apply(self, state): > return state[self.i][self.j] == ' ' > > def apply(self, state): > s = [row[:] for row in state] > s[self.i][self.j] = self.token > return s > > rules_p1 = [Rule(i, j, 'X') for i in range(3) for j in range(3)] > rules_p2 = [Rule(i, j, 'O') for i in range(3) for j in range(3)]
(In case that’s not obvious, i indicates the row, j the column, and token will be the symbol with which that spot will be marked.)
For convenience, and because we’ll need it later, we’ll also define a function that tells us if the game is over:
> def game_over(state): > rcd = state + map(list, zip(*state)) + [[state[i][i] for i in range(3)], > [state[2 - i][i] for i in range(3)]] > return 'O' if ['O', 'O', 'O'] in rcd else \ > 'X' if ['X', 'X', 'X'] in rcd else \ > ' ' if not any(map(lambda l: ' ' in l, state)) else \ > False
rcd is a list of the three rows and columns, and the two diagonals. The return value doesn’t just indicate whether the game is over, but also who won; 'O', 'X', and ' ' all evaluate to True in a conditional statement, of course.
Our starting state is easy:
> state = [[' ', ' ', ' '], [' ', ' ', ' '], [' ', ' ', ' ']]
Now for the actual minimax algorithm. The whole thing lives and dies by its heuristic function, of course, so we’re lucky our example is tic-tac-toe, where the following is sufficient:
> def h(state): > rcd = state + map(list, zip(*state)) + [[state[i][i] for i in range(3)], > [state[2 - i][i] for i in range(3)]] > > if ['O', 'O', 'O'] in rcd: > return 2 > > if ['X', 'X', 'X'] in rcd: > return -2 > > return 0
We could make it more complex (for example, a line with two Os and a blank could be worth 1, and one with two Xs and a blank -1), but with tic-tac-toe, that quickly devolves into just enumerating all possible situations, and this is actually quite good enough.
Notice that this function is written from the point of view of player O, because he’ll be our computer player. Player X is you.
Then the minimax function. We won’t be constructing the whole tree (though we could for tic-tac-toe; it’s at most nine moves deep, after all), so we’ll set a depth beyond which we won’t look:
> max_depth = 6
(How deep is good enough? Experiment. For tic-tac-toe, a depth of 9 will explore the entire possible game tree, so 6 is pretty deep already.)
We’ll define the whole thing as two mutually recursive functions, one for maximising and one for minimising. These will return a rule and a heuristic value of a state; this value is (eventually) the value of the state at the leaves, of course, and the rule will be the first rule applied to eventually get to that state (because that’s what we’re looking for, of course).
> def minimax(state): > def maxmove(state, depth): > if depth == max_depth or game_over(state): > return None, h(state) > > best_rule, best_state = None, None > for rule in rules_p2: > if rule.can_apply(state): > _, s = minmove(rule.apply(state), depth + 1) > if best_state is None or best_state < s: > best_rule, best_state = rule, s > return best_rule, best_state > > def minmove(state, depth): > if depth == max_depth or game_over(state): > return None, h(state) > > best_rule, best_state = None, None > for rule in rules_p1: > if rule.can_apply(state): > _, s = maxmove(rule.apply(state), depth + 1) > if best_state is None or best_state > s: > best_rule, best_state = rule, s > return best_rule, best_state > > return maxmove(state, 0)
This may look a bit convoluted, but it’s more straightforward than it seems.
Once we have this, it’s just a matter of wrapping the whole thing into a game, which I’ve done here. This uses pygame, which you may already have installed. If you don’t, it’s the python-pygame package in the Debian/Ubanto repositories.
It goes without saying that some improvements can be made to this. A popular one is alpha-beta pruning, which is based on the observation that in the following situation (where the red nodes are minimising again, and blue ones maximising), it doesn’t matter what the values of nodes A and B are, the value at β is always going to be 1 or less, so the value at α will always be 2.

Adapting our algorithm to keep this in mind isn’t that hard, but also not particularly worth the effort because the tic-tac-toe tree isn’t that big anyway. For a game like chess, on the other hand, where there’s an average of about fifteen possible moves at each layer and a game is on average eighty moves long, anything that can pare down the size of the tree is quite welcome.
The algorithm runs in exponential time regardless, though, so a game like go, with two hundred possible moves every time (and an average game length of three hundred moves) is quite beyond pure minimax. This is one reason why you’ll probably be able to beat your computer at go but not at chess.
Incidentally, it was (a slightly tweaked1 version of) this algorithm that defeated Kasparov in 1997. Competitive chess poses another interesting issue, because there’s a time limit on how long a player can think before making his move. How do you tackle this?
Why, iterative deepening! You start off with a max_depth of 1 (or some other suitably low number which you can be quite confident will finish before the time limit is up), and then you keep increasing it by 1 until you run out of time. Turns out you only need to look ahead maybe twelve moves to beat the best human chess players in the world.
The minimax algorithm is designed for games with two players working against each other, perfect knowledge of the game state, and no element of chance, but it’s obvious that it can be adapted relatively easily to other games, with varying success. For more players and separate teams, just add more layers, and perhaps separate heuristics. For chance, multiply in the odds. Imperfect knowledge usually reduces to simple chance, though it can be hard to quantify adequately.
What you’ll usually see is that computers have the advantage in games that are closest to tic-tac-toe and chess, whereas humans have the advantage in softer games like bridge and poker, or in games where the search space is simply too large, like the aforementioned go.
Sadly, none of it is as interesting as it should be.
0 We’re assuming that our opponent is competent and wants to win. This may not always be warranted, and an idiot playing randomly can sometimes confound careful strategy, as anyone who’s ever played chess with children can attest.
1 One thing that was taken into account was the so-called horizon problem. This is the issue where one situation at the maximum depth might seem more palatable than another, even though one or two moves further down it leads to inevitable catastrophe. This can be tackled through heuristic continuation, where you have a second heuristic function that determines whether it would be better to keep looking beyond the soft depth limit. Obviously this makes the algorithm less predictable with regards to time it will take to come up with an answer.
Leah said,
April 6th, 2010 at 9:36 pm
Where’s the multithreaded progscrape patch?
Cairnarvon said,
April 6th, 2010 at 10:14 pm
Maybe later. I mostly played with the JSON interface for a bit instead of multithreading, but it’s actually slower than the normal way if you want to verify tripcodes.
If you bug MrVacBob-chan into making the JSON interface unambiguous, I’ll add JSON support to the main progscrape branch, and then I’ll probably end up adding multithreading as well, just because.
Even though it’s technically bad manners and not likely to be particularly useful, because you only scrape the whole thing once anyway.
Anonymous said,
April 9th, 2010 at 7:11 am
You should have mentioned Apu Nahasapeemapetilan. He wrote the world’s first tic-tac-toe program, only the top human players could beat it.
Knuth said,
April 12th, 2010 at 7:34 am
Pop culture references are beneath [b]Xarn[/b].
Robert said,
July 8th, 2010 at 6:05 am
Interesting news when it is boring, it is possible to play in tic-tac-toe, in general algorithm clear. Thanks!