Heuristics are easy
(This post assumes you read the previous one.)
Today we’ll be looking at the hill climbing algorithm, which is just a plain old depth-first search with heuristics added.
“Heuristics” is a fancy word (from the Greek εὑρίσκω, “I discover”) for a very simple concept. In the context of search trees, it simply means that at a every node, you’re going to look at each possible branch, and take the one that looks the most promising first, instead of just one at random. “Most promising” can be a tricky concept, though.0
Our river-crossing example isn’t necessarily the best one to demonstrate the concept, so let’s go with another classic: the 8 puzzle.

You get these puzzles for free with breakfast cereal or as prizes in crappy raffles, usually, and you’ve probably come across them in bigger versions, but eight is quite enough for our purposes. There’s one empty square, and the goal is to unjumble the picture by sliding the tiles around, so you get this:

Let’s make an assumption that we know of each tile where it’s supposed to go (something that isn’t necessarily obvious with real puzzles), and number them, so our problem becomes tenable to a computer. We’ll give the empty square the number 0, so our initial and goal states become, respectively:

Then we can tackle this like the last time, with a search tree. We’ll just represent our state as a 3×3 matrix.
> start = [[4, 3, 0], [8, 1, 5], [2, 7, 6]] > goal = [[1, 2, 3], [4, 5, 6], [7, 8, 0]]
And because of the type of work you can do on this representation, let’s immediately define a function that finds the place of a certain tile too.
> def locate(state, n): > for i in range(3): > for j in range(3): > if state[i][j] == n: > return i, j
Our rules of production are easy to specify if we don’t see it as moving tiles around (which would require 32 separate rules), but as moving the empty square around (which only requires four). The only restriction, of course, is the edge of the board.
> class Rule0(): > """Move left""" > num = 0 > > def apply(this, state): > x, y = locate(state, 0) > s = deepcopy(state) > > s[x][y] = s[x][y - 1] > s[x][y - 1] = 0 > > return s > > def can_apply(this, state): > x, y = locate(state, 0) > return y > 0 > > class Rule1(): > """Move up""" > num = 1 > > def apply(this, state): > x, y = locate(state, 0) > s = deepcopy(state) > > s[x][y] = s[x - 1][y] > s[x - 1][y] = 0 > > return s > > def can_apply(this, state): > x, y = locate(state, 0) > return x > 0 > > class Rule2(): > """Move right""" > num = 2 > > def apply(this, state): > x, y = locate(state, 0) > s = deepcopy(state) > > s[x][y] = s[x][y + 1] > s[x][y + 1] = 0 > > return s > > def can_apply(this, state): > x, y = locate(state, 0) > return y < 2 > > class Rule3(): > """Move down""" > num = 3 > > def apply(this, state): > x, y = locate(state, 0) > s = deepcopy(state) > > s[x][y] = s[x + 1][y] > s[x + 1][y] = 0 > > return s > > def can_apply(this, state): > x, y = locate(state, 0) > return x < 2 > > rules = [Rule0(), Rule1(), Rule2(), Rule3()]
(Our code is complicated slightly by Python’s approach to lists, which is what the deepcopy is for. It’s from the copy module.)
Now we can just do everything the same way we did last time.
> queue = [([], start)] > visited = [start] > steps = 0
Note that we’re keeping track of all visited states again, even though our search space is a lot larger. This is just to keep things simple, and not because it’s necessarily a good idea. Modern computers will be able to hold all of the search space in memory easily (it’s 9! / 2 states1, which translates to about eleven megabytes in our representation (unless Python wastes even more memory than I think it does, which it might)), but the point of heuristics is to limit the search space we’ll actually traverse as much as possible, so it will be using a lot less.
Now for our main loop. Everything will mostly be the same here, except for one crucial step:
> while len(queue) > 0: > path, state = queue[0] > new_nodes = [] > > for rule in rules: > if rule.can_apply(state): > n_state, n_path = rule.apply(state), path + [rule.num] > > if n_state == goal: > print "Solution found in %d iterations:" % steps, n_path > exit(0) > > if not any([n_state == s for s in visited]): > new_nodes.append((n_path, n_state)) > visited.append(n_state)
The crucial difference is the order in which our nodes will be added to the queue. The idea is to take the most promising branches first, so we’ll be sorting them according to some function h (our heuristic function), which will take a state, and return an estimate of how many nodes will need to be traversed before we come to a solution from that state.
Let’s assume we’ve already defined such a function, and finish up with our code, to get it out of the way.2
> queue = sorted(new_nodes, lambda a, b: h(a[1]) - h(b[1])) + queue[1:] > steps += 1 > > print "No solution found!"
As you can see, everything else is exactly the same.
Coming up with a good heuristic function can be tricky. However, it doesn’t need to be perfectly accurate, and it’s alright if it’s completely wrong from time to time. It won’t affect the completeness of our algorithm (because no nodes are actually dropped), just the performance. If we wanted to, we could just have a function that returns the same value for every state:
> h = lambda _: 0
This would have the effect of turning our hill climbing algorithm into the vanilla depth-first search we saw last time. It would also take ages to find a solution.
A better heuristic might be to count how many tiles are in the wrong place. An even better one than that could look at each tile in the state, and calculate how many places it is removed from its position in the goal state. For instance, in our starting state, the tile 2 would need to move three steps:

Notice that because it’s moving on a grid, the shortest distance is really easy to calculate: the difference in the x coordinates plus the difference in the y coordinates. This is known as the Manhattan distance, because the streetplan of Manhattan is also a grid, courtesy of the Commissioners’ Plan of 1811.3
If we add the Manhattan distance of every tile in a state, we will have a minimum estimate of the number of moves we need to take before we get to the final state, so this sounds like a relevant heuristic function.
> def h(state): > d = 0 > > for n in range(1, 9): > x1, y1 = locate(state, n) > x2, y2 = locate(goal, n) > d += abs(x1 - x2) + abs(y1 - y2) > > return d
Putting that in our source at some point before the function is called gives us our complete program, and running that we find a solution (in 28 moves) after examining 27 nodes. We can run it without the heuristics, but I started doing that an hour ago and it still hasn’t finished, so we’ve clearly improved the algorithm already.4 Success!
As you can guess, picking a good heuristic function is the difficult part here, and it requires that you understand your problem at least reasonably well.
The h function can be as complicated as you like. A very accurate function, for example, would calculate the entire search tree starting from your given state, find all paths, and return the length of the shortest one. As you can imagine, that wouldn’t actually make our algorithm any faster. There’s often a trade-off to be made between speed and accuracy of the h function, and the goal is, of course, to increase the speed of the overall algorithm. It can be tricky.
There are countless variations you can play with for this sort of search algorithm. Sorting the whole queue rather than just the new nodes turns hill climbing into greedy search (you want to keep track of heuristic values you’ve calculated before for that one, obviously, particularly if your h function is expensive). Limiting the number of new nodes added by sorting first and just dropping the worst estimates will give you beam search, though that one obviously sacrifices completeness.
Now, we may have found it a lot more quickly, but is our solution of 27 moves the best solution? Mathematicians tell us any soluble eight-puzzle can be solved in 31 moves or fewer, so it might be, but can we use heuristics to guarantee an optimal path? Yes, turns out. But that’s something for another post.
Bonus: I’ve written a simple implementation of the game. Just compile that (you’ll need Allegro, as usual), put the image in the same folder as your executable, and run. It’ll output the starting configuration to stdout, if you want to try feeding that to the Python script instead of solving it yourself. Use the arrow keys to play.
If you pass it a filename as a command line argument, it’ll use that as the puzzle image instead (BMP, LBM, PCX, and TGA files only). If you want harder puzzles, changing the SIZE #define controls that. TILE_SIZE is the size (in pixels) of a single tile, so make sure the images you feed it are TILE_SIZE * SIZE by TILE_SIZE * SIZE. Have fun.
0 In a loose sense we already did this last time, in our river problem, when we just ignored any branches that brought us back to a state we’d already seen before. We basically gave those a lower heuristic value than the other branches, though instead of moving them to the back of the queue, we just dropped them entirely.
1 9! because that’s the total number of permutations of nine tiles, of course, and divided by 2 because Johnson and Story showed in 1879 that half of all states is insoluble without physically breaking the puzzle.
2 I know here are a lot of inefficiencies in this code. As always, my goal was clarity and concision, not not-sucking.
3 Another term for it is taxicab distance, because cities without grids for streetplans never have taxicabs.
4 Because this is a really deep but very narrow tree, you might expect a breadth-first search to do better than a depth-first one. You’d be right. A BFS found the optimal solution in twelve moves after looking at 803 nodes. That’s O(bd) versus O(bm) for you.
Anonymous said,
March 2nd, 2010 at 7:49 pm
Really enjoying these posts. Thanks.
AsianBorat said,
January 1st, 2012 at 9:00 am
queue = sorted(new_nodes, lambda a, b: h(a[1]) – h(b[1])) + queue[1:]
How in hell does this line work? I read up on lambdas and sorted (I’m unfamiliar with Python), but I can’t make sense of this line.
Cairnarvon said,
January 1st, 2012 at 9:24 pm
The second argument of
sortedis a function that takes as its arguments two elements of the list to be sorted, and returns a negative number if the first is smaller than the second (according to some relevant ordering), zero if they’re equal, and a positive number if the first element is greater than the second.sortedreturns a new list, which is then joined to the tail of the queue (that is, all of the queue except for the first element; which is obtained using a list slice).Longer way to write this would be:
Hope that clarifies.