My last tafl post was deeply theoretical. Very shortly, this one is going to take a turn for the deeply practical. First off, though, there’s some theory housekeeping I need to handle. Because the Copenhagen and Fetlar variants are both difficult AI problems, and because Copenhagen has some additional rules which are computationally hard to check, I decided to start with brandub, a much simpler tafl variant. Brandub comes from Ireland, and is played on a 7×7 board. The king’s side has five pieces, and the besieging side has eight, arranged in a cross. The king participates in captures, and is captured by two besieging pieces instead of four. He escapes to the corners. Any piece may cross the center square, but only the king may stop on it.
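If it helps to picture the cross formation, the brandub starting position looks something like this (my own rendering, with A for attackers, D for defenders, and K for the king; the corners are the king’s escape squares):

```
7  .  .  .  A  .  .  .
6  .  .  .  A  .  .  .
5  .  .  .  D  .  .  .
4  A  A  D  K  D  A  A
3  .  .  .  D  .  .  .
2  .  .  .  A  .  .  .
1  .  .  .  A  .  .  .
   a  b  c  d  e  f  g
```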
Computationally, brandub is in a much easier class than the 11×11 variants I’ve discussed. The upper bound on its state space complexity is a mere 4.4 × 10¹⁵, using the estimation technique I described in the previous post; for comparison, checkers¹ is on the order of 10²⁰. Its game tree complexity is at least 1.3 × 10³⁶, which is also not overwhelmingly distant from checkers’ complexity (though harder—tafl’s state space complexity is frequently lower than that of comparable games, but its game tree complexity is frequently higher). This makes it a good place to start—for one, it means that it doesn’t matter how slowly my code runs.
So, to kick things off, I started with what the business calls GOFAI—good old fashioned AI—techniques. The simple version is this: you play every possible move from your current position to a few moves out, and pick the best outcome. It ends up being a little more complex than that, though. If you’re not curious, you can skip the next few paragraphs.
The particular GOFAI technique I’m using right now is a straight implementation of the minimax algorithm. After generating a game tree from the current state to a given search depth, the algorithm evaluates the tree’s leaf nodes. An evaluation is a single number, describing whether the state is good for the besieging side (nearer positive infinity) or good for the defending side (nearer negative infinity). The details aren’t important just yet, although I’ll get into them later. Anyway, the algorithm is predicated on the assumption that both players are perfect—they’ll always make the best move they can make, given the situation. It follows that the best move for the player at a given depth is the one which leads to the best reachable state for that player: the most negative for the defender, the most positive for the besieger.
As an example, with the besieger to move and a search depth of three, there are four levels to the tree²:
1. A position with the besieger to move.
2. All of the positions, defender to move, resulting from every one of the besieger’s possible moves.
3. All of the positions, besieger to move, resulting from the defender’s possible responses to those moves.
4. All of the positions, defender to move, resulting from the besieger’s responses to the defender’s responses.
The algorithm starts at the tree’s leaves, the nodes at the fourth level above, and runs the evaluation function on each of them. It then moves up a level. The besieger is to move, and each of his possible moves leads to a child state with an evaluation attached. The besieger picks the move which yields the best outcome for him, which is also the worst outcome for the defender. (Although the tafl board is asymmetric, the game, in the theoretical sense, is not—a good state for the defending side is a bad state for the besieging side, and vice versa.) The value of each node at the third level becomes the value of its best child (for the besieger). The algorithm then moves up to the second level. The defender is to move, and he has a set of moves and child states with evaluations attached, as we just derived. He picks the best outcome for himself, which is also the worst outcome for the besieger. The value of each node at the second level becomes the value of its best child (for the defender). Finally, at the first level, the besieger has a list of moves and child states with evaluations attached. He chooses the best of his options, and that’s the move he makes.
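To make the recursion concrete, here’s a minimal minimax sketch in Python. It is not my actual implementation; the `GameState` interface, with its `successors()`, `is_terminal()`, and `evaluate()` methods, is a hypothetical stand-in for whatever your game representation provides.

```python
# A minimal minimax sketch. GameState is a hypothetical interface with:
#   successors()  -> list of (move, GameState) pairs
#   is_terminal() -> bool
#   evaluate()    -> float (positive favors the besieger, negative the defender)

def minimax(state, depth, besieger_to_move):
    """Return the evaluation of `state`, searching `depth` plies ahead."""
    if depth == 0 or state.is_terminal():
        return state.evaluate()

    if besieger_to_move:
        # The besieger maximizes: his node takes the value of its best
        # (highest-valued) child.
        return max(minimax(child, depth - 1, False)
                   for _, child in state.successors())
    else:
        # The defender minimizes: his node takes the value of its best
        # (lowest-valued) child.
        return min(minimax(child, depth - 1, True)
                   for _, child in state.successors())

def best_move(state, depth, besieger_to_move):
    """Pick the move at the root whose child state minimax rates best."""
    children = state.successors()
    value = lambda mc: minimax(mc[1], depth - 1, not besieger_to_move)
    best = max(children, key=value) if besieger_to_move else min(children, key=value)
    return best[0]
```

With the besieger to move and a search depth of three, `best_move(state, 3, True)` performs exactly the walk described above: evaluate the fourth-level leaves, then back values up the tree one level at a time.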
As you might have guessed, if you had a computer of infinite power, you could just explore the game tree until you found a win, loss, or draw for every sequence of moves from your starting state. In fact, we call games for which this is possible³ solved. The evaluation function for an infinitely powerful computer is extremely simple:
1. Has the besieger won in this state? Return positive infinity.
2. Has the defender won in this state? Return negative infinity.
3. Has neither of the above happened? Return zero.
4. Ignore draws for the purposes of this example, or declare them a win for one side, or something.
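In code, that function is barely a function at all. A sketch, using the same hypothetical `GameState` interface as above, with assumed `besieger_won()` and `defender_won()` predicates:

```python
import math

def perfect_evaluate(state):
    """The evaluation function for an infinitely powerful computer.
    besieger_won() and defender_won() are assumed predicates on the state."""
    if state.besieger_won():
        return math.inf   # Besieger victory: positive infinity.
    if state.defender_won():
        return -math.inf  # Defender victory: negative infinity.
    return 0              # Neither has won yet (draws lumped in here, per the caveat above).
```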
The algorithm will explore until every branch reaches an end state, then work its way up through the tree. Since it assumes perfect play, it will provide a sequence of moves which yields a victory for one side from the current position, and there’s nothing the other side can do about it—the whole game becomes inevitable. Perhaps fortunately, this is extremely hard for most games; tic-tac-toe is somewhat less thrilling now that it’s solved.
It also reveals a weakness in, or at least a caveat with, the minimax algorithm: since we do not have computers of infinite power (in fact, my current implementation is able to look ahead only one ply further than the example I gave above), its playing ability depends either on achieving deep lookahead to compete with humans, who can look ahead much more selectively than computers can, or on having a good evaluation function. I haven’t really attained either goal yet.
I see that I’ve already written about a thousand words, and I haven’t even made it out of the first point on my supposed outline for this post. I suppose I’ll just have to write a couple of ’em. Next time I’ll get into the practical considerations in implementing an AI algorithm like this. Probably the time after that, I’ll write about evaluation functions, and perhaps after that I’ll write about some improved algorithms. The latter requires that I actually implement them, though, so we’ll see how the timing works out.
1. 8×8 draughts, if you aren’t an American.
2. The term of art is ‘plies’, singular ‘ply’—since a response to a move is called a reply, clearly a single move must just be a ply. (Math and computer people, am I right?)
3. Wikipedia has some interesting reading on what it actually means for a game to be solved—the definition I’ve given is what’s referred to as a ‘strong’ solution, where a computer has brute-forced the entire game tree. There are two other kinds of solved games: those solved ‘ultra-weakly’, which is to say, by abstract mathematical reasoning, and those solved ‘weakly’, by some algorithm which does not explore the entire tree.