OpenTafl 2020: New Tree Search Horizons

That’s right, I’m getting back to a project I’ve left alone for a while: OpenTafl, my hnefatafl engine project. The only one of its kind, it is both a host for other, yet-unwritten tafl bots, and a tafl bot in itself.

Why the return after all this time? Well, two reasons. First, I had to move the whole project to Github recently, because Bitbucket is shutting down its Mercurial repositories. Second, I had a bit of a brain blast on how to do a particular kind of AI project I’ve been wanting to try for a while.

OpenTafl’s current AI uses an alpha-beta pruning search, which is only a hop, skip, and jump away from the original, mathematician’s tree search algorithm, minimax. In simple terms, alpha-beta pruning minimax plays out all possible variations to a certain depth, assuming each player plays optimally, and picks the move which leads to the best outcome. The pruning bit skips searching nodes which are provably less optimal.

Of course, knowing what the optimal move is depends on one of two things: searching the game tree all the way to the end, or searching to a lesser depth and evaluating the leaf nodes in the tree. The former is impossible for reasons of computational power, the latter is logically impossible¹. Evaluation functions, as we call them, are always imperfect, and require an awful lot of domain knowledge to do well.

Because tafl is a poorly-studied game, there isn’t a lot of domain knowledge to encode into an evaluation function, which has always limited OpenTafl’s potential somewhat. There are further downsides to the alpha-beta search it uses, too, in particular that it can’t readily be multi-threaded². So, what’s the answer?

Well, at least potentially, Monte Carlo tree search. Popular among go AIs (and used as the framework for DeepMind’s efforts in computer players for board games), the secret to MCTS is a bit of randomness and a preference for exploring interesting lines of play. Start at the root of the game tree, navigate through the nodes you’ve already seen. When you find a leaf node (that is, one with no children), you generate its children, then play a random game until someone wins or loses. At each tree node, track the win/loss ratio for the tree, and use Mathematics™ to guide your root-to-leaf exploration in future iterations.

Simple! Of course, tafl poses some unique obstacles to MCTS, as Tuireann of PlayTaflOnline.com discovered. The biggest issue is that random moves in tafl are very, very unlikely to do anything of interest—tafl branching factors are higher than branching factors in, say, chess, and there’s no space pressure like there is in go. (That is to say, if you keep making random moves in go, the game eventually ends.)

Tafl MCTS artificial intelligences need some way to guide the playout process (the process of playing to the end of a game). The modern approach for this is to either train a neural network on a large corpus of existing high-level games (tafl doesn’t have one), or train a neural network by playing it against itself for a long time (which I don’t have the budget for). Given those constraints, I set about inventing a measure which would permit me to make random-ish moves which nevertheless move the game forward.

I’m calling the measure King Distance to Victory, or KDV; it’s inspired by part of the evaluation function from J.A.R.L., one of the entrants to the 2016 OpenTafl Tafl Open (which is on the calendar again for 2020, AI programmers!). For a given game state, the KDV is the shortest straight-line distance from the king’s current position to a victory space, counting spaces occupied by allies as 2 and spaces occupied by enemies as 3 or 4. The defender’s goal is to bring KDV to zero. The attacker’s goal is to maximize the average KDV and minimize the variation in the KDV values³.

It isn’t a perfect measure, but it isn’t meant to be—rather, it’s a measure to push playouts toward states which end the game. On that ground, I think it’ll be successful. I also hope to write a quick classical evaluation function which uses the KDV measure exclusively, to see how it plays on its own, without the MCTS magic behind it.

More news to come as it is made.

This proof is left as a trivial exercise for the reader. ↩
To prove which nodes can be ignored, alpha-beta search has to evaluate them in order, for a given value of ‘order’. ↩
This helps encapsulate the idea that the attackers should aim to surround the king and close the noose, rather than simply get in the way of his best move. Note as well the implication that each state has multiple KDVs: an edge-escape game has up to four (the king moves straight from his current position to each edge), and a corner-escape game has up to eight (the king moves straight from his position to an edge space, then from the edge space to one of two corners). ↩

The Soapbox

Opinions may vary

OpenTafl 2020: New Tree Search Horizons

Related

Leave a Reply Cancel reply

Share:

Related

Leave a Reply Cancel reply