This post covers changes to OpenTafl’s AI since the last post I wrote on the topic, further back in the v0.4.x releases. First, bugs!
Let’s start with a quick recap. OpenTafl’s AI is a standard, deterministic[1] tree-searching AI, using iterative deepening. That means that OpenTafl searches the whole tree[2] to depth 1, then to depth 2, then to depth 3, and so on, until it no longer has enough time to finish a full search[3].
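In skeleton form, the loop looks something like the following minimal sketch, with hypothetical Searcher and Move types standing in for OpenTafl’s actual classes:

```java
import java.util.concurrent.TimeUnit;

/** A minimal iterative-deepening driver. Searcher and Move are stand-ins. */
final class IterativeDeepening {
    interface Searcher {
        /** Searches to the given depth; returns null if the deadline passed mid-search. */
        Move searchToDepth(int depth, long deadlineNanos);
    }

    record Move(String notation) {}

    static Move findBestMove(Searcher searcher, long thinkTimeMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(thinkTimeMillis);
        Move best = null;
        for (int depth = 1; System.nanoTime() < deadline; depth++) {
            Move result = searcher.searchToDepth(depth, deadline);
            if (result == null) break; // partial search: discard it and stop
            best = result;             // completed search: keep its best move
        }
        return best;
    }
}
```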
You may have noticed, if you’ve used OpenTafl, that searching to depth n+1 takes a good deal longer than searching to depth n, and that it’s altogether possible that the process above might leave OpenTafl with a lot of time left over. I did not fail to notice this, and I implemented a couple of additional searches (extension searches, as they’re known) to help fill in this time.
The first, I refer to as continuation search. Continuation search takes the existing tree and starts a new search at the root node, using the work already done and searching to depth n+1. Obviously, continuation search doesn’t expect to finish that search, but it will reach some new nodes and provide us with some new information. After continuation search, OpenTafl does what I call a horizon search: it finds the leaf nodes corresponding to the current best-known children of the root node, then runs normal searches starting from those leaf nodes, to verify that there aren’t terrible consequences to a certain move lurking just behind the search horizon.
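Put together, the extension phase runs something like this sketch; every name in it is illustrative rather than OpenTafl’s real API:

```java
import java.util.List;

/** Sketch of the extension-search flow; every name here is illustrative. */
final class ExtensionSearch {
    interface Node {
        List<Node> bestChildren();  // best-known children of the root
        Node principalLeaf();       // leaf at the end of this child's best line
    }

    interface Searches {
        void continuation(Node root, int targetDepth, long deadlineNanos);
        void horizon(Node leaf, long deadlineNanos);
    }

    /** Runs after a completed deepening search to the given depth. */
    static void extend(Node root, int completedDepth, long deadlineNanos, Searches s) {
        // Continuation search: restart at the root, reusing the finished
        // tree, aiming one ply deeper than the last completed search.
        s.continuation(root, completedDepth + 1, deadlineNanos);

        // Horizon search: search onward from the leaves of the best-known
        // lines, looking for trouble hiding just over the horizon.
        for (Node child : root.bestChildren()) {
            s.horizon(child.principalLeaf(), deadlineNanos);
        }
    }
}
```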
These are fairly easy concepts to understand, my poor explanations notwithstanding. The bugs I referred to in the title are more insidious. They come down to a much more complicated concept: what conditions must the children of a node meet for that node’s evaluation to be valid?
In the iterative deepening phase of the search, the answer doesn’t matter. Remember, OpenTafl discards any tree it doesn’t finish. When we’re doing extension searches, though, we don’t have that luxury. OpenTafl must be able to discern when a certain node has not been fully searched. I added a flag value to the search which marks a node as intentionally unvalued, set whenever we have to abandon a search because we’ve run out of time. If a node did not exist in the tree prior to the extension search and it has unvalued children, then it is also unvalued. If a node did exist in the tree prior to the extension search and it has unvalued children, that’s okay: we ignore the unvalued children and use the information we’ve gained[4]. If an unvalued node is left in the tree after those steps, we ignore its value entirely; an unvalued node is misleading, and we should avoid letting it influence how we play.
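As a sketch, the rules look something like the following; note that it glosses over minimax details by treating a node’s value as a simple maximum over its children, and the names are mine, not OpenTafl’s:

```java
import java.util.List;

/** Sketch of the unvalued-node rules; UNVALUED is the flag described above. */
final class NodeValues {
    static final int UNVALUED = Integer.MIN_VALUE;

    interface Node {
        List<Node> children();
        int value();
    }

    /** Resolve a node's value after an extension search touches it. */
    static int resolveValue(Node node, boolean existedBeforeExtension) {
        boolean anyUnvalued = false;
        int best = UNVALUED;
        for (Node child : node.children()) {
            int v = child.value();
            if (v == UNVALUED) { anyUnvalued = true; continue; }
            best = (best == UNVALUED) ? v : Math.max(best, v);
        }
        // A node new to this extension search with unvalued children is
        // itself unvalued; a pre-existing node keeps what we've learned.
        if (anyUnvalued && !existedBeforeExtension) return UNVALUED;
        return best;
    }
}
```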
This issue led to poor play, as both horizon and continuation search had a chance to introduce bad data into the tree. I finally tracked it down and fixed it in v0.4.4.6b.
After that, I came across another few bugs, lesser in severity but still quite bad for OpenTafl’s play: when evaluating a draw position for the attackers, OpenTafl would incorrectly view it as more defender-favorable than it should have been[5]. OpenTafl also had some trouble with repetitions, incorrectly failing to increment the repetitions table in some search situations. That’s one of the more important gains over v0.4.4.7b: v0.4.5.0b is absolutely incisive in playing out repetitions, as some of the players at playtaflonline.com discovered after the update.
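The repetition table itself is simple bookkeeping. Here is a minimal sketch, assuming positions are keyed by Zobrist-style hashes (an assumption for illustration, not a statement about OpenTafl’s internals); the failure mode is a search path which enters a position without incrementing, or leaves one without decrementing:

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal repetition table, keyed by position hash. */
final class RepetitionTable {
    private final Map<Long, Integer> counts = new HashMap<>();

    /** Call when the search enters a position. */
    void increment(long positionHash) {
        counts.merge(positionHash, 1, Integer::sum);
    }

    /** Call when the search unmakes a move, leaving a position. */
    void decrement(long positionHash) {
        counts.computeIfPresent(positionHash, (h, c) -> c > 1 ? c - 1 : null);
    }

    /** How many times this position has occurred on the current line. */
    int count(long positionHash) {
        return counts.getOrDefault(positionHash, 0);
    }
}
```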
Finally, a few minor time-usage bugs are no longer present, although some headscratchers remain: the AI seems to lose half a second or so to some task I cannot locate, and to some task it does not count in its time-use accounting.
That about wraps up bugs. Features, as usual, are more fun.
First, OpenTafl is now willing to play for a draw in rare circumstances. If its evaluation tilts overwhelmingly toward the other side, and it sees a draw in its search tree, it evaluates the draw poorly, but better than a loss.
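In other words, something like this, with every constant invented for illustration:

```java
/** Sketch of draw valuation; every number here is invented. */
final class DrawValuation {
    static final int LOSS_VALUE = -1_000_000; // value of a known loss
    static final int HOPELESS   =   -50_000;  // "overwhelmingly behind"

    /** Value a draw found in the search tree. */
    static int drawValue(int currentEvaluation) {
        // Normally a draw is unattractive; when the position is hopeless,
        // it's still poor, but clearly better than any loss.
        return (currentEvaluation <= HOPELESS) ? LOSS_VALUE / 2 : HOPELESS;
    }
}
```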
That depends on the second feature, which is an improved evaluation function. Rather than guess, I decided to be scientific about it: I built four OpenTafl variants, each with a certain evaluation coefficient raised above the rest. Those variants played each other in a battle royale, and based on the outcome, I picked new coefficients. The new coefficients differ by board size; 7×7 boards weigh material more heavily, while larger boards prefer to play positionally[6].
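As a sketch, with invented numbers:

```java
/** Illustrative evaluation weights keyed by board dimension; values invented. */
record Weights(double material, double position) {
    static Weights forBoard(int dimension) {
        // Small boards reward material; larger boards reward position.
        return dimension <= 7 ? new Weights(1.5, 0.6)
                              : new Weights(0.9, 1.3);
    }
}
```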
Positional play comes from the last and most important feature: piece square tables. Credit for the idea goes to Andreas Persson (on Twitter @apgamesdev), who linked me to the chessprogramming wiki article, and also provided a first pass at the tables.
I should back up a bit first, though. Piece square tables are descriptively named tables which assign a value to having a certain piece type on a certain space. For instance, the space diagonally adjacent to a corner in a corner-escape game is very important for the besiegers; that space gets a high positive coefficient. On the other hand, the spaces directly adjacent to the corners are quite bad for the attackers, and get a moderately large negative coefficient. OpenTafl’s evaluator handles the exact values.
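Here is a sketch of what such a table might look like for the attackers on a 9×9 corner-escape board; the values are invented, and as described below, OpenTafl generates its real tables rather than writing them by hand:

```java
/** Sketch: attacker piece-square table for a 9x9 corner-escape board.
 *  All values are invented for illustration. */
final class AttackerTable {
    static final double[][] PST = new double[9][9];

    static {
        int max = 8; // highest rank/file index
        for (int[] corner : new int[][] {{0, 0}, {0, max}, {max, 0}, {max, max}}) {
            int dr = corner[0] == 0 ? 1 : -1; // direction away from the corner
            int dc = corner[1] == 0 ? 1 : -1;
            // Diagonally adjacent to a corner: important for the besiegers.
            PST[corner[0] + dr][corner[1] + dc] = 0.25;
            // Directly adjacent to a corner: bad for the attackers.
            PST[corner[0] + dr][corner[1]] = -0.15;
            PST[corner[0]][corner[1] + dc] = -0.15;
        }
    }

    /** The positional term is just a sum over attacker locations. */
    static double positionalScore(Iterable<int[]> attackerCoords) {
        double total = 0;
        for (int[] rc : attackerCoords) total += PST[rc[0]][rc[1]];
        return total;
    }
}
```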
The benefits of this approach are manifold: not only does OpenTafl know when the opponent is building a good shape, it now has a global sense of position. (It previously had some sense of position relative to other pieces, but that was not sufficient.) Because of this, it is now much better at picking moves which serve more than one purpose. If it can strengthen its shape by making a capture, it’ll do so. If it can weaken its opponent’s shape, so much the better. The code which generates the piece square tables can be found here[7].
The outcome is most pleasing. I can no longer easily beat OpenTafl on 11×11 corner escape boards, and games in that family are presently the most popular in online play. Equal-time matches are all but a lost cause, and I have to engage my brain in a way I never did before if I allow myself more thinking time. Now, I am not all that good a player, and those who are better than me still handle OpenTafl pretty roughly, but it now plays at a low-intermediate level. Given that it barely even existed twelve months ago, I’d say that’s good progress.
1. Mostly. ↩
2. Kind of. ↩
3. More or less. For being mostly deterministic, AI stuff is remarkably fuzzy. On an unrelated note, look! We have new footnotes, with links and everything! ↩
4. You can construct a game tree where this assumption—that not searching all of the outcomes when exploring an already-valued node is acceptable—causes trouble, but the fault for that lies more with the evaluation function than with the search. In such cases, the evaluation function must be so incorrect as to evaluate a node which leads to a loss just over the horizon as better than a node which, searched slightly deeper, does not lead to an imminent loss. Such trees are quite rare, though, and I haven’t come across one in testing. ↩
5. It was supposed to be a plus sign, but it was a minus sign instead. Oops. ↩
6. On the to-do list is changing coefficients over the course of a game—brandub is more or less an endgame study, and at some point, the evaluator should prefer material over position in endgames even on larger boards. ↩
7. I chose to generate the tables because generated tables are easier to maintain and update. Work for corner-escape 11×11 boards generalizes to most corner-escape variants; the same is true for edge-escape variants. The only boards which really, truly require special cases are 7×7 boards, since the corners are such a vast majority of the board, and moves which might be considered neutral or iffy on larger boards ought to be given a chance—there aren’t many options in the first place. ↩