That’s right, I’m getting back to a project I’ve left alone for a while: OpenTafl, my hnefatafl engine project. The only one of its kind, it is both a host for other, yet-unwritten tafl bots, and a tafl bot in itself.
Why the return after all this time? Well, two reasons. First, I had to move the whole project to Github recently, because Bitbucket is shutting down its Mercurial repositories. Second, I had a bit of a brain blast on how to do a particular kind of AI project I’ve been wanting to try for a while.
OpenTafl’s current AI uses an alpha-beta pruning search, which is only a hop, skip, and jump away from the original, mathematician’s tree search algorithm, minimax. In simple terms, alpha-beta pruning minimax plays out all possible variations to a certain depth, assuming each player plays optimally, and picks the move which leads to the best outcome. The pruning bit skips searching nodes which are provably less optimal.
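For the curious, here's a minimal sketch of the idea in Java. The GameState interface and its methods are stand-ins invented for the example, not OpenTafl's actual classes; the shape of the recursion and the pruning check are the point.

```java
// Minimal alpha-beta minimax sketch. GameState and evaluate() are invented
// placeholders for illustration, not OpenTafl's real API.
interface GameState {
    boolean isTerminal();
    java.util.List<GameState> children();
    int evaluate();  // positive values favor the maximizing player
}

final class AlphaBetaSketch {
    static int search(GameState state, int depth, int alpha, int beta, boolean maximizing) {
        if (depth == 0 || state.isTerminal()) return state.evaluate();

        if (maximizing) {
            int best = Integer.MIN_VALUE;
            for (GameState child : state.children()) {
                best = Math.max(best, search(child, depth - 1, alpha, beta, false));
                alpha = Math.max(alpha, best);
                if (alpha >= beta) break;  // prune: the minimizing player will never allow this line
            }
            return best;
        } else {
            int best = Integer.MAX_VALUE;
            for (GameState child : state.children()) {
                best = Math.min(best, search(child, depth - 1, alpha, beta, true));
                beta = Math.min(beta, best);
                if (alpha >= beta) break;  // prune: the maximizing player has a better option elsewhere
            }
            return best;
        }
    }
}
```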
Of course, knowing what the optimal move is depends on one of two things: searching the game tree all the way to the end, or searching to a lesser depth and evaluating the leaf nodes in the tree. The former is impossible for reasons of computational power, the latter is logically impossible1. Evaluation functions, as we call them, are always imperfect, and require an awful lot of domain knowledge to do well.
Because tafl is a poorly-studied game, there isn’t a lot of domain knowledge to encode into an evaluation function, which has always limited OpenTafl’s potential somewhat. There are further downsides to the alpha-beta search it uses, too, in particular that it can’t readily be multi-threaded2. So, what’s the answer?
Well, at least potentially, Monte Carlo tree search. Popular among go AIs (and used as the framework for DeepMind’s efforts in computer players for board games), the secret to MCTS is a bit of randomness and a preference for exploring interesting lines of play. Start at the root of the game tree and navigate down through the nodes you’ve already seen. When you find a leaf node (that is, one with no children), you generate its children, then play a random game until someone wins or loses. At each tree node, track the win/loss ratio of the playouts that pass through it, and use Mathematics™ to guide your root-to-leaf exploration in future iterations.
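The Mathematics™ in question is usually the UCT formula, which balances a node's observed win rate against how rarely it has been visited. Here's a hedged sketch of the selection step in Java; the MctsNode type is invented for the example and isn't anything out of OpenTafl.

```java
// Sketch of MCTS child selection using UCT (upper confidence bound applied to trees).
// MctsNode is an invented, illustrative type.
import java.util.ArrayList;
import java.util.List;

final class MctsNode {
    int wins;    // playouts through this node which ended in a win
    int visits;  // total playouts through this node
    final List<MctsNode> children = new ArrayList<>();

    // Pick the child which best balances exploitation (win rate) against
    // exploration (lines we haven't visited much yet).
    MctsNode selectChild(double explorationConstant) {
        MctsNode best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (MctsNode child : children) {
            double exploit = child.visits == 0 ? 0.0 : (double) child.wins / child.visits;
            double explore = Math.sqrt(Math.log(visits + 1.0) / (child.visits + 1.0));
            double score = exploit + explorationConstant * explore;
            if (score > bestScore) {
                bestScore = score;
                best = child;
            }
        }
        return best;
    }
}
```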
Simple! Of course, tafl poses some unique obstacles to MCTS, as Tuireann of PlayTaflOnline.com discovered. The biggest issue is that random moves in tafl are very, very unlikely to do anything of interest—tafl branching factors are higher than branching factors in, say, chess, and there’s no space pressure like there is in go. (That is to say, if you keep making random moves in go, the game eventually ends.)
Tafl MCTS artificial intelligences need some way to guide the playout process (the process of playing to the end of a game). The modern approach for this is to either train a neural network on a large corpus of existing high-level games (tafl doesn’t have one), or train a neural network by playing it against itself for a long time (which I don’t have the budget for). Given those constraints, I set about inventing a measure which would permit me to make random-ish moves which nevertheless move the game forward.
I’m calling the measure King Distance to Victory, or KDV; it’s inspired by part of the evaluation function from J.A.R.L., one of the entrants to the 2016 OpenTafl Tafl Open (which is on the calendar again for 2020, AI programmers!). For a given game state, the KDV is the shortest straight-line distance from the king’s current position to a victory space, counting spaces occupied by allies as 2 and spaces occupied by enemies as 3 or 4. The defender’s goal is to bring KDV to zero. The attacker’s goal is to maximize the average KDV and minimize the variation in the KDV values3.
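To make the measure concrete, here's a rough sketch of computing the edge-escape KDVs for a single position. The board representation and constants are invented for the example (OpenTafl's internals look nothing like this), and I've used a flat weight of 3 for enemy pieces rather than the 3-or-4 distinction above.

```java
// Rough sketch of King Distance to Victory (KDV) for an edge-escape game.
// The int[][] board and the EMPTY/ALLY/ENEMY constants are invented for the example.
final class KdvSketch {
    static final int EMPTY = 0, ALLY = 1, ENEMY = 2;

    // One KDV per straight-line path from the king to each of the four edges.
    static int[] edgeKdvs(int[][] board, int kingRow, int kingCol) {
        return new int[] {
            walk(board, kingRow, kingCol, -1, 0),  // toward the top edge
            walk(board, kingRow, kingCol, 1, 0),   // toward the bottom edge
            walk(board, kingRow, kingCol, 0, -1),  // toward the left edge
            walk(board, kingRow, kingCol, 0, 1),   // toward the right edge
        };
    }

    // Sum the per-space costs along one straight line from the king to the board edge.
    private static int walk(int[][] board, int kingRow, int kingCol, int dr, int dc) {
        int size = board.length;
        int distance = 0;
        int r = kingRow + dr, c = kingCol + dc;
        while (r >= 0 && r < size && c >= 0 && c < size) {
            switch (board[r][c]) {
                case EMPTY: distance += 1; break;
                case ALLY:  distance += 2; break;  // an ally is in the way, but can step aside
                default:    distance += 3; break;  // an enemy has to be captured or gone around
            }
            r += dr;
            c += dc;
        }
        return distance;
    }
}
```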
It isn’t a perfect measure, but it isn’t meant to be—rather, it’s a measure to push playouts toward states which end the game. On that ground, I think it’ll be successful. I also hope to write a quick classical evaluation function which uses the KDV measure exclusively, to see how it plays on its own, without the MCTS magic behind it.
More news to come as it is made.
This proof is left as a trivial exercise for the reader. ↩
To prove which nodes can be ignored, alpha-beta search has to evaluate them in order, for a given value of ‘order’. ↩
This helps encapsulate the idea that the attackers should aim to surround the king and close the noose, rather than simply get in the way of his best move. Note as well the implication that each state has multiple KDVs: an edge-escape game has up to four (the king moves straight from his current position to each edge), and a corner-escape game has up to eight (the king moves straight from his position to an edge space, then from the edge space to one of two corners). ↩
OpenTafl has seen some major development work for the first time in a while, culminating in the recent release of v0.4.6.2b.
This version adds some major features which have been on my to-do list for some time. First, the in-game and replay board views now support keyboard input, a major usability enhancement. No longer will mistaken command entries torpedo your games!
That feature, however, was merely a happy piece of good fortune, falling out of the true headline feature for the v0.4.6.x releases: a variant editor. That’s right. OpenTafl is now the best tool available for designing tafl rules, bar none. Not only can you individually tweak every rule described in the OpenTafl notation specification, you can also edit the board layout to look any way you like.
If you’re interested in playing a few games with the new UI or experimenting with rules variants all your own, you can, as always, get the latest version from the OpenTafl website. I haven’t promoted v0.4.6.x to the stable release yet, but I expect to do so soon.
With these features done, I turn my attention next to a few network-centric things for v0.5.x. OpenTafl’s network play server has not, to date, seen much use; now that PlayTaflOnline.com is getting close to its new architecture, I hope to write a PlayTaflOnline front end for OpenTafl, so you can use OpenTafl to play games at PlayTaflOnline, with all the rich support for replays, commentary, and analysis from OpenTafl. OpenTafl’s network server mode and self-contained network play will continue to be a supported mechanism for remote games, but won’t see new features. v0.5.x will also contain an automatic updater, to simplify the end-user updating process.
Looking further into the future, I’m running out of OpenTafl features I want to do. With luck, 2017 will see a v1.0 release.
This post will cover OpenTafl AI changes since the last post I wrote on the topic, further back in the v0.4.x releases. First, bugs!
Let’s do some quick recap. OpenTafl’s AI is a standard, deterministic1 tree-searching AI, using iterative deepening. That means that OpenTafl searches the whole tree2 to depth 1, then to depth 2, then depth 3, and so on, until it no longer has enough time to finish a full search3.
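In sketch form, the deepening loop looks something like this; the names and the clock handling are invented for illustration, and OpenTafl's real search driver is considerably more involved.

```java
// Sketch of an iterative deepening loop with a hard time limit. The String
// 'move' and searchToDepth() are placeholders, not OpenTafl's search interface.
final class DeepeningSketch {
    static String iterativeDeepening(long thinkTimeMillis) {
        long deadline = System.currentTimeMillis() + thinkTimeMillis;
        String bestSoFar = null;
        for (int depth = 1; depth < 64; depth++) {
            String candidate = searchToDepth(depth, deadline);  // null if time ran out mid-search
            if (candidate == null) break;   // discard the unfinished search, keep the last complete one
            bestSoFar = candidate;
            if (System.currentTimeMillis() >= deadline) break;
        }
        return bestSoFar;
    }

    // Stub standing in for a full search to the given depth.
    private static String searchToDepth(int depth, long deadline) {
        if (System.currentTimeMillis() >= deadline) return null;
        return "best move found at depth " + depth;
    }
}
```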
You may have noticed, if you’ve used OpenTafl, that searching to depth n+1 takes longer than searching to depth n, and that it’s altogether possible that the process above might leave OpenTafl with a lot of time left over. I did not fail to notice this, and I implemented a couple of additional searches (extension searches, as they’re known) to help fill in this time.
The first, I refer to as continuation search. Continuation search takes the existing tree and starts a new search at the root node, using the work already done and searching to depth n+1. Obviously, continuation search doesn’t expect to finish that search, but it will reach some new nodes and provide us some new information. After continuation search, OpenTafl does what I call a horizon search: it finds the leaf nodes corresponding to the current best-known children of the root node, then runs normal searches starting with the leaf nodes, to verify that there aren’t terrible consequences to a certain move lurking just behind the search horizon.
These are fairly easy concepts to understand, my poor explanations notwithstanding. The bugs I referred to in the title are more insidious. They come down to a much more complicated concept: what conditions must the children of a node meet for that node’s evaluation to be valid?
In the iterative deepening phase of the search, the answer doesn’t matter. Remember, OpenTafl discards any tree it doesn’t finish. When we’re doing extension searches, though, we don’t have that luxury. OpenTafl must be able to discern when a certain node has not been fully searched. I added a flag value to the search to note that a certain node has been left intentionally unvalued, which gets set whenever we have to abandon a search because we’ve run out of time. If a node did not exist in the tree prior to the extension search, and it has unvalued children, then it is also unvalued. If a node did exist in the tree prior to its extension search and it has unvalued children, this is okay! We ignore the unvalued children and use the information we’ve gained4. If an unvalued node is left in the tree after those steps, we ignore its value. Any unvalued node is misleading, and we should avoid using its value when deciding how to play.
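Written out as code, the rule looks roughly like the sketch below; the node type and its fields are invented for illustration, and OpenTafl's real tree nodes are rather more complicated.

```java
// Sketch of the 'unvalued node' rule used during extension searches.
// The class and its fields are invented for illustration.
import java.util.ArrayList;
import java.util.List;

final class SearchNodeSketch {
    static final int UNVALUED = Integer.MIN_VALUE;  // flag: search abandoned before this node was valued

    int value = UNVALUED;
    boolean existedBeforeExtension;  // was this node part of the tree before the extension search began?
    final List<SearchNodeSketch> children = new ArrayList<>();

    boolean hasUnvaluedChild() {
        for (SearchNodeSketch child : children) {
            if (child.value == UNVALUED) return true;
        }
        return false;
    }

    // A node new to the extension search with unvalued children is itself unvalued;
    // a node which existed beforehand keeps its value and simply ignores unvalued children.
    boolean valueIsUsable() {
        if (value == UNVALUED) return false;
        if (!existedBeforeExtension && hasUnvaluedChild()) return false;
        return true;
    }
}
```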
This issue led to poor play, as both horizon and continuation search had a chance to introduce bad data into the tree. I finally tracked it down and fixed it in v0.4.4.6b.
After that, I came across another few bugs, lesser in severity but still quite bad for OpenTafl’s play: when evaluating a draw position for the attackers, OpenTafl would incorrectly view it as more defender-favorable than it should have been5. OpenTafl also had some trouble with repetitions, incorrectly failing to increment the repetitions table in some search situations. That’s one of the more important gains over v0.4.4.7b—v0.4.5.0b is absolutely incisive in playing out repetitions, as some of the players at playtaflonline.com discovered after the update.
Finally, a few minor time usage bugs are no longer present, although some headscratchers remain: the AI seems to lose half a second or so to some task I cannot locate, a task it does not count when doing its time use accounting.
That about wraps up bugs. Features, as usual, are more fun.
First, OpenTafl now is willing to play for a draw in rare circumstances. If its evaluation tilts overwhelmingly toward the other side, and it sees a draw in its search tree, it evaluates the draw poorly, but better than a loss.
That depends on the second feature, which is an improved evaluation function. Rather than guess, I decided to be scientific about it: I built four OpenTafl variants, each with a certain evaluation coefficient raised above the rest. Those variants played each other in a battle royale, and based on the outcome, I picked new coefficients. The coefficients differ by board size: 7×7 boards weigh material more heavily, while larger boards prefer to play positionally6.
Positional play comes from the last and most important feature: piece square tables. Credit for the idea goes to Andreas Persson (on Twitter @apgamesdev), who linked me to the chessprogramming wiki article, and also provided a first pass at the tables.
I should back up a bit first, though. Piece square tables are descriptively-named tables which assign a value to having a certain piece type on a certain space. For instance, the space diagonally adjacent to a corner space in a corner-escape game is very important for the besiegers. That space gets a high positive coefficient. On the other hand, the spaces directly adjacent to the corners are quite bad for the attackers, and get a moderately large negative coefficient. OpenTafl’s evaluator handles the exact values.
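In code, a piece square table is about as simple as the name suggests: an array of per-space weights, summed over the pieces on the board. The values below are invented for illustration and are not OpenTafl's actual coefficients.

```java
// Sketch of a piece square table for the attacking side. The size, values, and
// board representation are invented for illustration.
final class PieceSquareSketch {
    private static final int SIZE = 11;
    // One weight per space, indexed [row][col]; positive is good for the attackers.
    private static final int[][] ATTACKER_TABLE = new int[SIZE][SIZE];

    static {
        // Spaces diagonally adjacent to a corner are valuable for the besiegers...
        ATTACKER_TABLE[1][1] = 50;
        ATTACKER_TABLE[1][SIZE - 2] = 50;
        ATTACKER_TABLE[SIZE - 2][1] = 50;
        ATTACKER_TABLE[SIZE - 2][SIZE - 2] = 50;
        // ...while spaces directly adjacent to a corner are a liability.
        ATTACKER_TABLE[0][1] = -30;
        ATTACKER_TABLE[1][0] = -30;
        // (and so on for the other corners, and for the other piece types)
    }

    // Sum the table values for every attacker on the board.
    static int attackerPositionScore(boolean[][] attackerAt) {
        int score = 0;
        for (int row = 0; row < SIZE; row++) {
            for (int col = 0; col < SIZE; col++) {
                if (attackerAt[row][col]) score += ATTACKER_TABLE[row][col];
            }
        }
        return score;
    }
}
```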
The benefits of this approach are manifold: not only does OpenTafl know when the opponent is building a good shape, it now has a global sense of position. (It previously had some sense of position relative to other pieces, but that was not sufficient.) Because of this, it is much better now at picking moves which serve more than one purpose. If it can strengthen its shape by making a capture, it’ll do so. If it can weaken its opponent’s shape, so much the better. The code which generates the piece square tables can be found here7.
The outcome is most pleasing. I can no longer easily beat OpenTafl on 11×11 corner escape boards, and games in that family are presently the most popular in online play. Equal-time matches are all but a lost cause, and I have to engage my brain in a way I never did before if I allow myself more thinking time. Now, I am not all that good a player, and those who are better than me still handle OpenTafl pretty roughly, but it now plays at a low-intermediate level. Given that it barely even existed twelve months ago, I’d say that’s good progress.
More or less. For being mostly deterministic, AI stuff is remarkably fuzzy. On an unrelated note, look! We have new footnotes, with links and everything! ↩
You can construct a game tree where this assumption—that not searching all of the outcomes when exploring an already-valued node is acceptable—causes trouble, but the fault for that lies more with the evaluation function than with the search. In such cases, the evaluation function must be so incorrect as to evaluate a node which leads to a loss just over the horizon as better than a node which does not lead to an imminent loss to a slightly deeper depth. They are quite rare, though, and I haven’t come across one yet in testing. ↩
It was supposed to be a plus sign, but it was a minus sign instead. Oops. ↩
On the to-do list is changing coefficients over the course of a game—brandub is more or less an endgame study, and at some point, the evaluator should prefer material over position in endgames even on larger boards. ↩
I chose to generate the tables because it’s easier to maintain and update. Work for corner-escape 11×11 boards generalizes to most corner escape variants; the same is true for edges. The only boards which really, truly take special cases are 7×7, since the corners are such a vast majority of the board, and moves which might be considered neutral or iffy on larger boards ought to be given a chance—there aren’t many options in the first place. ↩
First off: the inaugural OpenTafl Computer Tafl Open has come to a close. It was a bit of an anticlimax, I must admit, but fun times nevertheless.
To recap, only one entry (J.A.R.L) made it in on time. On January 2nd, I had the AIs run their matches, and it was all over inside of 20 minutes, with a bit of technical difficulty time to boot. You can find the game records here.
To move one layer deeper into the recap, both AIs won one game each out of the match. J.A.R.L won in 22 moves, OpenTafl won in 18, giving the victory to OpenTafl. Disappointingly for me, OpenTafl played quite poorly in its stint as the attackers, allowing J.A.R.L to quickly set up a strong structure funneling its king to the bottom right of the board. Disappointingly for Jono, J.A.R.L seemed to go off the rails when it played the attacking side, leaving open ranks and files and handing OpenTafl a certain victory. Deeper analysis is coming, although, not being a great player myself, I can’t offer too much insight. (Especially given that neither AI played especially well.)
I do expect that, when Jono finishes fixing J.A.R.L, it’ll be stronger than OpenTafl is today. He intends to make its source code available in the coming year, as a starting point for further AI development. (If feasible, I hope to steal his distance-to-the-corner evaluation.)
There will be a 2017 OpenTafl Computer Tafl Open, with the same rules and schedule. I’ll be creating a page for it soon.
Next: progress on OpenTafl itself. It’s difficult to overstate how much has happened in the past year. Last January, OpenTafl was a very simple command-line program with none of the persistent-screen features it has today; it had no support for external AIs, no multiplayer, no notation or saved games, and a comparatively rudimentary built-in AI.
The first major change of the year was switching to Lanterna, and that enabled many of the following ones. Lanterna, the terminal graphics framework OpenTafl uses to render to the screen, allows for tons of fancy features the original, not-really-a-solution rendering code did not. Menus, for one. For another, a UI which makes sense for the complicated application OpenTafl was destined to become. Although it’s the easiest thing to overlook in this list of features, it’s the most foundational. Very few of the remaining items could have happened without it.
Next up: external AI support. In the early days, I only planned for OpenTafl to be a fun little toy. At the end of that plan, it might have been something I could use to play my weekly (… well, kind of) tafl game without having to deal with a web interface. (For what it’s worth, Tuireann’s playtaflonline.com renders that goal obsolete, unless you really like OpenTafl.)
Later on, as I got into work on OpenTafl’s built-in AI, I realized what an amazing object of mathematical interest tafl is, and that it has not, to date, seen anything like the kind of study it richly deserves. As such, I decided I wanted OpenTafl to be a host for that sort of study. Much of what we know about chess, go, and other historical abstract strategy games comes from the enormous corpus of games played. That corpus does not yet exist for tafl games, the amazing efforts of people like Aage Nielsen and Tuireann notwithstanding. The quickest way to develop a good corpus is to play lots of games between good AIs. Good AIs are hard to come by if every AI author also needs to build a UI and a host.
So, OpenTafl fills the void: by implementing OpenTafl’s straightforward engine protocol, AI authors suddenly gain access to a broad spectrum of opponents. To start with, they can play their AI against all other AIs implementing the protocol, any interested human with a copy of OpenTafl, and possibly even the tafl mavens at playtaflonline.com. Not only that, but the AI selfplay mode allows AI authors to verify progress, a critical part of the development process.
Multiplayer was an obvious extension, although it hasn’t seen a great deal of use. (There are, admittedly, better systems out there.) It proved to be relatively straightforward, and although there are some features I’d like to work out eventually (such as tournaments, a more permanent database, and a system for client-side latency tracking to allow for client-side correction of the received server clock stats), I’m happy with it as it stands.
OpenTafl is also the first tafl tool to define a full specification for tafl notation, and the first to fully implement its specification. The Java files which parse OpenTafl notation to OpenTafl objects, and which turn OpenTafl objects into OpenTafl notation, are in the public domain, free for anyone to modify for their own AI projects, another major benefit.
In defining OpenTafl notation, I wanted to do two things: first, to craft a notation which is easily human-readable, in the tradition of chess notation; and second, to remain interoperable with previous tafl notation efforts, such as Damian Walker’s. The latter goal was trivial; OpenTafl notation is a superset of other tafl notations. The former goal was a little more difficult, and the rules notation is notably rather hard to sight-read unless you’re very familiar with it, but on balance, I think the notations people care about most—moves and games—are quite clear.
Having defined a notation and written code to parse and generate it, I was a hop, skip, and jump away from saved games. Shortly after, I moved on to replays and commentaries. Once again a first: OpenTafl is the first tool which can be used to view and edit annotations on game replays. Puzzles were another obvious addition. In 2017, I hope to release puzzles on a more or less regular basis.
Last and the opposite of least, the AI. Until the tournament revealed that J.A.R.L is on par with or better than OpenTafl, OpenTafl was the strongest tafl-playing program in existence. I’ve written lengthy posts on the AI in the past, and hope to come up with another one soon, talking about changes in v0.4.5.0b, which greatly improved OpenTafl’s play on large boards.
Finally, plans. 2017 will likely be a maintenance year for OpenTafl, since other personal projects demand my time. I may tackle some of the multiplayer features, and I’ll probably dabble in AI improvements, but 2017 will not resemble 2016 in pace of work. I hope to run a 2017 tafl tournament, especially since the engine protocol is now stable, along with OpenTafl itself. I may also explore creating a PPA for OpenTafl.
Anyway, there you have it: 2016 in review. Look for the AI post in the coming weeks.
Unfortunately, turnout for the tournament ended up being rather disappointing. The only person to submit an entry was Jonathan Teutenberg, with J.A.R.L.
By the rules as written, this leaves him with the title of champion. Fortunately for you, dear tournament follower, I spoke with Mr. Teutenberg, and we agreed to a one-match playoff between OpenTafl and J.A.R.L.
You can expect to see that game on Monday, January 2nd, starting at about noon Eastern time. There will be coverage available in several forms: a liveblog here, a stream at my hitbox channel (I’ll provide a link when the time comes), and live viewing at a special OpenTafl multiplayer server.
To connect to that multiplayer server, go to Options in your OpenTafl client and change the server address to taflopen.manywords.press. (That server is not up yet; I’ll be starting it on Sunday night.) Connect to the server. (Note that accounts will not be transferred from the main Many Words OpenTafl server. Logging in with a new username and password will register a new account.) The game will be made available for spectators at about the start time.
Stay tuned later today for OpenTafl v0.4.5.0b, the version which will play in the tournament.
The first submission to the 2016 OpenTafl Computer Tafl Open arrived in my inbox yesterday morning. Since I had not yet finished my coffee at work, I decided to open it up and extend my break a bit.
J.A.R.L was developed by Jonathan Teutenberg, who comes to us from the java-gaming.org forum (under the name Jono). It’s a fairly standard iterative deepening depth-first search player, the same structure as OpenTafl’s AI, with two fascinating features.
First, it uses as a component in its evaluation function the distance to a king victory in a simple abstraction of the game. In the simplified version, only the king is allowed to move, and he can move through pieces with a move count penalty. The score is a combination of the abstract distance to each corner, the material balance, and the freedom of the king’s men.
Second, it does some advanced pruning of the search tree, along with some move ordering. Moves which make or avoid captures are weighted heavily and searched more deeply, while states which are much worse than alternatives and worse than the current position are not expanded.
The upshot is an AI which seems to have no small measure of strength to it, enough so that a game against it was worth commenting on, to some degree. (Granted, only because it lost a particular corner endgame I’ve seen come up before.) My suspicion after one game is that it’s one of the best AIs for the attacking side I’ve played to date. I look forward to seeing it in action in the tournament.
That’s right, ladies and gentlemen, one month until entries close. I’ve posted final deadlines and submission guidelines at the official tournament page; have a look and get ready.
During the tournament, expect coverage here, possibly to include some liveblogged or streamed games. See you in a month!
This is the third article in a series of posts on OpenTafl. You can read the first and second parts at the preceding links.
Welcome back to the third and final entry in Opentaflvecken, a one-week period in which I provide some minor respite to poor parvusimperator. We’ve hit the major AI improvements; now it’s time to cover a grab bag of miscellaneous topics left over. Onward!
Playing out repetitions

In OpenTafl’s versions of tafl rule sets, threefold repetitions are invariably handled: by default, a threefold repetition yields a draw; some variants define it as a win for the player moving into the third repetition of a board state1. Prior to the AI improvements release, the AI simply ignored repetitions, an obvious weakness: if you found a position where you could force OpenTafl to make repeated moves to defend, you could play the forcing move over and over, and OpenTafl would obligingly find a different way to answer it until it ran out and let you win. No longer! Now, OpenTafl is smart enough to play repetitions out to the end, using them to win if possible, or forcing a draw if necessary.
It turns out, as with everything in AI, this is not as simple as it sounds. The problem is the transposition table. It has no information on how many times a position has been repeated, and repetition rules introduce path dependency: the history of a state matters in its evaluation. Fortunately, this is a very simple path dependency to solve. OpenTafl now has a smallish hash table containing Zobrist hashes and repetition counts. Whenever a position appears, either in the AI search or in the regular play of the game, the repetition count is incremented. Whenever the AI finishes searching a position, it is removed from the table. In this way, at any given state, the AI always knows how many times a position has occurred. If the position has occurred more than once in the past, the AI skips the transposition table lookup and searches the position instead. This seems like it might cause a search slowdown—repetitions can no longer be probed—but in practice, it’s turned out to have almost no effect.
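A minimal sketch of that table, assuming a Zobrist hash has already been computed for each position; OpenTafl's real structure differs in its details.

```java
// Sketch of a repetition table keyed on Zobrist hashes: increment when a position
// appears, decrement when the AI finishes searching it. Illustrative only.
import java.util.HashMap;
import java.util.Map;

final class RepetitionTableSketch {
    private final Map<Long, Integer> counts = new HashMap<>();

    // Called whenever a position appears, in the game or during search.
    void increment(long zobristHash) {
        counts.merge(zobristHash, 1, Integer::sum);
    }

    // Called when the AI finishes searching a position.
    void decrement(long zobristHash) {
        counts.computeIfPresent(zobristHash, (hash, count) -> count > 1 ? count - 1 : null);
    }

    // How many times has this position occurred on the way to the current node?
    int count(long zobristHash) {
        return counts.getOrDefault(zobristHash, 0);
    }
}
```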
Move ordering improvements

I discussed move ordering a little bit in the second article, but I want to go into a little more depth. Move ordering is the beating heart of any alpha-beta search engine. The better the move ordering, the better the pruning works; the better the pruning works, the deeper the AI can search; the deeper the AI can search, the better it plays. Early on in OpenTafl’s development, I threw together a very simple move ordering function: it searched captures first and everything else later. Later on, after the transposition table went in, I added a bit to try transposition table hits in between captures and the remaining moves. The move ordering used Java’s sort method, which, though efficient, does more ordering than is necessary.
Late in the AI improvements branch, when I added the killer move table and the history table, I decided to fix that. Move ordering now does as little work as possible: it makes one pass through the list of moves, putting them into buckets according to their move type. It sorts the transposition table hits and history table hits, then searches the killer moves, the history table hits, the top half of the transposition table hits, the captures, the bottom half of the transposition table hits, and finally, any moves which remain. Though there are two sorts here instead of one, they’re much smaller on average, so they’re faster overall.
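Here's the bucketing scheme in sketch form. The Move and SearchTables types are placeholders I've invented to keep the example self-contained; they are not OpenTafl's classes.

```java
// Sketch of the bucketed move ordering described above. Move, SearchTables, and
// the scoring calls are invented placeholders.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class MoveOrderingSketch {
    interface Move { boolean isCapture(); }

    interface SearchTables {
        boolean isKillerAt(Move move, int depth);
        int historyScore(Move move);
        boolean hasTranspositionEntry(Move move);
        int transpositionScore(Move move);
    }

    static List<Move> order(List<Move> moves, SearchTables tables, int depth) {
        List<Move> killers = new ArrayList<>();
        List<Move> historyHits = new ArrayList<>();
        List<Move> tableHits = new ArrayList<>();
        List<Move> captures = new ArrayList<>();
        List<Move> everythingElse = new ArrayList<>();

        // One pass: drop each move into its bucket.
        for (Move move : moves) {
            if (tables.isKillerAt(move, depth)) killers.add(move);
            else if (tables.historyScore(move) > 0) historyHits.add(move);
            else if (tables.hasTranspositionEntry(move)) tableHits.add(move);
            else if (move.isCapture()) captures.add(move);
            else everythingElse.add(move);
        }

        // Only the small buckets get sorted.
        historyHits.sort(Comparator.comparingInt((Move m) -> tables.historyScore(m)).reversed());
        tableHits.sort(Comparator.comparingInt((Move m) -> tables.transpositionScore(m)).reversed());
        int half = tableHits.size() / 2;

        List<Move> ordered = new ArrayList<>(moves.size());
        ordered.addAll(killers);
        ordered.addAll(historyHits);
        ordered.addAll(tableHits.subList(0, half));                 // top half of transposition hits
        ordered.addAll(captures);
        ordered.addAll(tableHits.subList(half, tableHits.size()));  // bottom half of transposition hits
        ordered.addAll(everythingElse);
        return ordered;
    }
}
```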
Puzzles

Okay, this one isn’t exactly an AI feature, but you’ll note that the title doesn’t limit me to AI future plans only. It also turns out that this isn’t much of a future plan. Puzzles are already done on the unstable branch, so I’ll use this section to tell you about the three kinds of puzzles.
The first is the least heavily-annotated sort, the one you might find on the hypothetical chess page of a newspaper: simply a rules string, a position string, and a description along the lines of ‘white to win in 4’. OpenTafl supports these with the new load notation feature: paste an OpenTafl rules string into a dialog box, and OpenTafl will load it. (There’s also a new in-game/in-replay command which can be used to copy the current position to the clipboard, ctrl+c and ctrl+v being nontrivial for the moment.)
The other two are closely related. They start with a standard OpenTafl replay, then use some extra tags to declare that the replay file defines a puzzle, and tell OpenTafl where in the replay file the puzzle starts. The puzzle author uses OpenTafl’s replay editing features to create variations and annotations for all of the branches of play he finds interesting, then distributes the file. When loading the file, OpenTafl hides the history until the player explores it. The two closely-related kinds I mentioned are loose puzzles and strict puzzles, the only difference being that loose puzzles allow the player to explore branches the puzzle author didn’t include, while strict puzzles do not.
OpenTafl has all the tools you need to author puzzles, although, at present, you will have to do your own position records. (Sometime down the line, I’d like to add a position editor/analysis mode, but today is not that day.) The README will have more details.
You can expect two or three puzzles to ship with the OpenTafl v0.4.4.0b release, along with all puzzle-related features.
Future evaluation function plans

I mentioned before that there are some weaknesses in the current evaluation function, and that some changes will be required. I’m becoming more convinced that this is untrue, and that what I really ought to do is write an evaluation function which is easier for humans to understand. The current evaluation function just places an abstract value on positions; the one I’d like to write uses the value of a taflman as an easy currency. In doing so, I can ask myself, “Would I sacrifice a taflman to do X?” Having that as a check on my logic would be nice, and would prevent me from making dumb logic errors like the one in the current evaluation function, where the attacker pushes pieces up against the king much too eagerly.
This may also require more separation in the evaluation function between different board sizes; sacrificing a taflman in a 7×7 game is a bigger decision than sacrificing one in an 11×11 game. I may also have some changing weights between the opening and the endgame, although, as yet, I don’t have a great definition for the dividing line between opening and endgame.
Anyway, that’s all theoretical, and undoubtedly I’ll be writing about it when I get to it, as part of the v0.4.5.x series of releases. In the meantime, though, I have plenty to do to get v0.4.4.0b done. I’ll be testing this one pretty hard, since it’s going to be the next stable release, which involves a lot of hands-on work. The automated tests are great, but can’t be counted on to catch UI and UX issues.
Thanks for joining me for Opentaflvecken! Remember, the AI tournament begins in about three and a half months, so you still have plenty of time to hammer out an AI if you’d like to take part. Hit the link in the header for details.
This isn’t exactly intuitive, but the person moving into a state for the third time is the person whose hand is forced. (Play it out for yourself if you don’t believe me.) Although it’s traditional to declare a threefold repetition a draw, I think it works better in tafl if the player making the forcing move is forced to make a different move or lose.
This is the second article in a series on AI improvements to OpenTafl. Read the first part here.
Welcome back! In the first article in this series, I talked about what was, essentially, groundwork. I did gloss over two rather large bugs in the course of that article, so I’ll give them a deeper treatment before I dive into the three topics I have planned for today.
First: I failed altogether to mention a bug that cropped up while I was writing continuation search, and which, in actuality, prompted my creation of the AI consistency test. I have a bit of code around for extension searches (that is, searches that begin from a non-root node), whose purpose is to revalue all of the nodes above the extension root. I was calling that method much too frequently, even during the main search, which pushed child values up the tree much too quickly, and yielded incorrect alpha-beta values. The bounds converged too quickly, and I ended up cutting off search far too early, before the search had verified that a certain move was safe in terms of opponent responses. I ended up designing a miniature tafl variant, 5×5 with a total of four pieces all limited to a speed of 1, to diagnose the issue. As a game, it’s unplayable, but the game tree to depth 3 takes about 30 or 40 lines, and it’s easy to read the tree and see what’s happening. That’s what I did, and that’s how I found my problem.
Second: the incomplete tree search bug, which I covered in a small amount of detail in a footnote. This one dates back to the very beginning of the OpenTafl AI, and is likely the cause of most of its weaknesses and obvious misplays since then. As I said in the text and the footnote, it stemmed from the assumption that a partial search of, for instance, depth 6 was better than a full search of depth 5. The true trickiness of this bug is that a partial search of depth 6 is, indeed, often as good as a full search of depth 5. All alpha-beta AIs prefer to search the best moves first, so if OpenTafl gets a little ways into the tree, a partial search at depth 6 is good enough to match the full depth-5 search1.
The really awful part of this bug, the one which made it the most inconsistent, was that OpenTafl doesn’t always manage to assign a value to every state when it’s out of time. The states have a magic number value marking them as having no evaluation: -11541. This value ended up in the game tree, and a node carrying it looks like a very tempting move for the defender. Subtrees the defender had not finished exploring would be chosen over subtrees it had finished, leading to inconsistent and incorrect behavior. The solution, as I mentioned in the previous post, was, when starting search to a new depth, to save the search tree to the previous depth. If the new depth doesn’t finish, OpenTafl uses the complete tree from the previous depth.
Time use planning

That brings me quite naturally to my first new topic for today: time use planning. Getting time usage correct is tricky. Obviously, we want to search as deeply as possible in the main tree, but we also want to stop as soon as we know we can’t search to the next depth. Since we discard incomplete searches, any time spent on an unfinished search is wasted. Unfortunately, ‘can we search to the next depth?’ carries with it a good deal of uncertainty. OpenTafl now uses a few tricks to determine whether it should attempt a deeper search.
First, it better takes advantage of previous searches to a given depth. Concrete information is hard to come by for AIs, and ‘how long did it take me to do this last time?’ is pretty darned concrete2. Whenever a search is finished to a given depth, OpenTafl stores the time that search took in a table and sets the age of that data to zero. Whenever a search to a given depth fails, OpenTafl increments the age of the data for that depth. If the age exceeds a threshold, OpenTafl invalidates all the data at that depth and deeper.
When determining whether to embark on a search to the next depth, OpenTafl first checks the table. If it has data for the desired depth, it compares its time remaining to that figure. If there is no data, it synthesizes some. Obviously, we know how long it took to get to the current depth: we just finished searching to it. OpenTafl takes two factors into account: first, it’s hard to search to the next depth; and second, it’s harder to search to an odd depth than an even depth3. If going to an even depth, OpenTafl assumes it’ll be 10 times as hard as the previous depth; if going to an odd depth, OpenTafl assumes 20 times. These numbers are extremely provisional. Sometime down the line, I want to write some benchmarking code for various depths and various board sizes to generate some data to make those factors resemble reality more closely.
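A sketch of that bookkeeping, using the 10x/20x factors from above; the age threshold and the rest of the structure are invented for illustration.

```java
// Sketch of the depth-time table described above. The invalidation threshold is
// an invented value; the factors of 10 and 20 come from the text.
final class TimePlanningSketch {
    private static final int MAX_DEPTH = 32;
    private static final int MAX_AGE = 3;  // invalidation threshold (invented value)

    private final long[] millisForDepth = new long[MAX_DEPTH];  // 0 means 'no data'
    private final int[] age = new int[MAX_DEPTH];

    // A search to this depth finished: record how long it took, reset its age.
    void recordCompletedSearch(int depth, long millis) {
        millisForDepth[depth] = millis;
        age[depth] = 0;
    }

    // A search to this depth failed to finish: age the data, and past the
    // threshold, invalidate everything at this depth and deeper.
    void recordFailedSearch(int depth) {
        age[depth]++;
        if (age[depth] > MAX_AGE) {
            for (int d = depth; d < MAX_DEPTH; d++) {
                millisForDepth[d] = 0;
                age[d] = 0;
            }
        }
    }

    // Should we start a search to nextDepth, given the time remaining?
    boolean shouldAttempt(int nextDepth, long lastDepthMillis, long millisRemaining) {
        long estimate = millisForDepth[nextDepth];
        if (estimate == 0) {
            // No data: assume the next depth is 10 times as hard if even, 20 times if odd.
            estimate = lastDepthMillis * (nextDepth % 2 == 0 ? 10 : 20);
        }
        return estimate <= millisRemaining;
    }
}
```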
Second, OpenTafl reserves some time for extension searches. Horizon search oftentimes changes the end result of the evaluation, and so OpenTafl seems to play better when it has time to run. About 15% of the total think time for any given turn is set aside for extension searches.
Search heuristics #1: the history heuristic

On to the heuristics! I implemented two for this release. The first, the history heuristic, is one of my favorites. Much like dollar cost averaging4 in finance, the history heuristic says something which is, on reflection, blindingly obvious about alpha-beta searches, but something that nevertheless is not immediately apparent. It goes like this: moves which cause cutoffs, no matter where they appear in the tree, tend to be interesting, and worth exploring early.
Consider the game tree: that is, the tree of all possible games. It fans out in a giant pyramid shape, ever widening, until all of the possibilities peter out. Consider now the search tree: a narrow, sawtoothed path through the game tree, approaching terminal states until it finally alights upon one. The search tree always examines a narrow path through the tree, because the full game tree is so overwhelmingly large. Since the search tree examines a narrow path, similar positions are likely to come up in many different searches, and in similar positions, similar moves are likely to cause cutoffs. Therefore, whenever a move causes a cutoff, we ought to keep track of it, and if it’s a legal move in other positions, search it earlier.
That’s about all there is to it: the bookkeeping is a little more complicated, since we have to do something to be sure that cutoffs high up in the tree are given as much weight as cutoffs lower toward the leaves (OpenTafl increments the history table by remainingDepth squared), but that isn’t especially important to understand the idea.
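The core of the bookkeeping fits in a few lines. This sketch indexes the table by from-space and to-space and bumps entries by remainingDepth squared, per the above; everything else about it is invented for illustration.

```java
// Sketch of a history table keyed by from-space and to-space indices.
// Illustrative only; OpenTafl's real table differs in its details.
final class HistoryTableSketch {
    private final int[][] cutoffs;  // [fromSpaceIndex][toSpaceIndex]

    HistoryTableSketch(int boardSpaces) {
        cutoffs = new int[boardSpaces][boardSpaces];
    }

    // Call when this move causes a cutoff with remainingDepth plies left below it.
    void recordCutoff(int fromSpace, int toSpace, int remainingDepth) {
        cutoffs[fromSpace][toSpace] += remainingDepth * remainingDepth;
    }

    // Used during move ordering: a higher score means 'try this move earlier'.
    int score(int fromSpace, int toSpace) {
        return cutoffs[fromSpace][toSpace];
    }
}
```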
There are two variations on the history heuristic I’ve considered or am considering. The first is the countermove history heuristic. The history heuristic in its simplest form is very memory-efficient: you only need one table entry per from-space/to-space pair; OpenTafl’s table entries are simple integers. Even for 19×19 tafl variants, the total size is less than half a megabyte. There exists a more useful expansion of the history heuristic sometimes called the countermove-history heuristic, which involves adding another two dimensions to the table: save cutoff counts per move and preceding move. This allows for a closer match to the situation in which the cutoffs were previously encountered, and increases the odds of a cutoff, but it turns out the inefficiency is too great in tafl games. Chess, with its more modest 8×8 board, can afford to bump the table entry requirement up to one entry per (move, preceding move) pair: it comes to about 33 million entries, or 60-some megabytes using an integer table entry. OpenTafl, which has to support everything from lowly brandub to the massive alea evangelii, needs, in the maximum case, one entry per (move, preceding move) pair on a 19×19 board, or 34 billion table entries, which takes almost 70 gigabytes of memory. Most people aren’t going to have that to spare.
So I did some further reading on the subject, then came across another relation of the history heuristic: the relative history heuristic. It combines a plain history heuristic with something called the butterfly heuristic, which counts how many times a given position occurs in the tree. The relative history value of a state is its history heuristic value (the number of cutoffs it has caused) divided by its butterfly value (the number of times it has appeared in the tree). This makes the relative history heuristic a measure of the efficiency of a move: if a move appears ten times in the tree and causes ten cutoffs, it’s probably more interesting than a move that appears ten thousand times in the tree but only causes eleven cutoffs. I haven’t gotten around to it yet, but OpenTafl will probably include the relative history heuristic in a future AI release.
Search heuristics #2: the killer move heuristic

The killer move heuristic is the other heuristic I implemented for this release. The killer move heuristic is a special case of the history heuristic5, and turns out to be the single greatest improvement in OpenTafl’s strength I’ve implemented to date6.
What is it, then? Simply this: for each search, it tracks the first two moves which caused a cutoff at a given depth, along with the most recent move to do so, and plays those first whenever possible. See? Simple. It calls for very little explanation, and I’m already at nearly 2,000 words, so I’ll leave you with this. See you later this week for the thrilling conclusion.
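For the curious, the bookkeeping is small enough to sketch in a few lines; as usual, the types are invented for illustration rather than lifted from OpenTafl.

```java
// Sketch of a killer move table: per depth, the first two moves to cause a cutoff
// plus the most recent one. The generic Move parameter is a placeholder.
final class KillerTableSketch<Move> {
    private final Object[][] firstTwo;   // [depth][0..1]: first two cutoff-causing moves at this depth
    private final Object[] mostRecent;   // [depth]: most recent cutoff-causing move at this depth

    KillerTableSketch(int maxDepth) {
        firstTwo = new Object[maxDepth][2];
        mostRecent = new Object[maxDepth];
    }

    // Call when a move causes a cutoff at the given depth.
    void recordCutoff(int depth, Move move) {
        if (firstTwo[depth][0] == null) {
            firstTwo[depth][0] = move;
        } else if (firstTwo[depth][1] == null && !move.equals(firstTwo[depth][0])) {
            firstTwo[depth][1] = move;
        }
        mostRecent[depth] = move;
    }

    // Used by move ordering: killers at this depth are searched first.
    boolean isKiller(int depth, Move move) {
        return move.equals(firstTwo[depth][0])
                || move.equals(firstTwo[depth][1])
                || move.equals(mostRecent[depth]);
    }
}
```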
Advanced chess engines take advantage of this fact to search deeper: they do something called ‘late move reduction’, where the moves late in the move ordering are searched to a lesser depth, leaving more search time for the better moves (the ones earlier in move ordering). Move ordering in chess engines is usually good enough that this works out.
It isn’t always useful, though. Depending on what happens on the board and how well the move ordering happens to work, relative to the previous search, the next search may be very fast (if it’s something we expected) or very slow (if it completely blows up our plan).
Even depths are usually small and odd depths are usually large, in terms of node counts relative to what you’d expect if the increase was linear. The mechanism of alpha-beta pruning causes this effect.
Dollar cost averaging makes the incredibly obvious observation that, if you put money into an asset over time, fixed amounts at regular intervals, you end up buying more of the asset when the price is low than when the price is high. Because of numbers.
Although the literature usually reverses this relationship: the history heuristic is normally referred to as a general case of the killer move heuristic, since the latter was used first.
In two ways: first, the version with the killer move heuristic plays the best against other AIs; second, the version with the killer move heuristic deepens far, far faster. The killer move heuristic is, in fact, almost single-handedly responsible for that faster speed to a given depth, going by my benchmarks. In brandub, for instance, OpenTafl reaches depth 6 in 2.5 million nodes without the killer move heuristic, and 1.3 million nodes with it. Searching killer moves first yields the best result, and moving anything ahead of them in the search order is non-ideal.
If you’re a regular reader over at Many Words Main, you’ll undoubtedly have thought that I’m some kind of bum, missing another update so soon. Not so! I’ve been very busy on other projects; very busy indeed. In a series of articles this week, I’ll be going through improvements I’ve made to OpenTafl’s AI. I’ll be releasing a stable version out of v0.4.3.x soon, which will feature the structural improvements I’ve made so far; next up will be puzzles in v0.4.4.x, then AI state evaluation improvements in v0.4.5.x.
So, what have I been up to? Well, let me open up my source control log, and I’ll tell you. Note I’m not reporting my progress in chronological order here: I present to you a natural-seeming order of features which bears little resemblance to reality, but which does make for a neater story.
Groundwork

Before embarking on a project as fuzzy as AI improvements, I needed tools to tell me whether my changes were making a positive effect. I started with a simple one: a command which dumps the AI’s evaluation of a particular state. This lets me verify that the evaluation is correct, and it turns out (to nobody’s great surprise) that it is not. When playing rules with weak kings, the AI places far too much weight on having a piece next to the king. When playing rules with kings which aren’t strong (tablut, for instance), the AI still mistakenly assumes that placing one piece next to the king is the same as putting the king in check. An improvement to make for phase 2!
The next thing on the docket was a test. As I wrote about what feels like a lifetime ago, OpenTafl is a traditional game-playing artificial intelligence. It uses a tree search algorithm called minimax with alpha-beta pruning, which is an extension of the pure minimax algorithm. The latter simply plays out every possible game to a certain depth, then evaluates the board position at the end of each possibility, then picks the best one to play toward, bearing in mind that the other player will pick whichever replies are worst for it1. Alpha-beta pruning extends minimax by taking advantage of the way we search the tree: we go depth first, exploring leaf nodes from ‘left’ to ‘right’2. As we move to the right, we discover nodes that impose bounds on the solution. Nodes which fall outside those bounds need not be considered. (The other article has an example.)
Anyway, the upshot of the preceding paragraph is that, for any given search, minimax and minimax with alpha-beta pruning should return the same move3. Previously, I had no way of verifying that this was so. Fortunately, it turned out that it was, and out of the whole endeavor I gained an excellent AI consistency test. It runs several searches, starting with all of OpenTafl’s AI features turned off to do a pure minimax search, then layering on more and more of OpenTafl’s search optimizations, to verify that none of the search optimizations break the minimax equivalence4.
Extension searches

Strong chess engines do something called extensions: after searching the main tree to a certain depth, they find interesting positions among the leaf nodes—captures, checks, and others—and search those nodes more deeply, until they become quiet5. OpenTafl does something similar, albeit on a larger scale. After it runs out of time for the main search, it launches into a two-stage extension search.
First, it attempts to deepen its view of the tree overall, a process I’ve been calling ‘continuation search’. Rather than clear the tree and start again, as OpenTafl does on its other deepening steps, continuation search starts with the tree already discovered and re-searches it, exploring to the next depth and re-expanding nodes. This process is much slower, in terms of new nodes explored, than a new deepening step, but as you’ll recall, deepening goes from left to right. A new deepening step discards useful information about the rightmost moves, in the hopes of discovering better information. Continuation search assumes that we don’t have the time to do that well, and that a broad, but not exhaustive, search to the next depth is more correct than half of a game tree6.
Continuation search takes time, too, though. It has to muddle its way down through the preexisting tree, potentially expanding lines of play previously discarded, before it gets to any truly new information. When there isn’t enough time for continuation search, the AI does something I’ve been calling horizon search, which resembles the traditional extension searches found in chess engines. The AI repeatedly makes deeper searches, using the leaf nodes at the end of the best variations as roots. Frequently, it discovers that the variation it thought was best turns out not to be: I’d say that about half the time horizon search runs, it turns up some unrealized weakness in the position that the evaluation function did not correctly account for.
But then again, it isn’t the evaluation function’s job to predict the future. The history of traditional game-playing AI is one of a quest for greater depth; a simple evaluation function and a deep search usually gives better results than a heavyweight evaluation function and a shallow search. With this philosophical point I think I will leave you for now. I have at least one more of these posts running later in the week, so I’ll see you then.
1. Hence minimax: one player is traditionally known as max, whose job it is to maximize the value of the position evaluation function, and the other is known as min, whose job is the opposite. The best move for max is the move which allows min the least good response.
2. Or whatever other natural ordering you prefer.
3. Or a move with the same score.
4. Since minimax is known to produce the optimal move from a given situation, any search which is not the same as minimax is no longer optimal. It turned out that, depending on the search optimizations, moves might be considered in a different order, and in case of identical scores, the first one encountered would be selected. The test runs from the start, so there are eight symmetrical ways to make the same move, and some searches came up with mirrored or rotated moves. Out of the bargain, I got methods to detect rotations and mirrors of a given move, which will serve me well for the opening book and the corner books I hope to do down the road.
5. Another point where chess AI developers have an edge on tafl fans: there’s more research on what makes for a quiet chess position (one where the evaluation is not likely to dramatically shift) than there is for tafl positions.
6. This was the source of a particularly pernicious bug, because it evaded most of my tests: most of the tests had search times just long enough to get near the end of the tree for a given depth, and the best move occurred at the start of the search in the rest. Continuation search also caused another bug, in response to which I wrote the AI consistency test. (I told you I wasn’t going in chronological order here.) That one was all my fault: I have a method to revalue the parents of a node, which is important for the next kind of search I’ll be describing. I was calling it after exploring the children of any node, propagating alpha and beta values up the tree too fast and causing incorrect, early cutoffs (which meant that the AI failed to explore some good countermoves, and had some baffling weaknesses which are now corrected).