
Opentaflvecken: final AI notes, plans for the future

This is the third article in a series of posts on OpenTafl. You can read the first and second parts at the preceding links.

Welcome back to the third and final entry in Opentaflvecken, a one-week period in which I provide some minor respite to poor parvusimperator. We’ve hit the major AI improvements; now it’s time to cover a grab bag of miscellaneous topics left over. Onward!

Playing out repetitions
In OpenTafl’s versions of tafl rule sets, threefold repetitions are invariably handled: by default, a threefold repetition yields a draw; some variants define it as a win for the player moving into the third repetition of a board state1. Prior to the AI improvements release, the AI simply ignored repetitions, an obvious weakness: if you found a position where you could force OpenTafl to make repeated moves to defend, you could play the forcing move over and over, and OpenTafl would obligingly find a different way to answer it until it ran out of answers and let you win. No longer! Now, OpenTafl is smart enough to play repetitions out to the end, using them to win if possible, or forcing a draw if necessary.

It turns out, as with everything in AI, this is not as simple as it sounds. The problem is the transposition table. It has no information on how many times a position has been repeated, and repetition rules introduce path dependency: the history of a state matters in its evaluation. Fortunately, this is a very simple path dependency to solve. OpenTafl now has a smallish hash table containing Zobrist hashes and repetition counts. Whenever a position appears, either in the AI search or in the regular play of the game, the repetition count is incremented. Whenever the AI finishes searching a position, it is removed from the table. In this way, at any given state, the AI always knows how many times a position has occurred. If the position has occurred more than once in the past, the AI skips the transposition table lookup and searches the position instead. This seems like it might cause a search slowdown—repetitions can no longer be probed—but in practice, it’s turned out to have almost no effect.
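
For the curious, here is a minimal sketch of what such a repetition table might look like. This is not OpenTafl’s actual code; the class and method names are mine, and it assumes a Zobrist hash has already been computed for every position.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Minimal repetition table sketch: maps Zobrist hashes to the number of times
 * the corresponding position has occurred, in the game or in the search.
 */
public class RepetitionTable {
    private final Map<Long, Integer> counts = new HashMap<>();

    /** Record one occurrence of a position and return the new count. */
    public int increment(long zobrist) {
        return counts.merge(zobrist, 1, Integer::sum);
    }

    /** Remove one occurrence, e.g. when the AI finishes searching a node. */
    public void decrement(long zobrist) {
        counts.computeIfPresent(zobrist, (k, v) -> v > 1 ? v - 1 : null);
    }

    /** How many times has this position occurred so far? */
    public int count(long zobrist) {
        return counts.getOrDefault(zobrist, 0);
    }
}
```

Before probing the transposition table, the search would call count() for the current position; if the position has occurred before, it skips the probe and searches the node normally.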

Move ordering improvements
I discussed move ordering a little bit in the second article, but I want to go into a little more depth. Move ordering is the beating heart of any alpha-beta search engine. The better the move ordering, the better the pruning works; the better the pruning works, the deeper the AI can search; the deeper the AI can search, the better it plays. Early on in OpenTafl’s development, I threw together a very simple move ordering function: it searched captures first and everything else later. Later on, after the transposition table went in, I added a bit to try transposition table hits in between captures and the remaining moves. The move ordering used Java’s sort method, which, though efficient, is more ordering than is necessary.

Late in the AI improvements branch, when I added the killer move table and the history table, I decided to fix that. Move ordering now does as little work as possible: it makes one pass through the list of moves, putting them into buckets according to their move type. It sorts the transposition table hits and history table hits, then searches the killer moves, the history table hits, the top half of the transposition table hits, the captures, the bottom half of the transposition table hits, and finally, any moves which remain. Though there are two sorts here instead of one, since they’re much smaller on average, they’re faster overall.
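
To make the bucketing concrete, here is a sketch of the idea. The Move and MoveTables types are hypothetical stand-ins rather than OpenTafl’s real classes, but the final ordering follows the description above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins for OpenTafl's real move and table classes.
interface Move { boolean isCapture(); }
interface MoveTables {
    boolean isKillerAt(Move m, int depth);
    int historyScore(Move m);              // 0 if the move has no history entry
    boolean hasTranspositionEntry(Move m);
    int transpositionScore(Move m);
}

final class MoveOrdering {
    /** One pass to bucket the moves, then two small sorts. */
    static List<Move> order(List<Move> moves, MoveTables tables, int depth) {
        List<Move> killers = new ArrayList<>();
        List<Move> historyHits = new ArrayList<>();
        List<Move> tableHits = new ArrayList<>();
        List<Move> captures = new ArrayList<>();
        List<Move> rest = new ArrayList<>();

        for (Move m : moves) {
            if (tables.isKillerAt(m, depth)) killers.add(m);
            else if (tables.historyScore(m) > 0) historyHits.add(m);
            else if (tables.hasTranspositionEntry(m)) tableHits.add(m);
            else if (m.isCapture()) captures.add(m);
            else rest.add(m);
        }

        // Only the history and transposition buckets need sorting, and both
        // are much smaller than the full move list.
        historyHits.sort(Comparator.<Move>comparingInt(tables::historyScore).reversed());
        tableHits.sort(Comparator.<Move>comparingInt(tables::transpositionScore).reversed());

        int half = tableHits.size() / 2;
        List<Move> ordered = new ArrayList<>(moves.size());
        ordered.addAll(killers);                                    // killer moves first
        ordered.addAll(historyHits);                                // then history hits
        ordered.addAll(tableHits.subList(0, half));                 // top half of table hits
        ordered.addAll(captures);                                   // then captures
        ordered.addAll(tableHits.subList(half, tableHits.size()));  // bottom half of table hits
        ordered.addAll(rest);                                       // everything else
        return ordered;
    }
}
```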

Puzzles
Okay, this one isn’t exactly an AI feature, but you’ll note that the title doesn’t limit me to AI future plans only. It also turns out that this isn’t much of a future plan. Puzzles are already done on the unstable branch, so I’ll use this section to tell you about the three kinds of puzzles.

The first is the least heavily-annotated sort, the one you might find on the hypothetical chess page of a newspaper: simply a rules string, a position string, and a description along the lines of ‘white to win in 4’. OpenTafl supports these with the new load notation feature: paste an OpenTafl rules string into a dialog box, and OpenTafl will load it. (There’s also a new in-game/in-replay command which can be used to copy the current position to the clipboard, ctrl+c and ctrl+v being nontrivial for the moment.)

The other two are closely related. They start with a standard OpenTafl replay, then use some extra tags to declare that the replay file defines a puzzle, and tell OpenTafl where in the replay file the puzzle starts. The puzzle author uses OpenTafl’s replay editing features to create variations and annotations for all of the branches of play he finds interesting, then distributes the file. When loading the file, OpenTafl hides the history until the player explores it. The two closely-related kinds I mentioned are loose puzzles and strict puzzles, the only difference being that loose puzzles allow the player to explore branches the puzzle author didn’t include, while strict puzzles do not.

OpenTafl has all the tools you need to author puzzles, although, at present, you will have to do your own position records. (Sometime down the line, I’d like to add a position editor/analysis mode, but today is not that day.) The README will have more details.

You can expect two or three puzzles to ship with the OpenTafl v0.4.4.0b release, along with all puzzle-related features.

Future evaluation function plans
I mentioned before that there are some weaknesses in the current evaluation function, and that some changes will be required. I’m becoming more convinced that small changes won’t be enough, and that what I really ought to do is write an evaluation function which is easier for humans to understand. The current evaluation function just places an abstract value on positions; the one I’d like to write uses the value of a taflman as an easy currency. In doing so, I can ask myself, “Would I sacrifice a taflman to do X?” Having that as a check on my logic would be nice, and would prevent me from making dumb logic errors like the one in the current evaluation function, where the attacker pushes pieces up against the king much too eagerly.

This may also require more separation in the evaluation function between different board sizes; sacrificing a taflman in a 7×7 game is a bigger decision than sacrificing one in an 11×11 game. I may also have some changing weights between the opening and the endgame, although, as yet, I don’t have a great definition for the dividing line between opening and endgame.
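
Purely to illustrate the idea (none of this is OpenTafl code, and every weight below is invented), an evaluation denominated in taflmen might look something like this:

```java
// Illustrative only: an evaluation expressed in taflman-equivalents, so every
// weight answers the question "is this worth a taflman?" All numbers invented.
final class TaflmanCurrencyEval {
    static double evaluate(int attackerPieces, int defenderPieces,
                           int kingEscapeRoutes, int boardSize, boolean endgame) {
        // Material, measured directly in taflmen, from the defender's perspective.
        double score = defenderPieces - attackerPieces;

        // Positional terms, each scaled as a fraction of a taflman. In a 7x7
        // game a taflman is scarcer, so positional terms are worth more of one.
        double taflmanFraction = boardSize <= 7 ? 0.5 : 0.25;
        score += kingEscapeRoutes * taflmanFraction;

        // Phase-dependent weighting: escape routes matter more in the endgame.
        if (endgame) score += kingEscapeRoutes * taflmanFraction;

        return score; // positive favors the defender, negative the attacker
    }
}
```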

Anyway, that’s all theoretical, and undoubtedly I’ll be writing about it when I get to it, as part of the v0.4.5.x series of releases. In the meantime, though, I have plenty to do to get v0.4.4.0b done. I’ll be testing this one pretty hard, since it’s going to be the next stable release, which involves a lot of hands-on work. The automated tests are great, but can’t be counted on to catch UI and UX issues.

Thanks for joining me for Opentaflvecken! Remember, the AI tournament begins in about three and a half months, so you still have plenty of time to hammer out an AI if you’d like to take part. Hit the link in the header for details.

  1. This isn’t exactly intuitive, but the person moving into a state for the third time is the person whose hand is forced. (Play it out for yourself if you don’t believe me.) Although it’s traditional to declare a threefold repetition a draw, I think it works better in tafl if the player making the forcing move is forced to make a different move or lose.

Opentaflvecken: AI progress, continued

This is the second article in a series on AI improvements to OpenTafl. Read the first part here.

Welcome back! In the first article in this series, I talked about what was, essentially, groundwork. I did gloss over two rather large bugs in the course of that article, so I’ll give them a deeper treatment before I dive into the three topics I have planned for today.

First: I failed altogether to mention a bug that cropped up while I was writing continuation search, and which, in actuality, prompted my creation of the AI consistency test. I have a bit of code around for extension searches (that is, searches that begin from a non-root node), whose purpose is to revalue all of the nodes above it. I was calling that method much too frequently, even during the main search, which pushed child values up the tree much too quickly, and yielded incorrect alpha-beta values. The bounds converged too quickly, and I ended up cutting off search far too early, before the search had verified that a certain move was safe in terms of opponent responses. I ended up designing a miniature tafl variant, 5×5 with a total of four pieces all limited to a speed of 1, to diagnose the issue. As a game, it’s unplayable, but the game tree to depth 3 takes about 30 or 40 lines, and it’s easy to read the tree and see what’s happening. That’s what I did, and that’s how I found my problem.

Second: the incomplete tree search bug, which I covered in a small amount of detail in a footnote. This one dates back to the very beginning of the OpenTafl AI, and is likely the cause of most of its weaknesses and obvious misplays since then. As I said in the text and the footnote, it stemmed from the assumption that a partial search of, for instance, depth 6 was better than a full search of depth 5. The true trickiness of this bug is that a partial search of depth 6 is, indeed, often as good as a full search of depth 5. All alpha-beta AIs prefer to search the best moves first, so if OpenTafl gets a little ways into the tree, a partial search at depth 6 is good enough to match the full search at depth 51.

The really awful part of this bug, the one which made it the most inconsistent, was that OpenTafl doesn’t always manage to assign a value to every state when it’s out of time. The states have a magic number value marking them as having no evaluation: -11541. This value ended up in the game tree, and to the defender it looks like a very tempting line. Subtrees the defender had not finished exploring would be chosen over subtrees it had, leading to inconsistent and incorrect behavior. The solution, as I mentioned in the previous post, was to save the complete search tree from the previous depth whenever a search to a new depth begins. If the new depth doesn’t finish, OpenTafl falls back on that complete tree.

Time use planning
That brings me quite naturally to my first new topic for today: time use planning. Getting time usage correct is tricky. Obviously, we want to search as deeply as possible in the main tree, but we also want to stop as soon as we know we can’t search to the next depth. Since we discard incomplete searches, any time spent on an unfinished search is wasted. Unfortunately, ‘can we search to the next depth?’ carries with it a good deal of uncertainty. OpenTafl now uses a few tricks to determine whether it should attempt a deeper search.

First, it better takes advantage of previous searches to a given depth. Concrete information is hard to come by for AIs, and ‘how long did it take me to do this last time?’ is pretty darned concrete2. Whenever a search is finished to a given depth, OpenTafl stores the time that search took in a table and sets the age of that data to zero. Whenever a search to a given depth fails, OpenTafl increments the age of the data for that depth. If the age exceeds a threshold, OpenTafl invalidates all the data at that depth and deeper.

When determining whether to embark on a search to the next depth, OpenTafl first checks the table. If it has data for the desired depth, it compares its time remaining to that figure. If there is no data, it synthesizes some. Obviously, we know how long it took to get to the current depth: we just finished searching to it. OpenTafl takes two factors into account: first, it’s hard to search to the next depth; and second, it’s harder to search to an odd depth than an even depth3. If going to an even depth, OpenTafl assumes it’ll be 10 times as hard as the previous depth; if going to an odd depth, OpenTafl assumes 20 times. These numbers are extremely provisional. Sometime down the line, I want to write some benchmarking code for various depths and various board sizes to generate some data to make those factors resemble reality more closely.

Second, OpenTafl reserves some time for extension searches. Horizon search oftentimes changes the end result of the evaluation, and so OpenTafl seems to play better when it has time to run. About 15% of the total think time for any given turn is set aside for extension searches.
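
Here is a sketch of the depth-time bookkeeping described above. The 10x and 20x multipliers and the idea of aging out stale data come straight from the text; the aging threshold, names, and structure are my inventions for illustration.

```java
import java.util.Arrays;

// Sketch of the per-depth search-time table: record how long each completed
// depth took, age out stale data, and estimate the cost of the next depth.
final class SearchTimeTable {
    private static final int MAX_DEPTH = 64;
    private static final int MAX_AGE = 3;   // invalidate after this many failed attempts

    private final long[] timeMillis = new long[MAX_DEPTH];  // 0 = no data
    private final int[] age = new int[MAX_DEPTH];

    void recordSuccess(int depth, long millis) {
        timeMillis[depth] = millis;
        age[depth] = 0;
    }

    void recordFailure(int depth) {
        if (++age[depth] > MAX_AGE) {
            // Data at this depth and deeper is stale: throw it away.
            Arrays.fill(timeMillis, depth, MAX_DEPTH, 0L);
            Arrays.fill(age, depth, MAX_DEPTH, 0);
        }
    }

    /** Estimated cost of searching to 'depth', given how long the last completed depth took. */
    long estimate(int depth, long lastCompletedDepthMillis) {
        if (timeMillis[depth] > 0) return timeMillis[depth];  // measured data wins
        int factor = (depth % 2 == 0) ? 10 : 20;              // odd depths are harder
        return lastCompletedDepthMillis * factor;
    }

    /** Should we start a search to 'depth' with this much time left? */
    boolean shouldDeepen(int depth, long lastCompletedDepthMillis, long remainingMillis) {
        return estimate(depth, lastCompletedDepthMillis) <= remainingMillis;
    }
}
```

In this sketch, the 15% reserved for extension searches would simply come off the remaining time before shouldDeepen() is consulted.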

Search heuristics #1: the history heuristic
On to the heuristics! I implemented two for this release. The first, the history heuristic, is one of my favorites. Much like dollar cost averaging4 in finance, the history heuristic says something which is, on reflection, blindingly obvious about alpha-beta searches, but something that nevertheless is not immediately apparent. It goes like this: moves which cause cutoffs, no matter where they appear in the tree, tend to be interesting, and worth exploring early.

Consider the game tree: that is, the tree of all possible games. It fans out in a giant pyramid shape, ever widening, until all of the possibilities peter out. Consider now the search tree: a narrow, sawtoothed path through the game tree, approaching terminal states until it finally alights upon one. The search tree always examines a narrow path through the tree, because the full game tree is so overwhelmingly large. Since the search tree examines a narrow path, similar positions are likely to come up in many different searches, and in similar positions, similar moves are likely to cause cutoffs. Therefore, whenever a move causes a cutoff, we ought to keep track of it, and if it’s a legal move in other positions, search it earlier.

That’s about all there is to it. The bookkeeping is a little more complicated, since we have to make sure that cutoffs high up in the tree are given as much weight as cutoffs lower toward the leaves (OpenTafl increments the history table by remainingDepth squared), but that isn’t especially important for understanding the idea.
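
A minimal history table might look like the sketch below. The indexing scheme (side, from-square, to-square) and the names are my assumptions; the remainingDepth-squared increment is the one mentioned above.

```java
// Minimal history table sketch: one entry per (side, from-square, to-square).
// For a board of dimension d this comes to 2 x d^4 entries.
final class HistoryTable {
    private final int dim;
    private final int[] table;   // [side][from][to], flattened

    HistoryTable(int dimension) {
        this.dim = dimension;
        this.table = new int[2 * dimension * dimension * dimension * dimension];
    }

    private int index(int side, int from, int to) {
        // side is 0 or 1; from and to are square indices in [0, dim*dim)
        return (side * dim * dim + from) * dim * dim + to;
    }

    /** Called when a move causes a cutoff 'remainingDepth' plies above the horizon. */
    void recordCutoff(int side, int from, int to, int remainingDepth) {
        table[index(side, from, to)] += remainingDepth * remainingDepth;
    }

    /** Higher scores are searched earlier during move ordering. */
    int score(int side, int from, int to) {
        return table[index(side, from, to)];
    }
}
```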

There are two variations on the history heuristic I’ve considered or am considering. The first is the countermove history heuristic. The history heuristic in its simplest form is very memory-efficient: you only need 2 × dimension^4 table entries; OpenTafl’s table entries are simple integers. Even for 19×19 tafl variants, the total size is less than half a megabyte. There exists a more useful expansion of the history heuristic sometimes called the countermove-history heuristic, which involves adding another two dimensions to the table: save cutoff counts per move and preceding move. This allows for a closer match to the situation in which the cutoffs were previously encountered, and increases the odds of a cutoff, but it turns out the inefficiency is too great in tafl games. Chess, with its more modest 8×8 board, can afford to bump the table entry requirement up to 2 × 8^8: it comes to about 33 million entries, or 60-some megabytes using an integer table entry. OpenTafl, which has to support everything from lowly brandub to the massive alea evangelii, needs, in the maximum case, 2 × 19^8, or 34 billion table entries, which takes almost 70 gigabytes of memory. Most people aren’t going to have that to spare.

So I did some further reading on the subject, then came across another relation of the history heuristic: the relative history heuristic. It combines a plain history heuristic with something called the butterfly heuristic, which counts how many times a given position occurs in the tree. The relative history value of a state is its history heuristic value (the number of cutoffs it has caused) divided by its butterfly value (the number of times it has appeared in the tree). This makes the relative history heuristic a measure of the efficiency of a move: if a move appears ten times in the tree and causes ten cutoffs, it’s probably more interesting than a move that appears ten thousand times in the tree but only causes eleven cutoffs. I haven’t gotten around to it yet, but OpenTafl will probably include the relative history heuristic in a future AI release.
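
As a rough sketch of what that might look like (again, purely illustrative, since OpenTafl hasn’t implemented it yet), the butterfly counts simply sit alongside the history counts:

```java
// Relative history sketch: cutoffs divided by appearances ("butterfly" counts).
// Table shape and naming are illustrative, not OpenTafl's.
final class RelativeHistoryTable {
    private final int[] cutoffs;   // history counts: incremented on a cutoff
    private final int[] visits;    // butterfly counts: incremented whenever searched
    private final int squares;

    RelativeHistoryTable(int dimension) {
        this.squares = dimension * dimension;
        this.cutoffs = new int[2 * squares * squares];
        this.visits = new int[2 * squares * squares];
    }

    private int index(int side, int from, int to) {
        return (side * squares + from) * squares + to;
    }

    void onSearched(int side, int from, int to) { visits[index(side, from, to)]++; }
    void onCutoff(int side, int from, int to)   { cutoffs[index(side, from, to)]++; }

    /** A measure of how efficient a move is at causing cutoffs. */
    double relativeValue(int side, int from, int to) {
        int i = index(side, from, to);
        return visits[i] == 0 ? 0.0 : (double) cutoffs[i] / visits[i];
    }
}
```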

Search heuristics #2: the killer move heuristic
The killer move heuristic is the other heuristic I implemented for this release. The killer move heuristic is a special case of the history heuristic5, and turns out to be the single greatest improvement in OpenTafl’s strength I’ve implemented to date6.

What is it, then? Simply this: for each search, it tracks the first two moves which caused a cutoff at a given depth, along with the most recent move to do so, and plays those first whenever possible. See? Simple. It calls for very little explanation, and I’m already at nearly 2,000 words, so I’ll leave you with this. See you later this week for the thrilling conclusion.
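
Here is roughly what such a table looks like. The move encoding (a long, with 0 meaning "no move") and the names are assumptions, but the policy, the first two cutoff moves per depth plus the most recent, follows the description above.

```java
// Killer move table sketch: at each depth, keep the first two moves that
// caused a cutoff, plus the most recent one to do so.
final class KillerTable {
    private final long[][] killers;   // killers[depth] = {first, second, mostRecent}

    KillerTable(int maxDepth) {
        killers = new long[maxDepth][3];   // 0 means "no move recorded"
    }

    void recordCutoff(int depth, long move) {
        long[] k = killers[depth];
        if (k[0] == 0) k[0] = move;                        // first killer at this depth
        else if (k[1] == 0 && k[0] != move) k[1] = move;   // second killer
        k[2] = move;                                       // always track the most recent
    }

    boolean isKiller(int depth, long move) {
        long[] k = killers[depth];
        return move != 0 && (move == k[0] || move == k[1] || move == k[2]);
    }
}
```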

  1. Advanced chess engines take advantage of this fact to search deeper: they do something called ‘late move reduction’, where the moves late in the move ordering are searched to a lesser depth, leaving more search time for the better moves (the ones earlier in move ordering). Move ordering in chess engines is usually good enough that this works out.
  2. It isn’t always useful, though. Depending on what happens on the board and how well the move ordering happens to work, relative to the previous search, the next search may be very fast (if it’s something we expected) or very slow (if it completely blows up our plan).
  3. Even depths are usually small and odd depths are usually large, in terms of node counts relative to what you’d expect if the increase was linear. The mechanism of alpha-beta pruning causes this effect.
  4. Dollar cost averaging makes the incredibly obvious observation that, if you put money into an asset over time, fixed amounts at regular intervals, you end up buying more of the asset when the price is low than when the price is high. Because of numbers.
  5. Although the literature usually reverses this relationship: the history heuristic is normally referred to as a general case of the killer move heuristic, since the latter was used first.
  6. In two ways: first, the version with the killer move heuristic plays the best against other AIs; second, the version with the killer move heuristic deepens far, far faster. The killer move heuristic is, in fact, almost single-handedly responsible for that faster speed to a given depth, going by my benchmarks. In brandub, for instance, OpenTafl reaches depth 6 in 2.5 million nodes without the killer move heuristic, and 1.3 million nodes with it. Searching killer moves first yields the best result, and moving anything ahead of them in the search order is non-ideal.

Opentaflvecken: AI progress

If you’re a regular reader over at Many Words Main, you’ll undoubtedly have thought that I’m some kind of bum, missing another update so soon. Not so! I’ve been very busy on other projects; very busy indeed. In a series of articles this week, I’ll be going through improvements I’ve made to OpenTafl’s AI. I’ll be releasing a stable version out of v0.4.3.x soon, which will feature the structural improvements I’ve made so far; next up will be puzzles in v0.4.4.x, then AI state evaluation improvements in v0.4.5.x.

So, what have I been up to? Well, let me open up my source control log, and I’ll tell you. Note I’m not reporting my progress in chronological order here: I present to you a natural-seeming order of features which bears little resemblance to reality, but which does make for a neater story.

Groundwork
Before embarking on a project as fuzzy as AI improvements, I needed tools to tell me whether my changes were making a positive effect. I started with a simple one: a command which dumps the AI’s evaluation of a particular state. This lets me verify that the evaluation is correct, and it turns out (to nobody’s great surprise) that it is not. When playing rules with weak kings, the AI places far too much weight on having a piece next to the king. When playing rules with kings which aren’t strong (tablut, for instance), the AI still mistakenly assumes that placing one piece next to the king is the same as putting the king in check. An improvement to make for phase 2!

The next thing on the docket was a test. As I wrote about what feels like a lifetime ago, OpenTafl is a traditional game-playing artificial intelligence. It uses a tree search algorithm called minimax with alpha-beta pruning, which is an extension of the pure minimax algorithm. The latter simply plays out every possible game to a certain depth, then evaluates the board position at the end of each possibility, then picks the best one to play toward, bearing in mind that the other player will make moves which are not optimal from its perspective1. Alpha-beta pruning extends minimax by taking advantage of the way we search the tree: we go depth first, exploring leaf nodes from ‘left’ to ‘right’2. As we move to the right, we discover nodes that impose bounds on the solution. Nodes which fall outside those bounds need not be considered. (The other article has an example.)

Anyway, the upshot of the preceding paragraph is that, for any given search, minimax and minimax with alpha-beta pruning should return the same move3. Previously, I had no way of verifying that this was so. Fortunately, it turned out that it was, and out of the whole endeavor I gained an excellent AI consistency test. It runs several searches, starting with all of OpenTafl’s AI features turned off to do a pure minimax search, then layering on more and more of OpenTafl’s search optimizations, to verify that none of the search optimizations break the minimax equivalence4.
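
For readers who haven’t seen it, here is the textbook form of minimax with alpha-beta pruning over a hypothetical GameNode interface. It isn’t OpenTafl’s code, but it is the skeleton the consistency test exercises.

```java
import java.util.List;

// Textbook minimax with alpha-beta pruning over a hypothetical node interface.
interface GameNode {
    boolean isTerminal();
    int evaluate();                 // from max's point of view
    List<GameNode> children();      // legal successor positions
}

final class AlphaBeta {
    static int search(GameNode node, int depth, int alpha, int beta, boolean maximizing) {
        if (depth == 0 || node.isTerminal()) return node.evaluate();

        if (maximizing) {
            int best = Integer.MIN_VALUE;
            for (GameNode child : node.children()) {
                best = Math.max(best, search(child, depth - 1, alpha, beta, false));
                alpha = Math.max(alpha, best);
                if (alpha >= beta) break;   // beta cutoff: min will never allow this line
            }
            return best;
        } else {
            int best = Integer.MAX_VALUE;
            for (GameNode child : node.children()) {
                best = Math.min(best, search(child, depth - 1, alpha, beta, true));
                beta = Math.min(beta, best);
                if (alpha >= beta) break;   // alpha cutoff: max has a better option elsewhere
            }
            return best;
        }
    }
}
```

Remove the two break statements and this degenerates to pure minimax, which is exactly the equivalence the consistency test checks, up to ties in score.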

Extension searches
Strong chess engines do something called extensions: after searching the main tree to a certain depth, they find interesting positions among the leaf nodes—captures, checks, and others—and search those nodes more deeply, until they become quiet5. OpenTafl does something similar, albeit on a larger scale. After it runs out of time for the main search, it launches into a two-stage extension search.

First, it attempts to deepen its view of the tree overall, a process I’ve been calling ‘continuation search’. Rather than clear the tree and start again, as OpenTafl does on its other deepening steps, continuation search starts with the tree already discovered and re-searches it, exploring to the next depth and re-expanding nodes. This process is much slower, in terms of new nodes explored, than a new deepening step, but as you’ll recall, deepening goes from left to right. A new deepening step discards useful information about the rightmost moves, in the hopes of discovering better information. Continuation search assumes that we don’t have the time to do that well, and that a broad, but not exhaustive, search to the next depth is more correct than half of a game tree6.

Continuation search takes time, too, though. It has to muddle its way down through the preexisting tree, potentially expanding lines of play previously discarded, before it gets to any truly new information. When there isn’t enough time for continuation search, the AI does something I’ve been calling horizon search, which resembles the traditional extension searches found in chess engines. The AI repeatedly makes deeper searches, using the ends of the best variations as roots. Frequently, it discovers that the variation it thought was best turns out not to be: about half the time horizon search runs, I’d estimate, it uncovers some unrealized weakness in the position which the evaluation function did not correctly account for.

But then again, it isn’t the evaluation function’s job to predict the future. The history of traditional game-playing AI is one of a quest for greater depth; a simple evaluation function and a deep search usually gives better results than a heavyweight evaluation function and a shallow search. With this philosophical point I think I will leave you for now. I have at least one more of these posts running later in the week, so I’ll see you then.

1. Hence minimax: one player is traditionally known as max, whose job it is to maximize the value of the position evaluation function, and the other is known as min, whose job is the opposite. The best move for max is the move which allows min the least good response.
2. Or whatever other natural ordering you prefer.
3. Or a move with the same score.
4. Since minimax is known to produce the optimal move from a given situation, any search which is not the same as minimax is no longer optimal. It turned out that, depending on the search optimizations, moves might be considered in a different order, and in case of identical scores, the first one encountered would be selected. The test runs from the start, so there are eight symmetrical ways to make the same move, and some searches came up with mirrored or rotated moves. Out of the bargain, I got methods to detect rotations and mirrors of a given move, which will serve me well for the opening book and the corner books I hope to do down the road.
5. Another point where chess AI developers have an edge on tafl fans: there’s more research on what makes for a quiet chess position (one where the evaluation is not likely to dramatically shift) than there is for tafl positions.
6. This was the source of a particularly pernicious bug, because it evaded most of my tests: most of the tests had search times just long enough to get near the end of the tree for a given depth, and the best move occurred at the start of the search in the rest. Continuation search also caused another bug, in response to which I wrote the AI consistency test. (I told you I wasn’t going in chronological order here.) That one was all my fault: I have a method to revalue the parents of a node, which is important for the next kind of search I’ll be describing. I was calling it after exploring the children of any node, propagating alpha and beta values up the tree too fast and causing incorrect, early cutoffs (which meant that the AI failed to explore some good countermoves, and had some baffling weaknesses which are now corrected).

OpenTafl v0.4.2.4b released!

Undoubtedly the release number will continue to increment as I find and fix little bugs in my work on the AI improvements branch. Either way, v0.4.2.x is now the stable branch, carrying with it two major features: variations in replays, and some additions to the stable of rules.

First, variations: the headline feature for v0.4.x, variations provide players the ability to play out alternate histories in replay mode. These alternate histories can be saved and loaded, as well as annotated (and annotations now have a handy new in-game editor for ease of production).

Second, rules: I finally got around to implementing the last two rules OpenTafl needed to play almost all known tafl variants. Not coincidentally, OpenTafl now supports every rule supported by PlayTaflOnline.com, so the bot player there can compete on every front. Poorly.

That brings me to my last point for today: AI improvements. What’s on the list? Well, I have a few things on my list.

First things first: I have to characterize OpenTafl’s particular failings. The ones I can most easily fix rest in the evaluation function. If the evaluation function provides inaccurate evaluations of board states, then all the rest—heuristics and fancy pruning alike—rests on an unsteady foundation. The analysis engine feature aids me in this quest. My workflow for this initial phase goes like this: I play a game against the AI, using the analysis engine to feed me the AI’s moves. When the AI makes a move which is obviously bad, I go to the replay, then tell the analysis engine to dump its evaluations for the states in question. By comparing those evaluations with evaluations of the move I would prefer, I can begin to see what the AI sees.

I’ve already made two interesting discoveries regarding AI weights and preferences. First, it places far too much importance on guarding or attacking the king, to the point that the attacker AI will happily sacrifice piece after piece if only it puts the king in ‘check’. Second, flowing out of the first item, it has a badly inaccurate view of what constitutes threatening an opposing piece. When it decides to threaten an enemy piece, it will happily do so by moving one of its own pieces into a position where it can immediately be recaptured. Oops.

So, once I’ve made changes to fix those mistakes, how do I verify that I’ve made a positive difference? I play the AI against old versions. Thankfully, building older versions of OpenTafl is trivial. (If you’ve been following development, you’ll no doubt have noticed that there’s a Mercurial tag for every release.) I have a little tool which is intended to do Elo ratings for chess clubs, but which will serve to do Elo ratings for OpenTafl versions just fine. This helps quantify not only whether a version is better than another version, but by how much.
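
For reference, the arithmetic a rating tool like that applies is the standard Elo update. This sketch is not the tool itself, just the formulas, with a common default K-factor.

```java
// Standard Elo expected-score and update formulas, as such a tool might apply
// them after each game between two OpenTafl versions.
final class Elo {
    static final double K = 32.0;   // a common default K-factor

    /** Expected score (0..1) for a player rated 'rating' against 'opponent'. */
    static double expected(double rating, double opponent) {
        return 1.0 / (1.0 + Math.pow(10.0, (opponent - rating) / 400.0));
    }

    /** New rating after a game: score is 1 for a win, 0.5 for a draw, 0 for a loss. */
    static double update(double rating, double opponent, double score) {
        return rating + K * (score - expected(rating, opponent));
    }
}
```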

Once I have the evaluation function a little more settled, I can move onto some extra heuristics. I have two in mind for the moment: the killer move heuristic, and the history heuristic (or some related heuristic). The killer move heuristic is the more promising one. It assumes that most moves don’t change the overall state of the board too much, and there are likely to be, at a given depth in the tree, only a few plausible moves to push the evaluation in your direction. Therefore, if any of those moves are possible at that depth, the AI should try them first.

The history heuristic is more complicated. There are three variations I’m considering. First, the straight history heuristic, which orders moves by how often they’ve caused a cutoff. This prefers, in the long run, making moves which have been good elsewhere in the tree. Straightforward, compared to the others.

Second, the butterfly heuristic, which orders moves by how often they occur anywhere in the tree. This prefers making moves which are frequently considered. This one is a little more subtle. Alpha-beta search, by its very nature, ends up searching only the most interesting moves, pruning away the rest. The butterfly heuristic, by tracking moves which turn up a lot, is essentially tracking which moves are more interesting.

Finally, the countermove heuristic, which tracks which moves are the best responses to other moves, and weights moves matching the known good countermove more heavily. This one requires very little explanation.

So, in the next month or two, I expect the strength of OpenTafl’s AI to improve considerably. Stay tuned!

On tafl: variations and puzzles, design goals and approach

The headline feature in OpenTafl v0.4.x is playable variations: that is, when viewing a replay, you’ll be able to say ‘variation a1 a3’ to create a new branch in the history, which can be added to, viewed, navigated, and commented upon like any other branch. It turns out this is, to put it mildly, non-trivial. Before I go into why this is, I’d like to talk for a few hundred words about why this feature excites me.

In short, this feature is the last feature before OpenTafl hits feature-completeness, relative to engines for other abstract strategy games. Other engines include it because it’s a useful tool for teaching and review; OpenTafl will be no different. I find teaching, especially, to be important. At present, available tafl commentaries remark only on the principal line of play. If they touch on variations, they do so only in passing. Understanding why a variation is a bad idea is all but a requirement for higher-level play, and providing room for commentators to make those comments is therefore a requirement for OpenTafl.

The usage in which I’m most interested, however, is puzzles. A puzzle is nothing but a branching commentary in which it is impossible to read ahead, and OpenTafl should support that pretty easily. I have a few tafl puzzles in mind already, thanks to interesting situations from my approximately-weekly game, which I hope to package with the first release of v0.4.x. To fulfill the ‘impossible to read ahead’ requirement, I’ll be adding an allowable tag to the saved game file format. When set, OpenTafl will suppress use of the ‘history’ command when viewing a replay.

Both of these usages presuppose a community of OpenTafl-literate commentators and puzzle authors, and the current setup for editing commentaries is not what you would call user-friendly. I plan to stick in a quick-and-dirty comment editor, a big text box you can use to define the comment for a particular state.

There. That covers, approximately, the list of features and their justifications for this release. On to the depressingly practical bits. How?

It turns out to be a tricky problem. I did not write the early versions of OpenTafl with an eye toward a tree structure for game histories. Since the GameState object, the standard representation for a tafl position in OpenTafl, is already a heavy object and already used in the AI, I didn’t want to add anything further to it. OpenTafl already uses a ReplayGame object to overlay the basic Game object during replays, so the natural thing to do was extend GameState to ReplayGameState, and put all the replay-related data and functionality into ReplayGameState. This solved the first problem: where to store the data? It revealed another: how to reference it?

It turns out that the problem of naming branches in a tree is also not altogether trivial, at least as far as the scheme goes. Fortunately for you, OpenTafl user, you won’t have to worry about figuring out how it works; replay mode will name each state for you, and you can specify which one you want to jump to. I, however, had to do the heavy lifting.

In replay mode, each state now has a specific name. The first state (more accurately, the first move) is state 1a. The next move is state 1b. (In berserk tafl, you might see a 1c or a 1d.) Those two (or more) moves compose the first turn. Turn 2 comprises 2a and 2b. Easy so far, right? Let’s dive into a more complicated example. Say you start a game, enter replay mode, and type ‘variation a4 a2’. You’ve now moved off of the beaten path: you’re in a variation. The state you’re in is now called 1a.1.1a.

Whoa. What’s going on?

This is an OpenTafl variation address. We’ll read them from right to left. First, the last element: 1a. That means this is the first turn of a new branch of play, and this is the first move therein. Next, the middle element: 1. That means that this is the first variation off of the state to our right. Finally, the first element: this variation replaces move 1a. A shorter reading is, “the first move of the first variation off of move 1a.” If you make further moves in that variation, they’re called 1a.1.1b, 1a.1.2a, 1a.1.2b, and so forth. For the sake of clarity, we’ll start some of our later examples from 3a, and its first variation: 3a.1.1a. (The first move in the first turn of the first variation off of the first move of the third turn.) Got it? Cool. We’ll try a harder one.

7b.3.1b.2.4a.1.1b. (I said it would be hard.)

Remember, right to left. This is (1b) the second move in the first turn of (1) the first variation off of (4a) the first move of the fourth turn of (2) the second variation off of (1b) the second move in the first turn of (3) the third variation off of (7b) the second move of the seventh turn of the game.

Those of you quicker than me will have noticed something a little odd. Remember how I said the first move following 3a.1.1a is 3a.1.1b? Well, what happens if we make a variation off of 3a.1.1a? It turns out that it’s 3a.1.1a.1.1b1. Remember, 3a refers to the first move. We can replace 3a with a new move, named 3a.1.1a. If we want to branch from 3a.1.1a, though, the next state is not 3a.1.1a.1.1a: we’ve already replaced 3a, the first move in the turn. What we want to do is replace the next move in the turn: 3a.1.1a.1.1b. The perhaps-unwanted side effect is that 3a.1.1a.1.1b and 3a.1.1b are siblings: both occur two moves after 3a. A little odd, but necessary.
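
If it helps, here is a toy parser for that addressing scheme, written from the description above rather than from OpenTafl’s source; it just walks an address right to left and prints the kind of reading used in this post.

```java
// Illustrative parser for the variation address scheme: reading right to left,
// the last element is the move inside the innermost variation, and each
// preceding (move, variation-number) pair names the branch point it hangs off.
final class VariationAddress {
    static String describe(String address) {
        String[] parts = address.split("\\.");
        StringBuilder sb = new StringBuilder();
        int i = parts.length - 1;
        sb.append("move ").append(parts[i--]);
        while (i > 0) {
            String variation = parts[i--];   // which variation off the branch point
            String branchMove = parts[i--];  // the move the variation branches from
            sb.append(" of variation ").append(variation)
              .append(" off of move ").append(branchMove);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: move 1b of variation 1 off of move 4a of variation 2
        //         off of move 1b of variation 3 off of move 7b
        System.out.println(describe("7b.3.1b.2.4a.1.1b"));
    }
}
```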

That about covers the problem of addressing, which is the first problem I’ve addressed. The hard part isn’t generating variation states: the hard part is storing them and finding them by a human-readable address. (I won’t insult you by saying that it’s easy.)

There are other problems, of course: how to present this functionality to the user. I haven’t done that yet. Nor have I done saving and loading of games with variations. In fact, all of this work, which comes to about a week of evenings and 1000 lines of code, has knocked precisely one item off of my v0.4.x to-do list. Fortunately, I think I’ve done at least one of the hard things first. (Loading variations is, admittedly, going to be a huge pain.)

Anyway, you can expect more posts down the line. For now, I have some tests to write.

1. I use that phrasing—’it turns out’—advisedly. Because OpenTafl stores games as a series of board positions, rather than a series of moves, the indexing is all weird. For instance, the state labeled 1a is the starting position, and ‘1a’ refers to the move which exits the starting position. The entire main line of play is addressed in the same way: a state’s address refers to the move which exits that state. You no doubt see the failing here: a variation is a second way to leave the state, and OpenTafl’s not about that kind of ambiguity2. Therefore, so that we can label every move in the game, addresses on variation states have to refer forward, to the move which enters them. The numbering is different between when you branch from 1a (becomes 1a.1.1a) and when you branch from 1a.1.1a (becomes 1a.1.1a.1.1b). Fun3.
2. Here we attribute to principle what is, in reality, just incompetence.
3. In the process of writing this blog post, I’ve discovered and fixed at least three or four inconsistencies in the naming scheme. Rough.

The Battle of Shilovo: 1942

Welcome to Shilovo. It’s July 4th, 1942, and the Wehrmacht has embarked on yet another ambitious offensive: Fall Blau. This time, the plan focuses on the south, pushing from last year’s front (very roughly, a line from Kursk due south to Dnepropetrovsk, then southeast to Rostov, about 800 kilometers in total) to the Baku oilfields and the city of Stalingrad.

It’s only just begun, though, and we concern ourselves with the fighting around Voronezh, and more specifically, a work settlement a bit to the west called Shilovo. (It doesn’t exist anymore—it’s just part of Voronezh.) Shilovo sits on a hill overlooking the Don river, a strategically-important barrier keeping the Nazis out of Voronezh proper. Historically, the Germans took it on July 5th and 6th.

Hopefully, I’ll be able to hold them off a little better than that.

Notice a few features about this map: first, the UI I forgot how to hide. It’s covering the place name for ‘Trushkino’, the town at the bottom center controlled by the Germans. It faces Shilovo across a deep valley. Roads run northwest from Shilovo and northeast from Trushkino, then split to the north and northwest to meet one of the two crossroads objectives. Besides the valley between the two towns, and the hillside south of Shilovo, the map is more or less flat, which presents a problem: I know the Nazis have some armored vehicles, and I don’t have much in the way of anti-tank weaponry. The sum total of my force is as follows: two rifle companies, the battalion machine gun company (ten or so Maxim guns, all told), the battalion mortar company (same deal), and the battalion AT company (armed with anti-tank rifles, which may as well be rocks for all the good they do).

From the Russian side, this is almost entirely a defensive effort, and that’s reflected in my chosen deployment. (We won’t talk about my first mission in this campaign, a defense to the northwest. It didn’t go well.) One rifle company, under Homenko, is deployed at the northern crossroads, reinforced by most of Beda’s platoon. Drobotov’s platoon holds the central crossroads, while Churginov’s platoon serves as a reserve between the two. Bits and pieces of the machine gun company and the anti-tank company are detached to strengthen the two crossroads strongpoints.

The remainder of Beda’s platoon, along with the battalion mortars and the bulk of the machine guns, is deployed on the forward slope on the western approach to Shilovo, commanding the valley. With good, overlapping fields of fire, and tons of ammunition to boot, I suspect the machine guns will serve to hold the valley approach to Shilovo without issue. I’m more concerned about the central crossroads. If the Germans bring tanks down the west road, I’ll have a bad time of things. Hopefully, the northern crossroads strongpoint will be sufficiently distracting.

Anyway. Let’s get this show on the road. I had hoped to provide some extra screenshots here beyond the few I took during the battle, but alas, my VLC screenshot button isn’t working correctly, so you’ll just have to rely on your war correspondent, me.

The Germans begin their attack with a push, oddly enough, across the valley. The machine guns deliver a murderous hail of fire into the advancing Wehrmacht troops, and in large part, the advance stalls about halfway to my line. German forces will rally and push up the hill somewhat, but never in any organized manner, and never any closer than about one hundred meters to the guns.


Gunners on the northeastern outskirts of Shilovo engage German forces in the trees near the Trushkino road.

The northern crossroads, as I thought might be the case, turn out to be more interesting. It takes the Germans about ten minutes longer to make it down the road toward the strongpoint, but they arrive in greater force, and I have fewer heavy weapons to spare. It quickly becomes clear that the main German advance is coming from the west along the main road, so I shift some of the defenders facing north—a second machine gun team, and one of Beda’s squads—to meet the threat.

The fire on my position intensifies. The Germans clearly want this crossroads. Mortar fire begins to land in town, and the piddly 50mm mortars attached to each of my companies can’t even begin to fire in reply. They stick to shooting at the oncoming Germans, which is admittedly more scary than effective. (Your average 50mm mortar bomb has about 100 grams of explosive, which is less than some hand grenades of the time.)

The situation worsens about 20 minutes into the mission. A halftrack comes down the road, and while its mounted machine gun is keeping my anti-tank gunners’ heads firmly below trench level, a pair of Panzer IIs roll up the road. This is no good. Time to bring in the reserves.


A machine gun team shoots past Russian trenches (at frame left) toward advancing German infantry, while a Panzer approaches from the right.

One of the anti-tank gunners manages to get a shot off at the halftrack, which is enough to force its crew to bail out. By now, though, the Panzers have backed off, and are now working their way around to the north, where my defenses are lighter. One of them pushes into the town, about fifty yards behind the camera above, and begins shooting up my poor defenders. Fortunately, between the carnage west of Shilovo and the reserves arriving and bulking up the line south of the crossroads, the Germans realize they can’t hope to break through without further reinforcement. They call for a cease fire, and I gladly accept.

The casualty ratio favors me, as you might expect from a victory in a dug-in, defensive battle. I started with 400 men, of which about 250 were front-line combat troops, and lost 50, along with a few machine guns destroyed and a few abandoned. (The abandoned ones will be recovered.) The Germans lost 150 out of 360, including one halftrack. I put some fire on both tanks, but neither appears to have been greatly inconvenienced by it, and undoubtedly, they’ll show up again.

Having survived this battle, I only had one more to play on the first turn, and it went very similarly—the fighting centered on Shilovo again, shifted one grid square south. The same deployment, with machine guns covering open ground, served me well, and I’m into the second turn of the campaign now. I was able to bring some artillery up all along the line, along with anti-tank guns and air spotters. I expect the next few battles will feature much improved fireworks.

OpenTafl v0.3.0.0b released!

Well, actually, it was released on Thursday, but parvusimperator needed some space filled on Saturday, so here we are.

OpenTafl v0.3.0.0b is a major milestone release, as you can tell by the version number increment. It brings support for network play and, as far as I know, is the only real-time tafl game with support for a proper game clock built-in. From now on, there will be a server running at intersect.manywords.press, which is the default setting in OpenTafl. Selection of server is user-configurable, and anyone can run a server by using the --server command line option, and optionally --threads <#> to indicate how many worker threads the server should run.

In other news, OpenTafl now lives primarily at Bitbucket: you can find the link (and the link to the updated version) on the OpenTafl website. Bitbucket has a built-in bug tracker, so if you encounter any issues, file them over there1. The source will continue to be mirrored at the old Many Words hgweb instance, but those updates will lag by a day2.

Since I have a few extra days3 to write this, I may as well go into a little more detail on OpenTafl’s server architecture, for the curious.

For maximum scalability, the OpenTafl server does as much work as possible on a set of worker threads, which pull tasks from a priority-based task queue. The task queue has three separate internal queues: a high-priority, standard-priority, and low-priority queue. Tasks are preferentially executed from higher-priority queues, interleaving low-priority queues as necessary to prevent total starvation of the lower-priority queues4. When a client connects to the server, a client thread is created. This client thread has the sole task of taking incoming data and pushing it to a handle-communication task on the standard-priority queue. The handle-communication task in turn pushes tasks to the high or low priority queues.
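
For illustration, a stripped-down version of that queue might look like the following. The names, the anti-starvation policy (servicing a lower queue every eighth task), and the worker loop are all my inventions; as footnote 4 notes, the real implementation’s policy still needs work.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a three-level priority task queue with a pool of worker threads.
final class PriorityTaskQueue {
    private final Queue<Runnable> high = new ConcurrentLinkedQueue<>();
    private final Queue<Runnable> standard = new ConcurrentLinkedQueue<>();
    private final Queue<Runnable> low = new ConcurrentLinkedQueue<>();
    private final AtomicInteger taken = new AtomicInteger();

    void submitHigh(Runnable r)     { high.add(r); }
    void submitStandard(Runnable r) { standard.add(r); }
    void submitLow(Runnable r)      { low.add(r); }

    /** Prefer higher-priority queues, but every few tasks service a lower
     *  queue so the low-priority work is never starved entirely. */
    Runnable poll() {
        if (taken.incrementAndGet() % 8 == 0) {       // periodic anti-starvation pass
            Runnable r = low.poll();
            if (r == null) r = standard.poll();
            if (r != null) return r;
        }
        Runnable r = high.poll();
        if (r == null) r = standard.poll();
        if (r == null) r = low.poll();
        return r;   // may be null; the worker sleeps briefly and retries
    }

    /** Worker threads pull tasks in a loop, as the server's worker pool does. */
    static void startWorkers(PriorityTaskQueue queue, int threads) {
        for (int i = 0; i < threads; i++) {
            Thread worker = new Thread(() -> {
                while (true) {
                    Runnable task = queue.poll();
                    try {
                        if (task != null) task.run();
                        else Thread.sleep(10);
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            });
            worker.setDaemon(true);
            worker.start();
        }
    }
}
```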

On the server, clients can belong to no game, belong to a game but remain in the lobby, or belong to a game and be in the game UI. The server sends lobby updates (connected users, available games) to clients in the first two states at regular intervals. Clients in-game don’t get lobby updates, to save on bandwidth and the time spent sending them.

Finally, some player information is stored on the server: currently, usernames, hashed and salted passwords, and time of last login. The mechanism by which this data is stored is transparent to OpenTafl—hidden by an interface. Right now, the ‘database’ is a pipe-separated file. In the future, if I decide to go further in on the idea of server-side recordkeeping, it’ll be a fairly simple change to kick the file-database to the curb and swap in something like Hibernate+HSQLDB.
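
Something like the following interface is all it takes to make the storage mechanism transparent to the rest of the server; the names here are hypothetical, not OpenTafl’s actual classes.

```java
import java.util.Optional;

// Hypothetical sketch of the storage abstraction: the server talks only to
// this interface, so the pipe-separated file could later be swapped for
// something like Hibernate+HSQLDB without touching the callers.
interface PlayerStore {
    Optional<PlayerRecord> findByUsername(String username);
    void save(PlayerRecord record);
}

final class PlayerRecord {
    final String username;
    final String passwordHash;   // salted and hashed, never plaintext
    final long lastLoginMillis;

    PlayerRecord(String username, String passwordHash, long lastLoginMillis) {
        this.username = username;
        this.passwordHash = passwordHash;
        this.lastLoginMillis = lastLoginMillis;
    }
}
```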

That’s where we stand now. Obviously, it’s subject to change, but generally speaking, I’m happy with it. Keep your eyes open for spectator mode and headless AI play.

1. Over there, over there…
2. Or possibly longer, until I get the cronjob working right.
3. It’s a musical tafl post, evidently. You’d better believe that my wife and I always sing ‘Extra daaaaaay!’ whenever we have reason to say ‘extra day’ now.
4. Or so it is planned. My current implementation behaves like this under light loads, but inverts its behavior at the worst possible time: as the queues begin to get busy.

OpenTafl v0.2.1b released

OpenTafl v0.2.1b has been released here. It has almost full support for OpenTafl Engine Protocol, and the protocol specification is also near-final. (I’ll be finalizing the protocol definition in a release or two.)

The other feature on the docket for 0.2.x is a local engine-vs-engine tournament mode, which I (and, for that matter, you) can use to quantify strength gains from changes you make to your AI, by playing the current version against prior versions. I’ll provide more information on this feature as I develop it.

On tafl: time use strategy

I wrote last month on game clocks for tafl games, and settled on something like the go game clock: main time, plus replenishing overtimes. You can find my reasoning at that link, so I won’t go into it here, rather taking the opportunity to chat about how to use the time that game clock gives you.

In undertaking any previously un-undertaken task in AI development, one question usually has an instructive answer: “How does chess do it?” Unfortunately, this case is one of the ones where the answer is a little less instructive. A perfectly functional, if not altogether optimal, time usage strategy for a chess AI is to assume that, however many moves the game has taken so far, finishing it will take 20 to 40 more moves. The AI should therefore use between 1/20 and 1/40 of the time remaining. Simple, and surprisingly effective: a chess game gets less complex over time, as pieces come off of the board.

No such luck with tafl games, though. A tafl game starts at moderate complexity, then grows significantly in complexity into the middle game, before tailing off again as the action focuses on a corner and as captures are made1. So, for fixed-time games, we want to spend a moderate amount of time on the early game, the bulk of the time in the middle game, and a decreasing amount of time in the endgame, conserving it as long as possible. For fixed-time games, OpenTafl, in its current incarnation, assumes that it’ll want to make a certain number of opening game moves (3, 5, or 10, for board sizes 7, 9, and 11), a certain number of midgame moves (6, 10, or 20), and an indeterminate number of endgame moves after the midgame. It dedicates a fixed amount of time (at present, 5%, but likely to increase) to finish all of its opening moves, a much longer amount of time (presently 75%) to the midgame moves, and a constantly decreasing amount of time to its endgame moves, with the remaining 20%. This seems functional enough in play with me, but the numbers will obviously be tweaked once I finish automatic self-play. (A feature which will be in 0.2.x, I’m pretty sure, given how handy it is for AI developers.)
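
As a sketch (the percentages and move counts are the ones quoted above; the structure and names are mine), the fixed-time budgeting might look like this:

```java
// Sketch of the fixed-time budget described above: 5% of the clock spread over
// the opening moves, 75% over the midgame moves, and the remaining 20% doled
// out in ever-smaller slices for the endgame.
final class TimeAllocator {
    final int openingMoves;   // 3, 5, or 10 for 7x7, 9x9, 11x11
    final int midgameMoves;   // 6, 10, or 20

    TimeAllocator(int boardSize) {
        if (boardSize <= 7)      { openingMoves = 3;  midgameMoves = 6;  }
        else if (boardSize <= 9) { openingMoves = 5;  midgameMoves = 10; }
        else                     { openingMoves = 10; midgameMoves = 20; }
    }

    /** Time to spend on this move, given the total main time on the clock. */
    long millisForMove(int moveNumber, long totalMainTimeMillis, long endgameTimeLeftMillis) {
        if (moveNumber < openingMoves) {
            return (long) (0.05 * totalMainTimeMillis) / openingMoves;
        } else if (moveNumber < openingMoves + midgameMoves) {
            return (long) (0.75 * totalMainTimeMillis) / midgameMoves;
        } else {
            // Endgame: spend a constant fraction of whatever endgame time remains,
            // so each successive move gets a little less.
            return endgameTimeLeftMillis / 10;
        }
    }
}
```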

Four hundred words in, though, and I haven’t yet covered the topic I said I wanted to cover to begin with: how do we handle overtime timing? It turns out that it’s simpler than fixed time, even: take the number of midgame moves above, use the main time for them, and use overtimes for all remaining moves. This, too, may be non-ideal—I could see some benefit to using the fixed-time framework for early game and midgame, and falling back on the overtimes only for endgame moves2.

There are some extra features I hope to add at some point: a little bit of history awareness, so that, whenever OpenTafl sees a wild swing in expected value for the current state, it can dedicate extra time to a deeper search. Another useful addition would probably be some slightly more adaptive time usage generally, with the ability to understand when the game has moved from the early game to the midgame, so that it can adjust its thinking time based on the state of the board, rather than fixed numbers of moves. That said, it’s heartening to see that my system for main time plus overtime time usage is basically what AlphaGo used in its games against Lee Sedol3.

Finally, and a little tangentially, I want to talk about OpenTafl’s success against me on various time controls. Now, I am a poor to average tafl player at best, so you should not take this paragraph to mean that I think OpenTafl has much of a chance against anyone good. Keep that caveat in mind. OpenTafl generally loses to me, and loses pretty severely, on longer time controls: this is a result of its flawed evaluation function4, which, along with alpha-beta pruning, frequently gets rid of branches that are ultimately better than the one it chooses to explore most deeply. The matches are closer with blitz timing: a 60-second brandub game proved to be pretty intense, and I only won with six seconds left on the clock. The shorter time control plays to the computer’s strengths: exhaustive reading three or four plies out, and it gets much closer to me in skill (in part because I can’t read exhaustively in such a small amount of time).

Anyway. That’s your inside look at the most recent major OpenTafl feature. Stay tuned for more in the coming weeks.

1. I should note that by ‘complexity’ here I’m referring mainly to branching factor, the quantitative measure, mixed with a little bit of my sense for what proportion of moves are worth considering, a much more qualitative one, and, for that matter, a much more unreliable one.
2. Although it may strike you that 30 moves for an 11×11 game seems a bit light, consider that it’s actually 30 turns, or 60 moves total: for most 11×11 games, the game has either ended by then, or we’re well into the midgame or late game.
3. I got sleepy just thinking about being up that late. It is my sincere hope that when DeepMind announces its match against Ke Jie or whoever’s next, that they agree to play in London. Or maybe New York. Somewhere closer to Eastern Time, please.
4. A topic I intend to go into deeper detail on in a later post, when I’m working on some cleanup and improvement.