Tag Archives: gaming

OpenTafl v0.2.0: Engine support arrives

In experimental, incomplete fashion, but it’s sufficient for you, as a developer, to start working with OpenTafl in engine mode. You can find the new version at the OpenTafl site. The engine protocol document has been updated with the latest developments. Although I won’t be able to say for sure until I’ve finished the implementation, this version of the protocol is nearly feature-complete, and likely won’t change very much going forward.

Expect a post in the not-too-distant future on OpenTafl’s time usage planning and time-filling activities, both of which are topics of some interest.

AlphaGo-Lee Sedol Game 5 Liveblog

AlphaGo-Lee Sedol: 2-0

AlphaGo takes Lee Sedol in a second game, going up 2-0 and setting itself up for a match victory, if it wins any of the remaining games. Here’s the game replay, or you can go to DeepMind’s Youtube channel and watch the stream. The replay has decent commentary in the chat window, albeit from strong amateur players rather than pros. You can also find some remarks from An Younggil, GoGameGuru’s resident pro, in the comment thread.

This one played out differently; fights didn’t develop until later on in the game, and Lee Sedol did indeed play a more normal opening. He also took a lot more time to play today, a sign of greater caution and probably greater respect for AlphaGo. AlphaGo, however, got ahead and remained ahead on time, and Lee was into byo-yomi while AlphaGo still had fifteen minutes of main time left. At about move 210, Lee resigned, and seemed shaken by the loss.

Which is understandable. Go players and go experts had gotten used to the idea that their pursuit was computationally unassailable. (On your average desktop PC, it still is, for now.) It’s undoubtedly a bit of a shock to them that the tables have turned so quickly; I think we on the computational side are a little better able to recover, because reading about how AlphaGo plays makes us think, “Oh, yeah, that sounds like it should work really well.”

What does it mean for go, though? Well, let’s look at what happened to chess: it’s still played at high levels, human matches draw more eyes than machine matches, you can always find an opponent who is about as good as you are, and correspondence chess is basically dead.

Wait a minute. That last one sounds bad.

Something we discovered, as chess engines capable of beating the best humans became easily available, is that chess, under the rules we use, is a very draw-happy game. In fact, in correspondence chess, where the use of AI as an aid is accepted, draws are now overwhelmingly the most likely outcome: between 80% and 90% of games in the most recent few correspondence chess championships have been drawn. This is widely seen as bad: a tie is a deeply unsatisfying outcome, and it makes sorting players in a tournament a bit of a pain. Much ink, both real and digital, has been spilled on how to solve the draw problem in chess; if it should turn out (as it did for checkers) that optimal play in chess inevitably leads to a draw, something will probably have to be done, but no chess expert wants to commit to changing the rules in a way whose implications may not be known.

Go1 is a game whose ultimate scoring comes down to points. Playing first is known to be advantageous, so the rules traditionally spot white (the second player to move) 6.5 or 7.5 points, depending on the scoring method in use. The half-point ensures that there can be no ties, and should AI play reveal that the 6.5- or 7.5-point edge is larger or smaller than it ought to be, the handicap can simply be adjusted.

So, where will go be in ten years? Just as vital as today, and perhaps even more so: a hundred thousand people have watched AlphaGo play Lee Sedol on Youtube, and I can’t help but think that a lot of those are English-speakers who have not yet been really introduced to the game. Although I still find go inscrutable, and don’t expect I’ll ever be anything more than a poor recreational player, some of the fellow AI nerds watching will likely be drawn to the game, and once you have us nerds on your side…

1. The same is true of most proposed solutions for handling tafl matches: bidding on the number of turns a win will take, and playing two-game matches in which the players switch sides and the win goes to whoever wins faster (if they split the games), both yield a natural, built-in way to compensate for shifts in observed balance. Furthermore, since tafl rules are, as yet, not standardized in any meaningful way, differing tournament to tournament, tafl has an edge on chess that go does not: the tafl community is very open to rules changes to support greater balance or deeper strategy.

AlphaGo Defeats Lee Sedol

In what even I, who predicted an AlphaGo series win, regard as a bit of a surprise, AlphaGo defeated Lee Sedol in the first game of the five-game match. AlphaGo won by Lee’s resignation, which came 186 moves into the game, up by (the consensus seems to be) between three and seven points at the time.

Our liveblog has our detailed commentary from the first 120 or 130 moves of the game, after which I went to bed. Here, though, are some highlights and post-game thoughts.

  • AlphaGo is much stronger now than it was before. Michael Redmond, a 9-dan pro who was engaged to commentate, remarked as much: AlphaGo didn’t make any serious mistakes.
  • One of AlphaGo’s suspected weaknesses before the game was complicated fighting; we suspected as much because it avoided getting involved in many serious fights against Fan Hui. It turns out that AlphaGo can indeed fight, and fight well, as we saw with the fight at the top center of the board that developed early on in the running. AlphaGo came out ahead there, or at the very least ended up with a strong enough position that she1 could hold it and use it as a base to fight over the center of the board, while strengthening her position in the top left.
  • Lee Sedol seemed a little flustered by the level of AlphaGo’s play. He, like a lot of go experts, seemed to make the twin assumptions that: 1) AlphaGo would not improve dramatically between October and today, and 2) AlphaGo was not vastly stronger than Fan Hui in October. AI experts suspected that the first item was false, and they turned out to be correct. That said, it’s hard to fault go experts for lacking experience with AI systems, and not realizing that six months is an eternity for a learning system to improve. It is much easier to fault go experts for the second point, which is one of simple math. A 9-dan pro should beat a 2-dan pro (like Fan Hui) 95%-99% of the time. Although the known AlphaGo-Fan Hui games are too small a sample to draw firm conclusions from, it is telling that AlphaGo defeated Fan Hui by large margins or resignation in all of the long-time games, losing only during quick play; either way, its record suggests that AlphaGo is probably at least 5-dan, and may be up at the 9-dan level.
  • It’s possible that a lot of AlphaGo’s added strength came from the extra time to think, but it’s difficult to do much more than speculate on that front. Suffice it to say that this could be a contributing factor.
  • One thing we haven’t seen very much of is AlphaGo’s ability in close endgames. Given that endgames are basically tests of reading, and AlphaGo has proven to be superb at that so far, I wouldn’t expect its endgames to be any worse than its middle games, but Lee Sedol may nevertheless aim to explore AlphaGo’s endgame capabilities with a more relaxed early game, next time out.
  • One of my prematch predictions seems to have turned out true: computers play differently than humans do, and it may be that computers play more optimally than humans. In the late midgame, around move 120 to 150, in the bottom right and center right of the board, AlphaGo played very cautiously, keeping stones alive (particularly in the bottom right) and denying Lee Sedol territory, without aiming to decisively win the corner itself. Go experts would denounce this sort of play as ‘slow’, given that it seems to yield tempo, but if AlphaGo is confident that she’s already ahead, she need not play to do any more than deny Lee the win.

That’s all for now. We’ll be back with more at the end of the weekend, or after any particularly interesting games, and we might liveblog the final game in the series on Monday night. If so, you’ll hear about it here.

AlphaGo vs. Lee Sedol Game 1 Liveblog

OpenTafl v0.1.9b: the end of the line for 0.1.x

OpenTafl v0.1.9b has been released! This release adds support for a game clock. The AI is able to plan its time usage in accordance with the amount of time it has left. Additionally, settings are now saved from run to run.

This marks the last major release in the OpenTafl v0.1 series. The next major release will be 0.2.0b, including engine mode and related features, work on which will start at the end of March or the beginning of April. Looking ahead to the tournament, I may switch around the tasks in v0.3 and v0.4 from the roadmap I put up a few posts ago—network play lets me set up each AI in its own Amazon EC2 instance and have them play that way, which simplifies the rules by letting AIs think on their opponent’s turn, and removes the need for me to figure out how to limit an arbitrary process’s resource use (although it does add some complexity in building a tournament framework).

Now that it’s feature-complete, v0.1.9b will remain the stable version until v0.2.x is ready to go. What this means for repository structure is that I’ll create a new v0.2.x branch, and that’ll be the head of the tree for dev purposes. Any bugs discovered in v0.1.9b will get patched on master and pushed into 0.2.x, if it makes sense to do so. When 0.2.x is done, it’ll get merged into master, and I’ll start a new 0.3.x branch, and so on until perhaps, someday, a 1.0.

On tafl: timing

In a previous tafl post, I remarked on the lack of pre-defined notations and mathematical resources for tafl games. In this one, I will remark on a more fundamental deficit: namely, as far as I can find, this is the first article on the Internet to attempt to define a standard for timed tafl games.

Surprising? Well, yes and no. Competitive tafl games are rarely played in person. The only live tournament (in the modern era, anyway) put on to date was run by the Fetlar Hnefatafl Panel, and as tournaments go, it was a fairly casual affair—the games were untimed, with a gentleman’s agreement not to take an undue amount of time per move. The other major tafl tournaments are run at Aage Nielsen’s site, and are correspondence play: timed, but not in the sense we’re looking for here.

The Fetlar panel suggests using a regular chess clock, but that seems unappealing to me. Broadly speaking, tafl games are defined by two stages: an opening, where the two sides jockey for position and control in the corners, and an endgame, where the king has left his throne and is pushing to one corner. The length of the opening doesn’t vary much for a particular board size, and generally speaking, features thoughtful, cautious play. Once the king leaves the throne to start the endgame, the flavor of play changes dramatically. The king’s side must make bold, sweeping moves, and the besieging side must split its attention between putting out fires and shoring up its leaky positions. What was a game of edging around the opponent and building one’s own structure becomes a head-on conflict.

The similarity to battle is striking, and suggests a method of timing that mirrors the structure of the game. Go players among the readership will be familiar with the concept of byo-yomi, a kind of overtime comprising a number of periods of a given length. If you don’t use all of one of your byo-yomi periods, you get it back in full. If you do use it, you move on to your next one. I suggest a tafl timing scheme using a small amount of main time and a relatively generous overtime allocation. My thinking is this: the main time ought to be used for the opening, and once the endgame begins, the players should be nearly out and into overtime. I don’t have good information on how tafl games play out yet, but as a rough starting point, I would suggest a main time of about ten minutes, and three to five overtime periods of one minute each. More generally, I propose this mutation from a single main-time chess clock: for a given chess clock length in minutes (say, 60), divide by six to get the tafl main time (10, in this case). Divide that by ten to get the length of the overtime periods. Depending on the desired difficulty and clock sensitivity, set the number of overtimes to somewhere between two and six.
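
To make the arithmetic and the byo-yomi bookkeeping concrete, here is a minimal sketch in Java. The class and method names are mine, for illustration only; this is not OpenTafl’s actual clock code.

```java
// Illustrative only: a toy clock following the proposed scheme.
// Names and structure are mine, not OpenTafl's actual implementation.
public class TaflClock {
    private long mainTimeMillis;        // remaining main time
    private final long overtimeMillis;  // length of one byo-yomi period
    private int overtimesRemaining;     // byo-yomi periods left

    public TaflClock(long mainTimeMillis, long overtimeMillis, int overtimeCount) {
        this.mainTimeMillis = mainTimeMillis;
        this.overtimeMillis = overtimeMillis;
        this.overtimesRemaining = overtimeCount;
    }

    /** Derive a tafl clock from a single-main-time chess clock, per the rule of thumb above. */
    public static TaflClock fromChessClock(int chessMinutes, int overtimeCount) {
        long mainMillis = (chessMinutes / 6) * 60_000L;  // 60 minutes of chess -> 10 minutes of main time
        long overtime = mainMillis / 10;                 // 10 minutes of main time -> 1-minute periods
        return new TaflClock(mainMillis, overtime, overtimeCount);
    }

    /**
     * Charge one move's thinking time to the clock. Main time is spent first; after that,
     * a byo-yomi period is forfeited only if the move uses it up entirely, and a partially
     * used period is returned in full. Returns false if the player's flag falls.
     */
    public boolean chargeMove(long elapsedMillis) {
        long fromMain = Math.min(mainTimeMillis, elapsedMillis);
        mainTimeMillis -= fromMain;
        long remaining = elapsedMillis - fromMain;

        while (remaining > 0) {
            if (overtimesRemaining == 0) return false;  // out of overtime: flag fall
            if (remaining < overtimeMillis) break;       // period survives, returned in full
            remaining -= overtimeMillis;                 // period used up entirely: forfeit it
            overtimesRemaining--;
        }
        return true;
    }
}
```

Under this sketch, TaflClock.fromChessClock(60, 3) yields the ten-minute main time and three one-minute overtime periods suggested above.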

Not only does this mirror the calm-to-intense structure of tafl games, it also serves a practical purpose. Although tafl games and chess appear to have similar distributions of game lengths1, I get the feeling that the most competitive chess games are clustered around the average more than the most competitive tafl games2 are. An adaptive time control like byo-yomi frees endgames from the tyranny of the clock. Some games will end up playing out more quickly, and won’t feel clock pressure at all; I don’t feel that this is an indictment of the scheme. Some games will end up playing out at around the average length, where the clock is set perfectly: players will feel some time pressure, but have sufficient main time to lay their plans for the endgame. Some games, finally, will end up running longer. With an ordinary chess clock, a longer game would end with increasingly desperate time pressure. With byo-yomi time, the clock remains a driving force, but doesn’t get much worse with time.

Although I do propose this system for general use, I plan on different time controls for the AI-vs-AI tournament: no main time, and three ten-second overtime periods. Quick-play tournaments are known to be as effective as long-play tournaments in sorting AIs by relative skill level, and a large number of five-minute games is a much easier task for yours truly than a large number of thirty-minute games.

One final piece of news: you can find v0.1.7.1b here, which fixes a few bugs in 0.1.7 and adds threefold repetition rules.

  1. The standard deviation of game length (in moves) for chess comes in at about 25, based on correspondence games listed here, some eyeballing of graphs, and some math on histogram values; 11×11 tafl games, taken in aggregate from Aage Nielsen’s tafl tournament history, come in at about 23. Neither of these figures is set in stone, for reasons I’ll go into when I write up my findings.
  2. Aage Nielsen’s tournament history, above, has a large number of games between top players in the vicinity of 40 moves, and a nearly equal number at 70 to 90 moves.

OpenTafl v0.1.6.1b, and progress generally

I’ve released a new version of OpenTafl. Among several other changes, there are two I would like to highlight.

First off, the groundwork for OpenTafl Notation-style engines is complete: there is code in OpenTafl which can produce and read in OTN rules records, code which can produce and read in OTN position records, and code which can produce OTN move records. This is a major milestone: producing and reading position and rules records is tricky, as is producing move records. (Move records should be fairly easy to read in, compared to the others; they encode a much smaller amount of critical data.) Note that the notation spec has changed somewhat, based on lessons I’ve learned over the course of my weekend of implementation. If you’ve done any work based on it yourself, you’ll want to go over the spec again. Hopefully, it won’t change much going forward, although the specification for OpenTafl-compatible engines is likely to change as I work on implementing that in the next month or two. You can see some of the work in version 0.1.6.1b: each position should provide its OTN position record, and games against the AI will report the AI’s thinking using OTN move records.
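
To give a flavor of what serializing a position involves, here is a toy, FEN-style run-length encoding of a board in Java. To be clear, this is not the OTN position record syntax: the real format is defined in the OTN spec, and the piece letters and structure here are invented for illustration.

```java
// Purely illustrative: a toy row-by-row, run-length board encoding, to show the shape
// of the problem. The real OTN position record syntax differs from this sketch.
public class ToyPositionRecord {
    // Empty squares are 0; any non-digit character stands for a piece.
    // Both the letters and the overall syntax here are made up for this sketch.
    public static String encode(char[][] board) {
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < board.length; r++) {
            int empties = 0;
            for (char c : board[r]) {
                if (c == 0) { empties++; continue; }
                if (empties > 0) { sb.append(empties); empties = 0; }
                sb.append(c);
            }
            if (empties > 0) sb.append(empties);
            if (r < board.length - 1) sb.append('/');
        }
        return sb.toString();
    }

    public static char[][] decode(String record, int dimension) {
        char[][] board = new char[dimension][dimension];
        String[] rows = record.split("/");
        for (int r = 0; r < dimension; r++) {
            int col = 0;
            int i = 0;
            while (i < rows[r].length()) {
                char ch = rows[r].charAt(i);
                if (Character.isDigit(ch)) {
                    // Consume a full (possibly multi-digit) run of empty squares.
                    int j = i;
                    while (j < rows[r].length() && Character.isDigit(rows[r].charAt(j))) j++;
                    col += Integer.parseInt(rows[r].substring(i, j));
                    i = j;
                } else {
                    board[r][col++] = ch;
                    i++;
                }
            }
        }
        return board;
    }
}
```

Even in this simplified form, the fiddly bits show up: multi-digit runs of empty squares, keeping rows in sync with the board dimension, and making sure encode and decode round-trip exactly.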

Second, OpenTafl’s memory usage has once again been dramatically reduced, to the point where it should be comfortably playable to the maximum depth at which it’s feasibly playable (currently depth 6-7 for brandub, depending on speed, and depth 4-5 for other games, depending on size and complexity) without running into memory issues, on any reasonable computer. I intend to go into a little more depth on how I accomplished this, but for the moment, you’ll just have to take my word for it.

You can find the full change logs at the OpenTafl site.

Stop the presses! Go AI defeats human professional!

A few days ago, Google’s DeepMind announced that it had reached the most significant milestone in pure AI since the solving of checkers in 2007: a Go AI called AlphaGo that’s competitive with human professionals. (Strictly speaking, computers could play Go before this, but at their best they were ranked around mid-level amateur.) The whole paper is available here.

For those of you who aren’t as much nerds about AI as I am, here’s a quick primer on why this was thought to be a very hard problem (so hard that the people involved in the prior state of the art thought it was at least a decade away):

In the most theoretical conception, game-playing computers for perfect-information zero-sum games (most abstract board games: those with no hidden state, where one player’s gain is the other player’s loss; not entirely accurate, but more readable than perfect accuracy allows for) are as simple as exploring every possible move and every possible countermove from the current position to the end of the game. Assuming perfect play on both sides, every result will be either a win, a loss, or a draw; that is, the outcome of an abstract strategy game under perfect play is completely determined. (Checkers isn’t completely solved, but we do know now, as a result of the work from 2007, that perfect play on both sides from the start always yields a draw.)

This is, however, a massively impractical way to actually play a game, because the number of positions to explore rapidly turns intractable. Speed-focused modern chess engines search on the order of millions of nodes (in the game tree) per second, but searching a chess position exhaustively to a depth of 7 requires a search of some 64 billion nodes (35^7, at roughly 35 legal moves per position). Traditional games like chess and checkers yielded to some optimizations on this process:

  • It’s easy to look at a chess or checkers board and tell how well the game is going for a certain player: in both games, it comes down primarily to the balance of pieces. (The last I read, advantages in position are worth a pawn or two in a game of chess if they’re all going your way; the queen is worth nine pawns.) So, you don’t need to explore all the way to the bottom of the game tree to get an idea of which directions are promising. You just explore as deep as you can and evaluate the position.
  • If you have a good evaluation function (that is, one that generally rates a position as better when it’s closer to winning), you can do some easy pruning: when you discover that a line can be refuted to something worse than an option you’ve already secured elsewhere, you just don’t explore it any further. It works even better if you can guess which moves are likely to be good and explore those first: if you try the best move first, every subsequent move turns out to be worse, and you save a bunch of time. (Both ideas are sketched in code just after this list.)
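
Here is that sketch: a bare-bones, depth-limited negamax search with alpha-beta cutoffs in Java. The GameState interface is a placeholder I’ve invented for illustration; it is not OpenTafl’s API or any real engine’s, and real engines add move ordering, transposition tables, and much more on top of this.

```java
import java.util.List;

// Illustrative only: depth-limited negamax with alpha-beta cutoffs.
// GameState is a made-up placeholder, not any real engine's API.
interface GameState {
    List<GameState> successors();  // all positions reachable in one move
    boolean isTerminal();
    double evaluate();             // heuristic score from the side to move's point of view
}

public class AlphaBeta {
    public static double search(GameState state, int depth, double alpha, double beta) {
        // At the depth limit (or the end of the game), fall back on the evaluation function
        // instead of searching all the way to the bottom of the tree.
        if (depth == 0 || state.isTerminal()) return state.evaluate();

        double best = Double.NEGATIVE_INFINITY;
        for (GameState child : state.successors()) {
            // Negamax convention: a score of +x for the opponent is -x for us.
            double score = -search(child, depth - 1, -beta, -alpha);
            best = Math.max(best, score);
            alpha = Math.max(alpha, score);
            if (alpha >= beta) break;  // cutoff: the opponent already has a better option elsewhere
        }
        return best;
    }
}
```

A search starts with search(root, maxDepth, Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY). The move-ordering point from the list shows up in the cutoff line: the sooner a strong move drives alpha up to beta, the more of the remaining moves get skipped.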

So chess computers today, working with those tools, are better than the best human players: the effective branching factor (the number of moves actually explored at each state in the tree), using the pruning techniques above, drops from about 35 to somewhere between 1 and 10, depending on the exact techniques used. The reason Go didn’t fall to traditional techniques is that it’s just so much more computationally difficult. Chess’s branching factor (the number of possible moves at each state) is about 35, and games last about 80 moves; Go’s branching factor is about 250 on average, and games run about 150 moves. It also features a few difficulties that chess does not:

  • It’s a very non-local game, both in space and in time: a move made at the very start of the game, halfway across the board from the current action, can have implications fifty turns later for the strength of the positions built at the start. This is a horizon problem: in chess, most positions become quiescent after the captures stop, meaning that searching deeper is unlikely to change how they affect the overall evaluation. Modern chess engines will play all the way through capture sequences for this reason (see the sketch after this list); there’s no similar stopping criterion for go engines.
  • It’s very difficult to evaluate a board position on purely objective grounds, or rather, we haven’t figured out how to phrase, mathematically, what about a good go position is good. Neither present control of territory nor present number of captures bears very strongly on the eventual result.
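
To make the quiescence idea from the first bullet concrete, here is a minimal sketch of a chess-style quiescence search: at the depth limit, keep searching capture moves only, so the evaluation is never taken in the middle of an exchange. As before, the interface is a placeholder of my own, not any real engine’s API.

```java
import java.util.List;

// Illustrative only. TacticalState is a made-up placeholder, not a real engine's API.
interface TacticalState {
    List<TacticalState> captureSuccessors();  // positions reachable by a capture
    double evaluate();                        // heuristic score for the side to move
}

public class Quiescence {
    public static double search(TacticalState state, double alpha, double beta) {
        // "Stand pat": the side to move can always decline further captures,
        // so the static evaluation is a lower bound on this position's value.
        double best = state.evaluate();
        if (best >= beta) return best;
        alpha = Math.max(alpha, best);

        for (TacticalState child : state.captureSuccessors()) {
            double score = -search(child, -beta, -alpha);  // negamax, as in the earlier sketch
            best = Math.max(best, score);
            alpha = Math.max(alpha, score);
            if (alpha >= beta) break;  // the opponent has a better option elsewhere
        }
        return best;
    }
}
```

The reason this works for chess and not for go is exactly the point above: in chess, “no more captures” is a serviceable definition of a quiet position, and go has no equally cheap test.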

Because of the size of the problem space for Go, traditional techniques don’t work. The branching factor remains too high. Modern Go programs use one (or sometimes both) of two approaches: either they use hand-coded expert knowledge to sort and select promising moves for tree expansion (which frequently misses oddball moves that are nevertheless good), or they randomly play out a bunch of games from the current position to the end, and sample the result to pick the best move on aggregate (which frequently misses obviously good moves). The best of the pre-AlphaGo bunch used both, combining expert knowledge to pick the best moves to sample with the oddball-finding power of random simulation.
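
The random-playout half of that combination is easy to sketch. Here is the flat Monte Carlo version of the idea in Java (not full Monte Carlo tree search, and certainly not any real program’s code; the interface is a made-up placeholder): estimate each candidate move by the average result of random games that begin with it, and pick the move with the best average.

```java
import java.util.List;
import java.util.Random;

// Illustrative only: flat Monte Carlo move selection with a made-up interface.
interface PlayoutState {
    List<PlayoutState> successors();  // all positions reachable in one move
    boolean isTerminal();
    double resultForRootPlayer();     // 1.0 win, 0.5 draw, 0.0 loss for the player choosing at the root
}

public class FlatMonteCarlo {
    private static final Random RNG = new Random();

    /** Pick the root move whose random playouts score best on average. */
    public static PlayoutState chooseMove(PlayoutState root, int playoutsPerMove) {
        PlayoutState bestMove = null;
        double bestAverage = Double.NEGATIVE_INFINITY;
        for (PlayoutState move : root.successors()) {
            double total = 0;
            for (int i = 0; i < playoutsPerMove; i++) {
                total += playout(move);
            }
            double average = total / playoutsPerMove;
            if (average > bestAverage) {
                bestAverage = average;
                bestMove = move;
            }
        }
        return bestMove;
    }

    // Play uniformly random moves to the end of the game and score the final position.
    private static double playout(PlayoutState state) {
        while (!state.isTerminal()) {
            List<PlayoutState> moves = state.successors();
            state = moves.get(RNG.nextInt(moves.size()));
        }
        return state.resultForRootPlayer();
    }
}
```

The expert-knowledge half of the combination plugs in exactly where this sketch chooses uniformly at random: a move-prediction model can both narrow which candidate moves get playouts at all and bias the moves chosen inside the playouts themselves.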

AlphaGo does a little bit of that, but at its heart, it learned to play a lot like humans do: DeepMind’s researchers fed it a diet of 30 million sample positions and the eventual results, and built a neural network to identify what a good board position looks like and what a bad one looks like. (Literally—the input into the program is a 19×19 image, with pixels set to values representing empty, black stone, or white stone.) They built a second neural network to identify which states are the best ones to simulate through in a random simulation, and Bob’s your uncle: a professional-level Go program. Their paper suspects it’s about as good as a mid-level human professional—the exhibition match they included in the paper saw AlphaGo beat the European human champ five games to zero, four by resignation, but the Euro champ isn’t a top-tier player worldwide. February will see an exhibition match between AlphaGo and the South Korean champion, a world-class player; we’ll see how it does against him. AlphaGo also won 499 out of 500 games against the prior state-of-the-art Go programs.

The most interesting thing about this development is that it learned to play a lot like a human would—it studied past games and figured out from that study what was good play and what wasn’t, then played a lot (against itself and past versions of itself), which is roughly analogous to playing a lot and working on go problems (the way human go players are supposed to get better). The obstacle to a general game-playing AI (I could buy a copy of Arkham Horror, finally, without having to rope the wife into an eight-hour marathon of doom!) is that training neural networks is currently pretty slow. As I mentioned in the first post, AlphaGo had to study about thirty million positions and play itself many, many times to get to the level of competence it’s at now; presumably, that will improve as DeepMind hones its learning techniques and discovers new and faster ways.

That said, humans still have one thing to be proud of: efficiency. The version of AlphaGo that beat the European champ ran on about 1200 CPUs and about 200 GPUs, whereas the human brain, which is nearly as good, draws about 20 watts.

Some extra reading for you: GoGameGuru has all the game replays, which are interesting if you’re familiar with Go, because AlphaGo has a style that looks very human. (It doesn’t take much familiarity; if you’ve seen a human game or five and a game or five against a computer, you’ve probably already noticed the differences in play, if only subconsciously.) Apparently, Go professionals expect the Korean champ to beat AlphaGo, and another source has the game taking place in March. If I were the champ, I’d be pushing for, like, next week. I don’t know how long the million-dollar prize is gonna be good for.

Here’s a commentary on one of AlphaGo’s games against the European champ.

I originally wrote this analysis for the people over at the Quarter to Three forum, which explains the breezy, off-the-cuff style. If you’ve been following us for a while, some of the introduction will have seemed repetitive, too. Anyway, I hope it was still informative and interesting. -Fishbreath