A few days ago, Google’s DeepMind announced that they reached the most significant milestone in pure AI since the solving of checkers in 2007: a Go AI called AlphaGo that’s competitive with human professionals. (Strictly speaking, computers could play Go before this announcement, but at their best they were ranked around mid-level amateur.) The whole paper is available here.
For those of you who aren’t as nerdy about AI as I am, here’s a quick primer on why this was thought to be a very hard problem (so hard that the people involved in the prior state of the art thought it was at least a decade away):
In the most theoretical conception, playing a perfect-information zero-sum game (most abstract board games: those with no hidden state, in which anything good for one player is bad for the other; not entirely accurate, but more readable than perfect accuracy allows for) is as simple as exploring every possible move and every possible countermove from the current position to the end of the game. Assuming perfect play on both sides, every position has a definite result: a win, a loss, or a draw. That is, abstract strategy games are perfectly determined. (Checkers isn’t completely solved, but we do know now, as a result of the work from 2007, that perfect play on both sides from the start always yields a draw.)
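Here’s a minimal sketch of that exhaustive search in Python, in negamax form. The `Game` interface (`is_over`, `result`, `legal_moves`, `play`) is a hypothetical placeholder for whatever board representation you’d actually use, not anything from the AlphaGo paper.

```python
# A minimal sketch of exhaustive game-tree search (negamax form).
# The Game interface here (is_over, result, legal_moves, play) is a
# hypothetical placeholder, not anything from the AlphaGo paper.

def solve(state):
    """Value of `state` for the player to move, under perfect play: +1 win, 0 draw, -1 loss."""
    if state.is_over():
        return state.result()  # terminal value from the mover's point of view
    best = -1
    for move in state.legal_moves():
        # Whatever the position is worth to the opponent, it's worth the negative to us.
        best = max(best, -solve(state.play(move)))
    return best
```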
This is, however, a massively impractical way to actually play a game, because the number of positions to explore rapidly turns intractable. Speed-focused modern chess engines search on the order of millions of nodes (in the game tree) per second, but searching a chess position exhaustively to a depth of 7 requires a search of about 60 billion nodes. Traditional games like chess and checkers yielded to some optimizations on this process:
- It’s easy to look at a chess or checkers board and tell how well the game is going for a certain player: in both games, it comes down primarily to the balance of pieces. (The last I read, advantages in position are worth a pawn or two in a game of chess if they’re all going your way; the queen is worth nine pawns.) So, you don’t need to explore all the way to the bottom of the game tree to get an idea of which directions are promising. You just explore as deep as you can and evaluate the position.
- If you have a good evaluation function (that is, one that generally only rates a position as better when it’s closer to winning), you can do some easy pruning: when you can prove that a branch can’t beat the best option you’ve already found, you just don’t explore any further in that direction. It works even better if you can guess which moves are likely to be good and explore those first: if you try the best move first, every subsequent move will turn out to be worse, the pruning kicks in sooner, and you save a bunch of time. (This is the alpha-beta family of techniques, sketched below.)
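Here’s a rough sketch of those two ideas together: a depth-limited search with a crude material-count evaluation, plus alpha-beta cutoffs and move ordering. Again, the `Game` interface, the `guess_quality` heuristic, and the piece values are illustrative placeholders, not any real engine’s internals.

```python
# Sketch: depth-limited negamax with a material-only evaluation and
# alpha-beta pruning. The Game interface, guess_quality heuristic, and
# piece values are illustrative placeholders.

PIECE_VALUES = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def evaluate(state):
    """Material balance from the perspective of the player to move."""
    score = 0
    for piece in state.pieces():
        value = PIECE_VALUES[piece.kind]
        score += value if piece.owner == state.to_move() else -value
    return score

def search(state, depth, alpha=float("-inf"), beta=float("inf")):
    if depth == 0 or state.is_over():
        return evaluate(state)
    # Move ordering: trying the likely-best moves first makes the cutoff below fire sooner.
    for move in sorted(state.legal_moves(), key=state.guess_quality, reverse=True):
        alpha = max(alpha, -search(state.play(move), depth - 1, -beta, -alpha))
        if alpha >= beta:
            # The opponent already has a line at least this good elsewhere, so they'll
            # never let the game reach this position; stop exploring it.
            break
    return alpha
```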
So chess computers today, working with those tools, are better than the best human players: the effective branching factor (the number of moves that actually have to be explored at each node of the tree) drops, using the pruning techniques above, from about 35 to somewhere between 1 and 10, depending on the exact techniques used. The reason Go didn’t fall to traditional techniques is that it’s just so much more computationally difficult. Chess’s branching factor (the number of possible moves at each state) is about 35, and games last about 80 moves; Go’s branching factor is about 250 on average, and games run about 150 moves. (Some back-of-the-envelope numbers follow the list below.) Go also features a few difficulties that chess does not:
- It’s a very non-local game, both in space and in time: a move made at the very start of the game can affect the strength of positions halfway across the board fifty turns later. This creates a horizon problem. In chess, most positions become quiescent once the captures stop (that is, further play is unlikely to change the evaluation much), and modern chess engines play all the way through capture sequences for exactly this reason; there’s no similar signal for Go engines to use.
- It’s very difficult to evaluate a board position on purely objective grounds, or rather, we haven’t figured out how to state mathematically what makes a good Go position good. Neither present control of territory nor present number of captures bears very strongly on the eventual result.
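To put those branching-factor figures in perspective, here’s a back-of-the-envelope comparison of game-tree sizes using the rough numbers above. These are order-of-magnitude illustrations only; both the branching factor and the game length vary a lot over the course of a real game.

```python
# Back-of-the-envelope game-tree sizes from the rough figures above.
# Order-of-magnitude illustrations only: real branching factors vary a lot.

from math import log10

def tree_exponent(branching_factor, game_length):
    """log10 of the rough number of leaf positions, branching_factor ** game_length."""
    return game_length * log10(branching_factor)

print(f"chess: ~10^{tree_exponent(35, 80):.0f} positions")     # ~10^124
print(f"go:    ~10^{tree_exponent(250, 150):.0f} positions")   # ~10^360
print(f"chess, searched exhaustively to depth 7: {35 ** 7:,} nodes")  # ~64 billion
```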
Because of the size of the problem space for Go, traditional techniques don’t work: the branching factor remains too high. Modern Go programs use one (or sometimes both) of two approaches: either they use hand-coded expert knowledge to sort and select promising moves for tree expansion (which frequently misses oddball moves that are nevertheless good), or they randomly play out a bunch of games from the current position to the end and pick the move that does best on aggregate (which frequently misses obviously good moves). The best of the pre-AlphaGo bunch used both, combining expert knowledge to pick the best moves to sample with the oddball-finding power of random simulation.
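Here’s a minimal sketch of the random-playout half of that recipe: flat Monte Carlo move selection, without the tree search and expert knowledge real engines layer on top. The `Game` interface is, again, a hypothetical placeholder.

```python
# Sketch of flat Monte Carlo move selection: play a batch of random games
# after each candidate move and keep the move with the best average result.
# Real engines layer tree search and expert knowledge on top of this idea;
# the Game interface is a hypothetical placeholder.

import random

def random_playout(state):
    """Play random legal moves to the end; return 1 if the player to move at `state` wins."""
    mover = state.to_move()
    while not state.is_over():
        state = state.play(random.choice(state.legal_moves()))
    return 1 if state.winner() == mover else 0

def choose_move(state, playouts_per_move=100):
    best_move, best_score = None, -1.0
    for move in state.legal_moves():
        after = state.play(move)
        # random_playout scores from the opponent's perspective after our move, so flip it.
        wins = sum(1 - random_playout(after) for _ in range(playouts_per_move))
        if wins / playouts_per_move > best_score:
            best_move, best_score = move, wins / playouts_per_move
    return best_move
```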
AlphaGo does a little bit of that, but at its heart, it learned to play a lot like humans do: DeepMind’s researchers fed it a diet of 30 million sample positions and the eventual results, and built a neural network to identify what a good board looks like and what a bad one looks like. (Literally: the input to the program is a 19×19 image, with pixels set to values representing empty, black stone, or white stone.) They built a second neural network to identify which positions are the best ones to simulate through in the random playouts, and Bob’s your uncle: a professional-level Go program. Their paper suspects it’s about as good as a mid-level human professional; the exhibition match included in the paper saw AlphaGo beat the European human champion five games to zero, four by resignation, but the Euro champ isn’t a top-tier player worldwide. February will see an exhibition match between AlphaGo and the South Korean champion, a world-class player; we’ll see how it does against him. AlphaGo also won 499 out of 500 games against the prior state-of-the-art Go computers.
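As a toy illustration of the board-as-image idea (and nothing like AlphaGo’s actual input, which uses many more feature planes), here’s one way you might encode a 19×19 position as numeric planes for a network to consume; the constants and layout are made up for the example.

```python
# Toy illustration of the "board as image" encoding: a 19x19 grid turned into
# one 0/1 plane per stone color plus one for empty points. AlphaGo's actual
# input uses many more feature planes; this just shows the shape of the idea.

import numpy as np

EMPTY, BLACK, WHITE = 0, 1, 2

def encode_board(board):
    """board: 19x19 array of EMPTY/BLACK/WHITE -> 3x19x19 array of 0/1 feature planes."""
    board = np.asarray(board)
    return np.stack([
        board == BLACK,   # plane 0: black stones
        board == WHITE,   # plane 1: white stones
        board == EMPTY,   # plane 2: empty points
    ]).astype(np.float32)

# A policy network maps this image to a probability for each of the 19 * 19 = 361
# points (which move looks best); a value network maps it to a single number
# (how likely the player to move is to win).
```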
The most interesting thing about this development is that it learned to play a lot like a human would: it studied past games and figured out from that study what was good play and what wasn’t, then played a lot (against itself and past versions of itself), which is roughly analogous to playing a lot and working on Go problems (the way human Go players are supposed to get better). The obstacle to a general game-playing AI (I could buy a copy of Arkham Horror, finally, without having to rope the wife into an eight-hour marathon of doom!) is that training neural networks is currently pretty slow. As I mentioned in the first post, AlphaGo had to study about thirty million positions and play itself many, many times to get to the level of competence it’s at now; presumably, that will improve as DeepMind hones its learning techniques and discovers new and faster ones.
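For the curious, that study-the-humans-then-play-yourself loop looks roughly like this in outline. Every name here (`network`, `train_on`, `play_game`) is a made-up placeholder, not DeepMind’s actual training pipeline.

```python
# Rough outline of the study-the-humans-then-play-yourself loop described above.
# Every name here (network, train_on, play_game) is a made-up placeholder,
# not DeepMind's actual training pipeline.

import random

def improve(network, human_games, rounds=10, games_per_round=1000):
    # Step 1: supervised learning from recorded human games.
    network.train_on(human_games)
    # Step 2: reinforcement by self-play against earlier versions of itself.
    older_selves = [network.copy()]
    for _ in range(rounds):
        games = [play_game(network, random.choice(older_selves))
                 for _ in range(games_per_round)]
        network.train_on(games)
        older_selves.append(network.copy())
    return network
```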
That said, humans still have one thing to be proud of: efficiency. The version of AlphaGo that beat the European champ ran on about 1200 CPUs and about 200 GPUs, whereas the human brain, which is nearly as good, draws about 20 watts.
Some extra reading for you: GoGameGuru has all the game replays, which are interesting if you’re familiar with Go (it doesn’t take much; if you’ve seen a human game or five and a game-against-computer or five, you’ve probably already noticed the differences in play, if only subconsciously). AlphaGo has a style that looks very human. Apparently, Go professionals expect the Korean champ to beat AlphaGo, and another source has the game taking place in March. If I were the champ, I’d be pushing for, like, next week. I don’t know how long the million-dollar prize is gonna be good for.
Here’s a commentary on one of AlphaGo’s games against the European champ.
I originally wrote this analysis for the people over at the Quarter to Three forum, which explains the breezy, off-the-cuff style. If you’ve been following us for a while, some of the introduction will have seemed repetitive, too. Anyway, I hope it was still informative and interesting. -Fishbreath