Tag Archives: coding

OpenTafl v0.2.0: Engine support arrives

In experimental, incomplete fashion, but it’s sufficient for you, as a developer, to start working with OpenTafl in engine mode. You can find the new version at the OpenTafl site. The engine protocol document has been updated with the latest developments. Although I won’t be able to say for sure until I’ve finished the implementation, this version of the protocol is nearly feature-complete, and likely won’t change very much going forward.

Expect a post in the not-too-distant future on OpenTafl’s time usage planning and time-filling activities, both of which are topics of some interest.

AlphaGo-Lee Sedol Game 5 Liveblog

AlphaGo-Lee Sedol: 2-0

AlphaGo takes Lee Sedol in a second game, going up 2-0 and setting itself up for a match victory, if it wins any of the remaining games. Here’s the game replay, or you can go to DeepMind’s Youtube channel and watch the stream. The replay has decent commentary in the chat window, albeit from strong amateur players rather than pros. You can also find some remarks from An Younggil, GoGameGuru’s resident pro, in the comment thread.

This one played out differently; fights didn’t develop until later on in the game, and Lee Sedol did indeed play a more normal opening. He also took a lot more time to play today, a sign of greater caution and probably greater respect for AlphaGo. AlphaGo, however, got ahead and remained ahead on time, and Lee was into byo-yomi while AlphaGo still had fifteen minutes of main time left. At about move 210, Lee resigned, and seemed shaken by the loss.

Which is understandable. Go players and go experts had gotten used to the idea that their pursuit was computationally unassailable. (On your average desktop PC, it still is, for now.) It’s undoubtedly a bit of a shock to them that the tables have turned so quickly; I think we from the computational side are a little better able to recover, because reading about how AlphaGo plays makes us think, “Oh, yeah, that sounds like it should work really well.”

What does it mean for go, though? Well, let’s look at what happened to chess: it’s still played at high levels, human matches draw more eyes than machine matches, you can always find an opponent who is about as good as you are, and correspondence chess is basically dead.

Wait a minute. That last one sounds bad.

Something we discovered, as chess engines capable of beating the best humans became easily available, is that chess, under the rules we use, is a very draw-happy game. In fact, in correspondence chess, where the use of AI as an aid is accepted, draws are now overwhelmingly the most likely outcome: between 80% and 90% of games in the most recent few correspondence chess championships have been drawn. This is widely seen as bad: a tie is a deeply unsatisfying outcome, and it makes sorting players in a tournament a bit of a pain. Much ink, both real and digital, has been spilled on how to solve the draw problem in chess; if it should turn out (as it did for checkers) that optimal play in chess inevitably leads to a draw, something will probably have to be done, but no chess expert wants to commit to changing the rules in a way whose implications may not be known.

Go1 is a game whose ultimate scoring comes down to points. Playing first is known to be advantageous, so the rules traditionally spot white (the second player to move) 6.5 or 7.5 points, depending on the scoring method in use. The half-point ensures that there can be no ties, and should AI play reveal that the first-move advantage is actually worth more or less than that 6.5 or 7.5 points, the handicap can simply be adjusted.

So, where will go be in ten years? Just as vital as today, and perhaps even more so: a hundred thousand people have watched AlphaGo play Lee Sedol on Youtube, and I can’t help but think that a lot of those are English-speakers who have not yet been really introduced to the game. Although I still find go inscrutable, and don’t expect I’ll ever be anything more than a poor recreational player, some of the fellow AI nerds watching will likely be drawn to the game, and once you have us nerds on your side…

1. And, for that matter, most solutions for how to handle tafl matches have the same property: bidding on the number of turns a win will take, and playing two-game matches (switching sides and giving the win to the player who wins faster, if they split the games), both yield a natural, built-in way to compensate for shifts in observed balance. Furthermore, since tafl rules are, as yet, not standardized in any meaningful way, differing tournament to tournament, tafl has an edge on chess that go does not: the tafl community is very open to rules changes to support greater balance or deeper strategy.

AlphaGo Defeats Lee Sedol

In what even I, who predicted an AlphaGo series win, regard as a bit of a surprise, AlphaGo defeated Lee Sedol in the first game of the five-game match. AlphaGo won by Lee’s resignation, which came 186 moves into the game, up by (the consensus seems to be) between three and seven points at the time.

Our liveblog has our detailed commentary from the first 120 or 130 moves of the game, after which I went to bed. Here, though, are some highlights and post-game thoughts.

  • AlphaGo is much stronger now than it was before. Michael Redmond, a 9-dan pro who was engaged to commentate, remarked as much: AlphaGo didn’t make any serious mistakes.
  • One of AlphaGo’s suspected weaknesses before the game was coming out ahead after complicated fights; we thought this because it avoided getting involved in very many serious fights against Fan Hui. It turns out that AlphaGo can indeed fight, and fight well, as we saw with the fight at the top center of the board that developed early in the game. AlphaGo came out ahead there, or at the very least, ended up with a strong enough position that she1 could hold it and use it as a base to fight over the center of the board, while strengthening her position in the top left.
  • Lee Sedol seemed a little flustered by the level of AlphaGo’s play. He, like a lot of go experts, seemed to make the twin assumptions that: 1) AlphaGo would not improve dramatically between October and today, and 2) AlphaGo was not vastly stronger than Fan Hui in October. AI experts suspected that the first item was false, and they turned out to be correct. That said, it’s hard to fault go experts for lacking experience with AI systems, and not realizing that six months is an eternity for a learning system to improve. It is much easier to fault go experts for the second point, which is one of simple math. A 9-dan pro should beat a 2-dan pro (like Fan Hui) 95%-99% of the time. Although the known AlphaGo-Fan Hui games are too small a sample to draw firm conclusions from, it is telling that AlphaGo defeated Fan Hui by large margins or resignation in all of the long-time games, losing only during quick play; either way, its record suggests that AlphaGo is probably at least 5-dan, and may be up at the 9-dan level.
  • It’s possible that a lot of AlphaGo’s added strength came from the extra time to think, but it’s difficult to do much more than speculate on that front. Suffice it to say that this could be a contributing factor.
  • One thing we haven’t seen very much of is AlphaGo’s ability in close endgames. Given that endgames are basically tests of reading, and AlphaGo has proven to be superb at that so far, I wouldn’t expect its endgames to be any worse than its middle games, but Lee Sedol may nevertheless aim to explore AlphaGo’s endgame capabilities with a more relaxed early game, next time out.
  • One of my prematch predictions seems to have turned out true: computers play differently than do humans, and it may be that computers play more optimally than humans. In the late midgame, around move 120 to 150, in the bottom right and center right of the board, AlphaGo played very cautiously, keeping stones alive (particularly in the bottom right) and denying Lee Sedol territory, without aiming to decisively win the corner itself. Go experts would denounce this sort of play as ‘slow’, given that it seems to yield tempo, but if AlphaGo is confident that she’s already ahead, she need not play to do any more than deny Lee the win.

That’s all for now. We’ll be back with more at the end of the weekend, or after any particularly interesting games, and we might liveblog the final game in the series on Monday night. If so, you’ll hear about it here.

OpenTafl v0.1.9b: the end of the line for 0.1.x

OpenTafl v0.1.9b has been released! This release adds support for a game clock. The AI is able to plan its time usage in accordance with the amount of time it has left. Additionally, settings are now saved from run to run.

This marks the last major release in the OpenTafl v0.1 series. The next major release will be 0.2.0b, including engine mode and related features, work on which will start at the end of March or the beginning of April. Looking ahead to the tournament, I may switch around the tasks in v0.3 and v0.4 from the roadmap I put up a few posts ago—network play lets me set up each AI in its own Amazon EC2 instance and have them play that way, which simplifies the rules by letting AIs think on their opponent’s turn, and removes the need for me to figure out how to limit an arbitrary process’s resource use (although it does add some complexity in building a tournament framework).

Now that it’s feature-complete, v0.1.9b will remain the stable version until v0.2.x is ready to go. What this means for repository structure is that I’ll create a new v0.2.x branch, and that’ll be the head of the tree for dev purposes. Any bugs discovered in v0.1.9b will get patched on master and pushed into 0.2.x, if it makes sense to do so. When 0.2.x is done, it’ll get merged into master, and I’ll start a new 0.3.x branch, and so on until perhaps, someday, a 1.0.

OpenTafl v0.1.7b and a roadmap for the future

This evening, I plan to release OpenTafl v0.1.7b after testing on a Windows computer. The big feature is a new UI, which should be more readable and more usable, and which opens the door to several major features: a game clock (I plan to write about timing tafl games in a later post), external AI engines and network play (thanks to a rewrite of the way commands are passed from the AI and the UI to the game engine), and, further in the future, a more easily discoverable UI, where the player can use the arrow keys to navigate the board and move pieces.

Speaking of which, here’s the roadmap. Estimated release dates after v0.2.x are not to be trusted.

v0.1.7b+ – Now to April, as I have the time
The rest of the 0.1.x series of releases will be concerned with wrapping up a few missing rules: the game clock and draws by threefold repetition, including (for the latter) warnings about making a move that would create threefold repetition, and status messages for both.

Once the remaining features for version 0.1.7b are released, I plan to officially announce the OpenTafl Tafl Open, and provide a space for discussion and updates among AI authors. (We already have a kinda-sorta forum, over at http://conclave.manywords.press.)

v0.2.x – By the end of April
The 0.2.x series of releases will be concerned with implementing engine support. The end-of-April deadline leaves about seven months for AI authors to build in OpenTafl engine support for the OpenTafl Tafl Open, which should be sufficient for some fairly interesting designs.

v0.3.x – Summertime sometime
The 0.3.x series of releases will be concerned with allowing more player-to-player interaction, improving the single-player experience, and implementing extra game-analysis features. Planned features for the 0.3.x series include saving and loading games by means of OpenTafl Notation game records, AI improvements, correspondence play, and a simple in-game history viewer and out-of-game replay viewer.

v0.4.x – Fall/winter
The 0.4.x series of releases will be concerned with support for real-time network play. OpenTafl will be able to run in server mode, or connect to a multiplayer server in client mode.

Other features I may try to fit in include the ability to hypothesize moves and sequences as part of the history viewer and replay viewer, although this may prove to be more difficult than I expect.

Anyway, that’s what you can expect in the next year. In the nearer future, v0.1.7b should be available no later than tomorrow morning, but most likely tonight.

OpenTafl v0.1.6.1b, and progress generally

I’ve released a new version of OpenTafl. Among several other changes, there are two I would like to highlight.

First off, the groundwork for OpenTafl Notation-style engines is complete: there is code in OpenTafl which can produce and read in OTN rules records, code which can produce and read in OTN position records, and code which can produce OTN move records. This is a major milestone: producing and reading position and rules records is tricky, as is producing move records. (Move records should be fairly easy to read in, compared to the others; they encode a much smaller amount of critical data.) Note that the notation spec has changed somewhat, based on lessons I’ve learned over the course of my weekend of implementation. If you’ve done any work based on it yourself, you’ll want to go over the spec again. Hopefully, it won’t change much going forward, although the specification for OpenTafl-compatible engines is likely to change as I work on implementing that in the next month or two. You can see some of the work in version 0.1.6.1b: each position should provide its OTN position record, and games against the AI will report the AI’s thinking using OTN move records.

Second, OpenTafl’s memory usage has once again been dramatically reduced, to the point where it should be comfortably playable to the maximum depth at which it’s feasibly playable (currently depth 6-7 for brandub, depending on speed, and depth 4-5 for other games, depending on size and complexity) without running into memory issues, on any reasonable computer. I intend to go into a little more depth on how I accomplished this, but for the moment, you’ll just have to take my word for it.

You can find the full change logs at the OpenTafl site.

Underhanded C Code’s 2015 winner: why I love computer science

If you’re a software person, read the post. Linus Åkesson, this year’s winner, is brilliant, and I want you to appreciate just how much.

If you aren’t, stick around, because you should appreciate Mr. Åkesson’s brilliance, too. I’m going to explain this in layman’s terms. The Underhanded C Contest is an annual competition where software engineers and computer scientists from all over the world compete to write the most innocent-looking yet fiendishly guilty C programs possible. This year’s challenge is as follows:

There are two countries, which have signed a nuclear disarmament treaty. Each wants to carry out inspections to ensure that the other side is decommissioning real bombs, and not fakes, but neither side wants to expose the actual gamma ray spectra produced by their real bombs—that’s sensitive information on their nuclear programs. So, they have need of a tool which takes an observed spectrum and compares it to a reference spectrum, returning either yes (if the observed spectrum matches the reference spectrum, and is therefore a bomb), or no (if that is not so). Here’s the underhanded bit: one country wants to cheat. They want to keep more bombs than the treaty permits, but they have to fool the inspectors—that is, they have to figure out a way to show inspectors fake decommissioned bombs, which nevertheless yield a ‘yes’ when inspected with the tool. They can’t just fake the reading, because the reading has two parts: first, the placement of the peaks in the spectrum and their relative sizes, which only come from the specific mixes of fissile materials used in bombs; and second, the overall radioactive energy level of the bomb, which only comes from having a large amount of radiation coming from the bomb. It’s easy to replicate the peaks with a small amount of fissile material, and easy to replicate the total energy with an x-ray source, but impossible to combine them to look like a bomb without being a bomb.

We need to establish some background information about how spectra appear to digital tools. The way you collect a spectrum is to point a detector at a radioactive source of some sort. Energetic particles strike the detector face. An event is recorded, and deposited into a ‘bin’ based on the energy of the event. The collection of bins composes the spectrum. So, a spectrum, to a computer, looks like this: energies between 0 and 100, 20 events; energies between 101 and 200, 40 events; energies between 201 and 300, 153 events; and so on. In C, the easiest way to represent this is an array—that is, a region of computer memory which holds a list of numbers all in a row. Since, to a computer, numbers are a fixed number of bits (binary digits), the numbers don’t need to have any space between them; number N+1 is a fixed length from the start of number N.
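
To make that concrete, here’s roughly what such a spectrum looks like as a C array. (The bin boundaries and counts are just the example numbers from above, not anything from the contest code.)

    /* A spectrum, as C sees it: one event count per energy bin. */
    unsigned spectrum[] = {
        20,   /* events with energy   0-100 */
        40,   /* events with energy 101-200 */
        153,  /* events with energy 201-300 */
        /* ...and so on, one entry per bin */
    };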

We also need to talk briefly about floating point numbers. Floating point numbers are how computers represent decimals (generally speaking). Since there are an infinite number of real numbers between any two real numbers, a computer obviously can’t represent them all precisely. Instead, it uses scientific notation, akin to this example: 1.2 × 10^2, or 120. (In binary, obviously, but that’s not important yet.) We call the 1.2 part the mantissa, and the 2 the exponent. (For a computer, the base of the exponent is always two, because of binary.) Moreover, the bits are arranged like this: sign bit, exponent bits (highest place to lowest place), mantissa bits (highest place to lowest place), and the leading digit of the mantissa is always assumed to be one, unless the exponent is zero. (Instead of writing 1.2, we just write .2, and remember that we usually put a one before the decimal point.)
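
To see those fields for yourself, a few lines of C will pull them apart. This is just an illustration of the layout described above; the 11-bit exponent field, its bias of 1023, and the 52-bit mantissa are the standard IEEE-754 double-precision values.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void) {
        double x = 120.0;  /* 1.875 * 2^6 in binary scientific notation */
        uint64_t bits;
        memcpy(&bits, &x, sizeof bits);  /* copy out the raw bit pattern */

        unsigned sign     = (unsigned)(bits >> 63);          /* 1 bit */
        unsigned exponent = (unsigned)(bits >> 52) & 0x7FF;  /* 11 bits, biased by 1023 */
        uint64_t mantissa = bits & 0xFFFFFFFFFFFFFULL;       /* 52 bits, leading 1 implied */

        printf("sign=%u exponent=%u (unbiased %d) mantissa=0x%013llx\n",
               sign, exponent, (int)exponent - 1023,
               (unsigned long long)mantissa);
        return 0;
    }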

Finally, we need to discuss how one might want to compare two spectra. First, we’ll introduce a quantity called the dot product: if you have two lists of numbers of the same length, and go number-by-number, multiplying the first by the first, the second by the second, the third by the third, and so on, then add all of those products, the number you get is called the dot product. If you take the dot product of a list of numbers with itself, then take the square root of the dot product, you get the magnitude. A list of numbers, each entry in which has been divided by the list’s magnitude, is said to be normalized. Finally, if you take the dot product of two normalized lists of numbers, you get a number between -1 and 1. If you interpret the lists of numbers as multidimensional vectors (remember your geometry?) the number you just got is the cosine of the angle between them. For our purposes, it’s a number called the spectral contrast angle. If it’s 1, the two vectors are completely identical. If it’s -1, they’re completely different. So, when we get our two spectra as lists of numbers, we want to calculate the spectral contrast angle. If it’s sufficiently close to 1, then the spectra match.
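
In C, the comparison described above comes down to a couple of loops. This is a sketch of the math, not the contest entry’s actual code:

    #include <math.h>
    #include <stddef.h>

    /* Sum of the element-by-element products of two equal-length lists. */
    static double dot(const double *a, const double *b, size_t n) {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++)
            sum += a[i] * b[i];
        return sum;
    }

    /* Dot product of the two lists after normalization: the cosine of the
     * angle between them.  1 means identical, -1 means completely different. */
    static double spectral_contrast_angle(const double *a, const double *b, size_t n) {
        double magnitude_a = sqrt(dot(a, a, n));
        double magnitude_b = sqrt(dot(b, b, n));
        return dot(a, b, n) / (magnitude_a * magnitude_b);
    }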

So, Mr. Åkesson’s program takes two lists of numbers, representing spectra. It runs some preprocessing steps: smoothing the data, taking its first-order derivative to remove any constant-slope noise1, and smoothing the derivative to obtain a spectral fingerprint: a representation which contains the locations and relative heights of the peaks. Before going any further, it checks to see whether the total energy in the spectrum is above a threshold—a decommissioned bomb will still have a small signature, because of the remains of the fissile materials inside—then hands off the data to another source code file, which handles normalizing the spectra and doing the dot product stuff.
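
Here’s a rough reconstruction of that pipeline, just to make its shape clear. The smoothing window, the threshold value, and the bin counts are all invented for illustration; only the order of operations (smooth, differentiate, smooth, check total energy) follows the description above.

    #include <stdio.h>
    #define BINS 8

    /* Three-point moving average. */
    static void smooth(const double *in, double *out, int n) {
        for (int i = 0; i < n; i++) {
            double left  = (i > 0)     ? in[i - 1] : in[i];
            double right = (i < n - 1) ? in[i + 1] : in[i];
            out[i] = (left + in[i] + right) / 3.0;
        }
    }

    /* First-order difference, standing in for the derivative. */
    static void derivative(const double *in, double *out, int n) {
        out[0] = 0.0;
        for (int i = 1; i < n; i++)
            out[i] = in[i] - in[i - 1];
    }

    int main(void) {
        double spectrum[BINS] = { 20, 40, 153, 95, 12, 7, 3, 1 };  /* made-up bins */
        double a[BINS], b[BINS];
        double total = 0.0;

        for (int i = 0; i < BINS; i++)
            total += spectrum[i];
        if (total < 100.0) {  /* hypothetical energy threshold */
            puts("total energy too low to be a bomb");
            return 0;
        }

        smooth(spectrum, a, BINS);
        derivative(a, b, BINS);
        smooth(b, a, BINS);   /* 'a' now holds the spectral fingerprint */

        for (int i = 0; i < BINS; i++)
            printf("%7.2f\n", a[i]);
        return 0;
    }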

The underhandedness comes in that handoff. The data is defined as a list of numbers of type float_t. float_t is, in turn, defined in a header file (a file which holds definitions) as a double, a 64-bit (i.e. double-precision) floating point number, of which one bit is the sign of the number, 11 bits are the exponent, and the remaining bits are the mantissa. It is important to note a few things about doubles and Mr. Åkesson’s code: first, data in a double-precision number is left-justified. In cases where a given number can be expressed with sufficient precision, the bits on the right-hand side of the double’s position in memory are set to 0, unused. Second, the values in this particular program are generally fairly small—no more than a few thousand. Third, the preprocessing has the effect of making the numbers smaller: a derivative, as a rate of change, will always be smaller than the total value, except when it changes instantly. So, after the preprocessing, you have a list of double-precision floating point numbers, whose values range from about 0 to 100. Representing these numbers is simple, and doesn’t require using very much of the 64-bit length of the number.

The list is passed to the second file as the argument to a function, still a list of numbers of type float_t. The thing is, in this file, float_t means something else. This file does not include the definitions header from the first file, because it doesn’t need it. Instead, it includes a standard math library header file, and the math library header file defines float_t as a 32-bit (i.e. single-precision) floating point number, of which one bit is a sign bit, eight bits are the exponent, and the remaining bits are the mantissa. So, instead of a list of 64-bit numbers of length n, you have a list of 32-bit numbers of length 2n—but you’ve told the code you have a list of size n, so the code only looks at n numbers. That is, the code only looks at the first half of the spectrum. This lets you evade the total energy threshold without having to have a bunch of radioactive material in the bomb: place an x-ray source in your fake bomb. The total energy goes up so that it passes the threshold, and it won’t be considered in the peak-matching code, because it’s on the right side (the second half, the high-energy side) of the spectrum.
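
The cast in the little demonstration below has the same effect. To be clear, this is not Mr. Åkesson’s code (his entry never casts anything; it simply lets two files disagree about what float_t means, which is what makes it so hard to spot), and the numbers are made up.

    #include <stdio.h>

    int main(void) {
        /* Eight bins' worth of made-up, preprocessed spectrum data. */
        double spectrum[8] = { 5.0, 9.0, 40.0, 120.0, 3.0, 2.0, 1.0, 1.0 };
        int n = 8;

        /* What the second file effectively does: read the same bytes back as
         * single-precision floats.  n floats cover only n*4 bytes, which is
         * the first half of the doubles -- so the high-energy half of the
         * spectrum, where the x-ray source shows up, is never examined. */
        float *as_floats = (float *)spectrum;  /* same bytes, wrong type */
        for (int i = 0; i < n; i++)
            printf("entry %d: %g\n", i, as_floats[i]);
        return 0;
    }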

You still have to get the peak matching right, but the underhandedness has a second, subtler component which will help. It has to do with what happens when you convert a 64-bit floating point number into a 32-bit floating point number. Remember that, when a floating-point number doesn’t need its full length to express a number, all of the significant bits are on the left side of the number. Therefore, representing numbers between 1 and 100, especially whole numbers (which the preprocessing helps to ensure) doesn’t take anywhere near 64 bits. It doesn’t even take 32 bits. So, when you turn an array of 64-bit numbers into an array of 32-bit numbers, you split each 64-bit number in half. The second half is all zero bits, and ends up being zero. Since this happens to both the reference and the observed spectra, every other number in both arrays ends up being zero, and zero is the same as zero, so we don’t need to look at them any more.

The first half is the interesting half. The sign bit works exactly the same. The exponent, though, gets shortened from 11 bits to 8 bits, and the three bits that we leave off are the lowest bits. Removing the low three bits moves the high bit over three places, which is the same as dividing by eight2. The three bits which were the end of the exponent become the beginning of the mantissa. This is fine: it preserves the relative sizes of numbers. (The least significant digits of the exponent are more significant than the most significant bits of the mantissa; you’ll just have to trust me here, or do some examples in base 10 yourself, following the rules I gave for floating point numbers.)
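
You can watch the compression happen. The sketch below (again mine, not the contest entry’s) copies out the top 32 bits of each double (sign, exponent, and the start of the mantissa) and reinterprets them as a float; inputs spanning 1 to 100 come out spanning roughly 1.9 to 3.4.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Reinterpret the top half of a double's bit pattern as a float. */
    static float top_half_as_float(double d) {
        uint64_t bits;
        uint32_t top;
        float f;
        memcpy(&bits, &d, sizeof bits);
        top = (uint32_t)(bits >> 32);   /* sign, exponent, top of mantissa */
        memcpy(&f, &top, sizeof f);
        return f;
    }

    int main(void) {
        double samples[] = { 1.0, 2.0, 10.0, 64.0, 100.0 };
        for (int i = 0; i < 5; i++)
            printf("%6.1f -> %f\n", samples[i], top_half_as_float(samples[i]));
        return 0;
    }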

So what we’ve done is dramatically reduce the value in each energy-level bin in the spectrum. We’ve also compressed the dynamic range: the difference between the lowest peaks and the highest peaks is much smaller, and so they resemble each other much more closely. The recipe for a fake bomb, then, is to take a small amount of fissile material and a powerful x-ray source, and put them in the casing. The x-ray source takes care of the total energy level, and the dynamic range compression from the 64-bit to 32-bit switch takes care of the low-energy peaks.

Congratulations! You just convinced the other country that you’re about to decommission a real bomb, when in actuality, it’s a fake.

Congratulations also to Linus Åkesson, who wrote such a clever piece of deception into a mere 66 lines, and did so with such panache I could not resist writing on it.

1. This appears in real-world spectra, because there’s naturally a lot more low-energy noise than high-energy noise. In support of this assertion, I offer the evidence that you are not suffering from radiation sickness right now. (Probably.)
2. 1000 is 8, in binary. (From right to left, you have the 1s-place, the 2s-place, the 4s-place, and the 8s-place.) Remove three zeros, and you have a 1 in the 1s place: 1.

Stop the presses! Go AI defeats human professional!

A few days ago, Google’s DeepMind announced that they reached the most significant milestone in pure AI since the solving of checkers in 2007: a Go AI called AlphaGo that’s competitive with human professionals. (Strictly speaking, computers could play Go before yesterday, but they were ranked around midlevel amateurs at their best.) The whole paper is available here.

For those of you who aren’t as much nerds about AI as I am, here’s a quick primer on why this was thought to be a very hard problem (so hard that the people involved in the prior state of the art thought it was at least a decade away):

In the most theoretical conception, game-playing computers for perfect-information zero-sum games (most abstract board games: those with no hidden state, in which one player’s gain is the other’s loss; not entirely accurate, but more readable than perfect accuracy allows for) are as simple as exploring every possible move and every possible countermove from the current position to the end of the game. Assuming perfect play on both sides, every result will be either a win, a loss, or a draw—that is, abstract strategy games are perfectly deterministic. (Checkers isn’t completely solved, but we do know now, as a result of the work from 2007, that perfect play on both sides from the start always yields a draw.)

This is, however, a massively impractical way to actually play a game, because the number of positions to explore rapidly turns intractable. Speed-focused modern chess engines search on the order of millions of nodes (in the game tree) per second, but searching a chess position exhaustively to a depth of 7 requires a search of about 60 billion nodes. Traditional games like chess and checkers yielded to some optimizations on this process:

  • It’s easy to look at a chess or checkers board and tell how well the game is going for a certain player: in both games, it comes down primarily to the balance of pieces. (The last I read, advantages in position are worth a pawn or two in a game of chess if they’re all going your way; the queen is worth nine pawns.) So, you don’t need to explore all the way to the bottom of the game tree to get an idea of which directions are promising. You just explore as deep as you can and evaluate the position.
  • If you have a good evaluation function (that is, one that generally only evaluates a position as better when it’s closer to winning), you can do some easy pruning: when you come across a game state that’s provably no better than an option you’ve already found elsewhere, you just don’t explore any further in that direction. It works even better if you can assess which moves are likely to be good and explore those first: if you try the best move first, every subsequent move is going to turn out to be worse, and you’ll save a bunch of time. (There’s a small sketch of this just below the list.)
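
Here’s that idea in miniature. The game is deliberately trivial (one-heap Nim: take one to three stones, and whoever takes the last stone wins), and this is my toy example rather than anything from a real chess engine, but the alphabeta function has exactly the shape described above: explore moves, track the best outcome each side can already guarantee, and stop exploring a branch the moment it’s provably no better than something already found.

    #include <stdio.h>

    /* One-heap Nim: the player to move takes 1-3 stones; taking the last
     * stone wins.  Scores are from the maximizing player's point of view:
     * +1 is a win, -1 is a loss. */
    static int alphabeta(int stones, int alpha, int beta, int maximizing) {
        if (stones == 0)
            return maximizing ? -1 : +1;  /* the previous player took the last stone */

        int best = maximizing ? -2 : +2;
        for (int take = 1; take <= 3 && take <= stones; take++) {
            int score = alphabeta(stones - take, alpha, beta, !maximizing);
            if (maximizing) {
                if (score > best) best = score;
                if (best > alpha) alpha = best;
            } else {
                if (score < best) best = score;
                if (best < beta) beta = best;
            }
            if (alpha >= beta)
                break;  /* this branch can't beat what we've already found: prune */
        }
        return best;
    }

    int main(void) {
        for (int stones = 1; stones <= 10; stones++)
            printf("%2d stones: %s for the player to move\n", stones,
                   alphabeta(stones, -2, +2, 1) > 0 ? "a win" : "a loss");
        return 0;
    }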

So chess computers today, working with those tools, are better than the best human players: the effective branching factor (the number of moves actually explored at each state in the tree), using the pruning techniques above, goes from about 35 to between 1 and 10, depending on the exact techniques used. The reason Go didn’t fall to traditional techniques is that it’s just so much more computationally difficult. Chess’s branching factor (the number of possible moves at each state) is about 35, and games last about 80 moves; Go’s branching factor is about 250 on average, and games run about 150 moves. (I’ll put some rough numbers on that after the list below.) It also features a few difficulties that chess does not:

  • It’s a very non-local game, both in space and in time: a move made at the very start of the game halfway across the board could have implications fifty turns later on the strength of the positions played at the start. This is a horizon problem: in chess, most positions become quiescent (unlikely to swing the overall evaluation much with a little more searching) after the captures stop. Modern chess engines will play all the way through capture sequences for this reason; there’s no similar metric to use for go engines.
  • It’s very difficult to evaluate a board position on purely objective grounds, or rather, we haven’t figured out how to phrase, mathematically, what about a good go position is good. Neither present control of territory nor present number of captures bears very strongly on the eventual result.
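
To put some very rough numbers on the size difference (my own back-of-the-envelope, using the averages above): an exhaustive chess tree works out to something like 35^80, or around 10^123 positions, while an exhaustive go tree works out to something like 250^150, or around 10^360. Both are absurd, but the second is absurd by more than 200 additional orders of magnitude.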

Because of the size of the problem space for Go, traditional techniques don’t work. The branching factor remains too high. Modern Go programs use one (or sometimes both) of two approaches: either they use hand-coded expert knowledge to sort and select promising moves for tree expansion (which frequently misses oddball moves that are nevertheless good), or they randomly play out a bunch of games from the current position to the end, and sample the result to pick the best move on aggregate (which frequently misses obviously good moves). The best of the pre-AlphaGo bunch used both, combining expert knowledge to pick the best moves to sample with the oddball-finding power of random simulation.
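
Here’s the random-playout half of that, in the same toy Nim setting as the pruning sketch above, since a self-contained go example won’t fit in a blog post. Real go programs dress this up considerably (a proper Monte Carlo tree search over the samples, playouts smarter than uniformly random moves), but the core loop (try each candidate move, finish the game randomly many times, keep the candidate with the best sampled win rate) looks like this:

    #include <stdio.h>
    #include <stdlib.h>

    /* Finish a game of one-heap Nim with uniformly random moves.
     * to_move: 0 = us, 1 = opponent.  Returns 1 if we take the last stone. */
    static int random_playout(int stones, int to_move) {
        for (;;) {
            int take = 1 + rand() % 3;
            if (take > stones) take = stones;
            stones -= take;
            if (stones == 0) return to_move == 0;
            to_move = !to_move;
        }
    }

    /* Sample every legal move and return the one with the best win rate. */
    static int pick_move(int stones, int playouts) {
        int best_take = 1;
        double best_rate = -1.0;
        for (int take = 1; take <= 3 && take <= stones; take++) {
            double rate;
            if (take == stones) {
                rate = 1.0;  /* taking the last stone wins outright */
            } else {
                int wins = 0;
                for (int i = 0; i < playouts; i++)
                    wins += random_playout(stones - take, 1);  /* opponent moves next */
                rate = (double)wins / playouts;
            }
            if (rate > best_rate) { best_rate = rate; best_take = take; }
        }
        return best_take;
    }

    int main(void) {
        srand(1);
        for (int stones = 4; stones <= 10; stones++)
            printf("%2d stones: sampling suggests taking %d\n",
                   stones, pick_move(stones, 10000));
        return 0;
    }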

AlphaGo does a little bit of that, but at its heart, it learned to play a lot like humans do: DeepMind’s researchers fed it a diet of 30 million sample positions and the eventual results, and built a neural network to identify what a good board looks like and what it doesn’t. (Literally—the input into the program is a 19×19 image, with pixels set to values representing empty, black stone, or white stone.) They built a second neural network to identify which states are the best ones to simulate through in a random simulation, and Bob’s your uncle: a professional-level Go program. Their paper suspects it’s about as good as a mid-level human professional—the exhibition tournament they included in the paper saw AlphaGo beat the European human champ five games to zero, four by resignation, but the Euro champ isn’t a top-tier player worldwide. February will see an exhibition match between AlphaGo and the South Korean champion, a world-class player; we’ll see how it does against him. AlphaGo also won 499 out of 500 games against the prior state of the art Go computers.

The most interesting thing about this development is that it learned to play a lot like a human would—it studied past games and figured out from that study what was good play and what wasn’t, then played a lot (against itself and past versions of itself), which is roughly analogous to playing a lot and working on go problems (the way human go players are supposed to get better). The obstacle to a general game-playing AI (I could buy a copy of Arkham Horror, finally, without having to rope the wife into an eight-hour marathon of doom!) is that training neural networks is currently pretty slow. As I mentioned in the first post, AlphaGo had to study about thirty million positions and play itself many, many times to get to the level of competence it’s at now; presumably, that will improve as DeepMind hones its learning techniques and discovers new and faster ways.

That said, humans still have one thing to be proud of: efficiency. The version of AlphaGo that beat the European champ ran on about 1200 CPUs and about 200 GPUs, whereas the human brain, which is nearly as good, draws about 20 watts.

Some extra reading for you: GoGameGuru has all the game replays, which are interesting if you’re familiar with Go (it doesn’t take much—if you’ve seen a human game or five and a game-against-computer or five, you’ve probably already noticed the differences in play, if only subconsciously). AlphaGo has a style that looks very human. Apparently, Go professionals expect the Korean champ to beat AlphaGo, and another source has the game taking place in March. If I were the champ, I’d be pushing for, like, next week. I don’t know how long the million-dollar prize is gonna be good for.

Here’s a commentary on one of AlphaGo’s games against the European champ.

I originally wrote this analysis for the people over at the Quarter to Three forum, which explains the breezy, off-the-cuff style. If you’ve been following us for a while, some of the introduction will have seemed repetitive, too. Anyway, I hope it was still informative and interesting. -Fishbreath