Some questions after move 37

When I first thought of this post, AlphaGo had won the first game against Lee Sedol. By the time I was writing it, the second game was over. And at the time of publishing, the third win had sealed the fate of the tournament.

In this post I extrapolate from the surprising move 37 of the second game. Take a look at it in the video here and watch the startled reaction of the commentators, who think it must be a mistake. Not to mention the reaction of Lee himself, who left the room and didn’t come back to make his next move for 15 minutes. Move 37 is significant and stands as a symbol for this whole event. It was surprising not only because nobody expected the computer to make such a move, but because it was an unexpected move for any player to make, and it wasn’t until many moves later that its strategic importance was understood.

The significance of the move and its merits are still debated and probably will be for some time. Even though it worked out well for AlphaGo in the end, it could also have been one of several moves that AlphaGo could have played simply because it was already sure of winning. You see, AlphaGo plays an egoless game. It sacrifices its margin of victory to increase its probability of victory. It would rather win by 1 point with 99 percent probability than by 10 points with 80 percent probability. This is very hard for a human to calculate, even a few moves from the end of the game, which usually means that humans will extend their lead as far as they can to hedge against unknowns later in the game. It is thus hard to know just how good AlphaGo is: what looks like an endgame mistake that gives the opponent a chance of winning may in fact be conservative play that solidifies a one-point win (which is exactly how a runaway AI ready to enslave humanity would act, so as not to get discovered and have its plug pulled until it was certain of the machines winning the war against humanity…). This is also an example where the AI can be said to have made an unexpected interpretation of the goals given to it by its programmers. A human player often interprets winning a game as maximizing their score against the opponent’s score, whereas for the AI winning was a matter of probability, not margin of victory. Only by scoring different victories differently could such a “show-off mode” be implemented, which highlights that even in something as structured as the perfect-information game of Go, the end goal can be interpreted in different ways.
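The trade-off between margin and probability can be made concrete with a small sketch. It is only an illustration, not AlphaGo’s actual objective: the `Option` class, the two selection functions and the numbers (taken from the 1 point at 99 percent versus 10 points at 80 percent example) are assumptions made for this post, and the “expected margin” here simply ignores what happens in lost games.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """A hypothetical candidate line of play (illustrative only)."""
    name: str
    win_probability: float  # chance of winning the game with this line
    margin_if_win: float    # points ahead at the end, if the game is won


def pick_by_win_probability(options):
    # "Egoless" objective: only the chance of winning matters.
    return max(options, key=lambda o: o.win_probability)


def pick_by_expected_margin(options):
    # "Show-off" objective: weigh the size of the win as well.
    # Simplification: lost games contribute nothing to the score.
    return max(options, key=lambda o: o.win_probability * o.margin_if_win)


options = [
    Option("safe", win_probability=0.99, margin_if_win=1.0),
    Option("greedy", win_probability=0.80, margin_if_win=10.0),
]

print(pick_by_win_probability(options).name)  # safe
print(pick_by_expected_margin(options).name)  # greedy
```

The same two candidate lines are ranked in opposite order depending on how a win is scored, which is the sense in which the goal of the game can be interpreted in more than one way.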

Move 37 (2016) vs. Move 36 (1997)

AlphaGo’s move was not one that humans know to be right and can immediately recognize as the correct move. That, by contrast, was the case with the controversial move 36 by IBM’s Deep Blue in game 2 against Kasparov in their famous 1997 rematch. Kasparov tried to bait Deep Blue with a “poisoned pawn”, playing to the computer’s materialistic bias toward immediate captures and its inability to judge the strength of vague positional plays that could have significance later in the game. The strategy failed, however, at move 36. Kasparov had set his trap, but move 36 turned out to be a long-term strategic move against him. Kasparov immediately saw that it was the logical move a good human chess player would have made, and he couldn’t believe that the computer was capable of it. He went so far in his disbelief that he accused IBM of having a human behind the scenes making that move^[1]. Here is a video analysis of that move and here is the move in the documentary “Game Over” about the matches between Kasparov and Deep Blue.

Such an accusation would be absurd in the current situation. Move 37, on the contrary, was a move whose purpose no one understood, and no human observer could calculate far enough ahead to grasp its later significance. Forget about humans helping the machine backstage! Will we talk about move 37 in the future the way we talk about move 36 today?

Philosophical Implications for Go

European champion Fan Hui was the player that the October 2015 version of AlphaGo played and beat last year. Fan Hui then spent the following six months playing against the machine to help prepare it for the games against Lee Sedol. He claims that during this period he learned from playing the machine and that his Go ranking improved. I think it is possible to learn from Go machines in a way that is not possible with brute-force-oriented chess computers such as Deep Blue.

The neural network system from DeepMind (of which AlphaGo is a Go-playing instance) is not only a matter of brute-force processing power evaluating millions of moves per second. Go is so complex that, not long ago, no amount of added computing power would have beaten the best human at the game. AlphaGo is both a hardware advancement, of course, and a software advancement. What it does so well is evaluate which moves are worth considering further. The brute-force approach of chess does not work in Go^[2]. The machine has to have some kind of intuition. In the neural network this consists of a “policy network”, which selects the moves worth evaluating, and a “value network”, which judges the resulting positions so that the machine can decide which option to play.
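As a rough illustration of that division of labour, here is a toy sketch in which a “policy” shortlists a few moves and a “value” estimate picks among them. Everything in it is made up for the example: the move names, the numbers and the `choose_move` function are hypothetical, and the real system combines its networks with tree search rather than a single shortlist step.

```python
def choose_move(policy_prior, position_value, top_k=2):
    """Shortlist moves by policy prior, then pick the best by value.

    policy_prior: dict mapping move -> how promising the move looks at a glance
    position_value: function mapping move -> estimated chance of winning after it
    """
    # Step 1 ("intuition"): only keep the top_k most promising moves.
    shortlist = sorted(policy_prior, key=policy_prior.get, reverse=True)[:top_k]
    # Step 2 ("judgement"): among those, play the one valued highest.
    return max(shortlist, key=position_value)


# Hypothetical numbers, not taken from any real game.
toy_prior = {"corner": 0.50, "side": 0.30, "center": 0.15, "edge": 0.05}
toy_value = {"corner": 0.52, "side": 0.61, "center": 0.70, "edge": 0.10}

print(choose_move(toy_prior, toy_value.get))  # side
```

Note that in this toy position the move with the highest value (“center”) is never even looked at, because the policy did not shortlist it. Narrowing the search is what makes the problem tractable, but it also means the quality of the intuition limits what the evaluation can find.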

AlphaGo therefore plays and evaluates game positions in much the same way humans do, based on pattern recognition that is hard to explain. Many people say that you can learn Go by learning to see beauty. You learn to intuitively recognize certain patterns of stones as harmonious and to consider them strong, without being able to explain more precisely why a move seemed good at that point in the game. The constant references to beauty in the game of Go (which I tend to agree with), and this means beauty not just from a spectator’s point of view but beauty as a game strategy, are now being transferred into the computer. The computer, by being able to play Go at the highest level, must be capable of understanding and producing beauty. Not just as an end result, like a computer reproducing a painting, but at a fundamental algorithmic level. And despite the ideological role such claims play in promoting the social and cultural benefits of the computer industry, and of certain companies in particular, there is some truth to the claim that machine learning is less mechanical and more a matter of arriving at a kind of aesthetic approach to problems. A neural network solves a problem in a certain way because it looks good to solve it like that, based on the patterns that have emerged from the training input it has had.

The chess computer that played Kasparov only calculated, and there is little to learn from calculations that lie beyond the human ability to calculate moves. From the chess computer we can learn nothing. It was too foreign in its thinking and frankly a bit “dumb”. Human chess remained the same and integrated no new concepts from the computer matches. It would be like a basketball game between humans and robots where the robots win because they can shoot the ball from all the way across the court. It can hardly be said to be the same game, although they play by the same rules.

Go AIs, on the other hand, have the potential to teach human Go players new concepts through their unexpected moves and new formations on the board. That would be heuristic, conceptual learning: not simply learning one new move, but a new way of reasoning about Go emerging. It is worth pointing out here that even though Go has been played for over 2500 years, new playing styles, concepts and strategies have revolutionized the game almost every decade of the last century, often in close contact with advances in combinatorial mathematics. And even today, high-level players can have completely different approaches to the game and ways of reasoning about it. That in itself is a testament both to the dynamic complexity of Go and to the creative capabilities of advanced human collective efforts^[3]. The possibility of an AI contributing to yet another revolution in Go could therefore be seen as a continuation of, not a break with, a century-old tendency of advances in the numerical sciences revolutionizing the game. The heuristics used for interpreting Go games today are also partly based on analogies to the physical world, such as a stone having weight and influence over a territory, or groups being strong or weak. If learning from AIs could aid the emergence of new concepts on another level of abstraction, not stuck in the analog world (or at least hung up on a more advanced version of it), it could be a real philosophical feat.

In particular, it seems like Go is ripe for a revolution in playing styles in the center of the board. Center positions are not as easy to conceptualize as corner positions and tend to be unique, whereas corner positions often repeat across games. This is also because the center is so affected by what goes on in all four corners. It is hard to “own” the center the way you can own a corner. Perhaps AlphaGo can play the center in ways that open up new styles. Philosophically this is interesting if philosophy is, as Deleuze put it, the invention of concepts. It would mean that human players could change their playing style using concepts they learned from the AI. The new moves played by AlphaGo could be worked into a new Go heuristic, similar to what happened when mathematical research was incorporated into Go strategy during the 20th century^[4].

A Short History of Games and AI

Using games to advance number theory, and vice versa, has a long history. Go in particular was used by Conway (of Conway’s Game of Life fame) to come up with the mathematical theory of surreal numbers. Combinatorial game theory has also given rise to the concept of Go infinitesimals, which have applications in the endgame of Go.

Even after the games between Kasparov and Deep Blue, most people thought that a computer consistently beating a Go master was far in the future, if not theoretically impossible. Here is an article from 2002 about the unlikelihood of a computer beating humans at Go, and even as late as 2014 it was considered to be far off. For a while it seemed that no amount of brute force in the world would be able to tackle the exponential complexity of Go. However, 2006 was the year of the Monte Carlo revolution in Go. Monte Carlo Tree Search was a new search algorithm for finding the most promising moves in a decision tree and evaluating them. In subsequent years, Monte Carlo implementations began to beat better and better players at smaller and smaller handicaps. There is, however, still a considerable leap from that to what the modified Monte Carlo system of AlphaGo did in October and now in the March games. In particular, it was the connection between the Monte Carlo algorithm and DeepMind’s learning neural net that produced the advances.
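To give a feel for what Monte Carlo Tree Search does, here is a minimal sketch of its plain textbook form (selection with the UCT rule, expansion, random playout, backpropagation). It is not AlphaGo’s modified version and contains no neural network. To keep it short it plays a tiny stand-in game instead of Go: a pile of stones, each player takes 1 to 3, and whoever takes the last stone wins. The names `Node`, `rollout` and `best_move`, the exploration constant and the iteration count are all choices made for this illustration.

```python
import math
import random


class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones    # stones left for the player about to move
        self.parent = parent
        self.move = move        # number of stones taken to reach this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.wins = 0.0         # wins for the player who moved INTO this node


def uct_child(node, c=1.4):
    # Balance exploitation (win rate) against exploration (rarely tried moves).
    return max(
        node.children,
        key=lambda ch: ch.wins / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(stones, rng):
    """Play random moves to the end; True if the player to move first wins."""
    turn = 0
    while stones > 0:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return turn == 0
        turn = 1 - turn
    return False  # no stones left: the previous player took the last one


def best_move(stones, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: walk down fully expanded nodes using UCT.
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: add one child for a move not tried yet.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new position.
        mover_won = not rollout(node.stones, rng)
        # 4. Backpropagation: alternate perspective at each level.
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if mover_won else 0.0
            mover_won = not mover_won
            node = node.parent
    # Play the most visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move


print(best_move(5))
```

In this toy game the winning idea is to leave the opponent a multiple of four stones, so from a pile of five the search should settle on taking one. The point of the sketch is that nothing in the code knows that rule; the preference emerges from many random playouts whose results are fed back up the tree.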

DeepMind was founded as a small British company in 2010. The founder, Demis Hassabis, was a child chess champion who later got into AI-heavy computer game development and worked on Theme Park at Bullfrog. Developing those games was a kind of skunkworks AI development, especially in understanding how humans relate to the “aliveness” of AI engines. He has even stated that the most cutting-edge AI implementations in any field at the time were the ones being done in video games. Kind of turns the idea of serious games upside down, doesn’t it?

The company was acquired by Google in 2014 after acquisition negotiations with Facebook fell through in 2013. It is safe to say that Google has provided little more than processing power and bean bags, and that most of this feat should be credited to the DeepMind team rather than to any Silicon Valley Singularity fanatics. Actually, the processing power is mostly needed for training the AI, where different versions of AlphaGo play tournaments against each other in the cloud (now doesn’t that call up an image of mythical proportions!). Running it is not as demanding, which of course makes it much more scalable in the near future. Interestingly enough, the company developed its system by making it play classic Atari video games (such as Pong, Q*bert, Breakout and Space Invaders) like a human player. How about that! It did this with only raw pixel data as input and learned just from doing random things, with no bootstrapping from human play. True to the notion of gamification, the early arcade games are actually well suited to training an AI because of the immediate feedback you get for making a right or wrong move, such as gaining points or losing the game on the spot. This is what proponents of gamification (as well as New Public Management) assume humans also need in order to improve their performance of tasks: constant review, evaluation and feedback. In Go, on the other hand, you just get a result at the end, and it can be hard to understand after the game which move was the mistake that led to you losing. The signal telling you that you did something wrong is weak and imprecise. This is called the “credit assignment problem”. It is why humans being taught Go or other games and sports have a coach or mentor who can point out mistakes as they happen and explain what negative results they led to.
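The difference between the two kinds of feedback can be sketched in a few lines. One common textbook way of spreading a late result back over earlier moves is the discounted return, where each step gets its own reward plus a shrunken share of everything that follows. This is a generic illustration, not a description of how DeepMind’s systems are trained; the function name, the reward lists and the discount factor of 0.5 are assumptions chosen for the example.

```python
def discounted_returns(rewards, gamma=0.5):
    """For each step, sum its reward and the discounted rewards after it."""
    returns = []
    running = 0.0
    # Walk backwards so each step can reuse the return of the next step.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Arcade-style game: points arrive right when the good move is made.
arcade_rewards = [1.0, 0.0, 1.0, 0.0]
# Go-style game: nothing until the final result (here a win, scored as 1).
go_rewards = [0.0, 0.0, 0.0, 1.0]

print(discounted_returns(arcade_rewards))  # [1.25, 0.5, 1.0, 0.0]
print(discounted_returns(go_rewards))      # [0.125, 0.25, 0.5, 1.0]
```

In the arcade case the signal sits on the moves that earned the points. In the Go-style case every move inherits a faded share of the same final result, whether it was the brilliant move or the blunder, which is exactly why the signal is weak and imprecise.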

AI Ethics

This tournament will have implications far beyond the field of Go and will again make people consider the powers being granted to machines and their ability to make decisions on our behalf. Remember, though, that the computer didn’t actually make the moves in the game. It suggested each move, and an official representative at the event placed the stone. Why did that person play move 37 on behalf of AlphaGo even though it seemed like such a strange move? In this case, of course, the representative is not going to change the move to something they think is better according to their own way of seeing things. Maybe they aren’t even an expert in Go. Computer-aided decision-making allows an unskilled workforce to execute tasks suggested by machines. Anyway, nothing much is really at stake here; it is just a game. But imagine the same scenario with an AI that helps with decision-making in science, suggesting interpretations not supported by the method, or in health care, suggesting treatments without the doctors having any idea why. Are they going to trust the AI with these “move 37s”? Or are they going to assume that the machine made a mistake? Who is legally responsible if something goes wrong?

DeepMind, the company behind AlphaGo, states that its future development of the system lies within healthcare and recommendation systems: for example, diagnosis from complex information in images, or tracking health data over time and recommending healthy choices based on patterns seen in that data. Would you follow lifestyle changes “nudged” by your personal AI assistant tracking your data over time?^[5] What if such a system diagnoses a patient with cancer and recommends radiation treatment without the doctors being able to understand where that diagnosis came from? What responsibility do they have if they choose to go ahead with the treatment and it later turns out that the system made an error? What responsibility do they have if they go against the recommendation of the system and the cancer is found only at a later stage, when it has developed too far? Will the systems have to explain their decisions to humans? Can they even be designed to “explain” their “reasoning” in such a way? Can they compress the steps taken by their system in a heuristic way? Could AlphaGo have explained its move 37?

Now, the danger of AIs comes from the fact that they are supposed to balance on a fine line between predictability and unpredictability. The Go machine is supposed to play predictable, solid moves and not make mistakes, but also to play unexpected moves that no human could have thought of. Letting the AI loose and allowing it to learn, rather than only follow pre-programmed patterns, means opening up to the possibility of grave mistakes as well as great advances. In the context of a Go game the worst that can happen is the loss of a stupid game. Who cares? But the question is what you trust an AI to be hooked up to and make autonomous decisions about, or even to make recommendations about to humans who are open to influence and insecure in their own decision-making. Personally, I don’t have the power to do many dangerous things in my life. I don’t have control of powerful systems I could hook an AI up to. But if I had access to a system that could be used to launch nukes, hell yeah, I could make a dangerous AI algorithm. Just make a semi-random weighted input of data that runs at 8 o’clock every morning, and if the result is >1 it launches the nukes. That would be a really dangerous AI that no one could stop without pulling the plug. But shame on me for connecting it to the nukes, right? So what is it reasonable to allow AIs to do? What are the ethics? The problem could of course be solved with some economic calculation of risk-aware opportunity costs, if one were just willing to put an economic price on human and biological life (which is done anyway in health care to some extent, right?). The interesting question is not a metaphysical one about the threshold of artificial intelligence but an ethical one about what is good enough. For every step in the development of AI there will be a new threshold of good enough for certain uses, where it will be implemented and its new quirks and errors tolerated for the benefits it produces (for someone who is not necessarily the same person affected by the faults of the AI).

Footnotes

[1] Game 2 against Deep Blue got to Kasparov’s head. He resigned in frustration but was told the day after that Deep Blue had made a mistake towards the end of the game that could have helped him force a draw through perpetual check. However, modern computer analysis does not consider the position a draw but winning for Deep Blue’s side. So who was right in the end? Impossible to tell, since we don’t know if Deep Blue would have been able to get out of the perpetual check.
[2] Note that current chess computers like Stockfish are not as simple as the ones of the Kasparov era.
[3] This also opens up the speculative possibility of some hermit Go player who hasn’t had contact with human civilisation for decades coming out of the woodwork and beating everyone with a completely non-human playing style developed in complete isolation from the international Go community.
[4] This is also different from another hypothesis: that future humans will probably be able to win against the AlphaGo of today by specializing in a playing style that targets its weaknesses, in some kind of artificial intelligence archeology, just as the AlphaGo of today can beat previous versions of itself. In the end, it is hard to say what it means to get better at Go beyond beating other Go players. Is there an objective, truthful way of measuring Go progress besides the communal agreement emerging from tournament wins? What if there’s a specific way of playing against the AIs that will beat them but lose to a decent human player, just as Kasparov tried to play, and was partly successful in playing, an anti-silicon strategy against Deep Blue that would never fly in a match against a top human opponent? A playing style is only weak relative to the opponents able to exploit it.
[5] For Swedish readers, this reminds me of an old Pentagon sketch about the personal coach.
