Demis Hassabis: You know, that’s why we are here for the summit: we want to discover whether a great player like Ke Jie can find weaknesses that we don't know about, and that even AlphaGo doesn't know from playing against itself. Of course, when we played our match against Lee Sedol, in Game 4 Lee Sedol, with his brilliant creativity, found a weakness and managed to win that game, and it was very interesting for us to see this gap in AlphaGo's knowledge. So over the last year we went back to improve the architecture and the system, and to let it learn more from playing against itself, to see if we could close that knowledge gap. We believe we have fixed that knowledge gap, but of course there could be many other areas that it doesn't know about, and that we don't know either. And that’s why we are here: to see if they can be discovered.
Demis Hassabis: So AlphaGo always tries to maximize its probability of winning rather than the size of the winning margin. So whenever it has a decision to make, it will always try to pick what it thinks is the more certain path to victory, with less risk. Often in these positions, the tradeoff AlphaGo is making is between how certain it is about the margin of victory and how high the probability of victory is. David, if you want to add anything to that.
David Silver: So…it’s a very interesting question. The way AlphaGo works, as Demis said, is that it maximizes the probability of winning the game. This means that we program a goal into AlphaGo, and that goal is, in essence, what we really want it to do, which is to try and win games of Go. You could imagine other objectives being applied, such as maximizing the margin of victory, but this is not the objective that we chose for AlphaGo. So if you really focus on victory, then it leads to these behaviors where AlphaGo will try to win, and in doing so, it may give up a number of points simply to reduce any risk it perceives, even if that risk seems to be very small.
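[Editor's note: a toy numerical sketch of the point being made here, that a win-probability objective and a winning-margin objective pick different moves. The move names and numbers are invented for illustration; this is not AlphaGo's code.]

```python
# Each hypothetical candidate move: (probability of winning, margin in points if it wins)
moves = {
    "safe_move":       (0.98, 1.5),   # almost certain win, tiny margin
    "aggressive_move": (0.90, 20.0),  # riskier, much larger margin
}

def pick_by_win_probability(moves):
    # AlphaGo-style objective: choose the move most likely to win, ignoring margin
    return max(moves, key=lambda m: moves[m][0])

def pick_by_expected_margin(moves):
    # Alternative objective: choose the move with the largest expected margin
    # (losses contribute 0 points in this simplified model)
    return max(moves, key=lambda m: moves[m][0] * moves[m][1])

print(pick_by_win_probability(moves))   # -> safe_move
print(pick_by_expected_margin(moves))   # -> aggressive_move
```

Under the win-probability objective the safe move wins out even though its margin is only 1.5 points, which mirrors the behavior described above: giving up points to reduce perceived risk.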
Demis Hassabis: I think the way to think about this is that Go is this amazing subject that has almost limitless possibilities. As I said in my opening talk, I see AlphaGo as a tool for Go players and the Go community to use, to explore the mysteries and truths of Go and find out more. And I hope the Go players have enjoyed the last year, including these matches and the matches online, the Master series, and I hope it has contributed to improving our understanding of this amazing game. So I see it as a tool that great players like Ke Jie and Lee Sedol can use to discover more about the game that we all love.
Demis Hassabis: I’m not sure if I understand the question correctly, but… obviously AlphaGo initially learns from human games, and most of its learning now comes from playing against itself. But of course, to truly test what it knows, we have to play it against human experts, because playing against itself is not going to expose the weaknesses it doesn't know about; it will obviously fix the ones it finds during self-play. So we really have to test it against the world’s best players.
David Silver: Perhaps I could just add to that. One of the innovations of AlphaGo-Master is that it relies much more on learning from itself. In this version, AlphaGo has actually become its own teacher, learning from moves taken from examples of its own searches, and it relies much less on human data than previous versions. And one of our goals in doing so is to make it more and more general, so that its principles can be applied to other domains beyond Go.
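[Editor's note: a minimal sketch of the "own teacher" idea described above, using an invented tabular policy rather than a neural network, and a trivial stand-in for the search. This is an assumed illustration of the training pattern, not DeepMind's implementation.]

```python
def search(position, policy):
    """Stand-in for a tree search guided by `policy`: returns a move choice
    for `position`. Here it trivially picks the policy's current best move;
    a real system would refine the choice with lookahead."""
    return max(policy[position], key=policy[position].get)

def train_step(policy, position, target_move, lr=0.1):
    """Nudge the policy's preferences toward the move the search selected,
    so the search output acts as the training target (the 'teacher')."""
    for move in policy[position]:
        target = 1.0 if move == target_move else 0.0
        policy[position][move] += lr * (target - policy[position][move])

# Tiny tabular "policy": position -> move preferences (made-up values)
policy = {"empty_board": {"corner": 0.4, "center": 0.6}}

for _ in range(50):                  # self-play / training loop
    pos = "empty_board"
    best = search(pos, policy)       # the search produces a teaching signal
    train_step(policy, pos, best)    # the policy learns from its own search

print(policy["empty_board"])         # preferences sharpen toward one move
```

The loop shows the shape of the idea: search output becomes training data for the policy, with no human games involved, which is what makes the approach more general.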
David Silver: So maybe I can answer the first part of that question, regarding the technology inside AlphaGo. AlphaGo-Master is a new version of AlphaGo, and we worked very hard to improve the fundamental algorithm that it uses. In fact, it turns out that the algorithm often matters more than the amount of data or the amount of computation that goes into it. And if you get the algorithms right, and make them general and powerful enough, then they can really progress very rapidly. In fact, AlphaGo-Master uses ten times less computation, and was trained in weeks rather than months, compared to the version that played against Lee Sedol last year. So it is a different version, and at least in self-play performance it is considerably stronger. And we are here to find out whether it is indeed as strong as it seems in self-play, or whether it has weaknesses that can be exposed.
Demis Hassabis: As for the second part of the question, I’ll just answer that. Later in the event we will be announcing the next steps for AlphaGo, so I don't want to say anything in advance of that, but we will be talking about it later in the week. But one thing I do want to say: with the last version of AlphaGo, we published all the technical details and results of the program in the scientific journal Nature. That allowed other companies, you know, Tencent and Japanese companies, to make their own versions of AlphaGo, and some of them are very strong now as well, as I’m sure you all know, playing online at probably 9-dan level. And we plan to publish more details of the new version of AlphaGo in the next few months, in the same way. So we will reveal those technical details, and then other teams and academic labs will again be able to implement their own versions of this AlphaGo-Master architecture.
David Silver: So the answer to the technical question is that in this match AlphaGo is actually playing on a single machine on the Google cloud. This is quite different from last year, when we were using a distributed implementation spread across many machines in the Google cloud. Because we now have a much more powerful and efficient algorithm that works in a much simpler way, it is able to use about a tenth of the computation and still achieve even stronger results. So AlphaGo is playing on a single machine of the kind that will be available on the Google cloud to anyone who has access to it. And that machine is based on TPUs, Tensor Processing Units, which Google announced recently.
Demis Hassabis: We want to use AlphaGo, as I said, as a tool for the Go community to improve their knowledge of the game. We hope to, you know, release some details about the architecture we are using, and maybe also some of the games that AlphaGo plays against itself; we may make some announcements about this later in the week. But don't forget, the reason we are ultimately developing these technologies is also to use them more widely in areas of science and medicine, and to try to help human experts in those areas. So we have a lot of work ahead of us in the coming years.