Demis Hassabis: You know, that’s why we are here for the summit: we want to discover whether a great player like Ke Jie can find weaknesses that we don't know about, and that even AlphaGo doesn't know from playing against itself. Of course, when we played our match against Lee Sedol, in Game 4 Lee Sedol, with his brilliant creativity, found a weakness and managed to win that game, and it was very interesting for us to see this gap in AlphaGo's knowledge. So over the last year we went back to improve the architecture and the system, and to let it learn more from playing against itself, to see if we could close that knowledge gap. We believe we have fixed that knowledge gap, but of course there could be many other areas that it doesn't know about, and that we don't know either. And that’s why we are here: to see if they can be discovered.
Demis Hassabis: So AlphaGo always tries to maximize its probability of winning rather than the size of the winning margin. So whenever it has a decision to make, it will always try to pick what it thinks is the more certain path to victory, with less risk. Often in these positions, the tradeoff AlphaGo is making is between how certain it is about the margin of victory and how high the probability of victory is. David, if you want to add anything to that.
David Silver: So…it’s a very interesting question. The way AlphaGo works, as Demis said, is that it maximizes the probability of winning the game. This means that we program a goal into AlphaGo, and that goal is, in essence, what we really want it to do, which is to try and win games of Go. You could imagine other objectives being applied, such as maximizing the margin of victory, but this is not the objective that we chose for AlphaGo. So if you really focus on victory, then it leads to these behaviors where AlphaGo will try to win, and in doing so, it may give up a number of points simply to reduce any risk it perceives, even if that risk seems to be very small.
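[Editor's note: a toy numerical sketch of the point being made here, that a win-probability objective and a winning-margin objective pick different moves. The move names and numbers are invented for illustration; this is not AlphaGo's code.]

```python
# Each hypothetical candidate move: (probability of winning, margin in points if it wins)
moves = {
    "safe_move":       (0.98, 1.5),   # almost certain win, tiny margin
    "aggressive_move": (0.90, 20.0),  # riskier, much larger margin
}

def pick_by_win_probability(moves):
    # AlphaGo-style objective: choose the move most likely to win, ignoring margin
    return max(moves, key=lambda m: moves[m][0])

def pick_by_expected_margin(moves):
    # Alternative objective: choose the move with the largest expected margin
    # (losses contribute 0 points in this simplified model)
    return max(moves, key=lambda m: moves[m][0] * moves[m][1])

print(pick_by_win_probability(moves))   # -> safe_move
print(pick_by_expected_margin(moves))   # -> aggressive_move
```

Under the win-probability objective the safe move wins out even though its margin is only 1.5 points, which mirrors the behavior described above: giving up points to reduce perceived risk.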
Demis Hassabis: I think the way to think about this is that Go is this amazing subject that has almost limitless possibilities. As I said in my opening talk, I see AlphaGo as a tool for Go players and the Go community to use, to explore the mysteries and truths of Go and find out more. And I hope the Go players have enjoyed the last year, including these matches and the matches online, the Master series, and I hope it has contributed to improving our understanding of this amazing game. So I see it as a tool that great players like Ke Jie and Lee Sedol can use to discover more about the game that we all love.
Demis Hassabis: I’m not sure if I understand the question correctly, but… obviously AlphaGo initially learns from human games, and most of its learning now comes from playing against itself. But of course, to truly test what it knows, we have to play it against human experts, because playing against itself is not going to expose the weaknesses it doesn't know about; it will obviously fix the ones it finds during self-play. So we really have to test it against the world’s best players.
David Silver: Perhaps I could just add to that. One of the innovations of AlphaGo-Master is that it relies much more on learning from itself. In this version, AlphaGo has actually become its own teacher, learning from moves taken from examples of its own searches, and it relies much less on human data than previous versions. And one of our goals in doing so is to make it more and more general, so that its principles can be applied to other domains beyond Go.
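[Editor's note: a minimal sketch of the "own teacher" idea described above, using an invented tabular policy rather than a neural network, and a trivial stand-in for the search. This is an assumed illustration of the training pattern, not DeepMind's implementation.]

```python
def search(position, policy):
    """Stand-in for a tree search guided by `policy`: returns a move choice
    for `position`. Here it trivially picks the policy's current best move;
    a real system would refine the choice with lookahead."""
    return max(policy[position], key=policy[position].get)

def train_step(policy, position, target_move, lr=0.1):
    """Nudge the policy's preferences toward the move the search selected,
    so the search output acts as the training target (the 'teacher')."""
    for move in policy[position]:
        target = 1.0 if move == target_move else 0.0
        policy[position][move] += lr * (target - policy[position][move])

# Tiny tabular "policy": position -> move preferences (made-up values)
policy = {"empty_board": {"corner": 0.4, "center": 0.6}}

for _ in range(50):                  # self-play / training loop
    pos = "empty_board"
    best = search(pos, policy)       # the search produces a teaching signal
    train_step(policy, pos, best)    # the policy learns from its own search

print(policy["empty_board"])         # preferences sharpen toward one move
```

The loop shows the shape of the idea: search output becomes training data for the policy, with no human games involved, which is what makes the approach more general.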
David Silver: So maybe I can answer the first part of that question, regarding the technology inside AlphaGo. AlphaGo-Master is a new version of AlphaGo, and we worked very hard to improve the fundamental algorithm that it uses. In fact, it turns out that the algorithm often matters more than the amount of data or the amount of computation that goes into it. And if you get the algorithms right, and make them general and powerful enough, then they can really progress very rapidly. In fact, AlphaGo-Master uses ten times less computation, and was trained in weeks rather than months, compared to the version that played against Lee Sedol last year. So it is a different version, and at least in self-play performance it is considerably stronger. And we are here to find out whether it is indeed as strong as it seems in self-play, or whether it has weaknesses that can be exposed.
Demis Hassabis: As for the second part of the question, I’ll just answer that. Later in the event we will be announcing the next steps for AlphaGo, so I don't want to say anything in advance of that, but we will be talking about it later in the week. But one thing I do want to say: with the last version of AlphaGo, we published all the technical details and results of the program in the scientific journal Nature. That allowed other companies, you know, Tencent and Japanese companies, to make their own versions of AlphaGo, and some of them are very strong now as well, as I’m sure you all know, playing online at probably 9-dan level. And we plan to publish more details of the new version of AlphaGo in the next few months, in the same way. So we will reveal those technical details, and then other teams and academic labs will again be able to implement their own versions of this AlphaGo-Master architecture.
David Silver: So the answer to the technical question is that in this match AlphaGo is actually playing on a single machine on the Google cloud. This is quite different from last year, when we were using a distributed implementation spread across many machines in the Google cloud. Because we now have a much more powerful and efficient algorithm that works in a much simpler way, it is able to use about a tenth of the computation and still achieve even stronger results. So AlphaGo is playing on a single machine of the kind that will be available on the Google cloud to anyone who has access to it. And that machine is based on TPUs, Tensor Processing Units, which Google announced recently.
Demis Hassabis: We want to use AlphaGo, as I said, as a tool for the Go community to improve their knowledge of the game. We hope to, you know, release some details about the architecture we are using, and maybe also some of the games that AlphaGo plays against itself; we may make some announcements about this later in the week. But don't forget, the reason we are ultimately developing these technologies is also to use them more widely in areas of science and medicine, and to try to help human experts in those areas. So we have a lot of work ahead of us in the coming years.