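AlphaGo, which learned from records of human games, defeated human players and became a landmark of the age of artificial intelligence; AlphaGo Zero then made headlines again by learning from a "blank slate" (tabula rasa). The English empiricist philosopher John Locke (1632-1704) is known for his "blank slate" theory (theory of tabula rasa), which holds that at birth the mind is as empty as a blank slate and that ideas and knowledge appear in it only through experience; for Locke, experience is the sole source of ideas and knowledge. AlphaGo Zero's "blank slate" means an empty board, as opposed to records of human games, that is, "learning" that starts from zero. Locke's blank slate of the mind, by contrast, refers to a person coming to know and learn from real experience. The difference is that AlphaGo Zero needs no experience from human game records, only the "experience" of playing against itself on the board, and the subtlety of this difference lies in the question of how human experience differs in essence from a machine's "experience". Unlike the ethical challenge that AlphaGo posed to humanity, AlphaGo Zero's "blank slate" is a challenge to a philosophical question. These questions bear deeply on how we understand and define the nature of artificial intelligence; in fact they have already prompted a revision of our basic cognitive theory of human intelligence, and their significance goes far beyond the success of AlphaGo Zero itself.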
In a paper published in Nature, "Mastering the game of Go without human knowledge", the DeepMind team presented AlphaGo Zero, the latest version of its Go program, with a more powerful "learning" capability. AlphaGo Zero reportedly beat the version of AlphaGo that played Lee Sedol by 100 games to 0. (http://nature.com/articles/doi:10.1038/nature24270; an introduction in Chinese is available at http://mp.weixin.qq.com/s/68GTn-BaiRPmzi9F-0sCyw) The most striking statement is: "we introduce an algorithm based solely on reinforcement learning, without human data, guidance, or domain knowledge beyond game rules. AlphaGo becomes its own teacher."
(AlphaGo Zero, which has learned completely from scratch, from first principles, without using any human data, has achieved the highest level of performance overall. The most important idea in AlphaGo Zero is that it learns completely tabula rasa. That means it starts completely from a blank slate and figures out for itself only from self-play, without any human knowledge, without any human data, without any human examples or features or intervention from humans. It discovers how to play the game of Go completely from first principles. So tabula rasa learning is extremely important to our goals and ambitions at DeepMind. And the reason is that if you can achieve tabula rasa learning, you really have an agent that can be transplanted from the game of Go to any other domain. You untie yourself from the specifics of the domain you're in and you come up with an algorithm which is so general that it can be applied anywhere. For us the idea of AlphaGo is not to go out and defeat humans, but actually to discover what it means to do science, and for a program to be able to learn for itself what knowledge is. So, what we started to see was that AlphaGo Zero not only rediscovered the common patterns and openings that humans tend to play, these joseki patterns that humans play in the corners. It also learned them, discovered them and ultimately discarded them in preference for its own variants which humans don't even know about or play at the moment. And so we can say that really what's happened is that in a short space of time, AlphaGo Zero has understood all of the Go knowledge that has been accumulated by humans over thousands of years of playing. And it's analyzed it and started to look at it and discover much of this knowledge for itself. And sometimes it's chosen to actually go beyond that and come up with something which humans hadn't even discovered in this time period. And developed new pieces of knowledge which were creative and novel in many ways.)
DeepMind stresses that AlphaGo Zero teaches itself starting from a blank slate. This means that when the machine enters training or actual play, it does not begin by learning from huge amounts of human data ("People tend to assume that machine learning is all about big data and massive amounts of computation"). But AlphaGo Zero at that point is not itself a blank slate (a bare machine), nor a clean machine containing only an "operating system"; it is a machine already equipped with a powerful capacity for machine learning. As David Silver put it: "But actually what we saw in AlphaGo Zero is that algorithms matter much more than either compute or data availability. In fact in AlphaGo Zero, we use more than an order of magnitude less computation than we used in previous versions of AlphaGo. And yet it was able to perform at a much higher level due to using much more principled algorithms than we had before." It is precisely because AlphaGo Zero has this "innate" learning capability that it can teach itself from the very start.
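At this year's French Science Festival, the Université de Picardie Jules Verne presented Chinese Go, together with the cultural elements it contains, to the public in the form of a science presentation for the first time (Amiens "Science Festival": Go's journey from China to France). When showing visitors who had no idea what Go was how to start playing, the presenters used two on-site teaching methods. In the first, the most basic rules were explained first and then the learners placed stones; every move cost the participants real effort, and where to put the very first stone was a great puzzle. In the second, the visitors first placed stones freely and then, playing alongside an instructor, learned step by step which moves were acceptable. Clearly the second method not only let visitors with no prior knowledge of Go start playing at once, it also gave them a basic idea of what Go is: they grasped that the stones they had placed at random were full of subtlety, and they became interested in the game. This led us to think further. Although the rules of Go are simple, learning the rules and gaining direct experience on the board are very different things for a beginner's cognition and learning. The rules express the design ideas behind Go as a game, while placing stones directly on the board is experiential action within an existing game world; the former embodies human knowledge, the latter is experience gained as a player in the game. For a beginner, the latter is learning through experience on the board. For this reason, we examine what is special about the Go board.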
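The Game of Life, a cellular automaton invented by the British mathematician John Horton Conway, who studied Go in depth from a mathematical point of view, is an iterative process over configurations of positions on an orthogonal grid: each iteration of the configuration is determined by the mutual constraints between a chosen position and its neighboring positions. This iteration is algorithmically effective (it can be programmed), yet the complex planar configurations it produces can take the form of orderly patterns, and such complex phenomena, which cannot be foreseen in advance, attract attention much as the emergence and evolution of life does (note that there is an illusion here: the patterns of the Game of Life on the screen are still produced by a machine in algorithmic form). Because the number of combinations of positions on the plane grows exponentially, no Game of Life program that has been designed and can actually be run is able to exhaust them. This has led to a misunderstanding: that given unlimited space and time, the Game of Life could evolve arbitrarily complex things. In reality there is no point in pursuing research on such a scale. The Game of Life is an example of an algorithm that expresses the geometric properties of the planar orthogonal grid as patterns, but as an electronic "game" it can only be called a form of intellectual entertainment.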
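To make the iteration described above concrete, here is a minimal Python sketch. It assumes the standard rule of Conway's Game of Life (often written B3/S23: a dead cell with exactly three live neighbours becomes live, a live cell with two or three live neighbours stays live, and every other cell is dead in the next generation) on an unbounded grid, with the eight surrounding cells counted as neighbours. The function names `step`, `run` and `count_configurations` are illustrative and not taken from any library.

```python
from collections import Counter
from typing import Set, Tuple

Cell = Tuple[int, int]  # (x, y) grid coordinates; y grows downward


def step(live: Set[Cell]) -> Set[Cell]:
    """Compute the next generation under the B3/S23 rule on an unbounded grid."""
    # For every cell adjacent to at least one live cell, count how many of
    # its eight surrounding cells are live.
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 live neighbours; survival on 2 or 3 live neighbours.
    # Live cells with no live neighbours never appear in `counts`, so they die.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}


def run(live: Set[Cell], generations: int) -> Set[Cell]:
    """Apply `step` repeatedly; the whole evolution is fixed by the rule and the start."""
    for _ in range(generations):
        live = step(live)
    return live


def count_configurations(n: int) -> int:
    """Number of distinct live/dead patterns on an n x n patch: 2 ** (n * n)."""
    return 2 ** (n * n)


if __name__ == "__main__":
    blinker = {(0, 1), (1, 1), (2, 1)}  # period-2 oscillator
    glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}  # shifts one cell diagonally every 4 steps
    print(sorted(step(blinker)))  # [(1, 0), (1, 1), (1, 2)]
    print(run(glider, 4) == {(x + 1, y + 1) for (x, y) in glider})  # True
    print([count_configurations(n) for n in (3, 5, 10)])
```

Every pattern this sketch produces is fixed entirely by the rule and the starting set, which is the point made above about on-screen patterns being generated algorithmically. `count_configurations` only illustrates the exponential growth of positional combinations: with one bit per cell, even a 10 x 10 patch already admits 2**100 patterns.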