Applying the Monte Carlo Method to Game (Go) AI

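I found this article recommended in the RoundUp for week 17 of 2008 on aigamedev.com. I am interested in Chinese chess (xiangqi) and in computer game playing, so although I know nothing about Go or Go AI, I read the article carefully and found that it differs from what I already knew about computer game playing. I translated it in order to understand it better myself. My English is quite poor, so corrections to the translation are welcome.

Also, the author of this article appears to be Chinese; I really hope he will write up his research in Chinese as well in the future.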

＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝

Monte Carlo Method in Game AIs

Friday, April 25th, 2008 12:28 pm

Written by: qqqchn

Translated by: 赖勇浩 (恋花蝶)

Original article: http://expertvoices.nsdl.org/cornell-cs322/2008/04/25/monte-carlo-method-in-game-ais/

As many of my classmates have posted, the Monte Carlo method is not a single method but an entire class of methods that take random samples to find a result. An interesting application my partner and I found for it was in one of the Go AIs we built for another project. (Go is an ancient Chinese board game that is still very popular in East Asia today; its rules and details are covered in the Go article listed in the citations.)

One reason we chose the Monte Carlo method is that the immense number of possible moves in Go makes the Minimax algorithm far too computationally intensive when looking more than 2 or 3 moves ahead. Minimax is one of the more common methods for finding the next “best” move in game AIs such as chess: it alternately maximizes and minimizes a player’s score down to a certain depth (see the Minimax article in the citations). Looking only 4 moves ahead on a mere 9×9 board takes about 81^4 board evaluations, which is 43,046,721 and so well over 4 million. A quote that illustrates the computational intensity of Go on a full 19×19 board is that “the number of possible Go games far exceeds the number of atoms in the universe” (details and derivation are in the Go complexity article in the citations). Interesting facts: the lower bound on the number of possible Go games on a 19×19 board is about 10^(10^48), and the upper bound is 10^(10^171).
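The look-ahead arithmetic can be checked with a few lines of Python. This is only a rough sketch: the function name `positions_at_depth` is made up for illustration, and it ignores captures, passes, and illegal moves, so it counts raw move sequences, not legal Go positions.

```python
def positions_at_depth(points: int, depth: int, allow_reuse: bool = True) -> int:
    """Count move sequences of the given depth on a board with `points` intersections."""
    total = 1
    for ply in range(depth):
        # With reuse, every ply has `points` choices (the 81^4 style estimate).
        # Without reuse, each stone placed removes one empty intersection.
        total *= points if allow_reuse else points - ply
    return total


if __name__ == "__main__":
    for depth in range(1, 5):
        print(depth,
              positions_at_depth(81, depth),
              positions_at_depth(81, depth, allow_reuse=False))
```

Even the tighter count that forbids playing on an occupied point stays near 40 million at depth 4 on a 9×9 board, so the conclusion in the text does not depend on which estimate is used.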

So another way we evaluated how “good” a move is was to use the Monte Carlo method. To estimate how good or bad a certain move is for a given board position, the method plays “virtual games” that show what would happen if two Random AIs (AIs playing completely randomly) played on from that move. It starts from the board position and plays each viable move in a fixed number of games, with all subsequent moves chosen completely at random. After all of the “virtual games” are finished, we average the final scores of the games spawned by each original move and let that average represent the move’s “goodness”. Finally, the Monte Carlo AI plays the move with the highest average score in the actual game, on the assumption that moves which score better over a large number of random games are “better” moves in general.
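The procedure described above can be sketched as follows. This is not the authors' code: to keep the example short and runnable, tic-tac-toe stands in for Go, the playout score is +1 for a win, 0 for a draw, and -1 for a loss, and all names (`monte_carlo_move`, `random_playout`, and so on) are hypothetical.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def legal_moves(board):
    """Indices of all empty cells."""
    return [i for i, cell in enumerate(board) if cell == ' ']


def random_playout(board, to_move, me, rng):
    """Play uniformly random moves to the end; return +1, 0, or -1 from `me`'s view."""
    board = list(board)
    while winner(board) is None and legal_moves(board):
        board[rng.choice(legal_moves(board))] = to_move
        to_move = 'O' if to_move == 'X' else 'X'
    w = winner(board)
    return 0 if w is None else (1 if w == me else -1)


def monte_carlo_move(board, me, games_per_move=500, rng=None):
    """Return (best_move, averages): the move with the highest mean playout score."""
    rng = rng if rng is not None else random.Random()
    opponent = 'O' if me == 'X' else 'X'
    averages = {}
    for move in legal_moves(board):
        after = list(board)
        after[move] = me  # try the candidate move, then let random play finish the game
        total = sum(random_playout(after, opponent, me, rng)
                    for _ in range(games_per_move))
        averages[move] = total / games_per_move
    return max(averages, key=averages.get), averages


if __name__ == "__main__":
    best, averages = monte_carlo_move(list("XX OO    "), 'X')
    print(best, averages)
```

Every candidate move gets the same number of random games here, which is the flat scheme the post describes; there is no search tree and no attempt to spend more playouts on promising moves.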

For our project, we let our AI play about 500 virtual games for each move. On slower computers this can take a while, but it is still far faster than using the Minimax algorithm to look ahead just 4 moves (just over 1 million evaluations compared to more than 4 million). The results are also pretty good: the Monte Carlo AI can generally defeat most of our other AIs (the Minimax AI looking 2 or 3 moves ahead and the Random AIs), and it even put up a decent fight against some beginner human players.

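This translation was first published on the blog of 赖勇浩 (恋花蝶), http://blog.csdn.net/lanphaday. If you repost it, please keep the full text intact; it may not be used commercially without permission.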

Worth noting is that one very important factor in how well the Monte Carlo method works here is the scoring function used to decide a player’s score for a given board position. The one we used is straightforward and relatively simple: it assigns each empty spot to whoever has the closest stones to that spot, with ties broken by the number of stones near it. This isn’t the most accurate or effective scoring method, but it worked well enough for our purposes.
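One possible reading of that scoring rule is sketched below. The post does not say how distance or “near” is defined, so several choices here are assumptions: distance is Manhattan distance, a tie is broken by how many of a player's stones sit at that closest distance, a point that is still tied stays neutral, and each stone on the board counts as one point for its owner. The function names are hypothetical.

```python
def nearest_owner(r, c, stones):
    """Owner of the empty point (r, c), or None if no one can claim it."""
    best = {}
    for color, points in stones.items():
        if not points:
            continue
        dists = [abs(r - pr) + abs(c - pc) for pr, pc in points]  # Manhattan distance
        nearest = min(dists)
        # Smaller distance wins; on equal distance, more stones at that distance wins.
        best[color] = (nearest, -dists.count(nearest))
    if not best:
        return None
    ranked = sorted(best.items(), key=lambda item: item[1])
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # still tied after the tie-break: leave the point neutral
    return ranked[0][0]


def score_board(board):
    """Score a board given as rows of 'B', 'W', and '.'; returns {'B': n, 'W': m}."""
    stones = {'B': [], 'W': []}
    for r, row in enumerate(board):
        for c, cell in enumerate(row):
            if cell in stones:
                stones[cell].append((r, c))
    # Assumption: every stone on the board is worth one point to its owner.
    score = {color: len(points) for color, points in stones.items()}
    for r, row in enumerate(board):
        for c, cell in enumerate(row):
            if cell == '.':
                owner = nearest_owner(r, c, stones)
                if owner is not None:
                    score[owner] += 1
    return score


if __name__ == "__main__":
    print(score_board(["B.W",
                       ".B.",
                       "..."]))
```

A rule like this is cheap, which matters when it runs at the end of every virtual game, but it knows nothing about captures or dead stones, which is consistent with the post's remark that it is not the most accurate method.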

The AI we developed using Monte Carlo methods was one of the better AIs we made, but it is still nowhere near the capabilities of a decently experienced amateur human player. In particular, the AI starts losing out near the end game, when tactics mean a lot more than overall strategy (which Monte Carlo and Minimax seem to do well at). And because we use random moves to play each “virtual game”, we can get very different results each time we run it, especially near the end game, where the outcome of a move really depends on the quality of the subsequent moves, which in this case are completely random.

Go is considered by many to be the most complicated game we know of to date, and it is very unlikely that we will come even marginally close to solving it anytime soon (want to even try writing out 10^(10^48)?). But it seems equally unlikely that people will give up trying anytime soon, given the tenacity humans have shown in the face of other “insurmountable” odds in the past (landing on the Moon…).

NOTE: when I say “random” in this post, I naturally mean the pseudorandom number generators computers use, which aren’t truly random but were more than close enough for our project.
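One practical consequence of using a pseudorandom generator is that a run can be made repeatable by fixing the seed, which helps when debugging an AI whose results otherwise change from run to run. A minimal sketch using Python's standard `random` module follows; the helper `random_moves` and the seed value are illustrative only and are not from the original project.

```python
import random


def random_moves(seed, n_moves=5, board_points=81):
    """Draw `n_moves` random point indices from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, independent of the global one
    return [rng.randrange(board_points) for _ in range(n_moves)]


if __name__ == "__main__":
    first = random_moves(seed=2008)
    second = random_moves(seed=2008)
    # The same seed reproduces the same sequence of "random" moves.
    print(first == second)
```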

CITATIONS:

http://en.wikipedia.org/wiki/Monte_Carlo_method

http://en.wikipedia.org/wiki/Go_%28board_game%29

http://en.wikipedia.org/wiki/Go_complexity

http://en.wikipedia.org/wiki/Minimax

GO AI Project CS478 (Gordon Briggs, Qin Chen). Unfortunately it is not finished yet, so we don’t really have any statistics to cite.

―――――――――――――――――――――――――――

Chinese-language reference:

Monte Carlo method: http://baike.baidu.com/view/480343.htm