AlphaGo Key Points
My understanding of AlphaGo's success.
It’s well known that machine learning needs massive amounts of good data to work: “data is the fuel of machine learning.” Go, like other virtual games (Atari Breakout, etc.), can be fully virtualized and played millions of times in a short period.
Real-world problems, however, generate data much more slowly because of physical limitations.
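The self-play advantage can be sketched in a few lines. This is a toy sketch with a trivial random “game”; the function names and the random outcome are hypothetical stand-ins, not AlphaGo's actual pipeline, which uses a full Go rules engine and neural-network players.

```python
import random

def play_random_game(num_moves=10):
    """Play one toy self-play game with random moves and return its record."""
    moves = [random.randrange(361) for _ in range(num_moves)]  # 19x19 = 361 points
    winner = random.choice(("black", "white"))  # placeholder outcome
    return moves, winner

def generate_dataset(num_games):
    """Generate (move record, outcome) training pairs from many fast virtual games."""
    return [play_random_game() for _ in range(num_games)]

# Because the game is fully virtual, scaling num_games up to millions is
# only a matter of compute, not of physical time.
dataset = generate_dataset(1000)
```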
According to this post:
The value network provides an estimate of the value of the current state of the game: what is the probability of the black player to ultimately win the game, given the current state?
The policy networks provide guidance regarding which action to choose, given the current state of the game.
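The contract between the two networks can be illustrated with placeholder functions. This is only a sketch of the interfaces described above; the real AlphaGo models are deep convolutional networks, and the names and uniform probabilities here are my own assumptions.

```python
def value_network(state):
    """Estimate the probability that the black player ultimately wins
    from this state (placeholder constant instead of a learned estimate)."""
    return 0.5

def policy_network(state):
    """Return a probability distribution over candidate moves for this state
    (uniform placeholder instead of learned move priors)."""
    legal_moves = state["legal_moves"]
    prior = 1.0 / len(legal_moves)
    return {move: prior for move in legal_moves}

# Hypothetical state with four legal moves.
state = {"legal_moves": [0, 1, 2, 3]}
move_priors = policy_network(state)
win_prob = value_network(state)
```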
The value network’s probability estimate is far more accurate than a human player’s judgment; a human has only a rough feel for the different sections of the board. That is AlphaGo’s biggest advantage.
Also, the policy network stays cold-blooded, methodically searching through the candidate positions without fatigue or emotion.
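How the two networks combine during the search can be sketched with a greedy one-step selection: score each candidate move by its policy prior times the value estimate of the resulting position. The real system runs Monte Carlo tree search over many simulations; all names here are hypothetical stand-ins for illustration.

```python
def select_move(state, policy_fn, value_fn, apply_move):
    """Pick the move with the highest prior-weighted value estimate."""
    best_move, best_score = None, float("-inf")
    for move, prior in policy_fn(state).items():
        score = prior * value_fn(apply_move(state, move))
        if score > best_score:
            best_move, best_score = move, score
    return best_move

# Toy stand-ins: three legal moves; move 2 has both the highest prior
# and leads to the best-valued position.
policy = lambda s: {0: 0.2, 1: 0.3, 2: 0.5}
value = lambda s: {0: 0.4, 1: 0.5, 2: 0.6}[s]
apply_move = lambda s, m: m  # the "state" after a move is just the move id here

best = select_move(None, policy, value, apply_move)  # -> 2
```

Unlike a human, this scoring loop evaluates every candidate the same way every time, which is the “cold-blooded” consistency noted above.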
Top software quality from DeepMind, plus Google’s computing infrastructure (especially the Tensor Processing Unit, TPU), greatly accelerated the training cycles.
Go is well suited to the technology and famous enough to have worldwide influence.
Appendix: Fun thinking about the Ke Jie vs. AlphaGo match 3
What if Ke Jie had cloned his own strategy from the second game and fixed his mistakes this time? I suspect AlphaGo would not have changed its behavior. This approach could have saved Ke Jie a lot of thinking time, letting him focus on the second half of the game. A bit boring for the audience, though.