
I think you're wrong here. From the Nature paper describing AlphaGo:

"We also tested against the strongest open-source Go program, Pachi, a sophisticated Monte Carlo search program, ranked at 2 amateur dan on KGS, that executes 100,000 simulations per move. Using no search at all, the RL policy network won 85% of games against Pachi."

AlphaGo does use MCTS, but it seems that most of its improvement actually comes from the deep reinforcement learning approach.
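
To make "no search at all" concrete: it just means acting straight from the policy network's output distribution instead of running any tree search. A rough sketch of that (hypothetical names and shapes, not anything from DeepMind's actual code):

    import numpy as np

    def select_move_no_search(policy_net, board_state, legal_mask):
        # policy_net returns one probability per board point (19x19 = 361).
        probs = policy_net(board_state)
        probs = np.where(legal_mask, probs, 0.0)  # zero out illegal moves
        return int(np.argmax(probs))              # greedy: take the most probable move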



I don't know whether this is true. If AlphaGo could do just as well without search, then why did it use search at all?

But in any case, I'm not necessarily disputing that. I'm specifically refuting the claim that the AlphaGo architecture is identical to the one that learned to play Atari games and that Deepmind have advertised as a general game-playing agent.

My comment here is specifically in reply to the GP who repeated this claim, but I'll dig up the relevant link if you're interested.


Oh, fair enough. There are certainly differences; it's definitely not the exact same architecture. They are both using Deep Reinforcement Learning, but e.g. AlphaGo benefits from getting an explicit representation of the board state and game rules rather than having to learn them.

Hassabis has said that in the next few months they want to try and get up to current AlphaGo performance without using any MCTS at all.


The idea is that even though the policy network can propose what it considers the best possible move, the tree search is done to refine that choice and to "evaluate" it. This is why a value network is derived from the policy network and used in conjunction with MCTS, to make sure that the moves AlphaGo picks are actually good ones.
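
Concretely, the Nature paper describes scoring leaf positions in the search tree by mixing the value network's estimate with the result of a fast rollout. A rough sketch of that mixing step (function names are illustrative; lam=0.5 is the mixing constant reported in the paper):

    def evaluate_leaf(value_net, rollout_policy, leaf_state, lam=0.5):
        # Combine the value network's estimate with a fast rollout outcome,
        # as in the paper: V = (1 - lam) * v(s) + lam * z.
        v = value_net(leaf_state)                   # learned estimate in [-1, 1]
        z = rollout_policy.play_to_end(leaf_state)  # +1 win / -1 loss from a quick playout
        return (1.0 - lam) * v + lam * z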


It is necessary to make the alternatives in the tree easily comparable, and nothing is easier to compare than scalars. They could instead train a network that compares two positions and decides which one is superior, but that would require much more computation. Another alternative would be to learn the value jointly with the action selection, but that would probably also be harder both to train and to evaluate.
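
To make the cost argument concrete: ranking n candidates by a scalar value takes n network calls, while a pairwise comparison network needs on the order of n^2 calls to get a full ordering. A toy sketch (hypothetical functions, not anything AlphaGo actually does):

    def rank_by_scalar(value_net, positions):
        # One forward pass per candidate: O(n) evaluations.
        return sorted(positions, key=value_net, reverse=True)

    def rank_by_pairwise(compare_net, positions):
        # compare_net(a, b) > 0 means "a is better than b".
        # A full ordering needs every pair: O(n^2) evaluations.
        wins = {id(p): 0 for p in positions}
        for i, a in enumerate(positions):
            for b in positions[i + 1:]:
                if compare_net(a, b) > 0:
                    wins[id(a)] += 1
                else:
                    wins[id(b)] += 1
        return sorted(positions, key=lambda p: wins[id(p)], reverse=True)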


Where did he say "identical"?


No one really said that AlphaGo is identical to the Atari-playing AI from a while ago. What Deepmind did say is that that agent was a general game-playing agent. If it was, then why was it not employed against Lee Sedol? Well, because it wasn't.



