
I think you're wrong here. From the Nature paper describing AlphaGo:

"We also tested against the strongest open-source Go program, Pachi, a sophisticated Monte Carlo search program, ranked at 2 amateur dan on KGS, that executes 100,000 simulations per move. Using no search at all, the RL policy network won 85% of games against Pachi."

AlphaGo does use MCTS, but it seems that most of its improvement actually comes from the deep reinforcement learning approach.
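
To make "no search at all" concrete: it just means acting straight from the policy network's output distribution instead of running any tree search. A rough sketch of that (hypothetical names and shapes, not anything from DeepMind's actual code):

    import numpy as np

    def select_move_no_search(policy_net, board_state, legal_mask):
        # policy_net returns one probability per board point (19x19 = 361).
        probs = policy_net(board_state)
        probs = np.where(legal_mask, probs, 0.0)  # zero out illegal moves
        return int(np.argmax(probs))              # greedy: take the most probable move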



I don't know whether this is true. If AlphaGo could do just as well without search, then why did it use search at all?

But in any case, I'm not necessarily disputing that. I'm specifically refuting the claim that the AlphaGo architecture is identical to the one that learned to play Atari games and that Deepmind have advertised as a general game-playing agent.

My comment here is specifically in reply to the GP who repeated this claim, but I'll dig up the relevant link if you're interested.


Oh, fair enough. There are certainly differences; it's definitely not the exact same architecture. They are both using Deep Reinforcement Learning, but e.g. AlphaGo benefits from getting an explicit representation of the board state and game rules rather than having to learn them.

Hassabis has said that in the next few months they want to try and get up to current AlphaGo performance without using any MCTS at all.


The idea is that even though the policy network can propose what it considers the best possible move, the tree search is done to refine that choice and to "evaluate" it. This is why a value network is derived from the policy network and used in conjunction with MCTS, to make sure that the moves AlphaGo picks are actually good ones.
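
Concretely, the Nature paper describes scoring leaf positions in the search tree by mixing the value network's estimate with the result of a fast rollout. A rough sketch of that mixing step (function names are illustrative; lam=0.5 is the mixing constant reported in the paper):

    def evaluate_leaf(value_net, rollout_policy, leaf_state, lam=0.5):
        # Combine the value network's estimate with a fast rollout outcome,
        # as in the paper: V = (1 - lam) * v(s) + lam * z.
        v = value_net(leaf_state)                   # learned estimate in [-1, 1]
        z = rollout_policy.play_to_end(leaf_state)  # +1 win / -1 loss from a quick playout
        return (1.0 - lam) * v + lam * z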


It is necessary to make the alternatives in the tree easily comparable, and nothing is easier to compare than scalars. They could instead train a network that compares two positions and decides which one is superior, but that would require much more computation. Another alternative would be to learn the value jointly with the action selection, but that would probably also be harder both to train and to evaluate.
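
To make the cost argument concrete: ranking n candidates by a scalar value takes n network calls, while a pairwise comparison network needs on the order of n^2 calls to get a full ordering. A toy sketch (hypothetical functions, not anything AlphaGo actually does):

    def rank_by_scalar(value_net, positions):
        # One forward pass per candidate: O(n) evaluations.
        return sorted(positions, key=value_net, reverse=True)

    def rank_by_pairwise(compare_net, positions):
        # compare_net(a, b) > 0 means "a is better than b".
        # A full ordering needs every pair: O(n^2) evaluations.
        wins = {id(p): 0 for p in positions}
        for i, a in enumerate(positions):
            for b in positions[i + 1:]:
                if compare_net(a, b) > 0:
                    wins[id(a)] += 1
                else:
                    wins[id(b)] += 1
        return sorted(positions, key=lambda p: wins[id(p)], reverse=True)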


Where did he say "identical"?


No one really said that AlphaGo is identical to the Atari-playing AI from a while ago. What Deepmind did say is that that agent was a general game-playing agent. If it was, then why was it not employed against Lee Sedol? Well, because it wasn't.



