I'm very, very suspicious of this statement. Unsupervised learning is generally...

seanwilson · on March 13, 2016

I can't help but feel that training it on previous human games is fair, as that seems equivalent to how humans are taught. You don't just explain the rules of Go to somebody and leave them to learn on their own without playing anyone or picking up tips that have been passed along for centuries.

moultano · on March 12, 2016

Even more importantly, the policy network that chooses which moves to explore must favor human-like moves to function correctly, because it has to explore the moves AlphaGo's opponent will actually play.

hyperpape · on March 13, 2016

That's not right. It just needs to choose equally good or better moves in playouts. It doesn't need to anticipate when its opponent plays bad moves, that's just a bonus. Basically: if you're good enough you don't need psychology, you just play the winning move.

asdfologist · on March 12, 2016

By that logic, AlphaGo would falter against an opponent who plays moves at random.

moultano · on March 12, 2016

I don't think that follows. To beat the machine, a move must be both unpredicted and profitable. Random moves are not profitable. Training purely by reinforcement learning rather than on human games could produce a policy network that ignores more profitable subtrees than the current one does. In short, it isn't enough for the AI to be good at playing itself; it has to be good at playing every possible player, and while it is playing humans, it is sufficient for it to be good at playing every human player.
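A toy sketch of why a low-prior move can be ignored: in a PUCT-style search (similar in spirit to the selection rule described in the AlphaGo paper, but simplified here), each candidate move gets score Q + c·P·√N_total / (1 + N_move), where P is the policy network's prior. The priors, the fixed "true" move values, `c_puct` and the budget below are all hypothetical, and there is no real tree, just one level of candidate moves.

```python
import math


def puct_visits(priors, values, c_puct=1.0, budget=200):
    """Simulate one-level PUCT selection.

    priors: hypothetical policy-network probabilities per move.
    values: hypothetical fixed payoff returned each time a move is tried.
    Returns how many times each move was visited.
    """
    n = len(priors)
    visits = [0] * n
    total_value = [0.0] * n
    for _ in range(budget):
        n_total = sum(visits)

        def score(i):
            # Mean value so far (0 if never visited) plus the prior-weighted bonus.
            q = total_value[i] / visits[i] if visits[i] else 0.0
            u = c_puct * priors[i] * math.sqrt(n_total) / (1 + visits[i])
            return q + u

        best = max(range(n), key=score)  # ties go to the lowest index
        visits[best] += 1
        total_value[best] += values[best]
    return visits


# Move 2 is actually the best (value 0.9) but the policy barely rates it.
print(puct_visits([0.6, 0.39, 0.01], [0.5, 0.4, 0.9]))
# With a larger prior, the same move does get explored.
print(puct_visits([0.4, 0.3, 0.3], [0.5, 0.4, 0.9]))
```

With a prior of 0.01, the unvisited move's score is at most 0.01·√N, which stays below the 0.5 that move 0 always scores until N reaches 2500, so it is never tried within this budget. That is the "ignored profitable subtree" effect in miniature.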

asdfologist · on March 12, 2016

But in this case, the training data consists of human games, which are flawed. Supervising with flawed data can have unpredictable results.

pixl97 · on March 12, 2016

That depends on why humans make mistakes. If human flaws are mostly errors caused by failures of our meat (stress, lack of focus, jittery nerves) that keep us from reading deeply, then the algorithm will simply take the good parts of each game, and with its superior ability to look deep into the future, the result is predictable: the machine will win every time.

seanwilson · on March 13, 2016

I think flawed is a strong word. It's likely humans are stuck in locally optimised strategies though.