I'm very, very suspicious of this statement. Unsupervised learning is, generally speaking, not as advanced as supervised learning.

While unsupervised learning might be the holy grail, this victory is really about deep learning, which is an advanced supervised learning technique.




I can't help but feel training it with previous human games is fair as that seems the equivalent of how humans are taught. You don't just explain the rules of Go to somebody and leave them to learn on their own without playing anyone or picking up tips that have been passed along for centuries.


Even more importantly, the policy network that chooses which moves to explore must choose human-like moves in order to function correctly, because it has to explore the moves that AlphaGo's opponent is actually likely to play.


That's not right. It just needs to choose equally good or better moves in playouts. It doesn't need to anticipate its opponent's bad moves; that's just a bonus. Basically: if you're good enough you don't need psychology, you just play the winning move.


By that logic, AlphaGo would falter against an opponent who plays moves at random.


I don't think that follows. To beat the machine, a move must be both unpredicted and profitable. Random moves are not profitable. Training purely by reinforcement learning rather than on human games could create a policy network that ignores more profitable subtrees than the current one does. In short, it isn't good enough for the AI to be good at playing itself; it has to be good at playing every possible player, and as long as it is playing humans it is sufficient for it to be good at playing every human player.
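
To make the "unpredicted and profitable" point concrete, here is a toy sketch. Everything in it is made up for illustration: the move names, the priors, the values, and the hard top-k cutoff. As far as I understand, the real system uses the policy output as a soft bias on the search rather than a hard cutoff, so treat this as a simplification of the idea, not a description of AlphaGo.

```python
# Toy illustration only: hypothetical move names, priors, and values.
# true_value: how good each candidate opponent move really is for the opponent.
true_value = {"a": 0.2, "b": 0.5, "c": 0.9, "d": -0.7}
# prior: how likely the (hypothetical) policy network thinks each move is.
prior = {"a": 0.50, "b": 0.40, "c": 0.06, "d": 0.04}


def explored_moves(prior, k):
    """Return the k moves with the highest prior; the rest are pruned."""
    return sorted(prior, key=prior.get, reverse=True)[:k]


def blind_spots(prior, true_value, k):
    """Pruned moves that are better than anything the search looked at."""
    seen = explored_moves(prior, k)
    best_seen = max(true_value[m] for m in seen)
    return [m for m in prior if m not in seen and true_value[m] > best_seen]


print(explored_moves(prior, 2))           # ['a', 'b']
print(blind_spots(prior, true_value, 2))  # ['c']
```

Move "d" is unpredicted but bad, so pruning it costs nothing; that is the random-opponent case. Move "c" is unpredicted and good, and that is the only kind of move the pruning actually hurts.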


But in this case, the training data consists of human games, which are flawed. Supervising with flawed data can have unpredictable results.


That depends on why humans make mistakes. If human flaws are mostly failures of our meat (stress, lack of focus, jittery nerves) that keep us from reading deeply, then the algorithm can easily pick up the good points in each game, and with its superior ability to look deep into the future the result is predictable: the machine will win every time.


I think flawed is a strong word. It's likely humans are stuck in locally optimised strategies though.
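
For anyone unfamiliar with what "stuck in a local optimum" means, a minimal sketch with a made-up one-dimensional landscape (the `heights` list and the `hill_climb` helper are hypothetical and have nothing to do with Go itself): a greedy climber only ever moves to a better neighbour, so where it ends up depends on where it started.

```python
def hill_climb(f, x, step=1, lo=0, hi=10):
    """Greedy ascent on integers: move to a better neighbour until none exists."""
    while True:
        neighbours = [n for n in (x - step, x + step) if lo <= n <= hi]
        best = max(neighbours, key=f)
        if f(best) <= f(x):
            return x  # no neighbour improves on x, so stop here
        x = best


# Hypothetical landscape: a small peak at x=2 and a higher peak at x=8.
heights = [0, 2, 3, 2, 1, 0, 1, 4, 9, 4, 0]


def f(x):
    return heights[x]


print(hill_climb(f, 1))  # 2
print(hill_climb(f, 6))  # 8
```

Starting at 1 the climber settles on the small peak and never finds the higher one, even though every individual step it took was an improvement.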



