Hacker News

In a recent interview [1], Hassabis (DeepMind founder) said they'd try training AlphaGo from scratch next, so it learns from first principles, without the bootstrapping step of "learn from a database of human games", which introduces human prejudice.

As a Go player, I'm really excited to see what kind of play will come from that!

[1] http://www.theverge.com/2016/3/10/11192774/demis-hassabis-in...



A case of life imitating AI koans:

Uncarved block

In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6. "What are you doing?", asked Minsky. "I am training a randomly wired neural net to play Tic-tac-toe", Sussman replied. "Why is the net wired randomly?", asked Minsky. "I do not want it to have any preconceptions of how to play", Sussman said. Minsky then shut his eyes. "Why do you close your eyes?" Sussman asked his teacher. "So that the room will be empty." At that moment, Sussman was enlightened.

(It seems based on a true story https://en.wikipedia.org/wiki/Hacker_koan )


It sounds like the point of this story is to illustrate by analogy that starting from first principles is sometimes a silly way to approach a problem, and by extrapolation that it's a silly way to make an AI that plays Go well.

Making an AI that plays Go well is not (and has never been) the real goal. They're trying to learn how to build an AI that can solve any problem.


I don't think that's the point of the story. In the story, Sussman says that because the initial state of his net was randomized, it will "have no preconceptions". But that's not true. It still has "preconceptions", just randomly chosen ones. That Sussman didn't know what they were didn't mean they didn't exist, any more than closing your eyes means the room is empty.


The Taoist concept of the uncarved block, referenced in the title of the koan, refers to naturalness and simplicity. I'm sure someone more expert than me can give a better explanation but it seems highly relevant to the idea of learning to play Go based only on the rules, rather than any human tradition of strategy.

http://taomanor.org/pu.html


The actual quote: "Sussman told Minsky that he was using a certain randomizing technique in his program because he didn't want the machine to have any preconceived notions" makes no indication it was a neural net ;)



Both the Jargon file quote and a citation from a published book (which has the quote the grandparent posted) are listed on the Wikipedia page:

https://en.wikipedia.org/wiki/Hacker_koan

Eric Raymond kinda butchered the Jargon File when he took over maintenance, so it wouldn't surprise me if some of the text there is invented. The original Jargon File does not contain any koans:

http://www.dourish.com/goodies/jargon.html


;)


It would be amazing if it could achieve the same level (or higher) without the bootstrapping.

The niggling thought in my mind was that AlphaGo's strength is built on human strength.


Human strength is also "built on human strength" so I don't see the problem? :)


Well, yes, but it's still humans standing on the shoulders of other humans. Even though human players do memorize opening books, it stays in the family so to speak. Meanwhile a human player facing an AI engine is battling both the AI, and great human players of the past (who invented the openings).


It's not truly artificial if it's using a human playbook. (Is the problem posed by the parent, I believe.)


What is 'truly artificial'?

Neural networks are modeled after biological systems to begin with; I don't think that's a meaningful concept at all.


Well, we can extend that to say the biological systems are self-assembled randomly and selected through evolutionary algorithms, starting from random molecules on the sea floor.


Truly artificial means not using meatspace metaphors for reasoning like human players do.


I suspect you will only be satisfied when AIs play each other at an incomprehensible game of their own devising.


Making popcorn now.


I doubt it. When/if they do play such a game that humans can't explain, I'll probably be interested in some other problem.

Isn't that the nature of human endeavor? Always looking for the next challenge?


What if the "betaGo" played just AlphaGo, and learned from its games?

BTW: even humans don't just randomly pick up the game. They have teachers, who teach them the tricks of the trade and monitor their games.


That's already a known method to transfer "knowledge" from one model to another. I should double-check before quoting a paper, but I think that this one talks about this (http://arxiv.org/abs/1503.02531).

You train many models. Then you "distill" their predictions into one model by using the multiple predictions (from many models) as targets (for the single model trained afterwards).
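To make that concrete, here's a toy sketch of distillation in plain Python (the "teachers" and all numbers are made up for illustration; the actual paper distills softened softmax outputs from large nets): the ensemble's averaged predictions become soft targets, and a small "student" is fit to them by gradient descent on cross-entropy.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical "teacher" ensemble: each predicts a distribution over 3 moves.
teacher_preds = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.8, 0.1, 0.1],
]

# Soft targets: the average of the teachers' predictions.
soft_target = [sum(p[i] for p in teacher_preds) / len(teacher_preds)
               for i in range(3)]

# The "student" is just 3 free logits, trained to match the soft targets
# by gradient descent on cross-entropy (d/dlogit of CE is p - target).
logits = [0.0, 0.0, 0.0]
lr = 0.5
for _ in range(1000):
    p = softmax(logits)
    logits = [l - lr * (pi - ti) for l, pi, ti in zip(logits, p, soft_target)]

student = softmax(logits)  # ends up close to soft_target
```

A real student would of course be a network conditioned on the board state, but the target construction is the same idea.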

You're right to point out that humans don't do that.

I think it would be "cheating" to train BetaGo on AlphaGo, for the purposes of that experiment. The goal would be to have some kind of "clean room" where people fumble around.

Of course, you can also run the other experiment to see how fast you can bootstrap BetaGo from AlphaGo. That's also interesting.


I'm pretty sure that the reinforcement learning algorithm they are using is guaranteed to converge. It just takes a very long time to train, and using human games probably sped it up.


As far as I know, using neural networks for function approximation destroys the various convergence guarantees available. NNs can easily diverge and have catastrophic forgetting, and this is one of the things that made them challenging to use in RL applications despite their power, and why one needs patches like experience replay and freezing the networks.
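For illustration, here's a toy sketch of those two patches (everything below is a stand-in, not DeepMind's actual code): transitions go into a replay buffer and are sampled out of temporal order, and TD updates bootstrap from a periodically refreshed frozen copy of the value estimates instead of the constantly moving ones.

```python
import random
from collections import deque

random.seed(0)

class ReplayBuffer:
    """Stores transitions; random sampling decorrelates consecutive steps."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):                 # (state, action, reward, next_state)
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

# "Networks" are stand-ins here: dicts mapping state -> value estimate.
online = {s: 0.0 for s in range(5)}
target = dict(online)                          # frozen copy for bootstrapping

buf = ReplayBuffer(capacity=100)
for step in range(500):
    s = random.randrange(4)
    buf.add((s, 0, 1.0, s + 1))                # dummy transition with reward 1
    for (st, _a, r, ns) in buf.sample(8):
        # TD update bootstraps from the *frozen* target, not the moving one.
        td_error = r + 0.9 * target[ns] - online[st]
        online[st] += 0.1 * td_error
    if step % 50 == 0:                         # periodically refresh the frozen copy
        target = dict(online)
```

Without the frozen copy, the bootstrap target chases its own updates, which is one of the instabilities that made neural-net value functions diverge in practice.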


I believe the whole point of pretraining on reference policies (which a collection of "optimally" played human games is) is simply avoiding bad local optima.

It could be the case that training on just a learned policy gets you stuck in a local optimum of worse quality than the one reached with pretraining.

If they stored all of the AI-played games, their reference policy (the data) would be of extreme value. You could train a recurrent neural network, without any reinforcement learning, that you could probably run on a smartphone and beat all human players. You wouldn't need Monte Carlo tree search, either.

There are algorithms [1] that have mathematical guarantees of achieving local optimality from reference policies that might not themselves be optimal, and can even (experimentally) work better than the reference policy, assuming it isn't optimal. An RNN trained with LOLS would make jointly local decisions over the whole game, and each decision would be guaranteed to minimize future regret. Local optimality here doesn't mean finding a locally optimal model that approximates the strong reference policy; it means finding the locally optimal decisions (which piece to put where) without the need for search.

The problem is that these algorithms require a reasonably good reference policy, and given the small number of human-played Go games available, reinforcement learning was the main algorithm instead: it allowed them to construct a huge number of meaningful games, from which their system learned, which in turn allowed them to construct even more meaningful games, and so on.

But now that they have games from a pretty good reference policy (AlphaGo is definitely playing at a superhuman level), they can train a model on that reference policy and wouldn't need the search part of the algorithm at all.

The model would try to approximate the reference policy and would definitely be worse than AlphaGo's real search-based policy, but it wouldn't be significantly worse (mathematical guarantee). The model starts from a good player and tries to approximate that good player; reinforcement learning, on the other hand, starts from an idiot player and tries to become a good player, which is much, much harder.

[1]: http://www.umiacs.umd.edu/~hal/docs/daume15lols.pdf
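As a toy illustration of that last idea, here's the simplest possible form of cloning a reference policy by supervised learning, in plain Python (the "reference policy" below is invented for the example; no claim this resembles LOLS or AlphaGo internals): collect (state, move) pairs from the reference policy's games and fit a model to predict its moves.

```python
import random
from collections import Counter, defaultdict

random.seed(0)

def reference_policy(state):
    # Stand-in for a strong player: mostly deterministic, with 10% noise.
    return state % 3 if random.random() > 0.1 else random.randrange(3)

# Collect (state, move) pairs by "replaying" the reference policy's games.
counts = defaultdict(Counter)
for _ in range(5000):
    s = random.randrange(10)
    counts[s][reference_policy(s)] += 1

# The cloned model: for each state, predict the reference's most frequent move.
# (A real clone would be a neural net that generalizes to unseen states.)
cloned = {s: moves.most_common(1)[0][0] for s, moves in counts.items()}
```

The cloned model plays with no search at all at inference time; all the search effort was spent once, when the reference policy generated the games.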


I feel like an ant in the presence of giants.



