Hacker News
Deep Learning 2016: The Year in Review (deeplearningweekly.com)
225 points by buss_jan on Dec 31, 2016 | 54 comments



"I find it helpful to think of developments in deep learning as being driven by three major frontiers... Firstly, there is the available computing power and infrastructure, ... secondly, there is the amount and quality of the training data, and thirdly, the algorithms."

I'm really glad this was the approach taken, and his examples really show why these are the important axes for DL. It's hard to overstate the dependencies between these three frontiers. Radical breakthroughs in algorithms make it easier to get better results on existing infrastructure, better data on the same algorithms and infrastructure can transform results, and on and on.


I think having commonly used tools has had a big impact too. TensorFlow and the like have allowed people to share and easily build on each other's work, including shared models. Not unique to 2016, but I remember the struggle of trying to get hacked-together MATLAB code working, only to find it was actually slightly different from what must have been run for the paper. Now I'd expect a lot of work to come with a setup that anyone can easily grab and try themselves.


GANs seem really interesting, and a logical next step if I understand them right. As far as I understand, they essentially make the fitness function more detailed, leading to higher-quality output from the network as it gets more accurate good/bad feedback.


It's more a combination of two things: 1) finding a clever way to do unsupervised learning, and 2) the adversarial loss enables types of "loss functions" that wouldn't be possible with, say, MSE. It's almost like having a human as a loss function.
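
To make that concrete, here's a minimal sketch in PyTorch (the tiny G and D and the random "data" are made up just so it runs; this is not how you'd actually train a GAN):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Toy generator and discriminator, invented only to illustrate the two losses
    G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
    D = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

    x = torch.randn(32, 4)   # "real" samples
    z = torch.randn(32, 8)   # noise
    fake = G(z)

    # MSE: distance to one fixed target; blurry averages score well
    mse_loss = F.mse_loss(fake, x)

    # Adversarial: the generator is scored by a *learned* critic, so "good"
    # means "indistinguishable from real", not "close to a particular target"
    d_loss = F.binary_cross_entropy(D(x), torch.ones(32, 1)) + \
             F.binary_cross_entropy(D(fake.detach()), torch.zeros(32, 1))
    g_loss = F.binary_cross_entropy(D(fake), torch.ones(32, 1))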



There still seems to be "big fish" work to be done, like:

- Weather Prediction

- Stock Market Prediction

- Healthcare (Better prediction, better reading of diagnostic tools)

- Poker


>- Stock Market Prediction

I can assure you this is being done. Anything that works is kept secret.


Ditto. Understandably it's kept secret, but it would be nice to at least see some of the models presented outside the trading context. That said, the HFT world still primarily uses linear/logistic regression.


We see a lot of finance work (trend prediction in markets being one of these things). For weather forecasting, Moji (a huge weather app in China) does a lot of deep learning based on data collected from their mobile app.


Yandex.Weather relies heavily on deep convnets for short term forecasting.


http://healthcare.ai/ is an early-alpha tool for healthcare data using Python and R.


In chaotic systems (like the weather or the stock market) there is a fair share of randomness, which cannot be predicted no matter what approach we use.


Not to mention huge externalities. In the case of stock markets, things like financial crises, central bank decisions, and political events (Brexit, etc.).


Some of those externalities could be predicted by deep learning (the crash of the housing market, for example).


I think those could easily have been predicted using common sense ;)


Naive question, how many parameters do most large models have?


Low millions up to a max of 1B or so (excluding special cases like the "Outrageously Large Neural Networks" paper). The recent Google Translate RNN was in the neighborhood of 50M parameters, IIRC. It can be a little difficult to calculate, since you have a lot of weight sharing and tied weights going on in CNNs and other architectures.
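
If anyone wants to sanity-check numbers like these: counting is just summing the sizes of the weight arrays. A quick Keras sketch (the toy architecture is invented, not any of the models mentioned):

    import numpy as np
    from tensorflow.keras import layers, models

    # Toy convnet invented purely to show where a parameter count comes from
    m = models.Sequential([
        layers.Conv2D(64, 3, activation='relu', input_shape=(224, 224, 3)),
        layers.Conv2D(64, 3, activation='relu'),
        layers.GlobalAveragePooling2D(),
        layers.Dense(1000),
    ])

    print(m.count_params())                                      # built-in count
    print(sum(int(np.prod(w.shape)) for w in m.get_weights()))   # same thing by hand

    # Conv layers reuse (share) the same kernel at every spatial position,
    # which is why conv-heavy models stay small relative to their compute cost.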


So the largest are about the size of a bee brain? Amazing how many applications you can get from insect-brain-sized networks, about a hundred thousand times smaller than a human brain if a parameter is at all analogous to a synapse. Seems like human-level AI is a ways off, though.


A parameter != a neuron. A single neuron probably has thousands, if not hundreds of thousands, of "parameters" (of course the vast majority are probably unrelated to learning - but given we barely understand neurological learning it's hard to say).


I was comparing parameter counts to synapse counts, not neuron counts.


Please don't make these sorts of comparisons between biological neurons and our "neural networks". They're really not the same thing at all. We'd rather call them "differentiable networks", and IIRC, the human brain is supposed to have only 6 layers in the visual cortex.


Cortical layers are in no way equivalent to DNN layers.


The fair comparison is comparing the number of "free variables" in the system. This is how machine learning models are compared. You don't compare "number of free parameters in linear regression" with "number of layers of neural nets," you compare number of parameters with number of parameters.


Wouldn't that imply the opposite? If you can get these amazing, often human-level performances out of tiny, insect-brain-equivalent NNs - which I don't believe anyone predicted in advance - that suggests the estimates of thousands of parameters or megaflops per neuron (see the Whole Brain Emulation Roadmap for a compilation of estimates) are irrelevant, and that human-level AI requires much less computation than expected.


Naive trend extrapolation (assuming 1 synapse == 1 parameter) implies 17 iterations of Moore's law before we get human-level AI, or some sort of ASIC, I suppose. This is what made me think human-level AI is a ways off.
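
(Back-of-the-envelope check of that figure, assuming roughly 1e9 parameters for today's largest models and ~1e14 synapses:)

    import math

    params_today = 1e9    # upper end of current model sizes (per the thread above)
    synapses = 1e14       # ~100 trillion synapses in a human brain
    doublings = math.log2(synapses / params_today)
    print(doublings)      # ~16.6, i.e. roughly 17 Moore's-law doublings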

Brain-dead trend extrapolation is a pretty good prediction tool, but I suppose my assumption that modern NNs aren't more efficient than brains was sort of arbitrary, which makes the exercise useless. Is there any work on this?

I'm not sure what evidence the performance of these systems shows us, as bees are pretty smart for their size. These systems are about as optimized for lip-reading and translation as bees are for their various tasks, so it wouldn't surprise me if bees turned out to be similarly powerful per synapse to NNs.


17 years doesn't seem too far off. Tim Dettmers's detailed analysis puts it anywhere from 2037 to 2078, with his preference being the latter (or later).

http://timdettmers.com/2015/07/27/brain-vs-deep-learning-sin...

A key assumption I don't share with Dettmers is that we must match the brain's raw computational power to have a chance at human-level AI. From an evolutionary point of view, most of the brain's computations surely must be superfluous to intelligence and needed instead for a vast host of other biological functions.


> Naive trend-extrapolation (assuming 1 synapse == 1 parameter) implies 17 iterations of Moore's law before we get human level AI, or some sort of ASIC, I suppose.

I'm not sure how you get that. My point is that artificial NNs may well be more efficient than biological ones, because they are punching so far above their weight relative to what you would expect from a naive comparison of parameters, in which case it's not 17 iterations but a lot less. Further, modern NNs don't typically max out the memory of a single GPU (since we want to do minibatches of at least n=10 for the training phase, and researchers care a lot less about the forward passes or deployment), and groups like Google Brain or DeepMind have shown the ability to use ~1000 GPUs, so that's 4 orders of magnitude right there (doable with asynchronous training like synthetic gradients).
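
For the curious, here's a toy sketch of the asynchronous idea in plain Python/numpy: several workers update a shared parameter vector without waiting for each other (a Hogwild-style lock-free SGD sketch on a made-up linear-regression problem; synthetic gradients proper are a different, fancier mechanism):

    import threading
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    true_w = rng.normal(size=10)
    y = X @ true_w

    w = np.zeros(10)   # shared parameters; all workers read/write without locks

    def worker(seed, steps=5000, lr=0.01):
        local = np.random.default_rng(seed)
        for _ in range(steps):
            i = local.integers(len(X))
            grad = (X[i] @ w - y[i]) * X[i]   # single-sample squared-error gradient
            w[:] = w - lr * grad              # lock-free update; stale reads are tolerated

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(np.linalg.norm(w - true_w))   # should be close to 0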

> I'm not sure what evidence the performance of these systems shows us, as bees are pretty smart for their size.

What do bees do that is remotely as demanding as being able to translate between English, French, German, Japanese etc at very high quality?


The human brain has 100 trillion synapses. So it would be 17 iterations of Moore's law for computing power to increase 100,000x. That's what I was thinking. It was wrong to assume NNs are no more efficient than brains. You know way more about this than I do, so I'm just going to replace my opinion on this with yours. If you think we can chop off 4 orders of magnitude without the help of Moore, I believe you, though that's much less comforting than my prediction!

>What do bees do that is remotely as demanding as being able to translate between English, French, German, Japanese etc at very high quality?

This seems more impressive than anything any one NN can do, but less impressive than what all NNs can do: https://en.wikipedia.org/wiki/Bee_learning_and_communication


How about finding and landing on a flower while navigating in 3D space?


I read somewhere[1] that because we know far more now about human neurons than we did five years ago, modern CNNs are better compared to a _single_ neuron (and are probably still simpler than one). I'm far from an expert in brain biology, but what I took away from that is that it's specious to make these comparisons based on our current knowledge of either field.

[1]: http://timdettmers.com/2015/07/27/brain-vs-deep-learning-sin...


> modern CNNs are better compared to a _single_ neuron

You obviously cannot reach human-level performance on ImageNet, or do anything at all, with a single neuron, so such an absurd equivalence undermines any such calculation or any attempt to use the comparison as an upper bound.


The linked article explains how a single human brain neuron is a lot more complex than previously thought. "Obvious" is also a word we should avoid using in discussions about brain biology.


Talking about size but not specific architecture is pretty meaningless at this point.

The assumption that we can only reach near-human performance once we have a number of parameters comparable to the number of brain cells no longer holds. It gradually loses traction as people discover that stochastic connections may be more effective than merely increasing layers/parameters.


I am not surprised at all by how many applications you can get from insect-brain-sized networks, since each one serves only one specific purpose. If the same network could solve all those problems, that would be surprising.

I also think neural networks could save a lot of parameters if they used sparse connections. That just doesn't happen to work well on our current hardware.
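
A crude illustration of why that's a hardware problem (just a numpy sketch): you can emulate a sparse layer with a binary mask, but the dense matmul underneath costs the same as if every connection were there.

    import numpy as np

    rng = np.random.default_rng(0)
    in_dim, out_dim, density = 512, 512, 0.1

    W = rng.normal(size=(in_dim, out_dim))
    mask = rng.random((in_dim, out_dim)) < density   # keep ~10% of the connections

    x = rng.normal(size=(32, in_dim))
    y = x @ (W * mask)   # "sparse" layer: ~90% fewer effective parameters,
                         # but the hardware still performs the full dense multiply

    print(int(mask.sum()), "active weights out of", W.size)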


> I am not surprised at all by how many applications you can get from insect-brain-sized networks, since each one serves only one specific purpose. If the same network could solve all those problems, that would be surprising.

Transfer learning demonstrates that the "specialization" objection is moot. The CNNs are learning large parts of what goes into general-purpose vision, not super-narrow specific things; otherwise it would be impossible to take a classification CNN, reuse a lightly tweaked version for image captioning, localization, tagging, segmentation, etc., and find the intermediate layers learning semantically meaningful things.
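
The usual recipe, sketched in Keras (the 20-class tagging head and the freezing choice are hypothetical; the point is just that the pretrained convolutional features carry over to a new task):

    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import VGG16

    # Reuse an ImageNet classifier's conv stack as a general-purpose feature extractor
    base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
    base.trainable = False   # freeze the transferred "vision" layers

    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(20, activation='sigmoid'),   # hypothetical multi-label tagging head
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy')
    # model.fit(tag_images, tag_labels, ...)   # only the small new head gets trained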


What do you guys think the next big thing after deep learning will be, and when will it come?


My personal feeling: I think that the "next big thing" will be making real progress in terms of integrating probabilistic approaches (like deep neural networks) and symbolic processing (think Prolog, planning algorithms, FOL, or STRIPS, things of that nature).

Deep neural networks are great at recognizing patterns, but they still don't give us much in the way of reasoning. So, "I should start cleaning now because lunch is at 1:00 and it takes Uncle Bob 30 minutes to drive over here..." kind of stuff is - so far - pretty much outside the realm of ANNs.

If / when we have systems that can utilize the best of probabilistic pattern matching and formal logic, I think we'll see a dramatic improvement in the "intelligence" available.

Note that there is no reason, in principle (at least that I'm aware of) that "formal logic" might not ultimately turn out to be an emergent aspect of a sufficiently complex ANN. Given that our brain can do formal logic, and does at times, and is "just" neural networks as far as we know, it seems likely. That said, in the short-term, I'm guessing we'll have hybrid systems that explicitly feature various AI techniques that have been developed over the years, sharing some representations (of concepts, knowledge, etc.). Maybe the "thought vectors" that Geoffrey Hinton has spoken of could be a step down that path. I'm also curious to see if something like rule induction using CN2 might not play a role. Or maybe it'll involve "semantic networks", something like ConceptNet. Who knows?

If I had to summarize, I'd say the answer is something like "coming up with some form of knowledge representation that allows us to share 'thoughts' or 'knowledge' between disparate AI/ML systems".


> Note that there is no reason, in principle (at least that I'm aware of) that "formal logic" might not ultimately turn out to be an emergent aspect of a sufficiently complex ANN.

In principle, one reason against the ANN purist's approach for replicating human intelligence is that it ignores all the things our brains already have programmed from the moment we step into the world; we obviously aren't anything close to a blank slate. So I think there's a burden on advocates of pure ANN to explain why our core intelligence or "formal logic" is in fact independent of this pre-programming.


> So I think there's a burden on advocates of pure ANN to explain why our core intelligence or "formal logic" is in fact independent of this pre-programming.

That's a fair question, and I take no position on it. My take on ANNs is that they are useful tools. I'm not particularly interested in the degree to which they accurately replicate how the human mind works. I assume, perhaps too quickly, that the analogy is pretty close, but I admit that I can't prove that.

In either case, I still feel relatively confident that formal logic could emerge on top of a sufficiently complex ANN. One thing I don't know offhand, which would be interesting, is whether or not any specific research has been done on proving or disproving that assertion.


I'm not sure... if you discard the appeal to the human brain metaphor/authority, which was the purpose of my comment, why should one be confident that formal logic could emerge from an ANN? It seems like the entire argument about the potential of ANNs replicating human intelligence revolves around an appeal to the human brain, but I think it's an extremely fragile metaphor.


> why should one be confident that formal logic could emerge from an ANN?

It's closer to a gut feeling than anything. I believe it fairly strongly, but I haven't done any research in terms of trying to prove the point one way or the other. Don't read too much into it... it's just something I think. I may well be wrong.


> our brains already have programmed from the moment we step into the world; we obviously aren't anything close to a blank slate.

I believe that this statement dismisses the fact that humans have plenty of time to train their brains before birth, based on multiple sensory inputs.


Any research you could link for this?

I get the impression from 101-level child development textbooks that infants display an impressive number of schemas and a lot of knowledge despite little, if any, environmental exposure, but I'm not familiar with arguments linking those schemas and knowledge to exposure in the fetal stage.


Sorry for the late reply, and sorry again that I do not have any exact references, but, for example, newborns are able to recognise their parents' voices.

Babies are exposed to the environment to some degree (noise, gravity, etc.), and it would be very dismissive to ignore that.


Hopefully deep learning gives us some sort of building blocks for a general intelligence. It's too early to tell so far whether we're getting warmer or colder.

One thing that is for sure though is that once we have artificial general intelligence, everything changes infinitely forever.


Let me make several similar statements:

"Hopefully machine learning techniques give us some sort of building blocks for a general intelligence."

"Hopefully knowledge graphs give us some sort of building blocks for a general intelligence."

"Hopefully Bayesian models give us some sort of building blocks for a general intelligence."

"Hopefully rule-based systems give us some sort of building blocks for a general intelligence."

"Hopefully LISP gives us some sort of building blocks for a general intelligence."

"Hopefully computers give us some sort of building blocks for a general intelligence."

People have said approximations to all of these over time. All of these probably are building blocks. But there's no reason to believe that we're about to build the top of the tower.

I believe we should understand AI in terms of what it can do for us now, and that AGI keeps appearing to be 50 years away because it's actually centuries away and we don't even understand what we don't understand yet.


Indeed! I understand that getting symbolic reasoning out of NN-like architectures is the new "mind-body" problem (J. Tenenbaum?), but I mean we already have that with PGMs and Markov logic and all those things.

I really don't see why DNNs change the game fundamentally. I mean, graphical models really changed the way of thinking in the field. DNNs, in contrast, are still fundamentally building on things built by Yann LeCun and gang.


> once we have artificial general intelligence, everything changes infinitely forever.

How do you think it will change everything infinitely forever? Are you confident this would be a positive change for us and all species on Earth? This is where I'm a little lost about the enthusiasm for AGI, but I'm curious what others think.


Once you have a general intelligence, you can have something that can recursively improve itself.

The first few improvements may be tough for it, but when you consider that it could think 1000s of times faster than humans, you can imagine how this could happen very quickly.

And then once you assume that it can recursively improve itself, we enter the realm of "the intelligence with the IQ of 5000."

The person with the highest IQ ever had an IQ of 190 [1]. Imagine the kind of things we could do with an intelligence of 5000. We're talking atomic reassemblers, grand unified theories, and (my dream) a personal paradise Matrix for everyone [2].

[1] https://en.wikipedia.org/wiki/Marilyn_vos_Savant [2] http://www.smbc-comics.com/?id=3183

Note, this sidesteps all ethical alignment issues. This is assuming that the intelligence is on the same team as us. My thought process is "we're all going to die anyway so let's give it a shot," but a lot of people disagree with me (which is fair).


So, it seems the big hope for the future of AI, is creating a "Genie" or "Wizard" that works for all of us and looks after our interests?


I asked the Google DeepMind team how their system's "brain" would compare to 1) a mouse and 2) a lizard. They replied that they are nowhere near that level of intelligence.

I think AI will split into two sets: one biologically inspired, which will be general purpose; the other tool-like, much more alien and specific to a domain. A bit like how aeroplanes don't look like birds.


Amazon backing MXNet is pretty awesome, and having it be a "real" open project is great for the community.


I've been playing with OpenAI Universe. I would be shit-scared of an algorithm that achieves superhuman ability. With Nvidia's supercomputers, big labelled datasets, and better algos, it's really hard to grasp what the big players and governments are capable of.


Honestly, whenever I read anything written about AlphaGo, I want to start pulling my hair out at the inanity of it. Look at this utterly uninformed claim, for example:

> As far as algorithmic ingenuity goes, this is pretty much all there is to it. With all the hype surrounding AlphaGo's victory this year, its success is just as much if not more attributable to data, compute power and infrastructure advancements than algorithmic wizardry.

... Like, is anything about this sentence even approximately true??? There was nothing new about the datasets used (and to this day I can't understand why they use the exact datasets they do: KGS for the predictive net and Tygem for the rollout softmax - both amateur player databases - and they don't even use the GoGoD database of pro players; it seems other teams have comparable or better prediction results with it), nor in using a sizable cluster to execute a Go-playing algorithm (the limits to cluster size were and are where the algorithm one uses hits steep diminishing returns, but anyhow, MoGo was already using rather big ones), nor in training on such a dataset for Go playing. The only damn difference was PRECISELY the algorithms used, exactly in contradiction to the claim here! The nugget of truth there is that the algorithmic components aren't terribly innovative, just their application to this problem - but the training setup and targets are, however, literally groundbreaking.

Rewind a little bit, to the end of 2014, and you'll see results from an Oxford team, as well as from Google's team, that demonstrated a large improvement in accuracy on the move-prediction task - given a board position, predict the next move from an actual game - by using a convnet. That was a first sign that deep learning had potential in Go, though at that point it didn't make for a particularly strong player. Systems were getting to the point where they could bias the search effectively with such a convnet for decent gains when the AlphaGo result was announced, which dwarfed these already exciting advances!

So clearly it's not about "just" using more computers or bigger datasets, nor just doing the straightforward thing and applying deep learning to the problem; all of the above was done before AlphaGo. Yeah, it helped, but there was just no contest between the 7d KGS amateur rank the best of the rest were reaching and the at-or-beyond-top-human level AlphaGo reached.

AlphaGo came up with the second component, and actually solved a problem thought infeasible in the computer Go world: creating an evaluation function for Go. The entire Monte Carlo tree search revolution of the mid-to-late '00s in computer Go was about how to sidestep the problem of evaluating whether a particular board position is good or bad, by just running full stochastic playthroughs of the game and scoring them instead. AlphaGo, on the other hand, first created a decent-ish player network (though honestly nothing special - 5d KGS - that's the one that's fine-tuned by reinforcement learning and notably isn't even part of the final configuration, but effectively just generates a large dataset, because humans simply haven't played enough games in history for this training setup), then generated a large dataset of games this network was made to play, and then trained (supervised!) a net on predicting the game outcome given a board position (taking just one position from each game, somewhat conservatively avoiding overfitting this way).

THIS is the genius of the AlphaGo algorithm: it is a Monte Carlo tree search, biased by a convnet, rolled out with a softmax policy, and crucially with a leaf evaluation function that mixes the value network's estimate 50-50 with rollout scores.

THAT is a completely novel algorithm; nothing in the literature is particularly like it, and in particular the evaluation function and its mix with rollouts was not considered possible! And it works orders of magnitude better than any other Monte Carlo tree search tried since 2006, as well as orders of magnitude better than the other deep-learning-biased approaches tried since 2014.
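
Schematically (pseudo-Python; `position`, `value_net`, `rollout_policy`, and `score` are placeholders standing in for AlphaGo's actual components, and 0.5 is the published mixing constant):

    LAMBDA = 0.5   # published mixing weight between the value net and rollout results

    def evaluate_leaf(position, value_net, rollout_policy, score):
        """Leaf evaluation in AlphaGo's MCTS: blend the learned value network's
        estimate with the outcome of one fast stochastic rollout."""
        v = value_net(position)            # value net trained on self-play game outcomes

        pos = position                     # fast rollout with the cheap softmax policy
        while not pos.is_terminal():
            pos = pos.play(rollout_policy(pos))
        z = score(pos)                     # +1 / -1 game result from the node's perspective

        return (1 - LAMBDA) * v + LAMBDA * z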

And to think that the least of these things - i.e., out of all the work done on AlphaGo since 2014, the 3 days (!!!) spent fine-tuning (!!!) the prediction net by reinforcement learning to make it stronger (though still just a 5d amateur), so as to generate the needed training set for another component - is the only thing mentioned about their approach!?



