"What if the plasticity of the connections was under the control of the network itself, as it seems to be in biological brains through the influence of neuromodulators?"
Anyone who wishes to explore this idea would do well to go back to the basics of neural nets and read Warren McCulloch's seminal papers on neural nets, from the 40s:
http://www.cse.chalmers.se/~coquand/AUTOMATA/mcp.pdf A Logical Calculus of the ideas immanent in nervous activity
http://vordenker.de/ggphilosophy/mcculloch_heterarchy.pdf A heterarchy of values determined by the topology of neural nets
(After having read those two papers, one can then try to make sense of Heinz von Förster's masterpiece, http://www.univie.ac.at/constructivism/archive/fulltexts/127..., Objects: Tokens for (Eigen-)Behaviors, which also bears some relevance to this matter. However, most people find it incomprehensible.)
Try anything citing von Förster's paper that was written by:
• Louis H. Kauffman, if you're mostly interested in the pure math aspect of it. He wrote numerous papers on the topic of von Förster's paper; you can find one of them here: https://arxiv.org/abs/1109.1892
• JM Stern and CAB Pereira if you're mostly interested in the application to statistics and fundamental questions of the epistemology of statistics. I wrote a thread on their works on my Twitter account at some point: https://twitter.com/no_identd/status/877883663014400000
Also, try this paper by Heinz von Förster (first order author) and Karl H. Müller (second order editor), where Müller basically took a lot of old papers by von Förster and rearranged them to give a more coherent view for certain types of readers:
Don't bother. Reading through the third link reveals that it's more akin to thoughts from an opiate-induced dream dressed in flowery language than actual science. Disappointing, really.
Very cool. It's interesting how powerful the recurrent network becomes with the addition of the learned Hebbian term. For context, even without the Hebbian term, recurrent networks can learn to learn to do quite interesting things (Hochreiter et al. 2001).
Shameless plug -- our lab recently ported LSTMs to spiking networks without a significant loss in performance, and showed that learning to learn works quite well even with spiking networks (Bellec et al. 2018).
So it seems like this method of learning to learn could provide an extremely biologically realistic and fundamental paradigm for fast learning. The addition of the Hebbian term fits neatly into this paradigm too.
It'd be interesting to compare this approach against a simpler baseline: setting a different (10-100 times higher?) learning rate for a fraction (10%?) of neurons in an LSTM.
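For concreteness, a minimal PyTorch sketch of that baseline. Everything here is illustrative, not from the paper: the 50x factor, the 10% mask, and the names fast and scale_rows are made up, and scaling gradients only matches a per-row learning rate exactly for plain SGD (Adam would largely normalize the scale away).

    import torch
    import torch.nn as nn

    hidden = 128
    lstm = nn.LSTM(input_size=32, hidden_size=hidden)

    # Pick a random 10% of hidden units to be "fast" learners.
    fast = torch.zeros(hidden, dtype=torch.bool)
    fast[torch.randperm(hidden)[: hidden // 10]] = True

    # LSTM weights stack the 4 gates along dim 0, so unit i owns rows
    # i, i + hidden, i + 2*hidden, i + 3*hidden.
    row_mask = fast.repeat(4)  # shape (4*hidden,)

    def scale_rows(factor):
        def hook(grad):
            g = grad.clone()
            g[row_mask] *= factor  # boost gradients for the fast units' rows
            return g
        return hook

    # Every LSTM weight/bias has 4*hidden rows/elements, so one mask fits all.
    for p in lstm.parameters():
        p.register_hook(scale_rows(50.0))

    # With plain SGD, a 50x gradient scale is exactly a 50x learning rate.
    opt = torch.optim.SGD(lstm.parameters(), lr=1e-3)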
Interesting... I mean the knockout experiment already does this (sort of), by setting the learning rate for 10% of the units artificially low. I'm not sure if inverting that manipulation is useful, but it might be fun to try.
Is the plasticity update guaranteed to reach equilibrium if the network is run on iid data (as in, do the H_ij values reach a fixed point)?
Edit: It seems like it should be reached eventually, since the equilibrium point is H_ij = y_i * y_j and the update keeps taking a weighted average of the former with the latter (this is not a proof, of course, as y_i * y_j keeps changing with each sample).
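To make that concrete, a quick numerical sketch of the argument, assuming the decaying-trace update H_ij <- (1 - eta) * H_ij + eta * y_i * y_j; the eta value and the toy iid activations are made up:

    import numpy as np

    rng = np.random.default_rng(0)
    eta = 0.05
    H = 0.0
    trace = []
    for _ in range(2000):
        y_i, y_j = rng.normal(size=2)        # iid activations
        H = (1 - eta) * H + eta * y_i * y_j  # weighted average, as noted above
        trace.append(H)

    # H fluctuates around E[y_i * y_j] (0 for independent zero-mean
    # activations), with a spread controlled by eta, rather than
    # settling at an exact fixed point.
    print(np.mean(trace[500:]), np.std(trace[500:]))

So under iid data the trace behaves like an exponentially weighted running estimate of E[y_i * y_j]: it converges in distribution around that mean, not to a single fixed point.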
So the "plastic component" of a connection strength is a thing which decays away exponentially, but is replenished whenever the two endpoints do the same thing.
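In code, that mechanism as I understand it would look something like this one-step sketch (the names w, alpha, H, eta are mine, and tanh stands in for whatever nonlinearity the paper actually uses):

    import numpy as np

    # One step of a toy "plastic" layer: each connection has a fixed weight w,
    # a learned plasticity coefficient alpha, and a Hebbian trace H that decays
    # exponentially and is replenished by coincident pre/post activity.
    # w, alpha, eta would be trained by backprop; H evolves within an episode.
    def plastic_step(x, w, alpha, H, eta):
        # x: (n,) presynaptic activity; w, alpha, H: (n, m); 0 < eta < 1
        y = np.tanh(x @ (w + alpha * H))          # effective weight = fixed + plastic
        H = (1 - eta) * H + eta * np.outer(x, y)  # decay, then replenish on co-activity
        return y, H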
I have heard that neuroscientists have an adage: "fire together, wire together". Is that all that ML people mean by "plasticity"?
But that's only because our edge cases and the machine vision edge cases don't match, since the underlying concepts are different.
And frankly, we probably don't want them to match. The main goal should be to make the cars drive better than humans on average, even if they're not perfect. And we know that there will always be edge cases.
Agreed, they should not match; however, I think it is also important that people trust the tech, which in turn requires it not to make any mistakes we would consider obvious. People are really bad at estimating risk, a problem which I think will be easier to overcome this way.
A system which makes exactly the same mistakes as a competent human but never gets tired, never uses the phone, has 360-degree vision with no blind spots… even that would be a huge improvement over the status quo. On the other hand, a system which crashes because it missed something Joe Average calls obvious when they see the black box pictures on the news… that system will never be trusted enough to replace human drivers, not even when it has a tenth of the fatality rate.