The author of this piece calls Dynamic Programming "one of the top three achievements of Computer Science". However, it doesn't have much to do with computer science: it's just a synonym for mathematical optimization, and the name seems to have been chosen purely to be "politically correct" (avoiding the wrath and suspicion of managers) at the RAND Corporation (https://en.wikipedia.org/wiki/Dynamic_programming#History):
> I spent the Fall quarter (of 1950) at RAND. My first task was to find a name for multistage decision processes. An interesting question is, "Where did the name, dynamic programming, come from?" The 1950s were not good years for mathematical research. We had a very interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he actually had a pathological fear and hatred of the word "research". I’m not using the term lightly; I’m using it precisely. His face would suffuse, he would turn red, and he would get violent if people used the term research in his presence. You can imagine how he felt, then, about the term mathematical. The RAND Corporation was employed by the Air Force, and the Air Force had Wilson as its boss, essentially. Hence, I felt I had to do something to shield Wilson and the Air Force from the fact that I was really doing mathematics inside the RAND Corporation. What title, what name, could I choose? In the first place I was interested in planning, in decision making, in thinking. But planning, is not a good word for various reasons. I decided therefore to use the word "programming". I wanted to get across the idea that this was dynamic, this was multistage, this was time-varying. I thought, let's kill two birds with one stone. Let's take a word that has an absolutely precise meaning, namely dynamic, in the classical physical sense. It also has a very interesting property as an adjective, and that is it's impossible to use the word dynamic in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It's impossible. Thus, I thought dynamic programming was a good name. It was something not even a Congressman could object to. So I used it as an umbrella for my activities.
> as it's just a synonym for mathematical optimization
That's simply untrue; mathematical optimization covers a huge range of algorithms that have nothing to do with dynamic programming. Furthermore, dynamic programming shows up in algorithms that could only be called "optimization" in the trivial sense that any CS problem can be framed as minimizing a loss function that is 0 for a solution and 1 for a non-solution.
Dynamic programming is at the heart of many powerful and important algorithms in CS (Viterbi, Bellman-Ford, matrix chain multiplication, etc.). The fact that it was named weirdly isn't really a knock against it.
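For anyone who hasn't seen it in action, here's a minimal sketch of one of those algorithms (Bellman-Ford shortest paths); the toy graph is purely illustrative:

```python
# Minimal sketch of Bellman-Ford, one of the DP-flavoured algorithms
# mentioned above. The graph below is purely illustrative.

def bellman_ford(num_nodes, edges, source):
    """edges: list of (u, v, weight) tuples; nodes are 0..num_nodes-1."""
    INF = float("inf")
    dist = [INF] * num_nodes
    dist[source] = 0
    # Relax every edge |V| - 1 times; each pass reuses the distances
    # computed in the previous pass (the overlapping subproblems).
    for _ in range(num_nodes - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist

# Example: shortest distances from node 0 in a toy graph.
print(bellman_ford(4, [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)], 0))
# -> [0, 3, 1, 4]
```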
I was told ~30 years ago by a leading computer scientist in the NN field that biology has nothing to teach us in terms of implementation. I switched from CS to neuroscience anyway. I've wrestled with his statement ever since. I'll say that nothing I've seen since then has shown him wrong.
You understand airplanes thanks to airfoils, not the study of birds, bumblebees, or bats.
Not that it's wasteful to study birds. It's just somewhat ironic that studying birds teaches us very little about practical flight.
---------
I think it's important to learn the human mechanism for learning. It's clearly not backprop, any more than a propeller is a bird's wing. Understanding human learning would have huge implications for education, rhetoric, marketing, UI design, and more.
What is the physical process of learning? There's just one now?
Adtech will never stop; it will spawn baby universes replete with simulated human beings complete with bad breath and failing hair follicles if they're convinced they can make one more ad dollar.
My neural networks teacher in university joked that "if you come here to study brains you are in the wrong place because this is a linear algebra course". I think that's a fair characterization. I wasn't very good at (or interested in) the math part and subsequently dropped out. However, it's a field that clearly takes its inspiration from the way biological brains work and then reduces that to a mathematical problem that we know how to optimize. It's a bit like not seeing the forest for the trees: there's a lot of matrix computation going on, but in the end it's maybe not about that.
My observation of the field 25 years later is that it's still dominated by people who are good at math and who are, at this point, generating results that are very interesting yet not fully understood. We've gotten very good at wiring together matrix computations in interesting ways and training them to do useful stuff. But just like with real brains, we struggle to understand how the resulting networks actually work or how to design them properly.
I've seen high school kids work with TensorFlow, and they can definitely pull off some magic tricks with it, but they'd probably struggle to explain how the magic works, as it's basically a black-box component for them. I suspect that's the case for large parts of the machine learning community as well. Math obviously makes all of the lower-level stuff work, but it's not a great tool for the more complex behavior that emerges from these networks. We can put them together and tinker with them, but we don't seem to have a deep understanding of how they work or even why they work. Very much like real brains.
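To make the "it's just linear algebra" point concrete, here's a toy two-layer forward pass; nothing here is TensorFlow-specific, and all shapes and values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy two-layer network: input -> hidden -> output.
# All the "neural" behaviour lives in two matrices and one nonlinearity.
W1 = rng.normal(size=(16, 4))   # hidden weights (16 units, 4 inputs)
W2 = rng.normal(size=(1, 16))   # output weights

def forward(x):
    h = np.maximum(0.0, W1 @ x)  # matrix multiply + ReLU
    return W2 @ h                # another matrix multiply

print(forward(np.array([1.0, 0.5, -0.2, 0.3])))
```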
OP here. I hope that's not the takeaway readers glean from my article. The point I was making was just that it doesn't make sense to shoehorn a biophysical learning mechanism into a DNN; rather, we should use a DNN to find a biophysical learning mechanism.
Whether biophysical learning has anything to teach us is an entirely different question, which I don't discuss in the post.
I did get, and agree with, your point. And I do see great benefit in using DNNs to study biophysical learning mechanisms. I was indeed addressing that different question, but it's also an interesting one.
Nobody understands the biology fully enough to drive a better ML implementation. Individual neurons are very complex, and their interactions even more so.
Is there some truth to the "reverse", though? That is, are the emergent patterns similar for similar-ish problems? What comes to mind is the similar first layers in human vision and Google's vision AI, with vertical/horizontal lines being "matched".
While the continual one-upmanship of ever more intricate biologically plausible learning rules is interesting to observe (and I played around at one point with a variant of the original feedback alignment), I think OP's alternative view is more plausible.
Fwiw I am involved in an ongoing project that is investigating a biologically plausible model for generating connectomes (as neuroscientists like to call them). The connectome-generator happens (coincidentally) to be a neural network. But exactly as the OP points out, this "neural network" need not actually represent a biological brain -- in our case it's actually a hypernetwork representing the process of gene expression, which in turn generates the biological network. Backprop is then applied to this hypernetwork as a (more efficient) proxy for evolution. In the most extreme case there need not be any learning at all at the level of an individual organism. You can see this as the ultimate end-point of so-called Baldwinian evolution, which is the hypothesized process whereby more and more of the statistics of a task are "pulled back" into genetically encoded priors over time.
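To make the general shape of that idea concrete (and to be clear, this is just an illustrative sketch, not the actual project: the sizes, task, and names are all invented), here's a tiny hypernetwork that emits the weights of a downstream network and is trained with backprop:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: a hypernetwork ("genome") that emits the weights of
# a downstream network ("organism"), trained end-to-end with backprop.
genome = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 4 * 2 + 2))

def organism_forward(x, genes):
    # Unpack the emitted parameters into a tiny 4-input -> 2-output layer.
    flat = genome(genes)
    W = flat[: 4 * 2].reshape(2, 4)
    b = flat[4 * 2 :]
    return x @ W.t() + b

genes = torch.zeros(8)                    # a fixed "genotype" input
opt = torch.optim.Adam(genome.parameters(), lr=1e-2)

x = torch.randn(32, 4)
target = torch.randn(32, 2)
for _ in range(100):
    loss = ((organism_forward(x, genes) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()   # gradients flow into the hypernetwork only,
    opt.step()        # i.e. a proxy for selection acting on the genome
```

The point of the sketch is only that gradients flow into the generator ("genome"), never into per-organism weights.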
But for me the more interesting question is how to approach the information flow from tasks (or 'fitness') to brains to genes on successively longer time scales. Can that be done with information theory, or perhaps with some generalization of it? I also think it is a rich and interesting challenge to parameterize learning rules in such a way that evolution (or even random search) can efficiently find good ones for rapid learning of specific kinds of task. My gut feeling is that biological intelligence has many components that are ultimately discrete computations, and we'll discover that those are reachable by random search if we can just get the substrate right, and in fact this is how evolution has often done it -- shades of Gould and Eldredge's "punctuated equilibrium".
(if anyone is interested in discussing any of these things feel free to drop me an email)
I don't follow how punctuated equilibrium fits in here, but I do agree with your general intuition. Evolution 'likes' spaces that are navigable. Protein evolution is, in my mind, the paragon of this: even though the space of possible amino acid sequences is tremendously huge, since 2010 relatively few new folds have been discovered, and it seems that there are only ~ 100k of them in nature. See https://ebrary.net/44216/health/limits_fold_space
Proteins get the substrate right, and a handful of folds are sufficient for all the interactions an organism could need -- so evolution can find new solutions quickly. (It only took hundreds of millions of years for LUCA's parents to figure /that/ out.)
It seems that being able to parameterize the problem space such that solutions are plentiful and accessible via random search is nearly equivalent to solving the problem... In this case, using an ANN to stand in for ('parameterize') organismal development is entirely reasonable (and would hence 'solve' the problem); I look forward to seeing the results of that. But as with the OP, I'm cautious as to the efficiency of backprop vs evolution.
But if your realistically spiking, stateful, noisy biological neural network is non-differentiable (which, so far as I know, is true), then how are you going to propagate gradients back through it to update your ANN-approximated learning rule?
I suspect that, given the small size of synapses, the algorithmic complexity of learning rules (and there are several) is small. Hence, you can productively use evolutionary or genetic algorithms to perform this search/optimization, which I think you'd have to anyway, given the lack of gradients, or simply the computational cost. There is plenty of research going on in this field. (Heck, while you're at it, you might as well perform a similar search over wiring topologies and recapitulate our own evolution without having to deal with signaling cascades, transport of mRNA and protein along dendrites, metabolic limits, etc.)
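As a toy illustration of what searching over parameterized learning rules without gradients can look like (the rule family, task, and constants below are all invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: learn the mapping x -> x1 - x2 with a single linear unit.
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, -1.0])

def fitness(rule_params):
    """Train the unit with a parameterized local rule
    dw = eta * (a*pre*post + b*pre*err + c*err + d), where err = target - post,
    and return negative final error. The rule family and task are invented."""
    a, b, c, d = rule_params
    w = np.zeros(2)
    for x_i, y_i in zip(X, y):
        post = w @ x_i
        err = y_i - post
        w += 0.05 * (a * x_i * post + b * x_i * err + c * err + d)
    return -np.mean((X @ w - y) ** 2)

# Plain random-mutation hill climbing over the four rule parameters;
# no gradients through the learning process are needed.
best = rng.normal(size=4)
for _ in range(500):
    cand = best + 0.1 * rng.normal(size=4)
    if fitness(cand) > fitness(best):
        best = cand
print("best rule parameters:", best)
```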
Anyway, coming from a biological perspective: evolution is still more general than backprop, even if in some domains it's slower.
This is a good question. I think many "biologically plausible" neural models are willing to make some approximations for the benefit of computational power (e.g. rate coding instead of spike coding, point neurons and synapses instead of a cable model). As for non-differentiable operations, I think one strategy might be to formulate it as a multi-agent communication problem (e.g. https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewFil...), where gradients are obtained via a differentiable relaxation or using a score-function gradient estimator (e.g. REINFORCE).
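For concreteness, the score-function estimator in its simplest form, pushing a non-differentiable binary "spike" towards firing; the whole setup is a toy stand-in, not any particular model:

```python
import torch

# Toy score-function (REINFORCE) estimator through a non-differentiable
# binary "spike". Reward the unit whenever it happens to spike.
logit = torch.zeros(1, requires_grad=True)
opt = torch.optim.SGD([logit], lr=0.1)

for _ in range(200):
    p = torch.sigmoid(logit)
    spike = torch.bernoulli(p)            # non-differentiable sample
    reward = spike.item()                 # reward of 1 if the unit spiked
    log_prob = spike * torch.log(p) + (1 - spike) * torch.log(1 - p)
    loss = -(reward * log_prob).sum()     # REINFORCE surrogate loss
    opt.zero_grad()
    loss.backward()
    opt.step()

print(torch.sigmoid(logit))               # firing probability pushed towards 1
```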
You can actually calculate exact gradients for spiking neurons using the adjoint method: https://arxiv.org/abs/2009.08378 (I'm the second author). In my PhD thesis I show how this can be extended to larger problems and more complicated and biologically plausible neuron models. I agree with the gist of your post though: retrofitting backpropagation (or the adjoint method for that matter) is the wrong approach. One should rather use these methods to optimise biologically plausible learning rules. The group of Wolfgang Maass has done exciting work in that direction (e.g. https://arxiv.org/abs/1803.09574, https://www.frontiersin.org/articles/10.3389/fnins.2019.0048..., https://igi-web.tugraz.at/PDF/256.pdf).
I was aware of Neftci's work, but not your result -- I stand corrected! From that perspective, given that LIF networks are causal systems, of course you can reverse them with sufficient memory. I understand the memory in this case is the input synaptic currents at the time of every spike (i.e. which synapses contributed to the spike). This is suspiciously similar to spine and dendritic calcium concentrations. Those variables are usually only stored for a short time, but that said, the hippocampus (at least) is adept at reverse replay, so there is no reason calcium could not be a proxy for the 'adjoint'. Hmm.
I agree that calcium seems like a natural candidate, and I suggest as much in my thesis. Coming from physics, I didn't know about reverse replay in the hippocampus for a long time, but I also have this association now. I would be glad to talk more; is there a way to reach you?
Also from their paper "Backpropagation and the brain":
> "It is not clear in detail what role feedback connections play in cortical computations, so we cannot say that the cortex employs backprop-like learning. However, if feedback connections modulate spiking, and spiking determines the adaptation of synapse strengths, the information carried by the feedback connections must clearly influence learning!"
Hebbian learning is a good biologically plausible learning rule. It works.
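In its simplest single-unit form (with Oja's normalisation term added so the weights stay bounded; the data here is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# A single linear unit trained with a Hebbian update plus Oja's
# normalisation term so the weights stay bounded. Toy anisotropic data.
X = rng.normal(size=(2000, 3)) * np.array([2.0, 1.0, 0.5])
w = 0.1 * rng.normal(size=3)
eta = 0.01

for x in X:
    y = w @ x                    # postsynaptic activity
    w += eta * y * (x - y * w)   # Hebbian term eta*y*x, minus decay eta*y^2*w

print(np.round(w, 2))  # ends up roughly aligned with the highest-variance axis
```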
Minsky's result used a popular but overly simple model. Still, that led to backpropagation, which the field has been squeezing as much as it can out of ever since.
Decades ago that result was bypassed by adding a notion of location into the network model (i.e. Hopfield + Hebbian) and modulating it according to a locationally differentiated trophic factor (i.e. the stuff that the molecular processes of learning use as input). This allows linearly inseparable functions to be learned (not a genuine contradiction of Minsky's result, but an important counterpoint to it). Jeffrey Elman and others found this in the 90s, and I was able to replicate it up to six dimensions in 2004. So we didn't really need backpropagation, though it's been useful.
Admittedly, this removes even more legibility from the models.
Predictive coding seems not only plausible but also potentially advantageous in some ways, such as being inherently well-suited to generative perception.
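A toy sketch of the idea (Rao & Ballard flavour), with all sizes, rates, and data invented for illustration: the latent activity is updated to reduce prediction error, and the weights learn from the same local error signal, which is what makes the model inherently generative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-layer predictive coding loop: latent activity r is adjusted to
# reduce the prediction error, then the generative weights W are nudged with
# the same local error signal. The column normalisation is just a crude
# stabiliser for this sketch.
x_dim, r_dim = 8, 3
W = 0.1 * rng.normal(size=(x_dim, r_dim))
X = rng.normal(size=(300, x_dim))

for x in X:
    r = np.zeros(r_dim)
    for _ in range(20):                   # fast inference loop
        err = x - W @ r                   # prediction error
        r += 0.1 * W.T @ err              # move r to reduce the error
    W += 0.01 * np.outer(err, r)          # slow, local (Hebbian-like) learning
    W /= np.maximum(np.linalg.norm(W, axis=0, keepdims=True), 1.0)

print(np.round(W, 2))                     # learned generative weights
```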
Deep Learning has nothing to do with biophysical neuron simulation, even though there is a confusing overloading of the term "neural network". A good introduction to deep learning is this chapter: https://mlstory.org/deep.html.
STDP falls under biophysical models of neuron simulation, where we try to faithfully reproduce the biophysics of the brain (trivia: I started my undergrad in computational neuroscience and implemented STDP several times [1, 2, 3]). STDP is a learning mechanism, but it has not been demonstrated to learn models as powerful as DNNs.
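For readers who haven't seen it, the classic pair-based form of STDP is just a function of relative spike timing; the constants below are typical illustrative values, not from any particular paper:

```python
import numpy as np

def stdp_dw(delta_t, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based STDP weight change as a function of spike timing.
    delta_t = t_post - t_pre (milliseconds). Constants are illustrative."""
    if delta_t > 0:      # pre fires before post -> potentiation
        return a_plus * np.exp(-delta_t / tau)
    else:                # post fires before pre -> depression
        return -a_minus * np.exp(delta_t / tau)

# Pre-before-post strengthens the synapse, post-before-pre weakens it:
print(stdp_dw(+10.0), stdp_dw(-10.0))
```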
I have asked it in this thread already, but I am really interested in your answer to this as well:
> Is there some truth to the "reverse", though? That is, are the emergent patterns similar for similar-ish problems? What comes to mind is the similar first layers in human vision and Google's vision AI, with vertical/horizontal lines being "matched".
I don't know much about NNs, but in the big picture isn't it sort of similar? Instead of discrete on-off signals, where multiple firings in short succession result in higher activation, in NNs we sort of take the integral over time of those discrete signals.
Of course things like neuron fatigue are not accounted for in this model, but does the base idea differ all that much?
Or perhaps I misunderstood the whole topic, and only the difference in the learning process is in question here?
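A toy illustration of the rate-coding correspondence being asked about: the spike count of a stochastic 0/1 unit over a window approximates the activation a rate unit would output directly (numbers are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# A rate unit outputs a firing rate directly, while a spiking unit emits
# 0/1 events whose count over a time window approximates that rate.
drive = 0.3                                 # net input to the unit
rate = 1.0 / (1.0 + np.exp(-drive))         # rate-unit activation, spikes/step

steps = 10_000
spikes = rng.random(steps) < rate           # Bernoulli "spikes" at that rate
print(round(rate, 4), round(float(spikes.mean()), 4))  # the two roughly agree
```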
I'm surprised that Numenta [1] did not get a mention in the comments. As a mostly lay person (with moderate exposure to computational neuroscience), I quite like their approach, even though it seems their results have slowed down a bit of late. I still haven't finished parsing BAMI [2], but it seems very interesting.
Given the obvious benefits of increased intelligence, the fact that hominid brain size (and presumably computational power) has plateaued for the past 300,000 or so years, and that no other species has developed superior intelligence (i.e. there are bigger but not more effective brains out there), does seem to indicate that we are at or close to a local maximum for biological intelligence. Presumably there's some threshold beyond which gains in computing power lead to limited improvements in intelligence. Of course, that's not to say it's a global maximum.
In order to get brains as big as we already have, biology has us:
* Birthing prematurely compared to other species (in most species, live-born young are functional at birth) in order to fit the brain through the hips.
* Permanently reducing the mother's top speed (wider hips mean the joints are exposed to higher forces while walking).
* Requiring 5% or so more food.
All three of those have been serious impediments to human survival for most of our existence, and provided a counter-pressure against improvements in biological intelligence.
In the past century, the Haber-Bosch process has meant we can reliably grow enough food without pursuit hunting. Without that counterpressure, evolutionary improvement seems very likely, but will take hundreds of thousands of years without genetic engineering or eugenics (neither of which seems like a good idea to me).
Neanderthals had larger brains than us, so nature definitely didn't hit a hard limit on how large ours could physically be. Yes there are costs to a larger, more powerful brain, but up until us those costs were justified by the benefits of increased intelligence. More intelligent apes could better raise children, relied less on running quickly, and could acquire more food.
Somewhere close to our current level of intelligence, the benefits of increased intelligence stopped justifying further improvement. The question is why here?
Birthing issues seem like a particularly weak explanation, given that the negatives of bigger hips scale with the radius of the head while the benefits of intelligence scale with the volume: the more intelligent we get, the more it makes sense to keep making the skull larger. With increasing skull size, there is no need for babies to become any less functional. Calorie demands should scale linearly with intelligence, so it's possible increased intelligence doesn't help a hunter-gatherer acquire more food beyond a certain point, though given how much food production has been increased through ingenuity, this seems unlikely to be the case.
> negatives of bigger hips scale with the radius of the head
Implying linearity here is ridiculous. The increase in force applied to the hip joint scales with the radius of the head; cartilage gets damaged if you apply too much force but will last a lifetime if you don't.
> the benefits of intelligence scale with the volume
That's completely unsupported. Firstly because the number of brain cells scales with the volume, but there's no evidence that larger brains in humans are associated with higher intelligence (then again, most mutations are negative), and secondly because it implies a linear relationship between 'amount of intelligence' and 'amount of benefit'.
> given how much food production has been increased through ingenuity this seems unlikely to be the case
This has not happened on evolution-relevant timescales. There was famine in eastern Europe only 50 years ago. People died of hunger because their calorific needs were not met - selection pressure towards surviving famine has not stopped.
FYI: the obstetrical dilemma isn't so strongly supported nowadays [0]. I can't find the study I'm thinking of at the moment, but at least one study proposes that early birthing in humans allows the brain extra social development.
That link says wider hips don't slow down walking.
However, women need hip replacements at a far higher rate than men. Given that men are typically heavier, one would expect their joints to wear out faster; but as alluded to in my comment upthread, widening the pelvis means walking exerts a higher peak force on the hip joint due to increased torque, which is a plausible mechanism for it wearing out faster.
Intelligence has a strong nurture component that we call education. It's unclear what good, if any, better genetics would do. We are already capable of reproducing and outsourcing many of the brain's functions to computers, and it seems likely we will reach the AGI endgame within a century or two conservatively. Arguably we have far too much intelligence and will end up causing our own destruction.
That, or our brains have hit some biological equivalent of the end of Moore's law for silicon computing. Maybe doubling brain size would require quadrupling brain energy consumption (and power dissipation) at our level.
Yeah, that's what I mean by not scaling above a threshold. Our brains could be bigger (Neanderthals had larger ones than us), but those bigger brains for whatever reason weren't "worth it."
> and that no other species has developed superior intelligence
Another possibility is that other species are constrained by us, and we have manipulated our environment to reduce the survival advantage of increased intelligence for humans as a whole.
We've only been able to do so for a brief sliver of geological time. That Earth would go billions of years without advanced intelligence, only to have two such intelligences emerge independently within a few thousand years of each other, seems improbable.
I've been interested in AGI for a long time. But I've never been on board with the speculation about AGIs making multiple orders of magnitude improvement in intelligence.
That is total speculation, and also it's not necessary to assume so much to have a similar outcome. It seems much easier to imagine an AI that is 2 or 3 times smarter than a human, and has very fast transfer of knowledge with other compatible beings.
I think that's enough for them to take over, if there are enough of them.
But anyway it seems obvious to me that we absolutely should avoid trying to build fully autonomous digital creatures that compete with us. We should rather aim carefully for something more like a Star Trek computer, without any real autonomy or necessarily fully generalized skills or cognition or animal-like characteristics/drives.
But in cyclic networks like you see in biology, that day-to-day information doesn't need to be stored as weights in the network. It can be stored in fluctuations of the data oscillating around, with the individual weights not really changing. Sort of like how your CPU doesn't change the linear region of its transistors to perform new tasks.
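A toy illustration of that point: a recurrent unit with fixed weights can store a bit purely in its ongoing activity (all values are arbitrary):

```python
import numpy as np

# A recurrent "flip-flop" that stores a bit in its ongoing activity,
# with weights that never change.
w_rec, w_in = 2.0, 4.0            # fixed weights, never updated

def step(state, pulse):
    # tanh unit with self-excitation; a brief input pulse switches which
    # stable activity level the unit settles into.
    return np.tanh(w_rec * state + w_in * pulse)

state = 0.0
for pulse in [1.0] + [0.0] * 5:   # set the bit, then let it persist
    state = step(state, pulse)
print("stored high:", round(state, 3))

for pulse in [-1.0] + [0.0] * 5:  # reset the bit
    state = step(state, pulse)
print("stored low:", round(state, 3))
```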
Evolution and thought are the same process. It's funny how many people still don't get that. Gregory Bateson wrote about it in the 1970s: "Steps to an Ecology of Mind" and "Mind and Nature: A Necessary Unity".
Yes, thanks for asking, although I recommend reading the two books I mentioned by Bateson. He was an anthropologist who famously got into Cybernetics.
There are (at least) two ways to come at this: the physical-material and the informational-metaphysical.
The physical-material level is being explored by Levin et al. (start here: https://news.ycombinator.com/item?id=18736698). In brief, the bio-molecular machinery that our nerves use to think is present in all cells, intelligence is ambient, and we humans are nodes of accelerated processing, like Gaia's datacenter, if you will. Another way to say it is that all life thinks, and thinking is like respiration or digestion or anything else: cells evolving.
The metaphysical level is covered by Bateson (he is applying Cybernetics to evolution) and is more difficult to summarize. I just thought about it and I don't think I'm up to the challenge this morning. I'm sorry. :( But you can read the books! :)
That video by Michael Levin is one of the most idea-rich, mind-blowing, research-packed presentations I have ever seen in my life. I need to rewatch it a few more times and take notes for further research/study. Leaps of imagination and insight, all backed by Hard Science.
Thank you for posting the link.
I need more videos/articles/books on genetic engineering, stem-cell engineering, biochemistry, bioenergetics, neuroscience, immunology, and basically all frontiers of biology.