A modern self-referential weight matrix that learns to modify itself (arxiv.org)
166 points by lnyan on April 13, 2022 | 46 comments



I know Schmidhuber is famously miffed about missing out on the AI revolution limelight, and despite that he runs a pretty famous and well-resourced group. So with a paper like this demonstrating a new fundamental technique, you'd think they would eat the labor and compute costs of running it through a full gauntlet of high-profile benchmarks against existing SOTA methods, instead of the sort of half-hearted benchmarking that happens in this paper. It's a hassle, but all it would take for something like this to catch the community's attention is a clear demonstration of viability in line with what groups at the other large research institutions do.

The failure to put something like that front and center makes me wonder how strong the method is, because you have to assume that someone on the team has tried more benchmarks. Still, the idea of learning a better update rule than gradient descent is intriguing, so maybe something cool will come from this :)


Or they hurried the publication to avoid getting scooped and will follow up with interesting benchmarks later.


If it’s really that new and different, maybe it’d be a little premature and even misleading to present the sort of full sweep you suggest. People are much better at pooh-poohing new ideas than at accurately assessing their potential.


"miffed for missing out on the AI revolution limelight?" Despite all those TV docs and newspaper articles about him? :-)


It's a super weird feeling to click on a hacker news top post and find out I know one of the authors. The world is a super small place.


First, congratulations! It's a paper worth HN attention. Very cool.

Second: do Hacker News posts form a small-world network? I don't know. I don't even know if my question is well posed (it might be a meaningless question). Does the set of Hacker News articles change over time in ways that resemble annealing or self-training matrices? (likewise, I question this question, but I wonder.)

https://en.m.wikipedia.org/wiki/Small_world_network


Need time to digest this paper, but you can assume if it's from Schmidhuber's group it will have some impact, even if only intellectual.


I have been playing with alternative ways to do machine learning on and off for a few years now. Some experiments went very well.

I am never sure if it is a waste of time or has some value.

If you guys had some unique ML technology that is different to what all the others do, what would you do with it?


Start with the assumption that someone has already done it... Do a thorough literature survey... Ask experts working on the most similar thing. Don't be disheartened if you weren't the first; ideas don't have to be original to have value; some ideas need reviving from time to time, or were ahead of their time when first discovered.


ML is still a fairly new topic, and if you have an idea there's a high chance that nobody has actually tried it yet.


Create a demo of it doing -something-. Literally anything. Then show it off and see where it goes.


Write a paper about it. Post it on arxiv.org. Contact some open minded researchers on twitter or here (show HN) for critique.


You have to be affiliated with an institution to submit to arxiv.


You don’t strictly need an institutional affiliation, but it makes things easier for sure. Arxiv has an endorsement process you can go through to post without an affiliation

https://arxiv.org/help/endorsement


Sounds like we are in very similar positions and have a very similar question :). My only real plan so far is to try to beat or match SOTA on a recent benchmark from a large corporate / research lab, email them, and hope they're willing to talk.


A demo speaks louder than words. If you don't want to go into the details of how it works, it would still be interesting just to see where it over- and underperforms compared to existing systems.


Absolutely! Also, if possible, a Colab (or plain Jupyter notebook) and data would be good.


I’d make a blog and post about my experiments.


And a video too, please :)


Host it on a $5 VPS with full internet access and "see what happens".


I would make a "Show HN" post


Don't burn the capabilities commons. You probably don't have anything, in which case, why bother people? If you do have something, that advances AI capabilities and shortens the time before AGI; and while nobody actually has anything resembling a viable plan for surviving that, the fake plans tend to rely on having more time rather than less time.


A bit LARPy, don't you think?


> what would you do with it?

Use the "proof is in the pudding" method:

Do something with it - preferably useful - that no one else can.


If you do end up posting any sort of musings on this topic, I'd be really interested in taking a look.


I haven't really absorbed this paper yet, but my first thought was of the Hopfield networks we used in the 1980s.

For unsupervised learning algorithms like masked models (BERT and some other Transformers), it makes sense to train in parallel with prediction. Why not?

My imagination can't wrap around using this for supervised (labeled data) learning.


I haven't read the paper yet, so no comment on the content. But it's amusing that more than 30% of the references are self-citations.


Hinton et al. self cite. Schmidhuber et al. self cite. One got Turing, the other got angry.


Just skimmed the paper but the benchmarks are super weird


I'm having a hard time reading this paper without hearing you-again's voice in my head.


It's only a matter of time until the technological singularity

> The WM of a self-referential NN, however, can keep rapidly modifying all of itself during runtime. In principle, such NNs can meta-learn to learn, and metameta-learn to meta-learn to learn, and so on, in the sense of recursive self-improvement.

Everyone who doubts is hanging everything on "in principle" being too hard. Seems ridiculous to me, a failure of imagination.
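
For anyone wondering what "modifying all of itself" means concretely, here is a rough numpy sketch of the flavor of the mechanism as I read the paper: one weight matrix maps its input to an output, a key, a query, and a learning rate, then rewrites itself with a delta rule using those self-generated quantities. This is just an illustration, not the authors' code; the real model adds nonlinearities, normalization and multiple heads, and the variable names here are mine.

    import numpy as np

    d_in, d_out = 16, 8
    rng = np.random.default_rng(0)
    # One matrix produces the output, a key, a query and a learning rate
    # from the same input, and is itself the thing being updated.
    W = 0.01 * rng.standard_normal((d_out + 2 * d_in + 1, d_in))

    def srwm_step(W, x):
        out = W @ x
        y = out[:d_out]                           # layer output
        q = out[d_out:d_out + d_in]               # self-generated query
        k = out[d_out + d_in:d_out + 2 * d_in]    # self-generated key
        beta = out[-1]
        lr = 1.0 / (1.0 + np.exp(-beta))          # self-generated learning rate
        v_new = W @ q                             # value W proposes for this context
        v_old = W @ k                             # what W currently returns for key k
        W = W + lr * np.outer(v_new - v_old, k)   # delta-rule update of all of W
        return W, y

    W, y = srwm_step(W, rng.standard_normal(d_in))

The "recursive" part is that the learning rate and both sides of the update are themselves outputs of the very matrix being updated, so the update rule changes as the matrix changes.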


"It suddenly stopped self-improving." "What happened?" "It ... it looks like it found a way to autogenerate content that is optimized for its reward function and now binges on it 24/7..." ><


"It seems to have optimized for reruns of Breaking Bad, and exhibits markedly elevated energy consumption during the car chase scenes."


I think there's some context missing when we're talking about the singularity; this is the whole Marcus "AI is hitting a wall" debate (maybe I'm reading that reference into your comment and it isn't there). "Hitting the wall" means different things to different people, and we're not really communicating well with one another. Marcus is concerned about AGI, while others are concerned about ML in general and what it can do.

So LLMs and LGMs (like Dall-E) are showing massive improvements and seem to run counter to Marcus's claim. But from the other side, there are still open problems on the road to AGI, like causal learning and symbolic learning.

What's bugged me a bit about Marcus's claim is that those areas are also rapidly improving. I just think it's silly to point at Dall-E as proof that Marcus is wrong rather than pointing to our improvements in causal learning. But I guess few are interested in CL, and it isn't nearly as flashy. I know Marcus reads HN, so maybe you don't think we've been making enough strides in CL/SL? I can agree that it doesn't get enough attention, but ML is very hyped at this point.


Schmidhuber has written about recursive self-improvement since his diploma thesis in the 80s: "Evolutionary principles in self-referential learning, or on learning how to learn: The meta-meta-... hook".

Your quote sounds like it could just as well have been from that thesis.


There's a saying that goes: as soon as we can build it, it's not AI anymore.


"In principle" is not only (1) hard in practice but also even in principle it is (2) limited by the capacity of the NN. It's (2) which gives me some reassurance.


Even if the algorithms were here to do that job, where will the hardware come from? I'm staring at almost a decade of commercial processor (i7) lineage right now and the jump has been from 4 to 6 cores with no change in clock speed (maybe the newer one is even slower actually). There definitely won't be any singularity jumping off unless today's hardware gets another few decades of Moore's law, and that is not happening.


Greetings, time traveller. Welcome to 2022! We've had 16-core consumer CPUs for almost half a decade, and even higher core counts on the higher-end platforms. But you already knew all that, didn't you?


It's a bit dishonest to look specifically at the manufacturer that spent most of the past decade enjoying an effective monopoly on desktop CPUs as a reference for how computers have improved. Even more so since 4- and 6-core CPUs are not representative of the high-end systems used to train even current state-of-the-art ML models.


Well, the code that was published alongside the article is written in Python and CUDA, so you're not looking at the right kind of processor to start.

My 5-year-old, consumer-grade GPU does 1.5 MHz * 2300 cores, whereas the equivalent released this year does 1.7 Mhz * 8900 cores. Granted, not the best way to measure GPU performance, but it is roughly keeping pace with Moore's law, and it's going to be a better indicator of the future than Intel CPU capabilities, especially for machine learning applications.
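
Back-of-the-envelope with those numbers, treating clock times core count as a crude throughput proxy (which, as said, it isn't really; the units cancel in the ratio anyway):

    old = 1.5 * 2300   # ~5-year-old consumer card
    new = 1.7 * 8900   # this year's equivalent
    print(new / old)   # ~4.4x in ~5 years; a strict doubling every 2 years would give ~5.7x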


So you are saying that we can get another 3 decades of Moore's law by switching to GPUs made of silicon rather than CPUs made of silicon? Well fuck, problem solved then. I was completely unaware it was so easy.


Yeah, but mostly for matrix multiplications


GHz, not MHz, for your GPU specs.


I'll give you the benefit of the doubt and assume you just haven't looked at the CPU market in the last 5 years, but AMD kicked Intel's ass, and Intel is finally being forced to compete now. There's even a 16c/32t CPU on AMD's standard consumer socket: https://www.amd.com/en/products/cpu/amd-ryzen-9-5950x


The singularity is not that important. Scale and locality are. Information that has to travel across the world suffers from misordering/relativity. The same goes for information crossing a room, or a single wire that is nondeterministically but carelessly left unplugged. An oracle doesn't help in that case. Instead what you want is a new kind of being.



