
I seem to recall that there was a recent theory paper that got a best paper award, but I can't find it.

If I remember correctly, their counter-intuitive result was that big overparameterized models could learn more efficiently, and were less likely to get trapped in poor regions of the optimization space.

[This is also similar to how introducing multimodal training gives an escape hatch to get out of tricky regions.]

So with this hand-wavey argument, it might be the case that two-phase training is needed: a large, overcomplete pretraining phase focused on assimilating all the knowledge, and a second phase that makes the model compact. Or, alternatively, there is a single hyperparameter that controls overcompleteness vs. compactness and you adjust it over the course of training.
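One hand-wavey way to picture that single knob is a sparsity penalty that ramps up over training: zero during the overcomplete "assimilate everything" phase, then large during the compaction phase. A minimal PyTorch sketch, with made-up data, layer sizes, and schedule (none of this is from any paper):

  import torch
  import torch.nn as nn

  # deliberately overcomplete hidden layer; stand-in data throughout
  model = nn.Sequential(nn.Linear(784, 1024), nn.ReLU(), nn.Linear(1024, 10))
  opt = torch.optim.Adam(model.parameters(), lr=1e-3)
  total_steps = 2_000

  for step in range(total_steps):
      x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
      # the "compactness" knob: 0 for the first half, ramping up afterwards
      l1_weight = 1e-4 * min(1.0, max(0.0, (step - total_steps // 2) / (total_steps // 4)))
      l1 = sum(p.abs().sum() for p in model.parameters())
      loss = nn.functional.cross_entropy(model(x), y) + l1_weight * l1
      opt.zero_grad()
      loss.backward()
      opt.step()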




I don't see that as counter-intuitive at all. If you have a barrier in your cost function in a 1D model, you have to cross over it no matter what. In 2D it might only be a mound that you can go around. More dimensions mean more ways to go around.
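To make that picture concrete, here's a toy numpy sketch (mine, not from any paper): the same narrow bump that traps gradient descent when it is confined to one axis can be skirted once a second dimension is allowed. The loss, start point, and step size are all arbitrary choices.

  import numpy as np

  # quadratic bowl centred at x = 2, plus a narrow Gaussian bump near x = 1
  def grad(x, y):
      bump = 5.0 * np.exp(-((x - 1.0) ** 2 + y ** 2) / 0.05)
      dx = 2.0 * (x - 2.0) - bump * 2.0 * (x - 1.0) / 0.05
      dy = 2.0 * y - bump * 2.0 * y / 0.05
      return dx, dy

  def descend(x, y, lock_y, steps=20_000, lr=1e-3):
      for _ in range(steps):
          dx, dy = grad(x, y)
          x -= lr * dx
          if not lock_y:          # the "1D" run is forbidden from using the extra dimension
              y -= lr * dy
      return x, y

  print("1D (stuck in front of the bump):", descend(0.0, 0.0, lock_y=True))
  print("2D (goes around the bump):      ", descend(0.0, 0.3, lock_y=False))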


This is also how the human brain works. A young baby has something closer to a fully connected network, whereas a Biden-type elderly brain is more of a sparse, minimally connected feed-forward net. The question is (1) can this be adjusted dynamically in silico, and (2) if we succeed at that, does fine-tuning still work?


You don't have to compare to old age. Even a 10-year-old child's brain has been pruned immensely compared to its baby self.


The lottery ticket hypothesis paper from 2018?


Seems that way. Gigantic model, hit the jackpot, prune the nonsense. It doesn't seem like smaller models buy enough tickets.
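For reference, the lottery-ticket recipe is roughly: train the dense net, prune the smallest weights, rewind the survivors to their original initialization, and retrain the sparse "winning ticket". A rough PyTorch sketch of that loop (my paraphrase with placeholder data and sizes, not code from the paper):

  import copy
  import torch
  import torch.nn as nn
  import torch.nn.utils.prune as prune

  model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
  init_state = copy.deepcopy(model.state_dict())   # keep the original initialization

  def train(model, steps=1000):
      # placeholder training loop; swap in real data and loss
      opt = torch.optim.SGD(model.parameters(), lr=0.1)
      for _ in range(steps):
          x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
          opt.zero_grad()
          nn.functional.cross_entropy(model(x), y).backward()
          opt.step()

  train(model)                                     # 1. train the overparameterized net

  for module in model:                             # 2. prune 80% of each layer's weights
      if isinstance(module, nn.Linear):
          prune.l1_unstructured(module, name="weight", amount=0.8)

  # 3. rewind surviving weights to their initial values, keeping the pruning mask
  with torch.no_grad():
      for name, module in model.named_modules():
          if isinstance(module, nn.Linear):
              module.weight_orig.copy_(init_state[f"{name}.weight"])

  train(model)                                     # 4. retrain the sparse ticket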


I guess we can think of it like one giant funnel; it gets narrower as it goes down.

Versus trying to fill something through just a narrow tube, where you spill most of what you pour in.


"Train large, then compress"





