
It’s my understanding that, amazingly enough, blending the models is done by literally performing a trivial linear blend of the raw numbers in the model files.
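A minimal sketch of what such a blend would look like, assuming two PyTorch checkpoints with identical architectures (same keys and tensor shapes); the function name and the 0.5 ratio are illustrative choices, not any particular tool's API:

    # Element-wise linear interpolation of two compatible state dicts.
    import torch

    def blend_state_dicts(state_a, state_b, alpha=0.5):
        """Return alpha * A + (1 - alpha) * B for every weight tensor."""
        return {key: alpha * state_a[key] + (1.0 - alpha) * state_b[key]
                for key in state_a}

    # Usage: load two finetunes of the same base model and average them.
    model_a = torch.load("model_a.pt", map_location="cpu")
    model_b = torch.load("model_b.pt", map_location="cpu")
    torch.save(blend_state_dicts(model_a, model_b, alpha=0.5), "blended.pt")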

Someone even figured out they could get great compression of specialized model files by first subtracting the base model from the specialized model (using plain arithmetic) before zipping it. Of course, you need the same base file handy when you go to reverse the process.
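A rough sketch of that delta trick, using NumPy and zlib rather than any specific tool (file names are hypothetical). The idea is that subtracting the shared base leaves mostly small values, which compress much better than the raw specialized weights:

    import numpy as np
    import zlib

    base = np.load("base_weights.npy")               # shared base model
    specialized = np.load("finetuned_weights.npy")   # finetune, same shape

    # Compress the difference rather than the finetuned weights themselves.
    delta = specialized - base
    compressed = zlib.compress(delta.tobytes(), level=9)

    # Reversing the process requires the same base file.
    restored_delta = np.frombuffer(zlib.decompress(compressed),
                                   dtype=delta.dtype).reshape(base.shape)
    restored = base + restored_delta
    assert np.array_equal(restored, specialized)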



It is not typically possible to blend independently trained models like that, since training is insensitive to the (lateral) ordering of hidden units: two models can compute the same function with their neurons permuted differently, so an element-wise average of their weights generally isn't meaningful.


I thought so too until I found that there is quite a bit of literature nowadays about "merging" weights, for example this one: https://arxiv.org/pdf/1811.10515.pdf and also the OpenCLIP paper.


Is that still the case when all the models have a common ancestor (i.e. are finetuned from the same base) and haven't yet overfit on the new data?



