It’s my understanding that, amazingly enough, blending the models is done by literally performing a trivial linear blend of the raw numbers in the model files.
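For anyone curious what that looks like, here is a minimal sketch of such a linear blend. It assumes each model is just a mapping from parameter name to a flat list of floats; the function name `blend_weights` and the `alpha` parameter are made up for illustration and are not taken from any particular tool.

```python
def blend_weights(model_a, model_b, alpha=0.5):
    """Return alpha * model_a + (1 - alpha) * model_b, parameter by parameter."""
    if model_a.keys() != model_b.keys():
        raise ValueError("models must have the same parameter names")
    blended = {}
    for name, values_a in model_a.items():
        values_b = model_b[name]
        if len(values_a) != len(values_b):
            raise ValueError(f"size mismatch for {name}")
        # Plain element-wise weighted average of the raw numbers.
        blended[name] = [alpha * a + (1 - alpha) * b
                         for a, b in zip(values_a, values_b)]
    return blended


# Toy "models" (hypothetical values, two parameters each).
model_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
model_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
merged = blend_weights(model_a, model_b, alpha=0.5)
```

With `alpha=0.5` this is a straight average; moving `alpha` toward 1.0 weights the result toward `model_a`.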
Someone even figured out they could get great compression of specialized model files by first subtracting the base model from the specialized model (using plain arithmetic) before zipping it. Of course, you need the same base file handy when you go to reverse the process.
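A rough sketch of that idea, assuming the weights are a flat list of floats and that fine-tuning changed only a small fraction of them (both assumptions are mine, as are the helper names `compress_delta` and `decompress_delta`). Note that floating-point subtraction followed by addition may not round-trip bit-for-bit, so this version restores the weights only up to rounding error.

```python
import random
import zlib
from array import array


def compress_delta(specialized, base):
    """Subtract base from specialized element-wise, then zlib-compress the result."""
    if len(specialized) != len(base):
        raise ValueError("models must have the same number of weights")
    delta = array("d", (s - b for s, b in zip(specialized, base)))
    return zlib.compress(delta.tobytes())


def decompress_delta(blob, base):
    """Reverse the process: unzip the delta and add the same base back."""
    delta = array("d")
    delta.frombytes(zlib.decompress(blob))
    return [b + d for b, d in zip(base, delta)]


# Toy data: a random "base model" and a copy with about 1% of weights nudged.
rng = random.Random(0)
base = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
specialized = list(base)
for i in rng.sample(range(len(base)), 100):
    specialized[i] += rng.gauss(0.0, 0.01)

blob = compress_delta(specialized, base)
direct = zlib.compress(array("d", specialized).tobytes())
restored = decompress_delta(blob, base)
```

In this toy setup the delta is mostly zeros, which is why `blob` comes out much smaller than `direct`; how well it works on a real model depends on how far the specialized weights have drifted from the base.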
I thought so too, until I found that there is quite a bit of literature nowadays about "merging" weights, for example this one: https://arxiv.org/pdf/1811.10515.pdf and also the OpenCLIP paper.