> Merged means you are modifying the model weights, which means you are stuck with that one model on that device (though this usually applies to most implementations of the unmerged versions too).
If one is careful with floating point issues, it's straightforward to unmerge the weights.
Right, it's mathematically easy (again, up to floating point issues) to recover the weights as needed, but in terms of distribution/serving I'm guessing the plan is to keep the original weights, carry the LoRA weights around separately, and merge as necessary.
(Also, I'm assuming you're the first author of LoRA.)
Yes, the plan is to keep the original weights in VRAM and merge/unmerge LoRA weights on the fly. You can even cache a large library of LoRA ckpts in RAM.
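A minimal sketch of that serving pattern in PyTorch, assuming a single linear layer for simplicity; the names `adapters` and `swap_adapter` are illustrative, not from any particular library:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Base weight stays on the device; LoRA factors are cached in CPU RAM
# and only their product is added into the base weight when needed.
base = torch.nn.Linear(1024, 1024, bias=False, device=device)

# Hypothetical library of LoRA checkpoints: name -> (A, B) with rank r = 8.
adapters = {
    "task_a": (torch.randn(8, 1024), torch.randn(1024, 8) * 0.01),
    "task_b": (torch.randn(8, 1024), torch.randn(1024, 8) * 0.01),
}

active = None  # name of the adapter currently merged into base.weight

def swap_adapter(name):
    """Unmerge the currently active adapter (if any), then merge the requested one."""
    global active
    with torch.no_grad():
        if active is not None:
            A, B = adapters[active]
            base.weight -= (B @ A).to(device)   # unmerge: W_0 = W_1 - BA
        A, B = adapters[name]
        base.weight += (B @ A).to(device)       # merge: W_1 = W_0 + BA
    active = name

swap_adapter("task_a")  # serve requests for task_a
swap_adapter("task_b")  # later, switch to task_b without reloading the base model
```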
> If one is careful with floating point issues, it's straightforward to unmerge the weights.
W_0 = W_1 - BA, where W_1 = W_0 + BA is the merged weight.
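A quick numeric check of that identity (shapes and dtype are illustrative); the printed residual is the floating point error being discussed, and it shrinks if you do the arithmetic in fp32 instead of fp16:

```python
import torch

torch.manual_seed(0)
d, r = 1024, 8
W0 = torch.randn(d, d, dtype=torch.float16)
A = torch.randn(r, d, dtype=torch.float16)
B = torch.randn(d, r, dtype=torch.float16) * 0.01

W1 = W0 + B @ A        # merge: W_1 = W_0 + BA
W0_rec = W1 - B @ A    # unmerge: W_0 = W_1 - BA

# Exact up to rounding; the max absolute difference is the floating point issue.
print((W0_rec - W0).abs().max())
```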
Yes, prompt-based methods don't involve swapping weights.