that's correct, but I'm already using all of my VRAM, so it would mean degrading my model quality. I decided I'd rather have one solid model and tie all my use cases to it. using RAM instead proved problematic for the reasons I mentioned above.
if I had any free VRAM at all, I would fit faster-whisper before I touch any other LLM lol