
Waiting for mixed quantization with HQQ and MoE offloading [1]. With that I was able to run Mixtral 8x7B on my RTX 3080 with 10 GB of VRAM... The same approach should work for DBRX and should shave a ton off its VRAM requirement.

1. https://github.com/dvmazur/mixtral-offloading?tab=readme-ov-...
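
For a rough sense of why this helps, here is a back-of-the-envelope sketch of weight memory under mixed quantization plus expert offloading. The parameter counts (about 46.7B total for Mixtral 8x7B, of which about 45.1B sit in the experts), the bit widths, and the resident fraction are approximate assumptions for illustration, not numbers from the linked repo, and the estimate ignores activations, the KV cache, and quantization metadata.

```python
GIB = 2**30

def weight_gib(n_params, bits_per_param):
    """Memory needed to store n_params weights at the given bit width, in GiB."""
    return n_params * bits_per_param / 8 / GIB

def gpu_resident_gib(shared_params, expert_params, shared_bits, expert_bits,
                     resident_fraction=1.0):
    """Weights kept on the GPU: all shared (non-expert) weights plus the
    fraction of expert weights that is not offloaded to host RAM."""
    shared = weight_gib(shared_params, shared_bits)
    experts = weight_gib(expert_params, expert_bits) * resident_fraction
    return shared + experts

# Assumed, approximate sizes for a Mixtral-8x7B-class model.
TOTAL_PARAMS = 46.7e9
EXPERT_PARAMS = 45.1e9
SHARED_PARAMS = TOTAL_PARAMS - EXPERT_PARAMS

# Everything in 16-bit, nothing offloaded.
print(f"fp16, all on GPU:  {gpu_resident_gib(SHARED_PARAMS, EXPERT_PARAMS, 16, 16):.1f} GiB")

# Hypothetical mixed setting: 4-bit shared layers, 2-bit experts, nothing offloaded.
print(f"mixed, all on GPU: {gpu_resident_gib(SHARED_PARAMS, EXPERT_PARAMS, 4, 2):.1f} GiB")

# Same setting with only a quarter of the experts resident on the GPU.
print(f"mixed, 25% experts resident: "
      f"{gpu_resident_gib(SHARED_PARAMS, EXPERT_PARAMS, 4, 2, 0.25):.1f} GiB")
```

Since nearly all of the parameters live in the experts, quantizing them harder and keeping only some of them on the GPU is where most of the saving comes from. Plugging in DBRX's sizes would follow the same pattern.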


