Hacker News

~20 GB of VRAM for the 7B model and 48 GB for the 13B model; it also depends on the context size. I'd recommend renting a 4090 from a cloud provider like RunPod or Vast.ai to get started, following a PEFT tutorial.
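As a rough sanity check on those numbers, here's a back-of-envelope estimator. The constants are my own assumptions, not from this thread: fp16/bf16 weights at 2 bytes per parameter, plus a ~40% fudge factor for activations, adapter gradients, optimizer state, and KV cache. Actual usage varies with context length and batch size.

```python
def estimate_vram_gb(params_billion, bytes_per_param=2, overhead=1.4):
    """Crude VRAM estimate for LoRA-style fine-tuning.

    params_billion: model size in billions of parameters
    bytes_per_param: 2 for fp16/bf16 weights (assumption)
    overhead: fudge factor for activations, adapter gradients,
              optimizer state, and KV cache (assumption)
    """
    return params_billion * bytes_per_param * overhead

print(estimate_vram_gb(7))   # ~19.6 GB, close to the ~20 GB figure above
print(estimate_vram_gb(13))  # ~36.4 GB; the 48 GB quoted above leaves headroom
```

Quantized fine-tuning (e.g. QLoRA with 4-bit weights) brings these numbers down substantially, which is why a 7B model can fit on a free Colab GPU.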



Thanks. What about the 70B model? I assume a 4090 will not be enough. Do the hardware requirements scale linearly?


A 4090 only has 24 GB and will only be able to fine-tune (and merge, which is more memory-intensive) the 7B model. The RTX 6000 with 48 GB can fine-tune the 13B model. The 70B model presumably needs multiple GPUs, e.g. four RTX 6000s. If you're starting out, you can also use a free GPU on Google Colab to fine-tune a 7B model. Fine-tuning 70B gets more expensive, so I'd suggest trying smaller models first with a high-quality dataset.

It is mostly linear I think.
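The "mostly linear" intuition can be made concrete: scale the ~20 GB figure for 7B linearly with parameter count and divide by per-GPU memory. The per-billion-parameter cost and the 48 GB card size below are assumptions extrapolated from the figures quoted earlier in this thread, not measured numbers.

```python
import math

def gpus_needed(params_billion, gb_per_billion=20 / 7, gpu_gb=48):
    """Estimate GPU count by scaling the 7B ~= 20 GB figure linearly."""
    total_gb = params_billion * gb_per_billion
    return math.ceil(total_gb / gpu_gb)

print(gpus_needed(13))  # 1 -> a single 48 GB RTX 6000 fits
print(gpus_needed(70))  # 5 -> ~200 GB total, several 48 GB cards
```

Under this crude linear model, 70B works out to roughly 200 GB, i.e. four to five 48 GB cards, which is in the same ballpark as the estimate above.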


Thanks. My plan is to use this research cluster: https://www.ex3.simula.no/resources

I will probably learn how to fine-tune on the small model, but I don't really need to use a worse model to save money.




