Hacker News

~20 GB of VRAM for the 7B model and 48 GB for the 13B model; it also depends on the context size. I'd recommend renting a 4090 from a cloud provider like RunPod or Vast.ai to get started, following a PEFT tutorial.
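As a rough sanity check on those numbers, here's a back-of-envelope estimator. The constants are my own assumptions, not from this thread: fp16/bf16 weights at 2 bytes per parameter, plus a ~40% fudge factor for activations, adapter gradients, optimizer state, and KV cache. Actual usage varies with context length and batch size.

```python
def estimate_vram_gb(params_billion, bytes_per_param=2, overhead=1.4):
    """Crude VRAM estimate for LoRA-style fine-tuning.

    params_billion: model size in billions of parameters
    bytes_per_param: 2 for fp16/bf16 weights (assumption)
    overhead: fudge factor for activations, adapter gradients,
              optimizer state, and KV cache (assumption)
    """
    return params_billion * bytes_per_param * overhead

print(estimate_vram_gb(7))   # ~19.6 GB, close to the ~20 GB figure above
print(estimate_vram_gb(13))  # ~36.4 GB; the 48 GB quoted above leaves headroom
```

Quantized fine-tuning (e.g. QLoRA with 4-bit weights) brings these numbers down substantially, which is why a 7B model can fit on a free Colab GPU.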



Thanks. What about the 70B model? I assume a 4090 will not be enough. Do the hardware requirements scale linearly?


A 4090 only has 24 GB and will only be able to fine-tune (and merge, which is more memory-intensive) the 7B model. The RTX 6000 with 48 GB can fine-tune the 13B model. The 70B model presumably needs multiple GPUs, e.g. four RTX 6000s. If you're starting out, you can also use a free GPU on Google Colab to fine-tune a 7B model. Fine-tuning 70B gets more expensive, so I'd suggest trying smaller models first with a high-quality dataset.

It is mostly linear I think.
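The "mostly linear" intuition can be made concrete: scale the ~20 GB figure for 7B linearly with parameter count and divide by per-GPU memory. The per-billion-parameter cost and the 48 GB card size below are assumptions extrapolated from the figures quoted earlier in this thread, not measured numbers.

```python
import math

def gpus_needed(params_billion, gb_per_billion=20 / 7, gpu_gb=48):
    """Estimate GPU count by scaling the 7B ~= 20 GB figure linearly."""
    total_gb = params_billion * gb_per_billion
    return math.ceil(total_gb / gpu_gb)

print(gpus_needed(13))  # 1 -> a single 48 GB RTX 6000 fits
print(gpus_needed(70))  # 5 -> ~200 GB total, several 48 GB cards
```

Under this crude linear model, 70B works out to roughly 200 GB, i.e. four to five 48 GB cards, which is in the same ballpark as the estimate above.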


Thanks. My plan is to use this research cluster: https://www.ex3.simula.no/resources

I will probably learn how to fine-tune on the small model, but I don't really need to use a worse model to save money.




