
The article is hinting at this but I also think many people who complain that OpenAI didn't release the model don't understand how big this model actually is. Even if they had access to the parameters they couldn't do much with it.

Assuming you use half precision (2 bytes per parameter), the model is 350 gigabytes (175 billion * 2 bytes). For fast inference the model needs to be in GPU memory. Most GPUs have 16 GB of memory, so you would need 22 GPUs just to hold the model in memory, and that doesn't even include the memory for activations.

If you wanted to do fine-tuning, you would need roughly 3x as much memory to also hold gradients and optimizer momentum.
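
For a rough sense of the arithmetic, here is a back-of-envelope sketch; the precisions and the 16 GB-per-card figure are the assumptions above, and activations are ignored:

    # Memory needed just to hold 175B parameters, and the GPU count that implies.
    params = 175e9
    gpu_mem_gb = 16

    for name, bytes_per_param in [("fp32", 4), ("fp16", 2)]:
        weights_gb = params * bytes_per_param / 1e9
        gpus = -(-weights_gb // gpu_mem_gb)   # ceiling division over 16 GB cards
        print(f"{name}: {weights_gb:.0f} GB of weights, at least {gpus:.0f} GPUs")

    # Fine-tuning with a momentum-based optimizer roughly triples this
    # (weights + gradients + optimizer state), before counting activations.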



If it were open, there would be other services offering this, instead of just an opaque beta and now a single expensive service.


I don't think that's a fair assessment, as many researchers are disappointed that the model wasn't released. And I'm pretty sure they do understand the model-size concerns.

Running inference on this massive model would be a really interesting challenge for people working on model compression and pruning as well as those working on low memory training. New challenges are always a good thing for research.

Personally, I just wish it was easier to get access to their API. I have an experiment in mind that I can't wait to try.


Tell cryptocurrency miners that this is a big model to compute... next to the hardware they already run, this problem seems tiny.

If there are millions of ASIC, GPU, etc. devices mining cryptocurrencies, it is fair to speculate that there is room here to democratize AI with a model like this.


A one-time investment of $60,000 ($200,000 in the worst case) isn't a reason to dismiss the 'many people who complain the model [wasn't released]', especially given the alternative is 'being Microsoft', which costs $1,570,000,000,000.
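
Roughly where numbers like that could come from, starting from the 22-GPU estimate above; the per-card prices here are my own guesses, not anything published:

    # Hypothetical per-GPU prices; only the 22-GPU count comes from the thread.
    gpus_needed = 22
    for label, price in [("consumer-class card", 2700), ("data-center card", 9000)]:
        print(f"{gpus_needed} x {label} @ ${price:,} ~= ${gpus_needed * price:,}")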


You need only a tiny bit of memory for activations if you don't want fine-tuning. I think for GPT-3, fine-tuning is out of the window. But it is reasonable to expect inference to take less than a minute with a single 3090 and a fast enough SSD.


OpenAI offers a fine-tuning API.

How did you come up with the one-minute estimate? According to a quick Google search, the fastest SSDs these days have a bandwidth of about 3100 MB/s, so it would take roughly 112 s just to read the weights.
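
The arithmetic behind that figure, with the 350 GB of fp16 weights and the 3100 MB/s bandwidth as the only inputs:

    # Time to stream 350 GB of weights from a single SSD at ~3100 MB/s.
    weights_gb = 350
    ssd_mb_per_s = 3100
    print(f"{weights_gb * 1000 / ssd_mb_per_s:.0f} s just to read the weights")  # ~113 s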


I don't have access to see whether they have a fine-tuning API. Do you have any links explaining the said fine-tuning? It is certainly surprising, given there is no fine-tuning experiment mentioned in the GPT-3 paper.

Weight loading is embarrassingly simple to parallelize. Just use mdadm with 3 or 4 NVMe SSD sticks and that is sufficient. You are more likely to be bounded by PCIe bandwidth than by SSD bandwidth. Newer NVIDIA cards with PCIe 4.0 support help.
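
A minimal sketch of that idea, assuming the weights are already split into per-drive shard files (the paths and layout here are made up):

    # Read weight shards from several NVMe drives concurrently; wall-clock time
    # is then bounded by the slowest drive or the PCIe link, not a single SSD.
    from concurrent.futures import ThreadPoolExecutor

    shard_paths = [                      # hypothetical mount points / filenames
        "/mnt/nvme0/gpt3_shard0.bin",
        "/mnt/nvme1/gpt3_shard1.bin",
        "/mnt/nvme2/gpt3_shard2.bin",
    ]

    def read_shard(path):
        with open(path, "rb") as f:
            return f.read()              # real code would read into pinned buffers

    with ThreadPoolExecutor(max_workers=len(shard_paths)) as pool:
        shards = list(pool.map(read_shard, shard_paths))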



