
The article is hinting at this but I also think many people who complain that OpenAI didn't release the model don't understand how big this model actually is. Even if they had access to the parameters they couldn't do much with it.

Assuming you use half precision (2 bytes per parameter), the model is 350 gigabytes (175 billion * 2 bytes). For fast inference the model needs to be in GPU memory. Most GPUs have 16 GB of memory, so you would need 22 GPUs just to hold the model in memory, and that doesn't even include the memory for activations.

If you wanted to do fine-tuning, you would need roughly 3x as much memory to also hold gradients and optimizer momentum.
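
For a rough sense of the arithmetic, here is a back-of-envelope sketch; the precisions and the 16 GB-per-card figure are the assumptions above, and activations are ignored:

    # Memory needed just to hold 175B parameters, and the GPU count that implies.
    params = 175e9
    gpu_mem_gb = 16

    for name, bytes_per_param in [("fp32", 4), ("fp16", 2)]:
        weights_gb = params * bytes_per_param / 1e9
        gpus = -(-weights_gb // gpu_mem_gb)   # ceiling division over 16 GB cards
        print(f"{name}: {weights_gb:.0f} GB of weights, at least {gpus:.0f} GPUs")

    # Fine-tuning with a momentum-based optimizer roughly triples this
    # (weights + gradients + optimizer state), before counting activations.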



If it were open, there would be other services offering this, instead of just an opaque beta and now a single expensive service.


I don't think that's a fair assessment, as many researchers are disappointed that the model wasn't released. And I'm pretty sure they do understand the model-size concerns.

Running inference on this massive model would be a really interesting challenge for people working on model compression and pruning as well as those working on low memory training. New challenges are always a good thing for research.

Personally, I just wish it was easier to get access to their API. I have an experiment in mind that I can't wait to try.


Tell cryptocurrency miners that this is a big model to compute... next to the hardware they already run, this problem seems tiny.

If there are millions of ASIC, GPU, etc. devices mining cryptocurrencies, it is fair to speculate that there is room here to democratize AI with a model like this.


A one-time investment of $60,000 ($200,000 in the worst case) isn't a reason to dismiss the 'many people who complain the model [wasn't released]', especially given the alternative is 'being Microsoft', which costs $1,570,000,000,000.
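
Roughly where numbers like that could come from, starting from the 22-GPU estimate above; the per-card prices here are my own guesses, not anything published:

    # Hypothetical per-GPU prices; only the 22-GPU count comes from the thread.
    gpus_needed = 22
    for label, price in [("consumer-class card", 2700), ("data-center card", 9000)]:
        print(f"{gpus_needed} x {label} @ ${price:,} ~= ${gpus_needed * price:,}")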


You need only a tiny bit of memory for activations if you don't want fine-tuning. I think for GPT-3, fine-tuning is out of the window. But it is reasonable to expect inference to take less than a minute with a single 3090 and a fast enough SSD.


OpenAI offers a fine-tuning API.

How did you come up with the one-minute estimate? According to a quick Google search, the fastest SSDs these days have a bandwidth of about 3100 MB/s, so it would take roughly 112 s just to read the weights.
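
The arithmetic behind that figure, with the 350 GB of fp16 weights and the 3100 MB/s bandwidth as the only inputs:

    # Time to stream 350 GB of weights from a single SSD at ~3100 MB/s.
    weights_gb = 350
    ssd_mb_per_s = 3100
    print(f"{weights_gb * 1000 / ssd_mb_per_s:.0f} s just to read the weights")  # ~113 s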


I don't have access to see whether they have a fine-tuning API. Do you have any links explaining the said fine-tuning? It is certainly surprising, given there is no fine-tuning experiment mentioned in the GPT-3 paper.

Weight loading is embarrassingly simple to parallelize. Just use mdadm with 3 or 4 NVMe SSD sticks and that is sufficient. You are more likely to be bounded by PCIe bandwidth than by SSD bandwidth. Newer NVIDIA cards with PCIe 4.0 support help.
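
A minimal sketch of that idea, assuming the weights are already split into per-drive shard files (the paths and layout here are made up):

    # Read weight shards from several NVMe drives concurrently; wall-clock time
    # is then bounded by the slowest drive or the PCIe link, not a single SSD.
    from concurrent.futures import ThreadPoolExecutor

    shard_paths = [                      # hypothetical mount points / filenames
        "/mnt/nvme0/gpt3_shard0.bin",
        "/mnt/nvme1/gpt3_shard1.bin",
        "/mnt/nvme2/gpt3_shard2.bin",
    ]

    def read_shard(path):
        with open(path, "rb") as f:
            return f.read()              # real code would read into pinned buffers

    with ThreadPoolExecutor(max_workers=len(shard_paths)) as pool:
        shards = list(pool.map(read_shard, shard_paths))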



