Ah, we're running a medium amount of compute at zero margin. The point is not to go sell the Fortune 500, but to make sure a grad student can spend a $50k grant.
Right now, it's pretty easy to get a few A100s/H100s (Lambda is great for this), but very hard to get more than 24 at a reasonable price (~$2 an hour). You often need to put up a 6+ month commitment, even if all you actually want is an 8-hour training run.
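To put rough numbers on that gap (back-of-envelope only, using the ~$2/hr figure and a 24-GPU block from above):

    # rough math, using the ~$2/GPU-hour and 24-H100 numbers above
    rate, gpus = 2.0, 24
    one_run = gpus * 8 * rate            # a single 8-hour training run: ~$384
    six_months = gpus * 24 * 182 * rate  # a 6-month reservation: ~$210,000
    print(one_run, six_months)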
It's the right business decision for GPU brokers to do long-term reservations and so on, and we'd probably do the same if we were in their shoes. But we're not in their shoes, and we have a very different goal: arm the rebels! Let someone who isn't BigCorp train a model!
> but to make sure a grad student can spend a $50k grant.
As a graduate student, thank you. Thankfully my workloads aren't LLM-crazy, so I can get by on my old NVIDIA consumer hardware, but I have coworkers struggling to get reasonable prices and availability for larger-scale hardware.
So what happens when some big-bucks, VC-backed, closed-source LLM company buys all your compute inventory for the next 5 years? This is not that unlikely: a little while back, Lambda Labs was completely sold out of compute inventory.
Yeah we aren’t going to let anyone book the whole thing for years. If we ever have to make the choice, we’ll choose the startups over the big companies.
Very similar price, but from what I gather a very different model. One important difference might be if you regularly run short-ish training runs over many GPUs. Lambda Labs might not have 256 instances to give you right now. With OP you're basically buying the right to put jobs in the job queue for their 512-GPU cluster, so running a job that needs 256 GPUs isn't an issue (though you might wait behind someone running a 512-GPU job).
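If it helps to picture it, my mental model (just my reading of how OP described it, not their actual scheduler) is a FIFO queue over a fixed 512-GPU pool, roughly:

    # toy FIFO scheduler over a fixed 512-GPU pool -- purely illustrative,
    # not OP's actual implementation
    import heapq

    CLUSTER_GPUS = 512
    queue = [("job-a", 256, 8), ("job-b", 512, 4), ("job-c", 64, 2)]  # (name, gpus, hours)

    free, hour = CLUSTER_GPUS, 0
    running = []  # min-heap of (finish_hour, gpus, name)

    while queue or running:
        # return GPUs from anything that has finished by now
        while running and running[0][0] <= hour:
            _, gpus, _ = heapq.heappop(running)
            free += gpus
        # start queued jobs in order, as long as they fit in the free pool
        while queue and queue[0][1] <= free:
            name, gpus, hours = queue.pop(0)
            free -= gpus
            heapq.heappush(running, (hour + hours, gpus, name))
            print(f"hour {hour}: started {name} on {gpus} GPUs")
        hour += 1

So a 256-GPU job doesn't need 256 standing instances to exist anywhere; it just waits until that much of the pool frees up, and the parenthetical above is the head-of-line-blocking case.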
No idea what capacity at Lambda Labs actually looks like, though. Does anyone have insight into how easy it is to spin up more than 2-3 instances there?
Yeah, it's pretty hard to find a big block of GPUs that you can use for a short time, especially if you need InfiniBand for multi-node training. Lambda, I think, requires a minimum reservation of 6-12 months if you want IB.
My question too. At ~$2/hr per H100 this seems more flexible? But I haven't tried to get 10k GPU-hours on any of these services; maybe that's where the bottleneck is.
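For scale (back-of-envelope, and assuming the ~$2/hr figure actually holds at that volume):

    # what 10k GPU-hours looks like at an assumed flat ~$2/GPU-hour
    gpu_hours, rate = 10_000, 2.0
    print(gpu_hours * rate)   # ~$20,000 total
    print(gpu_hours / 256)    # ~39 hours of wall-clock on 256 GPUs
    print(gpu_hours / 512)    # ~20 hours on the full 512-GPU cluster

So the spend itself is modest; the question is whether you can actually get a few hundred GPUs at once for a day or two.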