Hacker News
Ask HN: Worth it to buy 4x Nvidia Tesla K40 for AI?
37 points by speedylight on Feb 26, 2023 | hide | past | favorite | 47 comments
I saw a post on a local marketplace selling a complete system with 4 Tesla K40s (12 GB VRAM each, passive cooling) for $400. The post description said the system was intended for training AI models, which is what I want to use it for… nothing too serious, I am mostly still learning here. The cards themselves were released in 2013 and would have a combined 12,928 CUDA cores if I'm counting the 5th video card used for a monitor (a GTX 1660).

Here are the complete specs from the post description… going by the dollar value of all these parts, I'm not really losing any money… I just don't have good enough intuition to tell whether this system is worth it for learning and practicing modern-day AI.

Specs:

Motherboard: MSI MAG Z390 Tomahawk (9th generation) with dual Ethernet ports for wiring to other servers; max memory speed 4,400 MHz in overclock mode.

CPU: Intel Core i5-9400F @ 4.10 GHz, 6 cores (overclock mode).

RAM: 64 GB (4x16 GB) DDR4, max speed 3,600 MHz.

Storage: one 256 GB M.2 NVMe SSD (for the operating system) + two 3 TB hard disk drives (for data storage).

Gaming display support: one GTX 1660 Super graphics card with 6 GB memory and 1,408 CUDA cores, supporting up to 3 monitors at the same time; max bus transfer speed 8.0 GB/s (PCIe Gen3).

AI deep learning: 4 Tesla K40 accelerators, each with 12 GB memory and 2,880 CUDA cores, dedicated to machine/deep learning; max bus transfer speed 8.0 GB/s (PCIe Gen3) each.

Power supply: one 700 W PSU dedicated to the motherboard and the GTX 1660 monitor GPU; a second 1,000 W PSU dedicated to the Tesla K40 accelerators.

CPU cooling: Cooler Master liquid cooler with LED light control.

AI accelerator cooling: 4 cooling fans at the front and 3 at the back.

Structure: open frame of high-strength aluminum alloy to safeguard the system in an intensive working environment.

Power switch: big button switch with a 5 ft flexible extension cable and an LED hard-drive indicator.



The PyTorch binaries from pip and conda won't work on these GPUs, though there are some alternative binaries being maintained that still do: https://blog.nelsonliu.me/2020/10/13/newer-pytorch-binaries-...

The latest Nvidia driver no longer supports the K40, so you'll have to use version 470 (or lower; officially Nvidia says 460, but 470 seems to work). That supports CUDA 11.4 natively, and newer CUDA 11.x releases are supported through minor-version compatibility: https://docs.nvidia.com/deploy/cuda-compatibility/index.html. CUDA 12, however, is not.
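
To make that concrete: the K40 is Kepler, compute capability 3.5. A rough sketch of the cutoffs (an illustrative helper, not anything official; double-check Nvidia's compatibility docs before relying on it):

```python
# Illustrative helper, not an NVIDIA API: a rough map from a GPU's CUDA
# compute capability to the last CUDA toolkit major version that can
# still target it.
def last_supported_cuda_major(compute_capability):
    major, minor = compute_capability
    if (major, minor) < (3, 5):
        return 10  # sm_30 Kepler and older were dropped before CUDA 11
    if major == 3:
        return 11  # sm_35/sm_37 Kepler (K40, K80) end at CUDA 11.x
    return 12      # Maxwell (sm_50) and newer still compile on CUDA 12

# The K40 is compute capability 3.5, so CUDA 11.x is the end of the line:
assert last_supported_cuda_major((3, 5)) == 11
```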

In my testing, a system with a single RTX 3060 was faster in TensorFlow than one with 3 K40s, and probably close to the performance of 4 K40s.

If you are considering other GPUs, there are some good benchmarks here (the RTX 3060 is not listed, though the GTX 1080 Ti had almost the same performance in the TensorFlow test they run): https://lambdalabs.com/gpu-benchmarks

As others have said, Google Colab is a free option you can use.


Multi-GPU training is a double-edged sword. If you are at the stage where you are running your code in an IPython notebook, then you are almost certainly not going to benefit from multiple GPUs, and I strongly suspect you'd be better off with fewer, larger GPUs, even if training time is prolonged.

The reason I say that is, if we go with PyTorch, you basically have 2 options for multi-GPU training.

- DataParallel - where you clone your model over each GPU, but each one functions independently, and after each 'training step' they pool their data. This has downsides, in that you don't get to process intermediate layer outputs and synchronise your batch normalisation layers - so you can't use it to train 'big' models. It just makes your smaller models train more quickly. However, you can at least use these in a 'normal' training script.

- DistributedDataParallel - this is 'proper' multi-GPU training - you can now train big models and put a little bit of data on each GPU, and have them synchronise their results after each layer. However, this can be very annoying to use - each GPU runs in its own background process, which is either spawned or forked (depending on Windows/Linux), so you cannot run it in an IPython notebook or an interactive Python console. It also makes tracking metrics etc. MUCH harder - because you need to reduce your metrics over each GPU process (otherwise you get 4 accuracies, 4 mean squared errors etc. if you have 4 GPUs, and each process only sees one of them).
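
That reduction step is conceptually simple even though wiring it up across processes is the annoying part. A pure-Python stand-in for what the reduction does (real DDP code would call `torch.distributed.all_reduce` on tensors, one per rank, rather than holding a list in one process):

```python
# Stand-in for the reduction: in real DDP each rank holds only its own
# metric, and torch.distributed.all_reduce (with ReduceOp.SUM) sums them
# across ranks; dividing by world_size then gives the global mean.
# Here the list simulates the per-rank values.
def all_reduce_mean(per_rank_metrics):
    world_size = len(per_rank_metrics)
    return sum(per_rank_metrics) / world_size

# 4 GPUs -> 4 processes, each reporting its own accuracy:
global_accuracy = all_reduce_mean([0.81, 0.79, 0.83, 0.80])
```

Forget this step and you end up logging whichever rank's metric happens to reach your logger, which is exactly the 4-accuracies problem described above.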

I personally prefer having 1 GPU with 24 GB RAM over 3 GPUs with 12 GB RAM - because I can have a larger batch size on each GPU, which is VERY VERY advantageous in large models where you can only fit small batch sizes and batch normalisation starts falling down. I'd rather wait 2x as long for a 'better' model to train.
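
For what it's worth, gradient accumulation can partially paper over the memory limit, at the cost of time. A hedged sketch of the arithmetic (`accumulation_steps` is an illustrative name, not a framework setting; you implement it yourself by calling `optimizer.step()` only every N micro-batches):

```python
# Gradient accumulation trades wall-clock time for effective batch size:
# gradients are summed over accumulation_steps micro-batches before each
# optimizer step, so the optimizer "sees" this many samples at once.
def effective_batch_size(per_gpu_batch, num_gpus, accumulation_steps):
    return per_gpu_batch * num_gpus * accumulation_steps

# One 24 GB GPU fitting micro-batches of 16, accumulating 4 steps:
assert effective_batch_size(16, 1, 4) == 64
# Three 12 GB GPUs fitting micro-batches of 8, no accumulation:
assert effective_batch_size(8, 3, 1) == 24
```

Caveat: batch normalisation still only sees the per-GPU micro-batch, so accumulation does not fix the small-batch BN problem described above.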


DistributedDataParallel (potentially) does both model and data parallelism. Data parallelism is also absolutely used when training large models; it has its downsides, but I don't think there's any way around it if you're training with a large number of GPUs.


How does DDP do model parallel?


I phrased that wrongly; DDP itself doesn't, of course. I meant that using it in the way GP does also involves model parallelism.


I don't see any mention of model parallelism in the GP post. How could you possibly use DDP to enable it?


GP contrasts DP and DDP by saying that DP is "where you clone your model over each GPU" and DDP is "'proper' multi-GPU training - you can now train big models and put a little bit of data on each GPU". That's simply not what DP or DDP is. What could this possibly mean if it's not misunderstanding DP as data parallelism and DDP as model parallelism? I'm fairly certain that what they're describing is using DDP (which only does data parallelism) in addition to model parallelism.


While it's technically not a bad price in terms of raw compute and memory, the noise from the high speed server fans alone is reason enough not to get this. Never mind the power usage.

I was going to say maybe it'd be worth it if they were 24 GB GPUs, but I'm not sure you can even use recent PyTorch with cards that old. You'd have to work around the limitations.

You don't even need a GPU to learn anyway, you can use tiny models that train super fast even on CPU for that. You need the beefy GPUs once you want to generate tons of content or train a modern model on big datasets.

Get a 3060 12GB or a 2080 Ti 11GB and call it a day, at most.


Had a PowerEdge in my apartment in college; can confirm they sound like a leaf blower.


Ha, isn't that what basements are for these days?


I live in Vancouver. Round here basements are for renting out.


That is where my rack is!


Well, thank you all for the suggestions; I think I mostly agree it's not a good idea. Everything from power cost/consumption to the lack of modern CUDA and PyTorch support, as well as the complexity of parallel compute, really puts a dent in the perceived value.

I may have to just save up and get a more capable card with more VRAM. I still want to learn how to do parallel compute, but I realize I could do that at any other time, and it doesn't have to be on hardware I necessarily own (I could rent a cloud server), even though that would be really nice.


Unless you know what you're doing, choose simplicity, relatively new hardware, and low power consumption over a good price.

The only thing that sounds exceptional about this system is the 4x12 GB GPU memory. Is that worth it over the inability to use modern CUDA? I don't know much about ML, but I doubt it. People tend to move very quickly in this field (and in others TBH), not caring much about supporting old hardware.


For just 5x more cash, you can get two new 3090s... 20k CUDA cores, and much more modern, so they should be much faster than the core count suggests.

Worth considering just building two machines, as well; the ability to train multiple models in parallel is in many cases more valuable than the ability to train one big one.


> For just 5x more cash

I don’t feel the words “just” and “5x more cash” really fit in the same sentence!

Not questioning 3090s being more performant, but that’s a whole lot more money before even thinking about buying the supporting components.


Remember that the K40s do not support the latest PyTorch features due to their old compute capability (e.g., torch.compile).
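
A minimal guard you could keep in a training script; the 7.0 cutoff reflects my understanding that torch.compile's default Inductor/Triton backend needs Volta or newer, so treat it as an assumption and check the docs for your PyTorch version:

```python
# Illustrative guard, not a PyTorch API: compare a GPU's compute
# capability against the (assumed) minimum for torch.compile's
# default Triton backend.
def supports_torch_compile(compute_capability):
    return compute_capability >= (7, 0)

assert not supports_torch_compile((3, 5))  # Tesla K40 (Kepler)
assert supports_torch_compile((8, 6))      # e.g. RTX 3060 (Ampere)
```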


If you want to learn about which GPUs make sense to buy or rent for various ML workloads, I highly recommend this Tim Dettmers article:

https://timdettmers.com/2023/01/30/which-gpu-for-deep-learni...


That K40 setup requires a 1 kW power supply. With energy prices up, that can get pretty expensive in the long run.

I suspect a newer Nvidia chip manufactured on a more efficient semiconductor process will deliver the same performance for a fraction of the power consumption.

This is often a problem with old hardware. The power efficiency gains from improvements between chip process nodes are so fundamental that it’s hard for older chips to compete in total cost of ownership.


Depends on where you live. In Quebec, Canada, I'm paying $0.064 to $0.098/kWh.
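
Back-of-the-envelope at those rates, assuming the box really draws its full 1 kW around the clock:

```python
# Monthly energy cost of running the system flat out.
def monthly_energy_cost(watts, rate_per_kwh, hours=720):  # 720 h ~ 30 days
    return watts / 1000 * hours * rate_per_kwh

cheap = monthly_energy_cost(1000, 0.064)   # Quebec's low tier
pricey = monthly_energy_cost(1000, 0.098)  # Quebec's high tier
```

That works out to roughly $46 to $71 a month, which at many other utilities' rates would be two to three times higher.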


I bought that system! Hopefully didn't snag it from you. Admittedly I don't have a ton of skin in the game yet, but I'm excited to try to get parallel GPUs working. Hoping to be able to use it as a render node too, and maybe retrain some diffusion models with images of myself. Please link helpful resources if you're able to ;) hehe. You can call me dumb too. I will be starting a master's in geography in the fall, so I'm hoping it will be nice for use with QGIS and other GIS-related projects; rasters are fairly similar to images after all. Again, naive, but hey, only $500 in the game after all; I'd rather take that step than buy a $1k+ card I might just end up using for Minecraft. The seller said he had to compile CUDA from source, which is something I am not looking forward to, though I'd like to look into virtualization options for distributed computing... might be something there. Wish me luck! Time to assemble the home lab.


For some workloads, it's almost all about the VRAM. In those cases I've been wondering if getting a high-memory M1 or M2 Mac could be a good lab machine, thanks to unified memory. It'll run more quietly, use significantly less power, and there's no worry about overloading your electric circuit. On a 128 GB RAM Mac Studio you could theoretically run or even train models that would otherwise require multiple $6k A6000 GPUs in custom machine builds drawing oodles of power at the plug. It'd be slow, but slow beats not possible. And if you need a new development machine anyhow, you can justify some of that beefy Mac Studio's cost as part of your required spend. PyTorch has supported "mps" as a target device for some time now.
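
Device selection for this is trivial on the PyTorch side. A hedged pure-Python sketch (`pick_device` is a made-up helper name; with real PyTorch the two flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`):

```python
# Illustrative preference order: CUDA if present, then Apple's
# Metal (unified-memory) backend, then plain CPU.
def pick_device(cuda_available, mps_available):
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"  # Apple-silicon unified-memory backend
    return "cpu"

# On an M1/M2 Mac without CUDA, training code would land on "mps":
assert pick_device(False, True) == "mps"
```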


How much online AI compute can you get for $400?

Should be enough for most smaller projects, yeah?


Around a month and a half of a VM with a T4 on GCP. Depends on what you need.
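
Quick sanity check on that figure, assuming a T4 VM rate somewhere around $0.37/hour (an illustrative number; actual GCP pricing varies by region and VM shape):

```python
# How long a fixed budget lasts at a given hourly rate.
def days_of_compute(budget, hourly_rate):
    return budget / hourly_rate / 24

# $400 at ~$0.37/hour buys about a month and a half of continuous runtime:
assert 44 < days_of_compute(400, 0.37) < 46
```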


Lambda Labs has A100s for like $1 per hour.


If you're just learning, I'd actually look at using a cloud-based provider where you can just spin up a GPU VPS for the duration of each experiment.


I 100% agree; Google Colab Pro (or whatever they call it now) is a decent first step, for minimal money.


From the specs it's a good value - the 1660 alone is ~$150 used.

You probably won't be able to use the latest versions of PyTorch because of the K40's CUDA support, but that's okay.

Make sure you load test it for at least 15-20 min to see how high the GPU temperatures get before parting with your money. Do not buy if you can't test it - an old system like this can have all sorts of hardware problems.


This is essentially a GTX 970. Not worth it IMO.


Those who do not recommend, do you have any suggestions for people just getting into the field?


For deep learning/NNs, Google Colab/Kaggle notebooks are where most people start. There are lots of cloud providers that are harder to set up on but might be cheaper/faster for a particular project; while Colab is limiting in some ways, it's dead simple to use, so you can focus on learning data science.

There's a lot of talk about large models atm, but when you're learning you should probably stick to smaller models, because long training times drastically reduce how fast you can iterate, and most jobs aren't NLP-based anyway.

Remember also that for 99% of problems, classical models are faster and just as good as or even better than neural nets. Of course, if you're interested in neural nets as a hobby, go for it; they're truly fascinating and you can do some amazing things just in Colab, but they still aren't the best option for most real-world problems.


Yeah currently it’s mostly my hobby and interests, not directly for my work…


If you’re just getting into the field, by far the easiest thing will be to start out with free Google Colab notebooks until you hit their free tier limits. Kaggle also has free GPUs to play with. When you’re starting out your bottleneck will likely not be your hardware but the learning curve of the content.

After that, use a managed service with GPUs. They will keep all of the hardware working, you just pay a monthly fee. I have no interest in diverting mental energy from learning ML to how to build a computer so that’s what I do. I use Paperspace but it has some very annoying issues so I can’t fully endorse it.


Thank you. I also saw Colab and Paperspace recommended elsewhere.


    https://coral.ai/
Or, search

    nvidia ai starter kit


Coral is Tensorflow Lite only, though.


For Stable Diffusion inference I recommend an nvidia 3060. If you want to train your own stable diffusion models I recommend a 3090. If you want to work with LLMs I recommend 2x3090.

I also recommend at least 64 GB of RAM, and at least 2 TB of storage. It seems like overkill at first but man do the GBs disappear like magic.


Modern AI models and frameworks often require CUDA versions and features that are not available on these old GPUs. Unless you know what you are doing, I would not buy them.


It doesn't have any tensor cores, which means it will run a lot slower for many machine learning tasks, especially the deep learning stuff.


Simple answer: no.

Buy the most advanced single card out there with the largest VRAM.


[flagged]


Interestingly, that's a ChatGPT answer.


It’s actually a decent answer. The only one so far that made a valid point about noise.


It’s literally the only answer recommending to buy it, so I don’t see how that’s “decent”


This was written by ChatGPT, right?


Sure enough, looking at your account, you’re churning out do-nothing multi paragraph long responses in just a minute or two. Chill out.


For sure


yes it is



