Hacker News
Ask HN: Worth it to buy 4x Nvidia Tesla K40 for AI?
37 points by speedylight on Feb 26, 2023 | hide | past | favorite | 47 comments
I saw a post on a local marketplace selling a complete system with 4 Tesla K40s (12 GB VRAM each, passive cooling) for $400. The post description said the system was intended for training AI models, which is what I want to use it for… nothing too serious, I am mostly still learning here. The cards themselves were released in 2013 and would have a combined 12,928 CUDA cores if I'm counting the 5th video card used for a monitor (a GTX 1660).

Here are the complete specs from the post description… going by the dollar value of all these parts, I'm not really losing any money… I just don't have good enough intuition to tell whether this system is worth it for learning and practicing modern-day AI.

Specs:

Motherboard: MSI MAG Z390 Tomahawk (9th generation) with dual Ethernet ports for wiring to other servers; max memory speed 4,400 MHz in overclock mode.

CPU: Intel Core i5-9400F @ 4.10 GHz, 6 cores (overclock mode).

RAM: 64 GB (4x16 GB) DDR4, max speed 3,600 MHz.

Storage: one 256 GB M.2 NVMe SSD (for the operating system) + two 3 TB hard disk drives (for data storage).

Gaming display support: one GTX 1660 Super graphics card with 6 GB memory and 1,408 CUDA cores, supporting up to 3 monitors at the same time; max bus transfer speed 8.0 GB/s (PCIe Gen3).

AI deep learning: 4 Tesla K40 accelerators, each with 12 GB memory and 2,880 CUDA cores, dedicated to machine/deep learning; max bus transfer speed 8.0 GB/s (PCIe Gen3) each.

Power supply: one 700 W PSU dedicated to the motherboard and the GTX 1660 monitor GPU; a second 1,000 W PSU dedicated to the Tesla K40 accelerators.

CPU cooling: Cooler Master liquid cooler with LED light control.

AI accelerator cooling: 4 cooling fans at the front and 3 at the back.

Structure: open frame of high-strength aluminum alloy to safeguard the system in an intensive working environment.

Power switch: big button switch with a 5 ft flexible extension cable and an LED hard-drive indicator.



The PyTorch binaries from pip and conda won't work on these GPUs, though there are some alternative binaries being maintained that still do: https://blog.nelsonliu.me/2020/10/13/newer-pytorch-binaries-...

The latest Nvidia driver no longer supports the K40, so you'll have to use version 470 (or lower; officially Nvidia says 460, but 470 seems to work). That supports CUDA 11.4 natively, and newer CUDA 11.x releases are supported through minor-version compatibility: https://docs.nvidia.com/deploy/cuda-compatibility/index.html. CUDA 12, however, is not.
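
To make that concrete: the K40 is Kepler, compute capability 3.5. A rough sketch of the cutoffs (an illustrative helper, not anything official; double-check Nvidia's compatibility docs before relying on it):

```python
# Illustrative helper, not an NVIDIA API: a rough map from a GPU's CUDA
# compute capability to the last CUDA toolkit major version that can
# still target it.
def last_supported_cuda_major(compute_capability):
    major, minor = compute_capability
    if (major, minor) < (3, 5):
        return 10  # sm_30 Kepler and older were dropped before CUDA 11
    if major == 3:
        return 11  # sm_35/sm_37 Kepler (K40, K80) end at CUDA 11.x
    return 12      # Maxwell (sm_50) and newer still compile on CUDA 12

# The K40 is compute capability 3.5, so CUDA 11.x is the end of the line:
assert last_supported_cuda_major((3, 5)) == 11
```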

In my testing, a system with a single RTX 3060 was faster in TensorFlow than one with 3 K40s, and probably close to the performance of 4 K40s.

If you are considering other GPUs, there are some good benchmarks here (the RTX 3060 is not listed, though the GTX 1080 Ti had almost the same performance in the TensorFlow test they run): https://lambdalabs.com/gpu-benchmarks

As others have said, Google Colab is a free option you can use.


Multi-GPU training is a double-edged sword. If you are at the stage where you are running your code in an IPython notebook, then you are almost certainly not going to benefit from multiple GPUs, and I strongly suspect you'd be better off with fewer, larger GPUs, even if training time is prolonged.

The reason I say that is, if we go with PyTorch, you basically have 2 options for multi-GPU training.

- DataParallel - where you clone your model over each GPU, but each one functions independently, and after each 'training step' they pool their data. This has downsides, in that you don't get to process intermediate layer outputs and synchronise your batch normalisation layers - so you can't use it to train 'big' models. It just makes your smaller models train more quickly. However, you can at least use these in a 'normal' training script.

- DistributedDataParallel - this is 'proper' multi-GPU training - you can now train big models and put a little bit of data on each GPU, and have them synchronise their results after each layer. However, this can be very annoying to use - each GPU runs in its own background process, which is either spawned or forked (depending on Windows/Linux), so you cannot run it in an IPython notebook or an interactive Python console. It also makes tracking metrics etc. MUCH harder - because you need to reduce your metrics over each GPU process (otherwise you get 4 accuracies, 4 mean squared errors etc. if you have 4 GPUs, and each process only sees one of them).
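
That reduction step is conceptually simple even though wiring it up across processes is the annoying part. A pure-Python stand-in for what the reduction does (real DDP code would call `torch.distributed.all_reduce` on tensors, one per rank, rather than holding a list in one process):

```python
# Stand-in for the reduction: in real DDP each rank holds only its own
# metric, and torch.distributed.all_reduce (with ReduceOp.SUM) sums them
# across ranks; dividing by world_size then gives the global mean.
# Here the list simulates the per-rank values.
def all_reduce_mean(per_rank_metrics):
    world_size = len(per_rank_metrics)
    return sum(per_rank_metrics) / world_size

# 4 GPUs -> 4 processes, each reporting its own accuracy:
global_accuracy = all_reduce_mean([0.81, 0.79, 0.83, 0.80])
```

Forget this step and you end up logging whichever rank's metric happens to reach your logger, which is exactly the 4-accuracies problem described above.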

I personally prefer having 1 GPU with 24 GB RAM over 3 GPUs with 12 GB RAM - because I can have a larger batch size on each GPU, which is VERY VERY advantageous in large models where you can only fit small batch sizes and batch normalisation starts falling down. I'd rather wait 2x as long for a 'better' model to train.
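
For what it's worth, gradient accumulation can partially paper over the memory limit, at the cost of time. A hedged sketch of the arithmetic (`accumulation_steps` is an illustrative name, not a framework setting; you implement it yourself by calling `optimizer.step()` only every N micro-batches):

```python
# Gradient accumulation trades wall-clock time for effective batch size:
# gradients are summed over accumulation_steps micro-batches before each
# optimizer step, so the optimizer "sees" this many samples at once.
def effective_batch_size(per_gpu_batch, num_gpus, accumulation_steps):
    return per_gpu_batch * num_gpus * accumulation_steps

# One 24 GB GPU fitting micro-batches of 16, accumulating 4 steps:
assert effective_batch_size(16, 1, 4) == 64
# Three 12 GB GPUs fitting micro-batches of 8, no accumulation:
assert effective_batch_size(8, 3, 1) == 24
```

Caveat: batch normalisation still only sees the per-GPU micro-batch, so accumulation does not fix the small-batch BN problem described above.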


DistributedDataParallel (potentially) does both model and data parallelism. Data parallelism is also absolutely used when training large models; it has its downsides, but I don't think there's any way around it if you're training with a large number of GPUs.


How does DDP do model parallel?


I phrased that wrongly; DDP itself doesn't, of course. I meant that using it in the way GP does also involves model parallelism.


I don't see any mention of model parallelism in the GP post. How could you possibly use DDP to enable it?


GP contrasts DP and DDP by saying that DP is "where you clone your model over each GPU" and DDP is "'proper' multi-GPU training - you can now train big models and put a little bit of data on each GPU". That's simply not what DP or DDP is. What could this possibly mean if it's not misunderstanding DP as data parallelism and DDP as model parallelism? I'm fairly certain that what they're describing is using DDP (which only does data parallelism) in addition to model parallelism.


While it's technically not a bad price in terms of raw compute and memory, the noise from the high speed server fans alone is reason enough not to get this. Never mind the power usage.

I was going to say maybe it'd be worth it if they were 24 GB GPUs, but I'm not sure you can even use recent PyTorch with cards that old. You'd have to work around the limitations.

You don't even need a GPU to learn anyway, you can use tiny models that train super fast even on CPU for that. You need the beefy GPUs once you want to generate tons of content or train a modern model on big datasets.

Get a 3060 12GB or a 2080 Ti 11GB and call it a day, at most.


Had a PowerEdge in my apartment in college; can confirm they sound like a leaf blower.


Ha, isn't that what basements are for these days?


I live in Vancouver. Round here basements are for renting out.


That is where my rack is!


Well, thank you all for the suggestions; I think I mostly agree it's not a good idea. Everything from power cost/consumption to the lack of modern CUDA and PyTorch support, as well as the complexity of parallel compute, really puts a dent in the perceived value.

I may have to just save up and get a more capable card with more VRAM. I still want to learn how to do parallel compute, but I realize I could do that at any other time, and it doesn't have to be on hardware I necessarily own (I could rent a cloud server), even though that would be really nice.


Unless you know what you're doing, choose simplicity, relatively new hardware, and low power consumption over a good price.

The only thing that sounds exceptional about this system is the 4x12 GB GPU memory. Is that worth it over the inability to use modern CUDA? I don't know much about ML, but I doubt it. People tend to move very quickly in this field (and in others TBH), not caring much about supporting old hardware.


For just 5x more cash, you can get two new 3090s... 20k CUDA cores, and much more modern, so they should be much faster than the core count suggests.

Worth considering just building two machines, as well; the ability to train multiple models in parallel is in many cases more valuable than the ability to train one big one.


> For just 5x more cash

I don’t feel the words “just” and “5x more cash” really fit in the same sentence!

Not questioning 3090s being more performant, but that’s a whole lot more money before even thinking about buying the supporting components.


Remember that the K40s do not support the latest PyTorch features due to their old compute capability (e.g., torch.compile).
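
A minimal guard you could keep in a training script; the 7.0 cutoff reflects my understanding that torch.compile's default Inductor/Triton backend needs Volta or newer, so treat it as an assumption and check the docs for your PyTorch version:

```python
# Illustrative guard, not a PyTorch API: compare a GPU's compute
# capability against the (assumed) minimum for torch.compile's
# default Triton backend.
def supports_torch_compile(compute_capability):
    return compute_capability >= (7, 0)

assert not supports_torch_compile((3, 5))  # Tesla K40 (Kepler)
assert supports_torch_compile((8, 6))      # e.g. RTX 3060 (Ampere)
```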


If you want to learn about which GPUs make sense to buy or rent for various ML workloads, I highly recommend this Tim Dettmers article:

https://timdettmers.com/2023/01/30/which-gpu-for-deep-learni...


That K40 setup requires a 1 kW power supply. With energy prices up, that can get pretty expensive in the long run.

I suspect a newer Nvidia chip manufactured on a more efficient semiconductor process will deliver the same performance for a fraction of the power consumption.

This is often a problem with old hardware. The power efficiency gains from improvements between chip process nodes are so fundamental that it’s hard for older chips to compete in total cost of ownership.


Depends on where you live. In Quebec, Canada, I'm paying $0.064 to $0.098/kWh.
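
Back-of-the-envelope at those rates, assuming the box really draws its full 1 kW around the clock:

```python
# Monthly energy cost of running the system flat out.
def monthly_energy_cost(watts, rate_per_kwh, hours=720):  # 720 h ~ 30 days
    return watts / 1000 * hours * rate_per_kwh

cheap = monthly_energy_cost(1000, 0.064)   # Quebec's low tier
pricey = monthly_energy_cost(1000, 0.098)  # Quebec's high tier
```

That works out to roughly $46 to $71 a month, which at many other utilities' rates would be two to three times higher.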


I bought that system! Hopefully didn't snag it from you. Admittedly I don't have a ton of skin in the game yet, but I'm excited to try to get parallel GPUs working. Hoping to be able to use it as a render node too, and maybe retrain some diffusion models with images of myself. Please link helpful resources if you're able to ;) hehe. You can call me dumb too. I will be starting a master's in geography in the fall, so I'm hoping it will be nice for use with QGIS and other GIS-related projects; rasters are fairly similar to images after all. Again, naive, but hey, only $500 in the game after all; I'd rather take that step than buy a $1k+ card I might just end up using for Minecraft. The seller said he had to compile CUDA from source, which is something I am not looking forward to, though I'd like to look into virtualization options for distributed computing... might be something there. Wish me luck! Time to assemble the home lab.


For some workloads, it's almost all about the VRAM. In those cases I've been wondering if getting a high-memory M1 or M2 Mac could be a good lab machine, thanks to unified memory. It'll run more quietly, use significantly less power, and there's no worry about overloading your electric circuit. On a 128 GB RAM Mac Studio you could theoretically run or even train models that would otherwise require multiple $6k A6000 GPUs in custom machine builds drawing oodles of power at the plug. It'd be slow, but slow beats not possible. And if you need a new development machine anyhow, you can justify some of that beefy Mac Studio's cost as part of your required spend. PyTorch has supported "mps" as a target device for some time now.
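
Device selection for this is trivial on the PyTorch side. A hedged pure-Python sketch (`pick_device` is a made-up helper name; with real PyTorch the two flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`):

```python
# Illustrative preference order: CUDA if present, then Apple's
# Metal (unified-memory) backend, then plain CPU.
def pick_device(cuda_available, mps_available):
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"  # Apple-silicon unified-memory backend
    return "cpu"

# On an M1/M2 Mac without CUDA, training code would land on "mps":
assert pick_device(False, True) == "mps"
```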


How much online AI compute can you get for $400?

Should be enough for most smaller projects, yeah?


Around a month and a half of a VM with a T4 on GCP. Depends on what you need.
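
Quick sanity check on that figure, assuming a T4 VM rate somewhere around $0.37/hour (an illustrative number; actual GCP pricing varies by region and VM shape):

```python
# How long a fixed budget lasts at a given hourly rate.
def days_of_compute(budget, hourly_rate):
    return budget / hourly_rate / 24

# $400 at ~$0.37/hour buys about a month and a half of continuous runtime:
assert 44 < days_of_compute(400, 0.37) < 46
```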


Lambda Labs has A100s for like $1 per hour.


If you're just learning, I'd actually look at using a cloud-based provider where you can just spin up a GPU VPS for the duration of each experiment.


I 100% agree; Google Colab Pro (or whatever they call it now) is a decent first step, for minimal money.


From the specs it's a good value - the 1660 alone is ~$150 used.

You probably won't be able to use the latest versions of PyTorch because of the K40's CUDA support, but that's okay.

Make sure you load test it for at least 15-20 min to see how high the GPU temperatures get before parting with your money. Do not buy if you can't test it - an old system like this can have all sorts of hardware problems.


This is essentially a GTX 970. Not worth it IMO.


Those who do not recommend, do you have any suggestions for people just getting into the field?


For deep learning/NNs, Google Colab/Kaggle notebooks are where most people start. There are lots of cloud providers that are harder to set up on but might be cheaper/faster for a particular project; while Colab is limiting in some ways, it's dead simple to use, so you can focus on learning data science.

There's a lot of talk about large models atm, but when you're learning you should probably stick to smaller models, because long training times drastically reduce how fast you can iterate, and most jobs aren't NLP-based anyway.

Remember also that for 99% of problems, classical models are faster and just as good as or even better than neural nets. Of course, if you're interested in neural nets as a hobby, go for it; they're truly fascinating and you can do some amazing things just in Colab, but they still aren't the best option for most real-world problems.


Yeah currently it’s mostly my hobby and interests, not directly for my work…


If you’re just getting into the field, by far the easiest thing will be to start out with free Google Colab notebooks until you hit their free tier limits. Kaggle also has free GPUs to play with. When you’re starting out your bottleneck will likely not be your hardware but the learning curve of the content.

After that, use a managed service with GPUs. They will keep all of the hardware working, you just pay a monthly fee. I have no interest in diverting mental energy from learning ML to how to build a computer so that’s what I do. I use Paperspace but it has some very annoying issues so I can’t fully endorse it.


Thank you. I also saw Colab and Paperspace recommended elsewhere.


    https://coral.ai/
Or, search

    nvidia ai starter kit


Coral is Tensorflow Lite only, though.


For Stable Diffusion inference I recommend an nvidia 3060. If you want to train your own stable diffusion models I recommend a 3090. If you want to work with LLMs I recommend 2x3090.

I also recommend at least 64 GB of RAM, and at least 2 TB of storage. It seems like overkill at first but man do the GBs disappear like magic.


Modern AI models and frameworks often require CUDA versions and features that are not available on these old GPUs. Unless you know what you are doing, I would not buy them.


It doesn't have any tensor cores, which means it will run a lot slower for many machine learning tasks, especially the deep learning stuff.


Simple answer: no.

Buy the most advanced single card out there with the largest VRAM.


[flagged]


Interestingly, that's a ChatGPT answer.


It’s actually a decent answer. The only one so far that made a valid point about noise.


It’s literally the only answer recommending to buy it, so I don’t see how that’s “decent”


This was written by ChatGPT, right?


Sure enough, looking at your account, you’re churning out do-nothing multi paragraph long responses in just a minute or two. Chill out.


For sure


yes it is



