I have recently been wondering how people deploy high-throughput models in production that require a GPU to really hit their intended performance. My favorite hosting provider (Hetzner) doesn't offer dedicated GPU servers anymore, and OVH doesn't seem to either; in general the problem seems to be fraud related to coin mining and the like. So what do people use instead? Dedicated EC2 instances still seem expensive.
Hi there, I work at OVH (product manager, data/AI) :)
In fact, we do offer GPUs in multiple ways:
- Dedicated servers (bare metal): hidden from the website today while we refresh the hardware lineup. New ones with GPUs are planned in a few weeks.
But you will need to commit to a GPU server for a minimum rental term. Good value, but not so flexible. Not ideal for an AI training budget IMHO, except if you're training 24/7.
- Public cloud / VMs: like AWS/GCP/Azure, we provide VMs with 1 to 4 NVIDIA Tesla V100 16GB GPUs.
The good thing is flexibility (pay-as-you-go, billed hourly).
The "bad thing" is about perf when dealing with large dataset, and it also require sysadmin tasks (ssh, dist upgrade, you know it). You'll also need tools such as kubeflow to allow a team to work with pipelines, orchestration and debugging tools
- (NEW) Public Cloud AI Training: it's basically GPU as a service, like Paperspace for example.
You start a "job" via UI/CLI/API with 1 to 4 GPUs NVIDIA Tesla V100s, plugged to provided images (notebooks + tensorflow, notebooks / pytorch, fastAI, HuggingFace, ...), or you own images (hello docker public/private registries).
The good thing is: full flexibility (you can launch as many jobs as you want), pay-per-minute billing, and jobs start in 15 seconds.
No SSH required, no drivers to install, ...
And one last good thing is the hidden part: dedicated "cache" storage near the GPUs so you can work with large datasets (several TBs) without bottlenecks such as latency. It's as good as local NVMe storage.
The bad thing is: nothing :)
I'm not here for hidden advertising but, fuck yeah, we do have something to offer for GPUs :p
For links and prices, everything is public on our website; for a free voucher, just DM me!
Have a good day.
Advanced scheduling with K8s... inference requests distributed across a ton of spot/preemptible VMs. We've achieved the best performance on CPU with MKL.
It's often a misconception that you need a GPU for inference. In many cases, the overhead of transferring data to the GPU makes it much slower than a well-tuned CPU.
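To make that transfer overhead concrete, here is a rough way to measure it yourself. A minimal sketch, assuming PyTorch; the toy model and sizes are made up, and real numbers vary a lot with batch size and model structure:

```python
import time

import torch
import torch.nn as nn

# Stand-in for a small production model (hypothetical sizes).
model = nn.Sequential(
    nn.Linear(512, 1024), nn.ReLU(),
    nn.Linear(1024, 256), nn.ReLU(),
    nn.Linear(256, 10),
).eval()

x = torch.randn(1, 512)  # one inference request


def bench(fn, warmup=10, iters=100):
    """Average wall-clock milliseconds per call."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters * 1e3


with torch.no_grad():
    print(f"CPU: {bench(lambda: model(x)):.3f} ms/request")

    if torch.cuda.is_available():
        gpu_model = model.cuda()  # moves the weights to the GPU

        def gpu_request():
            y = gpu_model(x.cuda())   # pays the host-to-device copy
            torch.cuda.synchronize()  # wait for the kernel to finish
            return y.cpu()            # and the copy back

        print(f"GPU (incl. transfers): {bench(gpu_request):.3f} ms/request")
```

For tiny per-request batches like this, the two copies often dominate; the GPU usually only wins once you can batch requests together.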
Intel has also recently announced AVX-512 VNNI (Vector Neural Network Instructions), but it is currently only available on the Xeon CPU family, not desktop parts. This should significantly improve inference performance on the CPU.
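VNNI accelerates int8 dot products, so in practice you get at it through quantized inference. A minimal sketch using PyTorch's dynamic quantization (the toy model is made up; as I understand it, the underlying int8 CPU kernels use VNNI where the hardware supports it):

```python
import torch
import torch.nn as nn

# Hypothetical float32 model.
model = nn.Sequential(
    nn.Linear(512, 1024), nn.ReLU(),
    nn.Linear(1024, 10),
).eval()

# Replace the Linear layers with int8 equivalents; weights are quantized
# once, activations are quantized on the fly at each call.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
with torch.no_grad():
    print(quantized(x).shape)  # same interface, int8 matmuls inside
```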
(It depends on the CPU, how many cores, and the structure of the neural net, but MKL will generally only achieve between 80% and 90% of NN-512's AVX-512 performance)
You usually tune the software that runs on it (compiler, libraries, build options, etc.). You can also play the overclocking game if you run dedicated hardware.
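On the library side, here's a sketch of the usual knobs for MKL/OpenMP-backed PyTorch inference. The thread counts and affinity string below are illustrative rather than recommendations, and KMP_AFFINITY only takes effect with Intel's OpenMP runtime:

```python
import os

# Set before importing torch so the OpenMP/MKL runtimes pick them up.
os.environ.setdefault("OMP_NUM_THREADS", "8")
os.environ.setdefault("MKL_NUM_THREADS", "8")
os.environ.setdefault("KMP_AFFINITY", "granularity=fine,compact,1,0")

import torch

torch.set_num_threads(8)          # intra-op parallelism (per operator)
torch.set_num_interop_threads(1)  # inter-op parallelism (between operators)
print(torch.__config__.parallel_info())  # confirm which backend is active
```

The right values depend on your physical core count and how many requests you serve concurrently; oversubscribing threads across concurrent requests usually hurts more than it helps.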
You should check out Lambda GPU Cloud. It's around 40% of the price of AWS/GCP/Azure. The default instance image uses the Lambda Stack repository, which is designed for deep learning (PyTorch/TensorFlow/Caffe/drivers pre-installed), saving you setup time upfront.
If you're still using EC2, you should just switch to us; your monthly bill will thank you.
OP here. Probably some form of container orchestration system like Kubernetes to manage the deployments and resources. There is also specialized ML tooling like Kubeflow to help out. It's still a developing landscape, though.
We made BudgetML not to replace the above, but to make it easy for scenarios where the need is just to "get out there and deploy". Do you see value in that too?
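For anyone wondering what that looks like in practice, the pattern is roughly one containerized HTTP endpoint in front of the model. A minimal sketch, not BudgetML's actual API; FastAPI is just for illustration, and the TorchScript model path and input schema are hypothetical:

```python
from typing import List

import torch
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = torch.jit.load("model.pt").eval()  # hypothetical exported model


class PredictRequest(BaseModel):
    features: List[float]


@app.post("/predict")
def predict(req: PredictRequest):
    with torch.no_grad():
        x = torch.tensor(req.features).unsqueeze(0)  # shape (1, n_features)
        y = model(x)
    return {"prediction": y.squeeze(0).tolist()}

# Serve with: uvicorn main:app --host 0.0.0.0 --port 8000
```

From there it's a Dockerfile and whatever runs the container: a single VM, Kubernetes, or a tool like BudgetML that handles the VM for you.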