I have recently been wondering how people deploy high-throughput models in production that require a GPU to really hit their intended performance. My favorite hosting provider (Hetzner) doesn't offer dedicated GPU servers anymore, and OVH doesn't seem to either; in general the problem seems to be fraud related to coin mining and the like. So what do people use instead? Dedicated EC2 instances still seem expensive.
Hi there, I work at OVH (product manager, data/AI) :)
In fact, we do offer GPUs in multiple ways:
- Dedicated servers (bare metal): hidden from the website today while we refresh the hardware lineup. New ones with GPUs are planned in a few weeks.
But you will need to commit to a GPU server for a minimum rental term. Good value, but not so flexible. Not ideal for an AI training budget IMHO, except if you're training 24/7.
- Public cloud / VMs: like AWS/GCP/Azure, we provide VMs with 1 to 4 NVIDIA Tesla V100 16GB GPUs.
The good thing is flexibility (pay-as-you-go, billed hourly).
The "bad thing" is about perf when dealing with large dataset, and it also require sysadmin tasks (ssh, dist upgrade, you know it). You'll also need tools such as kubeflow to allow a team to work with pipelines, orchestration and debugging tools
- (NEW) Public Cloud AI Training: it's basically GPU as a service, like Paperspace for example.
You start a "job" via UI/CLI/API with 1 to 4 GPUs NVIDIA Tesla V100s, plugged to provided images (notebooks + tensorflow, notebooks / pytorch, fastAI, HuggingFace, ...), or you own images (hello docker public/private registries).
The good thing is: full flexibility (you can launch as many jobs as you want), pay-per-minute billing, and jobs start in 15 seconds.
No SSH required, no drivers to install, ...
And one last good thing is the hidden part: dedicated "cache" storage near the GPUs so you can work with large datasets (several TBs) without bottlenecks such as latency. It's as good as local NVMe storage.
The bad thing is: nothing :)
I'm not here for hidden advertising but, fuck yeah, we do have something to offer for GPUs :p
For links and prices, everything is public on our website; for a free voucher, just DM me!
Have a good day.
Advanced scheduling with K8s... inference requests distributed across a ton of spot/preemptible VMs. We've achieved the best performance on CPU with MKL.
It's often a misconception that you need a GPU for inference. In many cases, the overhead of transferring data to the GPU makes it much slower than a well-tuned CPU.
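To make that transfer overhead concrete, here is a rough way to measure it yourself. A minimal sketch, assuming PyTorch; the toy model and sizes are made up, and real numbers vary a lot with batch size and model structure:

```python
import time

import torch
import torch.nn as nn

# Stand-in for a small production model (hypothetical sizes).
model = nn.Sequential(
    nn.Linear(512, 1024), nn.ReLU(),
    nn.Linear(1024, 256), nn.ReLU(),
    nn.Linear(256, 10),
).eval()

x = torch.randn(1, 512)  # one inference request


def bench(fn, warmup=10, iters=100):
    """Average wall-clock milliseconds per call."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters * 1e3


with torch.no_grad():
    print(f"CPU: {bench(lambda: model(x)):.3f} ms/request")

    if torch.cuda.is_available():
        gpu_model = model.cuda()  # moves the weights to the GPU

        def gpu_request():
            y = gpu_model(x.cuda())   # pays the host-to-device copy
            torch.cuda.synchronize()  # wait for the kernel to finish
            return y.cpu()            # and the copy back

        print(f"GPU (incl. transfers): {bench(gpu_request):.3f} ms/request")
```

For tiny per-request batches like this, the two copies often dominate; the GPU usually only wins once you can batch requests together.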
Intel has also recently announced AVX-512 VNNI (Vector Neural Network Instructions), but it is currently only available on the Xeon CPU family, not desktop parts. This should significantly improve inference performance on the CPU.
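VNNI accelerates int8 dot products, so in practice you get at it through quantized inference. A minimal sketch using PyTorch's dynamic quantization (the toy model is made up; as I understand it, the underlying int8 CPU kernels use VNNI where the hardware supports it):

```python
import torch
import torch.nn as nn

# Hypothetical float32 model.
model = nn.Sequential(
    nn.Linear(512, 1024), nn.ReLU(),
    nn.Linear(1024, 10),
).eval()

# Replace the Linear layers with int8 equivalents; weights are quantized
# once, activations are quantized on the fly at each call.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
with torch.no_grad():
    print(quantized(x).shape)  # same interface, int8 matmuls inside
```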
(It depends on the CPU, how many cores, and the structure of the neural net, but MKL will generally only achieve between 80% and 90% of NN-512's AVX-512 performance)
You usually tune the software that runs on it (compiler, libraries, build options, etc.). You can also play the overclocking game if you run dedicated hardware.
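On the library side, here's a sketch of the usual knobs for MKL/OpenMP-backed PyTorch inference. The thread counts and affinity string below are illustrative rather than recommendations, and KMP_AFFINITY only takes effect with Intel's OpenMP runtime:

```python
import os

# Set before importing torch so the OpenMP/MKL runtimes pick them up.
os.environ.setdefault("OMP_NUM_THREADS", "8")
os.environ.setdefault("MKL_NUM_THREADS", "8")
os.environ.setdefault("KMP_AFFINITY", "granularity=fine,compact,1,0")

import torch

torch.set_num_threads(8)          # intra-op parallelism (per operator)
torch.set_num_interop_threads(1)  # inter-op parallelism (between operators)
print(torch.__config__.parallel_info())  # confirm which backend is active
```

The right values depend on your physical core count and how many requests you serve concurrently; oversubscribing threads across concurrent requests usually hurts more than it helps.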
You should check out Lambda GPU Cloud. It's around 40% of the price of AWS/GCP/Azure. The default instance image uses the Lambda Stack repository, which is designed for deep learning (PyTorch/TensorFlow/Caffe/drivers pre-installed), saving you setup time upfront.
If you're still using EC2, you should just switch to us; your monthly bill will thank you.
OP here. Probably some form of container orchestration system like Kubernetes to manage the deployments and resources. There is also specialized ML tooling like Kubeflow to help out. It's still a developing landscape, though.
We made BudgetML not to replace the above, but to make it easy for scenarios where the need is just to "get out there and deploy". Do you see value in that too?
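For anyone wondering what that looks like in practice, the pattern is roughly one containerized HTTP endpoint in front of the model. A minimal sketch, not BudgetML's actual API; FastAPI is just for illustration, and the TorchScript model path and input schema are hypothetical:

```python
from typing import List

import torch
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = torch.jit.load("model.pt").eval()  # hypothetical exported model


class PredictRequest(BaseModel):
    features: List[float]


@app.post("/predict")
def predict(req: PredictRequest):
    with torch.no_grad():
        x = torch.tensor(req.features).unsqueeze(0)  # shape (1, n_features)
        y = model(x)
    return {"prediction": y.squeeze(0).tolist()}

# Serve with: uvicorn main:app --host 0.0.0.0 --port 8000
```

From there it's a Dockerfile and whatever runs the container: a single VM, Kubernetes, or a tool like BudgetML that handles the VM for you.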