Not really (I work on AI/ML Infrastructure at a well known tech company and talk regularly w/ our peer companies).
That said, inference on Apple products is a different story. There's definitely interest in inference on the edge. So far, though, nearly everyone is still opting for inference in the cloud, for three reasons:
1. There's a lot of extra work involved in getting ML/AI models ready for mobile inference, and that work is different for iOS vs. Android (see the sketch just after this list).
2. You're limited in which exact devices will run your model optimally. Most of your customers won't necessarily have those devices, so you need some kind of fallback.
3. You're limited in what kinds of models you can actually run. You have way more flexibility running inference in the cloud.
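To make reason 1 concrete, here's a rough sketch of the iOS side of that work, assuming a coremltools-style conversion of a traced PyTorch model; the model, input shape, and output file name are placeholders, and real models usually need additional quantization and op-compatibility work on top of this:

    # Sketch only: converting a placeholder PyTorch model for on-device (iOS) inference.
    import torch
    import torchvision
    import coremltools as ct

    # Placeholder model; a production model typically needs quantization and
    # op-compatibility fixes before (and after) this step.
    model = torchvision.models.mobilenet_v2(weights="DEFAULT").eval()
    example_input = torch.rand(1, 3, 224, 224)

    # Core ML conversion expects a traced (or scripted) model, not eager PyTorch.
    traced = torch.jit.trace(model, example_input)

    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name="image", shape=example_input.shape)],
        # Let Core ML target the Neural Engine/GPU where available, CPU otherwise.
        compute_units=ct.ComputeUnit.ALL,
    )
    mlmodel.save("MobileNetV2.mlpackage")

And that only covers iOS; the Android path is a separate pipeline (e.g. exporting to TFLite or ONNX), which is exactly the duplicated effort being described.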
A cloud solution I looked at a few years ago could be replicated (poorly) in your browser today. In my mind the question has become one of determining when my model is useful enough to detach from the cloud, not whether that should happen.
Mobile can be more efficient, but you're making big tradeoffs. You're very limited in what you can actually run on-device, and ultimately you're also screwing over your users' battery life, etc.
PyTorch actually has surprisingly good support for Apple Silicon. Occasionally an operation needs to fall back to the CPU, but many applications are able to run inference entirely on the GPU via the MPS backend.
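For what it's worth, here's a minimal sketch of what that looks like in practice, assuming the MPS backend; PYTORCH_ENABLE_MPS_FALLBACK is the documented switch for falling back to the CPU on ops MPS doesn't implement, and the model choice is just a placeholder:

    # Sketch: PyTorch inference on Apple Silicon via the MPS backend.
    # The env var enables CPU fallback for ops MPS doesn't implement yet;
    # set it before importing torch. The model here is a placeholder.
    import os
    os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

    import torch
    import torchvision

    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
    model = torchvision.models.resnet50(weights="DEFAULT").eval().to(device)

    with torch.inference_mode():
        x = torch.rand(1, 3, 224, 224, device=device)
        print(model(x).argmax(dim=1))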
I’ve found it to be pretty terrible compared to CUDA, especially with Huggingface transformers. There’s no technical reason why it has to be terrible there though. Apple should fix that.
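For context, getting a Transformers model onto MPS is just a device move; whether it then performs well is exactly the complaint above. A minimal sketch, with gpt2 as a stand-in model:

    # Sketch: Hugging Face Transformers inference on MPS. gpt2 is a stand-in;
    # performance relative to CUDA is the complaint above.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "mps" if torch.backends.mps.is_available() else "cpu"

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval().to(device)

    inputs = tok("On-device inference is", return_tensors="pt").to(device)
    with torch.inference_mode():
        out = model.generate(**inputs, max_new_tokens=30)
    print(tok.decode(out[0], skip_special_tokens=True))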
MLX will probably be even faster than that, if the model is already ported. Faster startup time too. That’s my main pet peeve though: there’s no technical reason why PyTorch couldn’t be just as good. It’s just underfunding and neglect.
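For comparison, here's what the MLX path looks like with an already-ported model, sketched with the mlx-lm helper package; the model repo name is just one of the community-converted checkpoints, and the exact generate() kwargs may differ between mlx-lm versions:

    # Sketch: running an already-ported model with Apple's MLX via mlx-lm
    # (pip install mlx-lm). The repo name and generate() kwargs are assumptions;
    # check the docs for the mlx-lm version you have installed.
    from mlx_lm import load, generate

    # Loads weights already converted to MLX format.
    model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")

    text = generate(
        model,
        tokenizer,
        prompt="Explain on-device inference in one sentence.",
        max_tokens=100,
    )
    print(text)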