
Not really (I work on AI/ML infrastructure at a well-known tech company and talk regularly with our peer companies).

That said, inference on Apple products is a different story. There's definitely interest in inference on the edge. So far, though, nearly everyone is still opting for inference in the cloud, for three reasons:

1. There's a lot of extra work involved in getting ML/AI models ready for mobile inference, and this work is different for iOS vs. Android (see the Core ML sketch below).

2. You're limited on which exact device models will run the thing optimally. Most of your customers won't necessarily have those devices, so you need some kind of fallback.

3. You're limited on what kind of models you can actually run. You have way more flexibility running inference in the cloud.
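To make point 1 concrete, here's a minimal sketch of the iOS half of that work, using coremltools to convert a traced PyTorch model to Core ML. The MobileNetV3 model and the 224x224 input shape are placeholders of mine, not anything specific from this thread:

    import torch
    import torchvision
    import coremltools as ct

    # Export a PyTorch model for on-device (iOS) inference via Core ML.
    # Model and input shape are illustrative placeholders.
    model = torchvision.models.mobilenet_v3_small(weights="DEFAULT").eval()
    example_input = torch.rand(1, 3, 224, 224)

    traced = torch.jit.trace(model, example_input)
    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(shape=example_input.shape)],
        convert_to="mlprogram",
    )
    mlmodel.save("MobileNetV3.mlpackage")

The Android side then needs a separate export path (e.g., to TFLite or ONNX), which is exactly the duplicated effort being described.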



A cloud solution I looked at a few years ago could be replicated (poorly) in your browser today. In my mind the question has become one of determining when my model is useful enough to detach from the cloud, not whether that should happen.


Inference on the edge is a lot like JS - just drop a crap ton of data to the front end, and let it render.


Power for power, any thoughts on what mobile inference looks like vs doing it in the cloud?


Mobile can be more efficient. But you're making big tradeoffs. You are very limited in what you can actually run on-device. And ultimately you're also screwing over your users' battery life, etc.


PyTorch actually has surprisingly good support for Apple Silicon via the MPS backend. Occasionally an operation needs to use CPU fallback, but many applications are able to run inference entirely on the GPU, without touching the CPU cores at all.
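A minimal sketch of what that looks like in practice (the Linear layer is just a placeholder model):

    import torch

    # Pick Apple's Metal (MPS) backend when available, otherwise stay on CPU.
    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

    model = torch.nn.Linear(512, 10).to(device)  # placeholder model
    x = torch.randn(8, 512, device=device)

    with torch.no_grad():
        out = model(x)
    print(out.device, out.shape)

    # For ops the MPS backend doesn't implement yet, setting
    # PYTORCH_ENABLE_MPS_FALLBACK=1 in the environment makes PyTorch
    # run them on the CPU instead of raising an error.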


I’ve found it to be pretty terrible compared to CUDA, especially with Hugging Face Transformers. There’s no technical reason why it has to be terrible there, though. Apple should fix that.


Yeah. It’s good with YOLO and DINO though. My M2 Max can compute DINO embeddings faster than a T4 (which is the GPU in AWS’s g4dn instance type).
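For reference, a DINO embedding run on Apple Silicon is just the standard torch.hub path pointed at the MPS device. The DINOv2 ViT-S/14 variant and the batch size here are my assumptions, not the commenter's actual setup:

    import torch

    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

    # DINOv2 ViT-S/14 from torch.hub; the 384-dim output is the image embedding.
    model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").to(device).eval()

    batch = torch.randn(16, 3, 224, 224, device=device)  # stand-in for preprocessed images
    with torch.no_grad():
        embeddings = model(batch)  # shape: (16, 384)
    print(embeddings.shape)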


MLX will probably be even faster than that, if the model is already ported. Faster startup time too. That’s my main pet peeve though: there’s no technical reason why PyTorch couldn’t be just as good. It’s just underfunding and neglect.
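For comparison, a tiny MLX forward pass looks like this (the Linear layer is a placeholder; MLX uses unified memory, so there's no explicit device transfer step):

    import mlx.core as mx
    import mlx.nn as nn

    # Arrays live in unified memory; ops run on the Apple GPU by default.
    layer = nn.Linear(512, 10)  # placeholder model
    x = mx.random.normal((8, 512))

    y = layer(x)
    mx.eval(y)  # MLX is lazy, so eval() forces the computation
    print(y.shape)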


T4s are like 6 years old.


And there is a lot of work being done with MLX.



