Not really (I work on AI/ML Infrastructure at a well known tech company and talk regularly w/ our peer companies).
That said, inference on Apple products is a different story. There's definitely interest in inference on the edge. So far, though, nearly everyone is still opting for inference in the cloud, for three reasons:
1. There's a lot of extra work involved in getting ML/AI models ready for mobile inference, and that work is different for iOS vs. Android (see the sketch just after this list).
2. You're limited in which exact devices will run your model optimally. Most of your customers won't necessarily have those devices, so you need some kind of fallback.
3. You're limited in what kinds of models you can actually run. You have way more flexibility running inference in the cloud.
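To make reason 1 concrete, here's a rough sketch of the iOS side of that work, assuming a coremltools-style conversion of a traced PyTorch model; the model, input shape, and output file name are placeholders, and real models usually need additional quantization and op-compatibility work on top of this:

    # Sketch only: converting a placeholder PyTorch model for on-device (iOS) inference.
    import torch
    import torchvision
    import coremltools as ct

    # Placeholder model; a production model typically needs quantization and
    # op-compatibility fixes before (and after) this step.
    model = torchvision.models.mobilenet_v2(weights="DEFAULT").eval()
    example_input = torch.rand(1, 3, 224, 224)

    # Core ML conversion expects a traced (or scripted) model, not eager PyTorch.
    traced = torch.jit.trace(model, example_input)

    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name="image", shape=example_input.shape)],
        # Let Core ML target the Neural Engine/GPU where available, CPU otherwise.
        compute_units=ct.ComputeUnit.ALL,
    )
    mlmodel.save("MobileNetV2.mlpackage")

And that only covers iOS; the Android path is a separate pipeline (e.g. exporting to TFLite or ONNX), which is exactly the duplicated effort being described.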
A cloud solution I looked at a few years ago could be replicated (poorly) in your browser today. In my mind the question has become one of determining when my model is useful enough to detach from the cloud, not whether that should happen.
Mobile can be more efficient, but you're making big tradeoffs. You're very limited in what you can actually run on-device, and ultimately you're also screwing over your users' battery life, etc.
PyTorch actually has surprisingly good support for Apple Silicon. Occasionally an operation needs to fall back to the CPU, but many applications are able to run inference entirely on the GPU via the MPS backend.
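For what it's worth, here's a minimal sketch of what that looks like in practice, assuming the MPS backend; PYTORCH_ENABLE_MPS_FALLBACK is the documented switch for falling back to the CPU on ops MPS doesn't implement, and the model choice is just a placeholder:

    # Sketch: PyTorch inference on Apple Silicon via the MPS backend.
    # The env var enables CPU fallback for ops MPS doesn't implement yet;
    # set it before importing torch. The model here is a placeholder.
    import os
    os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

    import torch
    import torchvision

    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
    model = torchvision.models.resnet50(weights="DEFAULT").eval().to(device)

    with torch.inference_mode():
        x = torch.rand(1, 3, 224, 224, device=device)
        print(model(x).argmax(dim=1))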
I’ve found it to be pretty terrible compared to CUDA, especially with Huggingface transformers. There’s no technical reason why it has to be terrible there though. Apple should fix that.
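For context, getting a Transformers model onto MPS is just a device move; whether it then performs well is exactly the complaint above. A minimal sketch, with gpt2 as a stand-in model:

    # Sketch: Hugging Face Transformers inference on MPS. gpt2 is a stand-in;
    # performance relative to CUDA is the complaint above.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "mps" if torch.backends.mps.is_available() else "cpu"

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval().to(device)

    inputs = tok("On-device inference is", return_tensors="pt").to(device)
    with torch.inference_mode():
        out = model.generate(**inputs, max_new_tokens=30)
    print(tok.decode(out[0], skip_special_tokens=True))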
MLX will probably be even faster than that, if the model is already ported. Faster startup time too. That’s my main pet peeve though: there’s no technical reason why PyTorch couldn’t be just as good. It’s just underfunding and neglect.
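For comparison, here's what the MLX path looks like with an already-ported model, sketched with the mlx-lm helper package; the model repo name is just one of the community-converted checkpoints, and the exact generate() kwargs may differ between mlx-lm versions:

    # Sketch: running an already-ported model with Apple's MLX via mlx-lm
    # (pip install mlx-lm). The repo name and generate() kwargs are assumptions;
    # check the docs for the mlx-lm version you have installed.
    from mlx_lm import load, generate

    # Loads weights already converted to MLX format.
    model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")

    text = generate(
        model,
        tokenizer,
        prompt="Explain on-device inference in one sentence.",
        max_tokens=100,
    )
    print(text)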