I think the number of people interested in running ML models locally might be greatly overestimated [here]. There is no killer app in sight that needs to run locally. People work and store their stuff in the cloud. Most people just want a lightweight laptop, and AI workloads would drain the battery and cook your eggs in a matter of minutes, assuming you can run them. Production-quality models are pretty much cloud-only, and I don’t think open-source models, especially ones viable for local inference, will close the gap anytime soon. I’d like all of those things to be different, but I think that’s just the way things are.
Of course there are enthusiasts, but I suspect that they prefer and will continue to prefer dedicated inference hardware.
I have some difficulty estimating how heavy Recall’s workload is, but either way, I have little faith in Microsoft’s ability to implement this feature efficiently. They struggle with much simpler features, such as search. I wouldn’t be surprised if a lot of people disable the feature to save battery life and improve system performance.
Huh? All the files are local, and models are gonna be made to hold a lot of them, or ultimately all of them, in the "context window". You can't really have an AI for your local documents on the cloud because the cloud doesn't have access. Same logic for businesses. The use case follows from data availability and barriers.
We've observed the same on web pages where more and more functionality gets pushed to the frontend. One could push the last (x) layers of the neural net, for example, to the frontend, for lower expense and, if rightly engineered, better speed and scalability.
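A toy sketch of what that split could look like, assuming a small PyTorch MLP; the model, layer sizes, and split point here are made up purely for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical toy model; a real deployment would split a much larger net.
full_model = nn.Sequential(
    nn.Linear(512, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 256),            # the "last layers" that could live on the frontend
)

split_at = 4                          # modules [0, 4) stay on the server
server_part = full_model[:split_at]
client_part = full_model[split_at:]

x = torch.randn(1, 512)               # user input
activations = server_part(x)          # computed in the cloud
# ...the activations would be serialized and sent to the browser/device...
output = client_part(activations)     # finished locally
print(activations.shape, output.shape)  # torch.Size([1, 1024]) torch.Size([1, 256])
```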
AIs will be local, super-AIs still in the cloud.
Local AIs will be proprietary and they will have strings to the mothership.
The strings will have business value both related to the consumer and for further AI training.
From what I’ve seen, most people tend to use cloud document platforms. Microsoft has made Office into one. This has been the steady direction for the last few years; they’ll keep pushing for it because it gives them control. Native apps with local files are an inconvenient model for them. This sadly applies to most other types of apps. On many of these cloud platforms, you can’t even download the project files.
> You can't really have an AI for your local documents on the cloud because the cloud doesn't have access
Yes, up to the cloud it all goes. They don’t mind; they can charge you for it. Microsoft quite openly wants to move the whole OS to the cloud.
> Same logic for businesses
Businesses hate local files. They’re a huge liability. When firing people, it’s very convenient to be able to cut someone off just by revoking their cloud credentials.
> We've observed the same on web pages where more and more functionality gets pushed to the frontend
It will never go all the way.
> One could push the last (x) layers of the neural net, for example, to the frontend, for lower expense and, if rightly engineered, better speed and scalability
I’ll believe it when I see it. I don’t think the incentive is there. It sounds like a huge complicating factor. It’s much simpler to keep everything running in the cloud, and software architects strongly prefer simple designs. How much do these layers weigh? How many MB or GB of data will I need to transfer, and how often? Does that really give me better latency than just transferring a few KB of the AI’s output?
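For a rough sense of scale, here is a back-of-envelope sketch; every number in it is a made-up assumption (a hypothetical 7B-parameter, 32-block decoder), not a measurement:

```python
# Compare shipping the "last layers" to the client vs. just sending the output.
params_per_block = 7e9 / 32        # ~220M parameters per decoder block (assumed)
last_blocks = 4                    # how many blocks get pushed to the frontend (assumed)
head_params = 32_000 * 4_096       # output head: vocab_size * hidden_size (assumed)
bytes_per_param = 2                # fp16 weights

weights_to_ship = (last_blocks * params_per_block + head_params) * bytes_per_param
output_bytes = 2_000               # a few KB of generated text

print(f"weights shipped to the client: {weights_to_ship / 1e9:.1f} GB")
print(f"just sending the AI's output:  {output_bytes / 1e3:.1f} KB")
```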
> I think the number of people interested in running ML models locally might be greatly overestimated [here]. There is no killer app in sight that needs to run locally. People work and store their stuff in the cloud. Most people just want a lightweight laptop, and AI workloads would drain the battery and cook your eggs in a matter of minutes, assuming you can run them. Production-quality models are pretty much cloud-only, and I don’t think open-source models, especially ones viable for local inference, will close the gap anytime soon. I’d like all of those things to be different, but I think that’s just the way things are.
> Of course there are enthusiasts, but I suspect that they prefer and will continue to prefer dedicated inference hardware.
Local ML isn't a CPU workload. The NPUs in mobile processors (both laptop and smartphone) are optimized for low power and low precision, which limits how much memory bandwidth they can demand. So, as I said, demand for more memory bandwidth depends mainly on how powerful the GPU is.
I suspect consumer workloads will rise.