I think the number of people interested in running ML models locally might be greatly overestimated [here]. There is no killer app in sight that needs to run locally. People work and store their stuff in the cloud. Most people just want a lightweight laptop, and AI workloads would drain the battery and cook your eggs in a matter of minutes, assuming you can run them. Production-quality models are pretty much cloud-only, and I don’t think open-source models, especially ones viable for local inference, will close the gap anytime soon. I’d like all of those things to be different, but I think that’s just the way things are.
Of course there are enthusiasts, but I suspect that they prefer and will continue to prefer dedicated inference hardware.
I have some difficulty estimating how heavy Recall’s workload is, but either way, I have little faith in Microsoft’s ability to implement this feature efficiently. They struggle with much simpler features, such as search. I wouldn’t be surprised if a lot of people disable the feature to save battery life and improve system performance.
Huh? All the files are local, and models are gonna be made to hold a lot of them, or ultimately all of them, in the "context window". You can't really have an AI for your local documents on the cloud because the cloud doesn't have access. Same logic for businesses. The use case follows from data availability and barriers.
We've observed the same on web pages where more and more functionality gets pushed to the frontend. One could push the last (x) layers of the neural net, for example, to the frontend, for lower expense and, if rightly engineered, better speed and scalability.
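A toy sketch of what that split could look like, assuming a small PyTorch MLP; the model, layer sizes, and split point here are made up purely for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical toy model; a real deployment would split a much larger net.
full_model = nn.Sequential(
    nn.Linear(512, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 256),            # the "last layers" that could live on the frontend
)

split_at = 4                          # modules [0, 4) stay on the server
server_part = full_model[:split_at]
client_part = full_model[split_at:]

x = torch.randn(1, 512)               # user input
activations = server_part(x)          # computed in the cloud
# ...the activations would be serialized and sent to the browser/device...
output = client_part(activations)     # finished locally
print(activations.shape, output.shape)  # torch.Size([1, 1024]) torch.Size([1, 256])
```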
AIs will be local, super-AIs still in the cloud.
Local AIs will be proprietary and they will have strings to the mothership.
The strings will have business value both related to the consumer and for further AI training.
From what I’ve seen, most people tend to use cloud document platforms. Microsoft has made Office into one. This has been the steady direction for the last few years; they’ll keep pushing for it because it gives them control. Native apps with local files are an inconvenient model for them. This sadly applies to most other types of apps. On many of these cloud platforms, you can’t even download the project files.
> You can't really have an AI for your local documents on the cloud because the cloud doesn't have access
Yes, up to the cloud it all goes. They don’t mind; they can charge you for it. Microsoft quite openly wants to move the whole OS to the cloud.
> Same logic for businesses
Businesses hate local files. They’re a huge liability. When firing people, it’s very convenient to be able to cut someone off just by revoking their cloud credentials.
> We've observed the same on web pages where more and more functionality gets pushed to the frontend
It will never go all the way.
> One could push the last (x) layers of the neural net, for example, to the frontend, for lower expense and, if rightly engineered, better speed and scalability
I’ll believe it when I see it. I don’t think the incentive is there. It sounds like a huge complicating factor. It’s much simpler to keep everything running in the cloud, and software architects strongly prefer simple designs. How much do these layers weigh? How many MB or GB of data will I need to transfer, and how often? Does that really give me better latency than just transferring a few KB of the AI’s output?
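For a rough sense of scale, here is a back-of-envelope sketch; every number in it is a made-up assumption (a hypothetical 7B-parameter, 32-block decoder), not a measurement:

```python
# Compare shipping the "last layers" to the client vs. just sending the output.
params_per_block = 7e9 / 32        # ~220M parameters per decoder block (assumed)
last_blocks = 4                    # how many blocks get pushed to the frontend (assumed)
head_params = 32_000 * 4_096       # output head: vocab_size * hidden_size (assumed)
bytes_per_param = 2                # fp16 weights

weights_to_ship = (last_blocks * params_per_block + head_params) * bytes_per_param
output_bytes = 2_000               # a few KB of generated text

print(f"weights shipped to the client: {weights_to_ship / 1e9:.1f} GB")
print(f"just sending the AI's output:  {output_bytes / 1e3:.1f} KB")
```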
> I think the number of people interested in running ML models locally might be greatly overestimated [here]. There is no killer app in sight that needs to run locally. People work and store their stuff in the cloud. Most people just want a lightweight laptop, and AI workloads would drain the battery and cook your eggs in a matter of minutes, assuming you can run them. Production-quality models are pretty much cloud-only, and I don’t think open-source models, especially ones viable for local inference, will close the gap anytime soon. I’d like all of those things to be different, but I think that’s just the way things are.
> Of course there are enthusiasts, but I suspect that they prefer and will continue to prefer dedicated inference hardware.
Local ML isn't a CPU workload. The NPUs in mobile processors (both laptop and smartphone) are optimized for low power and low precision, which limits how much memory bandwidth they can demand. So, as I said, demand for more memory bandwidth depends mainly on how powerful the GPU is.
I suspect consumer workloads will rise.