
> better for edge AI than whatever is out there, so I'm looking forward to this

What exactly are you expecting? The current hype for AI is large language models. The word 'large' has a certain meaning in that context: much larger than what can fit on your phone. Everyone is going crazy about edge AI, what am I missing?



> Everyone is going crazy about edge AI, what am I missing?

If you clone a model and then bake in a more expensive model's correct/appropriate responses to your queries, you now have the functionality of the expensive model in your clone, for your specific use case.

The resulting case-specific models are small enough to run on all kinds of hardware, so everyone's seeing how much work can be done on their laptop right now. One incentive for doing so is that, without a local model, your approaches to problems are constrained by the cost and security of every Q&A round trip to the hosted model.
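Rough sketch of the "bake in" step, assuming you're distilling from a hosted API (the teacher model name and the prompts below are just placeholders):

    import json
    from openai import OpenAI  # assumes the official openai client is installed

    client = OpenAI()
    # Prompts representative of your specific use case (illustrative only).
    domain_prompts = [
        "Summarize this support ticket: ...",
        "Classify this log line: ...",
    ]

    with open("distill_dataset.jsonl", "w") as f:
        for prompt in domain_prompts:
            resp = client.chat.completions.create(
                model="gpt-4",  # the expensive "teacher" model
                messages=[{"role": "user", "content": prompt}],
            )
            answer = resp.choices[0].message.content
            # Each line becomes one (prompt, teacher answer) training pair.
            f.write(json.dumps({"prompt": prompt, "response": answer}) + "\n")

    # distill_dataset.jsonl is then the fine-tuning set for the small clone.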


Quantized LLMs can run on a phone, like Gemini Nano or OpenLLaMA 3B. If a small local model can handle the simple stuff and delegate harder tasks to a model in the data center when connectivity is good, you could get an even better experience.
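A minimal sketch of that local-first routing; both generate functions are stand-ins and the heuristic is deliberately naive:

    def local_generate(prompt: str) -> str:
        # Stand-in for a quantized on-device model (e.g. via llama.cpp).
        return f"[local answer to] {prompt}"

    def remote_generate(prompt: str) -> str:
        # Stand-in for a call to the big model in the data center.
        return f"[remote answer to] {prompt}"

    def looks_hard(prompt: str) -> bool:
        # Naive routing heuristic; a real system might use a small
        # classifier or the local model's own confidence instead.
        return len(prompt) > 500 or "step by step" in prompt.lower()

    def answer(prompt: str, online: bool) -> str:
        if online and looks_hard(prompt):
            return remote_generate(prompt)  # delegate the hard stuff
        return local_generate(prompt)       # handle simple stuff locally

    print(answer("what's 2 + 2?", online=True))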


> If a small local model can handle the simple stuff and delegate harder tasks to a model in the data center when connectivity is good, you could get an even better experience.

Distributed mixture of experts sounds like a promising idea. Is anyone doing that?


Sounds like an attack vector waiting to happen if you deploy enough competing expert devices into a crowd.

I’m imagining a lot of these LLM products on phones will be used for live translation. Picture a large crowd event where people relying on live AI translation are fed completely false translations because an attacker pulled off a 51% attack.


I’m not particularly scared of a 51% attack between the devices attached to my Apple ID. If my iPhone splits inference work with my idle MacBook, Apple TV, and iPad, what’s the problem there?


What about situations with no bandwidth?


Using RAG, a smaller local LLM combined with local data (e.g. your emails, iMessages, etc.) can be more useful than a large external LLM that doesn’t have your data.

No point asking GPT-4 “what time does John’s party start?”, but a local LLM with access to your messages can do better.
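Toy version of the idea; retrieval here is just word overlap (a real setup would use embeddings) and the mail snippets are made up:

    def overlap(query: str, doc: str) -> int:
        # Crude relevance score: number of shared lowercase words.
        return len(set(query.lower().split()) & set(doc.lower().split()))

    def build_prompt(question: str, docs: list[str], k: int = 3) -> str:
        # Put the k most relevant local documents in front of the question.
        context = sorted(docs, key=lambda d: overlap(question, d), reverse=True)[:k]
        return "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question

    mail = [
        "John: party starts at 7pm on Saturday, bring snacks",
        "Dentist: appointment confirmed for Tuesday 9am",
    ]
    # The local LLM gets called with this prompt; GPT-4 never sees your mail.
    print(build_prompt("what time does John's party start?", mail))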


This is why I think Apple’s implementation of LLMs is going to be a big deal, even if it’s not technically as capable. Just making Siri better able to converse (e.g. ask clarifying questions) and giving it the context offered by user data will make it dramatically more useful than siloed-off remote LLMs.


It fits on your phone, and your phone can offload battery-burning tasks to nearby edge servers. That seems like the path consumer-facing AI will take.


In the hardware world, last year’s large has a way of becoming next year’s small. For a particularly funny example of this, check out the various letter soup names that people keep applying to screen resolutions. https://en.m.wikipedia.org/wiki/Display_resolution_standards...



