If current LLMs hit a scaling wall and the game becomes about efficiency, I wonder if there's going to be space in the market for small models focused on specific use cases.
I use Gemini to extract structured data from images and the flash model is great at this. I wonder how much effort it would be to create a smaller model that would run on something like a NUC with an AMD APU that is good enough for that one use case.
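For anyone curious what that workflow looks like, here's a rough sketch using the google-generativeai Python SDK. The image path, prompt, and API key handling are made up for illustration; the JSON response mode is a real SDK feature, but treat this as a sketch rather than a drop-in script:

```python
# Sketch: structured extraction from an image with Gemini Flash.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # assumption: key supplied via env/config
model = genai.GenerativeModel("gemini-1.5-flash")

img = Image.open("receipt.jpg")  # hypothetical input image
response = model.generate_content(
    [img, "Extract vendor, date, and total as JSON."],
    generation_config={"response_mime_type": "application/json"},
)
print(response.text)  # JSON string to parse downstream
```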
Or perhaps you end up with mini external GPU sticks that run use-case-specific models. Might not be much of a market for that, but it could be pretty cool.
That's already the case, and it's called model distillation. You use an LLM to generate labels, then train a dedicated smaller model (usually a plain neural net) that runs at roughly 1/1000th the inference cost.
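A toy version of that loop, assuming scikit-learn and a made-up classification task; pretend the labels came from an LLM rather than being hardcoded:

```python
# Sketch: LLM-generated labels train a cheap "dedicated smaller model".
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["refund my order", "love this product", "where is my package"]
llm_labels = ["complaint", "praise", "question"]  # assumption: produced by an LLM pass

vec = TfidfVectorizer()
X = vec.fit_transform(texts)

clf = LogisticRegression()  # the small model that serves traffic
clf.fit(X, llm_labels)

print(clf.predict(vec.transform(["package never arrived"])))
```

Inference here is a sparse matrix multiply instead of an LLM call, which is where the orders-of-magnitude cost gap comes from.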
I think beyond the technical aspect it's a product and packaging problem.
All the effort is in productizing foundational models and apps built on top of them, but as that plateaus, distilled models and new approaches will probably get more time in the sun. I'm hopeful that if this is the case we'll see more weird stuff become available.
Yes, like people buying random GPUs for Ethereum mining, etc. I'm not a huge fan of what crypto has become, but there was something exciting about hacking stuff together at home for it that's currently missing in AI, IMO.
Maybe it's not really missing and the APIs for LLMs are just too good and cheap to make homebrew stuff exciting.
It's possible to run models locally and fiddle with temperature, etc. Being able to change other things on the fly, like identifying the weights most activated by a prompt and tweaking just those to see what happens, is much harder.
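The temperature part at least is easy with something like llama-cpp-python. A minimal sketch, assuming any local GGUF checkpoint (the path and prompt are hypothetical):

```python
# Sketch: sweep sampling temperature on a local model.
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-3-8b-q4.gguf")  # any GGUF file works

for temp in (0.2, 0.8, 1.5):  # watch the output drift as temperature rises
    out = llm("Q: Name a use case for small local models. A:",
              max_tokens=64, temperature=temp)
    print(temp, out["choices"][0]["text"].strip())
```

Weight-level surgery per prompt has no equivalent one-liner; that's the gap being described.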
I've tried both LLMs and image generators locally on my machine, and while it's gotten easier, just setting up is a long task, especially if you run into driver issues.