How is open source supposed to keep up with the compute demands of training models in the long term? From what I've seen, open source AI is pretty much entirely downstream of corporations releasing their models at a loss (e.g. Stability), or their models being leaked by accident (e.g. LLaMA), and that's hardly sustainable.
Traditional open source works because people can easily donate their time, but there isn't so much precedent for also needing a few hundred H100s to get anything done. Not to mention the cost of acquiring and labeling clean training data, which will only get more difficult as the scrapable internet fills up with AI goop.
It would be interesting if private LLM sites popped up, similar to private trackers, with BitTorrent-like pooling of compute and models. It seems impossible now, but if there's a push for computers to be able to handle local models, it would be interesting to see whether a way emerges to pool resources for training better models.
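The pooling idea resembles volunteer or federated training: each peer trains on its own local data, and only the resulting parameters are shared and averaged. Below is a deliberately toy sketch of that averaging step (all names are hypothetical, and a real system would also have to handle peer churn, untrusted updates, and bandwidth limits):

```python
# Hypothetical sketch of pooled ("federated") training: each volunteer
# peer fits a model on its own private data shard, then the peers
# average their parameters -- the core step of federated averaging.

def local_step(w, shard, lr=0.01):
    # One gradient step of least squares (fitting y = w * x) on this
    # peer's private shard.
    grad = sum(2 * (w * x - y) * x for x, y in shard) / len(shard)
    return w - lr * grad

def federated_round(w, shards):
    # Every peer starts from the shared weight, trains locally, and
    # the locally updated weights are averaged into the next shared model.
    return sum(local_step(w, s) for s in shards) / len(shards)

# Four volunteer peers, each holding a private shard of y = 3 * x data.
shards = [[(x, 3 * x) for x in range(i, i + 5)] for i in range(1, 5)]

w = 0.0
for _ in range(200):
    w = federated_round(w, shards)
print(round(w, 2))  # -> 3.0
```

The point of the sketch is that no peer ever ships its raw data, only model updates, which is what makes tracker-style resource pooling at least conceivable.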
Historically compute/memory/storage costs have fallen as demand has increased. AI demand will drive the cost curve and essentially democratise training models.
This assumes that commercial models won't continue to grow in scope, continuing to utilize resources that are beyond the reach of mere mortals. You could use 3D rendering as an analogy - today you could easily render Toy Story on a desktop PC, but the goalposts have shifted, and rendering a current Pixar film on a shoestring budget is just as unfeasible as it was in 1995.
It's always been the case that corporates have more resources, but that hasn't stopped mere mortals outcompeting them. All that's required is that the basic tools are within reach. If we look at the narrow case of AI at this point, then the corporates have an advantage.
But the current model of huge, generic, trained models that others can inference, or possibly just query, is fundamentally broken and unsuitable. I also believe that copyright issues will sink them, either by failing to qualify as fair use or through legislation. If there is a huge LLM in our future, it will be regulated and in the public domain, and will be an input for others' work.
The future not only consists of a multitude of smaller or refined models but also machines that are always learning. People won't accept being stuck in a (corporate) inference ghetto.
Or the other way around: large, general-purpose models might sink copyright itself, since good luck enforcing it. Even if those models are somehow prohibited, they'll still be widely available.
> Mistral AI [...] has raised 385 million euros, or about $415 million in October 2023 [...] In December 2023, it attained a valuation of more than $2 billion.
This sounds like the same deal as Stability - burning VC money to subsidise open source models for marketing. You don't get a $2 billion valuation by giving your main assets away for free indefinitely, the rug will be pulled at some point when they need to realise returns for their investors.
Today is the AI equivalent of the PDP-11 era in general computing. "Personal computers" were rare and expensive, and it was easy for large companies (IBM etc.) to gatekeep. Open source was born in those days, but it really thrived after PCs became commonplace. The same will happen for AI training hardware, and pooling of resources will happen too.
Although companies like Google will do everything in their power to prevent it, such as creating a nice inference/training chip, leveraging open source to make it one of the standard tools of AI, and then making only a horribly cut-down version (the Edge TPU) available on the market.
The only thing that can slow this down is brain-dead stupid regulation, which all these large companies and various "useful idiot" do-gooders are lobbying for. Still, modern AI is as powerful an ability amplifier for humans as the Internet and the PC itself. I remember when personal computers entered the picture, and then the Internet, and I'm telling all you younger people out there: this is far bigger than that. AI gives far too much knowledge to the common person. You don't understand some technological or scientific concept, an algorithm? Talk to the AI chatbot (unless it happens to hallucinate) and you will eventually gain the understanding you seek. I'd give a lot to have had access to something like this when I was growing up.
What we are currently seeing from all these large companies is the "sell it well below cost" phase of AI "subscriptions". Once everyone has made it a non-removable part of their life, they'll hike the prices three orders of magnitude and everyone will pay. Why? Will your job accept losing the 50% productivity you gained with AI? Will you accept going back to enshittified search engines when you could ask the AI anything and get a straight (mostly true) answer? Will a kid who got used to asking the AI to explain every math problem be able to progress without it? No.
Don't get me wrong: AI is a marvellous force for good, but there is a dangerous side. Not the "AI taking over" one promoted by various lobbyists, no. The danger is that a small number of megacorps will have leverage over all of humanity by controlling access to it. Perhaps not even through money, but by regulating your behaviour. Perhaps Google will require you to opt in with all your personal data to use their latest models. They will analyse you as an individual, for safety of course, so they "don't provide their most powerful models to terrorists or extremists".
What is the antidote to this? Open source models. And leaked commercial models (like LLaMA) until we can create our own.