Realistically, the problem here might just be that the concept of open source doesn't really fit machine learning models very well, and we should stop trying to force it.
Sharing the end product, but not the tools and resources used to produce it, is how open source has always worked. If I develop software for a commercial operating system using a commercial toolchain, and distribute the source code under GPL, we would call that software open source. Others who get the code don't automatically get the ability to develop it themselves, but that's kind of beside the point. I don't have the rights to publicly redistribute those tools, anyway; the only part I can put under an open source license is the part for which I have copyright.
Training data for an LLM like Llama works similarly when it comes to copyright law. The model's developers don't own the copyright or redistribution rights for all of it, so they can't make it open, even if they want to.
If that seems unsatisfying, that's because it is. Unfortunately, though, I don't think the Free Software community is going to get very far by continuing to try to fight today's openness and digital sovereignty battles using tactics and doctrine that were developed in the 20th century.
It does fit it. Perfectly. It's incredible. Like an Internet of all Human Knowledge released before 1965. OpenAI could have done this. The battle, to me, is just about people respecting ideas instead of saying they are impossible or unnecessary because what we have is good enough.