Gotta wonder if Google has used code from its internal systems to train Gemini. Probably not, but at what point will companies start forking over their source code for LLM training in exchange for money?
It seems much cheaper, legally safer, and more easily scalable to simply synthesize programs. Most code out there is shit anyway, and the code you can get by the GB especially so.
I would assume that internal code at Google is of higher quality than random code you find on GitHub. The commit messages, issue descriptions, and code reviews are probably more useful too.
Anything truly innovative must come from outside the training data, or a very close permutation of it must already exist in there to be found.
Generative AI isn't scary at all now. It's merely rolling dice over a mix of existing tech and rumors from the internet.
The data can be wrong or outdated... and people keep their important secrets to themselves.