I think the whole situation, where they got some serious investment from SBF and then he got indicted, pushed them into commercialising their tech so they could rely on more standard sources of funding.
Tautologically. Machine learning is a blanket term for a wide range of approaches that allow machines to mimic the brain's capability to learn, at least in function if not form.
The brain may employ wildly different machine learning algorithms from those currently in vogue, but whatever algorithms the brain is using must be machine learning algorithms. Unless, of course, you define "machine" here to mean something that isn't the brain, in which case they're just learning algorithms. Regardless, an architecture exists that matches the brain's capabilities with not only finite but surprisingly limited hardware.
In the very vaguest sense of the word, absolutely. Not that there is a literal transformer algorithm in the brain, but there is some evolved learning algorithm in its neurons that is at least a distant cousin of what we're doing today.
I mean, you didn't mention autoregressive models anywhere in your comment, whereas the post is about the connection between diffusion and autoregressive modelling. Also, it's a blog post; if the author had figured out a speed-up or an improved method, it would probably have been a paper.
Criminalizing acts that would clearly be free speech in the US: the trucker convoy in Canada (even just voicing support for it); mean tweets prosecuted as hate speech in the UK (the recent riots; see also JK Rowling).
It depends on a ton of stuff, really: the size of the model, how long you want to train it for, and what exactly you mean by "like Hacker News or Wikipedia".
Both Wikipedia and Hacker News are pretty small by current LLM training-set standards, so if you trained only on, say, a combination of the two, you would likely end up with a model that lacks most of the capabilities we associate with large language models nowadays.
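To put rough numbers on "pretty small", here's a minimal back-of-the-envelope sketch; the corpus token counts and the 20-tokens-per-parameter Chinchilla rule of thumb are my own ballpark assumptions, not measurements:

    # Back-of-the-envelope sizing. The corpus figures below are
    # order-of-magnitude assumptions, not measured numbers.
    wikipedia_tokens = 5e9     # assumed: English Wikipedia text, a few billion tokens
    hacker_news_tokens = 5e9   # assumed: an all-time HN comment dump, similar ballpark
    corpus_tokens = wikipedia_tokens + hacker_news_tokens

    # Chinchilla-style heuristic: roughly 20 training tokens per parameter.
    tokens_per_param = 20
    compute_optimal_params = corpus_tokens / tokens_per_param

    print(f"corpus: ~{corpus_tokens / 1e9:.0f}B tokens")
    print(f"compute-optimal model size: ~{compute_optimal_params / 1e6:.0f}M parameters")
    # ~10B tokens supports roughly a 500M-parameter model, versus the
    # trillions of tokens and tens of billions of parameters behind
    # today's frontier LLMs.

Under those assumptions you'd land somewhere around GPT-2 scale, which is why most of the capabilities people expect from a modern LLM wouldn't show up.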