“The concern is that these models will play a larger role in autonomous systems that operate without human supervision. As autonomous systems become more of a reality, it will be very important to ensure that we have a reliable way to stop them from being hijacked by attacks like these.”


Most probably the statistical engines of the future, i.e. A.I., will be different from GPT and the like. As soon as the context window can be extended to a billion tokens, as a recent Microsoft paper claims to do with a technique they call dilated attention, there will be no need to train the language model on random input from the internet.
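
The rough idea behind dilated attention is that each query attends only to a thinned-out subset of positions, which cuts the quadratic attention cost enough to reach very long contexts. A minimal single-dilation-rate sketch in PyTorch (the actual paper mixes several segment lengths and dilation rates, so this is illustrative only):

  # Minimal sketch of dilated (strided) self-attention. Each query attends
  # only to every `dilation`-th key/value, so cost drops from O(n^2) to
  # roughly O(n^2 / dilation).
  import torch
  import torch.nn.functional as F

  def dilated_attention(q, k, v, dilation=8):
      # q, k, v: (batch, seq_len, dim)
      k_d = k[:, ::dilation, :]
      v_d = v[:, ::dilation, :]
      scores = q @ k_d.transpose(-2, -1) / (q.shape[-1] ** 0.5)
      weights = F.softmax(scores, dim=-1)
      return weights @ v_d

  # Toy usage: 1,024 tokens, 64-dim heads -> output shape (1, 1024, 64).
  q = torch.randn(1, 1024, 64)
  k = torch.randn(1, 1024, 64)
  v = torch.randn(1, 1024, 64)
  out = dilated_attention(q, k, v)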

We can use GPT-4 to create many different versions of a children's book like "My Little Pony", with many different syntaxes of simple sentences, different grammars, and different languages, and train the model on one million (one billion?) different rewordings of the same story.
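
A data-generation loop for that could look something like the sketch below, assuming the OpenAI Python client (>= 1.0); the model name, prompt wording, and style list are placeholders, and a real run would scale the loop up to millions of variants:

  # Sketch of producing many rewordings of one story as training data.
  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment
  STORY = open("story.txt").read()  # the base children's story

  def reword(story, style):
      resp = client.chat.completions.create(
          model="gpt-4",
          messages=[
              {"role": "system",
               "content": f"Rewrite this story using only simple sentences, {style}."},
              {"role": "user", "content": story},
          ],
      )
      return resp.choices[0].message.content

  styles = ["in the present tense", "as short declarative clauses",
            "translated into Spanish", "as a dialogue between two children"]
  corpus = [reword(STORY, s) for s in styles]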

From then on, if the model has been trained correctly to recognize and generate language, we load the additional knowledge we want it to have into the context window. Say we are interested in medicine: we load the whole PubMed corpus of 36 million papers into the context window and interact with that knowledge base.
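
Concretely, that workflow amounts to concatenating the corpus into one huge prompt and asking questions against it. A sketch, assuming a hypothetical long-context model object exposing generate(prompt); the abstracts directory and the crude word-count token budget are illustrative assumptions:

  from pathlib import Path

  def build_context(abstract_dir, token_budget=1_000_000):
      # Concatenate paper abstracts until a token budget is hit; with a
      # billion-token window the whole corpus could (per the claim) fit.
      parts, used = [], 0
      for path in sorted(Path(abstract_dir).glob("*.txt")):
          text = path.read_text()
          used += len(text.split())  # crude word-count proxy for tokens
          if used > token_budget:
              break
          parts.append(text)
      return "\n\n".join(parts)

  def answer_with_pubmed(model, question, abstract_dir="pubmed_abstracts/"):
      context = build_context(abstract_dir)
      prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
      return model.generate(prompt)  # hypothetical long-context model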

As Yann LeCun has stated, we humans don't need exabytes of data to learn language, so why should a computer need that much?



