What always struck me about Chomskyists is that they chose a notion of interpret...

dekhn · on May 9, 2024

There is a ballmer peak for pontificating.

As an aside but relevant to your point, my entire introduction to DNA and protein analysis was based on Chomsky grammars. My undergrad thesis advisor David Haussler handed me a copy of an article by David Searls "The Linguistics of DNA" (https://www.scribd.com/document/461974005/The-Linguistics-of...) . At the time, Haussler was in the middle of applying HMMs and other probabilistic graphical models to sequence analysis, and I knew all about DNA as a molecule, but not how to analyze it.

Searls paper basically walks through Chomsky's hierarchy, and how to apply it, using linguistic techniques to "parse" DNA. It was mind-bending and mind-expanding for me (it takes me a long time to read papers, for example I think I read this paper over several months, learning to deal with parsing along the way). To this day I am astounded at how much those approaches (linguistics, parsing, and grammars) have evolved- and yet not much has changed! People were talking about generative models in the 90s (and earlier) in much the same way we treat LLMs today. While much of Chomsky's thinking on how to make real-world language models isn't particuarly relevant, we still are very deeply dependent on his ideas for grammar...

Anyway, back to your point. While CFGs may be O(n*3) I would say that there is a implicit, latent O(n) parseable grammar underlying human linguistics, and our brains can map that latent space to its own internal representation in O(1) time, where the n roughly correlates to the complexity of the idea being transferred. It does not seem even remotely surprising that we can make multi-language models that develop their own compact internal representation that is presumably equidistant from each source language.