The way Norvig tells it, language lends itself perfectly well to smoothing, and even Chomsky's favourite absurd phrase "colorless green ideas sleep furiously" is assigned a (very small but non-zero) probability in some statistical models of English.
The truth is you can't do much with statistical models of language without some way to account for what's missing from your data, which is always most of the language. On the other hand, nothing you do will ever be entirely enough when the majority of the language is missing from your training data.
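To make Norvig's point concrete, here's a minimal sketch of add-one (Laplace) smoothed bigram probabilities; the toy corpus and vocabulary are made up purely for illustration. The pair "sleep furiously" never occurs in the training data, yet the whole sentence still comes out with a tiny but strictly non-zero probability:

    from collections import Counter

    # Toy training corpus; "furiously" never appears in it.
    corpus = ("green ideas are common . ideas sleep rarely . "
              "colorless glass is common .").split()
    vocab = set(corpus) | {"furiously"}   # include a word the corpus never saw
    V = len(vocab)

    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))

    def p_bigram(w1, w2):
        # Add-one smoothing: every bigram effectively gets a count of at least 1,
        # so unseen pairs like ("sleep", "furiously") still get P > 0.
        return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

    sentence = "colorless green ideas sleep furiously".split()
    p = 1.0
    for w1, w2 in zip(sentence, sentence[1:]):
        p *= p_bigram(w1, w2)

    print(p)   # tiny, but strictly greater than zero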
Some NLP tasks do indeed lend themselves well to smoothing, but I was thinking of language understanding tasks like question answering, where changing or misunderstanding a single word in an entire novel can easily make a correct answer completely incorrect.
I agree with your second paragraph, and if I understand Chomsky correctly, that is part of why he argues in favor of a generative grammar. I can't say that I completely understand how such a grammar would be linked to semantics and experience though.
It demonstrates that human languages all have a certain structure, so that structure must reflect an innate faculty of human beings rather than an acquired one. A neural network could not tell you that. It also suggests further avenues of inquiry (why that particular structure?).
I don't think much is understood about how linear externalizations of language are deserialized into symbolic structures, what those structures are, and how those symbol structures are mapped into mental representations.
The issue Chomsky raises isn't that statistical models aren't predictive enough, but that they don't typically yield any understanding of the underlying system.
Similarly (and I understand this argument went through physics long before quantum mechanics showed up and made the argument seem silly), the ideal gas law, say, is predictive, but doesn't yield any understanding of the underlying system.
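To make the analogy concrete, here's a trivial sketch (the values and units are just illustrative): the law predicts the pressure from the other state variables perfectly well, but nothing in it refers to molecules or their collisions, which is where the explanation actually lives.

    # Ideal gas law as pure prediction: P = n R T / V
    # Illustrative numbers only; nothing here models the underlying molecules.
    R = 8.314  # J / (mol K)

    def pressure(n_mol, T_kelvin, V_m3):
        return n_mol * R * T_kelvin / V_m3

    print(pressure(1.0, 300.0, 0.0248))  # ~100 kPa: a correct prediction, no mechanism in sight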