It's interesting how different this is from 10 years ago, when Chomsky's theories were the basis of all mainstream NLP, or even 5 years ago, when most NLP systems used a hybrid of formal grammars and embeddings. I remember attending a tech talk on part-of-speech tagging in 2011; the state of the art then was a probabilistic shift-reduce parser, where the decision to shift vs. reduce at each step was made by a machine-learned classifier.
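For anyone curious what that architecture looks like, here's a minimal toy sketch of the general idea (my own illustration, not the actual system from that talk): a transition-based parser that keeps a stack and a buffer, and at each step asks a classifier which action to take. The `classify` function below is a hypothetical hand-written stand-in; in a real system it would be a model trained on (features, action) pairs extracted from a treebank.

    # Toy transition-based (shift-reduce style) dependency parser.
    # The classifier decides SHIFT / LEFT-ARC / RIGHT-ARC at each step.

    def extract_features(stack, buffer):
        """Features of the current parser configuration. Real systems
        used much richer templates (word forms, POS of stack/buffer
        tops, children already attached, etc.)."""
        return {
            "stack_top": stack[-1][1] if stack else "<EMPTY>",
            "buffer_head": buffer[0][1] if buffer else "<EMPTY>",
            "stack_size": len(stack),
        }

    def classify(features):
        """Stand-in for the machine-learned classifier: a few toy rules
        for a determiner-adjective-noun-verb sentence."""
        if features["stack_top"] in ("DT", "JJ") and features["buffer_head"] == "NN":
            return "LEFT-ARC"   # attach modifier leftward to the noun
        if features["stack_top"] == "NN" and features["buffer_head"] == "VBD":
            return "LEFT-ARC"   # attach subject leftward to the verb
        if features["buffer_head"] == "<EMPTY>":
            return "RIGHT-ARC"  # input consumed: attach remainder to ROOT
        return "SHIFT"

    def parse(tagged_words):
        """Main loop: consult the classifier at every configuration."""
        buffer = list(enumerate(tagged_words, start=1))  # (index, POS)
        stack, arcs = [(0, "ROOT")], []
        while buffer or len(stack) > 1:
            action = classify(extract_features(stack, buffer))
            if action == "SHIFT":
                stack.append(buffer.pop(0))
            elif action == "LEFT-ARC":
                dep = stack.pop()
                arcs.append((buffer[0][0], dep[0]))   # head is next buffer word
            else:  # RIGHT-ARC
                dep = stack.pop()
                arcs.append((stack[-1][0], dep[0]))   # head is new stack top
        return arcs

    # "the big dog barked", tagged DT JJ NN VBD:
    # produces [(3, 2), (3, 1), (4, 3), (0, 4)], i.e. the noun heads its
    # modifiers, the verb heads the noun, and ROOT heads the verb.
    print(parse(["DT", "JJ", "NN", "VBD"]))

The difference in the real system was just that the action decision came from a trained classifier over those features instead of hand-written rules, which is what made it "machine-learned" rather than grammar-driven.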
Wittgenstein emphasized meaning as use in context before Chomsky, but the actual method was first properly investigated by structural linguists such as J.R. Firth and Zellig Harris, who was Chomsky's doctoral advisor. Good articles here: