Chomsky is right up there with Minsky in being part of the problem. His ideas about language being part of the genome are fanciful nonsense. Skinner produced reams of reproducible empirical observations of behavior which are, today, critical to the evaluation of the performance of AIs. Chomsky has produced interesting theories, but mostly derailed linguistics on the basis of the argument 'language is complicated, something magical must be going on.' His contributions to political debate are monumental. His contributions to science are negligible if not entirely detrimental.
> His contributions to science are negligible if not entirely detrimental.
I'm not a linguistics expert, but from my vantage point as a computer scientist, it would be hard to conclude that. Chomsky essentially founded the field of formal language theory, and was the first to propose many widely-used constructs, such as the context-free grammar. They may or may not explain how English works, but pretty much everything on the Chomsky hierarchy has been put to productive use in computers.
In a much more general sense, his arguments against blank-slate learning are pretty mainstream in machine learning as well. Chomsky's specific approaches aren't too widely used (though there is some work on grammar induction), but the idea that inductive bias is key to machine learning is widespread, though perhaps in a weaker sense than the very specific and strong inductive bias Chomsky proposes for language learning.
> > His contributions to science are negligible if not entirely detrimental.
> I'm not a linguistics expert, but from my vantage point as a computer scientist, it would be hard to conclude that. Chomsky essentially founded the field of formal language theory
Chomsky did great work for CS (and NLP) through formal language theory, but his work in linguistics -- in particular, his anti-empiricism and hostility to statistical views of language -- was very unhelpful for getting computers to understand language. See e.g. http://www.cis.upenn.edu/~pereira/papers/rsoc.pdf
Language theory predates Chomsky. The earliest known activities in descriptive linguistics have been attributed to Panini around 500 BCE, with his analysis of Sanskrit in the Ashtadhyayi. So, while Chomsky is famous, his actual contributions are far less than you might suppose.
Possibly a linguistic (hah!) confusion. I don't mean that he pioneered the study of language, i.e. linguistics. Clearly people have been studying languages for millennia. When I say he pioneered formal language theory, I mean it in the computer-science sense, that he pioneered the study of formal languages. That's the theory of how to mathematically and computationally parse sequences of tokens into semantic meanings. Chomsky wasn't the first there, since work in symbolic logic (e.g. Frege's work) also considered the formal syntax/semantic mapping. But much modern work in parsing and related areas is derived from Chomsky's work. Any programming-language researcher has heard of and likely used the concept of a "context-free grammar" for example, which Chomsky originated. More information: http://en.wikipedia.org/wiki/Chomsky_hierarchy
Put more simply, I'm saying that he did valuable scientific work in developing his formal analysis of grammars, even if it fails to capture human language. He intended it to capture human grammar, but it's an interesting computation-amenable model of grammar even if it isn't how humans speak, because it is pretty much how we now write programming languages. When computer scientists today talk about "grammars", for example someone saying that they're writing "an ANTLR grammar for Clojure", they mean it in the Chomskian sense.
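To make the term concrete, here's a minimal toy sketch (my own illustration, not anything of Chomsky's): a context-free grammar for nested parentheses and a recursive-descent recognizer for it. The unbounded nesting it accepts is exactly what a regular (finite-state) language cannot express, which is the kind of distinction the hierarchy formalizes.

    # Toy context-free grammar:
    #   S -> '(' S ')' S | empty
    # A regular expression cannot recognize arbitrarily deep nesting,
    # but a context-free recognizer (here, recursive descent) can.
    def matches(s):
        def parse_S(i):
            # S -> '(' S ')' S
            if i < len(s) and s[i] == '(':
                j = parse_S(i + 1)
                if j is not None and j < len(s) and s[j] == ')':
                    return parse_S(j + 1)
                return None
            # S -> empty
            return i
        return parse_S(0) == len(s)

    print(matches("(()())"))  # True
    print(matches("(()"))     # False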
I don't think he was claiming that Chomsky IS language theory, or even that he pioneered the field. However, to say his contributions to science are "negligible" is absurd.
Chomsky revolutionized linguistics in the 1960s. Compiler theory incorporates a huge amount of his work. Calling his contributions negligible or detrimental is like saying Freud's contributions were the same.
Could you elaborate on Freud? I thought the general consensus was that none of his "theories" were accurate and that the evidence either doesn't support his ideas or directly contradicts them. From the admittedly little I've read, it comes off as just "making stuff up" and generating frameworks that can explain anything (hence nothing). Psychoanalysis certainly has no place in modern psychology.
Or are you saying that Freud, despite being entirely wrong, got things moving and eventually helped folks to start taking psychology as a serious science? Or another point I am not understanding?
I am not the grandparent author, but I think your last paragraph describes it well.
Yes, looking back, Freud's theories are bogus. But that is only looking back, taking into consideration what has happened since. And whatever came after in theories of the mind and human psychology was very much influenced by Freud. A lot of it was a reaction, but it was still a reaction to something. Those who came after studied Freud, and it was their background.
To say it another way, it's hard to predict what would have happened if Freud had not been there. We could be in worse shape today.
CogSci PhD student here, I'm no fan of Chomsky but I'd still say his contributions were a lot more useful than Freud's. Freudian Psychology is mostly abandoned by practitioners at this point; the Chomsky hierarchy is still canon in CS.
The Chomsky hierarchy, while being but a small part of his work, is undoubtedly a useful technical contribution, but I don't see it as particularly relevant for cognition and linguistics. As an example, for cognition, memory constraints seem to be much more fundamental than the type of formal rules. It is also rather arbitrary: many kinds of formal languages don't fit into his hierarchy, and there are many other ways to carve up the space of possible languages/grammars.
I think that his general linguistic theories might share the same fate as Freud's theories: large, complex theories for which there is simply not much empirical support, making it difficult for people to continue to work on them without their originator imbuing them with his authority.
Though you have to wonder how much of that is a lack of interest in research. PCRE has been shown not to fit his hierarchy; its patterns are more than regular but less than context-free.
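To illustrate (my own example, not from the parent comment): a single backreference already takes you outside the classical regular languages.

    # Illustrative sketch: a backreference pattern matching the "copy"
    # language { w + w }, which no classical regular expression can describe.
    import re

    copy = re.compile(r"^(.+)\1$")  # \1 must repeat exactly what group 1 matched

    print(bool(copy.match("abcabc")))  # True: "abc" followed by "abc"
    print(bool(copy.match("abcabd")))  # False: second half differs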
Language is a set of behaviors.
Behavior for humans and other animals requires a brain and body (almost always).
The organisation, growth, and function of the brain and body are influenced by genetics.
So we can say first and foremost that language is influenced by genetics. Having a tongue, vocal cords, hands, ears: these all help to acquire language.
So now the question is to what degree does genetics influence language and language acquisition?
Is it fully governed by genetics? We know that removing a child from the company of others during development eliminates complex language and severely stunts their ability to acquire language. So, in a single individual, language has never been observed to arise spontaneously and the facility to acquire language is something you can lose.
If you damage certain portions of the brain aspects of language can be removed. See Fluent Aphasia for a particularly odd example. However, these aphasias are known to remit and studies show that other portions of the brain have taken over from the damaged portion. So we know that the capacity for language is not entirely localized to a genetically determined location.
We also know there was a time when there wasn't language and now there is, so at least once in human history language did arise spontaneously. The article you mention is evidence that language can evolve spontaneously in groups of humans. In fact there is evidence that the rudiments of language arise in groups of many animal species including apes, whales, and dolphins. Many animals communicate, but not with sufficient sophistication to be described as a language.
So it is safe to say that there are genetic factors in producing communicative behavior. Sounds, postures, marking etc. I feel it is also safe to say that there are genetic factors that predispose groups of some species to develop more complex systems of communication (but these are not limited to humans) and that groups of humans are particularly good at complex communication.
Unfortunately for Chomsky there is little to no evidence that any one type of language is more likely than another. Grammars, phonemes, words, abstract concepts ... all of these have huge between-language variation. The similarities between languages are well explained by either physical characteristics of speech production or regularities in the environment of the language's origin.
So yes, language is both learned, and cultural, even if it isn't purely so.
For me, this perfectly sums up the futility of the Strong AI / AGI research project. So much of what we consider to be 'intelligence', or repercussions of intelligence, is in fact language-based communication and culture - with an immense genetic legacy, bias, and quite frankly, burden. To create human-equivalent intelligence, most or indeed all of these evolutionary biases would have to be built in.
Consider a different example: constructing artificial vision. Human vision is the result of evolution, of course. It is incredibly inefficient, but evolution is blind (sorry). When we now construct computer vision, we capture an optical image and transmit it optically as long as possible, since this preserves information. The human eye does not: it sends information via neurons to the visual cortex, compromising immensely in bandwidth. That's why we need visual error-correction mechanisms in the brain, and redundancy of visual information (achieved by the eye moving rapidly many times per second, for example).
When we construct optical computer vision that achieves something similar to human vision, the two have nothing in common. You can't plug the artificial front end into the biological back end. The two systems produce literally different images that are not comparable. The systems will not communicate with each other. We have skipped the legacy of biological evolution completely. To create an artificial system that accurately corresponds to the biological one would be an immense waste.
The same goes for intelligence. Our 'wet' evolved intelligence is an entirely different picture from the project of Strong AI. They will produce very different manifestations of interacting with the world. Why would the latter ever result in the illusion of free will or the self-referentiality of personal identity, two things we assume are parts of human-level intelligence? They are like the neurological channels for conveying an optical image: hopelessly inefficient, but the ones that make sense in the light of our evolutionary legacy.
As a side note, yes, we know of a time when there was no language, but it is not a good representation to consider that a binary switch. We know of complex communication between other animals, and given that even current languages are in rapid flux, I think it is fair to think of the transition from pre-language to language as a continuum. Language is still arising.
I don't understand what bearing this should have on the probability of AGI succeeding. What we (should) care about is not whether the system is conscious or even human-like. What matters is that it performs well on the tasks it is assigned. I consider it highly likely that we can do better than evolution did, in less time than evolution took.
You should read Steven Pinker's The Language Instinct. He summarises Chomsky and the field's main insights into language brilliantly. One of the points is that while we see languages as extremely diverse, most of the differences are actually quite superficial and can be efficiently modelled by a small set of rules with a few key configurable settings, and that this basic structure for language is universal among all humans. We are just tuned to focus on the differences and not notice the overwhelming similarities. Great book any way you cut it.
What's really fascinating is that wild chimps use a rich form of communication that's far from an actual language. But they can be trained to use complex sign language, which they will then use to communicate with each other. So, language may have developed fairly rapidly relative to the underlying biology.
I'm surprised he brought that up; it's a weird point. Because a different part of the brain processes the constructed language, it's no longer language? It's also untrue. We construct language patterns, poetry for example, where the positional number of a word or a syllable changes its meaning.
That said his point about the brain being wired to take the less computationally intensive route is a very important insight which I think extends beyond genetics and throughout the evolution of all biological processes.
Actually, his point is that our language organ takes the more computationally expensive route. Not the easier one. That's the puzzle.
I don't have his books on me, so I may misconvey this point, but IIRC, he also mentioned regular languages (you know, like with regexes) as another example of a computationally "easier" language family we don't pick up with our language organ. We don't speak arbitrary languages. The space of languages is filtered by genetics.
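To give a flavor of what a computationally "cheap" but unnatural rule would look like (my own toy illustration, not an example from his books):

    # Hypothetical "counting" rule: negate a sentence by inserting "not"
    # after the third word. Computationally this is about as cheap as a
    # rule can get (a regular / finite-state operation), yet no known
    # natural language seems to work this way; the rules children actually
    # acquire track structure, not linear position.
    def negate_by_position(sentence, k=3):
        words = sentence.split()
        return " ".join(words[:k] + ["not"] + words[k:])

    print(negate_by_position("the man who is tall left early"))
    # -> "the man who not is tall left early"  (nothing like real negation)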
He delves into this point more deeply in many places and addresses the points you raise with a precision beyond what's found in interviews.
I suppose it depends on your definition of 'expensive'. By that I mean, if your computational model is inherently serialised, sure, regexes are cheap. If fuzzy matching and rough temporal correlation between processing units turns out to be cheap, perhaps regexes are a ridiculous extravagance on your hardware family.
I suppose you could turn it around - assuming our language processing is optimal (bit of a leap) you can infer things about our hardware architecture by the languages which we parse efficiently.
I hope you're kidding about his contributions to political debate. Chomsky has the maximalist attitude that represents everything that is wrong with political debate today. His hatred of U.S. foreign policy is so extreme that he will defend absolutely everyone the U.S. opposes which means occasionally defending tyrants and denying genocide.
Sadly enough (for U.S. foreign policy and its supporters), he actually supports his claims very well with sources and footnotes, and if you read through his works, you might find that perhaps there is a reason others (who don't just watch Fox News) don't agree with said policy, and also that somehow Americans in certain parts of the world are not "hated because of our freedoms". There are other reasons.
I'm amused by the idea that Chomsky is somehow some kind of dunce when it comes to science but a brilliant thinker when it comes to politics. At least with either claim individually, I can conceive of who might say that, even while thinking they're radically wrong (no pun intended).
But the idea that Chomsky's political "contributions" somehow dwarf his contributions to science? That seriously floors me.
"My own concern is primarily the terror and violence carried out by my own state, for two reasons. For one thing, because it happens to be the larger component of international violence. But also for a much more important reason than that; namely, I can do something about it. So even if the U.S. was responsible for 2 percent of the violence in the world instead of the majority of it, it would be that 2 percent I would be primarily responsible for. And that is a simple ethical judgment. That is, the ethical value of one’s actions depends on their anticipated and predictable consequences. It is very easy to denounce the atrocities of someone else. That has about as much ethical value as denouncing atrocities that took place in the 18th century."
He dominated the field of linguistics for decades with anti-empiricist theories. You could argue that the popularity of his ideas held back alternative approaches; hence his contributions could be considered detrimental.
> He dominated the field of linguistics for decades with anti-empiricist theories.
But he dominated. Where was Norvig or IBM's Watson back then?
Unfortunately things like these don't happen in a vacuum. You can only say, looking back, that this didn't work or that was a bad idea. You should have said that at the time the theory was advanced, or come up with a better one.
We didn't exactly live in a totalitarian regime where, say, some Politburo dictated what the official theory of genetics should be and everyone else got sacked, and now finally we have freedom from oppression and can get back on track.
> We didn't exactly live in a totalitarian regime
Ironically, that is something that people allude to with regards to Chomsky (ironic because of his anarchist political beliefs); cf. the book The Linguistics Wars. The bottom line is that his theories were very dominant, not because of overwhelming empirical support, but because of his authority.
I'm not convinced that Chomsky is an 'anti-empiricist.' He's a theorist but his theories have been put to experiment, very successfully so I believe. Can you provide examples of Chomsky attacking the validity of experimental evidence?
How do you approach Heidegger's or Derrida's arguments that 'language is complicated, something magical must be going on'? Will you allow for no qualitatively different meaning to language than probabilistic associations of sounds with objects?
I can't speak to the OP, but extraordinary claims require extraordinary evidence; we are yet to observe anything in the natural world where "something magical [really turned out to be] going on", so this really is an extraordinary claim. It has been several years since I was current with the linguistics literature, but as far as I know no extraordinary evidence has been produced.
I don't think that requires that we view language solely as a probabilistic map between sounds and objects. All sorts of emergent behavior appear to be "magical" at first glance.
I did not reference linguistics literature, but philosophy literature (a distinction worth making because they are approaching the problem at different levels). I've never been good at the 'language problem' elevator speech, but how is it possible that we can capture the full power of language while using language to describe it? Language is a technology that allows for the production of concepts like 'probability' or 'apple'. Language as a tool is so fundamental to cognition that it becomes difficult to separate the two. Heidegger's Being and Time explores that line of thinking, and Derrida loves to play the game of "well that means you can't say anything!" Both have provided me with insight into what "magic" is referencing here (rather than your fairly pithy absence-of-evidence reference).
You're asking if I believe in qualia, and the answer is no. There are firing neurons and that's it. The great variety of ways in which neurons can fire, and the great variety of experiences that shape how neurons fire combine to form an exquisite set of possible firing patterns (this is literally what makes me me and you you) but ultimately, to mis-use Gertrude Stein's famous phrase, 'there is no there there.'
I don't believe I referenced qualia. The problem of language is an emergent phenomenon just as the utility of language is. I don't believe in subjective essentialism either.
It wasn't at all clear that that is what he was asking you. And you need to qualify the sense in which you "don't believe" in qualia. You don't believe that consciousness has phenomenal properties? Qualia certainly exist in some sense.
From what it sounds like, you are just dismissing compelling philosophical issues because it frustrates your beliefs.
Your observation is so vague and general as to be rather meaningless. Almost every physical theory is described by an underlying mapping between inputs and outputs.
The interesting point is the expressive power of your model: to take an example I am somewhat familiar with, current large-vocabulary speech recognizers have millions of parameters. They work relatively well, but they are very difficult to interpret, and it is hard to see how they help us understand how speech recognition actually works in our brain.
To make a somewhat flawed analogy, every Turing-complete language is equivalent, but getting the machine code of a very large project is not very interesting if you want to understand it, while it is mostly enough if you just want to use it.
Do you have any reason to suspect this isn't how the brain works? Maybe language isn't a small set of high-level rules. Why should we suspect it to be? The probabilistic models seem to be very similar to how real people actually learn informal language. Formal languages of course have high-level rules, and these are well modelled algorithmically.
I don't particularly have any reason to believe one way or the other. Certainly, the probabilistic models for language are created "out of the blue" without any attempt to model how humans learn languages.
That is EXACTLY the stance that Chomsky is challenging here. Don't get me wrong, I think the probabilistic view has truth value and is a powerful predictive tool. However, I don't think it captures the phenomenological properties of the act of language based cognition. See my cousin reply referencing Heidegger.
Chomsky claims language is built-in. So probabilistic associations are the exact opposite of his claims. Interestingly, there have been some very good baby studies that show babies inherently know the statistics needed to learn probabilistic associations. You can show them a couple of different colors of balls going into a box, then take out a bunch of the rare color that went in, and they will register more surprise, even long before they can talk. Things like that. Chomsky's assertions are, well, what you would expect from someone that old. They didn't understand neurons and biology and genetics so well then, so yay, magic things are possible!
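Very roughly the flavor of those looking-time studies, with made-up numbers (not the actual experimental design): the "surprising" sample really is objectively improbable, and infants' looking times track that.

    # Illustrative sketch with hypothetical numbers: a box holds 70 red and
    # 10 white balls; 5 balls are drawn at random without replacement.
    # How likely is a mostly-white draw versus a mostly-red one?
    from math import comb

    red, white, draw = 70, 10, 5
    p_surprising = comb(white, 4) * comb(red, 1) / comb(red + white, draw)
    p_expected   = comb(red, 4) * comb(white, 1) / comb(red + white, draw)

    print(f"P(4 white, 1 red) = {p_surprising:.5f}")  # tiny
    print(f"P(4 red, 1 white) = {p_expected:.3f}")    # much larger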
In general he does. In this article he doesn't talk about that; he talks about approaches to AI.
> there have been some very good baby studies that show babies inherently know statistics needed to learn probabilistic associations.
Very good. Can we identify how that works and then build a robot that has the same mechanism, in a more efficient way than simply simulating a brain at a molecular level? That is his argument here.
> They didn't understand neurons and biology and genetics so well then, so yay, magic things are possible!
So where were you 4-5 decades ago, when he proposed his theory, to propose a better one?
The idea that meaning consists of associations is extremely primitive. It works for concrete nouns and verbs, but it quickly fails as things get more complex. Language is used to refer to abstract things, imaginary things, counterfactual situations, etc. And even if you do arrive at a series of concepts using associations, you have to understand how they are supposed to combine, even for completely novel sentences. In all these cases, there's arguably nothing there to associate with. I can't answer your question (I think no one can), but we can conclude that meaning is more than associations.
Isn't that Chomsky's argument here (in this article, not in his approach to linguistics in general)? -- That it is a good idea to try to find a better understanding of how the internal mechanisms work so we can build or simulate it better. You do that with carefully constructed experiments not with just observing inputs and outputs and training a neural network or a Markov model with it.
Please argue that there is nothing to associate with.
Why is a real observation from your senses more privileged inside your brain than a random well-formed value from a (hypothetical) random number generator neuron?
I argued that if you only have associations with previous experiences, you won't be able to deal with novel input. Ergo, you need more than just associations (synthesis, imagination, counterfactual reasoning, etc.).
As to your second question, I don't see how it relates to my argument, but I'll answer anyway. If you're comparing an observation to a random number, you're looking at the observation qua value, in which case it has the same status. If, however, you look at the level of interpretation (what it means in your brain), the observation has a complex set of relations with the rest of your brain and gives rise to a perception, whereas the random number value is just noise that has to be tolerated by the brain.
Saying everything is just probabilistic associations is like saying everything is made from quanta of energy and thus every higher level concept or model is useless -- just simulate the quarks and you are set. Not only that -- simulate it by recording and observing the energy patterns going in and coming out of a black box.
Yes, you can get some things to work, and some to work well, but the idea is that perhaps there is a better model that describes the mechanism or the encoding of meaning. That's what Chomsky is trying to say in this particular article. Stopping at a brute-force approach is a fine engineering choice, but that doesn't mean everyone should; it is still worth trying to find a better model, if only to gain an understanding.
It's skyhooking. Given enough time and enough compute power, we should be able to completely simulate it. It may not be economically feasible, and it may take a long time.
I had the same notion from the first part of the interview - wondering whether Chomsky would actually bring something to the table to "unriddle" this mystical machine in our head. Leaving his aversion to computational systems aside, I believe he makes a good point about our language being computationally inefficient (at least externally). Maybe his insight [that our language is computationally inefficient] can be explained by molecular mechanisms that are efficient in terms of energy consumption (rather than on a computational level). Looking at which neural systems are most likely to evolve given a simple genetic code, and reengineering (computationally inefficient and obscure) algorithms based on that, might provide some valuable insight into the human brain, whereas probability measures alone might fail to explain underlying mechanisms.
I don't think you fairly represent Chomsky's theories about language, nor do I think you really understand them. There's no magic, and there's nothing even remotely ridiculous about language being a genetically-encoded instinct. Human beings have evolved complex brain modules which allow us to rapidly acquire complex combinatorial language systems. Language is to humans as hunting is to lions: a highly-evolved, highly-adaptive instinct.
The "right" way is to take endless numbers of videotapes of what's happening outside the video, and feed them into the biggest and fastest computer, gigabytes of data, and do complex statistical analysis -- you know, Bayesian this and that -- and you'll get some kind of prediction about what's gonna happen outside the window next. In fact, you get a much better prediction than the physics department will ever give. Well, if success is defined as getting a fair approximation to a mass of chaotic unanalyzed data, then it's way better to do it this way than to do it the way the physicists do, you know, no thought experiments about frictionless planes and so on and so forth. But you won't get the kind of understanding that the sciences have always been aimed at -- what you'll get at is an approximation to what's happening.
Chomsky seems to keep using naïve models as a strawman, and Norvig rightly calls him on it. If you use simple models, you can only get simple insights, but statistical machine translation (for example) builds probabilistic context-free grammars, which maps human notions of language far better than "make sure every three words in sequence is plausible".
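For anyone unfamiliar with the term: a probabilistic context-free grammar is just a CFG with a probability attached to each rule. A minimal hand-written sketch (real SMT systems learn their rules and weights from parallel corpora rather than writing them down):

    import random

    # Minimal hand-made PCFG sketch: each nonterminal maps to a list of
    # (probability, right-hand side) pairs.
    PCFG = {
        "S":  [(1.0, ["NP", "VP"])],
        "NP": [(0.6, ["the", "N"]), (0.4, ["N"])],
        "VP": [(0.7, ["V", "NP"]), (0.3, ["V"])],
        "N":  [(0.5, ["dog"]), (0.5, ["cat"])],
        "V":  [(0.5, ["sees"]), (0.5, ["chases"])],
    }

    def generate(symbol="S"):
        if symbol not in PCFG:                 # terminal word
            return [symbol]
        r, acc = random.random(), 0.0
        for p, rhs in PCFG[symbol]:            # sample a rule by its probability
            acc += p
            if r <= acc:
                return [w for s in rhs for w in generate(s)]
        return []

    print(" ".join(generate()))  # e.g. "the dog chases the cat"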
Chomsky is agreeing that it's making a map. He just doesn't think that map is very useful on a scientific level, but is useful on an engineering level.
You're responding that he's wrong, because it's useful on an engineering level.
Right? I'm reading many comments here and they seem to keep boiling down to this notion. Am I wrong?
If, like Chomsky, you value having a model of the underlying cognition process rather than a set of black-box predictors for aspects of that problem (e.g., various corpus-driven translators), then you might be really annoyed that the black-box people are so satisfied with their results.
I object to your glibness. Probably both methods (first-principles cognitive modeling vs. high-degree-of-freedom black box learning) will prove informative, just in different ways.
Or in your terms, we may not get to pick the prettiest models, but we owe it to ourselves to explore the space of models to see if we can find the structure in it.
The engineer in me is pleased by the undoubted success the data-driven learning culture has had on problems of real importance. But this work is highly empirical, with a tendency to point solutions, and someone is likely to come in later on and generalize these methods (e.g., why do some families of black-box predictor or features outperform others for language learning). There's room for both approaches.
Breiman, as author of basic books on measure theory as well as on classification trees, was able to walk both sides of this line ("make a first-principles model" vs. "use lots of data"). He spent considerable energy over the years trying to introduce the data-intensive approach to conventional statistics. For instance, he was one of the handful of bona fide statisticians who would attend and contribute to neural net and machine learning conferences. Probably this strategy is more productive than Chomsky's grumpy-old-man warnings (or sagacious warnings, depending on how you look at it).
I think by default you'll find a disproportionate number of critics of Chomsky here. Some who understand what this is about are more likely to be engineers and like the engineering approach. Others, who don't, saw Norvig's name and by default jumped to that side of the argument.
> If you use simple models, you can only get simple insights
Economics also play a large part in how information is parsed. The advancement of AI outside of academia is largely dependent on what it's being used for and how it's being used. Where great strides are being made is in search because it can be monetised and the computational power required is commensurate with the number of users/frequency of use and ROI. A complex model that can provide better insights but limits the number of concurrent users isn't as useful in a commercial sense.
I haven't finished reading TFA yet, but so far it's really good, because it sounds like Chomsky is actually getting to the point.... which he sometimes does, and sometimes absolutely doesn't (or so it often seems to me).
If you're interested in this area, which you might call the philosophy of artificial mind, or philosophy of cognitive science, I strongly suggest reading the link to Norvig's article (linked in TFA, but here it is again: http://norvig.com/chomsky.html ). In particular, I'd suggest reading it before reading the actual interview with Chomsky (maybe ideally reading it after the prefatory part of TFA).
My own inclination is that on the face of it, I find Norvig's approach less satisfying, as Chomsky appears to, but upon much consideration my current belief is that Chomsky's approach is too mystical and too just-so, and Norvig's approach at least has the merit of bearing fruit... fruit that one day might be concentrated into a concise and elegant theory.
You seem to agree with Norvig that doing massive data analysis on language will come to a scientific understanding, which would be a first in science.
Chomsky doesn't. If anything, Chomsky is grounded in reality, and Norvig and AI researchers are grounded in hope that this way of mapping out something will create meaningful understanding of the system.
This reminds me a bit of what scientists in other fields refer to as "empirical equations", which are equations fit from data without any particular theoretical backing or reason to believe that their components are a good model of reality. They're useful in that they may predict observations well, especially over a specific range of observables, but they don't necessarily give us an understanding of what's going on. An example is the historical Prony equation for hydraulic pressure loss (http://en.wikipedia.org/wiki/Prony_equation), which doesn't actually correctly model what happens in fluid flow, but does happen to empirically produce fairly good results over a range of values, partly through the use of two magic numbers fit from data.
Another example might be the winning entry in the Netflix competition. If your goal is to predict film preferences (which is Netflix's goal, of course), it appears to give pretty good estimates. But I don't think even its authors would claim that it's a scientific model of how humans form preferences.
In both cases the underlying problem is that there are fairly general functional forms, such as a few terms of a Taylor series, or a forest of decision trees, that can empirically model almost anything to a certain degree of accuracy, given enough data, even if the underlying process looks nothing like them. Therefore they can give accurate predictions that work in practice, without being accurate models of what's happening in the underlying system. Chomsky appears to be of the opinion that statistical NLP systems are more of that variety, so may be good engineering solutions without being good scientific models.
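A toy illustration of that point (my own, with synthetic data): a generic polynomial fit can predict well inside the observed range while saying nothing true about the mechanism that generated the data.

    import numpy as np

    # Hypothetical data from some "unknown" process (here, secretly exponential).
    x = np.linspace(0, 2, 30)
    y = np.exp(x) + np.random.normal(0, 0.05, x.shape)

    # A cubic polynomial has no connection to the true mechanism, yet it
    # fits and predicts well inside the observed range of x.
    coeffs = np.polyfit(x, y, deg=3)
    print("in-range error:", np.max(np.abs(np.polyval(coeffs, x) - np.exp(x))))

    # Extrapolate a little and the "model" degrades: it captured the curve,
    # not the mechanism.
    print("at x=4:", np.polyval(coeffs, 4.0), "vs true", np.exp(4.0))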
Exactly. Well said, and clearly articulates the point here.
What I don't get is that what Chomsky is saying is altogether standard, and yet people are insulting him for what is altogether a very simple idea you expressed plainly.
Yes, NLP systems are going to have many engineering uses, and Chomsky agrees. Are they going to help in the true scientific understanding of the systems?
It's unlikely. It's likely to be "good engineering solutions without being good scientific models" as you elegantly put it.
> Yes, NLP systems are going to have many engineering uses, and Chomsky agrees. Are they going to help in the true scientific understanding of the systems?
So, what we have is (1) the engineering / statistical modelling / machine learning approach, and (2) the deep theoretical "Chomsky approach".
Chomsky despises, maybe rightly so, the engineering approach because it only provides tools that work, approximately, in practice but don't provide any deep "scientific" understanding.
The deep theoretical approach has a vision of a comprehensive theory that really provides understanding. Once we manage to find the deep fundamental theory, practical applications will be child's play.
But here's the catch: has the Chomsky approach taken us any closer to that deep theoretical progress? Why has all the practical progress come from the engineers? What if the deep theoretical thinkers are completely lost, like the alchemists in dark medieval times, in their theories, and as they despise the data-driven approach, they also refuse to let empirical observations guide them in the right direction?
I don't think looking down on the engineers and their modest practical success is any kind of merit, if your only merit is dreaming of a deep theory while making no measurable progress towards said theory.
> Why has all the practical progress come from the engineers?
I see what you are saying, and actually it is a good point. Where are the robots built on Chomsky's theory? A very valid question. I don't know the answer to it; Chomsky doesn't either. But I think what you mean by practical progress isn't what he means by progress. That is his point.
You have to see where he is coming from. He is an academic; his ultimate goal is to understand how things work. Training a set of neurons with input data and ending up with perhaps millions of activation weights is not helping that goal, even if this new machine can play chess, make coffee, and drive you to work. I think that is his take on it.
I say we need both. There is no reason not to strive for both. There is no reason to turn all radical and start burning books and claim one approach should completely replace the other. I hope we one day find (or find that we can't find) a good explanatory model for meaning, language, learning, personality, or consciousness, but in the meantime I enjoy playing chess with my computer, and I hope pretty soon I'll have my car drive me to work by itself.
Your point is also good. "Shallow engineering" will not give us deep theoretical understanding, and we need also deep theoretical understanding. But in our need for deep theory, we should not accept just any theory. The theory should be testable, and it should eventually yield some practical applications. Theoretical thinking can get quite lost if it's not guided by at least some connection to empirical data.
Early L. Ron Hubbard presented a theory (Dianetics) on the causes of mental illness. It's a theory alright, just not a very good one, and not very testable. We would still benefit from a better theoretical understanding of human mental health and illness, but we should not take just anybody, just because he is a deep thinker and has a theory.
Why do people keep saying that probabilistic models do not provide understanding? We flew to Mars on a few simple principles applied to massive amounts of data. Comets aren't following sophisticated orbital plans, just F=ma.
Maybe (probably) all the sophistication of natural language is an emergent property of the pile on of lots of little similarly-shaped details like atoms. Sure, high level rules are nice approximations that satisfy our human craving for patterns, but that doesn't mean those patterns are how the brain really works, quite the opposite in fact.
High level models are illustrations, useful for game programmers and artists to efficiently create simulations and plausible imaginary creations. Low level models are how things actually work.
I think I agree with you. Chomsky is astoundingly intelligent and (I think) I understand what he is getting at (i.e. develop a theory of how something happens and use it to find evidence that supports or refutes that theory, rather than using statistical methods to develop a predictive model that doesn't teach you anything about the underlying system), but you can't but feel a sense of "yeah, but how?"
You can hardly blame the AI researchers for sticking with methods that have been very successful (at least practically speaking) after they had their funding cut out from under them in the 90's for the perceived "failures of AI".
I do like the idea of developing a theory (e.g. vision is processed via algorithm in the brain represented by X) and then attempting to find evidence of that. It can help to avoid the reductionist idea that you need to model everything down to the cellular level in order to understand the brain. It's like trying to disassemble code in memory and then read the algorithm rather than examining every 1 and 0 in memory and trying to make heads or tails of them.
I am not well acquainted with Noam Chomsky and his work, but the dichotomy you refer to I'd like to call "design vs evolution".
If the extreme "design" part the spectrum is what Chomsky talks about: "develop a theory of how something happens and use it to find evidence that supports or refutes that theory", then the extreme end of the "evolution" part of the spectrum must be this:
Choose a set of rules, use genetic programming techniques to rearrange those rules. Evidence you require will be acquired and utilised by the algorithm itself. The only crux is the set of rules to choose from and implementation of the technique. IMO, the more complicated the rules and the more they interact with each other, the better.
...only then can you get results like this: "...Five individual logic cells were functionally disconnected from the rest– with no pathways that would allow them to influence the output– yet when the researcher disabled any one of them the chip lost its ability to discriminate the tones."
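For concreteness, the bare loop that the "evolution" end of the spectrum amounts to looks something like this (a toy sketch with a made-up fitness function, nothing to do with the FPGA experiment quoted above):

    import random

    # Toy genetic-algorithm loop: evolve a bit string toward all ones.
    # The point is only the shape of the process: variation plus selection,
    # with no designer-supplied theory of how the solution "should" work.
    def fitness(bits):
        return sum(bits)                        # made-up objective

    def mutate(bits, rate=0.05):
        return [b ^ (random.random() < rate) for b in bits]

    population = [[random.randint(0, 1) for _ in range(32)] for _ in range(50)]
    for generation in range(100):
        population.sort(key=fitness, reverse=True)
        parents = population[:10]               # keep the fittest
        population = [mutate(random.choice(parents)) for _ in range(50)]

    print(max(fitness(p) for p in population))  # approaches 32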
See also the story of the MIT magic switch that toggled between Magic and More Magic, required to make a certain machine function, with no wires behind the switch.
So much hate in this thread for one of the men who laid the foundations of Computer Science. We may disagree with his ideas now, and possibly his theories are out of date, but the magnitude of his contribution to linguistics, and by extension, neurology, psychology, and computer science, puts him in the very highest rank of scientists.
If you take a look at the progress of science, the sciences are kind of a continuum, but they're broken up into fields. The greatest progress is in the sciences that study the simplest systems. So take, say physics -- greatest progress there. But one of the reasons is that the physicists have an advantage that no other branch of sciences has. If something gets too complicated, they hand it to someone else.
If a molecule is too big, you give it to the chemists. The chemists, for them, if the molecule is too big or the system gets too big, you give it to the biologists. And if it gets too big for them, they give it to the psychologists, and finally it ends up in the hands of the literary critic, and so on.
I find your finding of insight in this quote to say more about the challenge of AI than the quote itself ;-).
First, the quote directly contradicts itself. The first paragraph says that physicists alone hand their hard problems to others. The second describes how the chemists, biologists, and psychologists all also hand their hard problems to others.
Second, it ignores how nearly all fields of human endeavor hand off various kinds of hard problems to other fields. Software engineers hand off many hard problems to the mathematicians, philosophers, artists, hardware engineers, and sometimes even to other software engineers in adjacent levels of the software stack ;-). Similar relationships exist between archaeologists, anthropologists, paleontologists, geologists, climatologists, meteorologists, and so on. It's easy to think of examples.
Geologists can't hand all of their hard problems off to climatologists, but physicists can't hand all of their hard problems off to chemists either.
And yet, somehow, humans are able to perceive meaning from quotes like this. Our ability to read a quote, ignore what it actually says and figure out what we think it was meant to say is truly stunning.
I watched 'Manufacturing Consent' recently; I had a hard time stopping listening to him. His wording dug right through my skull, very concise yet simple.
He's the most concise when it comes to explaining complicated ideas. Einstein said something about explanations being simple but not too simple. The biggest point Chomsky made to me was that no real explanation or discussion can be had when you are limited to 2-minute soundbites on TV. I won't give his example, but he said "You can't say <extremely inflammatory counterculture yet objectively true statement> and not take the time to explain why."
"The approach taken by Chomsky and Marr toward understanding how our minds achieve what they do is as different as can be from behaviorism. The emphasis here is on the internal structure of the system that enables it to perform a task, rather than on external association between past behavior of the system and the environment. The goal is to dig into the 'black box' that drives the system and describe its inner workings, much like how a computer scientist would explain how a cleverly designed piece of software works and how it can be executed on a desktop computer."
The article completely misunderstands what Behaviorism actually is. Once again, the misunderstanding comes from confusing the Methodological Behaviorism advocated by John Watson, Edward Thorndike and others, who did indeed try to model people as a "black box", with Skinner's Radical Behaviorism, which denies that there's even a box to be opened; rather, Skinner holds that there is only a locus where a series of environmental processes happen to converge and interact in interesting ways. Some of these processes are much older than the others, and are expressed in genes; others are relatively newer, and are learned. Skinner did not deny that genetics played a role in language acquisition, nor did he ever maintain people are born a blank slate. While Skinner found Chomsky's work regarding universal grammar unconvincing, he maintained that it was not, as Chomsky claimed, directly opposed to his own work -- the two theories were orthogonal, and would succeed or fail independently of each other.
I've always considered Norvig's position pretty much self-evident for anybody who has dealt with real world problems. At the same time, Chomsky is still way more interesting to read, even if he's wrong. And he's wrong mostly for assuming that unsupervised learning has to be the end result when in fact it could be an intermediate step to more refined symbolic theories.
In any case, this kind of antagonism between the two approaches might be useful as it keeps the field more vital, preventing yet another stagnation.
By real world problems you're talking about engineering and not science. Chomsky says Norvig's position is great for engineering, just not science.
The whole disagreement stems from the question of whether this will help us reach a scientific understanding of language or not. The burden is on Norvig's side to prove it, and that hasn't happened.
I think the phrase "scientific understanding" is essentially qualitative, meaning that humans can disagree about what we "understand" and how "scientific" it is, and there's no objective way to adjudicate that fight.
How many physicists really believe they fully understand quantum mechanics? The theory is probabilistic and strange but produces very accurate predictions.
The true measure of science is matching observations to hypotheses, and in that respect the approach that Norvig defends has demonstrated success. Google's language tools work well much of the time. Watson beat Ken Jennings.
Other branches of science are beginning to make more use of "big data" approaches as well. A friend doing post-doctoral research on evolution spends most of his time behind a laptop coding against big sets of digitized genetic information.
As someone who was familiar with the differences between Chomsky and Norvig (a simplification - maybe better to say Chomsky's view that developing empirical models is not scientifically interesting), I found this really enriched my understanding of Chomsky's view and added a lot of subtlety. He certainly doesn't seem antagonistic towards statistical modeling the way I imagined; just sort of bored by it. Be sure to persist to the end - there are some fun word games on the final page.
While I respect (some of) Chomsky's work, it is amazing to me that he thinks language is something more than mathematics/statistics. Language is math; it is basically an advanced form of discrete mathematics. We can reproduce virtually anything on a computer, and it is no more "shallow" than artificial light from a lightbulb is "artificial." There is no magic going on; we are biological computers, walking number crunchers. Our brains just happen to operate with chemicals and analog signals, rather than transistors and digital signals. His view on this carries the drawbacks of academia, which has a tendency to over-complicate and over-formalize thinking.
> it is amazing to me that he thinks language is something more than mathematics/statistics
He doesn't think that. Look at his linguistics work - google Chomsky Hierarchy. He very much does not think that it's magic.
All he's saying is that some kinds of statistical modeling - while stupidly useful and practical - don't give us a lot of explanatory power.
He's, metaphorically, complaining about folk who are happy using Boyle's Law because "it works", when he'd like more folk figuring out what atoms are all about.
Well aware of his work, but I realize I probably misread the article. :) I think I just expanded statistical modeling to encompass more fields than I meant to (you can, in fact, employ statistical methods to augment "deeper" systems like automatically developing rule systems for propositional logic).
I completely agree. The fact that math = language, and that it's built into our DNA, is astounding. And from there, that the Halting Problem is basically tied into our DNA, that is, how we think, how our brains work, our ability to conceive ideas, is just mind-blowing. What Chomsky has done in his academic career is on par with the greatest scientists in our history.
> What it strongly suggests is that in the evolution of language, a computational system developed, and later on it was externalized.
So, the beginning of human language was not communication, but a computational system. Interesting stuff starting at that quote. NB: lots of mistakes in the transcript, sometimes rendering it unintelligible.
The field is now called AGI. It isn't mentioned in this article. Everyone seems to be ignoring the whole field of AGI (artificial general intelligence). Or maybe they truly are ignorant of it.
Anyway, suffice to say, AI and AGI didn't stop progressing, and Chomsky is no longer any sort of expert in those fields.
Even Norvig isn't up to speed on the most advanced approaches to AGI, but at least he enters the same room with people who are aware of the field. For example, he gave a talk at the recent Singularity Summit.
The Fifth Conference on Artificial General Intelligence is going to be in Oxford in December. http://agi-conference.org/2012/
Here is some information for people who are interested in pertinent ideas related to AGI.
>OpenCog is a diverse assemblage of cognitive algorithms, each embodying their own innovations — but what makes the overall architecture powerful is its careful adherence to the principle of cognitive synergy.
>The human brain consists of a host of subsystems carrying out particular tasks — some more specialized, some more general in nature — and connected together in a manner enabling them to (usually) synergetically assist rather than work against each other.
> PLN is a novel conceptual, mathematical and computational approach to uncertain inference. In order to carry out effective reasoning in real-world circumstances, AI software must robustly handle uncertainty. However, previous approaches to uncertain inference do not have the breadth of scope required to provide an integrated treatment of the disparate forms of cognitively critical uncertainty as they manifest themselves within the various forms of pragmatic inference. Going beyond prior probabilistic approaches to uncertain inference, PLN is able to encompass within uncertain logic such ideas as induction, abduction, analogy, fuzziness and speculation, and reasoning about time and causality.
>Conceptually, knowledge in OpenCog is stored within large [weighted, labeled] hypergraphs with nodes and links linked together to represent knowledge. This is done on two levels: Information primitives are symbolized in individual or small sets of nodes/links, and patterns of relationships or activity found in [potentially] overlapping and nesting networks of nodes and links. (OCP tutorial log #2).
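To make "weighted, labeled hypergraph" less abstract, here is a bare-bones sketch of the idea (my own illustration, not the actual OpenCog AtomSpace API): links are first-class elements that can connect any number of nodes or other links, each carrying a label and weights.

    # Bare-bones sketch of a weighted, labeled hypergraph in the spirit of
    # the description above; NOT the real OpenCog AtomSpace API.
    class Atom:
        def __init__(self, label, targets=(), strength=1.0, confidence=1.0):
            self.label = label                 # e.g. "cat" or "InheritanceLink"
            self.targets = list(targets)       # a link may point at any number of atoms
            self.strength = strength           # weights ("truth value")
            self.confidence = confidence

    # Nodes
    cat = Atom("cat")
    animal = Atom("animal")

    # A labeled, weighted hyperedge relating the two nodes
    inherits = Atom("InheritanceLink", targets=[cat, animal],
                    strength=0.95, confidence=0.8)

    print(inherits.label, [t.label for t in inherits.targets],
          inherits.strength, inherits.confidence)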
Large-Scale Model of Mammalian Thalamocortical Systems
> The understanding of the structural and dynamic complexity of mammalian brains is greatly facilitated by computer simulations. We present here a detailed large-scale thalamocortical model based on experimental measures in several mammalian species. The model spans three anatomical scales. (i) It is based on global (white-matter) thalamocortical anatomy obtained by means of diffusion tensor imaging (DTI) of a human brain. (ii) It includes multiple thalamic nuclei and six-layered cortical microcircuitry based on in vitro labeling and three-dimensional reconstruction of single neurons of cat visual cortex. (iii) It has 22 basic types of neurons with appropriate laminar distribution of their branching dendritic trees. The model simulates one million multicompartmental spiking neurons calibrated to reproduce known types of responses recorded in vitro in rats. It has almost half a billion synapses with appropriate receptor kinetics, short-term plasticity, and long-term dendritic spike-timing-dependent synaptic plasticity (dendritic STDP). The model exhibits behavioral regimes of normal brain activity that were not explicitly built-in but emerged spontaneously as the result of interactions among anatomical and dynamic processes. We describe spontaneous activity, sensitivity to changes in individual neurons, emergence of waves and rhythms, and functional connectivity on different scales.
>General intelligence, as described above, demands a number of irreducible features and capabilities. In order to proactively accumulate knowledge from various (and/ or changing) environments, it requires:
>1. Senses to obtain features from ‘the world’ (virtual or actual),
>2. A coherent means for storing knowledge obtained this way, and
>3. Adaptive output/ actuation mechanisms (both static and dynamic).
>Such knowledge also needs to be automatically adjusted and updated on an ongoing basis; new knowledge must be appropriately related to existing data. Furthermore, perceived entities/ patterns must be stored in a way that facilitates concept formation and generalization. An effective way to represent complex feature relationships is through vector encoding (Churchland 1995).
>Any practical applications of AGI (and certainly any real-time uses) must inherently be able to process temporal data as patterns in time – not just as static patterns with a time dimension. Furthermore, AGIs must cope with data from different sense probes (e.g., visual, auditory, and data), and deal with such attributes as: noisy, scalar, unreliable, incomplete, multi-dimensional (both space/ time dimensional, and having a large number of simultaneous features), etc. Fuzzy pattern matching helps deal with pattern variability and noise.
>Another essential requirement of general intelligence is to cope with an overabundance of data. Reality presents massively more features and detail than is (contextually) relevant, or that can be usefully processed. This is why the system needs to have some control over what input data is selected for analysis and learning – both in terms of which data, and also the degree of detail. Senses (‘probes’) are needed not only for selection and focus, but also in order to ground concepts – to give them (reality-based) meaning.
> A typical HTM network is a tree-shaped hierarchy of levels that are composed of smaller elements called nodes or columns. A single level in the hierarchy is also called a region. Higher hierarchy levels often have fewer nodes and therefore less spatial resolvability. Higher hierarchy levels can reuse patterns learned at the lower levels by combining them to memorize more complex patterns.
> Each HTM node has the same basic functionality. In learning and inference modes; sensory data comes into the bottom level nodes. In generation mode; the bottom level nodes output the generated pattern of a given category. The top level usually has a single node that stores the most general categories (concepts) which determine, or are determined by, smaller concepts in the lower levels which are more restricted in time and space. When in inference mode; a node in each level interprets information coming in from its child nodes in the lower level as probabilities of the categories it has in memory.
>Each HTM region learns by identifying and memorizing spatial patterns - combinations of input bits that often occur at the same time. It then identifies temporal sequences of spatial patterns that are likely to occur one after another.
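A very rough sketch of what that description amounts to for a single region (illustrative only, not Numenta's actual algorithms): memorize recurring input bit patterns, then memorize which memorized pattern tends to follow which.

    # Rough illustration of the HTM description above, for one region:
    # (1) memorize recurring spatial patterns (sets of co-active input bits),
    # (2) count which memorized pattern tends to follow which in time.
    from collections import Counter, defaultdict

    inputs = [frozenset({1, 4, 7}), frozenset({2, 5}), frozenset({1, 4, 7}),
              frozenset({2, 5}), frozenset({1, 4, 7}), frozenset({2, 5})]

    # Spatial pooling (caricature): keep patterns seen more than once.
    spatial = {p for p, n in Counter(inputs).items() if n > 1}

    # Temporal memory (caricature): transition counts between known patterns.
    transitions = defaultdict(Counter)
    for prev, cur in zip(inputs, inputs[1:]):
        if prev in spatial and cur in spatial:
            transitions[prev][cur] += 1

    # Predict the most likely successor of a given pattern.
    print(transitions[frozenset({1, 4, 7})].most_common(1))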
I've heard of OpenCog before, and it, along with the Singularity crowd, gives me the same weird amateur, bullshitty, vague, generalist feeling that Noam Chomsky does. Basically: where's the beef? What has been done by either crowd, apart from taking credit from those who do things in the actual industry/real world?
My fundamental aversion to both OpenCog and the entire Singularity crowd is a) their statements are so general as to be useless, and b) they don't do anything. Google makes search simple: go to google.com and find out. Google makes cars drive themselves: ask Nevada/California and, if you're a member of the press, request a test drive today. IBM's Watson definitively beat world champions in front of everyone, and before that they did it with Deep Blue.
Everyone in the other communities falls under this category: all talk - no walk.
The entirety of what I've gotten out of both groups is essentially little more than what religious people get out of going to a sermon at a church. The future will be grand, lots of bullshitty buzzwords, lots of hand-waving with huge claims - no hard calculations, no hard examples of what they've actually achieved.
I'll stick with Norvig/Google and his/their demonstrated achievements and knowledge over the talk, hype and vaporware projects of groups that have yet to show any hard progress apart from a bunch of lectures to rich people with a lot of vague words.
The SENS movement gives me the exact same feeling.
Hi, this is Ben Goertzel, the chief founder of the OpenCog AGI-focused software project and of the AGI conference series.
Comparing Google Search and IBM Watson to OpenCog and other early-stage research efforts is silly. Google Search and IBM Watson have taken fairly mature technologies, pioneered by others over decades of research, and productized them fantastically. OpenCog is a research project and is aimed at breaking fundamentally new research ground, not at productizing and scaling-up technologies already basically described in the academic literature.
Lecturing is a very small percentage of what those of us involved with OpenCog do. We are building complex software and developing associated theory. Indeed parts of our approach are speculative, and founded in intuition alongside math and empirics. That's how early-stage research often goes.
Of course you can trash all early-stage research as not having results yet. And the majority of early-stage research will fail, probably making you tend to feel vindicated and high and mighty in your skepticism ;p .... But then, a certain percentage of early-stage research will succeed, because of researchers having the guts to follow their intuitions in spite of the ceaseless tedious sniping of folks like you ;p ...
Chomsky's expertise is in linguistics and political analysis. Steven Pinker's The Language Instinct is a good, readable introduction to some of Chomsky's work (and the wider field to which he is pivotal.) Chomsky's Manufacturing Consent is probably his classic work of political analysis.
You know in the soft sciences everyone is a quack because fundamentally they don't practice - wait for it - science. Science stops false connections by correctly attributing cause to its respective effect. Social sciences do not. For all intents and purposes, the vast majority of social science is either unreproducible, vague, mixing correlation with causation, uses dependent variables, poorly reasoned, statistical quirks, pushed by agendas or fundamentally flawed.
> are effective and powerful ideological institutions that carry out a system-supportive propaganda function by reliance on market forces, internalized assumptions, and self-censorship, and without overt coercion
That's pretty self-evident to the point of being, well, pointless - admen of the 60s made their bread using this, and the PR pioneers of the 30s were already experts. But please let's all listen to what he has to say next. Let me guess: killing people is bad, and not killing people is good. If you call that amazing thinking, I'd hate to see the idiotic version.
Even better:
> Geoffrey Sampson maintains that universal grammar theories are not falsifiable and are therefore pseudoscientific theory. He argues that the grammatical "rules" linguists posit are simply post-hoc observations about existing languages, rather than predictions about what is possible in a language. Similarly, Jeffrey Elman argues that the unlearnability of languages assumed by Universal Grammar is based on a too-strict, "worst-case" model of grammar, that is not in keeping with any actual grammar. In keeping with these points, James Hurford argues that the postulate of a language acquisition device (LAD) essentially amounts to the trivial claim that languages are learnt by humans, and thus, that the LAD is less a theory than an explanandum looking for theories.
Sampson, Roediger, Elman and Hurford are hardly alone in suggesting that several of the basic assumptions of Universal Grammar are unfounded. Indeed, a growing number of language acquisition researchers argue that the very idea of a strict rule-based grammar in any language flies in the face of what is known about how languages are spoken and how languages evolve over time. For instance, Morten Christiansen and Nick Chater have argued that the relatively fast-changing nature of language would prevent the slower-changing genetic structures from ever catching up, undermining the possibility of a genetically hard-wired universal grammar. In addition, it has been suggested that people learn about probabilistic patterns of word distributions in their language, rather than hard and fast rules (see the distributional hypothesis). It has also been proposed that the poverty of the stimulus problem can be largely avoided, if we assume that children employ similarity-based generalization strategies in language learning, generalizing about the usage of new words from similar words that they already know how to use.
Another way of defusing the poverty of the stimulus argument is to assume that if language learners notice the absence of classes of expressions in the input, they will hypothesize a restriction (a solution closely related to Bayesian reasoning). In a similar vein, language acquisition researcher Michael Ramscar has suggested that when children erroneously expect an ungrammatical form that then never occurs, the repeated failure of expectation serves as a form of implicit negative feedback that allows them to correct their errors over time. This implies that word learning is a probabilistic, error-driven process, rather than a process of fast mapping, as many nativists assume.
Finally, in the domain of field research, the Pirahã language is claimed to be a counterexample to the basic tenets of Universal Grammar. This research has been primarily led by Daniel Everett, a former Christian missionary. Among other things, this language is alleged to lack all evidence for recursion, including embedded clauses, as well as quantifiers and color terms. Some other linguists have argued, however, that some of these properties have been misanalyzed, and that others are actually expected under current theories of Universal Grammar.
> You know in the soft sciences everyone is a quack because fundamentally they don't practice - wait for it - science.
I wonder if you know you're being ironic here. Plenty of us have never even read Chomsky's political works and have been exposed to him solely through mentions in the CS literature, like the Dragon book, or more in-depth stuff on his theory of context-free grammars. There is a startling amount of proof that he not only writes about politics but, at one time or another, actually worked for a living and helped our field produce useful stuff.
Angry much? Have you actually read Chomsky, or are you just taking snippets from Wikipedia pages and saying told-you-so? Perhaps you should try reading Manufacturing Consent, it's a very careful and thorough work of analysis and not nearly as bleedingly obvious as you try and portray it.
One point: Sampson's criticisms about linguists producing post-hoc descriptions could just as easily have been (and were, I believe) applied to Newton's theories. Good science includes mapping and describing phenomena.
Another point: negative feedback on errors is not enough to account for the explosive speed of language acquisition in children. Not to say that this sort of feedback doesn't occur, or isn't useful, but it only really is used when children learn exceptions (i.e. irregular verb forms in English) or vocabulary (and even much of vocabulary is rule-generated.) Basic language rules are encoded, and children's brains only require minimal stimulus to record the specific settings of the rules for the language they are learning.
Everett (2005) has claimed that the grammar of Pirahã is exceptional in displaying 'inexplicable gaps', that these gaps follow from a cultural principle restricting communication to 'immediate experience', and that this principle has 'severe' consequences for work on universal grammar. We argue against each of these claims. Relying on the available documentation and descriptions of the language, especially the rich material in Everett 1986, 1987b, we argue that many of the exceptional grammatical 'gaps' supposedly characteristic of Pirahã are misanalyzed by Everett (2005) and are neither gaps nor exceptional among the world's languages. We find no evidence, for example, that Pirahã lacks embedded clauses, and in fact find strong syntactic and semantic evidence in favor of their existence in Pirahã. Likewise, we find no evidence that Pirahã lacks quantifiers, as claimed by Everett (2005). Furthermore, most of the actual properties of the Pirahã constructions discussed by Everett (for example, the ban on prenominal possessor recursion and the behavior of WH-constructions) are familiar from languages whose speakers lack the cultural restrictions attributed to the Pirahã. Finally, following mostly Gonçalves (1993, 2000, 2001), we also question some of the empirical claims about Pirahã culture advanced by Everett in primary support of the 'immediate experience' restriction. We conclude that there is no evidence from Pirahã for the particular causal relation between culture and grammatical structure suggested by Everett. -- Pirahã Exceptionality: A Reassessment, http://dash.harvard.edu/handle/1/3597237
> social science is either unreproducible, vague, mixing correlation with causation, uses dependent variables, poorly reasoned, statistical quirks, pushed by agendas or fundamentally flawed...
Dr. Freud would have had a good deal to say about your apparent fixation with bovine feces...
Seriously though, your comments are playing fast and loose with a range of fields that you're conflating and dismissing. Not all social sciences are “soft” and many have empirically-based real-world applications that shape your (and everyone's, really) everyday lives.
So I figured it out. Basically, they take the idea of AGI seriously, and actually consider and talk about the repercussions, and therefore you dismiss them and their ideas as fringe and not worth investigating. I know that, because if you had investigated at all, you would see that all of those projects had really interesting results and these people are not being vague and hand-waving.
Not all of those projects I listed identify themselves as AGI. However, they should go in the same group.
And anyway, all of those projects have demonstrated progress. If you looked into them at all then you would see that. Ben Goertzel is using some aspects of his AGI research in mainstream (narrow) AI projects. OpenCog has released a number of solid demonstrations of current features. And Goertzel isn't hand-waving or bullshitting in his numerous books and scientific papers, for example Probabilistic Logic Networks: A Comprehensive Framework for Uncertain Inference (336 pages).
Voss is using his system at Adaptive AI as a commercial enterprise.
Qualcomm is funding Brain Corporation (Izhikevich et al.), so obviously they are taking it seriously. A bakery in Tokyo has tested Brain Corporation's machine vision technology to power a semi-automated cashier system.
I'm sympathetic to both Chomsky and Open Cog's aims.
I know Chomsky is a serious scientist with considerable accomplishment.
I have seen totally loony stuff in videos of AGI conferences (Tachyons and stuff). Open Cog may be better than that. But it hasn't proved that it is better than that.
The AI of the 1970s-80s involved the Chomskyan paradigm of "draw up a naive design of the mind and/or brain and implement it". That failed so badly that you need a really good argument for why you can do things differently - at least to move into mainstream science. That is, Ben Goertzel seems nice, smart and enthusiastic, but I can't see him bringing anything new to the table. Jeff Hawkins had interesting ideas with his temporal paradigm, but it seemed like the model he chose to instantiate wasn't all that different from that used by the statistical-brute-force crowd. And Numenta has had really few announcements for a six-year-old enterprise.
And as for companies paying for AI to be added to their systems - that happened from the start, but it was never enough. What's different here from the stuff from twenty years ago?
AGI is mainstream science, these days. The keynote of the 2012 AAAI conference (the major mainstream AI research conference each year), by the President of AAAI, was largely about how the time has come for the AI field to refocus on human-level AI. He didn't use the term "AGI" but that was the crux of it.
The "AI winter" is over. Maybe another will come, but I doubt it.
What's different from 20 years ago? Hardware is way better. The Internet is way richer in data, and faster. Software libraries are way better. Our understanding of cognitive and neural science is way stronger. These factors conspire to make now a much better time to approach the AGI problem.
As for my own AGI research lacking anything new, IMO you think this because you are looking for the wrong sort of new thing. You're looking for some funky new algorithm or knowledge structure or something like that. But what's most novel in OpenCog is the mode of organization and interaction of the components, and the emergent structures associated with them. I realize it's a stretch for most folks to accept that the novel ingredients needed to make AGI lie in the domain of systemic organizational principles and emergent networks rather than novel algorithms, data structures or circuits -- but so it goes. It wouldn't be the first time that the mass of people were looking for the wrong kind of innovation, hmm?
Regarding tachyons in videos of AGI conferences, could you provide a reference? AGI conference talks are all based on refereed papers published by major scientific publishers. Some papers are stronger than others, but there's no quackery there.... (There have been "Future of AGI" workshops associated with the AGI conferences, which have had some freer-ranging speculative discussions in them; could you be referring to a comment an audience participant made in a discussion there?)
I wish you luck (well sort-of - with great power would come great responsibility and all-that).
I wasn't making up the tachyon guy. If I have time, I'll dig up the video (it'd be a little hard since the hplus website reorganized). He was a presenter, not an audience member, and had at least one paper at one of these conferences. I can easily believe the AGI conferences have gotten better.
I would stick to the point that AGI needs to make clear how it will overcome previous problems - clear to mainstream science is useful for funding but clear to yourselves so you have ways to proceed is most important.
I don't necessarily agree exactly with Hubert Dreyfus' critique, but I think that at a minimum a counter-critique to his critique is needed to clarify how an AGI could work.
I mean, I have worked in computer vision (not that much even). There's no shortage of algorithms that solve problem X but nothing in particular weds them together. Confronted with a new vision problem Y, you are forced to choose one of these thousand algorithms and modify it manually. You get no benefit from the other 999.
As far as open source methodologies solving the AGI question, I've followed multiple open source projects. While certain things might indeed work well developed using the "bazaar" style, I haven't seen something as exacting as a computer language come out of such a process - languages tend to require an individual designer working rather exactly, with helpers certainly, but in many, many situations almost alone (look at Ruby, Perl, Python, etc). I would claim AGI would be at least as exacting as a computer language, possibly more so. Further, just consider how the "software crisis" - the limitations involved in producing large software with large numbers of people - expresses the absence of AGI. Essentially, to create AGI, you would need to solve something like a bootstrapping problem, so that the intentions of the fifty or five thousand people working together add up to more than what fifty or five thousand intentions normally add up to in ordinary software engineering. I suppose I believe some progress on a very basic level is needed to address this.
To me, the AGI conference seems to have a much higher ratio of "speculative ideas"/"technical results" talks. Also to me, this pretty much justifies the "all talk - no walk" assessment.
This is Ben Goertzel, chief founder of the AGI conference series.
You are correct that the AGI conferences have a higher ratio of "speculative ideas"/"technical results" than ICML. This is intentional and, I believe, appropriate -- because AGI is at an earlier stage of development than machine learning, and because it's qualitatively different in character from machine learning.
Machine learning (in the sense that the term is now typically used, i.e. supervised classification, clustering, data mining, etc.) can be approached mainly via a narrowly disciplinary approach. Some cross-disciplinary ideas have proved valuable, e.g. GAs and neural nets, but the cross-disciplinary ideas there have quickly been "computer-science-ized"...
OTOH, I think AGI is inherently more complex and multifarious than ML as currently conceived, and hence requires more "out of the box" and freely multi-disciplinary thinking.
I think that in 10-15 years, when the AGI field is much more mature, the conferences will seem a bit more like ML conferences in terms of the percentage of papers reporting strong technical results. BUT, they will never seem as narrowly disciplinary as ML conferences, because AGI is a different sort of pursuit...
Thanks for the kind reply. I said ICML, but NIPS would have been a better point of reference -- since it was originally conceived as a cross-disciplinary enterprise. The NIPS TOC looks like this:
which indicates it's possible to have a selection of papers both technically sharp and interdisciplinary. We should all be so lucky to attract such a set of papers.
>1. Senses to obtain features from ‘the world’ (virtual or actual),
>2. A coherent means for storing knowledge obtained this way, and
>3. Adaptive output/ actuation mechanisms (both static and dynamic).
What does that even mean?
If it's so easy to sum it up in a chapter of a book, why don't they build it and allow others to examine it, submit it for review, write papers and submit to ACM, build fantastic machines based on it? All I want is a bit of proof.
You and I have similar interests: we would like for AGI to happen. Even though I'm not sure what AGI means. It's a sort of dream right now for me, but perhaps more of a reality for you?
Most of that large post is neat, but it's not going to convince me if I've never heard of AGI and if I currently know something about AI.
The most you can do is to go do what I mentioned earlier: go build systems, investigate, write papers and go to conferences and I don't mean conferences where it's just AGI people.
The parent post spent a lot of time to try and inform us, including multiple links. To post a dismissive comment in response, with no explanation whatsoever, is well beyond rude.
I think people are suspicious or dismissive of AGI, HTM, etc. because...well...there doesn't seem to be anything really to it. The people I've talked to who know AI either don't know anything about HTM or have mildly negative things to say about it. Ditto for AGI. It's a contentious topic and people just get defensive.
Many of those links in grandparent post were from or about opencog. I can make long blog posts about opencog that refer to opencog as proof, too...but it wouldn't mean anything. Religious people do that sort of thing all the time.
The proof would be in the pudding, right? So if AGI at least has some hypothesis, then it should be able to produce some results, right?
I very much want AGI to happen. You want AGI to happen. Our interests are in agreement. However, there isn't really much proof about any current hypothesis, as far as I can tell, that can produce any real system. It's a dream so far.
I don't mean it has to be a really solid understanding of conscience, but it's such an undefined, unknown area right now that we can't even approach it.
Instead of making long blog posts and replies to comments, and then getting offended when people don't buy into it, the most people can do right now is go investigate, hypothesize and try to build something.
I'm totally fine with empiricism! Your post here is helpful. It's just rude and not helpful to respond to a post like that with nothing more than "no".
AI has forever been filled with buzzwords and trendy absurdity. Most of this lies on the soft computing end of the field, where self-styled visionaries hold forth with holistic mumbo-jumbo while valuable work is done by reputable researchers elsewhere.
AGI is one of those little goofy microtrends: so far as I can tell it's essentially a rebranding of Strong AI by soft computing reactionaries responding to AI getting dominated by domain specificity (otherwise known as "being successful"). To claim, as the grandparent appears to be doing, that AI is now properly called Artificial General Intelligence, is crackpottery at its finest.
Wake me up when AGI even appears on the first results page of a Google search for "AGI".
I thoroughly agree - see my comment above. The Singularity and SENS peeps give me the exact same feeling - as does Noam Chomsky. All talk - no walk.
I think it's fundamentally the difference between soft bullshit and hard calculations. Everyone can talk about AI, or linguistics, or statistics (or any complex field) in very general, undefined and bullshitty terms.
But what we need, and what the machine learning guys are bringing, is hard calculation - 1 + 1 = 2, or take input data, extract features and make decisions well above human abilities.
My question to all the fringe folks: Where's the beef? What have they done? Where are the automatic cars built on Chomsky's theories? Where are the talking robots from the AGI? What methods have the SENS people got? Are the singularity folks just leeches off gullible rich people - selling them a future and taking their cash in the process without providing any real value?
I think you mean talker. Walker would be things that actually did stuff - you know like search, translation, locomotion or prediction. That's a lot of words and a lot of books in a bunch of the soft sciences (linguistics and politics) which are highly susceptible to class A bullshit. All of that doesn't mean anything - no different from Richard Dawkins who irritates me in a similar fashion - I ask once again - where's the beef?
> Every time I fire a linguist, the performance of the speech recognizer goes up.
I still don't see Chomsky robots walking around, Chomsky translation translating my text to French or Chomsky AI driving cars. Nope - all Google/IBM/Microsoft/DARPA/Boston Dynamics/etc. AKA Hard science-engineers utilising statistics, not soft science blowhards.
As somebody who used to do a bit of NLP work in the dim and distant past I can testify that Chomsky's work on context free grammars, etc. (certainly used to anyway - not been poking at it for about fifteen years now) got applied a fuck of a lot.
The model probably doesn't have a great deal of relation to what happens inside folks' heads - but it was stupidly useful for making computers do stuff. Might be better techniques now - I don't know. But saying that it wasn't applicable practical work is basically ignoring the NLP stuff that was happening in the 80's and 90's.
(let alone the more obvious useful stuff for us geeky folk - the formal grammar stuff we use and think about for compilers. Chomsky Hierarchy, etc.)
And, I'm not enough of a historian of science to know, but it seems to me that the basic results Chomsky proved on Regular Languages and CFGs paved the way for the Hidden Markov Models (HMMs) that have been so effective in language understanding. Basically, the HMMs are the natural probabilistic extension of Regular Languages.
I'm not sure if Viterbi and the other developers of the basic HMM toolkit were directly influenced by Chomsky, or if state machines were just in the air. Certainly Chomsky's basic work in the late 1950s predated Viterbi's work in the 1960s.
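I'm not a historian of it either, but the lineage is at least easy to illustrate: an HMM is essentially a finite-state (regular-language) device with probabilities attached to transitions and emissions, and Viterbi decoding is a few lines of dynamic programming. Every state, word and probability below is made up purely for illustration:

    states = ["Noun", "Verb"]
    start  = {"Noun": 0.6, "Verb": 0.4}
    trans  = {"Noun": {"Noun": 0.3, "Verb": 0.7},
              "Verb": {"Noun": 0.8, "Verb": 0.2}}
    emit   = {"Noun": {"dogs": 0.5, "bark": 0.1},
              "Verb": {"dogs": 0.1, "bark": 0.6}}

    def viterbi(words):
        # best[s] = (probability of the best state path ending in s, that path)
        best = {s: (start[s] * emit[s].get(words[0], 1e-6), [s]) for s in states}
        for w in words[1:]:
            best = {s: max(((p * trans[prev][s] * emit[s].get(w, 1e-6), path + [s])
                            for prev, (p, path) in best.items()),
                           key=lambda t: t[0])
                    for s in states}
        return max(best.values(), key=lambda t: t[0])[1]

    print(viterbi(["dogs", "bark"]))   # -> ['Noun', 'Verb']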
If your definition of real science is something that has good prediction/application potential, that's a rather unusual definition of science. Are mathematicians and theoretical physicists all talk as well?
Mathematicians aren't - you can prove their results' correctness over abstract planes and use them, for example, to run a hedge fund or a software company into trillions in revenue by making testable predictions in macro reality.
Theoretical physicists who use mathematics to make testable predictions aren't all talk either. You can use their work to make electric engines, statistical extractors and accurate physical simulations that are corroborated with empirical evidence.
If the prediction is not testable, is unfalsifiable, is unreproducible, is not independent and is not supported by overwhelming evidence - it is bullshit - no ifs, buts or ands.
Better let Andrew Wiles know his 8 years spent on Fermat's Last Theorem was just a bullshit waste of time, because he couldn't use it to run a hedge fund, or software company...
It can only be 'proved' in the 'contrived' world of pure mathematics... what bullshit!
No, you provided a qualification of why they weren't... I gave you an example of a mathematician who broke your qualification, and logically should fall into your definition of a 'quack'.
The idea being, that you'd have to back pedal, and change your qualification. Which I could then use to apply to other fields, that you deem as 'quackery', and thus undo the foundation of your argument.
Instead, you just denied the reality of what you said... I didn't count on that. Well done.
I'm talking about the efforts to really reproduce human-like intelligence. I'm not saying that AI isn't a field, I am saying if you want human-like intelligence, the AGI people are the most far along, or at least the most serious about it.
Did you really look into AGI, for example the past conferences or those projects, and conclude that it is just valueless holistic mumbo-jumbo?
That is so unfair and inaccurate, I can't see how you can possibly be evaluating things rationally if you really came to that conclusion.
What they are talking about is the idea of human-like general intelligence, and AI mostly doesn't try to do that anymore, although there are some people who are seriously trying and calling it AI and even a few who are sort of aware of what the AGI people are doing or have projects that are as sophisticated. But most of the researchers who are farthest along and most serious about it have been calling it AGI.
Anyway, you have to at least include AGI if you are serious about human-like AIs.
I think part of the problem is that we don't even know what AGI really is. How do you define conscience in a rigorous way? Doesn't mean it can't be done without such work, but it just seems soooo undefined right now that people are suspicious if someone comes along and claims to have a partial solution.
Do we even know what we're looking for? When we do know or have an idea, I am willing to imagine that we'd have more AI research in that area and the AGI would be taken more seriously.
Right; as far as I know there is no serious research in AGI. AI is applied statistics, and anyone who doesn't have that point of view isn't producing results right now.
The point of AI is not conscious machines but machines that can do useful things. Consciousness is not useful except to the extent that it helps a machine do useful things.
You mean 'consciousness' which is actually mainly a philosophical distraction in common usage.. anyway there are a lot of good starting definitions for what AGI is, including a bit in my comment and also in the descriptions of the projects that I referred to.
> if success is defined as getting a fair approximation to a mass of chaotic unanalyzed data, then it's way better to do it this way than to do it the way the physicists do, you know, no thought experiments about frictionless planes and so on and so forth
This is factually incorrect! For physicists, those thought experiments are absolutely essential, and we would need many, many more orders of magnitude of statistical processing of video signals in order to get close to the real-world-useful physical predictions that we arrive at through thought experiments, equations, and so on. The contrast that Chomsky is missing is that for language, the statistical processing is amazingly successful, and the thought experiment style of investigation, while productive, has not been shown useful in real world tasks like translation.
For those arguing against Chomsky, none of the above means that we should abandon a theory-driven or symbolic approach to language.
If Chomsky and his opponents would just recognise that they have different goals (not just different ways of approaching the same goal), we wouldn't have to have this same argument every few months.
Those who argue that Chomsky singularly "pioneered" or "revolutionized" the study of formal language should thoroughly read the book "Linguistics and the Formal Sciences" by Marcus Tomalin. It is a great historical account of the development of this particular strain of formal linguistic study. Knowing more about the intellectual environment and his predecessors and contemporaries helps to erode the mythology of Chomsky as the sole revolutionary catalyst in the development of formal language theory. In fact the major principles of his classic theory of syntax can be thought of as fairly incremental developments from previous work. Many of the specific claims he is well known for had been made by others before.
As for his contributions to cognitive science, I think one side of the field simply feels that he is clinging to some outmoded notions of what Bayesian modeling can achieve in terms of explanatory power.
As a counterpoint, EVERYONE should read Andy Clark's beautifully written BBS paper "Whatever Next? Predictive Brains, Situated Agents, and the Future of Cognitive Science."
Like many things, Chomsky is both right and wrong. He's right in that if we studied the structure and functioning of the brain we could build more accurate AI. He's terribly wrong because the structures of the brain are fuzzy and give rise to probabilistic and 'statistical' functioning that is itself based on training (much like the models he derides).
In these processes there is obviously some contest going on between fuzzy classifiers, as there is in conceptual association games, misinterpretations of song lyrics between people, and errors like the Freudian slip. There are at least large parts of our brains that seem to operate in this manner.
That said, our use of logic and reason certainly says there is a part of our brain that works in a non-fuzzy way, or at least can be trained to work that way. However, while there are people who understand the odds and are just there for a good time, it's instructive to go to a casino and see how many people believe they can win and believe in lucky charms.
This topic is a minefield of semantic games with hidden assumptions and people arguing across each other though.
Like maybe when you add 7 and 6, let's say, one algorithm is to say "I'll see how much it takes to get to 10" -- it takes 3, and now I've got 4 left, so I gotta go from 10 and add 4, I get 14. That's an algorithm for adding -- it's actually one I was taught in kindergarten. That's one way to add.
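Transcribed literally, that 'make ten' procedure really is an algorithm. A toy rendering, only meant for the single-digit case Chomsky describes:

    def add_via_ten(a, b):
        # "See how much it takes to get to 10" from a, then add what's left of b.
        to_ten = 10 - a          # 7 needs 3 to reach 10
        leftover = b - to_ten    # 6 gives up 3, leaving 4
        return 10 + leftover     # 10 + 4 = 14

    print(add_via_ten(7, 6))     # 14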
Norvig still wins, IMO. Statistics may not explain everything, but it does an amazing job of separating the explainable from the unexplainable. I still see this as the requisite Minimum Viable Product for Artificial Intelligence. If we sat around waiting to perfectly understand neurocognitive linguistic processes before we tried to replicate it, we wouldn't have a functioning Google Translate for another 200 years.
Chomsky's famous review of Verbal Behavior was from 1959. He REISSUED it in 1967, as the document the article links to plainly and clearly states. Geez, this is a pretty stupid mistake for Katz and The Atlantic to make. They should fix it.
Chomsky's argument here is basically the same argument he used to reinvent linguistics in the 50's and 60's, so understanding what's really being said here depends on a bit of context. Since neither previous discussions nor this article has quite nailed it, the discussion is indeed worth having (sorry, ColinWright).
The crux of the debate is this. When Chomsky was a grad student at UPenn most linguists thought that language was learned by a complicated mimicry -- that we learn language by imitating behavior, similar to how birds learn to call. The "hard problems" of linguistics were considered completely solved and linguistics had become a primarily classificatory science, with linguists simply cataloging words into parts of speech. This line of thought was interchangeably known as behaviorism or empiricism.
One of Chomsky's transformative insights was that most sentences that are reasonably long are completely unique in human history, and will also never again be uttered by another person, ever. (For example, try Googling that exact sentence.) One consequence of this is that people realized that the mimicry argument could not really account for the robust well-formed structure of sentences and give us an infinite set of them. What we need to generate an infinite set of well-structured sentences is a grammar. Thus the universal grammar was born, and while Chomsky did not convince all the prominent linguists of the time, he did convince all their grad students, and the field of linguistics has seldom looked back.
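As a concrete illustration of that point (the grammar below is a made-up toy, not Chomsky's formalism): a finite rule set with a single recursive rule already generates an unbounded set of well-formed strings.

    import random

    grammar = {
        "S":  [["NP", "VP"]],
        "NP": [["the", "dog"], ["the", "friend", "of", "NP"]],   # recursive rule
        "VP": [["barked"], ["saw", "NP"]],
    }

    def generate(symbol="S"):
        # Expand non-terminals recursively; anything not in the grammar is a word.
        if symbol not in grammar:
            return [symbol]
        expansion = random.choice(grammar[symbol])
        return [w for part in expansion for w in generate(part)]

    print(" ".join(generate()))   # e.g. "the friend of the dog saw the dog"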
Where this begins to intersect with AI is where Chomsky is usually criticized for not having been quite revolutionary enough. His outline of linguistics basically split the field into semantics (which studies the meaning of language) and syntax (which studies the structure of language). He argued that everything about language that must be interpreted (like meaning) must go on the semantics side of the line, and everything else should go on the other side of the line. He does not believe syntax to be interpretive at all, and tends to react violently when anyone tries to push empiricism into the syntax dialogue. Even a lot of his students don't buy that syntax is completely not-interpretive, and so someone like Lakoff would claim that if he was revolutionary, he was not quite revolutionary enough.
Here's what this means for AI. Chomsky sees the statistical approach to learning as a type of empiricism. You take a corpus, learn some stuff statistically, and then perform well on a task. To someone like Chomsky this probably looks like Skinner's old model, but instead of words like "mimicry", we use words like "statistical inference." Remember that empiricism and syntax should be strictly separate, and it becomes easy to see why something like this would make him cranky.
Of course, computer scientists like Norvig and linguists like Lakoff disagree. Their collective argument is that some aspects of syntax are indeed interpretive, and that (in the case of Norvig) they can be learned statistically (using, e.g., PCFGs). For example consider the sentence "John called Mary a Republican and then SHE insulted HIM". This really only makes sense if you presume that the participants think that "Republican" is an insult, but how do you know that? The answer seems to be through some sort of past experience, which Norvig would say can be and should be modeled statistically. And that in short is the debate and why it exists.
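To make the 'learned statistically' part concrete: in a PCFG the rule probabilities are just relative frequencies estimated from a parsed corpus. The two-sentence 'treebank' below is entirely invented, but it shows the mechanics:

    from collections import Counter, defaultdict

    # Each entry lists the rules used in one (made-up) parse tree.
    treebank = [
        # "John called Mary a Republican"
        ["S -> NP VP", "NP -> John", "VP -> V NP NP", "V -> called",
         "NP -> Mary", "NP -> a Republican"],
        # "she insulted him"
        ["S -> NP VP", "NP -> she", "VP -> V NP", "V -> insulted", "NP -> him"],
    ]

    counts = defaultdict(Counter)
    for parse in treebank:
        for rule in parse:
            lhs, rhs = rule.split(" -> ")
            counts[lhs][rhs] += 1

    # Maximum-likelihood estimate: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs).
    pcfg = {lhs: {rhs: c / sum(rhss.values()) for rhs, c in rhss.items()}
            for lhs, rhss in counts.items()}
    print(pcfg["VP"])   # {'V NP NP': 0.5, 'V NP': 0.5}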
The other complaints with what Chomsky said here are that it's scientifically incorrect. Chomsky claimed, for example, that statistical models are basically not real science, and not used in the history of science, which is obviously wrong. Norvig pointed out, for example, that in physics sometimes our only choice is to infer something statistically, as in the case of the gravitational constant or the Higgs boson. But given Chomsky's history, it's fair to assume he meant this in the context of behavioral science, in which case his point is mostly true (modulo the "old" model of linguistics, which he hates). But of course it's important to remember that his model of linguistics was also without precedent in the history of science, so that alone is not really justification for his position.
2. > One consequence of this is that people realized that the mimicry argument could not really account for the robust well-formed structure of sentences and give us an infinite set of them.
What about gradient well-formedness? Does anyone even in generative linguistics believe in programming language-hard grammaticality? What about acceptability?
3. > What we need to generate an infinite set of well-structured sentences is a grammar. Thus the universal grammar was born ... the field of linguistics has seldom looked back.
...which grammarless statistical models have no problem doing (or if you want to reject them for making mistakes, do you reject people?)
Which linguistic universals, if any, have survived the test of time? What are some testable predictions it has made?
I hope I'm not missing the point, and apologies if this has already been said (but not in these exact words, ha ha...)
Noam Chomsky irritates me, and here's why: he's vague, so astonishingly vague that he can hide his uselessness within it.
> Chomsky derided researchers in machine learning who use purely statistical methods to produce behavior that mimics something in the world, but who don’t try to understand the meaning of that behavior. Chomsky compared such researchers to scientists who might study the dance made by a bee returning to the hive, and who could produce a statistically based simulation of such a dance without attempting to understand why the bee behaved that way.
> But the number of parameters in his theory continued to multiply, never quite catching up to the number of exceptions, until it was no longer clear that Chomsky’s theories were elegant anymore. In fact, one could argue that the state of Chomskyan linguistics is like the state of astronomy circa Copernicus: it wasn’t that the geocentric model didn’t work, but the theory required so many additional orbits-within-orbits that people were finally willing to accept a different way of doing things. AI endeavored for a long time to work with elegant logical representations of language, and it just proved impossible to enumerate all the rules, or pretend that humans consistently followed them. Norvig points out that basically all successful language-related AI programs now use statistical reasoning
> But his fundamental stance, which he calls the “algorithmic modeling culture,” is to believe that “nature’s black box cannot necessarily be described by a simple model.” He likens Chomsky’s quest for a more beautiful model to Platonic mysticism, and he compares Chomsky to Bill O’Reilly in his lack of satisfaction with answers that work. “Tide goes in, tide goes out. Never a miscommunication. You can’t explain that,” O’Reilly once said, apparently unsatisfied with physics as an explanation for anything.
In which way is he vague? He basically reinvented a Turing Machine with human language and brought linguistics around to the idea that, yes, language isn't something that's vaguely "out there" tabula-rasa-style; it's built into our genetics at a very fundamental level. Fundamental enough that he tied linguistics DIRECTLY to math and from there to programming. The Chomsky Hierarchy is no joke.
Your link relating to statistical models is only a tiny, tiny part of Chomsky's fundamental arguments and even then is debatable.
> it's built into our genetics at a very fundamental level.
Chomsky's evidence for this is.... iffy at best. Yes, I think we are predisposed to HAVE language, but I don't think we can learn as much as he proposes about the structure of modern language from the human genome.
2) I don't think the problem with learning about language from the genome is specific to language. There are just so many layers of molecular interactions between the genetic code and activity at our level of reality that trying to link the two is incredibly difficult, and we are not even close to having the computing power or theoretical models necessary to link them up. But that doesn't mean that language and genes aren't linked.
Being cited doesn't mean he has something specific to say about those fields. He's an influential scientist, so people may cite him when they find something vaguely related to his theories, to make their findings seem more important.
I didn't imply a yearning for anything, I was just saying a citation can mean different things in different circumstances. I think you've fallen prey to the polarization that Chomsky is putting forth: either you are dealing with huge amounts of data and don't care about theory, or you're a rationalist whose theories don't need any empirical support. The reality of successful science is on neither of these extremes, of course.
And by the way I do think that judging human performance by simple metrics is problematic, but not because it's statistics or not 'high-level', simply because it doesn't take enough information into account; it's a shortcut to the actual concept of quality, which is dangerous when metrics are used in decision-making. Automated metrics give an air of objectivity which an expert opinion doesn't have, even though the latter may well be much more informed.