Untapped opportunities in AI (oreilly.com)
164 points by dennybritz on June 4, 2014 | 40 comments



Cool article. I really like the repetition that model complexity is not a panacea. Seems like the industrial AI/ML movement as a whole has gone down a road where practitioners will, by default, throw the most powerful model they know at a problem and see how it pans out. Works well on benchmarks (if you regularize/validate carefully) but isn't a very sustainable way to engineer a system.

Separately, I do find it curious that his list of "pretty standard machine-learning methods" included Logistic Regression, K-means and....deep neural nets? Sure they're white hot in terms of popularity and the experts have done astounding things, but unless I've missed some major improvements in their off-the-shelf usability they strike me as out of place in this list.


Deep convolutional neural nets are the staple method for companies doing computer vision at scale. Google, Facebook, Yahoo, and Baidu make extensive use of them. In 2014 they definitely deserve their place on the shortlist of "pretty standard machine-learning methods". They are the current state of the art for visual recognition in particular (see the results on ImageNet from the past few years).

They have also been commoditized by libraries such as Theano (Python) and Torch (Lua). Google and Facebook use their own tools based on Torch.

My own version of the shortlist would be: Logistic regression, KNN, random forests, SVM, and deep convnets.


This is what I'm starting Skymind[1] to address. Many of these companies are using cutting-edge techniques without a focus on usability (since they know this stuff). The hope here is to create an off-the-shelf package people can just use with only knowledge of fairly conventional machine learning techniques, while also avoiding the problems of having to program in Java (which isn't realistic in data science) or even Lua.

Many people could benefit from a neural net's ability to create good features for itself, but it's hard to use in a practical setting. That being said, over the next few years I believe this can change.

[1]: http://wired.com/2014/06/skymind-deep-learning/


I like the idea a lot. Just trying to understand this better: it seems like your company is entirely about selling consulting services, yet your stated goal is to "give people machine learning without them having to hire a data scientist". What's your path to that goal?


In this case, by being that on-staff data scientist for them.

Many companies only need a one-off model to set themselves up for some sort of baseline data product. This can also be training for them on using machine learning for their problem solving.

The goal isn't necessarily to totally supplant data scientists (love press sensationalism), but to help enable companies to build easy to use models in their apps.

This can also save data scientists time, not necessarily by "skipping" the feature extraction part (which they can do with deep learning and still do reasonably well) but by letting them use a fairly good machine learning model out of the box as a baseline.

The great thing about machine learning is the ability to mix different techniques. Google's voice detection is a great example of this. They use neural nets for feature extraction and hidden Markov models for final translation of speech to text.
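A toy sketch of that hybrid pattern (not Google's actual pipeline): a stand-in `acoustic_model` function sits where a neural net would go and produces per-frame state scores, and a hidden Markov model then picks the most likely state sequence with the Viterbi algorithm. The state names, probabilities, and function names are all made up for illustration.

```python
import math

STATES = ["sil", "h", "i"]  # hypothetical phone-like states

# Hypothetical HMM parameters, invented for this example.
LOG_START = {"sil": math.log(0.8), "h": math.log(0.1), "i": math.log(0.1)}
LOG_TRANS = {
    "sil": {"sil": math.log(0.6), "h": math.log(0.3), "i": math.log(0.1)},
    "h":   {"sil": math.log(0.1), "h": math.log(0.5), "i": math.log(0.4)},
    "i":   {"sil": math.log(0.3), "h": math.log(0.1), "i": math.log(0.6)},
}


def acoustic_model(frame):
    """Stand-in for a neural net: map one frame to per-state log-scores.

    Here a frame is already a dict of state -> probability; a real system
    would compute these scores from audio features.
    """
    return {s: math.log(frame[s]) for s in STATES}


def viterbi(frames):
    """Return the most likely state sequence for a list of frames."""
    scores = acoustic_model(frames[0])
    best = {s: LOG_START[s] + scores[s] for s in STATES}
    backpointers = []
    for frame in frames[1:]:
        scores = acoustic_model(frame)
        new_best, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for landing in s at this frame.
            prev = max(STATES, key=lambda p: best[p] + LOG_TRANS[p][s])
            new_best[s] = best[prev] + LOG_TRANS[prev][s] + scores[s]
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


frames = [
    {"sil": 0.90, "h": 0.05, "i": 0.05},
    {"sil": 0.10, "h": 0.80, "i": 0.10},
    {"sil": 0.10, "h": 0.20, "i": 0.70},
    {"sil": 0.10, "h": 0.10, "i": 0.80},
]
print(viterbi(frames))  # ['sil', 'h', 'i', 'i']
```

The point of the split is that the learned part only has to score individual frames, while the HMM carries the sequence structure.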

I think deep learning (if wrapped in the right apps or SDKs), with its automatic feature extraction combined with specifying, say, a "task" (predicting a value, labeling things, or even compressing data[1]), would allow companies to focus not on machine learning but on straight problem solving.

The idea would be that once they are familiar enough with how to feed data into the system and specify a "task", they can do a lot of machine learning by themselves without having to think too much about the problem they are solving (what features work best given the data I have?)

[1] http://www.slideshare.net/agibsonccc/ir-34811120


Interesting that none of the methods in either short list are inherently probabilistic.


Agreed on both points.

In addition, I don't know why anyone would think Google is going to make all the AI, just because today's most notable, state-of-the-art AI systems are made by a few big companies with the resources to fund large teams of experts for years. Fifty years ago this article could have been called "Untapped opportunities in software" (or operating systems) and talked about IBM -- is there software IBM can't or won't write for us?


I think the key difference between Google and IBM is that Google has shown a clear interest in AI, and practice of it is pretty pervasive throughout the company. There's a world of difference between "The largest tech company that can fund a team of experts" and "The largest tech company that is funding multiple teams of experts to actively work on the problem."


>I think the key difference between Google and IBM is that Google has shown a clear interest in AI

Umm... What about Watson?


Google needs AI for its core product.

Better search is an AI problem. What did the user ask for versus what did the user want to ask for?


In comments he mentions two "high quality open-source packages":

http://torch.ch/ (used widely at large companies)

http://deeplearning4j.org/ (newer)

I have no idea if they are quality or open source, just posting the information.


Hi, creator of deeplearning4j here.

I'd like to add this IS newer. That being said, I am going to be focusing on ease of use here.

Currently a lot of the deep learning frameworks out there aren't very focused on practicality.

I am also hoping to add SDKs for different languages, making this a fast general-purpose deep learning core that lets people do neural nets in different languages while also benefiting from a fast runtime.

I'm going to be opening up a good contribution pipeline here soon and would love to answer questions where possible.

I'm looking for help on everything from documentation to feature requests.


That's fair pushback re how "standard" deep learning is. That said, those methods are rapidly establishing themselves as the go-to for applications in speech, vision, and text. At least for those outfits that can afford the substantial dev costs.


This is what I am hoping to address with the launch of Skymind[1]. I believe in the potential of deep learning, especially in the ways deep learning can learn better representations for feature extraction than some hand-curated models.

Practicality has always been a problem with deep learning, and I think enabling access to this powerful technology will be a great enabler for many people in the long term.

[1] http://wired.com/2014/06/skymind-deep-learning/


>I do find it curious that his list of "pretty standard machine-learning methods" included Logistic Regression, K-means and....deep neural nets?

The next sentence is much more curious to me: "the point is that they’re widely available in high-quality open source packages" because I have yet to find a proper, well-documented, well-maintained, non-toy open source deep neural network implementation.


http://www.ersatzlabs.com/ claims to offer intuitive deep learning.


Massive datasets do outperform clever theories... but I think that's just because no one has yet worked out the theories that work best with the data. This requires insight, in addition to data, and could come from anyone.

The alternative - that massively complex probabilistic models are the best theory of the data - is hopefully not true. Especially not of our minds. But it could be true, and if so, it would mean that our intelligence is irreducible, and we are forever beyond our own self-understanding (even in principle). Our history is full of inexplicable mysteries that were eventually understood. But not all: quantum randomness. I really hope intelligence will be one of the former.


There are a lot of AI problems that can be solved with less than human intelligence, but here are some human numbers for reference:

By the time you're 30 you have been exposed to:

~1.4 petabytes of visual information

~1.8 terabytes of auditory information

Touch and proprioceptive bandwidth is harder to calculate but the ascending pathway through the spinal cord is about 10 million fibers, which is 10x the optic nerve (Or 5x the number of fibers from both eyes). So:

Between 1.4 and 14 petabytes of touch and proprioceptive information.

So we're a fairly large data problem on top of millions of years of evolution that have baked in some knowledge and abilities.
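As a rough sanity check, the totals above can be turned back into implied average data rates. This is only a back-of-the-envelope sketch: the 16 waking hours per day and the decimal units (1 PB = 1e15 bytes) are assumptions, not figures stated above.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AWAKE_FRACTION = 16 / 24   # assumption: about 16 waking hours per day
YEARS = 30

awake_seconds = YEARS * SECONDS_PER_YEAR * AWAKE_FRACTION


def implied_rate(total_bytes):
    """Average bytes per waking second needed to reach total_bytes."""
    return total_bytes / awake_seconds


visual_total = 1.4e15     # ~1.4 petabytes (from the figures above)
auditory_total = 1.8e12   # ~1.8 terabytes (from the figures above)

visual_rate = implied_rate(visual_total)
auditory_rate = implied_rate(auditory_total)

# Touch/proprioception scaled as 1x to 10x the visual total, as above.
touch_low, touch_high = visual_total, 10 * visual_total

print(f"waking seconds in {YEARS} years: {awake_seconds:.3g}")
print(f"visual:   {visual_rate / 1e6:.2f} MB/s")
print(f"auditory: {auditory_rate / 1e3:.2f} kB/s")
print(f"touch:    {touch_low / 1e15:.1f} to {touch_high / 1e15:.1f} PB")
```

Under those assumptions the visual figure works out to a couple of megabytes per second and the auditory figure to a few kilobytes per second.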


Not really. Our knowledge takes shape way, way before we hit 30 in every area you can think of. Some modular brain systems actually come to full form in the first few years, some in the first few months (vision).

I would argue that the data we are exposed to is not only small, but actually sparse.


If you consider brain development to be iterative over many generations of brains through human and pre-human history, the training datasets are a lot larger.


Except that doesn't make sense. The complexity of changes that can be carried over is very, very small compared to the changes which go on in a single lifetime. And the mechanism is totally different.


Do these numbers take into account that most of the information reaching the eye never reaches the brain? It is amazing how little information is transmitted from the eye to the brain - the vivid images that we "see" are to a large extent a product of the brain's own imagination, with impulses from the eye largely providing only limited, 'corrective' information to ensure acceptable coherence with reality...


Yes, these are all based on the bandwidth of major nerves, as well as time awake.


How did you get those numbers?


No reason to be so pessimistic about quantum randomness. Quantum theory is barely 100 years old and our understanding of it is still evolving massively. Though the latter is not always appreciated by the public.


Well, local hidden variables are out the window... do you see any indication this will change?


Personally I am also not convinced that LHV theories will ever describe the effects correctly. But whether the successor of QM ultimately contains true randomness or not, there is no reason to think it will forever stay an 'inexplicable mystery'.


I don't see much cause for optimism. Human intelligence is the end result of tens of millions of years of evolution. That software project you hacked on for a few months until it worked really well? Now multiply that by about seven orders of magnitude. You simply can't comprehend how much trial and error led us to the state we're in now. To think that we could reverse engineer ourselves in the span of a few centuries always seemed pretty naive to me.


A human generation is about 20 years. A software-instance generation could be a few hours.
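To put rough numbers on that comparison, here is a small sketch. The 3-hour software generation and the 10-million-year evolutionary span are assumptions chosen to stand in for "a few hours" and the low end of "tens of millions of years"; it also assumes generations run strictly one after another.

```python
HOURS_PER_YEAR = 365.25 * 24        # 8766 hours
HUMAN_GENERATION_YEARS = 20         # from the comment above
SOFTWARE_GENERATION_HOURS = 3       # assumption for "a few hours"
EVOLUTION_YEARS = 10_000_000        # assumption: low end of "tens of millions"

# How many software generations fit into one human generation.
speedup = HUMAN_GENERATION_YEARS * HOURS_PER_YEAR / SOFTWARE_GENERATION_HOURS

# Human-length generations in the evolutionary span, and how long the same
# number of software generations would take if run back to back.
generations = EVOLUTION_YEARS / HUMAN_GENERATION_YEARS
software_years = generations * SOFTWARE_GENERATION_HOURS / HOURS_PER_YEAR

print(f"speedup per generation: {speedup:,.0f}x")
print(f"generations to replay:  {generations:,.0f}")
print(f"serial software time:   {software_years:.0f} years")
```

This says nothing about whether a software generation explores as much as a biological one; it only compares iteration counts.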


I can honestly say that this post has revolutionized my thoughts on AI. Primarily this is because of what I perceive as the thesis statement, which is:

"<AI> is the construction of weighted tables (choices, data, meta relations, whatever) from large sets of <prior data> by <method>"

This is kind of crazy, because I think it says you could make a Turing AI by using large datasets of prior life data for humans. In essence, "<my life> is the construction of weighted tables from large sets of <life experience> by <human learning>." For example, if you had an AI that could learn through text, you could have extensive transcribed conversation logs of people and then large time-activity logs to use as your inputs.

If it could learn through video (i.e., it could view images, understand objects, object relations, events in time, and assign will to the person behind actions / events) then you could instead just feed it huge video logs of people's lives. If you wanted a copy of a person, you could feed it only a single individual, and if you wanted a more general AI, then you could feed it cross sections of the population.

In addition, there's a very cool meta aspect to the large dataset concept, in that it can be large datasets for when to use, or to feed data to, specialized sub-AIs. For example, you might have a math sub-AI that has been trained by feeding it massive sets of math problems (or perhaps it can learn math through the video life logs of a person?). If it's then being used as a part of a larger piece, then you'd want to know when to use it to solve problems, or when to feed it experience inputs for further learning. In essence, it's tables of categories for experience types, and then grown / paired sub-AIs for those types.

I would wager that it is possible, right now, to create a chatbot that can pass the Turing test using the above by feeding it the equivalent of mass IRC chat or some such huge dataset of human interaction by text over a variety of topics. This would naturally need sub-AIs for mechanical things like grammar or parts of speech, and then possibly higher level meta-AIs for interpreting intent, orchestrating long form thought, or planning. In a way, it's layers of AI based on level of thought abstraction. If it were a human, the high intensity portions of sub-AI would occupy space relative to intensity within reconfigurable co-processor zones (sight:visual cortex, sight:face recognition:occipital and temporal lobes, executive functions:frontal lobes, etc.)
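As a deliberately tiny illustration of "weighted tables from prior data", here is a sketch that builds a word-to-next-word count table from a made-up chat log and looks up the most frequent follower. The log lines and function names are hypothetical, and a bigram table like this is nowhere near a Turing-capable system; it only shows the table-building step.

```python
from collections import Counter, defaultdict


def build_table(lines):
    """Count, for each word, which words follow it and how often."""
    table = defaultdict(Counter)
    for line in lines:
        words = line.lower().split()
        for prev, nxt in zip(words, words[1:]):
            table[prev][nxt] += 1
    return table


def most_likely_next(table, word):
    """Return the most frequent follower of word, or None if unseen."""
    followers = table.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up stand-in for a large chat log.
chat_log = [
    "how are you today",
    "how are you doing",
    "how is the weather",
    "are you there",
]

table = build_table(chat_log)
print(most_likely_next(table, "how"))      # are
print(most_likely_next(table, "are"))      # you
print(most_likely_next(table, "pegasus"))  # None
```

The last lookup hints at the weakness: anything outside the prior data has no entry in the table.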


Consider this simple sentence:

"Jane grew up in an idyllic rural area."

No current AI implementation, to my knowledge, can understand such a sentence nearly as well as humans do. A competent chatbot judge could suggest a novel situation, say a broken-winged black Pegasus appeared in Jane's hometown when she was seven, and ask pointed questions to find out if the interlocutor is a human or a bot.

The issue with almost all current approaches to AI is that they are either purely symbolic or sub-symbolic. The current symbolic approaches cannot completely capture the preconceptual experience humans use to make sense of the world. When we hear "idyllic rural area", we use our mental imagery and sensory experiences to help us understand the sentence much more deeply than the list of words suggests.

The subsymbolic approach could potentially solve this issue, but it raises the problem of integrating all those complex, interacting parts, vision, auditory, motor control, conceptual thoughts, etc. into a unifying whole. More importantly, would we be able to control and direct the beast sufficiently well once it becomes reality?

There is now some AGI (Artificial General Intelligence) research on integrating the two paradigms. If anyone is interested, a presentation is available here: http://ieet.org/index.php/IEET/more/goertzel20130531


To quote a great scene from the movie "Waking Life" [1] by Linklater:

"When I say 'love', the sound comes out of my mouth and it hits the other person's ear, travels through this byzantine conduit in their brain, you know, through their memories of love, or lack of love, and they register what I'm saying and they say 'Yes, I understand'..but how do I know they understand because words are inert. They're just symbols. They're dead. You know, and so much of our experience is intangible. So much of what we perceive cannot be expressed, it's unspeakable. And yet, when we communicate with one another, and we feel we have connected and we think that we're understood, I think that we have a feeling of almost spiritual communion, and that feeling might be transient, but I think that's what we live for".

[1] https://www.youtube.com/watch?v=pvnQu30kQ2c


>Jane grew up in an idyllic rural area.

I'm currently working on a system for understanding natural language and you might be surprised by how many assumptions one must make in order to understand just this simple sentence. The only part of this sentence that nearly all humans could understand is the concept of growing up. Everything else would have to be inferred inductively from our personal expectations and experiences. For example, from this sentence, you wouldn't know the gender, race, or even species of Jane except from your experiences of people saying similar things about people whose race, gender or species you already knew.

Consider the difficulty of modeling that sentence in a computer system. If all you had were examples of texts to go by, such as the scenario in the grandparent, how would you determine that Jane was a female? Within that vast body of text, somewhere, it has to contain the statement that "Jane" is a female name. Or Jane can only appear in sentences as the referent of a feminine pronoun, like "she" or "her". Or in that vast database of images and video, all human beings that were identified as Jane have to have female characteristics.

From an even deeper philosophical perspective, how do you know which Jane this is referring to? Is this Jane supposed to be an actual person with hidden but unknown state (such as the name of her parents) or is this a purely fictional creation, for whom it would be meaningless to ask who her parents were? How do you teach a computer the concept of a fictional character? The interesting thing about fictional characters is that they do not describe what something is, but instead, describe what something is not. In order to create a computer that could pass the Turing test, the computer would have to be capable of modeling both fictional and concrete things at a minimum. It would have to know when someone was talking about something fictional or something that is supposed to represent an actual object. If we do not destroy ourselves first, the day will come when computers will be able to make this distinction, but I think the design of such a computer will have to be evolved rather than architected from the top down. I think the problem is just too hard.


Why is it relevant if AI "understands"? The post above doesn't claim it would understand, it just claims that for all practical purposes it would pass the Turing test.


Because you couldn't pass a Turing test, even for practical purposes, with that approach. Passing the Turing test is a problem that cannot be solved by big data alone. You have to model not just language patterns and word sequences but the prior causes for those word sequences. The prior causes for the words (and images, and videos in the original example) are ultimately desires and experiences of real human beings and truths about the universe.

In other words, a computer has to actually model the world and the changes to the state of the world as the conversation goes on in order to pass a Turing test. I didn't read anything in the original description to suggest that was happening.


Is there evidence for preconceptual experience? It seems to me that there is no experience without concepts.


This is very well and succinctly put!


What happens when you ask it to do something entirely novel, that it has never seen before?


It doesn't explain any personality differences. Or chemical changes like whether a person is hungry, or drunk.


As a postdoctoral candidate in biology, I can say that my approach to problem solving is exactly the opposite: My job is to infer as much as I can from the scant amount of data I can obtain. The goals outlined in this article are to collect as much data as you can, creating what is essentially a glorified lookup table of results. I must say the latter approach seems a hell of a lot easier.



