Just a nit, but the author keeps talking about object recognition while what he was actually doing is image classification. Object recognition actually consists of two tasks: one is classifying the object (this is a beer bottle) and the other is saying where in the image the object is. Additionally it can/should detect multiple objects in the image. This is more complex than classification, which only associates one category with the image.
I don't think there's a consistent terminology. In my computer vision class we called it "object recognition" when it was about recognizing one specific object (this particular car) and "object classification" when deciding the category of the object in the image (in general, like 'car', 'bottle').
One may also call the localization of the object "object detection", with classification as a subsequent step.
But I don't think it's too important how we call it as long as we understand what the task is.
Actually the tensorflow implementation he uses does both segmentation and classification and returns a probabilistic graph of objects. For his application, it's only returning the top result, so it looks more basic than it is.
No, it doesn't and there is no graph returned whatsoever. It's just a list of the top classification labels for the image (see example at the tutorial he cited https://github.com/tensorflow/tensorflow/tree/master/tensorf...). This is not the result of a segmentation but is rather a list of the top labels the model believes this could be. If you look at the top results you'll see they're usually similar/in the same family (again, refer to the example in the linked tutorial, the top 3 labels are: military uniform, suit, academic gown). This is literally the normalized output of the nodes of the last layer in the neural network (where each node corresponds to one category). If you added all probabilities together it'd sum to 1.
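To make that concrete, here's a rough sketch (not the tutorial's actual code; the logit values below are made up) of what that last step looks like: the final layer's raw outputs get pushed through a softmax, and the result is one probability per label, summing to 1.

    import numpy as np

    # Hypothetical raw outputs (logits) of the last layer, one node per category.
    logits = np.array([9.2, 7.9, 7.4, 2.1, 0.3])
    labels = ["military uniform", "suit", "academic gown", "bow tie", "beer bottle"]

    probs = np.exp(logits) / np.exp(logits).sum()    # softmax normalization
    top = sorted(zip(labels, probs), key=lambda x: x[1], reverse=True)
    for label, p in top[:3]:
        print(f"{label}: {p:.3f}")                   # the "top labels" list
    print("sum of all probabilities:", probs.sum())  # always 1.0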
That's my point. With these OTS modules they are only returning known classifiers.
The system has to segment before it classifies. That isn't returned to the user, but gradient descent is happening in the background. Like I said, it's a nitpick but important if you're trying to really build novel CV applications.
One of my gripes with people implementing pre-built modules from TF is that you don't really build any of the hard stuff, and it's pre-trained so not much learning is happening. You can't for example build RL systems with off the shelf TF implementations.
Do you understand how convolutional neural networks work? There is no segmentation involved here at all. The input is the raw pixels of the image. The output is the probability that the image belongs to each of the categories the network is capable of predicting.
Also gradient descent has nothing to do with segmentation at all, I don't understand what you're talking about. Gradient descent is used to find the set of weights that minimizes the error. This is standard in training neural networks of any kind using backpropagation.
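For the record, here's a minimal sketch of that in today's tf.keras API (not the model from the article; the input size and category count are placeholders): raw pixels go in, a softmax layer with one node per category comes out, and plain SGD (gradient descent via backpropagation) is what adjusts the weights. There is no segmentation step anywhere.

    import tensorflow as tf

    # Minimal convolutional classifier: raw pixels in, per-category probabilities out.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(224, 224, 3)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1000, activation="softmax"),  # one output node per category
    ])

    # Training just runs gradient descent (via backpropagation) to minimize
    # the classification error -- no segmentation happens anywhere.
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )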
Like so many other things in the field of AI, general object recognition was the "holy grail" because it was assumed that it required AGI. Now we've figured out a way to do general object recognition without AGI.
Is there a story somewhere of AI researchers concluding general object recognition was the holy grail of AI?
I get that a lot of people downplay achievements in machine learning by saying it's nothing like AGI, but it's almost a meme now that "once upon a time everyone thought that was the holy grail and they're moving the signposts" even when 1) nobody thought that, or 2) some people thought that and some people didn't think that.
In fact, the first thing you learn in an introductory computer vision class is that Marvin Minsky assigned "computer vision" to an undergrad as a summer project in 1966 (wire up a camera to a computer and write a program to understand images). It was the opposite of a holy grail.
Even today, it's hard to make laymen appreciate the advancements in computer vision because for them "seeing" doesn't seem like a difficult thing. Even stupid chickens can see. A chess program is much more impressive to laypeople.
I - for what it is worth - would still say general object recognition, with the emphasis on general, is indeed the holy grail.
The ability to recognize objects like people do is not properly represented by current benchmarks. I can imagine that you can build a perfect robotic "bird spotter", but if you put that in a self-driving car I would not be surprised if it stops for something that's just a shadow, or if you put it on a humanoid it's unable to distinguish its own hand from that of its clone. Imagine two of them cleaning out the dishwasher. :-)
A lot of AI is still working only in lab conditions or restricted application domains. That's why I consider robots and cars so important in driving AI towards the "general" dimension.
Well I can't find any specific references, but I definitely recall getting that impression from old machine vision work. Decades of work to get models that were incredibly complex and hand crafted and barely worked. I don't know if I thought it would require AGI, but it would definitely require significant progress towards general AI.
Whenever we figure out how to do something, we stop calling it AI or AGI. If the trend continues, will we eventually have a general AI, but won't consider it anything special? Will it have been just a small incremental step by then?
I think deep natural language processing will be unambiguous: if you create a machine that says "Yes, I am intelligent, thanks for asking" in a way indistinguishable from a human, it would be hard to disagree. On the other hand, it's entirely possible that that goal will take so much longer than others that we'll have incredibly strong AIs affecting our lives before we notice.
I think it will need some kind of breakthrough. Current advancements are probably incremental as you stated, but having an AGI might need some new theory we don't have currently.
Deep learning is the opposite of incremental. For a long time it was not clear whether/how we could train multi-layer networks efficiently. ImageNet changed everything.
Machine learning people basically agree that there weren't any big breakthroughs in deep learning. The success and the hype is mostly a combination of more computing power and more data. The algorithms (convolutional neural network etc.) were invented back in the 1980s and even earlier.
There have been some improvements but they are incremental indeed. More use of ReLU, dropout etc. But it's not a new paradigm at all.
Convnets follow pretty naturally from multilayer perceptrons. Perhaps backpropagation was a breakthrough, enabling the training of ANNs on data, instead of hand-tuning.
But the idea of neural nets is very old, going back to Rosenblatt and connectionism.
I think a "holy grail" could be understanding complicated intentions and social reasoning. Like "he's only doing that so that it seems that he thinks that the other girl doesn't know that he could otherwise not do the etc. etc."
> Now we've figured out a way to do general object recognition without AGI.
I'm pretty sure we'll eventually learn to do anything without AGI, as a narrow task.
The trick with AGI is putting all those little things together. Perhaps that's the actual recipe for it, somehow. Turtles all the way down, who knows how many levels.
Are you sure? I'm convinced this time is different. Sure, it's been applied to vision first, but I believe the techniques can be applied to almost any sensory input.
Just the recognition part won't cut it. Look at AlphaGo. It took several different techniques combined to beat the best human Go player in the world. That's a step beyond object classification, but still not enough for AGI.
Author here - I guess "the holy grail" is overstated, but recognizing arbitrary objects is a tough problem that people have put a ton of time and energy into. I think to really get it right in all cases you may need general AI.
This was amazing. I'm impressed by your command of both hardware and software technology. Even as a software engineer, I have a hard time trying to make TensorFlow do something for me.
It thinks for around 3 seconds. It doesn't use the Pi's GPU - I bet that if it did it could get a lot faster. You could probably also monkey around with the compiler flags and speed it up. If anyone working on TensorFlow has some ideas, I'd love to hear them.
This was the keyboard - https://www.amazon.com/gp/product/B0179N39KS/ref=oh_aui_sear... - it's incredibly light and cheap but one month later I'm not sure I'd recommend it. A bunch of the keys were randomly mapped incorrectly and it's started to flake out on me a little bit.
Great project. Locomotion and vision are pretty advanced compared to grasping and complex handling of objects. If we could have a workable arm, it would enable much more interesting applications.
There are some affordable robot arms. uArm has a hobbyist-level arm. I have one, and I made a 6DOF force sensor out of a 3Dconnexion SpaceNavigator on the end and got it to talk to ROS. I want to hook a classifier system up to it so it can use tools like end wrenches and get them onto a bolt by feel.
Robot manipulation in unstructured situations still sucks, though. Willow Garage was making progress, but didn't last.
Did the author publish a repo for this? It's easy getting tensorflow going for basic image classification, but the hard part is actually making the robot move in a way that makes sense - using the camera and the sonar data to make decisions and then drive the motors. Or is this not autonomous?
I made an autonomous mode using the sonar sensors and a driving mode. I also made a public repo, but I didn't want to link to it, since the code is so awful (it was just a fun hobby project and I wasn't expecting people to look at it).
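For anyone curious what that kind of autonomous mode can look like, here's a very rough sketch (not the author's code; the pin numbers and motor wiring are assumptions) of reading an HC-SR04 style sonar on a Pi and stopping when something is too close:

    import time
    import RPi.GPIO as GPIO

    # Hypothetical wiring -- adjust to your own setup.
    TRIG, ECHO = 23, 24
    LEFT_FWD, RIGHT_FWD = 17, 27

    GPIO.setmode(GPIO.BCM)
    GPIO.setup(TRIG, GPIO.OUT)
    GPIO.setup(ECHO, GPIO.IN)
    GPIO.setup([LEFT_FWD, RIGHT_FWD], GPIO.OUT)

    def sonar_distance_cm():
        """Ping the sonar sensor and return the distance in centimeters."""
        GPIO.output(TRIG, True)
        time.sleep(0.00001)               # 10 microsecond trigger pulse
        GPIO.output(TRIG, False)
        start = end = time.time()
        while GPIO.input(ECHO) == 0:
            start = time.time()
        while GPIO.input(ECHO) == 1:
            end = time.time()
        return (end - start) * 17150      # echo time -> distance (half the speed of sound)

    try:
        while True:
            if sonar_distance_cm() < 20:  # obstacle ahead: stop
                GPIO.output(LEFT_FWD, False)
                GPIO.output(RIGHT_FWD, False)
            else:                         # path clear: keep driving
                GPIO.output(LEFT_FWD, True)
                GPIO.output(RIGHT_FWD, True)
            time.sleep(0.1)
    finally:
        GPIO.cleanup()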
The mention of sonar sensors, used to prevent the car from running into obstacles, made me think the author was planning to make the robots autonomous too, but maybe just hasn't gotten that part worked out yet. I was also curious how the robot would know when to stop and try classifying something, without the operator explicitly telling it so.
Sorry for the off topic but is anyone else getting very high cpu usage from O'Reilly websites? Any known resolution or work around?
With Chrome developer tools I see one error:
"Uncaught SecurityError: Failed to read the 'localStorage' property from 'Window': Access is denied for this document."