Open Problems in Robotics (scottlocklin.wordpress.com)
436 points by haltingproblem on Aug 19, 2020 | 227 comments



- Motion planning: already discussed.

- Multiaxis singularities: much less of a problem than it used to be. We don't need closed-form solutions any more; we have enough CPU power at the robot to deal with this. You need some additional constraint, like "minimize jerk", when you have too many degrees of freedom. (A toy sketch of how the extra degrees of freedom get resolved follows at the end of this comment.)

- Simultaneous Localization and Mapping, SLAM for short: getting much better. Things which explore and return a map are fairly common now. LIDAR helps. So does having a heading gyro with a low drift rate. Available commercially on vacuum cleaners.

- Lost Robot Problem: hard, but in practical situations, markers of some kind, visual or RF, help.

- Object manipulation and haptic feedback: Look up the DARPA manipulation challenge. Getting a key into a lock is still a hard problem. It's embarrassing how bad this is. Part of the problem is that force-sensing wrists are still far too expensive for no good reason. I once built one out of a 6DOF mouse, which is just a spring-loaded thing with optical sensors. Something I was fooling around with before TechShop went down. I have a little robot arm with an end wrench and a force sensing wrist. The idea was to get it to put the wrench around the bolt head by feel. Nice problem because a 6DOF force sensor gives you all the info you can get while holding an end wrench.

- Depth estimation: LIDAR helps. The second Kinect was a boon to robotics. Low-cost 3D LIDAR units are still rare. Amusingly, depth from movement progressed because of the desire to make 3D movies from 2D movies. (Are there still 3D movies being made?)

- Position estimation of moving objects: the military spends a lot of time on this, but doesn't publish much. Behind the staring sensor of a modern all-aspect air to air missile is - something that solves that problem.

- Affordance discovery: Very little progress.

The real problem: solve any of these problems, make very little money. If you're qualified to work on any of these problems, you can go to Google, Apple, etc. and make a lot of money solving easier problems.
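Not the parent's code, but to make the "additional constraint when you have too many degrees of freedom" point concrete, here is a minimal numpy sketch of one standard trick: damped least-squares IK with a secondary objective pushed into the Jacobian nullspace. The jacobian() function and the rest posture here are hypothetical placeholders.

  import numpy as np

  def dls_ik_step(q, x_err, jacobian, damping=0.05, k_null=0.1, q_rest=None):
      # One damped-least-squares IK step for a redundant arm.
      # q: joint angles (n,); x_err: task-space error (m,), with m < n.
      # jacobian(q) -> (m, n) matrix is a hypothetical placeholder.
      J = jacobian(q)
      # Damped pseudo-inverse J^T (J J^T + lambda^2 I)^-1 avoids blowing up near singularities.
      JJt = J @ J.T
      J_pinv = J.T @ np.linalg.inv(JJt + damping**2 * np.eye(JJt.shape[0]))
      dq_task = J_pinv @ x_err                     # primary task: reduce end-effector error
      if q_rest is None:
          q_rest = np.zeros_like(q)
      # The secondary objective (stay near a rest posture) only acts in the nullspace of J,
      # i.e. it uses up the extra degrees of freedom without disturbing the primary task.
      N = np.eye(len(q)) - J_pinv @ J
      dq_null = N @ (k_null * (q_rest - q))
      return q + dq_task + dq_null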


"The real problem: solve any of these problems, make very little money" - Just curious why have you come to this conclusion ?

Object manipulation has potential products in dishwashing and vegetable chopping - sufficiently large markets, potential billion $ outcomes for a startup which takes the early mover lead. Two robotic hands that can work in co-ordination just as human hands do. Extremely difficult to solve, but money is there.


I'm not the OP, but the real issue in robotics is two-fold, and he mentions both.

1. Cost & technical limitations. As for cost, some of the solutions the OP suggests include "LIDAR, vision and sensors", and all of those things cost a lot by themselves. On top of that you need good actuators (harmonic drives etc.) for precision, and then a good computing unit to handle all of it. Now you also need more power supply, and if it is mobile it needs a battery. Now your robot has a price tag only Arab princes and well-funded academic labs are interested in. And we haven't even touched the cost of the try-and-fail iterative engineering process to make these work. In that process most companies/labs realize the problem/conditions have to be severely limited (run time, indoor/outdoor, general applicability vs. made for one and only one application).

2. Human resources. Really, there's not much money in robotics as a robotics engineer. A software robotics engineer can get better working conditions, security, salary & fulfillment in a software company. Mechanical, EE, embedded, etc. are in a similar situation. Most robotics people I know are in it for the passion. Application-specific development (which is the norm right now) also requires very niche knowledge that is hard to find.


The points you make are all true until they are not. Many useful robots should be able to work well enough for their purpose at a reasonable cost; part of that is that mass production brings prices down.

The problem, of course, is that until we can make it work at all, price isn't a consideration. You can go bankrupt advancing the state of the art just enough for someone else to take your work (repeat the bankruptcy part many times until someone works it out), until finally someone makes a ton of money with their useful robot.


Servos are mass produced but still expensive!

You're right, but also software is weird. The world is absolutely littered with good computers that other people have already paid for, just waiting to be put to use for your project to expand their capabilities. If every new web app required its own little pocket computer to run, that you had to convince your customers to buy and keep in their home, it would be much less profitable to develop web apps.

The world is not littered with idle, high-performance robots waiting around for your motion planning algorithm to kick them into action. That makes it a lot more expensive and a lot less profitable.


Hmmm, lawyers, they will be involved. Add astronomical insurance and liability costs when things do go wrong in less-constrained and less-controlled environments.


> Two robotic hands that can work in co-ordination just as human hands do. Extremely difficult to solve, but money is there.

You will find the most automated McDonald's in the world in Switzerland. McDonald's there has a lot more automation than it does in the US, both in the cooking and in the order taking.

Either this is because the automation is extremely difficult to do but somehow simpler in Switzerland, or it is because in most countries in the world humans are cheaper than the automated system, even after it has already been developed.

As someone who does research on robotics for a living: the problem is not the former, but the latter.


Can I surmise that most automation in dishwashing etc. is geared toward commercial enterprises? I find the same lacking in the household. Sure, you have the traditional dishwashers and vegetable choppers, but they are largely clumsy to use and still take quite a lot of effort.

For a consumer, I don't know if the equation 'human labor cost << robots' holds true. It is a hassle to get human labor and there is no scope for time arbitrage. A robot can do the dishwashing job at night for example. Unlike a traditional dishwasher, you don't have to load the utensils. Just leave the utensils in a sink and you are done.


> Two robotic hands that can work in co-ordination just as human hands do.

Rethink Robotics, a Rod Brooks startup, tried that, with "Baxter". Company went bust.

There's good, simple commercial hardware for high-volume vegetable chopping and dishwashing, and it can be found in most large commercial kitchens.

There's a huge amount of special-purpose machinery in the world. Most of it is pretty dumb. Newer machinery tends to have more sensors and a bit of self-adjustment. Vision systems have become part of much industrial machinery - they're cheap now. Mostly they're checking that things are aligned or that they look like a known good object.


They also had this obsession with low-bandwidth series elastic actuators, which make robots safer but slow and unproductive.

ABB's YuMi is perhaps a better example.


> Two robotic hands that can work in co-ordination just as human hands do. Extremely difficult to solve, but money is there.

Unless real human hands continue to cost less.


Human costs are complex. I used to work at a company that made tape robots. One customer did the math: hiring humans to change tapes was cheaper than buying a robot. Then he walked in one night and saw the kids, in hockey gear, take the needed tape off the shelf and slap-shot it to the "goalie", who put it in the drive. Which is why he was a customer, even though human labor was cheaper.


As someone who has worked his share of minimum-wage jobs, this seems obvious. Perhaps it's a case of academics in their ivory towers not having sufficient exposure to the real world? That kid hired for $8/hr doesn't care about the success of the business. Their concern ends when the shift changes.


Robotics and embedded just doesn't make that much money. I had to turn down a job offer due to the seriously low counteroffer I was given. Or rather, I was given no counteroffer, just told to pound sand after the market rate I gave apparently insulted them.


I think a big part of the issue here is that the robotic boom hasn't happened yet. There just isn't that much demand for robotics skills, and the market is not efficient.

If robotics and machine learning can progress to the point where household robotics or general-purpose robots become a thing, there will suddenly be an explosion in demand. Salaries for robotics experts will go through the roof, and all of a sudden, everyone will want to study robotics in school. This could take another 10-20 years to happen though!


If you count autonomous driving vehicles as robotics (I know it's a stretch), the funding (and therefore pay) story is a lot better, although you get the usual startup vs BigTech debate.

Source: I work in an ADV company.


The amount of money that's gone into that area without shipping a product is insane.


The potential upside for the company that gets it right is enormous. Billions of people are tired of wasting their time driving. Entire industries can be built on the technology if it works well.

That said, it's a problem that steers awfully close to needing full, real AI, and that's been a showstopper for loads of potential solutions for decades now.


My observation is that the industry ("we") as a whole is slowly but steadily making progress, without throwing up our hands and saying we need AGI. A big part of this comes from scaling up the test fleets, which now generate data with enough quantity and variety that you can leverage existing data tools to analyze it and give developers precise and actionable feedback. While Tesla might seem like a popular punching bag in the ADV world, they are a big practitioner of this paradigm and did get good value out of it. That, and the trend of replacing more and more hand-crafted rules with fuzzy numerical models designed to "blend together" the rules (thereby retaining explainability).


In some cases you need to invest as risk mitigation. If someone else does self-driving cars and proves that they are safer than human-driven cars by a large margin, you might find yourself out of the car business when governments mandate the technology and you can't afford the license.


I would have thought the money wouldn't be there for a different reason - human hands suck compared to tools, which is why we created tools in the first place. It is a bit of a silly trope to have industrial work done by humanoid robots using hand tools.

The ability to manipulate non-standardized sizes would be useful, but precision manufacturing ate most of that lunch long ago, which reduces it to more of a "last mile" task.


Well, the advantage I see with humanoid hands is that the same task that a human does can be done much faster by a humanoid hand. For instance chopping onions or other vegetables. Even dishwashing.


Kinda feels like that old life pro tip. Want to know the answer to something? Don’t ask a question, but make a false statement. Every expert in the field will rush to correct you and give you the most up to date and relevant information.


>Cunningham's Law states "the best way to get the right answer on the internet is not to ask a question; it's to post the wrong answer."[0]

Also I'd like to point out that you incorrectly stated it was an "old life pro tip" instead of an "internet law". Anyways, here is the correct information for you.

[0] https://meta.wikimedia.org/wiki/Cunningham%27s_Law


Thanks for setting me straight. :)


I don't really understand why depth estimation using binocular vision is still a problem.

I worked on this a bit a number of years ago and I thought I had scene matching working pretty well. The problem is I was trying to make it work without actually having two cameras (ie. on a smartphone where binocular cameras were not available at the time and for the most part still aren't). I was hoping to use the accelerometers and other sensors for dead reckoning the camera's position as the user swept it over a scene, but that turned out to be too hard.


Stereo vision isn't exactly "solved," but there are a ton of very good solutions out there both open source and commercial. The "Semi-Global Block Matching" algorithm implementation in the OpenCV library[1] is very good even though the algorithm is over 10 years old at this point. I've played around with the new Intel RealSense[2] units recently as well, which are stereo cameras with onboard processing and a nice IR pattern projector. Pretty cheap, and for the most part they "just work."

[1] https://docs.opencv.org/4.4.0/d2/d85/classcv_1_1StereoSGBM.h... [2] https://www.intelrealsense.com/stereo-depth
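For anyone who wants to try it, a minimal sketch of using the OpenCV SGBM implementation mentioned above. The parameter values are just illustrative, and left.png / right.png are assumed to be a rectified stereo pair.

  import cv2
  import numpy as np

  # Assumes a rectified stereo pair; the filenames are placeholders.
  left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
  right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

  stereo = cv2.StereoSGBM_create(
      minDisparity=0,
      numDisparities=128,      # search range, must be divisible by 16
      blockSize=5,
      P1=8 * 5 * 5,            # smoothness penalty for small disparity changes
      P2=32 * 5 * 5,           # smoothness penalty for large disparity changes
      uniquenessRatio=10,
      speckleWindowSize=100,
      speckleRange=2,
  )

  # compute() returns fixed-point disparities scaled by 16.
  disparity = stereo.compute(left, right).astype(np.float32) / 16.0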


Isn't your "that turned out to be too hard" one of the answers?

People tend to think about machine vision in perfect conditions. This is the problem, methinks. In practice, you will get a lens flare in the most important case, or lose a sensor, or get some rain.

Instead of designing solutions starting from the most hardcore edge cases (which are common for humans), robotics researchers tend to provide an MVP that works in the best-case situation. That is far from the end-user environment.

This needs global fixes, if you want robots working in the real world (I personally don't care either way).


> Isn't your "that turned out to be too hard" one of the answers?

That's a "we need an expert in filter theory" problem. I had four people crash and burn on that problem when we were building a DARPA Grand Challenge vehicle. Combining GPS, accelerometer, gyro, compass, and odometer data to get position is a hard problem. All those sensors are noisy, but in quite different ways. There are off the shelf solutions now, but there were not in 2004. We could not get below 3 degrees of heading noise, and had trouble keeping the sensor map aligned to the real world.


But what I said was that the "turned out to be too hard" part had nothing to do with binocular vision.

I'm certainly not claiming that I had a fully robust solution but I'm pretty sure lens flares wouldn't be too hard to filter out and there are lots of useful things that need doing in rain-free environments.


Do you think depth estimation could be done with two cameras plus computer vision (to find markers)? I think this is more or less what we do with our own eyes. Of course you would need much more processing power, but maybe for some applications the robot's brain doesn't need to be inside its body.


Parallax methods are widely used; I think that's how Tesla's driver-assist features work.

The problem is that humans don't just use binocular vision - we have a whole model of the world. For instance, if I see an object, I usually know roughly how big it is supposed to be, because I have a conception of that kind of object. I also know that the straight lines on both sides of an object belong to a wall, that the wall continues behind the object, and therefore that the object is in front of the wall and closer than it.

That means I'm not just working with my binocular vision, and it can be deceptive to think that because something works for human vision it will also work well for computer vision.


Lost binocular vision a number of years back.

You get around just fine without it. Got it back after some speciality glasses (PRISM).

Was a complete shock to see depth again. Didn't seem to help anything, getting it back. Mostly just trippy. Chairs were amazing to stare at.


From a much shorter and very different route: once, after an 18-hour straight Quake marathon, I looked around my room and was startled at how everything looked. I was highly impressed by the graphics and depth perception.


This reminds me of the 'Tetris effect' in which people who play Tetris for prolonged periods of time will begin to experience Tetris-like hallucinations when they stop. I've experienced it myself and have trouble explaining it, but it's as though everything you see becomes Tetris-like in some way. You look at your dinner and see ways to rearrange the peas so they 'fit' with the mashed potatoes, or something like that. There's more to it than that though, there's a real sensation of things being moving blocks that must be fitted together. It's bizarre.

But unlike Quake, Tetris isn't providing you with a 3D experience. That makes me wonder how much overlap there is between the two experiences.


I took an art class a few years ago. Negative space drawing and perspective drawing will blow your mind after a 3 hour session.

Walking outside you see lines and shadows everywhere. Entire perspective changes.


I’m curious to hear more. What caused you to lose and then regain depth perception? Were you able to see with both eyes or did you lose vision in one eye temporarily?


A mix of minor strokes and high intracranial pressure.

My vision would switch from one eye to the other roughly every 30 seconds, mostly seamlessly. Took ages to figure this out. Had some minor left-brain issues, so my writing would go gibberish every 30 seconds. Keeping a patch on my right eye meant my vision went black every 30 seconds, but my writing was fine when I could see.

Having the left eye patched meant my writing was mostly gibberish, and I still had the blackness every 30 seconds.

Going on blood thinners resolved the vision going black and gibberish problems.

Prism glasses fixed double vision problems.

Medication for cranial pressure removes the need for prism glasses for about 10 hours. I have several extremely different pairs of glasses depending on what my brain is currently doing.

Inhaled some caustic gas a few years ago. My blood went “sticky” from it. Apparently it activated a latent blood disorder.


This is the most interesting thing I've read today! Thank you for sharing!


Thanks! Took 3 years of hacking mind and body while mentally deficient to figure it all out. Learned a lot from it all. Kind of forgot a lot too, but got most of it back now.

Would take memory tests daily or more to see how “stupid” I was.

Graphing my own cognitive decline and eventual resurgence was “interesting”


Whoa, I’m really sorry to hear that. Your story is fascinating and reminiscent of Michael Gazzaniga and others’ work with split brain patients.


If prism glasses helped regain depth perception, then probably something went wrong with eye motor function, causing diplopia (a double image due to misalignment of the eyes). Could it be trauma to the eye socket? Vision loss in one eye, I think, can be ruled out.


Sixth cranial nerve issue. I can keep eyes aligned, but is extremely difficult and painful if cranial pressure is high.

Reducing pressure resolves the issue.

Both eyes are very healthy.

At worst it’s +24 diopters, -1 power.

When pressure is good it’s 0-2 diopters, +3 power.

For a while it was -6 diopters.

At this point I have glasses for much of the range.

Temple pain tells me when it’s time to switch glasses.

Avoiding caffeine and other stimulants helps.

Taking a Diamox will drop pressure in a couple of hours.


There is either a book or a great Joe Rogan podcast in your story. Thank you very much for sharing.


We can also go up and down abstraction levels at will.

We can see a forest, then a tree, then a leaf. All different "objects", yet related. The ability to not distinguish each blade of grass in my yard and just see it all as "grass" is very useful for noticing and not stepping on what the dog left behind.


Note that the human system fails for things like the moon, which looks vastly different in size at different times. This is proof of your point.


The answer is 'sometimes' and it's known as stereo vision.

Downsides include being less precise at longer distances (an object 1.5 meters away appears much larger if it gets 1 meter closer; an object 150 meters away barely changes) and poor performance on surfaces with few features, or with really dense features that all look alike (e.g. running across a field while spotting bumps and dips in the grass, or measuring whether sheets of steel are flat or not).

In some cases this doesn't matter - a Roomba doesn't care if it's hard to see things 150 meters away, as rooms are rarely that big.
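The "less precise at longer distances" part falls straight out of the geometry: depth is Z = f*B/d, so a fixed disparity error translates into a depth error that grows roughly with Z squared. Quick back-of-the-envelope numbers (the baseline, focal length and matching error below are made up but plausible):

  # Depth from disparity: Z = f * B / d  =>  depth error ~ Z^2 / (f * B) * disparity error
  f_px, baseline_m, disp_err_px = 700.0, 0.10, 0.5   # illustrative camera parameters

  for Z in (1.5, 15.0, 150.0):
      depth_err = Z**2 / (f_px * baseline_m) * disp_err_px
      print(f"Z = {Z:6.1f} m  ->  depth error ~ {depth_err:.2f} m")
  # ~2 cm of error at 1.5 m, ~1.6 m at 15 m, and ~160 m at 150 m -- i.e. useless at range.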


In the area of Motion Planning (my own area of research), the most that can be said is that practical solutions exist for a tiny subset of cases, workable methods exist for a larger subset, expensive methods exist for a still larger subset, and everything else might as well be impossible.

- If you've got a low-dimensional problem, say 2D or 3D, without uncertainty (or at least bounded enough to pad obstacles and ignore it), search-based planners like A* and its derivatives work. Add uncertainty, complex non-holonomic constraints, limited horizons, etc and it becomes much harder.

- If you've got a higher-dimensional problem, say a 6/7-DoF arm, even multi-arm or humanoid robots, and you don't have uncertainty and dynamics (or can ignore them), sampling-based planners like RRTs and PRMs and their derivatives will often work in practice (a toy 2D RRT sketch follows at the end of this comment). Actually useful guarantees of finding a solution, or of finding an optimal solution in useful time, are still very much unsolved.

- If your problem is basically open, and something approximating the "straight line" from A to B is in the same local minimum as a solution, trajectory optimization will work for a lot of problems. Motion planning is very much non-convex, though, so it's very easy to go from a problem solvable with optimization to one that isn't.

- Planning with non-rigid objects, significant uncertainty (effectively continuous MDPs or POMDPs), and/or complex dynamics are all very unsolved problems.

For motion planning problems in the gray area of "practically solvable", the art is figuring out how to simplify the problem as much as possible to make it tractable - highly optimized collision checking (generally speaking the performance bottleneck), combining search/sampling-based + optimization methods to get an initial solution from a global method and then refining it toward a local minimum with optimization, or using special hardware or sensors to bound dynamics and uncertainty so they can be ignored.
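For readers who haven't met them, here is a toy 2-D RRT to show what "sampling-based planner" means in practice. It is deliberately naive (no kd-tree for nearest-neighbour lookup, and it only collision-checks nodes rather than edges), and the obstacle scene is made up.

  import math, random

  OBSTACLES = [(0.5, 0.5, 0.2)]          # (cx, cy, radius) circles in a unit square
  STEP, GOAL_TOL, MAX_ITERS = 0.05, 0.05, 5000

  def collision_free(p):
      return all(math.hypot(p[0] - cx, p[1] - cy) > r for cx, cy, r in OBSTACLES)

  def steer(a, b):
      # Move from a toward b, but at most STEP far.
      d = math.hypot(b[0] - a[0], b[1] - a[1])
      if d <= STEP:
          return b
      return (a[0] + STEP * (b[0] - a[0]) / d, a[1] + STEP * (b[1] - a[1]) / d)

  def rrt(start, goal):
      nodes, parent = [start], {start: None}
      for _ in range(MAX_ITERS):
          # Sample a random point, with a small bias toward the goal.
          sample = goal if random.random() < 0.1 else (random.random(), random.random())
          nearest = min(nodes, key=lambda n: math.hypot(n[0] - sample[0], n[1] - sample[1]))
          new = steer(nearest, sample)
          if not collision_free(new):
              continue
          nodes.append(new)
          parent[new] = nearest
          if math.hypot(new[0] - goal[0], new[1] - goal[1]) < GOAL_TOL:
              path, n = [], new
              while n is not None:          # walk back up the tree to recover the path
                  path.append(n)
                  n = parent[n]
              return path[::-1]
      return None                           # no path found within the iteration budget

  path = rrt((0.1, 0.1), (0.9, 0.9))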


I discovered when doing something in this space recently that even the simple case of doing optimal planning for a particle under Newtonian dynamics, with limits on eg. velocity and acceleration, is NP-Complete.


Sorry for sidetracking your answer, but what does "convex" actually mean in the context of optimization? I remember looking at a book called Convex Optimization.

Your statement that

> Motion planning is very much non-convex

suggests to me that you are very much talking about the same thing. I understand convexity as a property of a shape. Why is convex good and non-convex bad in terms of optimization?

I don't want you to dumb down the answer too much, as I am a trained Mechanical Engineer, but my major isn't math. Hope you understand :)


Convex/ non-convex optimisation refers to the shape of the error function we're trying to optimise. In convex optimisation we can assume it's, well, convex:

  .               . ε
   \             /
    \           /
     \         /
      \       /
       '._ _.'

          ^
    Global optimum
In non-convex optimisation we can't make any assumption about the shape of the error function:

      ,--.            .--.                               ε
     /    \          /    \              ,--.
    /      \        /      \            /    \        /
  .'        \      /        \          /      \      /
             \    /          \        /        \    /
              `--'            \      /          `--'
               ^               \    /            ^
               |                `--'             |
               |                 ^               |
               |                 |               |
               |                 |               |
               `--------- Local optima ----------'

"Optimisation" means that we're trying to find an optimum of a function - a maximum or a minimum. We're usually interested in the minimum of a function, particularly a function that represents the error of an approximator on a set of training data. Generally we prefer to find a _global_ minimum of the error function because then we can expect the resulting approximator to generalise better to data that was not available during training.

If the error function has a convex shape we're basically guaranteed to find its global minimum. In non-convex optimisation, we're guaranteed to get stuck in local minima.

(Ok, the above is a bit tongue-in-cheek; there's no _guarantee_ of getting stuck in local minima, but it's very likely.)
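A concrete way to see the difference, with a couple of made-up one-dimensional functions (plain gradient descent standing in for whatever optimiser you actually use):

  # Gradient descent on a convex vs. a non-convex 1-D "error function".
  # f_convex(x)    = x^2              -> a single minimum at x = 0
  # f_nonconvex(x) = x^4 - 3x^2 + x   -> two local minima, only one of them global

  def descend(grad, x, lr=0.01, iters=5000):
      for _ in range(iters):
          x -= lr * grad(x)
      return x

  grad_convex = lambda x: 2 * x
  grad_nonconvex = lambda x: 4 * x**3 - 6 * x + 1

  print(descend(grad_convex, 5.0))        # ~0.0, regardless of where you start
  print(descend(grad_nonconvex, -2.0))    # ends up in the (global) minimum near x = -1.3
  print(descend(grad_nonconvex, +2.0))    # ends up stuck in the worse minimum near x = +1.1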


>Generally we prefer to find a _global_ minimum of the error function because then we can expect the resulting approximator to generalise better to data that was not available during training.

Sorry to nitpick, but is this true? We are doing optimization here and a global minimum is just a better solution than a non-global minimum. Is there a connection to generalisation here?


It's like cpgxiii says. You're right to nitpick though, because there are no certainties. We optimise on a set of data sampled from a distribution that is probably not the real distribution, so there's some amount of sampling error. Even if we find the global optimum on our sampled data, there's no reason why it's going to be close to the global optimum on our testing data.

But - there are some guarantees. Under PAC-Learning assumptions we can place an upper bound on the expected error as a function of the number of training examples and the size of the hypothesis space (the set of possible models). The maths is in a paper called Occam's Razor: https://www.sciencedirect.com/science/article/abs/pii/002001...

Unfortunately, PAC-Learning presupposes that the sampling distribution is the same as the real distribution, i.e. what I said above we can't know for sure.

In any case, I think most people would agree that a model that can reach the global minimum of training error on a large dataset has better chance to reach the global minimum of generalisation error (i.e. in the real world) than a model that gets stuck in local minima on the training data. Modulo assumptions.


In an ML context, the global optimum may often correspond to better performance and (hopefully) a more general solution.

In a motion planning or controls context, a local minimum can often mean a configuration or path that is infeasible due to collision, or wildly inefficient.


Beautiful ASCII graphs!


Think of numerical optimization (e.g. gradient descent) as if you're letting a marble run down a hill. Only for some hill shapes can you be sure that the marble will reach the bottom. For other shapes, it'll get stuck on a small flat or in a small hole. Those would be called zero-gradient or local-minima failures.

If your hill has a convex shape, then the marble will always roll to the bottom eventually.


It's generally understood that motion planning for nontrivial problems is very non-convex, and it's basically impossible to predict whether a given initial condition can be optimized to the global optimum. Even simple convex 3D geometry in the environment projects to very complex non-convex geometry in the configuration space of the robot (the space you're performing optimization in).


While that is true, the critical question is whether it will be locally smooth once you get close enough to your goal. Long-distance planning is generally bad, even for humans. Short-distance planning tends to work well with A-star


The problem is that "distance" here is not necessarily in any intuitive or useful space, and knowing if you are close to the goal may be as hard as finding the full solution. You can be quite "close" to a solution in, say, Euclidean distance over joint angles, while constraints like joint limits and collisions mean that the actual solution path will be quite long.


I apologize, I hadn't thought about robot arms. All the planning stuff that I was involved with so far was either quadcopters or slowfliers or buggies, so only cases where the Euclidean norm is useful.


This was so apt, and it makes so much sense - a bulb just went on in my head. Thanks so much for your simple and precise explanation.


> what actually does convex mean in the context of optimization

It means there's a deterministic solution.

Longer explanation:

When you're optimizing a function in math, you're searching over points in an N-dimensional space. Think of the function as a surface over that space: if that surface is convex and you want to find a point that minimizes the function, you can find the minimum every time, deterministically -- one way is to simply follow the negative gradient. A non-convex surface may have lots of dips and other shapes (e.g. "saddle points"), so if you follow the negative gradient you may end up thinking you're at a minimum when in fact you're only at a _local_ minimum, and there may be another, better solution out there.

So whenever you see "convex", read: the best solution can be found. Whenever you see "non-convex", read: an approximation is your best hope. Motion planning is non-convex.

From https://en.wikipedia.org/wiki/Convex_function: "Any local minimum of a convex function is also a global minimum. A strictly convex function will have at most one global minimum.[4]"
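For completeness, the textbook definition behind all of this (standard material, nothing specific to motion planning): f is convex iff

  f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)
  \qquad \text{for all } x, y \text{ and } \lambda \in [0, 1]

i.e. the function never rises above the straight chord between any two of its points, which is exactly why following the negative gradient can't get trapped short of the global minimum.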


Is obtaining optimal solutions a requirement in motion planning for robotics?


Generally speaking, no. But there are classes of problems (ex surgical robotics) where solution optimality (or at least near-optimality) is essential to having a usable and safe system.


In control, when you say "optimal" it usually just means that you solve your problem by defining a cost function which you try to minimize within constraints. For example, in motion planning your cost could be distance traveled + time spent + energy used. This cost function would be written as a function of the system state (position, velocity, etc.) and the control input (e.g. throttle). The control input that minimizes the cost function is what you're looking for.

"Optimal" shouldn't be equated with "the absolute best way", because it's always a question of definition.
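Written out, the kind of problem being described looks something like this (the weights w_d, w_t, w_e and the dynamics f are whatever you choose for your system; this is just the generic form, not any particular planner):

  \min_{u(\cdot)} \; J = \int_0^T \Big( w_d\,\|\dot{x}(t)\| + w_t + w_e\,\|u(t)\|^2 \Big)\,dt
  \quad \text{s.t.} \quad \dot{x} = f(x, u), \;\; x(t) \in \mathcal{X}_{\text{free}}, \;\; u(t) \in \mathcal{U}

"Optimal" then just means "minimises this particular J" - change the weights or the constraints and the "optimal" trajectory changes with them.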


But we don't need optimality or even analytic solutions for practical robot motion. Do we?


Of course we don't need full optimality for most problems; that's why RRTs and PRMs have proven useful on practical problems despite providing (RRTs especially) obviously suboptimal solutions. The problem is that really suboptimal solutions are bad (they take too long, require too much energy, look scary to humans, etc.), and we'd often like "OK-quality" solutions, which requires some sort of smoothing or optimization on top of an otherwise suboptimal solution.

There is a class of problems where optimality is super-important, where feasibility isn't a binary yes/no and you must minimize some objective. For example, in surgical robotics, you'd like to plan a path that minimizes deformation of tissue, and that requires some sort of optimality.

The perfect solution, of course, is something like A* that is complete and optimal, but that remains out of reach for higher-DoF problems.
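The "smoothing or optimization on top" step can be as simple as randomized shortcutting. A toy version below; collision_free_segment() is an assumed helper that checks the straight segment between two waypoints, which in a real planner is where most of the time goes.

  import random

  def shortcut(path, collision_free_segment, iters=200):
      # Randomized shortcutting of a piecewise-linear path (e.g. straight out of an RRT):
      # repeatedly pick two non-adjacent waypoints and, if the straight segment between
      # them is collision-free, throw away everything in between.
      path = list(path)
      for _ in range(iters):
          if len(path) < 3:
              break
          i, j = sorted(random.sample(range(len(path)), 2))
          if j - i < 2:
              continue
          if collision_free_segment(path[i], path[j]):
              path = path[:i + 1] + path[j:]
      return path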


Thanks for the exegesis!


Robotics founder here.

Popular conceptions of "robots" are unrealistically general.

In industry, we do not build robots, we build automation systems.

Given the choice, would you prefer an automation system with some environmental assumptions, high speed and perfect repeatability (i.e. the entire industrial automation world), or one with no environmental assumptions, crushingly high cost, slow speed and poor reliability (e.g. a walking robot that accesses your fridge because it's cute or novel)?

It seems both the lay population and academia assume the latter, but industry demands the former.

This is perhaps why there are so many "open problems" - because they're not actually real world problems: they're just academic nice-to-haves.


Manufacturing company founder here - you are exactly right.

I buy automation equipment.

What I'm looking for is stuff that is relatively easy for my team to program (I do not have the money for tons of engineers to sit around programming arcane stuff), can attain reasonably good speed, and has the reach and weight capacity I need. And I want economical cost with reasonably good reliability.

Two-handed human-like arms don't interest me. Robots with human-like eyes don't interest me. Safety systems that are too sophisticated/cute/fancy aren't that interesting to me - usually cages or more recently torque sensitive movement is good enough to keep people safe around the robot.

Often the trickiest part is the design of the end of arm tool, not the robot, and I feel this is where most innovation could happen that would benefit my operations.

By the way, the above is why Rethink Robotics failed and Universal Robots (which got way less press coverage, raised less money, and had a non-celebrity founder) was wildly successful. Universal built a cobot with a single arm that was fast (enough) and could carry enough weight. It didn't look human at all. Rethink had an LCD panel with eyes on it, two arms, and a sonar person-presence detector, and its animated eyes would look at nearby people to indicate it was aware of them. But it was slow and not as accurate. Universal Robots made the product manufacturers actually wanted.


Thinking about car-painting robots, it's clearly about cost.

You can program a car-painting robot in a few hours. This will be way cheaper than a robot that is aware of the car and knows how it should move to perform the best paint job.

Another example is your washing machine. It would be nice to have a washing robot that would sort your clothes and wash them. But it is way cheaper to sort the clothes yourself. So the clothes-sorting problem might be an open problem, but it is indeed a nice-to-have.


I wouldn’t say clothes-sorting is just a nice-to-have - imagine you could get a third machine that sat next to your washer and dryer, and it folded clothes for you. I bet a lot of people would buy one, if it cost a similar amount to a washer or dryer. Lots of money in that hypothetical product. The problem is that our technology isn’t good enough to build it.


A robot that is aware of the car is better in the long term, because you don't have to spend hours programming for each car and paint scheme. However, the problem is hard enough that right now we spend the hours, because that works, and industrial automation has enough advantages that it is worth doing anyway.


That robot may be more valuable in a company that makes custom items like toys or signage. For car manufacturing, where they are making hundreds of thousands of cars a year with the same exact shape, a few hours of setup isn't too bad!


A few hours of setup adds up, which is why cars are a single color. If the robot had some artistic sense it could put stripes on the car that artfully flow with the lines of the car. There's no reason it can't be done today, but it isn't, because that means more hours.

I believe most car parts are painted on a "paint everything and don't worry about overspray" basis, which brings costs down. That only works if you have a single color.


Another open problem: taking machines apart for recycling.


This is a problem I've been passively thinking about and would LOVE to tackle full time.

Any VCs out there have piles of cash they want throw my way?


> Any VCs out there have piles of cash they want throw my way?

I think maybe you need to refine your slide deck a little


Well, most elements on this list would allow us to move beyond needing so many assumptions. Lots of cost could be saved if I didn't need to perfectly measure all my work offsets or didn't have to spend lots of time designing my workspace to avoid singularities. It would also allow us to move from relatively narrow-purpose machines ("move this end effector to this position within +/-0.01mm") to much more general jobs, hence opening more to automation.


Thank you for putting what I have always thought into a clear paragraph!

Back when I was working on a consumer robotics system, that issue (what can we expect a random consumer to do to their home/yard to simplify the robot's job) was a constant tension with marketing. Because obviously they want a product that a customer just drops into their yard and that does everything, including weeding and picking up dog poop, with no setup.

However, if we can help the robot by modifying the environment a little, it makes a huge improvement to reliability and cost. I would love to work on robots again, it is a fun problem to try and work out!


As an aside, this reminds me of the AI vs AGI debate (which I admit I indulge in, being a former academic researcher in good old-fashioned AI) - almost all of the focus in modern "AI" appears to be on solving specific tasks - which is completely understandable.


It intrigues me that most people do not consider a refrigerator to be a robot. Same for a backhoe or a blender.

The distinction between "machine" and "robot" is interesting. A traditional tractor is not a robot, but a driverless tractor is?


Could it be that a refrigerator is a finite-state machine and a robot is something higher in the hierarchy?

https://en.wikipedia.org/wiki/Finite-state_machine

https://en.wikipedia.org/wiki/Automata_theory



Depends what you're after, I guess. If all you want is a 'slave' that does some repeatable action with specified accuracy, then an automation system is just the thing. If you're after exploring the questions around what it means to be an autonomous, sentient being, then perhaps not so much and you really want to build a robot.


The money (and hence most of the engineering time) is in making people's lives and jobs easier, not in exploring deep philosophical questions.


Given the choice, would you prefer an automation system with some environmental assumptions, high speed and perfect repeatability, or less environmental assumptions, slightly higher cost, slower speed and poorer reliability?


Depends on what it does. A robot that drives my kids to school has different constraints from a robot that cleans my kitchen after I'm done. The latter can be made safe by putting a locked door on the kitchen (which in turn means open floor plans are out). But the kitchen robot is allowed to take hours (not days) to do what I can handle in a few minutes. The get-my-kid-to-school robot needs to have speed.


Industrial automation is boring. Sure micron level accuracy sounds cool but it's so expensive it is only accessible for industrial automation. Only a tiny percentage of the population will ever be around those "robots".

I don't understand what you mean by crushingly high cost. You've saved most of the cost by reducing tolerance requirements such as zero backlash which will cost thousands of dollars for just a single gearbox.


There are happy middle grounds which are neither the rote repetition of industrial production nor the impossibly unstructured environments a theoretical butler robot would have to work in. I work on piece picking and, with a succession of unknown objects to pick up and place, the problem is both interesting and achievable.


Almost all picking is done with pneumatic suction because it's highly reliable, powerful, relatively safe and easily adaptable. Furthermore, the standard silicone nozzles on the market permit varying amounts of safety margin due to their squishiness. You can solve almost every real-world picking problem without more custom hardware than perhaps a mounting plate, and scale instantly to mass-produced parallelism. This is why the industry has standardized on this approach... it's practical, cheap, fast and quickly adaptable.

The other 1% of items is a problem only in the case that the thing to be picked is either particularly complicated and irregular in shape, or has some delicate physical property. The annoying case, which seems to be what you are working on, is large numbers of unknown items.

If you start from the recognition that no system will work with arbitrary items (the "best case" generalized picking system will still necessarily be limited by the maximum size of object it can pick, the reach of the picking/placing motion, the weight it can support, and its accuracy/repeatability), this becomes a difficult field to justify research in. An algorithm aiming to generalize in this space can probably find a way to use the standard pneumatic approach against, say, the largest exposed flat surface of an unseen item after collecting a point cloud, or grasp using opposing surfaces, or enclose convex faces with an alternative gripping mechanism - but none of these will ever make the system general in an absolute sense. Sadly, with this increased complexity come diminishing returns, increased cost (sensors, actuators, computation, latency, etc.) and, counter-intuitively (unless a major paradigm shift is obtained), therefore also a diminishing market.

Logically, then, the process governing new-item ingress for whatever operations you are automating is likely a better place to spend time than chasing after a non-existent magic algorithm / gripper design that solves arbitrary cases. For example: "mandate square boxes" elegantly solves the problem, as per international shipping. You don't actually need enclosed boxes - stackable trays work fine too.

About the only case where random item picking makes sense outside of garbage sorting is very large-scale general warehousing for hugely dissimilar items, the sort of problem only Amazon has. For almost anyone else, sacrificing storage-space efficiency with boxes, or segmenting into multiple specialised systems, is likely the better tradeoff.


We've generally found that the situation is a lot more complicated than that, especially on the warehouse integration side.


Can you fill us in a bit more on what you mean by picking-specific warehouse integration challenges?


Basically a customer isn't going to be willing to make large changes in their process when they're demoing your system.


With specialized automation you get a lock-in effect, because the systems are still expensive enough that once you buy them you have to use them.


Well, that's what research is about: to solve the said problems more reliably, faster and in the end at lower cost.


One problem I encountered while working on robotics is that many commonly used algorithms yield approximate solutions with no error bounds. They work 99.99% of the time. This is fine from a computer science or math point of view, but very scary from an engineering perspective, specifically when there are humans nearby. A big part of me struggles to accept the suitability of algorithms coming from gaming engines or machine learning etc. for real-world heavy-duty robots. The lack of rigour in the field is astonishing.


This is the scariest part of using machine learning as an engineer on any practical application as well.

Without an error bound, ML can’t be in charge of anything that could put human lives at risk.

This is also why I don't understand all the hype about FSD / L5 autonomous driving. We don't even know yet whether such error bounds exist, so we don't know whether machine learning is even the right tool for FSD. All certification entities for control systems that put human lives at risk in aviation, automotive, etc. require those error bounds. So it doesn't really matter if Tesla comes up with a "maybe L5" system; without the right error bounds, their cars won't be certified as L5 and drivers will need to keep their hands on the steering wheel.


What do you think the error bounds are for a human?

I know it sounds like a flippant question, but for certain applications, if we can get a model that's better than human, then it doesn't need to be perfect.

And the way we currently handle this in all sorts of domains is to pair a human with a computer so that each does what they're best at. It doesn't have to be about full automation.


A human is like ancient software that has been around forever and somewhat works fine, so everyone is used to it.

(the difference from an actual software is that humans are based on some crazy nanotech from the future that nobody can completely control)


> humans are based on some crazy nanotech from the future that nobody can completely control

Excellent summary :D


This is another one of those "extreme tail risk" scenarios, like climate change and GMOs, that people have wildly different and contradictory reactions to.

Sure, the "legacy" intelligence / climate / food could also have extreme tail risks; it's just that it's been tested for hundreds of millennia... whereas new technology might be better in the average case or even at the 99th percentile, but the 1% (or 0.0001%) tail is unknown and potentially much worse.

However, it seems to me that people resolve this more along ideological / political lines than with any kind of rational reasoning.


> "extreme tail risk" scenarios, like climate change

Climate change isn't a "tail risk". It is a hard wall our civilization is approaching fast. If we do not solve it, it will undo the conditions we depend on to live.


What about "practical error bounds", i.e. testing the system through millions/billions of miles driven?


Empirical evidence has shown that all Tesla cars have the same failure mode when a stationary obstacle is on the highway. This is not some fringe failure that happens because of wrong classification, it's how the Tesla Autopilot was designed to function. Tesla has claimed explicitly that it's the driver's responsibility to avoid such a situation.


Tesla is not a good representative of the field; they are waaaay behind Waymo (Google) and Cruise (GM).


Miles driven is a useless metric. Stick your vehicle on a treadmill and have it drive a billion miles. What does that tell you?

Edit to add: a better metric would be something like "billions of decisions made where human life was at stake".


It's pretty clear that they meant "distance driven in ordinary conditions where humans normally drive cars".


That isn't the case though. GM and Google both tell you that their incidents per mile (I think they use 100,000 miles or something) are constant, because as they get better they test in harder situations. An automated car on a desert freeway (no traffic) is easy compared to freezing rain during the afternoon rush hour in Minneapolis (afternoon implying most people are already out, so the decision to stay home isn't available). That is just one hard situation that comes to mind; I think those doing self-driving cars know of others.


This is how some hardware is already certified for use in safety critical systems. ISO 26262 calls it 'proven in use'.


> testing the system through millions/billions of miles driven?

Makes sense doesn't it?

Well, actually no. The Google cars drive the same route every day in Mountain View with no deviation, so those millions of miles are really the same 10 miles over and over. I've even seen 3 in a row behind each other.

Fools the regulators, and apparently you, though.


Deaths are often measured in deaths per 100,000,000 miles driven, with a current measurement of 1.18 deaths per 100 million miles[1]. If a car drives at 60 mph 24/7, it takes about 190 years for it to drive 100,000,000 miles. Let's assume that, in a decade, 1% of cars are on autopilot. There are 37,000 deaths per year in the USA due to car accidents[1]. Therefore, in order for Tesla/Google/GM to claim that their systems are equivalent to normal cars, they have to drive 370 * (100 million / 1.18) ~= 31,300,000,000 miles, with only 370 deaths. If $COMPANY really had, say, 400 deaths, then it would have to make up the additional miles. Specifically, it would have to make up 30 * (100 million / 1.18) ~= 2,500,000,000 more. That means a fleet of (190 * 25) = 4,750 cars driving 24/7 for an entire year. That's not impossible for $COMPANY to do, but it is starting to become unreasonable. What's more likely is that $COMPANY just straight up lies about its incidence rate to regulators. But that's hard to do, considering it's going to be public if someone gets hit by a car.

I don't think it's possible to fool the regulators, but even if it were possible there are dramatic consequences for lying. Volkswagen tricked regulators into thinking their cars didn't emit as much as they really did in 2007-2015, and ended up having to pay 33.3 billion dollars in fines and buybacks[2]. Volkswagen stock went from a high of around 27 to 12. Five years later, it's only recovered to around 16. The lawsuit that would result if Tesla's (or another automated car maker's) cars were found to be unsafe would likely be even larger. Furthermore, it would crush the self-driving car business semi-permanently. VW at least still gets to make environmentally safe cars; Tesla and Waymo's main value proposition would be dead in the water.

Basically, even though there are incentives to lie and cheat, I think the incentives to try to make something that is safe are far, far greater. And while I'm sure there are Google cars driving the same route every day, I'm sure there are also tests being done in all sorts of conditions all over the planet. I think that practically, socially, and politically speaking, we can use billions of miles driven. The stakes are way too high to lie.

[1] https://en.wikipedia.org/wiki/List_of_self-driving_car_fatal... [2] https://en.wikipedia.org/wiki/Volkswagen_emissions_scandal
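A quick sanity check of the arithmetic above, under the same assumptions (1.18 deaths per 100 million miles, 37,000 US deaths per year, 1% of cars automated, 60 mph around the clock):

  deaths_per_100m_miles = 1.18
  us_deaths_per_year = 37_000
  autopilot_share = 0.01
  mph, hours_per_year = 60, 24 * 365

  years_per_100m_miles = 100e6 / (mph * hours_per_year)             # ~190 years per car
  deaths_budget = us_deaths_per_year * autopilot_share              # 370 deaths
  miles_needed = deaths_budget * 100e6 / deaths_per_100m_miles      # ~31.4 billion miles
  print(f"{years_per_100m_miles:.0f} years per car, {miles_needed / 1e9:.1f}B miles needed")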


> Without an error bound, ML can’t be in charge of anything that could put human lives at risk

Humans don't have "error bounds" either, and you trust them just fine.


Two counterpoints here:

1) Humans can estimate their own uncertainty. Ask a person to show how long a meter is, and they'll give you an estimate. Then ask them to show you the "error bounds", i.e. what they're "quite certain" the meter is longer than and shorter than. You are likely to get sensible bounds. Now, humans aren't amazing at this, but the brain does have capacity for estimating how uncertain it is.

2) No, you don't really trust humans. The very fact that humans are often imperfect in estimating their own uncertainty makes us very stupid sometimes. How many times have you been sure you knew something for a fact, only for it to turn out to be completely false? This is why society tries not to put too much responsibility into the hands of a single person, or at least to provide help and/or safety mechanisms when it does.


Ask a programmer to estimate how long it'll take them to code something to see if we really have error bars on complex functions. The margin will be so wide as to be completely useless.

We do have error bars on simple measurements already. They're right there in the data sheet for the sensor. What you're asking for are error bars on things several levels distant in the layers of abstraction. Humans suck at that. We only cope the same way machines do: through constant negative feedback.


Programmers can give you good error bounds. Ask what the best and worst case is for solving a problem, and 80% of the time they will be within those bounds.

The problem is that management doesn't want to hear about the worst case, because it looks bad for them politically.


Humans are capable of generating and understanding creativity and complexity that are simply impossible for non-AGI automation. Even then, we don't just let people figure things out for themselves. We put them through training, and then test them. Even after that, we make them liable for negligence.

I don't think it's an obvious conclusion that error bounds aren't important for automation because they aren't calculable for a human. They are just very different beasts.


My point is not that it's not important. If someone comes up with a rigorous way to obtain error bars, that's great! I'll take it! My point is that trust should not and will not depend on it. How do you even quantify something like this to a layperson in order to persuade them?

Let's do a thought experiment: let's say, we had a self-driving car that's verifiably 10x better than human on average, yet does not provide "error bars". I know we don't have one now, and unlikely to have one in the foreseeable future, but bear with me here, for the sake of argument.

Would you trust it, rather than a random human Uber driver?

FWIW, I'm amazed that driving cars manually is perceived as normal every time I drive one. I can easily accelerate 2+ metric tons of metal to 100+ MPH, and get distracted, launching this deadly projectile with me inside into oncoming traffic. Most roads have _no dividers_. This does happen from time to time, lots of people die. Nobody gives a shit.

Humans suck so bad at so many things that robots will be better than them at a lot of fairly unconstrained tasks in the next decade or two. And I'm pretty certain they won't have error bars while doing what they do. Humans don't.


The other poster was talking about error bounds, not error bars. That means they want a number that tells them how often a system fails, and ideally a model of how and when it fails. So they want a system that comes with some guarantees: it works X% of the time and fails (100-X)% of the time.

Note that X doesn't need to be high, or higher than a human, for example a self-driving car with a 60% failure rate is fine as long as we know when and how it fails. It's this "when and how" that's completely missing from self driving cars.

And in machine learning in general we just don't have any guarantees that a model trained on some dataset will perform the same way in the real world.


I think a lot of the nuance has been lost here. Even throwing around phrases like "10x better than a human" is already implying a lot of very vague measurement or knowledge. In what ways, and in what environments, is it better?

I've dipped my toes into robotics a few times, and every time I end up being reminded of just how painful it is. Even when you're working in an idealized simulator, it's extremely easy to find a little edge case that causes completely bizarre behavior. And moving out of the simulator only makes things far, far worse.

It's really easy to forget about those sorts of details and brush it away as just things to be solved while developing the automation. But they don't just go away so easily. Error bounds are there to help manage these issues and ensure we know how to best use the automation.

In many instances I think people would tend to prefer the Uber driver in a moderate reading of your scenario. A human driver is likely to perform somewhat consistently and predictably. If they drift around the corner and leave a long skid mark in front of your house you can make some assumptions about how they are going to drive. If you get in and see them struggling to keep their eyes open, you can again make some assumptions about their performance. A well-rested and safe driver is extremely unlikely to suddenly throw themselves into on-coming traffic with no warning. You can refuse or stop the ride if you judge you are not safe.

Automation is a different beast. It's liable to fail in ways that a human driver would not. It may be performing wonderfully until something a human would not even notice happens, at which point it may indeed throw you into oncoming traffic. For example, look at adversarial examples in deep learning. As a passenger in this case you don't have a way to judge your own moment-to-moment safety. Even if it is safer on average, the sheer unpredictability and the resulting stress are likely going to eat significantly into any gains.


A human whose life is at stake controlling a car here at this moment vs a human who is far in space and time controlling someone else's car by programming it.


Humans have a mental model of humans, they can predict what another human is going to do with reasonable accuracy. Robots are wholly artificial so there is no existing mental model we can apply to them to understand why and how they act.


You have no mental model of the crazy person who drives his car into oncoming traffic.


Sure, and that's why we try not to let people with altered states of consciousness or with unpredictable rationality (children) drive.


>> Humans don't have "error bounds" either, and you trust them just fine.

Yes, but humans are already here and doing all those dangerous and difficult things. But if we're going to replace them with something, it makes sense to want to replace them with something that's better than them. I mean us.

Otherwise, we might as well stick with the humans. Especially since we already know how to make humans (whereas self-driving cars, not so much).


I personally do not trust humans.

However, we do have inertia with humans running such situations, and until there is something provably/demonstrably better, I don't see the current situation changing.


Trust is a continuum. I don't see how you can say that you do not trust humans at least to some extent, no matter how tiny that sliver of trust is.


A good point. I do of course trust humans to an extent, but as a general rule I feel that it's been beaten into my head by enough experiences to expect for humans to mess something up.

This includes myself of course.


I find the inverse surprising: that many algorithms that work on real-life robots _do_ _provide_ error bounds and their optimality / convergence properties are proven in the papers that introduce them.

A great example of this is motion planning, where papers both on sample-based methods (such as SST), and on search based (descendants of the A* family) argue at length the theoretical optimality and convergence properties.
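For anyone outside the field, the flavour of those guarantees is concrete. For instance, with an admissible heuristic (one that never overestimates the remaining cost), A* provably returns a cost-optimal path. A minimal grid-world sketch, with an invented layout:

    # Minimal A* on a 4-connected grid; Manhattan distance is admissible here,
    # so the returned path is guaranteed to be shortest.
    import heapq, itertools

    def astar(grid, start, goal):
        # grid: list of strings, '#' = obstacle. Returns a list of (row, col) or None.
        rows, cols = len(grid), len(grid[0])
        h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
        tie = itertools.count()                      # keeps heap entries comparable
        open_set = [(h(start), next(tie), 0, start, None)]
        came_from, g_best = {}, {start: 0}
        while open_set:
            _, _, g, node, parent = heapq.heappop(open_set)
            if node in came_from:                    # already expanded at equal or better cost
                continue
            came_from[node] = parent
            if node == goal:                         # reconstruct the (provably shortest) path
                path = []
                while node is not None:
                    path.append(node)
                    node = came_from[node]
                return path[::-1]
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nxt = (node[0] + dr, node[1] + dc)
                if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] != '#':
                    ng = g + 1
                    if ng < g_best.get(nxt, float('inf')):
                        g_best[nxt] = ng
                        heapq.heappush(open_set, (ng + h(nxt), next(tie), ng, nxt, node))
        return None

    print(astar(["....",
                 ".##.",
                 "...."], (0, 0), (2, 3)))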

On another note, I think requiring more theoretical analysis as a guarantee of safety could partially be an AI-winter meme rather than a practical solution. Case in point: do people run a quick check of the aerodynamics maths before boarding a flight? No - they rely mostly on the engineering and regulatory process that gradually made passenger flights safer.


It seems you are talking about theoretical error bounds, that is, proofs in papers with assumptions on input probabilities etc. These don't always apply to actual implementations in physical, real robots. There is a huge gap in safety practices between aerospace engineering and robotics.


Many localisation algorithms are probabilistic, eg. 'you are here within R of X'. A couple of times I've had a manager who could just not accept the probabilistic nature: "It's just right there!"

So I had to explain to my manager that we just cannot do better and know for sure that the robot is really in position X, especially with the limited sensing the project would afford. Sure, you can do the classic AGV thing and add magnetic markers everywhere, or use more sensors to get higher accuracy, but none of those were popular options.
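For concreteness, the "within R of X" statement is just the filter's covariance restated; a minimal sketch with invented numbers:

    # Turn a 2-D position estimate (mean + covariance) into "within R of X with
    # ~95% confidence". Covariance values are invented for illustration.
    import numpy as np

    mean = np.array([2.31, 4.70])            # estimated position X, in metres
    cov = np.array([[0.04, 0.01],
                    [0.01, 0.09]])           # position covariance from the filter

    # For a 2-D Gaussian, 95% of the mass lies within Mahalanobis radius
    # sqrt(chi2(0.95, dof=2)) ~= 2.45. A conservative circular bound uses the
    # largest eigenvalue of the covariance.
    chi2_95_2dof = 5.991
    worst_std = np.sqrt(np.max(np.linalg.eigvalsh(cov)))
    R = np.sqrt(chi2_95_2dof) * worst_std

    print(f"within {R:.2f} m of {mean} with ~95% confidence")   # ~0.74 m here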


Why bother, you'll hit the uncertainty principle sooner or later anyway.


The solution generally employed is to separate humans and robots if the robots are moving at speeds or momentums that could harm a human. There are safety regulations to be complied with and the general idea is that your robot should not be able to harm a human no matter what error your software suffers from. This is sadly impossible with self driving cars but applies to most of robotics.


Afaik there is some progress being made on the motion planning and control side with regard to that, mostly using reachability analysis for continuous systems or things like control barrier functions.

I work in the perception/localization domain and I am not aware of any large developments in that direction. I do know that there are certain ML based perception systems that got some levels of ASIL certification.


Purely in the space of geometry and simple motion planning, there are reasonable methods with guaranteed correct solutions, provided their sensor data is correct. Of course, combining correctness (the solution is actually safe), completeness (a solution will be found if it exists), and bounded computation time (get a solution in actually useful time) is a much harder combination to achieve.


Though, probably humans fumble or bump things >1/10k times. I certainly do, anyways.


Also, replace deer with something self driving. Those things are so stupid when near the road.


99.99% is pretty fine from the engineering point of view. The buildings and other constructions surrounding us have about the same theoretical reliability, considering all the uncertainties involved: weather, loads and impacts, long-term characteristics, material and manufacturing variation. Of course, this centuries-long, trial-and-error-backed and fairly simple science of construction engineering needs to be supported by law-level standards and regulations that allow a certain low level of uncertainty; otherwise the 'engineers want to sleep at night' aspect would force us to build only very expensive bunkers to live in (or only sociopaths would become engineers). There is an accepted level of risk involved in engineering.

Mathematics and (proper) algorithms are usually the domain where everything is 100% (assuming good-quality, actually finished work, not an early prototype released as final). It just may or may not be fully relevant to our lives.


> The buildings and other constructions surrounds us have about the same theoretical reliability

You are off by a few nines.

Transportation machines are expected to have 5 or 6 of them. And those are the most dangerous kind we keep around. Everything else is more reliable.


No, I am not. It is about 99.998%. Please do not cite numbers from different fields.


By that number you mean that 2 in 10000 buildings fail at some point due to some design issue? Because building reliability is a very hard concept to define, even more in the context of machine reliability (we are discussing robots here).

Anyway, machine reliability is calculated over usage, not lifetime. It gets much larger numbers.


Everything listed in this article is framed as a software problem, but the beauty of robotics is that many problems can fall to either software, hardware, or electrical solutions.

Without further ado, a very incomplete list of hardware and electrical innovations that would push robotics forward.

- Cheaper and smaller low-backlash actuators. Motors and associated gearboxes are big, heavy, and expensive, which is a big part of why our robots have singularities in their designs, making the motion planning problem more difficult.

- Actuators with good force-speed curves. Muscles have both great torque at zero speed and great speed at zero torque. (weight lifting and throwing a baseball). Only hydraulics come close to matching both numbers, and they're heavy, expensive, and tend to leak oil on the carpet.

- Cheaper force/torque sensors. Most robots today don't even have torque sensing on all their actuators, let alone the sort of dense 6dof-sensing full-body surfaces that animal skin provides.

- Across-the-board robustness improvements. Robots more mechanically complicated than a quadcopter tend to break a lot. This makes approaches like training ML models directly on hardware difficult.

- Reliably low-latency wireless. There's been a lot of hype about cloud robotics, and there's a lot of potential in offboard sensing, but we need a cheap communication system that can deliver <30ms latency for those systems to work reliably. WiFi is great until you pass through a metal doorway and the signal degrades for 200ms.

- Cheap and light lidars: true depth sensing with long range that works outside means a lot of hard computer vision problems get easier, and means your robot can be smaller since it needs fewer cameras

The thing I love about working in robotics is that we don't need to solve all the software problems and all the hardware problems to make a system that works well in the real world. We get to pick and choose which is easier, and often the solutions in software space depend intimately on the kind of substrate they need to run on.


I work in robotics. The problem for pretty much everything is one of scale and commercialization. Every one of these issues has been addressed, but almost none have a good and bundled solution.

It feels like computers in the 70s. All the pieces are there, but there aren't mass-produced PCs yet, partly because the suppliers aren't there to make things easier.

This is changing fast.

For example, SLAM has many things that work, but they require fine tuning, and most contain undocumented features that require reading the source to find. Slamcore is a company working on this, I hope there are more.

Teleportation was "possible but hard", but Freedom Robotics has a good solution now that mostly "just works"

Robotic bases work well, but they will ship them to you without things plugged in so you have to find the issue and fix it yourself. AWS Deepracer is clearly a prototype for a solid wheeled base. The documentation and build quality is an order of magnitude higher than anything else in the space. My guess is they launch it as a useful base in the next year or so.

Depth estimation is pretty good with Intel Realsense now, and the new OpenCV OAK is another attempt here. I think this is more solved and packaged than anything else.

ROS2 was only properly released in June 2020, and it is a huge step up from ROS1 for commercial applications. It probably needs another 1-2 years for the community to finalize supporting it.

I think for founders finding known robotic solutions and making them into robust commercial products is a great space to work in.

The next few years are very interesting in this space, as even 1 year ago everything was much harder and it's rapidly getting easier.


> Teleportation was "possible but hard", but Freedom Robotics has a good solution now that mostly "just works"

For somebody not in robotics, what does "teleportation" mean in this context. I assume that it doesn't mean the Star Trek style beaming that comes up from a Google search.


I think OP meant "teleoperation" - controlling a robot from afar


*yes, teleoperation (autocorrect).


What do you think about RL approaches? Just wondering if that stuff may work in practice. If I'm not wrong, Sergey Levine said that the only problem is sample efficiency. So you would need to simulate the real world, or let your robot break a million dishes until it learns.


Not OP, but where I work (Fox Robotics) we leave RL to demos and researchers. It is very promising, but less understood and less proven, so we aren't in a hurry to design any products around it.


I love the ever prevalent pessimism/cautious optimism in the field of robotics (industry and academics alike). It's a fresh breath of air from the ever over-hyping ML/AI field.

On a related note, an open problem I see in practice is also: how do you manage a robotics company effectively? iRobot seems to be succeeding at this, and so do some industrial robot arm companies, but the latter are more about industrial automation than the more general "robotics" company.

Companies that go for very broad, general solutions seem to be struggling, and application-specific robotics companies also seem to fail more often than they succeed.

There have emerged a lot of management methods and theories around software development (Agile etc), but what's the efficient management method for robotics?

Having worked at a few robotics companies as a junior, it has always been either: 1. someone with extensive research/engineering experience in one subfield of robotics in management, who can't manage the other subfields, or 2. someone whose knowledge is too general and who can't balance the needs of each robotics subfield, including production, reliability, and cost.

Both seem to do pretty badly, with the second one slightly preferable.


The pile of problems in Robotics reminds me of the challenges faced by computer vision before modern methods were developed. To generalize the issue: humans learn and perform vision and navigational tasks "below" the level of language.

To use a vivid example, you can use language to teach a child how to hold a pencil and write, how to recognize digits, and how to do two-digit addition. But there's some cognitive ability "below" the level of language that a child needs in order to take the hints from language and develop the desired skills. In computer vision, we now have something like this in ConvLayers.

In today's robotics we see the hyper-systematization and mathematization of natural concepts like getting an agent to recognize its own bearing. This is what I refer to as "at the level of language", and I think as long as we try to solve these individual navigation problems from mathematical first principles we'll never arrive anywhere near the performance of animal "instinct". We need a new framework that doesn't look like an enumeration of commands within an imperative program.


That's a very interesting perspective. There have been many articles complaining about the "hype" behind ML, but I too wonder if DNNs could assist with controls, especially when it comes to reacting to sensor data. After all, it's just matrix math.


Ugh. This is making a hash of it.

What is true: The understanding of laymen about what robots can do is very much out of touch with reality. Making real world robots is very hard. I think it's movies which made people think that any 6 year old can just plop together a C-3PO.

What is the hash then? (I omit the points where I lack experience.)

1; Motion planning

"Even things like a model of where the robot is, with respect to the surroundings" That's not motion planing. Motion planing starts from when you have a model of yourself and your surrounding. There are theoretical challenges (imagine a flytrap, or maze) but the real world challenges in my experience come from that under the inaccuracies and glitchyness of perception you are expected to make okay-ish decisions.

3; SLAM

"there’s always going to be new obstacles (a pair of shoes, a book)"

SLAM is about localisation. It gives you a 6dof pose from some fixed coordinate system. It does not deal with tripping hazards. That's obstacle detection. It does indeed deal with "mapping" but only in as much as to get you that pose estimate.

"not turn-key to where there would be a SLAM module you can buy for your robot."

Sure you can. The "Intel Realsense t265" is for example one such a module.

The inside-out tracking of the Oculus Quest is another (albeit one you will have a hard time buying for your robot).

6; Depth estimation.

That one is odd. True, getting full-fidelity depth maps out of monocular images is a research problem. But if in reality you want to estimate the distance to your beer bottle, you will use stereo images or a Kinect-like depth sensor (a toy version of the stereo relation is sketched at the end of this comment). It is obviously a hard engineering challenge to make it four-nines robust, but not an unheard-of challenge. If his definition of an "open problem" is that there are people writing research papers about it, then it will remain an open problem for a while. But if he just wants to estimate depth, then there are already good working ways. Maybe cost-prohibitive, maybe not robust enough for his liking.

9; Scene understanding

Humanity made insane leaps and bounds on this one. Again there are engineering tradeoffs: how much accuracy you get for how many watts, what kind of training data you need, etc. Our systems are nowhere near as good as a 5-year-old human child, but I think we can handle that "beer bottle obscured by ketchup bottle" challenge if we try.
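(A toy version of the stereo relation mentioned under point 6, with made-up camera parameters:)

    # Stereo depth: Z = f * B / d, with focal length f in pixels, baseline B in
    # metres, and disparity d in pixels. Numbers are made up.
    focal_px = 700.0      # focal length, in pixels
    baseline_m = 0.06     # distance between the two cameras

    def depth_from_disparity(disparity_px: float) -> float:
        return focal_px * baseline_m / disparity_px

    print(depth_from_disparity(42.0))   # beer bottle at ~1.0 m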


Depth estimation

I wish it were as solved as vendors would like to believe it is. If you want a sensor for medium-range applications, say order 0.5m to 10m depth with better than 1cm accuracy, that works indoors and outdoors, with reflective and untextured objects, doesn't interfere with other sensors of its kind, it simply does not exist. Traditional active and passive stereo is OK provided you have texture, ToF so long as the surfaces aren't reflective and you don't have sunlight to worry about, structured light if you can control lighting and can otherwise control/avoid interference from other sensors. There's some promise in learned stereo matching, but collecting enough data and running fast enough inference are big challenges to practical use.

Existing depth sensors for manipulation tasks, even indoors in reasonably controlled lighting, are still mostly insufficient given that the objects they struggle to see are often the objects you want to manipulate.


I strongly disagree!

I have seen fantastic results with stereo cameras, colored lights, and self-calibration.

For most use cases, it's no problem if your robot will stop for a few seconds, rotate the camera axis around a bit, and then continue. But that appears to be good enough to calibrate the features for tracking things like a reflective and transparent glass jar.

As for the precision, I agree that 1cm at 10m distance doesn't work. But 1cm precision at 50cm distance is doable. And for a robot arm, you mainly need the precision when you're close to the object.

And yes, I am talking about what you probably meant with learned stereo matching. I would call it close to solved because we can by now do unsupervised training and achieve usable results. While I had trouble reproducing this specific paper, the general idea is valid: https://github.com/google-research/google-research/tree/mast... https://arxiv.org/abs/1904.04998

We are also seeing good results from using random YouTube videos to train AI vision.

But given that you are so sure that this is unsolved, I wonder if I should start a company to sell my depth estimate pipeline. Would you have any example image pairs that are causing problems, so that I can see visually what fails?


If you have a vision system that "just works" and produces high-quality pointclouds in an actual kitchen environment, with reflective appliances, silverware, shiny countertops, shiny ceramic dishware, and glasses, we would absolutely use it.

We have in-house research work on both learned monocular depth (so-so for robotics tasks) and learned stereo disparity (much more promising), so progress here isn't impossible, but none of the off-the-shelf products really solve the problem.


It "just works" for our use case. We put in 4K @ 60fps and receive 960x540 @ 20fps of stereo correspondence pairs. So every matched pixel is averaged over 3 frames in time and 4 pixels in every direction in space, meaning 9x9 convolution kernels. In other words, we make the video super clean by area sampling in space and time.

The specific part about our system that makes it usable for me is that for pixels that cannot be matched with a predetermined quality, it'll return a gap marker instead of guessing. For SfM, that means you can just skip those pixels that are affected by reflections moving around. Cooking pots and plates tend to have enough scratches, design, or shape markers to work OK. Wet white floor will usually be flagged as "unknown" except for the grey gaps in between tiles. As for glass, our system can return up to 2 flows per pixel, meaning for a glass mug we get both the mug and a see-through estimate.

If you look at Sintel Clean+"s0-10", you'll find that there are some learned matching algorithms that perform quite well under those conditions: http://sintel.is.tue.mpg.de/quant?metric_id=6&selected_pass=...

We're H-v3 (2nd place) and that 0.284 EPE for s0-10 (slow movements, small disparities) is quite workable, because that means you have on average less than 0.1 pixels of disparity error on the 960x540 depth maps.
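To put that sub-pixel disparity error into depth terms: differentiating Z = f*B/d gives the depth error a disparity error implies, and it grows with the square of distance. The focal length and baseline below are assumed example values, not a specific rig:

    # From Z = f*B/d, |dZ| ~= Z^2 / (f*B) * |dd|.
    focal_px = 540.0      # assumed focal length in pixels at 960x540
    baseline_m = 0.065    # assumed stereo baseline in metres

    def depth_error(depth_m: float, disparity_err_px: float) -> float:
        return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px

    for z in (0.5, 1.0, 2.0):
        print(f"at {z} m: +/- {depth_error(z, 0.1) * 1000:.1f} mm")   # 0.7 / 2.8 / 11.4 mm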

As for monocular depth, I see that as mostly a memorization task again. You train a good stereo matcher, then do unsupervised learning on stereo data to get the monocular AI.

I'm at hajo.me and I currently work on stereo matching and depth mapping with the goal of improving VR. Can you disclose which company or what in general you're working on? Also, do you know any discussion groups where optical flow / stereo matching people usually hang out?


Yes, in general if you can capture multiple views you can make up for many dropouts and artifacts. The more you can move, the more likely that you'll get a complete reconstruction. Single-view artifacts are more of a problem when reaching into confined areas or during visual servoing where you don't necessarily have the freedom to move around to get a better view.

The results for the Sintel benchmark do look interesting. Do you have a report or some sort of overview of your approach? It would be nice to see a similar benchmark on real recorded scenes, especially if that provided a way to compare learned matching algorithms with available sensor hardware.

I think there's reasonable promise in learned stereo matching, especially if we include more information than just visible spectra. Human eyes are so much more than two RGB imagers, so we shouldn't limit our robots to that either. Monocular depth, I agree, seems to be mostly a memorization problem. In cases where you can effectively memorize everything, it will work quite well and the savings in hardware complexity (and physical sensor size) will be well worth the training complexity. I actually think it has much more promise as a backup depth perception method on cars, since objects in a driving context are mostly consistent in size. I have doubts about how useful it will be in manipulation tasks where you may encounter similar objects at a range of sizes.

I'm part of a research group at TRI working on home manipulation tasks although I have something of a hobby interest in outdoor robotics as well - in general I'd say the depth sensors for outdoor tasks are often better, but much more expensive and out of reach for hobbyist users.


One trick that I have seen for confined environments is to rotate the plate with both stereo cameras around its forward axis. That way, you can convert the left-right stereo to up-down stereo and especially with highly reflective stuff, there's a chance that that will be enough of a change to reduce reflections.


Hypothetically, it seems that your problem has to do with surfaces being reflective in the visible light spectrum. Have you had any luck with ultrasonic, ultraviolet, or IR sensors?

I was also thinking that light source estimation might be a viable workaround for reflective surfaces. http://www.thomaswhelan.ie/Whelan16ijrr.pdf


Near-IR, which many of the available ToF (Kinect2, K4A, Basler/Lucid/Sony), active stereo (Realsense D4xx, Mynt, plenty more), and some structured light (all Primesense derivatives, some older Realsense) cameras use, suffers from much the same issues as visible light, as do the nearby UV frequencies. Farther wavelengths in shortwave IR start to have different/better properties, but the costs of the cameras are just astronomical (~$20K/camera). Farther UV is also possible, but eventually we're dealing with frequencies that are actually harmful.

I do think there's a lot of open possibility in using more than the traditional visible spectrum. Using polarized sensors to remove reflections, or hyperspectral cameras to reduce noise that only appears in certain bands are good ideas. The problem is that we've got lots of cheap imagers that are really well built for visible spectra, and not a lot outside that. Small hyperspectral cameras for things like agricultural drones do exist, but they tend to work by combining multiple sensors (one for each band) which works fine for the ranges a drone operates at but not very useful at the close range indoor robots need.

Ultrasonic has a rather bad name in the field, a lot of people having used absolutely terrible robots with basically useless ultrasonic rangefinders early in their careers. In theory sound could be really useful, and there are some very nice (and very expensive) imaging sonars for underwater robotics use, but I'm not aware of any high-resolution ultrasonic sensors for land robots. One minor challenge with ultrasonic sensors in real products is ensuring that they are inaudible to people and pets - when they are almost-audible they are extremely annoying or even painful.


> but I'm not aware of any high-resolution ultrasonic sensors for land robots.

Sound cameras are very cool - passive sonar, but they're audio frequency and not usually depth-sensing. There's a lot of room for improvement there, but it's a relatively unknown area and definitely hasn't found its niche.

High-resolution air sonar is necessarily fairly large, which makes it decidedly unsexy. It also puts a lot of strain on the potential for a low-cost BOM when the PCB and chassis are more than ~12" wide.

> One minor challenge with ultrasonic sensors in real products is ensuring that they are inaudible to people and pets - when they are almost-audible they are extremely annoying or even painful.

One thing that really surprised me when I got into it was that this isn't a trivial problem. Part of it is practical: the cheapest and most common sensors are around 60-80 kHz (mechanical size makes them easy to produce, with the drawback of a high-ish time constant).

Getting above the range of domestic animals isn't too hard or inconvenient, but it's surprising how extensively evolution has adapted for sound in bats. If you want to avoid harming them you need to keep above ~150 kHz, which gives you a fairly uncomfortably small bandwidth before you're into the highly-attenuating frequency ranges. Underwater it's a different story (biological structures have difficulty resonating at MHz+) but the frequencies that would give you sub-millimeter precision (350 kHz) are way more inconvenient than 40 or 60 kHz. You need totally different types of piezos and the range is sharply limited unless you're projecting volumes that would deafen at audio ranges.

I'm still very optimistic about the role of sonar even as cheap lidar and depth cameras crowd the space, but it's pretty difficult today to make a truly good sensor.


I'm in the same boat. I'm now leaning toward the idea that single-shot depth estimation isn't the way to go. Throwing away all of the rich motion and flow information from previous frames seems like such a waste when we could be using it to update a dense model instead.


In SLAM a pair of shoes constitutes both an obstacle to be avoided as well as an unexpected landmark that might confuse your position estimation. I wouldn't call this an unsolved problem, even textbooks will give techniques to mitigate it, but it is a problem that roboticists need to be aware of.


> SLAM ... does not deal with tripping hazards.

Do we need to add kinaesthesia to the list of open problems then?


Autonomous robots sound cool, but there are so many problems that could benefit from robotic automation that don't require full autonomy. In controlled environments where we know the entire state of the (local) world, including the position of our robotic swarm, we can deterministically plan a motion and leave the world in an expected end state. Maybe use a bit of CV to make sure our assumptions are right; if not, hit the brake, else continue.

Construction, mining, infrastructure, agriculture, manufacturing, and logistics all contain problem spaces that could use non- or semi-autonomous robotic automation. Still a difficult problem, but how can we expect full autonomy without controlled robotic environments first?

Yes we use robots in manufacturing and a few warehouses, but that’s it..?


As an employee of John Deere, I can tell you that we are currently shipping most of the automation you suggested. Humans are still in the tractor, but they often are not driving them.


> Guys like Rodney Brooks seemed to accept this and built various robots that would learn how to walk using primitive hardware and feedback oriented ideas rather than programmed ideas. There was even a name for this; “Nouvelle AI.” No idea what happened to those ideas; I suppose they were too hard to make progress on, though the early results were impressive looking.

The problem was that subsumption didn't scale, but the idea was incorporated into the so-called three-layer architecture:

http://www.flownet.com/ron/papers/tla.pdf

(I am the author of that paper. AMA.)


This paper seems to pre-date behavior trees. Any comment on how TLA relates? (I skimmed but will give a closer look when I have more time.)

I was curious what Brooks now thinks about subsumption and found this from a few months ago:

> The approach to controlling robots, the subsumption architecture that it proposed led directly to the Roomba, a robot vacuum cleaner, which with over 30 million sold is the most produced robot ever.¹

Like many, I got a Roomba not long after the pandemic began. I was disappointed by its poor sensorimotor system. Within a few days its IR cover was scuffed up and it was covered in scratches from getting stuck under an office chair. Brooks doesn't say that the 30 million Roombas incorporate subsumption, and after a decade or two of programmers working on it I wonder about the nature of the codebase. The Roomba's behavior is entirely unpredictable: sometimes it will bump into something at full speed and sometimes it will slow down as it approaches. There are a number of other issues too long to mention, including the charging contacts, and recently it started roaming around with its charger still attached for no apparent reason.

¹ https://rodneybrooks.com/peer-review/


> This paper seems to pre-date behavior trees.

Indeed. By a good 20 years :-)

> Any comment on how TLA relates?

I've been out of the field for a long time so BTs are new to me. All I know about them is from skimming the wikipedia article. But they look to me like a more formal implementation of the TLA sequencing layer.


I think one of the problems is that we don't have the right people working on the right problems.

Self-driving cars seem to have absorbed most of the world's high-level robotics efforts for the past decade or more. I was always skeptical of that application because my experience has shown that weak-AI works best when there's a human backup and/or the stakes of an error are not too high. That isn't the case with self-driving cars where a lethal mistake can occur in less than a second.

I think we would be much better off today if much of the self-driving car effort had been focused on household utility robots and/or business applications. No it wouldn't be "saving lives", but it would be saving countless life-hours spent on mind-numbing tasks that are a major reason for why life is so unpleasant for so many people.


What household tasks do you think are well-suited to robotics that aren't already addressed?

The reason so much focus has gone into autonomous vehicles is because there are trillions of dollars to be made there by the companies that solve it, and many obvious use cases will result in large profits even with imperfect solutions (where "imperfect" means it works only in specific design domains like the southwestern US during the day with no rain).


All of them - cleaning, cooking, laundry, etc. Everything that a human needs to do to maintain reasonable living conditions and would hire someone else to do if they were rich.

EDIT: I think that if you could come up with a really good solution for all of that most people would be willing to pay about as much as they do for a car.


TL;DR: Robotics is hard. Nature is impressive.

(Source: 2 years working on a humanoid bipedal robot. Now, I am constantly amazed that people can balance all that weight on two spindly little legs. Running is a miracle)


Honestly, late yesterday evening after work I was looking at a floor full of toys that my little kids had been playing with on yet another lockdown day, thinking I wish there were a robot that I could build or buy to tidy this up.

Did some research and found this project; promising, but it's from 2018 and it looks like it didn't go anywhere.

https://youtu.be/geub-Nuu-Vw

So now I'm thinking what about the build option.


Build it. There is a global market of parents who will buy it.

But on the other hand, why not accept the toys on the floor? You are fighting entropy for no reason. You sleep at night, you will work tomorrow during the day, and when you look again, the toys are in an equally dispersed state. Why not let them stay in that state for days until you need to hover?


> Why not let them stay in that state for days until you need to ho[o]ver?

Risk of personal injury. (The Lego-on-the-stairs scenario.)

Also, some people just like a calm visual field at home.


All very reasonable arguments. What I don't understand is cleaning up in the evening. There is no visual field to perceive if you are asleep.


Perhaps you've not yet reached an age where you frequently need to get up to pee in the night (sometimes several times) and keep the lights off to avoid disturbing a sleeping partner... ;)


It ain't the sleeping in bed, it's the crusty eyelids in the morning. :-)


It's to make sure that when you walk around half-asleep early in the morning trying to get a diaper, that you don't step on a pointy lego brick.


Do you walk heel or toe first [1]? I would assume that lego bricks only hurt when moving heel first since humans had to deal with stony environments for quite some time. Not that you should stop cleaning up but this could be another technology to deal with the bricks.

[1] https://news.ycombinator.com/item?id=1086446 Barefoot Running


Heel first. Interesting idea :)


I only use my living room in the evening. When else would I want it tidy?

Well, maybe Saturday morning. But starting the day tidying is not great either.


Isn't life just one big fight against entropy?


Technically it isn't. Life needs entropy for variations and evolution. Also, life doesn't fight. It's identities that fight to maintain themselves. Life just keeps on living.


What I notice from this thread (and many similar ones) is that AI may sooner or later replace lawyers, some programmers, system administrators, and engineers of various ilks; but the sweeper, the toilet cleaner, the janitor, the restaurant maid and the dishwasher, the delivery guy and the auto repair man - all of the relatively "low skilled" jobs - are absolutely safe.


A bit shocked to realise how completely I'd forgotten about robotics, when it had once seemed the grandest challenge. Is robotics itself slipping toward being marginal as physical manipulation becomes less important over time? The sci-fi robots don't even walk these days; they're holograms, etc. I'm not a software engineer, so it's not just that.


> I’ll point out that the humble housefly has no problem understanding the concept of “shit in front of you; avoid,”

Surely in the case of the house fly, it would be more like 'dinner is served' than 'avoid'!

But in all seriousness, this is a good list to remind us how vastly far away from human-level AI we are.


Flies move reflexively, don't they? They bump into walls and windows all the time.


The objections to neural nets as the solution to all the problems on his list are the same as the old objections to neural nets as the solution to computer vision, speech recognition, translation, playing Go, etc. The objections will fall in the face of overwhelming evidence that neural nets simply work better than other approaches to these types of problems.

For a long time software was the reason robots didn't work, but that time is ending. Hardware will be the bottleneck soon if it isn't already. No general purpose robot will ever be successful in the mass market using electric motors and gearboxes at each joint. We need simpler, cheaper, lighter, more robust, more reliable, backdrivable, force-sensing actuators.


> The objections will fall in the face of overwhelming evidence that neural nets simply work better

Perhaps? But progress is stalling[1]. We need theoretical foundations.

[1] https://www.sciencemag.org/news/2020/05/eye-catching-advance...


With regard to your reference:

One wonders why the "modest tweaks" weren't used in the original paper - indeed, because they weren't known or weren't possible.

The way you should interpret these meta-analyses is: "researchers are motivated to attribute improvements in performance to 'the interesting bit', and they will run baselines without applying similar tweaking effort to them."

The progress in the field can in some sense be indicated by the huge improvement in the 2006 method due to the "tweaks" discovered since.


Any research or real-world examples of such actuators?


I would say some of the most promising work right now is in quasi-direct-drive actuators, basically low-gear-ratio brushless motors. Standalone electric motors are basically linear in terms of current -> torque, and even with a gearbox, provided the gear ratio is low enough, you can assume near-linear behavior. Brushless motors tend to be easier to design around specific size and shape constraints, and often run slower than brushed ones.

The "secret" to a lot of this actuator work is that outside of highly repetitive industrial tasks, actuators are rarely running at full power so you don't need to design them (and their cooling) for continuous max power operation. Somewhat like human muscles, you can pick smaller motors and/or lower gear ratios that cover the normal cases, and still be OK with rare overloads so long as the overload isn't long enough to cause overheating.


Force can also be transmitted with belts, cables, etc., like tendons. Then you don't necessarily need a motor in each joint.


Probably electroactive polymers.


>Multiaxis singularities -this one blew my mind. Imagine you have a robot arm bolted to the ground. You want to teach the stupid thing to paint a car or something. There are actual singularities possible in the equations of motion; and it is more or less an underconstrained problem. I guess there are workarounds for this at this point, but they all have different tradeoffs. It’s as open a problem as motion planning on a macro scale.

This is a math problem applied to robotics, not a problem unique to robotics. If there are workarounds, then isn't the problem solved? Factories have robotic arms and they seem to do an adequate job making cars and other stuff in spite of singularities.


The workarounds are designed to solve particular cases. A robot arm in manufacturing is stationary, has a controlled environment, and often a fixed task (or at least task type). The task is known in advance, and humans are involved in developing the solution for each particular task. In an open environment one cannot rely on those things: the task is not known up front, or new tasks come too quickly for a human to be involved in solving them, or there is influence from other actors, some possibly uncooperative or adversarial.

A lot of 6-axis robots actually work in 3D+3D. That is, they first position their arm/tooling into a work pose, then perform their actual work in a 3D space referenced from that point/orientation. The poses are chosen such that the space the robot works in is singularity-free, and for moving between poses there are explicit solutions (chosen by a human) for dealing with singularities.


And that's just equivalent to a human lifting a chair without hurting himself.


I thought quaternions work around gimbal lock?


In the case of singularities in robotic systems the issue is actually physical (akin to actual gimbal lock in a gimbal with only 3 axes). In certain positions it's not possible to move the end-effector in certain directions, and near those positions you may need large movements of the whole system to make small movements in the end-effector (which can be especially violent if the robot attempts to pass near the singularity at a constant velocity).

The solution, kind of akin to what quaternions do, is to have more degrees of freedom than you need (like a 4th ring on a gimbal), but this doesn't eliminate singularities; it just makes them possible to avoid for most of the range of motion (extreme limits may still contain them). Actually coming up with a reasonable solution which avoids them is still not solved in general (though it's solved well enough for practical purposes in a lot of situations).

These singularities exist in humans as well, though we're quite well adapted to avoiding them in normal motions. If you've ever found yourself making an awkward movement (especially needing to readjust your grip to continue a motion), then it's likely you've encountered this without realising it.
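To see what that looks like numerically, here is a toy 2-link planar arm: the Jacobian determinant goes to zero as the elbow straightens, and the joint velocities needed for a fixed end-effector velocity blow up. Link lengths and the commanded velocity are arbitrary.

    import numpy as np

    L1, L2 = 0.5, 0.4   # link lengths in metres (arbitrary)

    def jacobian(q1: float, q2: float) -> np.ndarray:
        s1, c1 = np.sin(q1), np.cos(q1)
        s12, c12 = np.sin(q1 + q2), np.cos(q1 + q2)
        return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                         [ L1 * c1 + L2 * c12,  L2 * c12]])

    v = np.array([0.1, 0.0])               # desired end-effector velocity (m/s)
    for q2 in (1.0, 0.3, 0.05, 0.01):      # elbow approaching the straight-arm singularity
        J = jacobian(0.7, q2)
        qdot = np.linalg.solve(J, v)       # joint velocities required to produce v
        print(f"q2={q2:5.2f}  det(J)={np.linalg.det(J):+.4f}  |qdot|={np.linalg.norm(qdot):8.2f}")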


The only solution to the mentioned problems is, IMO: we'll have to adapt our environment to the moving machines, at least in the early stages, before they can do it autonomously. Language will adapt more by itself (to make machines pass the Turing test we need a change in language, just as we need a change in machines; people will adapt to useful machines).


I know that there's a lot of successful work in specific controlled environments (company X's factory floor), and that a lot of environment understanding/SLAM is broadly unsolved in arbitrary uncontrolled environments, but what about specific uncontrolled environments? What if I want a beer-serving robot to learn a consistent, high-accuracy model of just my house, the way it is, and I'm willing to put in some technical work? Can I do any better than completely hand-crafting a 3D object model or similar?


If you have high-quality 3D sensors (or in some cases, good-enough 2D cameras) and usable IMU/odometry, you can assemble quite reasonable 3D models for many small environments. The biggest challenges in a home environment are that many items we want in our houses (stainless steel appliances, chairs with thin legs, glass anything, reflective floor materials) are almost pathologically bad from a computer vision standpoint and are very hard to see and avoid in an uncontrolled household setting. Vision-based navigation against a known environment is doable with reasonable reliability, but adding fixed markers or positioning systems goes a long way towards robustness.
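One concrete version of "fixed markers go a long way": if the robot can measure a few markers whose map positions are known, a least-squares rigid alignment (the standard Kabsch/Umeyama construction) recovers its pose directly. A sketch with invented 2-D coordinates:

    import numpy as np

    map_pts = np.array([[1.0, 2.0], [4.0, 2.5], [3.0, 5.0]])      # marker positions on the map
    robot_pts = np.array([[0.8, -1.1], [3.2, 0.9], [0.7, 2.9]])   # same markers, measured by the robot

    def rigid_align(src, dst):
        # Least-squares R, t such that dst ~= src @ R.T + t.
        src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
        U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:        # guard against a reflection solution
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = dst.mean(0) - src.mean(0) @ R.T
        return R, t

    R, t = rigid_align(robot_pts, map_pts)
    print("robot position in map frame:", t)                      # where the robot origin sits
    print("robot heading (rad):", np.arctan2(R[1, 0], R[0, 0]))   # assuming x-axis = forward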


Willow Garage sold a robot that did that. Only $440,000 each.

It was a vanity project of the Google founders/early employees, but they contributed to ROS (the Robot Operating System).

https://en.wikipedia.org/wiki/Willow_Garage

https://en.wikipedia.org/wiki/Robot_Operating_System


> 6. Depth estimation

At least indoors, this was already handled pretty well 10 years ago by Kinect, and there are many somewhat robust approaches based on binocular vision. Not sure if this qualifies as "very much an open problem".

> 9. Scene understanding

The mentioned example is not the best one - anticipation of collisions has been a feature in cars for many years now (usually as a warning system, sometimes with automatic emergency braking). But it's a very constrained problem; more generic scene understanding is of course still very difficult.


What really surprises me is that Apple is now worth 2 trillion dollars with another gajillion sitting in cash. Why are they being an optimization company and not really pushing the edge on robotics and automation? I was excited about the Apple car, but it seems that project is dead.

Why not invest that money in moonshot ideas ? It seems Elon musk is the crazy one with Tesla, Solar City and SpaceX.

It blows my mind how their massive rockets reach orbit and land back. Why haven’t we been making breakthroughs like this in robotics?


As difficult as it is, control for launching and landing a rocket is vastly easier computationally than even apparently simple robotics problems.

Robotics is simply hard, from algorithmic, computational, and engineering perspectives. Only some parts of it can be solved by throwing money and raw compute power at it. Even in areas where progress has been made, you generally need experts to decompose the problem at hand into something that can be tackled with existing techniques, and those experts are expensive and in demand.


I was really into FIRST robotics when I was in HS and would love to make something at home. I thought about a VEX kit but didn't really want something erector-set-like. Does anyone have a recommendation for something like Misty Robotics [1]? I can only think of Mindstorms, but I wanted to do some visual AI with it.

[1] https://www.mistyrobotics.com/


Is there a similar report for open problems in AI or ML?


I don't know about any report, but an analogous problem (to "get me a beer") might be something like:

"Hey Siri, write up minutes of the meeting we just had and email them to everybody".

There are quite a few open problems in that, including analogous navigation of an ill-defined environment which nevertheless has regularities.


Well, at least two of these problems have been solved already and just need to be implemented.

For example, my cheap mirrorless camera solves two of the problems on the list. Its phase detection sensors can do both depth estimation and position estimation quite rapidly in order to make the tracking autofocus system work (autofocusing is really calculating the distance between object and lens).

You would just have to do a bit of integration; it's really a solved problem already.


The parent comment was clearly written by someone with no experience in robotics. In fact, every single problem on this list is "solved" in the sense that there are examples of systems that perform it in some limited context. Generalizing those subsystems to handle the difficult edge cases and integrating those subsystems together is 99% of what makes robotics as a whole difficult, and it would be similarly deceptive to say that AI is "solved" because Microsoft Clippy exists.


I do have experience in robotics. There exist robust solutions to depth estimation and position estimation that are deployed in the real world, in difficult applications where they have been generalized to work with any object you can draw a box around. There is a difference between a problem that is solved in theory and a problem that is solved well enough that you can buy polished products that implement a solution and work essentially without failure. I don't think you can compare optical phase detection in the context of position estimation and depth detection to Clippy in the context of AI. Phase detection is quite literally a closed-form optical solution to the problem of "how far is this object away from me" as long as the object is within a few thousand times the physical aperture of the lens. It's mature enough that you can use it to drive a motor in response to movements of "that bird", "that teapot", "the closest object in that clump of pixels", or "the farthest objects in that clump of pixels", as well as calculate the velocity of the aforementioned object in three dimensions. For all intents and purposes, if you have a problem of the form "what is the distance of that object" as well as "how is that distance changing over time", then you can solve it, and indeed it has been solved to very high reliability, using phase detection. That's what it means for a problem to be solved.

In other words, I wouldn't call an implementation where you can click anywhere on an image and receive a distance, all of the time, with almost any lens imaginable in any environment where there are enough photons hitting the sensor "performed in a limited context".

The technology simply hasn't been used in mainstream robotics, mainly because it's patented and difficult to implement from scratch, but these are all implementation problems not fundamental problems.


Structure from focus is an established technique that can produce quite good results, but it's not quite as easy as the existence of good autofocus in cameras would suggest. SfF techniques often operate as the inverse of an autofocus system: rather than focusing on a specific point to figure out depth at that point, you run through a full focus cycle and then, for each area of the image, you find the focus setting that made that area sharp.
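Roughly, for each pixel you pick the focus setting where a local sharpness measure peaks. A minimal sketch, assuming a pre-captured focus stack and using a smoothed squared Laplacian as the sharpness measure:

    import numpy as np
    from scipy.ndimage import laplace, uniform_filter

    def depth_from_focus(stack: np.ndarray, focus_distances: np.ndarray) -> np.ndarray:
        # stack: (n_focus, H, W) grayscale frames, one per focus setting;
        # focus_distances: the matching focus distances in metres.
        sharpness = np.stack([uniform_filter(laplace(frame.astype(float)) ** 2, size=9)
                              for frame in stack])
        best = np.argmax(sharpness, axis=0)   # per-pixel index of the sharpest frame
        return focus_distances[best]          # (H, W) depth map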

The challenge there is that running through a full focus cycle and capturing frames as you go is quite time-consuming and introduces serious artifacts if anything in view is moving. Phase-detection autofocus is so fast in part because it's only using a sparse set of autofocus points, rather than the full sensor resolution.

The other major problems with SfF are the same ones photographers encounter all the time - the depth value is only as accurate as the depth of field will allow, and shallow depth of field requires wide apertures (and is generally easier on larger sensors, as well). Large lenses and sensors may work well for stationary scanning applications, but struggle to be useful on robots.


A lot of these problems have been fixed by recent advancements. For example, splitting a few thousand pixels into two and using that as a phase detection sensor provides universal coverage.

As far as large lenses and sensors not working well on robots, I think you'd be surprised just how well large lenses designed for the sensors they belong on can work. Indeed, by far the reason why photography lenses are huge is because of lens IS and zoom. Making small, light, fast normal lenses with larger stabilized sensors works perfectly fine and can be made very light.

After all, humans have two huge sensors (slightly bigger than full-frame) with f/3 lenses, and it works perfectly fine.

Now, to what I think is missing from the state of the art in robotics:

>Phase-detection autofocus is so fast in part because it's only using a sparse set of autofocus points, rather than the full sensor resolution.

This was true a decade ago, but now if you look at Canon sensors, or some Sony sensors not subject to the patent, every single pixel is a phase detection point. Indeed, the pixels are cut into 2 or 4 photodiodes, each only receiving light from one half or one quarter of the aperture.

This means that every single pixel can detect phase information, all 45 million of them.

Herein lies the major difference between Structure from Focus and phase detection: in a phase detection system, there is no need to run a focus cycle at all. Instead, two waveforms are generated, one corresponding to one section of the aperture and one corresponding to another section of the aperture.

The two waveforms "match" when the incident rays correspond to the same point, that is to say, when focus is achieved. However, it really isn't necessary to actually achieve focus - defocus simply offsets the phase of the two waveforms.

Therefore, by simply computing the phase difference of the two waveforms, one can instantly know, given knowledge of the lens characteristics, the distance of the subject, without having to achieve a focus cycle (!)
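A toy version of that phase-difference step: cross-correlate the two sub-aperture signals to find their relative shift. The shift-to-distance mapping depends on the lens geometry and calibration, which is only gestured at here; the signals below are synthetic.

    import numpy as np

    def phase_shift(left: np.ndarray, right: np.ndarray) -> int:
        # Sample shift of `right` relative to `left` that best aligns the two
        # sub-aperture signals.
        left = left - left.mean()
        right = right - right.mean()
        corr = np.correlate(right, left, mode="full")
        return int(np.argmax(corr)) - (len(left) - 1)

    # Synthetic blob profile as seen through the two halves of the aperture,
    # offset by 3 samples because the lens is (say) focused behind the subject.
    profile = np.exp(-0.5 * ((np.arange(200) - 100) / 4.0) ** 2)
    shift = phase_shift(profile, np.roll(profile, 3))
    print(shift)   # -> 3; a calibrated lens model would map this to subject distance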

Indeed, phase detection actually works very similarly to parallax, in that in actuality you can use it to construct two split, offset images.

Of course, you can also use two stereo cameras, but then you have the issue of having to motorize the cameras to achieve convergence, without which stereo overlap is minimal, whereas per-pixel phase detection provides complete overlap and is much more precise.

If you want to see this in practice, the RAW files of a Canon EOS R5 actually encode distance information for every single pixel.

Also,

>the depth value is only as accurate as the depth of field

Yes. The depth value is precise, in a modern camera, to about one 5000th of the diameter of the aperture, which, for a 50mm f/1.8 normal lens, means depth can be computed accurately for any subject within 50-60 meters. That is easily better than a LiDAR of the same size. And you can do so for a small fraction of the cost by simply upgrading hardware that is already necessary.

It is true that modern cameras actually do move the focus and re-calculate depth, which might give the impression that if you wanted to calculate depth you would actually need to move focus. But in reality, this is done in order to correct for the small misalignment in different lenses, as well as for the fact autofocus motors do not have accurate encoders and will very frequently miss steps. But a modern camera already knows how much the image plane needs to actually shift before even engaging the autofocus motor at all.

In essence, SfF in the State of the Art is an inversion of the state of autofocus 20 years ago, using contrast detection as the autofocus method. However, modern autofocus has progressed so much in this time-frame that it has solved almost all of the issues of SfF.


> As far as large lenses and sensors not working well on robots, I think you'd be surprised just how well large lenses designed for the sensors they belong on can work. Indeed, by far the reason why photography lenses are huge is because of lens IS and zoom. Making small, light, fast normal lenses with larger stabilized sensors works perfectly fine and can be made very light.

The current cameras and lenses on the market work very well, that's not the problem. The problem is that even "small" cameras and lenses are huge in comparison to what can be integrated into a robot. I use a couple M4/3 cinema/studio cameras for vision projects because they're the smallest affordable cameras with controllable interchangeable lenses, and even with their "small" size they are at the upper limit of anything I could hope to fit in a practical robot.

> After all, humans have two huge sensors (slightly bigger than full-frame) with f/3 lenses, and it works perfectly fine.

The human eye is vastly more complex and capable than a fixed plane imager and capable of much more complex focus behavior than any existing lenses are. Camera-eye comparisons are appealing, but the focus behavior of them is really very different.

> However, modern autofocus has progressed so much in this time-frame that it has solved almost all of the issues of SfF

Canon has done a very nice job with their autofocus, but I will believe they have solved depth perception with phase detection when they apply it to their industrial cameras. Canon makes very nice RV-series parts recognition cameras that do a very good job with challenging materials, but they use structured light. In theory, phase-detection SfF would be perfect for these cameras but they haven't done it.

> It is true that modern cameras actually do move the focus and re-calculate depth, which might give the impression that if you wanted to calculate depth you would actually need to move focus. But in reality, this is done in order to correct for the small misalignment in different lenses, as well as for the fact autofocus motors do not have accurate encoders and will very frequently miss steps. But a modern camera already knows how much the image plane needs to actually shift before even engaging the autofocus motor at all.

The minor problem is that you generally want depth plus RGB, which is obviously solved by adding a second camera but then you face basically the same size and overlap issues that stereo cameras bring. Not entirely a free lunch.

The other problem here is political, namely that SfF via phase-detection is (at least for now) entirely in the hands of Canon and Sony, two companies (Sony in particular) with track records related to robotics and machine vision users that can best be described as ranging from "somewhat uninterested" to "useless" to "self-sabotaging". Unless one of them commits to making an actual product and ships it, we will remain in a world where phase-detection SfF is technically possible but entirely out of reach.


>The current cameras and lenses on the market work very well, that's not the problem. The problem is that even "small" cameras and lenses are huge in comparison to what can be integrated into a robot. I use a couple M4/3 cinema/studio cameras for vision projects because they're the smallest affordable cameras with controllable interchangeable lenses, and even with their "small" size they are at the upper limit of anything I could hope to fit in a practical robot.

But this is largely because modern lenses are seriously bloated, with IS for older cameras that have non-stabilized sensors, bloated focusing mechanisms to make floating elements work, bloated zooming mechanisms that make everything so much worse, and so on.

If we want to design a lens for robotics, we really don't need much more than a double-gauss normal lens with a unitary focusing mechanism.

One of the lenses I use when I need to kludge something in a project is exactly that, a 1970s Minolta 55 f/1.7 lens for a full-frame image circle. Despite having a large 32mm aperture, it has a diameter of 56mm and a length of 44mm. I don't know if that's too large for your robotics applications, but it isn't very big and it's much smaller than the LiDAR sensors I've worked with. You could likely make this even smaller with modern techniques. It would also cost about 40$ to build.

>The human eye is vastly more complex and capable than a fixed plane imager and capable of much more complex focus behavior than any existing lenses are. Camera-eye comparisons are appealing, but the focus behavior of them is really very different.

My point was more that in pure optical size, the human eye is bigger and takes up more volume than the optical system I'm proposing. But as far as focusing behaviour goes, if you were to mathematically model how the human eye gets depth information to drive the focus cycle, it's quite similar to a phase detection system, with the phase information being extracted through the parallax.

>Canon has done a very nice job with their autofocus, but I will believe they have solved depth perception with phase detection when they apply it to their industrial cameras. Canon makes very nice RV-series parts recognition cameras that do a very good job with challenging materials, but they use structured light. In theory, phase-detection SfF would be perfect for these cameras but they haven't done it.

I think this can be explained quite simply. In a fully controlled environment, using structured light with cheap lenses and small sensors is much more cost-effective. Why would they use expensive APS-C+ sized sensors, which they don't seem to be all that good at making, when they can get away with very cheap parts?

>The minor problem is that you generally want depth plus RGB, which is obviously solved by adding a second camera but then you face basically the same size and overlap issues that stereo cameras bring. Not entirely a free lunch.

But this is precisely the point of OSPDAF or split-pixel phase detection. You get both depth and RGB at the same time, because the RGB sensor doubles as a phase detection sensor. It's not like old cameras where you had to use a mirror that sent light to the dedicated phase detection sensor.
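
As a toy illustration of that split-pixel idea: summing the two sub-pixel images gives the normal picture, while comparing them gives a phase shift that tracks defocus. The one-dimensional synthetic data and the brute-force shift search below are deliberate simplifications of what real on-sensor processing does.

```python
# Toy split-pixel demo: each pixel has a left and a right photodiode. Summing
# them gives the ordinary image; comparing the two half-images gives a phase
# shift. Synthetic data, NumPy only.
import numpy as np

rng = np.random.default_rng(0)
scene = rng.random(256)                 # one scan line of scene texture
true_shift = 3                          # defocus-induced phase shift, in pixels
left = scene
right = np.roll(scene, true_shift)      # right sub-pixels see a shifted copy

rgb_line = left + right                 # the normal image: just sum the sub-pixels

def estimate_shift(left, right, max_shift=8):
    """Test candidate offsets and return the one with the best match."""
    offsets = range(-max_shift, max_shift + 1)
    scores = [np.sum((left - np.roll(right, -s)) ** 2) for s in offsets]
    return offsets[int(np.argmin(scores))]

print("estimated phase shift:", estimate_shift(left, right))   # -> 3
```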

As for political problems, I agree. Canon won't ever sell you their DPAF sensors. Sony, even though they are really bad at robotics, should in theory be willing to sell anyone their sensors, so it should be possible to make a prototype. Their tech here is more limited than Canon's, though.

But, to the point, a solution to a problem that's proprietary is still a solution to the problem.


There are a number of quite good modern primes that are tiny with big apertures, especially for smaller sensor sizes. Even then, though, lens plus camera is almost always much bigger than the current crop of depth cameras (K4A, Realsense D400, L500, Lucid Helios). Even those cameras are still too big for some applications; that's where PMD's tiny little cameras have a niche.

There's always going to be some room for larger, better sensors, but a lot of applications need more sensors to get better coverage, and size is definitely one of the tightest limitations on where they can be applied.

>I think this can be explained quite simply. In a fully controlled environment, using structured light with cheap lenses and small sensors is much more cost-effective. Why would they use expensive APS-C+ sized sensors, which they don't seem to be all that good at making, when they can get away with very cheap parts?

I mean sure, greed is a plenty good explanation of Canon's behavior. The RV-series are pretty big and plenty expensive, so I don't think using better optics would be an issue. Indeed, if single-frame PDAF depth worked out, it would considerably speed up the parts recognition cycle time, which is actually somewhat slow due to the multiple patterns required by the structured light system. The projector and structured light processing work isn't free, either, so it seems like depth they got for free would be better in every way.

Dual Pixel AF II, which is (at least in marketing speak) Canon's name for OSPDAF, has been around for longer than the RV-series, so I can't help but have doubts about the actual practical applicability of PDAF SfF.

>But this is precisely the point of OSPDAF or split-pixel phase detection. You get both depth and RGB at the same time, because the RGB sensor doubles as a phase detection sensor.

Yes, that's true, but the problem with making a good RGBD camera is that you generally want a deeper depth of field for RGB, so that as much of the image as possible is in focus, while maximizing PDAF SfF depth quality requires as shallow a depth of field as possible. It's not impossible to solve, but the fundamental optical properties you want from the depth part are the opposite of what you need from the RGB part. You could address this by changing aperture between each depth/RGB capture, although you'd definitely have to use an electromagnetic aperture to avoid lifetime issues. You'd still end up with a slower framerate taking successive depth/RGB frames, but that's doable for many applications.
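
A quick back-of-envelope calculation shows how stark the conflict is. Using the standard hyperfocal-distance formulas (the focal length, circle of confusion and focus distance below are arbitrary example values), the in-focus slab at a depth-friendly f/1.8 is a small fraction of what you get at an RGB-friendly f/8:

```python
# Back-of-envelope depth-of-field comparison: wide aperture (good for PDAF depth
# precision) vs stopped-down aperture (good for an all-in-focus RGB frame).
# Standard hyperfocal-distance formulas; all numbers are example values.

def depth_of_field(f_mm, n, subject_m, coc_mm=0.019):   # ~APS-C circle of confusion
    f, s, coc = f_mm / 1000.0, subject_m, coc_mm / 1000.0
    hyperfocal = f * f / (n * coc) + f
    near = hyperfocal * s / (hyperfocal + (s - f))
    far = hyperfocal * s / (hyperfocal - (s - f)) if hyperfocal > (s - f) else float("inf")
    return near, far

for n in (1.8, 8.0):
    near, far = depth_of_field(f_mm=35.0, n=n, subject_m=1.5)
    print(f"f/{n}: in focus from {near:.2f} m to {far:.2f} m")
```

Roughly a dozen centimetres of sharp depth versus over half a metre in this example, which is why you end up wanting two aperture settings in the first place.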


Motion planning is also an Open Problem for Humans!

Let's say we are standing on a high hill and I point to another hill and say: "Walk over there". Do you expect any human to find a reasonably good path by themselves? I would personally try to use a map. How do military robots solve this? They use satellite images.
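
For the overhead-map case, the planning part at least is textbook. A minimal sketch, assuming you have already turned the satellite image into a grid of traversal costs (the grid, the costs and the 4-connected neighbourhood below are all made-up examples):

```python
# Minimal A* sketch over a 2-D cost grid (e.g. traversability scores derived
# from an overhead/satellite image). Purely illustrative.
import heapq

def astar(grid, start, goal):
    """grid[r][c] = traversal cost (>= 1) or None for impassable cells."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])   # Manhattan heuristic
    frontier = [(h(start), 0, start, None)]
    came_from, best_g = {}, {start: 0}
    while frontier:
        _, g, node, parent = heapq.heappop(frontier)
        if node in came_from:
            continue                          # already expanded
        came_from[node] = parent
        if node == goal:                      # reconstruct the path
            path = [node]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return path[::-1]
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] is not None:
                ng = g + grid[nr][nc]
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc), node))
    return None                               # no path exists

grid = [[1, 1, 1], [None, None, 1], [1, 1, 1]]
print(astar(grid, (0, 0), (2, 0)))            # routes around the blocked cells
```

The hard part, of course, is producing that cost grid reliably, not the search itself.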

And in general, this article seems very pessimistic to me. My home-built computer vision pipeline can do localization and mapping, loop detection, object segmentation and depth estimation at levels that are "good enough" for indoor drone flight. So I would assume that someone with a generous serving of financial resources would be able to solve most of those problems, except for 2 issues:

1. You need to memorize how the environment works. That's why newborn kids make poor decisions: they lack the stochastic priors. Lucky for us, memorization is AI's strong point.

2. You need mechanics that are more forgiving. If I accidentally position my hand the wrong way, I might squeeze someone a bit, but I won't crush their bones, because the mechanics of my arm are flexible. We'll need better actuators and elastic casings.

And just for the sake of discussion, here are my replies to each problem category.

Simultaneous Location and Mapping: There are libraries that work well enough for your robot to localize itself in a building-sized environment with just a single camera (see the feature-matching sketch after this list). I'd consider this solved. https://github.com/raulmur/ORB_SLAM2

As for obstacles, even humans mostly have to guess whether they want to step on that blanket or whether there is something fragile or slippery hidden inside it.

Lost Robot Problem: Most SLAM solutions are good enough that you could just regenerate the map from scratch every time there is a gap in your perception. ORB-SLAM2 also has a loop-and-merge detection module, so if you reset its tracking and the robot then walks into a known environment, it can merge the old data into its new state.

Depth estimation: It works well enough in practice. https://www.stereolabs.com/zed-2/

Scene understanding: I don't know about you, but when I drive the highway, I sometimes have dead flies on my windshield. Apparently, they aren't that clever after all.

Position estimation: It works exceptionally well for VR markers. In general, those solutions tend to be called "visual odometry": https://www.youtube.com/watch?v=fh5dLF3dmr0

Affordance discovery: This is mostly a memorization problem, so a perfect candidate for AI.
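
Regarding the SLAM point above, here is a minimal sketch of the feature-matching building block behind systems like ORB-SLAM2, written against plain OpenCV rather than the ORB-SLAM2 API; the image file names are placeholders.

```python
# Minimal sketch of the ORB feature matching that underpins feature-based SLAM
# pipelines such as ORB-SLAM2. Plain OpenCV, not the ORB-SLAM2 API;
# frame1.png / frame2.png are placeholder file names.
import cv2

img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)          # detect up to 1000 ORB keypoints
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Hamming-distance brute-force matching with cross-checking for robustness.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

print(f"{len(matches)} matches between the two frames")
# A full SLAM system would now estimate camera motion from these matches
# (essential matrix / PnP) and fold the result into its map and loop detector.
```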


Alas, your problem #1 is not really about memorization, it is about understanding. Take a human to a house they have never been to before and tell them: "make me some tea". Now try that with any robot you want.

It is you being optimistic, not the article being optimistic.


Give a human kid a piece of paper and an envelope and say "make it fit" and roughly 1 out of 5 will fail because they have not yet had the opportunity to watch their parents fold a letter.

For your example, I see an upfront memorization component, which is that your request only works with humans who have previously seen how tea is made. That would be an unsupervised AI which watches a YouTube tutorial and then reduces the task to "get hot water + get tea bag + get container + combine".

Please note how cultural memorization is again implicitly added. I might use a trash can as the container, but due to our shared culture we'll agree that a mug works best. So this gets reduced further with more unsupervised AI that resolves "container" to a list of tolerable objects.

Next comes an exploration phase where human and robot just randomly open cupboards to see what is inside. YOLO should be good enough to recognize the kettle, the tea bags, and the mug.

Next up comes memorization again. Kids cannot reliably turn on a machine that they have not seen before, so intelligence is probably of little use. Instead, they learn by imitating. An AI would probably again crawl random YouTube videos, check that the kettle looks similar, then try to imitate that.

I hope I have illustrated that a lot of what we think of as understanding is not much more than repeating a similar situation which we have previously experienced.

That would also be my theory as to why meditating and thinking about an action can actually improve our skill at doing it. We're memorizing a fantasy simulation.


> Depth estimation: It works well enough in practice. https://www.stereolabs.com/zed-2/

I have no experience with the ZED 2, but the ZED 1 did not convince me, and it had similarly fancy marketing visualizations to those featured on the page you linked.

Don't trust anyone who shows you:

- depth estimations from the viewpoint of the camera

- color-coded depth estimations

In neither case can you judge the accuracy.

Vision-based depth estimation is hard in the general case, where you need to cover a non-tiny depth range, work indoors and outdoors, and cannot rely on good texture. All vision-based methods I know of break under circumstances that are not too uncommon for robots.

Laser works much better, but the information density is not that great.

Radar works well in some applications, badly in others.

So I'd consider it "solved" only if you throw a very costly combination of sensors at the problem.


Classic line from this piece:

>This was before the marketing departments of google and other frauds made objective thought about this impossible.


There are some really interesting, if maybe ranty and not-politically-correct, comments on that blog.


> We all know emergent systems are super important in all manner of phenomena, but we have no mathematics or models to deal with them. So we end up with useless horse shit like GPT-3.

Says the author who confesses his lack of domain knowledge in the intro.

So much entitlement. Why didn't he invent something better?


I’m excited to see GPT-3 or similar technology applied to these problems.

A robot can now know lots of common sense: beer is kept in the fridge, the fridge is in the kitchen, cans should not be shaken, and so on. So if something like GPT-3 can help with a high-level plan, and regular computer vision can handle the low-level stuff (obstacle avoidance, fridge and beer identification, and so on), we could have something really interesting.

(Get in touch if you want to work on something like that. I think it would be a blast!)

Edit: what makes me think GPT-3 has some good common sense? I saw someone ask it "can I do a bench press with a cat?" and it said "no, the cat will bite you." It's kind of like we've achieved what that common sense database project was going for.
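
A minimal sketch of that split, assuming a hypothetical ask_llm() call and hand-written skill stubs; none of this is a real robot API, it is just the shape of the idea:

```python
# Hypothetical sketch of "LLM for the high-level plan, classical robotics for
# the low-level skills". ask_llm() and the skill functions it dispatches to are
# made-up placeholders standing in for a real language-model call and real
# perception/control code.

def ask_llm(prompt: str) -> list[str]:
    # Placeholder: a real system would call a language model here and parse
    # its answer. We hard-code a plausible plan for illustration.
    return ["go to kitchen", "find fridge", "open fridge", "find beer",
            "grasp beer", "close fridge", "return to user"]

SKILLS = {
    "go to":     lambda target: print(f"[nav]    driving to {target}"),
    "find":      lambda target: print(f"[vision] searching for {target}"),
    "open":      lambda target: print(f"[manip]  opening {target}"),
    "close":     lambda target: print(f"[manip]  closing {target}"),
    "grasp":     lambda target: print(f"[manip]  grasping {target}"),
    "return to": lambda target: print(f"[nav]    returning to {target}"),
}

def execute(plan: list[str]) -> None:
    for step in plan:
        # Match the longest skill prefix; pass the rest of the step as the target.
        verb = max((v for v in SKILLS if step.startswith(v)), key=len, default=None)
        if verb is None:
            print(f"[warn]   no skill for step: {step!r}")
            continue
        SKILLS[verb](step[len(verb):].strip())

execute(ask_llm("Fetch me a beer from the fridge."))
```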



