I have a very opinionated opposing view of this research. A lot of research in this direction is working on raising the floor: they basically just want the robot to handle a large variety of simple tasks and environments. Fair enough, most industrial robots can't handle the smallest changes. But many times they implicitly assume that raising the floor will also raise the ceiling - that if it can generalize at 90%, it might also be able to do the far more dexterous tasks that humans can. I think this is completely false; at best, if we could do dexterous tasks in one environment, this would transfer them to other environments with presumably lower efficiency.
On the other hand, I think a more promising direction is to raise the ceiling of robot arm manipulation sky high. OpenAI kind of did this with Dactyl, but I would like to see more of it. Can we get robotic arms to tie a shoelace, knit, do pottery, etc. (with an arm-like morphology, no special mechanisms)? I think this can then actually lead to large-scale generalization, kind of like what we're seeing happen with NeRFs now. I would like a robot-arm NeRF: overfit to one hard task, but reproducing it with human-like precision and dexterity. DeepMind's approach (with Gato, RoboCat) seems like a red herring to me; they will never reach the kind of results we want from our arms.
DeepMind is Google, and Google suffers from chronic dabbling and never shipping products. I'd be surprised if they even care much about generalizing to useful tasks.
And about floor vs. ceiling: what's really important is robustness, since only robust robots can be deployed in the wild. At this point Dactyl, with all its fingers, is still too difficult to control; RoboCat got the grippers right. The problem is really that they are again doing cute things with large models instead of raising robustness.
I had a more general, noobish version of this in my head for how all of this plays out.
Now that AI-based task planning can learn from so few examples and can do extremely general tasks, take, say, the instruction:
"Summarize my Gmail twice a day at 7:00 AM and 5:00 PM, filtering spam and stuff I don't read." It would spawn an agent with a plan to do exactly what you want. And if it could not, you could show it once, like "recording" a super smart "macro" - only this time with AI agents instead of Selenium.
This would be an "inversion" of the API way of doing things. Pretty soon people are writing tons of these "bots", and multiple cases land in some court (the Supreme Court?) over what is a bot and what is personal property.
I mean, I am not blind; there are plenty of no-code platforms that work in the B2B space. But there is nothing in B2C for the general-purpose domain that does this in a privacy-safe manner, at least not one that will shake up the tech companies.
It could, but I bet it won't. The UI is not the limiting factor - the SaaS providers are. Automation like this should be easy with regular APIs and code (or no-code tools). It isn't, because service providers don't want you to do it. A skilled dev can get their way with Selenium or some other way of browser automation - this is tolerated, because approximately no one actually does it.
If anyone could easily get an LLM to automate this, you can bet service providers will scramble to find ways to defeat it (justifying it with some bullshit reason like "security", so it's not blatantly obvious they're artificially constraining their product, to prevent you walking around their most profitable toll booths). In general, it doesn't matter what new technology is promising in the abstract - if anything, it's just marketing noise. The technology will deliver only the things its wielders can make money on, and will not deliver things that don't have a good business case. It's the standard tech industry bait-and-switch.
Want proof that this is more than just cynical rambling? Take a look at IFTTT, Zapier, et al. See what you can actually do with them. Notice that it's entirely limited by what the integrated services expose to those automation services. See how carefully those integrations are crafted, so that you can't exactly do much with them. If you want anything non-trivial, you'd better be a business, because individuals aren't supposed to empower themselves with technology. And if you're a business, well, anything is possible if you're willing to pay enough.
The few of us who know how to operate advanced automation tools, and whose frustration outweighs the cost of micromanaging brittle DIY hacks, can beat SaaS tools into submission. Sometimes. But only as long as it stays below the noise floor of vendors' financial metrics.
I had that exact conversation with Chamberlain about MyQ garage door openers. They make it difficult to integrate with and their stated reasons are all about security.
But they'll constantly try to get you to work with their (presumably paying) partners that have far more privileges than you as the lowly product owner.
There isn't a great business case for the latter, so it languishes.
Hostility to automation is a consequence of the “free to use” double-sided market trap that has been flourishing since Web 2.0.
If you are paying for a service, which adequately charges you at least what it costs to run plus a margin, the service should not care whether you are giving them your eyeballs, and you are entitled to request features like automation friendliness, APIs, and e2e encryption, because you can vote with your wallet by changing suppliers.
If advertisers are the ones paying, and specifically paying for your eyeballs, then naturally your interests are at odds.
(There are edge cases where the service may in fact charge you, but 1) their billing structure does not let them cover operational costs, 2) they are afraid to alienate users by raising prices or otherwise hostage to current billing structure, and 3) they are incapable of optimizing performance to turn profit with current billing structure; in those cases crippled API and automation-hostile GUI may serve as a throttle.)
Surprisingly, I don't believe it has anything to do with advertising. The SaaS model is, by its very nature, hostile to its users - even the paying ones.
> If you are paying for a service, which adequately charges you at least what it costs to run plus a margin, the service should not care whether you are giving them your eyeballs,
They may not care about your eyeballs, but their understanding of "adequate charge" is "what the market can bear". More specifically, if they allow for some automation, and that automation saves their customers a noticeable amount of money, then they'll try to capture as much of those savings as possible. This is business as usual - the entire market economy works like this. The problem with SaaS is...
> and you are entitled to request features like automation friendliness, APIs, and e2e encryption, because you can vote with your wallet by changing suppliers.
...that you can't actually vote with your wallet. Software resists commoditization, and SaaS companies are especially good at this. Whatever the service you use, it's highly likely there is no good alternative for it. There may be some competitors whose offering partially overlaps with what you use now, but it's highly likely there's some unique aspect that, combined with hassle of moving your data and overall inertia, will make switching service providers an act of last resort.
Those sticking points often aren't software-related at all! E.g. Netflix and HBO Max and Disney+ aren't really substitutes, because their catalogs do not overlap. Tech-wise, Spotify is mostly a shitty audio player (that gets worse every update) - their entire value was always in the deals they signed with the labels and artists. Etc.
Combine this with your typical SaaS targeting a global market, and you can see the power imbalance. They don't give a shit what you want. You ain't gonna "vote with your wallet". Hardly anyone does. And you're just one user of a hundred thousand, or a million - a rounding error. You're not worth caring about, not unless you're in a position to start a cascade of users leaving - which might happen if they try to screw you over and you're quick to Twitter, but which definitely won't happen just because they made their UI suck on purpose, or put basic automation behind a high price tier.
> 3) they are incapable of optimizing performance to turn profit with current billing structure; in those cases crippled API and automation-hostile GUI may serve as a throttle
That's a possibility in some cases, but I think that, if the "normal" mode of user interaction involves a website or a mobile app written using modern frameworks and development techniques, then it's probably a wash - increased utilization of the core service may be canceled out by reduction in the flood of requests their bloated GUI generates.
I think streaming services are an invalid example, because their value is their catalog. Perhaps it was different at the dawn of Netflix, but now they mostly sell series and movies, not a web service for accessing series and movies.
All actual services are interchangeable, or should be - and if not, then there is a network effect and/or user lock-in due to double-sided market malpractice. In all other cases there is a niche ripe for a competitor to come in and profit!
> increased utilization of the core service may be canceled out by reduction in the flood of requests their bloated GUI generates
I think in most cases a swarm of automated requests to the core service would outweigh the overhead from a front-end GUI; the architecture must be really mismanaged otherwise.
> You ain't gonna "vote with your wallet". Hardly anyone does. And you're just one user of a hundred thousand, or a million - a rounding error.
Why not? This is a defeatist attitude that either ignores how markets work or assumes users are ignorant and never learn.
People do vote with their wallets. Once bitten, you choose services with full export functionality, start reading the ToS, and look at who runs them. Word of mouth helps people make better decisions and pick better vendors, even if those vendors are more expensive. (The only problematic ones are “free” vendors, because for a customer, making a jump from 0 to N is much more difficult than from N to M. So we’re back to ad-supported double-sided markets ruining everything.)
And yes, there is information asymmetry, but there is also regulation that could compensate for it. For example, the government could require services larger than certain size to provide fully featured APIs.
I really really really want to play with robotics at home, but I've discovered it's super expensive to get things like robotic arms at home. Are there robotics platforms that are DIY ready? I really just want to teach a robotic arm to make my coffee in the morning. And I'll apply my life's effort to achieving this only if the cost of the robotics platform is stupid low.
Exact same thought as you. I have actually done robotics research in academia, and I find it ridiculous that an arm costs as much as a car when it's just servos in series. I guess it's economies of scale.
Now that I'm leaving academia, I was trying to figure out how to get a robot arm so that I can continue doing stuff at home. I think an approach like the one here would be good: https://www.trossenrobotics.com/aloha.aspx. A single robot would cost $5k. You could also just buy Dynamixel servos and build it on your own; it shouldn't be too hard with 3D printing, and you'd land closer to $3.5-4k that way.
You could try to go cheaper by building a Dynamixel-style servo on your own. I would not recommend this: the main difficulty is the controller, which requires a lot of tuning, and that tuning is what you're paying for when you buy Dynamixel. But the raw parts of a servo are only around $50-100, I'd guess.
Lastly, any arm you build this way is going to be position-controlled, which is very different from the human arm, which is torque-controlled. Trying to get torque control on your arm is all the rage on YouTube these days, perhaps because the motion looks so lifelike. To do that you need to build a quasi-direct-drive actuator (a BLDC motor with a low gear ratio) and then try to centralize all the motors to reduce inertia. There is no accepted solution for this, but the best results come from cable-driven mechanisms like AmbiDex. Basically, building a human-like robot arm is a huge research problem on its own; if you want to focus on intelligence, it's best to avoid this.
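For what it's worth, position control with off-the-shelf Dynamixel servos really is just writing a goal position over the bus. Below is a minimal sketch using the official Dynamixel SDK for Python; the control-table addresses assume an X-series servo on Protocol 2.0, and the port name and baud rate are assumptions - check the e-manual for your model.

```python
# Minimal position-control sketch using the Dynamixel SDK (pip install dynamixel-sdk).
# Control-table addresses assume an X-series servo / Protocol 2.0; adjust for your model.
from dynamixel_sdk import PortHandler, PacketHandler

PORT = "/dev/ttyUSB0"          # assumed U2D2/USB adapter device name
BAUD = 57600
DXL_ID = 1                     # servo ID on the bus
ADDR_TORQUE_ENABLE = 64
ADDR_GOAL_POSITION = 116
ADDR_PRESENT_POSITION = 132

port = PortHandler(PORT)
packet = PacketHandler(2.0)    # Protocol 2.0
port.openPort()
port.setBaudRate(BAUD)

packet.write1ByteTxRx(port, DXL_ID, ADDR_TORQUE_ENABLE, 1)      # enable torque
packet.write4ByteTxRx(port, DXL_ID, ADDR_GOAL_POSITION, 2048)   # ~180 deg on a 0-4095 scale

pos, comm_result, error = packet.read4ByteTxRx(port, DXL_ID, ADDR_PRESENT_POSITION)
print("present position:", pos)
port.closePort()
```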
Would the MKS SERVO42 work? It installs onto the back of NEMA17 stepper motors, works in closed loop using a magnetic encoder, has to be stepped but can rotate infinitely, and is $20 apiece including a motor. The latest 42D variants seem to work with CAN or RS485 too.
There is FEETECH on AliExpress, at about 0.25-0.5x the Dynamixel price. I've never tried it though. You're right that very few of us want high torque and high accuracy, which is the problem.
The cheapest is to strap things onto a 3 axis 3d printer. They can be had for < $200, which is much less than you could ever build one for.
Beyond that, it's DIY: a set of strong servos and a few budget microcontrollers/motor controllers from Alibaba, and... a 3D printer to manufacture the rest, for something like an arm.
Elephant Robotics has arms starting at around $500. I've had good success with their myCobot280-pi. The reach and payload are a bit limited, but it's fine for playing around.
> After observing 1000 human-controlled demonstrations, collected in just hours, RoboCat could direct this new arm dexterously enough to pick up gears successfully 86% of the time.
Interesting, but this doesn't sound like something that can be useful. For practical industry applications, you need high success rates. Assume you are running operations where dropping, flipping, or damaging the items that a robot handles is costly and not acceptable.
It's only for hobbyist pick-and-place, where dropping a Lego block or wasting some components is not a big deal.
There are plenty of manufacturing environments that involve heavy items. Even with a lift assist, they'll still need 2-3 people to help lift or guide something to the next cell or next place in the line. Lift assists are still expensive, and if that could be replaced with a robotic arm that also cuts out 2-3 employees at the same time? There's going to be a big market for this. Certainly big enough to keep a company afloat while they continue to refine the product and reduce costs.
I would assume bridge cranes handle the majority of the cases that replace human effort. It seems that "successfully 86% of the time" plus "heavy items" usually means "destruction". Some secondary system to verify the grip would be required.
> Assume you are running operations where dropping, flipping, or damaging the items that a robot handles is costly and not acceptable.
Such operations are rare and in those cases you're probably not picking them out of a bin to start with. For most items dropping or flipping the object a few times is perfectly acceptable, indeed that's probably how they got into their current position to start with.
Performance on so few examples is impressive; paired with generalizability across broader tasks and multiple embodiments and environments (and from just visual goals rather than complex verbal instructions), it's quite a jump from where we saw Gato last spring. If representative, it seems a strong step toward meaningful autonomous skill acquisition/transfer in realistic settings.
"Robots are quickly becoming part of our everyday lives"
Are they really?
> This capability will help accelerate robotics research, as it reduces the need for human-supervised training, and is an important step towards creating a general-purpose robot.
The future people are striving to build just seems so fucking creepy to me. I don't really understand the enthusiasm for it. Maybe I would if I saw said future and maybe it would be awesome and I'd regret saying this, but right now, I just don't get it.
When the operator was demonstrating "perturbations", they were reaching into the scene while the robot was very close to grabbing objects they had their fingers near - isn't this dangerous? What if the robot clamped down on their fingers?
This looks like a cobot (Collaborative robot). They are meant to work alongside humans - chance of injury is low because their speed/maximum force exerted is limited.
Everyone seems to dismiss the 'Terminator' scenario in AI, mainly because robots are lagging, or not having breakthroughs as remarkable as GPT (even Boston Dynamics is great, but watch their 'behind the scenes' videos for all the trial and error). This seems like a step in that direction. Hook up some of this, some self-driving car tech, and some GPT-4, and it starts to look a lot more like 'Terminator' is doable, or at least not a complete fantasy.
What more would it take for Google to succeed at something like:
"a robot that can plant, water, and harvest food that only poor laborers from other countries used to do, replacing whole job categories in agribusiness, along with other things like making blue-collar immigration moot"?
Unitree sells a robot similar to Boston Dynamics' Spot for $2,700. If you add a robotic gripper to it, it can do things like picking strawberries cheaper than a minimum-wage worker. Say the robot costs $3,500 and minimum wage is $10/hour; then the robot only needs to complete about 44 eight-hour shifts ($3,500 / $80 per shift) over its lifetime to displace a human worker.
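Back-of-the-envelope version of that claim, using the numbers above as assumptions (and ignoring electricity, maintenance, and what the gripper can actually do):

```python
# Break-even shifts for the hypothetical strawberry-picking robot above.
robot_cost = 3500            # USD: quadruped + gripper (assumed)
hourly_wage = 10             # USD/hour minimum wage (assumed)
shift_hours = 8

cost_per_shift = hourly_wage * shift_hours        # $80 a human costs per shift
breakeven_shifts = robot_cost / cost_per_shift
print(f"break-even after about {breakeven_shifts:.1f} shifts")  # ~43.8 shifts
```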
That is the basic version. The EDU version, which opens up more of the API/interface to the user, costs much more. Also, training a whole body (quadruped + robotic arm with gripper) to do something like picking strawberries remains challenging today.
It's challenging, but it looks like Boston Dynamics is able to do those kinds of tasks. For instance, they have a built-in feature for Spot with the robotic arm that can open a door by turning the door handle. And apparently it works with most doors.
Not sure how helpful that would be, I don't think most of the cost of robotics manufacturing is assembly. Precision machining would be part of it, as would actuators and batteries, all of which require their own specialised manufacturing chains that I doubt DeepMind can improve by an order of magnitude. And then of course there's the chips, which are very important and unlikely to drop in cost significantly from anything DeepMind can do. Of course your floor is the cost of materials, which is going to be based on labour and exchange value from mining and refining, particularly of lithium and aluminium, neither of which I think is particularly susceptible to any advances DeepMind is likely to make.
If someone from twenty years in the future appeared in my living room and informed me with certainty that robots are 100x cheaper in 2043 and asked me how that could be true, my first guess would be that there are significant energy breakthroughs making aluminium and lithium much cheaper as raw materials, and that robots took off as a very popular consumer product and are now sold in quantities between 100 million and 1 billion units every year, driving prices down through economies of scale. I'm not certain that would be enough on its own to get prices down that far, but it would be my best guess.
Why say “self-improving” when “learning” would suffice? Or is there a nuance I am missing?
I’ve heard people even use the term “self-learning” and wonder why!