MIT Technology Review
Is robotics about to have its own ChatGPT moment?

Silent. Rigid. Clumsy.

Henry and Jane Evans are used to awkward houseguests. For more than a decade, the couple, who live in Los Altos Hills, California, have hosted a slew of robots in their home.

In 2002, at age 40, Henry had a massive stroke, which left him with quadriplegia and an inability to talk. Since then, he’s learned how to communicate by moving his eyes over a letter board, but he is highly reliant on caregivers and his wife, Jane.

Henry got a glimmer of a different kind of life when he saw Charlie Kemp on CNN in 2010. Kemp, a robotics professor at Georgia Tech, was on TV talking about PR2, a robot developed by the company Willow Garage. PR2 was a huge two-armed machine on wheels that looked like a crude metal butler. Kemp was demonstrating how the robot worked, and talking about his research on how health-care robots could help people. He showed how the PR2 robot could hand some medicine to the TV host.

“Unexpectedly, Henry turns to me and says, ‘Why can’t that robot be an extension of my body?’ And I said, ‘Why not?’” Jane says. 

There was a solid reason why not. While engineers have made great progress in getting robots to work in tightly controlled environments like labs and factories, the home has proved difficult to design for. Out in the real, messy world, furniture and floor plans differ wildly; children and pets can jump in a robot’s way; and clothes that need folding come in different shapes, colors, and sizes. Managing such unpredictable settings and varied conditions has been beyond the capabilities of even the most advanced robot prototypes.

That seems to finally be changing, largely thanks to artificial intelligence. For decades, roboticists have more or less focused on controlling robots’ “bodies”—their arms, legs, levers, wheels, and the like—via purpose-driven software. But a new generation of scientists and inventors believes that the previously missing ingredient of AI can give robots the ability to learn new skills and adapt to new environments faster than ever before. This new approach, just maybe, can finally bring robots out of the factory and into our homes.

Progress won’t happen overnight, though, as the Evanses know far too well from their many years of using various robot prototypes.

PR2 was the first robot they brought in, and it opened entirely new skills for Henry. It would hold a beard shaver and Henry would move his face against it, allowing him to shave and scratch an itch by himself for the first time in a decade. But at 450 pounds (200 kilograms) or so and $400,000, the robot was difficult to have around. “It could easily take out a wall in your house,” Jane says. “I wasn’t a big fan.”

More recently, the Evanses have been testing out a smaller robot called Stretch, which Kemp developed through his startup Hello Robot. The first iteration launched during the pandemic with a much more reasonable price tag of around $18,000.

Stretch weighs about 50 pounds. It has a small mobile base, a stick with a camera dangling off it, and an adjustable arm featuring a gripper with suction cups on the ends. It can be controlled with a console controller. Henry controls Stretch using a laptop, with a tool that tracks his head movements to move a cursor around. He is able to move his thumb and index finger enough to click a computer mouse. Last summer, Stretch was with the couple for more than a month, and Henry says it gave him a whole new level of autonomy. “It was practical, and I could see using it every day,” he says.

Henry Evans used the Stretch robot to brush his hair, eat, and even play with his granddaughter.
PETER ADAMS

Using his laptop, he could get the robot to brush his hair and have it hold fruit kebabs for him to snack on. It also opened up Henry’s relationship with his granddaughter Teddie. Before, they barely interacted. “She didn’t hug him at all goodbye. Nothing like that,” Jane says. But “Papa Wheelie” and Teddie used Stretch to play, engaging in relay races, bowling, and magnetic fishing.

Stretch doesn’t have much in the way of smarts: it comes with some preinstalled software, such as the web interface that Henry uses to control it, and other capabilities such as AI-enabled navigation. The main benefit of Stretch is that people can plug in their own AI models and use them to do experiments. But it offers a glimpse of what a world with useful home robots could look like. Robots that can do many of the things humans do in the home—tasks such as folding laundry, cooking meals, and cleaning—have been a dream of robotics research since the inception of the field in the 1950s. For a long time, it’s been just that: “Robotics is full of dreamers,” says Kemp.

But the field is at an inflection point, says Ken Goldberg, a robotics professor at the University of California, Berkeley. Previous efforts to build a useful home robot, he says, have emphatically failed to meet the expectations set by popular culture—think the robotic maid from The Jetsons. Now things are very different. Thanks to cheap hardware like Stretch, along with efforts to collect and share data and advances in generative AI, robots are getting more competent and helpful faster than ever before. “We’re at a point where we’re very close to getting capability that is really going to be useful,” Goldberg says.

Folding laundry, cooking shrimp, wiping surfaces, unloading shopping baskets—today’s AI-powered robots are learning to do tasks that for their predecessors would have been extremely difficult.

Missing pieces

There’s a well-known observation among roboticists: What is hard for humans is easy for machines, and what is easy for humans is hard for machines. Called Moravec’s paradox, it was first articulated in the 1980s by Hans Moravec, then a roboticist at the Robotics Institute of Carnegie Mellon University. A robot can play chess or hold an object still for hours on end with no problem. Tying a shoelace, catching a ball, or having a conversation is another matter.

There are three reasons for this, says Goldberg. First, robots lack precise control and coordination. Second, their understanding of the surrounding world is limited because they are reliant on cameras and sensors to perceive it. Third, they lack an innate sense of practical physics.

“Pick up a hammer, and it will probably fall out of your gripper, unless you grab it near the heavy part. But you don’t know that if you just look at it, unless you know how hammers work,” Goldberg says.

On top of these basic considerations, there are many other technical things that need to be just right, from motors to cameras to Wi-Fi connections, and hardware can be prohibitively expensive.

Mechanically, we’ve been able to do fairly complex things for a while. In a video from 1957, two large robotic arms are dexterous enough to pinch a cigarette, place it in the mouth of a woman at a typewriter, and reapply her lipstick. But the intelligence and the spatial awareness of that robot came from the person who was operating it.

""
In a video from 1957, a person operates two large robotic arms and uses the machine to apply a woman’s lipstick. Robots have come a long way since.
“LIGHTER SIDE OF THE NEWS –ATOMIC ROBOT A HANDY GUY” (1957) VIA YOUTUBE

“The missing piece is: How do we get software to do [these things] automatically?” says Deepak Pathak, an assistant professor of computer science at Carnegie Mellon.

Researchers training robots have traditionally approached this problem by planning everything the robot does in excruciating detail. Robotics giant Boston Dynamics used this approach when it developed its boogying and parkouring humanoid robot Atlas. Cameras and computer vision are used to identify objects and scenes. Researchers then use that data to build models that can predict with extreme precision what will happen if a robot moves a certain way. Using these models, roboticists plan the motions of their machines by writing a very specific list of actions for them to take. The engineers then test these motions in the laboratory repeatedly and tweak them to perfection.
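As a rough sketch of what that hand-scripted style of control looks like, the toy Python snippet below hard-codes a pick-and-place routine as a fixed list of waypoints. The Arm class and every pose in it are hypothetical stand-ins rather than any real robot’s API; the point is simply that each number has to be chosen and tuned by an engineer for one specific setup.

```python
# Illustrative sketch of the classical, scripted approach (not any vendor's
# actual code): a hypothetical arm executes a hand-tuned list of waypoints.

from dataclasses import dataclass

@dataclass
class Pose:
    x: float
    y: float
    z: float

class Arm:
    """Hypothetical arm controller: prints each commanded motion."""
    def move_to(self, pose: Pose) -> None:
        print(f"moving gripper to ({pose.x:.2f}, {pose.y:.2f}, {pose.z:.2f})")

    def close_gripper(self) -> None:
        print("closing gripper")

    def open_gripper(self) -> None:
        print("opening gripper")

# Every waypoint below is chosen and tweaked by an engineer for one specific
# table, one specific mug, one specific lab. Move the mug and the plan fails.
PICK_AND_PLACE_PLAN = [
    ("move_to", Pose(0.40, 0.10, 0.30)),   # hover above the mug
    ("move_to", Pose(0.40, 0.10, 0.12)),   # descend to grasp height
    ("close_gripper", None),
    ("move_to", Pose(0.40, 0.10, 0.30)),   # lift
    ("move_to", Pose(0.10, -0.25, 0.30)),  # carry to the drop-off point
    ("open_gripper", None),
]

arm = Arm()
for command, argument in PICK_AND_PLACE_PLAN:
    method = getattr(arm, command)
    method(argument) if argument is not None else method()
```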

This approach has its limits. Robots trained like this are strictly choreographed to work in one specific setting. Take them out of the laboratory and into an unfamiliar location, and they are likely to topple over.

Compared with other fields, such as computer vision, robotics has been in the dark ages, Pathak says. But that might not be the case for much longer, because the field is seeing a big shake-up. Thanks to the AI boom, he says, the focus is now shifting from feats of physical dexterity to building “general-purpose robot brains” in the form of neural networks. Much as the human brain is adaptable and can control different aspects of the human body, these networks can be adapted to work in different robots and different scenarios. Early signs of this work show promising results.

Robots, meet AI 

For a long time, robotics research was an unforgiving field, plagued by slow progress. At the Robotics Institute at Carnegie Mellon, where Pathak works, he says, “there used to be a saying that if you touch a robot, you add one year to your PhD.” Now, he says, students get exposure to many robots and see results in a matter of weeks.

What separates this new crop of robots is their software. Instead of the traditional painstaking planning and training, roboticists have started using deep learning and neural networks to create systems that learn from their environment on the go and adjust their behavior accordingly. At the same time, new, cheaper hardware, such as off-the-shelf components and robots like Stretch, is making this sort of experimentation more accessible.

Broadly speaking, there are two popular ways researchers are using AI to train robots. Pathak has been using reinforcement learning, an AI technique that allows systems to improve through trial and error, to get robots to adapt their movements in new environments. This is a technique that Boston Dynamics has also started using in its robot “dogs” called Spot.

“EXTREME PARKOUR WITH LEGGED ROBOTS,” XUXIN CHENG, ET AL.

Deepak Pathak’s team at Carnegie Mellon has used an AI technique called reinforcement learning to create a robotic dog that can do extreme parkour with minimal pre-programming.

In 2022, Pathak’s team used this method to create four-legged robot “dogs” capable of scrambling up steps and navigating tricky terrain. The robots were first trained to move around in a general way in a simulator. Then they were set loose in the real world, with a single built-in camera and computer vision software to guide them. Other similar robots rely on tightly prescribed internal maps of the world and cannot navigate beyond them.

Pathak says the team’s approach was inspired by human navigation. Humans receive information about the surrounding world from their eyes, and this helps them instinctively place one foot in front of the other to get around in an appropriate way. Humans don’t typically look down at the ground under their feet when they walk, but a few steps ahead, at a spot where they want to go. Pathak’s team trained its robots to take a similar approach to walking: each one used the camera to look ahead. The robot was then able to memorize what was in front of it for long enough to guide its leg placement. The robots learned about the world in real time, without internal maps, and adjusted their behavior accordingly. At the time, experts told MIT Technology Review the technique was a “breakthrough in robot learning and autonomy” and could allow researchers to build legged robots capable of being deployed in the wild.

Pathak’s robot dogs have since leveled up. The team’s latest algorithm allows a quadruped robot to do extreme parkour. The robot was again trained to move around in a general way in a simulation. But then, using reinforcement learning, it was able to teach itself new skills on the go, such as how to jump long distances, walk on its front legs, and clamber up tall boxes twice its height. These behaviors were not something the researchers programmed. Instead, the robot learned through trial and error and visual input from its front camera. “I didn’t believe it was possible three years ago,” Pathak says.
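The core recipe behind the acrobatics is simpler than it looks: try an action, score the result, and keep whatever scores better. The sketch below compresses that loop into a few lines of Python under heavy simplifying assumptions—a toy one-dimensional “terrain” stands in for the physics simulator, and naive random search over a single policy parameter stands in for the deep reinforcement-learning algorithms the team actually uses.

```python
# A minimal trial-and-error (reinforcement-learning-style) sketch, not the
# team's actual training code: a toy simulator rewards a "robot" whose stride
# matches the terrain slope it observes, and random search improves the policy.

import random

def simulate_episode(policy_gain: float, steps: int = 50) -> float:
    """Toy simulator: returns how far the 'robot' travels without stumbling."""
    position = 0.0
    for t in range(steps):
        slope = 0.5 * ((t % 10) - 5) / 5      # terrain seen by the "camera"
        stride = policy_gain * slope           # policy: action from observation
        stumble = abs(stride - slope)          # mismatched stride = stumble
        position += max(0.0, 0.2 - stumble)    # progress only when stable
    return position

# Trial and error: propose a random tweak, keep it only if the reward improves.
best_gain = 0.0
best_reward = simulate_episode(best_gain)
for trial in range(200):
    candidate = best_gain + random.gauss(0.0, 0.1)
    reward = simulate_episode(candidate)
    if reward > best_reward:
        best_gain, best_reward = candidate, reward

print(f"learned gain {best_gain:.2f}, distance travelled {best_reward:.2f}")
```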

In the other popular technique, called imitation learning, models learn to perform tasks by, for example, imitating the actions of a human teleoperating a robot or using a VR headset to collect data on a robot. It’s a technique that has gone in and out of fashion over decades but has recently become more popular with robots that do manipulation tasks, says Russ Tedrake, vice president of robotics research at the Toyota Research Institute and an MIT professor.

By pairing this technique with generative AI, researchers at the Toyota Research Institute, Columbia University, and MIT have been able to quickly teach robots to do many new tasks. They believe they have found a way to extend the technology propelling generative AI from the realm of text, images, and videos into the domain of robot movements.

The idea is to start with a human, who manually controls the robot to demonstrate behaviors such as whisking eggs or picking up plates. Using a technique called diffusion policy, the robot is then able to use the data fed into it to learn skills. The researchers have taught robots more than 200 skills, such as peeling vegetables and pouring liquids, and say they are working toward teaching 1,000 skills by the end of the year.
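At its simplest, imitation learning means fitting a function that maps what the robot observes to the action a human demonstrator took. The sketch below shows that bare-bones version—plain behavior cloning with a linear model on synthetic “demonstrations”—and not the diffusion-policy method the researchers use, which replaces the linear fit with a generative model over whole action sequences.

```python
# A deliberately simplified imitation-learning sketch: behavior cloning with a
# linear model on synthetic demonstration data (not the diffusion-policy code).

import numpy as np

rng = np.random.default_rng(0)

# Pretend a human teleoperated the robot 20 times: each demonstration pairs an
# observation (e.g. object position relative to the gripper) with the action taken.
true_mapping = np.array([[0.8, 0.1], [-0.2, 0.9]])   # unknown to the learner
observations = rng.uniform(-1, 1, size=(20, 2))
actions = observations @ true_mapping.T + rng.normal(0, 0.01, size=(20, 2))

# "Training" = fit a policy that reproduces the demonstrated actions.
policy, *_ = np.linalg.lstsq(observations, actions, rcond=None)

# At run time the robot sees a new observation and imitates the demonstrator.
new_observation = np.array([0.3, -0.5])
predicted_action = new_observation @ policy
print("commanded action:", np.round(predicted_action, 3))
```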

Many others have taken advantage of generative AI as well. Covariant, a robotics startup that spun off from OpenAI’s now-shuttered robotics research unit, has built a multimodal model called RFM-1. It can accept prompts in the form of text, image, video, robot instructions, or measurements. Generative AI allows the robot to both understand instructions and generate images or videos relating to those tasks.

The Toyota Research Institute team hopes this will someday lead to “large behavior models,” which are analogous to large language models, says Tedrake. “A lot of people think behavior cloning is going to get us to a ChatGPT moment for robotics,” he says.

In a similar demonstration, earlier this year a team at Stanford managed to use a relatively cheap off-the-shelf robot costing $32,000 to do complex manipulation tasks such as cooking shrimp and cleaning stains. It learned those new skills quickly with AI.

Called Mobile ALOHA (a loose acronym for “a low-cost open-source hardware teleoperation system”), the robot learned to cook shrimp with the help of just 20 human demonstrations and data from other tasks, such as tearing off a paper towel or piece of tape. The Stanford researchers found that AI can help robots acquire transferable skills: training on one task can improve its performance on others.

TOYOTA RESEARCH INSTITUTE

While the current generation of generative AI works with images and language, researchers at the Toyota Research Institute, Columbia University, and MIT believe the approach can extend to the domain of robot motion.

This is all laying the groundwork for robots that can be useful in homes. Human needs change over time, and teaching robots to reliably do a wide range of tasks is important, as it will help them adapt to us. That is also crucial to commercialization—first-generation home robots will come with a hefty price tag, and the robots need to have enough useful skills for regular consumers to want to invest in them.

For a long time, a lot of the robotics community was very skeptical of these sorts of approaches, says Chelsea Finn, an assistant professor of computer science and electrical engineering at Stanford University and an advisor for the Mobile ALOHA project. Finn says that nearly a decade ago, learning-based approaches were rare at robotics conferences and disparaged in the robotics community. “The [natural-language-processing] boom has been convincing more of the community that this approach is really, really powerful,” she says.

There is one catch, however. In order to imitate new behaviors, the AI models need plenty of data.

More is more

Unlike chatbots, which can be trained by using billions of data points hoovered from the internet, robots need data specifically created for robots. They need physical demonstrations of how washing machines and fridges are opened, dishes picked up, or laundry folded, says Lerrel Pinto, an assistant professor of computer science at New York University. Right now that data is very scarce, and it takes a long time for humans to collect.

A person records themselves opening a kitchen drawer with a grabber stick (top); a robot attempts the same action (bottom).

“ON BRINGING ROBOTS HOME,” NUR MUHAMMAD (MAHI) SHAFIULLAH, ET AL.

Some researchers are trying to use existing videos of humans doing things to train robots, hoping the machines will be able to copy the actions without the need for physical demonstrations.

Pinto’s lab has also developed a neat, cheap data collection approach that connects robotic movements to desired actions. Researchers took a reacher-grabber stick, similar to ones used to pick up trash, and attached an iPhone to it. Human volunteers can use this setup to film themselves doing household chores, mimicking the robot’s view of the end of its robotic arm. Using this stand-in for Stretch’s robotic arm and an open-source system called DOBB-E, Pinto’s team was able to get a Stretch robot to learn tasks such as pouring from a cup and opening shower curtains with just 20 minutes of iPhone data.
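Each clip of that kind boils down to a sequence of camera frames paired with where the grabber was and whether it was squeezed shut. The sketch below shows one plausible way to organize such recordings; the class and field names are illustrative assumptions, not DOBB-E’s actual data schema.

```python
# A hypothetical layout for grabber-stick demonstration data (illustrative
# only, not DOBB-E's real format): each clip becomes a list of frames pairing
# an image with the estimated gripper pose and open/closed state.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DemoFrame:
    timestamp_s: float              # time within the clip
    image_path: str                 # frame captured by the iPhone
    gripper_xyz: Tuple[float, float, float]  # estimated position of the grabber tip
    gripper_closed: bool            # whether the grabber is squeezed shut

@dataclass
class Demonstration:
    task: str                       # e.g. "open shower curtain"
    frames: List[DemoFrame]

demo = Demonstration(
    task="pour from a cup",
    frames=[
        DemoFrame(0.0, "clip01/frame000.jpg", (0.42, 0.10, 0.31), False),
        DemoFrame(0.5, "clip01/frame015.jpg", (0.41, 0.09, 0.22), True),
    ],
)
print(f"{demo.task}: {len(demo.frames)} frames of demonstration data")
```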

But for more complex tasks, robots would need much more data and many more demonstrations.

The requisite scale would be hard to reach with DOBB-E, says Pinto, because you’d basically need to persuade every human on Earth to buy the reacher-grabber setup, collect data, and upload it to the internet.

A new initiative kick-started by Google DeepMind, called the Open X-Embodiment Collaboration, aims to change that. Last year, the company partnered with 34 research labs and about 150 researchers to collect data from 22 different robots, including Hello Robot’s Stretch. The resulting data set, which was published in October 2023, consists of robots demonstrating 527 skills, such as picking, pushing, and moving.

Sergey Levine, a computer scientist at UC Berkeley who participated in the project, says the goal was to create a “robot internet” by collecting data from labs around the world. This would give researchers access to bigger, more scalable, and more diverse data sets. The deep-learning revolution that led to the generative AI of today started in 2012 with the rise of ImageNet, a vast online data set of images. The Open X-Embodiment Collaboration is an attempt by the robotics community to do something similar for robot data.
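Pooling data this way means reconciling robots that record their actions in different units and conventions. The toy sketch below illustrates the idea: episodes from two hypothetical robots are normalized into one shared schema so a single model could train on both. The format and field names are invented for illustration and are not the project’s actual data standard.

```python
# Illustrative sketch of pooling heterogeneous robot data (not the real
# Open X-Embodiment format): episodes with different action conventions are
# converted into one shared schema.

import math

episodes = [
    {"robot": "robot_a", "skill": "pick", "actions_deg": [12.0, -4.5, 3.3]},
    {"robot": "robot_b", "skill": "push", "actions_rad": [0.21, -0.08]},
]

def to_common_format(episode: dict) -> dict:
    """Convert a lab-specific episode into a shared schema (angles in radians)."""
    if "actions_deg" in episode:
        actions = [math.radians(a) for a in episode["actions_deg"]]
    else:
        actions = list(episode["actions_rad"])
    return {"robot": episode["robot"], "skill": episode["skill"], "actions": actions}

pooled = [to_common_format(e) for e in episodes]
robot_types = {e["robot"] for e in pooled}
print(f"pooled {len(pooled)} episodes from {len(robot_types)} robot types")
```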

Early signs show that more data is leading to smarter robots. The researchers built two versions of a model for robots, called RT-X, that could be either run locally on individual labs’ computers or accessed via the web. The larger, web-accessible model was pretrained with internet data to develop a “visual common sense,” or a baseline understanding of the world, from large language and image models.

When the researchers ran the RT-X model on many different robots, they found that the robots were able to learn skills 50% more successfully than with the systems each individual lab was developing.

“I don’t think anybody saw that coming,” says Vincent Vanhoucke, Google DeepMind’s head of robotics. “Suddenly there is a path to basically leveraging all these other sources of data to bring about very intelligent behaviors in robotics.”

Many roboticists think that large vision-language models, which are able to analyze image and language data, might offer robots important hints as to how the surrounding world works, Vanhoucke says. They offer semantic clues about the world and could help robots with reasoning, deducing things, and learning by interpreting images. To test this, researchers took a robot that had been trained on the larger model and asked it to point to a picture of Taylor Swift. The researchers had not shown the robot pictures of Swift, but it was still able to identify the pop star because it had a web-scale understanding of who she was even without photos of her in its data set, says Vanhoucke.

""
RT-2, a recent model for robotic control, was trained on online text and images as well as interactions with the real world.
KELSEY MCCLELLAN

Vanhoucke says Google DeepMind is increasingly using techniques similar to those it would use for machine translation to translate from English to robotics. Last summer, Google introduced a vision-language-action model called RT-2. This model gets its general understanding of the world from the online text and images it has been trained on, as well as from its own interactions in the real world. It translates that data into robotic actions. Each robot has a slightly different way of translating English into action, he adds.

“We increasingly feel like a robot is really a chatbot that speaks robotese,” Vanhoucke says.
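One way to picture “robotese” is to treat motions the way a language model treats words: discretize them into a small vocabulary of tokens that the model can emit one after another. The sketch below illustrates only that decoding step, with an invented six-token vocabulary; it is not RT-2’s actual action encoding.

```python
# Illustrative sketch of the "actions as tokens" idea (not RT-2's real
# encoding): predicted token ids are decoded into named robot commands.

ACTION_VOCAB = {
    0: "gripper_open",
    1: "gripper_close",
    2: "move_x+",
    3: "move_x-",
    4: "move_z+",
    5: "move_z-",
}

def decode_action_tokens(tokens):
    """Turn a sequence of predicted token ids into named robot commands."""
    return [ACTION_VOCAB[t] for t in tokens]

# Imagine a vision-language-action model consumed a camera image plus the
# instruction "pick up the apple" and produced these token ids as its output:
predicted_tokens = [2, 2, 5, 1, 4]
print(decode_action_tokens(predicted_tokens))
# ['move_x+', 'move_x+', 'move_z-', 'gripper_close', 'move_z+']
```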

Baby steps

Despite the fast pace of development, robots still face many challenges before they can be released into the real world. They’re still way too clumsy for regular consumers to justify spending tens of thousands of dollars on them. Robots also still lack the kind of common sense that would allow them to multitask. And they need to move from just picking things up and placing them somewhere to putting things together, says Goldberg—for example, putting a deck of cards or a board game back in its box and then into the games cupboard.

But to judge from the early results of integrating AI into robots, roboticists are not wasting their time, says Pinto.

“I feel fairly confident that we will see some semblance of a general-purpose home robot. Now, will it be accessible to the general public? I don’t think so,” he says. “But in terms of raw intelligence, we are already seeing signs right now.”

Building the next generation of robots might not just assist humans in their everyday chores or help people like Henry Evans live a more independent life. For researchers like Pinto, there is an even bigger goal in sight.

Home robotics offers one of the best benchmarks for human-level machine intelligence, he says. The fact that a human can operate intelligently in the home environment, he adds, means we know this is a level of intelligence that can be reached.

“It’s something which we can potentially solve. We just don’t know how to solve it,” he says.

Thanks to Stretch, Henry Evans was able to hold his own playing cards for the first time in two decades.
VY NGUYEN

For Henry and Jane Evans, a big win would be to get a robot that simply works reliably. The Stretch robot that the Evanses experimented with is still too buggy to use without researchers present to troubleshoot, and their home doesn’t always have the dependable Wi-Fi connectivity Henry needs in order to communicate with Stretch using a laptop.

Even so, Henry says, one of the greatest benefits of his experiment with robots has been independence: “All I do is lay in bed, and now I can do things for myself that involve manipulating my physical environment.”

Thanks to Stretch, for the first time in two decades, Henry was able to hold his own playing cards during a match.

“I kicked everyone’s butt several times,” he says. 

“Okay, let’s not talk too big here,” Jane says, and laughs.
