OpenAI Robot Hand: Today Rubik’s Cube, Tomorrow the Real World?

Synced | Published in SyncedReview | Oct 15, 2019

Humans practiced for thousands of years to master the games of Go and chess, but today’s AI-enabled machines can beat our best after just a few days of training. We’ve lost the brain race, but humans still have unmatched dexterity, right? Wrong. OpenAI’s humanlike five-fingered gripper Dactyl just single-handedly solved a Rubik’s cube.

The feat builds on the San Francisco-based research company’s 2018 “Learning Dexterity” work, in which Dactyl first learned to perform complex in-hand manipulations. Peter Welinder, an OpenAI research scientist on the project, told Synced that when the researchers were selecting tasks for Dactyl, they wanted something concrete that could also have general-purpose applications.

The OpenAI researchers used reinforcement learning to train a control policy that makes the right moves based on the state of the cube and of the hand’s fingers. Training Dactyl employed 64 NVIDIA V100 GPUs and 920 worker machines with 32 CPU cores each, amounting to roughly 13,000 years of simulated experience on the task.

The system was trained entirely in simulation — it starts off knowing nothing about its robotic hand, the cube, or how the two might physically interact and to what end. Researchers gave the algorithm rewards when it managed to rotate the face of the cube or flip the cube over. Eventually, it learned how to perform the right moves required to solve the Rubik’s cube.
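
For readers who like to see the shape of such a setup, here is a minimal, purely illustrative reward sketch in Python. The distance measure, threshold and bonus values are assumptions made for illustration, not OpenAI’s actual code.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CubeState:
    face_angles: np.ndarray   # rotation angle of each of the 6 faces (radians)
    orientation: np.ndarray   # orientation of the whole cube (illustrative encoding)

def subgoal_reward(state: CubeState, goal: CubeState,
                   prev_distance: float, dropped: bool):
    """Dense reward for progress toward the current subgoal (rotate one face
    or flip the cube over), plus a bonus on success and a penalty for drops."""
    distance = (np.abs(state.face_angles - goal.face_angles).sum()
                + np.abs(state.orientation - goal.orientation).sum())
    reward = prev_distance - distance      # reward any progress toward the subgoal
    if distance < 0.1:                     # subgoal reached: face rotated or cube flipped
        reward += 5.0
    if dropped:                            # dropping the cube is heavily penalized
        reward -= 20.0
    return reward, distance
```

The design idea is the one described above: the policy is paid for getting closer to the next face rotation or flip, rewarded extra when it achieves it, and punished for dropping the cube.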

To give the system a way to “sense” the cube’s state, OpenAI researchers trained a convolutional neural network to predict that state from camera images rendered from three different angles.
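
As a rough idea of what such a vision model can look like, here is a minimal PyTorch sketch, not OpenAI’s actual network; the layer sizes, the three-camera input shape and the dimensionality of the predicted state are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CubeStateEstimator(nn.Module):
    def __init__(self, state_dim: int = 26):   # size of the predicted cube state (assumed)
        super().__init__()
        # Shared convolutional encoder applied to each of the three camera views
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Fuse the three per-view embeddings and regress the cube state
        self.head = nn.Sequential(
            nn.Linear(3 * 64, 128), nn.ReLU(),
            nn.Linear(128, state_dim),
        )

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (batch, 3 cameras, 3 channels, height, width)
        feats = [self.encoder(views[:, i]) for i in range(views.shape[1])]
        return self.head(torch.cat(feats, dim=1))

# Example: a batch of 4 rendered frames from 3 cameras at 200x200 pixels
estimate = CubeStateEstimator()(torch.randn(4, 3, 3, 200, 200))
```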

Whereas for humans solving Rubik’s cube is mainly a mental challenge, aligning the colours is child’s play for robots. The real hurdle for a computer system is learning the complexity of the gripper hardware and how to correctly coordinate the movements of fingers and joints relative to the cube. For Dactyl to handle the real-world complexity, researchers had to ensure the simulation covered as many scenarios as possible.

They did that using an automatic domain randomization (ADR) method, an extension of the domain randomization technique they used in last year’s block rotation task. “The way it works is basically, in simulation, you randomize all kinds of things. For example, you might make it harder to move one of the fingers of the hand, or you might make the cube heavier or bigger, or you might make it more slippery,” says Welinder.
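
In code, plain domain randomization can be as simple as re-sampling a handful of physics parameters at the start of every simulated episode. The sketch below is framework-agnostic Python with made-up parameter names and ranges, just to illustrate the idea Welinder describes.

```python
import random

# Illustrative parameter ranges only; not the values used in the paper
RANDOMIZATION_RANGES = {
    "cube_size_scale": (0.95, 1.05),
    "cube_mass_scale": (0.7, 1.3),
    "friction_scale":  (0.5, 1.5),
    "finger_damping":  (0.8, 1.2),
}

def sample_environment_params() -> dict:
    """Draw one random physics configuration for the next simulated episode."""
    return {name: random.uniform(lo, hi)
            for name, (lo, hi) in RANDOMIZATION_RANGES.items()}
```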

Researchers built the simulation with MuJoCo, an off-the-shelf physics simulator, and randomized its parameters. Says Welinder, “We extended this domain randomization to make it automatic, where how much you randomize all these parameters increases automatically.” Researchers applied ADR to the training of both the control policy and the cube state estimator.

Another OpenAI research scientist on the project, Lilian Weng, says it took months of work before the robot hand was robust enough for the real world. “We follow a curriculum process to increase the randomization and assimilation. For example, if we set the friction to a certain range, or the cube size to a certain range, we don’t just train the model on the hardest cases. Instead, we sample across the whole range, so the model has to handle all the cases within this range. The model has to be able to solve a broader and broader distribution of randomization.”
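
A toy sketch of the ADR idea Weng describes: each parameter keeps a range that starts at a single nominal value, training samples uniformly across the whole current range, and the range is widened only when the policy performs well at its boundaries. The threshold and step size below are illustrative assumptions, not the paper’s settings.

```python
import random

class ADRParameter:
    """One automatically widening randomization range (illustrative sketch)."""
    def __init__(self, nominal: float, step: float = 0.02):
        self.low, self.high, self.step = nominal, nominal, step

    def sample(self) -> float:
        # Train on the whole current range, not just the hardest values
        return random.uniform(self.low, self.high)

    def maybe_expand(self, success_rate_at_boundary: float, threshold: float = 0.8):
        # If the policy succeeds often enough at the current extremes,
        # push both boundaries outward to broaden the distribution
        if success_rate_at_boundary >= threshold:
            self.low -= self.step
            self.high += self.step

cube_mass = ADRParameter(nominal=1.0)
cube_mass.maybe_expand(success_rate_at_boundary=0.9)   # range grows to [0.98, 1.02]
mass_scale_for_next_episode = cube_mass.sample()
```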

While its dexterity is impressive, Dactyl still struggles with speed. Unlike chess or Go where the robot directly confronts an opponent, mastering Rubik’s cube is all about the time required to perform the task. “It doesn’t really matter whether you’re competing with humans or robots,” says Weixing Zhang of the World Cube Association.

Dactyl does not yet pose a threat to cubers like Zhang. It currently needs four to seven minutes on average to solve a 3x3x3 Rubik’s cube, while Zhang’s one-handed personal record is 12.59 seconds, and a new one-handed world record of 6.82 seconds was just set at last Sunday’s Bay Area Speedcubin’ 20 2019 competition. (Last year, a robot developed by two MIT engineering and computer science students solved a Rubik’s cube in 0.38 seconds, but that was a task-specific machine, not a humanlike gripper with wider real-world applications.)

Zhang says he’s not at all worried about robots prevailing in speed cubing competitions, which he treats as an opportunity to hang out and have fun with like-minded friends. “I doubt that any AI can do the latter very well!”

Welinder meanwhile stresses OpenAI’s goal is not to create a cube champion, but “to see how far we can push dexterity. Ultimately, we want to build robots that are more general purpose, that can do tasks that are more useful to humans.”

Dactyl was trained on an ever-growing distribution of randomized environments with a memory-augmented policy, which enabled robust performance even in scenarios it was unfamiliar with. In some experiments, for example, researchers intentionally interfered with the robot hand by binding some of its fingers or sheathing it in a rubber glove. Even in such situations, which the robot had not experienced during training, it still soldiered on and managed to solve the Rubik’s cube.
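
The “memory-augmented policy” is, in essence, a recurrent controller whose hidden state lets it adapt on the fly when the dynamics change, for instance when fingers are suddenly bound. Below is a minimal recurrent-policy sketch in PyTorch; the LSTM core matches the general idea, but the observation and action sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim: int = 60, action_dim: int = 20, hidden: int = 256):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.action_head = nn.Linear(hidden, action_dim)   # e.g. target joint positions

    def forward(self, obs_seq: torch.Tensor, state=None):
        # obs_seq: (batch, time, obs_dim). The carried-over `state` is the memory
        # that lets the policy infer, mid-episode, how this particular hand and
        # cube happen to behave.
        out, state = self.lstm(obs_seq, state)
        return self.action_head(out), state

actions, memory = RecurrentPolicy()(torch.randn(1, 10, 60))   # 10 control steps
```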

The reinforcement learning algorithm used to train Dactyl lets the neural network learn through trial and error in simulation. It is the same algorithm OpenAI used to train the bots that beat human pros in the computer game Dota 2 last year. By adding the new ADR method, the researchers aim to develop a general approach that can be applied to a wider variety of problems. “We don’t want to kind of focus on just one problem, to build a hacky solution to do only that,” says Welinder. “The core thing that we care about is this generality.”
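
That algorithm is Proximal Policy Optimization (PPO), the same method behind OpenAI Five. For the curious, its core “clipped” objective fits in a few lines; the sketch below is a generic illustration, not OpenAI’s training code.

```python
import torch

def ppo_clipped_loss(new_logp: torch.Tensor, old_logp: torch.Tensor,
                     advantages: torch.Tensor, eps: float = 0.2) -> torch.Tensor:
    """PPO's clipped surrogate objective, returned as a loss to minimize."""
    ratio = torch.exp(new_logp - old_logp)               # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()         # ascend the clipped surrogate
```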

Zhang is not only a regular at Rubik’s cube competitions, he’s also a machine learning engineer with autonomous driving startup comma.ai. Although Zhang recognizes ADR’s potential for generalizing models to other complicated tasks, he suggests a limitation of ADR training is that it can only augment explicitly known and specified environmental variables. “For example, no one is solving self-driving cars without real data, since encoding all relevant environment variables into a simulator is highly impractical, if not impossible,” he says.

The OpenAI paper Solving Rubik’s Cube with a Robot Hand is available on arXiv.

Journalist: Yuan Yuan | Editor: Michael Sarazen

