OpenAI is a research-oriented company with an ambitious mission: the creation of artificial general intelligence that benefits all of humanity. Setting the bar that high has admittedly proven a great motivation, as the company’s projects have recently dominated AI media coverage. Their AI agents playing hide-and-seek surprised everyone with their performance and inventiveness, while just a few days ago, OpenAI stole the spotlight again when they publicised their work on a robotic hand that solves the Rubik’s cube one-handed.
The work has been met with mixed reactions. While the demos presented on OpenAI’s blog post impressed both the public and experts with the elegance and robustness of the robotic hand, some concerns have been voiced in the AI community over content that could mislead readers. Indeed, a shallow reading of the blog post can give a wrong impression, especially to non-experts, about the purpose and outcome of this work. The post is accompanied by a research paper uploaded to arXiv, an online non-peer-reviewed repository, which helps in getting a better picture. It is safe to assume, however, that the greatest part of the impact so far is due to the blog post and, primarily, to its title and accompanying demos.
Analysing OpenAI’s work and its ramifications is harder than it looks at first sight; sort of like solving the Rubik’s cube. In this post, we will dive into a deep reading of this work and try to separate the different underlying discussions: the motivation behind this experiment, the scientific contributions and the subtle relationship between research and publicity.
Most importantly, we want to deviate from the internet culture of quickly passing judgement on a matter based on some one-sided analysis. For this reason, we will try to explore the multiple dimensions of this work by going back a bit in the history of the field, examining the contemporary state of AI and trying to read into the future of this work.
A brief history of Rubik’s cube
The alleged protagonist of this work was invented by Ernő Rubik, a Hungarian professor of architecture who enjoyed intellectual challenges:
Space always intrigued me, with its incredibly rich possibilities, space alteration by (architectural) objects, objects’ transformation in space (sculpture, design), movement in space and in time, their correlation, their repercussion on mankind, the relation between man and space, the object and time. I think the CUBE arose from this interest, from this search for expression and for this always more increased acuteness of these thoughts …
It wasn’t until after the cube was materialised that its creator realised how big of a puzzle it was: with 43,252,003,274,489,856,000 possible configurations and only one correct solution, Rubik’s cube attracted both game enthusiasts and scientists. From a mathematical perspective, this puzzle belongs to the area of combinatorial optimisation. After many developments, Herbert Kociemba devised an algorithm that is widely used today to solve the cube, finding a solution in very few moves; it has since been proven that any configuration can be solved in at most 20 moves, a bound known as God’s number (a catchy way of saying that even an omniscient player could not solve this puzzle quicker).
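That configuration count is not arbitrary; it follows from the standard counting argument over the cube’s corner and edge pieces. A quick sanity check in Python, assuming the usual decomposition:

```python
import math

# Standard counting argument for the 3x3x3 Rubik's cube:
#   8!   ways to permute the corner pieces
#   3^7  corner orientations (the 8th is determined by the others)
#   12!  ways to permute the edge pieces
#   2^11 edge orientations (the 12th is determined)
#   / 2  because corner and edge permutations must have equal parity
configurations = (
    math.factorial(8) * 3**7 *
    math.factorial(12) * 2**11
) // 2

print(configurations)  # 43252003274489856000
```

This reproduces the 43-quintillion figure quoted above.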
To best understand OpenAI’s work, it is easier to first get rid of some of the misconceptions many people ended up with after reading the blog post (don’t worry if you are one of them; experienced researchers have also fallen victim to them).
What did OpenAI not do?
1. Use artificial intelligence to solve the puzzle.
Contrary to what can be understood from the title of the post and paper, OpenAI did not use reinforcement learning algorithms to find which moves the robotic hand should perform to solve the puzzle. Instead, Kociemba’s algorithm was used to determine the moves, and reinforcement learning to control the robotic hand. While this may strike some as disappointing, there is in fact no practical motivation (except perhaps some intellectual curiosity) for using AI in this task: we already possess an algorithm that can solve the cube in very few moves without requiring cognition. Reinforcement learning, by contrast, is a machine learning approach to solving tasks by interacting with the environment. It is very inefficient compared to many other techniques, but brings the benefit of operating in the wild: no training data or model of the problem is necessary. For this reason, it has found fertile ground in robotics applications, where robots need to interact with the real world. OpenAI’s decision to use it to control the robotic hand is, therefore, unsurprising.
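To make “learning by interacting with the environment” concrete, here is a minimal tabular Q-learning sketch on a toy corridor task. This is purely illustrative: OpenAI trained large neural-network policies in simulation, not anything like this toy setup, and all names and numbers below are made up for the example.

```python
import random

# Toy environment: a 5-state corridor; reaching state 4 gives reward 1.
# The agent learns from trial and error alone -- no model, no dataset.
N_STATES = 5
ACTIONS = (-1, +1)                      # move left or right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.3   # learning rate, discount, exploration

random.seed(0)
for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # Epsilon-greedy: mostly exploit current knowledge, sometimes explore
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        # Q-learning update, driven entirely by experienced transitions
        best_next = max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

# The learned greedy policy near the goal moves towards the reward
print(Q[(3, +1)], Q[(3, -1)])
```

The same trial-and-error principle, scaled up enormously, is what trained the hand’s controller.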
2. Manipulate the cube using computer vision.
As solving the cube is a visual task, one could imagine that the reinforcement learning algorithm controlled the hand using only video as input. However, the employed cube was “smart”, meaning that it was equipped with sensors and communicated with the arm using Bluetooth. This makes the control problem easier, as computer vision algorithms could introduce delays and uncertainty, making manipulation harder.
3. Choose to solve the task one-handed to make it harder.
From a human perspective, solving Rubik’s cube with one hand is much harder than solving it with both. But one should not forget that artificial intelligence differs significantly from human intelligence: while handling objects requires no conscious effort from us, AI algorithms perform intensive geometric and control calculations to manipulate an object.
The problem only becomes harder as the number of hands increases: the number of parameters requiring control doubles, and the hands need to learn how to cooperate. Not to mention that the task of the learning algorithm now becomes puzzling: instead of just learning how to react to the environment, which includes the cube and any other interference, the algorithm needs to also learn the dynamics of the other hand. But those dynamics are dictated by the learning algorithm itself, which means the algorithm needs to learn how to adapt to its own adaptation. This self-referentiality has been giving headaches to RL researchers for decades. One can deduce, therefore, that although OpenAI might solve the puzzle faster and more robustly with two hands, training them could prove virtually impossible.
What did OpenAI aim to do?
Removing the above misconceptions should not create the impression that OpenAI’s work lacked a purpose, but prepare us to appreciate its actual contributions. In fact, OpenAI’s work was pioneering. This is probably what made it hard to communicate to the public, and what led to its misrepresentation.
The objective was to create a general-purpose robotic hand. The hand could, in principle, perform any sort of task; solving the Rubik’s cube just made for a well-defined problem that required quick reflexes and skilful manipulation.
This research area is, today, little understood and largely unexplored. While deep learning has proven its efficacy in applications such as computer vision, natural language processing and discovering patterns in big data, robotics is still far from achieving human-like behaviour. Acting in unstructured, unpredictable real-world environments under uncertainty is considered out of reach for today’s robotic systems. If you are not convinced, here are some robots from the 2015 DARPA Robotics Challenge:
What did OpenAI achieve?
Naturally, there is a disparity between the cherry-picked videos of successful use cases and the average performance metrics. As reported, the hand had a success rate of 60% on easier configurations and 20% when the cube required 26 rotations to be solved. When reading these results one should keep in mind that a failure was not an inability to solve the puzzle: it meant the hand dropped the cube or took an excessive amount of time to perform a move.
At first glance, these numbers may sound discouraging. However, considering how young this scientific area is, they can be seen as an important first step that can encourage future research.
The main theoretical contribution of this work towards general-purpose robotics was a technique called adaptive domain randomization. This technique builds upon two older concepts in the AI literature:
Domain randomization: As conducting real-world experiments is often expensive and dangerous for the equipment, robots are traditionally trained in simulated environments. A problem that can arise here is the disagreement between the simulated environment and the real world. Although the learning algorithm is allowed a lot of time for experimentation, when the learned behaviour is transferred to the real world, performance can be unexpectedly disappointing due to imperfections in the model and uncertainty in the environment. Using domain randomization, researchers inject randomness into the variables used to simulate the environment and train the reinforcement learning algorithm across many randomised variants. The main motivation behind this idea is that, if enough randomness is encountered during simulations, the robot will be able to handle any real-world setting.
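In practice, domain randomization amounts to re-sampling the simulator’s physical parameters for every training episode. A minimal sketch, with entirely hypothetical parameter names and ranges (a real setup would perturb a full physics simulator such as the one OpenAI used):

```python
import random

def sample_randomised_env(rng):
    """Draw one simulated environment with perturbed physics.

    Parameter names and ranges are illustrative, not OpenAI's actual values.
    """
    return {
        "cube_mass":      rng.uniform(0.07, 0.11),  # kg
        "friction":       rng.uniform(0.5, 1.5),    # surface friction coefficient
        "motor_strength": rng.uniform(0.8, 1.2),    # actuator scale factor
        "action_delay":   rng.uniform(0.0, 0.02),   # seconds of control latency
    }

rng = random.Random(42)
# Each training episode sees a different, but plausible, physics:
envs = [sample_randomised_env(rng) for _ in range(3)]
for env in envs:
    print(env)
```

A policy that succeeds across all these perturbed worlds is more likely to survive the imperfections of the one real world.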
Curriculum learning: another long-standing technique in the AI literature, curriculum learning employs the idea that a machine learning algorithm can learn more efficiently if it is presented with problems of increasing difficulty. The learning algorithm starts by solving the easier problems, which gives it enough knowledge to solve the more difficult ones.
In adaptive domain randomization, the reinforcement learning algorithm is trained on randomly generated environments, with randomness increasing with training time. This procedure is automatic, which means that the practitioner does not need to tune the level of randomness.
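The combination of the two ideas can be sketched as follows: keep a randomisation range per simulation parameter, and widen it automatically whenever the policy performs well enough at the current difficulty. This is only a schematic illustration of the principle; the class name, thresholds and step sizes below are made up, not taken from OpenAI’s implementation.

```python
import random

class AdaptiveRange:
    """A randomisation interval that widens as the policy masters it."""

    def __init__(self, low, high, step=0.05, threshold=0.8):
        self.low, self.high = low, high
        self.step = step              # how much to widen per update
        self.threshold = threshold    # success rate needed to widen

    def sample(self, rng):
        # Domain randomization: draw this parameter anew for each episode
        return rng.uniform(self.low, self.high)

    def update(self, success_rate):
        # Curriculum effect: difficulty only grows once it is mastered,
        # with no manual tuning of the randomness level
        if success_rate >= self.threshold:
            self.low -= self.step
            self.high += self.step

friction = AdaptiveRange(low=0.9, high=1.1)
rng = random.Random(0)
for epoch in range(10):
    measured_success = 0.85          # stand-in for evaluated policy performance
    episode_friction = friction.sample(rng)
    friction.update(measured_success)

print((friction.low, friction.high))  # (0.4, 1.6): the range widened over training
```

Starting narrow and expanding only on success is what lets training begin on an easy, nearly deterministic simulation and end on a wildly varied one.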
This technique appears to be a promising way towards improving the efficiency of simulations used for learning; only the test of time will tell if it will have a large impact on robotics. Of course, we should not expect that it will solve all our problems, as there exist many robotics applications, such as autonomous vehicles, where simulations are not adequate and the algorithm needs to learn by interacting with the real world.
Before we leave the scientific analysis of this work, we should answer a question that many readers perhaps have: why is this hand an example of a general-purpose robot? We would all certainly be more convinced by a robot hand that solves the Rubik’s puzzle, then writes a novel and then proceeds to prepare a dry Martini.
The slightly disappointing answer is that we are very far from achieving that kind of behaviour. Still, the ability of this hand to operate under unexpected conditions, such as wearing a rubber glove, having some fingers tied or being obstructed by a plush giraffe, is already a long way beyond today’s highly specialised robotics.
Research and ethical publicity
The negative reactions from the academic community are aimed not at OpenAI’s work itself, but at its presentation in the blog post.
The Batch, the mailing list of deeplearning.ai, also comments:
“Although OpenAI’s report focused mostly on robotic dexterity, critics accused the company of overstating its claim to have taught the robot to solve Rubik’s Cube. Kociemba’s algorithm is more than a decade old and doesn’t involve learning, they pointed out, and the cube included Bluetooth and motion sensors that tracked its segments. Moreover, despite training for virtual millennia, the robot can’t do anything more than manipulate the puzzle … It’s important to set realistic expectations even as we push the boundaries of machine intelligence.”
Admittedly, a blog post is a different form of communication from a research paper. But the boundary between simplifying and misleading is easy to cross, and that appears to be what happened here.
Being a research company, OpenAI naturally aims to attract publicity and, as a result, funding. This is a complex ethical area. Of course, attracting publicity is not bad in itself, as research often needs funding disproportionate to its immediate output in order to take off. One thing is certain, however: preferring short-term publicity over long-term trustworthiness should be on no one’s agenda.
In order for research to have meaningful impact, it needs to be understood. At times when too much hype over a discovery can cloud the discovery itself, researchers need to put extra effort into making their findings clear.
If anything, the community needs to be conscious of its past and remember that setting expectations too high and attracting funding that did not materialise into progress is exactly what brought on the previous AI winters.
Applied Data Science is a London based consultancy that implements end-to-end data science solutions for businesses, delivering measurable value. If you’re looking to do more with your data, please get in touch via our website.