Demystifying the Next Generation of AI-Enabled Robotics

Osaro AI
AI Software for Industrial Automation
8 min read · Jan 3, 2019


by Bastiane Huang

People tend to imagine a wide range of different machines when they talk about robots: Pepper, a social robot from SoftBank; Atlas, a humanoid from Boston Dynamics that can do backflips; the cyborg assassin from the Terminator movies; and the lifelike figures that populate the television series Westworld. People who are not familiar with the industry tend to hold polarized views: either they have unrealistically high estimations of robots’ ability to mimic human-level intelligence, or they underestimate the potential of new research and technologies.

Over the past year, my friends in the venture, tech, and startup scenes have asked me what’s “actually” going on in deep reinforcement learning and robotics. They wonder: how are AI-enabled robots different from traditional ones? Do they have the potential to revolutionize various industries? What are their capabilities and limitations? These questions tell me how surprisingly challenging it can be to understand the current industry landscape, let alone make predictions about the future. This article is a humble attempt to demystify AI- and deep-reinforcement-learning-enabled robotics, topics we hear a lot about but understand superficially or not at all. To begin, I’ll answer a basic question: what are AI-enabled robots, and what makes them unique?

Robot Evolution: From Automation to Autonomy

“Machine learning addresses a class of questions that were previously hard for computers and easy for people, or, perhaps more usefully, hard for people to describe to computers.” — Benedict Evans, a16z.

The most important difference AI brings to robotics is a move away from hard-programmed automation toward true self-directed autonomy. The difference isn’t obvious if a robot only does one thing. But if the robot needs to handle a wide variety of tasks, or respond to people and to changes in its environment, it needs some level of autonomy. We can borrow the levels used to describe autonomous cars to explain the evolution of robots (summarized in the short sketch after the list).

Level 0 — No Automation

People operate machines and no robots are involved. Robots are generally defined as programmable machines capable of carrying out complex actions automatically.

Level 1 — Driver Assistance

Single automated operation: a single function is automated, but it does not necessarily use information about the environment. This is how robots have traditionally been used in the automotive and manufacturing industries: programmed to repeatedly perform specific tasks with high precision and speed. Until now, most robots in the field have not been capable of sensing or adapting to changes in their environment.

Level 2 — Partial Automation

The machine assists with certain functions, using sensory input from the environment to make decisions. For example, a robot can identify and handle different objects with a vision sensor. However, traditional computer vision requires pre-registration and explicit instructions for each object, and it cannot deal with changes, surprises, or new objects.

Level 3 — Conditional Autonomy

The machine controls all monitoring of the environment, but it still requires a human’s attention and immediate intervention when needed.

Level 4 — High Autonomy

Fully autonomous in certain situations or defined areas.

Level 5 — Complete Autonomy

Fully autonomous in all situations.
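For readers who prefer a compact summary, the levels above can be written down as a simple classification. The Python enum below is purely illustrative, mirroring the headings in this post rather than any formal standard.

```python
from enum import IntEnum

# Illustrative summary of the autonomy scale above, borrowed from the
# levels used to describe autonomous cars; names mirror this post's headings.


class RobotAutonomy(IntEnum):
    NO_AUTOMATION = 0          # people operate machines; no robots involved
    SINGLE_AUTOMATED_TASK = 1  # one function automated, no environmental sensing
    PARTIAL_AUTOMATION = 2     # uses sensory input, but needs per-object instruction
    CONDITIONAL_AUTONOMY = 3   # monitors the environment, human must stand by
    HIGH_AUTONOMY = 4          # fully autonomous in defined areas or situations
    COMPLETE_AUTONOMY = 5      # fully autonomous in all situations


if __name__ == "__main__":
    for level in RobotAutonomy:
        print(level.value, level.name.replace("_", " ").title())
```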

Where Are We Now In Terms of Autonomy Level?

Today, most robots used in factories are open-loop, or non-feedback, controlled: their actions are independent of sensor feedback (level 1). Few robots in the field sense and act on feedback from the environment (level 2). A collaborative robot, or cobot, is designed to be more versatile and to work alongside humans; the trade-off is less power and lower speed, especially compared to industrial robots. Although a cobot is relatively easy to program, it is not necessarily autonomous. Human workers need to hand-guide the cobot every time the task or environment changes.

We’ve begun to see pilot projects with AI-enabled robots (level 3/4). Warehouse piece-picking is a good example. In shipping warehouses, human workers pick and place millions of different products into boxes based on customer orders. Traditional computer vision cannot handle such a wide variety of objects, because each item must be registered and each robot programmed beforehand. Deep learning and reinforcement learning, however, now enable robots to learn to handle various objects with minimal help from humans. There will be goods a robot has never encountered before, and it will need help or a demonstration from human workers (level 3). But the algorithm improves and moves closer to full autonomy as the robot collects more data and learns from trial and error (level 4).
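To make the trial-and-error idea concrete, here is a minimal, hypothetical Python sketch of the kind of feedback loop involved. It is not Osaro’s (or any vendor’s) actual system: the grasp angles and success rates are invented, and a real piece-picking robot would learn over camera images and continuous motions rather than four discrete angles.

```python
import random

# Toy sketch of learning to pick by trial and error: try grasp angles,
# record which attempts succeed, and gradually prefer the angles that work.
GRASP_ANGLES = [0, 45, 90, 135]                              # candidate orientations (degrees)
TRUE_SUCCESS_RATE = {0: 0.2, 45: 0.5, 90: 0.9, 135: 0.4}    # unknown to the robot


def simulate_grasp(angle):
    """Stand-in for a real pick attempt: returns 1 on success, 0 on failure."""
    return 1 if random.random() < TRUE_SUCCESS_RATE[angle] else 0


def learn_to_pick(episodes=2000, epsilon=0.1):
    """Epsilon-greedy learning: mostly exploit the best-known angle, sometimes explore."""
    attempts = {a: 0 for a in GRASP_ANGLES}
    successes = {a: 0 for a in GRASP_ANGLES}
    for _ in range(episodes):
        if random.random() < epsilon or all(n == 0 for n in attempts.values()):
            angle = random.choice(GRASP_ANGLES)  # explore a random grasp
        else:
            angle = max(GRASP_ANGLES,            # exploit the best estimate so far
                        key=lambda a: successes[a] / attempts[a] if attempts[a] else 0.0)
        reward = simulate_grasp(angle)
        attempts[angle] += 1
        successes[angle] += reward
    return {a: successes[a] / attempts[a] if attempts[a] else 0.0 for a in GRASP_ANGLES}


if __name__ == "__main__":
    print("Estimated success rate per grasp angle:", learn_to_pick())
```

The pattern, explore occasionally and exploit what has worked so far, is what lets a level 3 system keep improving toward level 4 as it accumulates pick attempts.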

Like the autonomous-car industry, robotics startups are taking different approaches. Some believe in a collaborative future between humans and robots and focus on level 3. Others believe in a fully autonomous future and want to skip level 3 to focus on level 4 and eventually level 5. This is one reason it is so difficult to assess a system’s actual level of autonomy. A startup could claim it is working on level 3 human-centered artificial intelligence (e.g. teleoperation) while the solution is actually a mechanical turk, with humans doing the work behind the scenes. On the other hand, startups targeting level 4/5 autonomy cannot achieve desirable results overnight, which can scare early adopters away and make data collection even more difficult in the early stages.

The Rise of AI-Enabled Robots: Warehouses and Beyond

The bright side is that, unlike cars, robots are used in many more use cases and industries, so level 4 is more accessible for robots than it is for cars. We will see AI-enabled robots up and running in warehouses first, because warehouses are semi-controlled environments and piece-picking is a critical but fault-tolerant task. Autonomous home and surgical robots will roll out much later, because more uncertainty exists in their operating environments and some tasks are not reversible. We will see AI-enabled robots used across more scenarios and industries as the precision, accuracy, and reliability of the technology improve over time.

Currently there are only around three million robots in the world, most of them used for handling, welding, and assembly tasks. So far, almost no robot arms are used in warehouses, agriculture, or industries other than automotive and electronics, mainly because of the limitations of traditional robotics and computer vision mentioned above. Over the next few decades, we will see explosive growth and a changing industry landscape as deep learning, reinforcement learning, and the cloud unlock the potential of next-generation robots. Not all industries will adopt automation at the same pace, given the incentives of incumbent players and the technical complexities mentioned above.

Next Generation AI-Enabled Robotics Startup Landscape

What are some of the growth opportunities in the AI-enabled robotics sector? And what are the different approaches and business models taken by startups and incumbents in this market? Below is an overview of some example companies in each segment. This is by no means an exhaustive landscape, and I welcome input and feedback to make it more complete.

[Figure: Next Generation AI-Enabled Robotics Startup Landscape]

Vertical vs. Horizontal

The most interesting finding from looking into the startup scene is that there are two fundamentally different approaches. The first is vertical. Most startups in Silicon Valley focus on developing solutions for specific vertical markets such as e-commerce fulfillment, manufacturing, or agriculture. This full-stack approach makes sense because the technology is still nascent: instead of relying on others to supply critical modules or components, companies build the end-to-end solution, which is faster and gives them more control over the end use cases and performance.

However, scalable use cases are not easy to identify. Warehouse piece-picking is low-hanging fruit, with relatively high customer willingness to pay and technical feasibility; almost every warehouse has the same piece-picking needs. But in other sectors, like manufacturing, assembly tasks can vary from factory to factory, and they require higher degrees of accuracy and speed than warehouse tasks. Even though machine learning allows robots to improve over time, robots that learn through trial and error still cannot match the accuracy of closed-loop robots. This is why startups such as Mujin and CapSen Robotics choose traditional computer vision rather than deep reinforcement learning. However, traditional computer vision requires every object to be registered, so the training time, flexibility, and unit economics don’t really make sense. And once deep reinforcement learning reaches the performance threshold and becomes the industry mainstream, this kind of traditional approach could become irrelevant.

Another issue with these startups is that their valuations tend to be high. We often see Silicon Valley startups raising tens of millions of dollars without the promise of any significant revenue stream. It is easy for entrepreneurs to paint a rosy future for deep reinforcement learning, but the reality is that it will take years to get there. Venture capitalists bet on teams with strong talent and technology even when these companies are still far from generating revenue.

A more practical but rarer approach is to go horizontal: building the tech stack and enablers that can be used across different industries. We can simplify the robotics technology stack into three components: sensing (input), processing, and actuation (output), plus development tools around them. I use the term processing loosely here to include everything that is not sensing or actuation: the controller, machine learning, operating system, and modules for robots. This is the segment I think has the most potential for growth in the near future.

One pain point for robotics customers is that the market is extremely fragmented. Robot makers all have their own proprietary languages and interfaces, making it difficult for system integrators and end users to integrate robots with their systems. As the industry matures and more robots are used beyond automotive and electronics factories, we need standard operating systems, protocols, and interfaces for better efficiency and shorter time to market. A number of startups in Boston are working on this modular approach: Veo Robotics, for example, develops safety modules that allow robots and humans to work together, and Realtime Robotics provides solutions that accelerate motion planning.
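To illustrate the modular, horizontal idea, here is a small, hypothetical Python sketch of the three-part stack described above. The interfaces, class names, and values are invented for illustration; they are not any robot maker’s real API.

```python
from abc import ABC, abstractmethod

# Hypothetical sketch of the sensing / processing / actuation split, with each
# part hidden behind a small interface so the processing module could, in
# principle, drive hardware from different vendors without modification.


class Sensor(ABC):
    @abstractmethod
    def read(self) -> dict:
        """Return an observation of the environment (e.g. a camera frame)."""


class Processor(ABC):
    @abstractmethod
    def decide(self, observation: dict) -> dict:
        """Turn an observation into a motion command (controller, ML model, etc.)."""


class Actuator(ABC):
    @abstractmethod
    def execute(self, command: dict) -> None:
        """Carry out a motion command on the physical robot."""


class FakeCamera(Sensor):
    def read(self) -> dict:
        return {"object_detected": True, "position": (0.3, 0.1, 0.05)}


class SimplePicker(Processor):
    def decide(self, observation: dict) -> dict:
        return {"target": observation["position"]}


class FakeArm(Actuator):
    def execute(self, command: dict) -> None:
        print(f"Moving arm to {command['target']} and closing gripper")


def control_loop(sensor: Sensor, processor: Processor, actuator: Actuator, steps: int = 1):
    for _ in range(steps):
        obs = sensor.read()
        cmd = processor.decide(obs)
        actuator.execute(cmd)


if __name__ == "__main__":
    control_loop(FakeCamera(), SimplePicker(), FakeArm())
```

The point of the design is that FakeCamera or FakeArm could be swapped for a vendor-specific implementation without touching the processing module, which is exactly the kind of interoperability the fragmented robot market currently lacks.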

These are just a few observations from working in the industry and talking with experts. I look forward to hearing your thoughts and exchanging notes with more entrepreneurs, experts, and investors in this space.

Bastiane Huang is an MBA student at Harvard Business School. She has worked for Amazon in its Alexa group, as well as for Osaro, a deep reinforcement learning robotics startup in San Francisco. She has also worked with Harvard Business Review and the university’s Future of Work Initiative. Follow her here on Medium.

A version of this article was featured in Robotics Business Review.
