Deep Reinforcement Learning: From Toys to Enterprise

Nov 1, 2017

Reinforcement learning is an increasingly popular machine learning technique that is particularly well suited for addressing problems within dynamic and adaptive environments. When paired with simulations, reinforcement learning is a powerful tool for training AI models that can help increase automation or optimize operational efficiency of sophisticated systems such as robotics, manufacturing, and supply chain logistics.

However, moving from the games commonly used to demonstrate these techniques into real-world applications isn’t always straightforward. Structuring solutions to move beyond purely data-driven training introduces all sorts of new complexity, requiring you to consider things like how to use simulations to target your learning objectives, what kinds of simulations are applicable, how to deal with long-running simulations, how to incorporate ongoing training refinement once deployed, how to account for scaling and performance, and ultimately how to bridge from simulation to the real world.

I was recently able to talk about how to effectively leverage reinforcement learning in real-world use cases at the O’Reilly AI conference in San Francisco. You can see my talk in full below, or keep reading to learn more about deep reinforcement learning and the problems it can solve.

What is Deep Reinforcement Learning?

Let’s first understand what we mean when we talk about deep reinforcement learning. Deep reinforcement learning (DRL) differs from supervised learning in that you have an agent interacting with an environment. After each interaction, the agent receives a reward, computed by a reward function that assesses that interaction, and that reward then drives its subsequent behavior.

The challenge with DRL is different because you don’t know in advance what the correct answer is. A supervised model learns because you tell it the right answer. RL models instead learn by exploration: the system has to explore the environment and work out which moves achieve the outlined reward objective. You don’t tell the system, “At this point in time, the right move to make is X.” Instead, you ask the system, “Did you achieve the overall end objective that I set out for the agent to accomplish?”
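To make that loop concrete, here is a minimal sketch of an agent learning by exploration. Everything in it is an assumption made for illustration: the `Corridor` environment, the `train` and `greedy_policy` helpers, and all the hyperparameters are hypothetical, and it uses a plain Q-table as a stand-in for the deep network that a real DRL system would use. The point is only the shape of the interaction: the agent acts, the environment answers with a reward, and the agent is never told which move was the right one.

```python
import random


class Corridor:
    """Hypothetical toy environment: walk right along a line to reach the goal."""

    def __init__(self, length=6):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action 0 = left, 1 = right; the walls clamp the position.
        move = 1 if action == 1 else -1
        self.pos = max(0, min(self.length - 1, self.pos + move))
        done = self.pos == self.length - 1
        # The only feedback is whether the end objective was reached.
        reward = 1.0 if done else 0.0
        return self.pos, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning (a stand-in for a deep network) on the corridor."""
    rng = random.Random(seed)
    env = Corridor()
    q = [[0.0, 0.0] for _ in range(env.length)]  # q[state][action]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(200):  # cap episode length
            # Explore at random some of the time, and whenever the estimates tie.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # Move the estimate toward reward plus discounted future value.
            future = 0.0 if done else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best-known action per state (0 = left, 1 = right)."""
    return [0 if left > right else 1 for left, right in q]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

After training, the greedy policy should choose "right" in every non-goal state, even though nothing in the code ever labels "right" as the correct move.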

Deep Reinforcement Learning + Games

It’s very natural to think of games when you think of deep reinforcement learning. Games are, by construction, environments where the players have to interact with the game.

You’ve probably seen reinforcement learning models playing games like Lunar Lander. Training a DRL model to play Lunar Lander is actually part of the getting-started tutorial in the Bonsai Platform. Games are a great way to get a feel for reinforcement learning technology and understand how it works.

The Bonsai Platform used DRL to train an AI model to play Lunar Lander

Enterprises, however, face different problems when trying to apply this technology to real-world systems. That’s what I want to talk about today: how we make the leap from games to the environment of the enterprise.

What is Industrial AI?

If we’re going to talk about “Industrial AI”, we should define what we mean by that term. Industrial AI techniques help enterprise companies, both commercial and industrial, build control and greater optimization into their physical operations or systems. There are a number of use cases where industrial AI techniques are applicable, but if you look at the chart below you’ll see a few things that are different from the pure database scenarios that you run into with supervised learning.

For example, you have a lot of devices. Frequently the environment you’re working in is not a single device but a whole set of devices that need to interact.

Another thing to highlight is that these use cases typically require reinforcement learning technology and simulations or digital twins.

The AI Use Case Spectrum

The AI Use Case Spectrum seen above shows the broad spectrum of problems to which AI technology is being applied today. But when we talk about industrial AI, we focus on the business problems on the left side of the chart. These problems tend towards optimization and automation of control systems, and away from the pure data analytics and prediction problems further to the right. Industrial AI use cases are rarely scenarios in which you go in with a large, curated, and labeled dataset. Instead, you’ll have physical equipment that you actually want to control or for which you want to optimize behavior.

Unique Requirements and Challenges of Industrial AI

Through this lens, you start to see a progression of how AI is being applied to these systems, and what that means for the business. It progresses from monitoring to maintenance, then to optimization and ultimately automation of those systems. This is a sequence enterprises regularly follow as they build more sophisticated AI into their industrial systems and net greater returns from its capabilities.

But building AI into systems requires unique techniques and technologies as industrial AI applications are fundamentally different in a lot of ways:

The state spaces are inherently large, so large that enterprises turn to AI because traditional programming techniques, even traditional dynamic programming techniques, are insufficient to solve these problems.
There’s a huge reliance on subject matter expertise. Every organization has domain expertise; it doesn’t make any sense to ignore that and make systems learn from scratch. Techniques and capabilities that enable you to capture and use that subject matter expertise become highly important.
These systems are highly regulated.
Safety matters.
Downtime is a huge expense.
The stakes are incredibly high.
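To put a rough number on the first point, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every device can be described by the same small number of discrete states and that the devices vary independently; the function name `joint_state_count` and the figures passed to it are hypothetical, not measurements from any real system.

```python
def joint_state_count(states_per_device: int, num_devices: int) -> int:
    """Number of joint states when each device varies independently."""
    return states_per_device ** num_devices


if __name__ == "__main__":
    # Illustrative only: 10 discrete states per device.
    for num_devices in (1, 5, 10, 20):
        count = joint_state_count(10, num_devices)
        print(f"{num_devices:>2} devices -> {count:,} joint states")
```

Under these assumptions the count grows exponentially with the number of devices, which is why methods that enumerate every state, as tabular dynamic programming does, stop being practical once a whole set of devices has to be controlled together.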

From Games to Industry

If you want to watch streaming video and the recommendation is not to your liking, that’s not the end of the world. But if the AI system monitoring your airplane maintenance system to gauge when to replace engines gives you a false positive, you’ve cost yourself $200,000. The predictions of these systems are high stakes.

On top of that, you don’t want systems to get into states where things break. That’s equally expensive and equally damaging. You can’t simply deploy live AI models straight to the real system. First, you need to set up and connect a simulation or digital twin to build reinforcement learning models that are capable of solving real-world problems.
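One way to picture the simulation-first idea is to run a candidate control policy against a simulator and count how often it drives the system into a state where things would break, before it ever touches real equipment. The sketch below is an assumption-laden illustration, not a description of any particular product: `TankSim`, `count_unsafe_steps`, both policies, and the demand sequence are all hypothetical.

```python
class TankSim:
    """Hypothetical stand-in for a simulator or digital twin of a storage tank."""

    def __init__(self, level=5, capacity=10):
        self.level = level
        self.capacity = capacity

    def step(self, inflow, outflow):
        # Advance the simulated level one time step; flag overflow or running dry.
        self.level += inflow - outflow
        unsafe = self.level > self.capacity or self.level < 0
        return self.level, unsafe


def count_unsafe_steps(policy, demand):
    """Run a policy in simulation and count the steps spent in unsafe states."""
    sim = TankSim()
    level = sim.level
    unsafe_steps = 0
    for outflow in demand:
        level, unsafe = sim.step(policy(level), outflow)
        if unsafe:
            unsafe_steps += 1
    return unsafe_steps


def cautious_policy(level):
    # Only add inflow when the tank is below half full.
    return 2 if level < 5 else 0


def reckless_policy(level):
    # Always fill at the maximum rate, regardless of the level.
    return 3


if __name__ == "__main__":
    demand = [1, 2, 1, 0, 2, 1, 1, 0, 2, 1]  # illustrative outflow per step
    for policy in (cautious_policy, reckless_policy):
        print(policy.__name__, count_unsafe_steps(policy, demand))
```

In this toy setup the reckless policy overflows the simulated tank while the cautious one does not. Discovering that in a simulator costs nothing; discovering it on the real system is exactly the kind of breakage the paragraph above warns about.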

I’ll talk more about the landscape of simulators, and why they’re so important for enterprises building AI into industrial systems, in subsequent posts. Until then, you can view my O’Reilly AI talk below or download our whitepaper, “AI for Industrial Applications.”