Reinforcement Learning for Home Robotics pt. 1

A ROS Based Example

Gene Foxwell
Published in Coinmonks · Oct 14, 2018

Teach an Old Dog new Tricks?

Introduction

In the previous articles, I’ve covered the basics behind the Q-Learning algorithm. In this series of articles I want to take what’s been done so far in this blog and combine it, using Reinforcement Learning, to create an example of a home robotics platform operating in an environment simulated in Gazebo.

This may or may not be entirely successful. Part of the fun in robotics is experimenting with new ideas, and new ideas sometimes fail. At the very least, however, this set of articles will introduce plenty of useful concepts for working with ROS and Machine Learning, and show how we might be able to use RL for a practical project without having to resort to training a DeepRL model.

Note: Please don’t get the impression that I am saying there is anything wrong with DeepRL. Using Deep Learning with RL is an amazing subject, and one that I may yet cover in this blog; it’s just not the technology I will be using for this specific experiment.

Approaches to Q-Learning

From this point there are several directions we can take. What seems to be the traditional direction is to start expanding the RL approach towards ever more complex systems: for example, learning from scratch how to play an Atari game, or teaching a robotic arm to pick up a box with no prior information on how to control the arm.

These approaches are interesting, they generate a lot of research (especially in the domain of Deep Learning applied to RL), and they are rightfully getting a lot of attention. This isn’t, in my opinion, the only direction we can go, and before we start to look at Deep Learning (which would need to be its own set of articles), I want to look at an example of what we can do with the tools described so far.

Let’s start this process by looking at yet another imaginary world. This time, keeping with this blog’s theme of investigating the design of home service robots, we will start with a house.

Marvin’s Home

A virtual Home for Marvin

In the above image we can see a mock-up of a floor plan for an imaginary home. This house is, at the moment, fairly sparse: it has three people (represented by stick figures), a can of soda (represented by the red oblong shape), and Marvin, represented by the little robot icon in the top left-hand corner of the image.

Marvin is our house robot in this case. In keeping with the goal of this blog to work towards a usable home robotics platform, Marvin will be given some responsibilities around the home consistent with those assigned to it in previous articles:

  1. It should be able to come when it’s called, so if a user signals their desire for the robot’s presence it will come towards them.
  2. It should be able to deliver (and in our case also pick up) objects for the user.
  3. It should leave any room where a user has signaled it’s not wanted (perhaps for privacy reasons).
  4. It should return to its home station (the upper left-hand corner) when in need of charging.

It’s pretty easy to see that there are many situations where these goals could be considered directly competing. Our goal for this article and the next few after it is to use Reinforcement Learning to teach the robot how to handle any situation that comes up.

Let’s get started!

Assumptions

For this experiment I am going to assume a robot that has been built using the tools I’ve introduced thus far in this blog. So — before applying RL we already have the following features available:

  • A ROS enabled robot with the ROS Navigation package fully configured.
  • The robot is capable of responding to simple verbal requests.
  • The robot is capable of creating its own map.
  • The robot is capable of creating and maintaining a “Blackboard” for collating information collected by various ML and sensory systems.

Essentially, with the notable exception of the “subsumption” systems, we are assuming a robot with all the capabilities demonstrated for the “Marvin” robot so far described in this blog.

Environment

With the assumptions out of the way, let’s take a look at the environment from the RL Agent’s perspective. It is perhaps tempting to simply take the house itself as the environment for our RL agent, but I submit that this would cause difficulties that would be hard to overcome with the tools currently at our disposal*.

*: It might be possible to achieve this with a traditional Deep RL approach, but that is exactly what I am trying to avoid with the next few articles.

Instead, our agent is going to “live” in the graph world generated by our Blackboard subsystem. From this point of view, once mapping is complete, our world would be better represented by the graph below:

A “Blackboard”

So the above “Blackboard” represents the different reachable areas of the home as nodes on the map. Each node is then decorated with the list of objects that the robot’s various image recognition algorithms have recognized as being present. For the purposes of this experiment, we are not interested in how Marvin achieves this (maybe a magical genie tells it which objects are present and what the map looks like), only that it does. Marvin’s location on the “Blackboard” is represented by the green node.
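
To make the state representation concrete, here is a minimal sketch of what such a Blackboard graph might look like in Python. The node names, object labels, and the "robot_at" field are assumptions made for illustration; on the real robot this structure is built up by the mapping and recognition nodes described in earlier articles.

```python
# Hypothetical Blackboard graph: reachable areas as nodes, each decorated
# with the objects the recognition pipeline believes are present there.
blackboard = {
    "nodes": {
        "charging_station": {"objects": [],                    "neighbors": ["hallway"]},
        "hallway":          {"objects": [],                    "neighbors": ["charging_station", "living_room", "kitchen"]},
        "living_room":      {"objects": ["person"],            "neighbors": ["hallway"]},
        "kitchen":          {"objects": ["person", "pop_can"], "neighbors": ["hallway"]},
    },
    "robot_at": "charging_station",  # the green node in the figure
}

def objects_at(board, node):
    """Return the objects the recognition systems have attached to a node."""
    return board["nodes"][node]["objects"]
```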

Agent

Home Service RL Robot Architecture V0

The primary issue in formulating our RL problem this time will be the Agent itself. In the image above I have provided an example of the architecture I want to start working towards. This places the RL Agent in the “middle” of the robot’s decision-making system, with input first being processed (perhaps by Deep Neural Networks, or classical Machine Learning algorithms) and the results of that processing placed on a shared Blackboard. The RL Agent then has access to the post-processed data on the Blackboard from which to make its decisions.

At the level of abstraction shown here, the RL Agent does not have direct access to any of the robot’s low-level systems; it controls the robot by deciding which sub-process should be active, and by extension which ones should be inactive. These sub-processes then control the robot.

For the first stab at this problem, let’s go with the following behaviors (a sketch of the corresponding action set follows the list):

  1. The Agent can go into “Fetch” mode, in which it identifies a pop-can in the node it’s in and picks it up.
  2. The Agent can enter “find person” mode, in which it searches for the nearest user.
  3. The Agent can enter charge mode, in which it returns to its charger.
  4. The Agent can enter wait mode, in which it waits for instructions.
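
As a concrete, hedged illustration, the four behaviors can be written down as a discrete action set. The names and the act() stub below are placeholders; on the real robot, activating a behavior would mean starting or signalling the corresponding ROS sub-process.

```python
from enum import Enum

class Behavior(Enum):
    """The four sub-processes the RL Agent can choose between (placeholder names)."""
    FETCH = 0        # locate the pop-can in the current node and pick it up
    FIND_PERSON = 1  # navigate to the nearest known user
    CHARGE = 2       # return to the charging station
    WAIT = 3         # hold position and wait for instructions

def act(behavior):
    """Activate one behavior (stub). The real version would start the matching ROS sub-process."""
    print("activating:", behavior.name)
```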

It’s possible that these processes may be too complicated for Q-Learning to learn from, but that is part of the point of the upcoming experiments: to find out just how far we can abstract the RL learning process from the physical robot.

We can reasonably find ways to give a standard home service robot most of these behaviors. The bit about picking up the soda can could cause some difficulty, but not enough to be of concern for this experiment. Navigating to a user, going to the pop-can node, or waiting for instructions can all be handled by providing reasonable parameters to the ROS Navigation stack, and going forward we will assume that the system is doing just that.
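
For reference, here is a small sketch of what “providing reasonable parameters to the ROS Navigation stack” could look like: sending a goal pose to move_base through actionlib. The node name, frame, and coordinates are assumptions; in practice they would come from Marvin’s map and Blackboard.

```python
#!/usr/bin/env python
import rospy
import actionlib
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal

def navigate_to(x, y, frame="map"):
    """Send a navigation goal to move_base and block until it finishes."""
    client = actionlib.SimpleActionClient("move_base", MoveBaseAction)
    client.wait_for_server()

    goal = MoveBaseGoal()
    goal.target_pose.header.frame_id = frame
    goal.target_pose.header.stamp = rospy.Time.now()
    goal.target_pose.pose.position.x = x
    goal.target_pose.pose.position.y = y
    goal.target_pose.pose.orientation.w = 1.0  # orientation is arbitrary for this sketch

    client.send_goal(goal)
    client.wait_for_result()
    return client.get_state()

if __name__ == "__main__":
    rospy.init_node("marvin_nav_example")
    navigate_to(2.0, 1.5)  # hypothetical coordinates of the pop-can node
```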

Furthermore, at least for the first iteration of this problem, we will assume that the world the robot lives in is static, so our “people” are basically non-moving mannequins placed in the world to give our robot something to do.

Based on what we have alluded to so far in this article, the “actions” our agent can take amount to activating or deactivating the robot’s pre-programmed behaviors. On this basis, the RL algorithm takes the place that subsumption occupied in the previous design.
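
In other words, instead of a fixed priority ordering deciding which behavior suppresses the others, the learned policy enables exactly one behavior at a time. A tiny sketch of that arbitration, using the placeholder behavior names from above:

```python
# Hypothetical arbiter: exactly one behavior is active, the rest are suppressed.
ACTIVE = {"fetch": False, "find_person": False, "charge": False, "wait": False}

def apply_action(chosen):
    """Enable the chosen behavior and deactivate every other sub-process."""
    for name in ACTIVE:
        ACTIVE[name] = (name == chosen)
    return ACTIVE
```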

Plan

So that’s the problem we are going to try to tackle in the next few articles. Here’s the plan we’ll try to follow:

  1. We’ll build a new Gazebo world for Marvin to exist in.
  2. We’ll build an “RL” node for the Marvin bot that will allow us to control Marvin and train it.
  3. We’ll add additional nodes to Marvin’s setup in order to get the information we need: presence of the coke can, presence of individuals, etc.
  4. We’ll train Marvin using the Q-Learning algorithm presented in the previous articles I’ve written on this subject (a minimal sketch of that update follows this list).
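
For convenience, here is a minimal sketch of the tabular Q-Learning pieces from those earlier articles, adapted to this setup: states are small, discrete summaries of the Blackboard, actions are the four behaviors above, and the learning rate, discount factor, and epsilon are placeholder values.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # placeholder hyperparameters
N_ACTIONS = 4                          # fetch, find person, charge, wait

Q = defaultdict(lambda: [0.0] * N_ACTIONS)  # Q-table: state -> list of action values

def choose_action(state):
    """Epsilon-greedy selection over the behavior indices."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])

def update(state, action, reward, next_state):
    """Standard Q-Learning backup: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
```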

Again, it’s worth noting that this might not create a successful robot; it could behave in all sorts of undesirable ways. The goal of this series isn’t to be successful, but to explore whether this approach could work.

In the next article I’ll guide you through creating an environment in Gazebo that approximates the world described so far.

Until then,

Share and Enjoy!
