Reinforcement Learning for Robots - Part 1

Matthew Brown
4 min read · Feb 7, 2019


Who doesn’t want to see robots learning to do things? I certainly do.

I am going to document my journey to get a robot to learn to move. The blog won’t be a tutorial, but it will be detailed enough that you could follow along.

Most of the programming will be in Python, but there will be some Arduino and other odds and ends. Maybe some CAD and 3D printing if I get really motivated.

The Idea

For reinforcement learning I needed a robot and a way of monitoring its progress. To be able to use reinforcement learning, the robot would have to be able to attempt the activity and then reset itself, thousands of times.

The high-level idea is to use an RGB camera to view the pen. The robot will have wheels and move around the pen chasing a virtual ball. Eventually it would be good to see if he could learn to navigate around obstacles and even move an actual ball around. Easy, right?
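To make the attempt-and-reset loop a little more concrete, here is a toy sketch in plain Python. Nothing in it comes from the real robot: the pen size, the step size, the random "policy" and the reward (progress made towards the ball on each step) are all assumptions chosen for illustration, and `run_episode` is a hypothetical name.

```python
import math
import random

PEN_SIZE = 100.0  # hypothetical square pen, arbitrary units


def distance(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def run_episode(rng, max_steps=50, step_size=5.0):
    """One attempt: reset, take random moves, return the total reward."""
    # Reset: place the robot and the virtual ball at random points in the pen.
    robot = (rng.uniform(0, PEN_SIZE), rng.uniform(0, PEN_SIZE))
    ball = (rng.uniform(0, PEN_SIZE), rng.uniform(0, PEN_SIZE))
    total_reward = 0.0
    for _ in range(max_steps):
        # Placeholder policy: a random heading. A learned policy would go here.
        heading = rng.uniform(0, 2 * math.pi)
        # Move one step, clamped so the robot stays inside the pen.
        x = min(max(robot[0] + step_size * math.cos(heading), 0.0), PEN_SIZE)
        y = min(max(robot[1] + step_size * math.sin(heading), 0.0), PEN_SIZE)
        before = distance(robot, ball)
        robot = (x, y)
        # Assumed reward: how much closer to the ball this step got us.
        total_reward += before - distance(robot, ball)
    return total_reward


rng = random.Random(0)
returns = [run_episode(rng) for _ in range(1000)]
print(f"mean return over {len(returns)} episodes: {sum(returns) / len(returns):.2f}")
```

On the real robot each call to `run_episode` would be a physical attempt, which is why the reset has to happen without me picking him up.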

The Monitor

I am using a Raspberry Pi with a Pi Camera attached. After putting the Pi and camera in some 3D-printed cases, I blu-tacked them to the wall. I didn’t realise how far away it would need to be!

The Robot

The requirements for the robot started with it being cheap. It should also be controlled remotely and be precise in how far it moves each step. The other challenge is that the robot will need to run for hours and hours, so it needs to be connected to mains power or charge itself.

For the main parts I settled on:

  • Wemos D1 mini: An Arduino-compatible WiFi chip that can run a basic web server. I wish I had chosen one with more IO because I ended up adding 2 shift registers and an analogue multiplexer.
  • 18650 battery: Including a basic charge circuit off eBay and a buck-boost converter to get to 5 V.
  • 28BYJ-48 stepper motor with ULN2003 driver: These are the cheapest stepper motors you can find on eBay.

I designed the robot in CAD and 3D printed it. Putting it together was challenging as it was way too complex, with 3 levels I could pull apart.

Meet Soccer Bro!

First Steps — Get the video stream on my PC

While I would like to get all of the runtime on the Raspberry Pi in the end, I don’t really want to be building ML models on a Pi. To start with, I need to stream the video from the Pi to my PC.

After some searching I found RPi-Cam-Web-Interface. It was super easy to install following the steps on the site, and it provides you with a web interface with a live stream. If you navigate to your Pi’s IP address you will see something like the following.

There are no instructions on how to use Python to view the video stream, but after some time in Chrome developer tools I discovered the browser calls the following URL several times a second. The URL returns an image, which is then used to update the video stream on the site.

http://192.168.0.11/html/cam_pic.php?time=1549542272770&pDelay=200000
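As an aside, the `time` value in that URL looks like a Unix timestamp in milliseconds, presumably there so the browser never shows a cached frame. That is my reading of the request rather than anything documented. A minimal sketch of building the same kind of URL in Python, where `frame_url` is a hypothetical helper name:

```python
import time
from urllib.parse import urlencode

BASE_URL = "http://192.168.0.11/html/cam_pic.php"


def frame_url(base_url=BASE_URL, p_delay=200000, now=None):
    """Build a frame URL with a fresh millisecond timestamp, like the browser."""
    if now is None:
        now = time.time()
    # time: milliseconds since the Unix epoch (assumed to be a cache-buster)
    # pDelay: copied as-is from the request seen in the developer tools
    query = urlencode({"time": int(now * 1000), "pDelay": p_delay})
    return f"{base_url}?{query}"


print(frame_url())
```

In my testing the plain URL without any parameters returns an image too, so this is optional.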

We can do this in Python using OpenCV!

The code below downloads the image, converts it to a NumPy array, decodes it into an OpenCV image, resizes the image to the correct size and then shows it. It then does this over and over again.

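import urllib.request

import cv2
import numpy as np

# URL the web interface polls for the latest preview frame
url = "http://192.168.0.11/html/cam_pic.php"

while True:
    # Download the latest frame from the Pi as raw bytes
    with urllib.request.urlopen(url) as response:
        data = np.asarray(bytearray(response.read()), dtype="uint8")

    # Decode the raw bytes into an OpenCV (BGR) image
    image = cv2.imdecode(data, cv2.IMREAD_COLOR)

    # Resize to the display size and show the frame
    image = cv2.resize(image, (1920, 1080))
    cv2.imshow("Image", image)

    # Wait 100 ms so the window can redraw before the next frame
    cv2.waitKey(100)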
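One weakness of the loop above is that a single failed request (a WiFi dropout, say) stops the whole program with an exception. A minimal sketch of a more forgiving download step, using only the standard library. The name `fetch_frame_bytes`, the retry count and the timeout are my own choices for illustration, not part of RPi-Cam-Web-Interface.

```python
import time
import urllib.request


def fetch_frame_bytes(url, retries=3, delay=0.5, opener=urllib.request.urlopen):
    """Return the raw image bytes from url, retrying on network errors."""
    last_error = None
    for _ in range(retries):
        try:
            with opener(url, timeout=2) as response:
                return response.read()
        except OSError as error:
            # URLError and socket timeouts are both subclasses of OSError.
            last_error = error
            time.sleep(delay)
    raise RuntimeError(f"no frame after {retries} attempts") from last_error
```

The returned bytes can be handed to `cv2.imdecode` via a NumPy array exactly as in the loop above. The `opener` parameter is only there so the function can be exercised without a camera on the network.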

So we have an image! But it doesn’t look great. What we can do is go to the web interface and modify a few settings, specifically:

  • ISO: 100
  • Preview Quality: 100
  • Preview Width: 1920

Now we have a half-decent video stream to work with! These setting changes are actually just GET requests to the Pi, so we could add them to the script above so that every time we start the program we make sure the camera is configured correctly.
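A rough sketch of what replaying those requests at start-up could look like. The URLs in `SETTING_URLS` are placeholders, not real endpoints: the idea is to copy the exact requests that Chrome developer tools shows when you change each setting in the web interface. `apply_settings` is a hypothetical helper name.

```python
import urllib.request

# Placeholders only: replace each entry with the request URL captured in the
# browser's network tab when changing that setting in the web interface.
SETTING_URLS = [
    "http://192.168.0.11/html/REPLACE_WITH_ISO_REQUEST",
    "http://192.168.0.11/html/REPLACE_WITH_PREVIEW_REQUEST",
]


def apply_settings(urls, opener=urllib.request.urlopen):
    """Send one GET request per URL and return the HTTP status codes."""
    statuses = []
    for url in urls:
        with opener(url, timeout=5) as response:
            statuses.append(response.status)
    return statuses
```

Calling `apply_settings(SETTING_URLS)` once before the display loop starts would then put the camera into a known state on every run.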

A crop of the stream on my PC

That is it for this post. In the next one I am going to try to find the location of Soccer Bro. This will be important as we need to know how close he got to the virtual ball in order to work out how to reward him.
