Build a Self-Driving Q-Bot

All it takes is a 3-d printer, some Python, and lots of trial and error

(see also the GitHub repo for Raspberry Pi or BeagleBone Blue)

You don’t need to build a self-driving car to create something that drives around, learning as it goes. All you need is a simple reinforcement learning algorithm like Q-Learning, a handful of parts, and a little Python. Our self-driving vehicle may not travel far, but it does learn its way around using a laser rangefinder…

…and OK, it has rubber bands for tires.

To experiment with reinforcement learning, the robot just needs a task, like: There’s a box on the floor. It’s around here someplace. Go find the box. Or, in general: learn how to find the nearest object.

The basics of Q-Learning

Q-Learning is all about trial and error (for background, check out this article or this Python course). The main idea behind Q-Learning is to represent our robot’s state, then based on its state, choose an action. If our action achieves a goal we get a reward; if we make a mistake, we get a penalty. Our cumulative rewards for each pair of states×actions get stored in an array called a q-table, allowing us to learn from past mistakes. Some clever math lets us find our way toward a reward, even if the reward is several steps away. After that, it’s just trial and error.

The algorithm works great, as long as we don’t have too many dimensions to worry about. Problems like Tic-Tac-Toe are OK — having just under 200,000 total possible states×actions— but backgammon is too complicated.

Designing a Q-Bot

OK… so what hardware setup lets us start out simply, then ramp up the complexity, without having to rebuild anything?

Remember: to take advantage of Q-Learning, the robot needs to see the world in terms of states and actions. To find a box somewhere on the floor, a helpful state would be a scanner that senses distance, in a 360-degree sweep around the robot. Actions are easy: we just turn the wheels. To keep things q-friendly, we will reduce continuous states or actions into discrete states or actions… more on that below, once we finish the build.

Our design (which is on here GitHub) has:

  1. A tiny linux computer like a BeagleBone Blue, or a Raspberry Pi Zero-W or B+. Those systems run Python and can control the rest of the hardware.
  2. Two DC drive motors to turn the wheels independently.
  3. A really amazing laser rangefinder that we got from SparkFun… or an ultrasonic range finder, which isn’t as amazing but saves around $15.
  4. A small stepper motor to create a rotating mount for the range finder.
  5. A battery because, batteries.

Don’t underestimate that (pretty simple) build. Because we can program the rangefinder to rotate through any set of arcs, we can configure any number of states. Because we can program the drive wheels to turn any which way, we can configure any number of actions. We can create countless fascinating time-wasting crazy configurations without changing the hardware. Ask me how I know.

Discrete v. Continuous State and Action Spaces

Let’s design our Q-Bot inside out, starting with the math, so we don’t build something that turns out to be too complex to train. To fit neatly into a q-table, states and actions need to be discrete, not continuous. A possible range of values is called a space; a continuous space can contain infinite values, while a discrete space reduces the world to a finite number of choices:

  • continuous emotional space: I was fine until I went home and found that the dog had stolen one of my shoes, so I couldn’t go for a walk like I planned, but he’s a good dog, so all things considered, on a range of 1 to 10, my mood was like a 6.12641343…
  • discrete emotional space: [0=sad, 1=happy]; mood > 0.5 = happy

Or, in the case of the robot, if it were to scan for distances in 60-degree increments through a full 360-degree arc to detect the closest object:

continuous and discrete states
continuous =          ----   ----   ----   ----   ----   ----
sensor position 0 1 2 3 4 5
sensor degrees 0 60 120 180 240 300
distance reading 11.4
3.6 10.2 14.3 12.8 9.5
discrete = 1 ...the closest object is at sensor position 1

By reducing the dimensions from continuous=infinite to discrete=6, we can build a q-table with a manageable size… as long as the state contains enough information to allow the robot to achieve its goal. What would be the minimal discrete states and actions to find a box somewhere on the floor?

first design goal for our Q-Bot: minimal discrete states & actions
states: my robot's sensors can tell me if the box is generally... 
(1) in front
(2) to the left
(3) to the right
actions: my robot can...
(1) go straight ahead
(2) turn left by 30 degrees
(3) turn right by 30 degrees
that's 3 states and 3 actions, so states x actions = 3x3 = 9

Reward Function

A q-table is a place to accumulate rewards or penalties, so our only other math requirement is a reward function that encourages the robot to achieve its goal. As simple as that sounds, it’s incredibly powerful: if you want the same robot to solve a new problem, just change the reward function. There’s no need to code new solutions; just let the robot re-train itself by trial and error.

In our case, the reward function is the reduction in distance to the target object (so it’s a positive value when you get closer):

# reward function: getting closer to object is good, farther is bad
prior_distance = distance to object from step n-1
current_distance = distance to object from step n
reward = prior_distance - current_distance

The q-table has discrete dimensions but contains continuous values, so there’s no need to worry about the reward function having a continuous output.


Q-Learning trains in iterations, by trial and error. Like many machine learning algorithms, Q-Learning explores possibilities by using random guesses, then exploits past actions by recording which lead toward a solution. If a problem is simple enough, we can just turn the robot loose on the floor and let it sort itself out; if a problem has more than a trivial number of dimension, it’s better to train using a simulation, then drop those results onto the physical robot.

(GitHub includes a single environment for either a virtual or physical robot)

Let’s look inside a q-table as the QBot trains. For starters, here’s how our virtual Q-bot responds to training with the minimal 3×3 set of states×actions:

Sensor sweeps in 120-degree increments

After 100 iterations, our robot accumulates a q-table that shows which actions are most rewarding from each state. The values are relative scores, where the highest value wins:

Contents of 3x3 Q-Table after 100 iterations

If the robot were guided by the q-table (above), it would favor these actions for each state:

  1. Object is in front→Go Forward (score = 51.26)
  2. Object is to the left→Turn Left (score = 48.30)
  3. Object is to the right→Turn Right (score = 47.94)

That may seem too simple to be interesting, but the you can increase the complexity of the problem (and therefore the size of the q-table) without changing any code. Q-Learning is one way to create an adaptable robot; rewards, states, and actions fundamentally define the problem.

For example, let’s try to find a more direct route to the object. We could take a greater number of senor readings per revolution — say 12, instead of 3 — and see how that goes:

Sensor sweeps in 30-degree increments

Good news: the robot is a lot more efficient than it was in the first trial, because it has a clearer view of which direction is which. Bad news: it takes more iterations to complete its training, because it now has a q-table that’s 12×3=36 (states×actions).

(the other good news is that the GitHub repo includes code for both a virtual and a physical Q-Bot, so you can experiment without killing the battery)

…and now the fun starts.

The minimal Q-bot kind of begs the question: what states or actions could you add, to allow the robot to complete more tasks?

  • actions are easy: how about move backward? move fast? turn sharply?
  • states are more subtle: what could you do with multiple obstacles? how about an accelerometer, a gyroscope, a look-down sensor, or a GPS?

The fundamental limitation will always be the curse of dimensionality. If we introduced too many states or actions, the size of the q-table will explode. Still, you could build some wacky machines and use Q-Learning to make them go.

How about a robot that looks for cats?

We would need a neural net for that.