Autonomous Mars Rover using Raspberry Pi, Arduino and Pi Camera

Rishav Rajendra
7 min read · Apr 26, 2019

For the 2019 IEEE Region 5 Autonomous Robotics Competition, my friends and I from the UNO Robotics Club built an autonomous robot powered by a Raspberry Pi 3B+, an Arduino Mega 2560 and a Pi Camera V2. We used the Tensorflow Object Detection API for real-time object detection. In this article, we discuss the major elements you would need to reproduce our robot.

We also released all of our 3D models and source code, along with the data we used to train our model and the auto-labeling tool we used to label images.

Note: no prior knowledge of machine learning is required to read this article.

Robot in Action

The Competition

The robot runs on an 8x8 ft board with obstacles, targets and the mothership placed in random locations. The targets are marked with capital letters to distinguish them, and every target has a designated space in the mothership. We were tasked with locating and identifying the targets around the board, picking them up, and delivering them to their correct spots inside the mothership without hitting any obstacles.

Overview of the field

To map all the obstacles, targets and the mothership, we opted to do a 360-degree spin at the start of every round and place every object on an 8x8 tile grid, with each tile being a 12x12 inch square. To navigate around the tiles holding objects, we use the Grassfire algorithm (breadth-first search), which is more than fast enough given the relatively small size of the field. To double-check the mapping from that initial scan, we map objects again before every movement and add back anything we missed.
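If you are curious what that pathfinding step looks like in code, here is a minimal sketch of a Grassfire-style breadth-first search over the 8x8 tile grid. The grid encoding and function names are illustrative assumptions, not our exact implementation:

```python
from collections import deque

def grassfire_path(grid, start, goal):
    """Breadth-first search over the 8x8 tile grid.

    grid[row][col] is True when that tile is blocked (obstacle, target or
    mothership); start and goal are (row, col) tuples. Returns the list of
    tiles from start to goal, or None if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    frontier = deque([start])
    came_from = {start: None}

    while frontier:
        current = frontier.popleft()
        if current == goal:
            break
        r, c = current
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols \
                    and not grid[nr][nc] and nxt not in came_from:
                came_from[nxt] = current
                frontier.append(nxt)

    if goal not in came_from:
        return None

    # Walk back from the goal to rebuild the path in order.
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = came_from[node]
    return path[::-1]
```

With only 64 tiles, a plain breadth-first search like this finds a shortest tile path essentially instantly, which is why nothing heavier like A* was needed.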

Hardware

View from the onboard camera

Image Processing

As a Raspberry Pi 3B+ takes around 30 seconds to load a Tensorflow model and the allocated time per round is only six minutes, we decided to use a single detection model to perform all the tasks rather than have specific models for specific tasks. Because this one model had to perform a variety of tasks, we had to train it on a large number of labeled images. Twenty-three thousand three hundred and twenty-five, to be exact. Labeling that many pictures by hand would have taken a very long time, so to cut the labeling time down significantly, we developed our own labeling tool. We detail how it works below.

Auto-labeler process

PyAutoLabeler

Labeling images is the worst part of developing your own custom machine learning model, but also one of the most important. The main idea behind our labeling tool is to train a slow object detection model with a high mAP and then use that model to label additional images and grow the dataset. The aim is not to remove manual labeling completely, but to work hand in hand with popular labeling tools like LabelImg and significantly cut down the overall time spent. To get the best results, label around 100 images for each class and train the slower VGG16-based model. Then let that VGG16-based model label thousands of images for you in a few minutes. After the tool has labeled everything, you can step through each image in LabelImg to check for errors and fix bounding-box placements. You now have thousands of labeled images with which to train Mobilenet SSD models that can be deployed on less capable hardware like a Raspberry Pi or a smartphone.
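To make the workflow concrete, here is a stripped-down sketch of the auto-labeling loop: it runs a trained detector over a folder of unlabeled images and writes LabelImg-compatible Pascal VOC XML for every confident detection. The detect_fn wrapper, paths and score threshold are illustrative assumptions; the released PyAutoLabeler is the real tool.

```python
import glob
import os
from xml.etree import ElementTree as ET

from PIL import Image


def write_voc_xml(image_path, width, height, detections, out_dir):
    """Write a LabelImg-compatible Pascal VOC annotation for one image.

    detections is a list of (class_name, xmin, ymin, xmax, ymax) in pixels.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = os.path.basename(image_path)
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"

    for name, xmin, ymin, xmax, ymax in detections:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"),
                              (xmin, ymin, xmax, ymax)):
            ET.SubElement(box, tag).text = str(int(value))

    xml_name = os.path.splitext(os.path.basename(image_path))[0] + ".xml"
    ET.ElementTree(root).write(os.path.join(out_dir, xml_name))


def auto_label(image_dir, out_dir, detect_fn, min_score=0.8):
    """Label every .jpg in image_dir with a trained detector.

    detect_fn(image) is assumed to return a list of
    (class_name, score, xmin, ymin, xmax, ymax) tuples in pixel coordinates,
    e.g. a thin wrapper around the slower VGG16-based model.
    """
    for image_path in glob.glob(os.path.join(image_dir, "*.jpg")):
        image = Image.open(image_path)
        keep = [(name, x1, y1, x2, y2)
                for name, score, x1, y1, x2, y2 in detect_fn(image)
                if score >= min_score]
        write_voc_xml(image_path, image.width, image.height, keep, out_dir)
```

Anything the slow model misses or mislabels is exactly what you then fix by hand in LabelImg.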

Tensorflow Object Detection API

We performed several steps before training our Tensorflow model so that we could extract maximum performance from it. We limited all the image sizes to 300x300 to reduce the input dimension, which drastically cuts down the number of learned parameters, simplifies the problem, and accelerates both training and prediction. We experimented with a lot of detection models to find the right balance between accuracy and speed; for us, the SSD Mobilenet V2 model gave the best results. To validate our hyper-parameter choices, we split the dataset into three subsets: a training set (70%), a validation set (20%) and a test set (10%). We kept the model with the lowest error on the validation set and estimated our generalization error on the test set. We highly recommend the tutorials by EdjeElectronics and Sentdex on the Tensorflow Object Detection API.
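The split itself is straightforward; here is a small sketch with a fixed seed so it stays reproducible (the flat folder of .jpg files is an assumption about the layout):

```python
import glob
import random


def split_dataset(image_dir, seed=42):
    """Shuffle the labeled images once and split them 70/20/10."""
    images = sorted(glob.glob(f"{image_dir}/*.jpg"))
    random.Random(seed).shuffle(images)

    n = len(images)
    n_train = int(0.7 * n)
    n_val = int(0.2 * n)

    train = images[:n_train]
    val = images[n_train:n_train + n_val]
    test = images[n_train + n_val:]  # remaining ~10%
    return train, val, test
```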

Software

The Raspberry Pi is the brain of our robot and controls all of the image processing and movement. It sends the Arduino specific movement instructions, such as "move forward 10 inches" or "turn left 30 degrees", over a serial connection.
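A sketch of that serial link using pyserial is below. The single-letter command codes, the baud rate and the acknowledgement string are illustrative assumptions rather than our exact protocol:

```python
import serial

# An Arduino Mega typically enumerates as /dev/ttyACM0 on a Raspberry Pi.
arduino = serial.Serial("/dev/ttyACM0", baudrate=115200, timeout=1)


def send_command(action, value):
    """Send one movement instruction and wait for the Arduino's reply,
    e.g. ('F', 10) for forward 10 inches or ('L', 30) for a 30-degree left turn."""
    arduino.write(f"{action}{value}\n".encode("ascii"))
    return arduino.readline().decode("ascii").strip()  # e.g. "DONE"


send_command("F", 10)  # move forward 10 inches
send_command("L", 30)  # turn left 30 degrees
```

Blocking on the reply is one simple way to keep the Pi and the motors in lockstep, so the next camera frame is only grabbed once the robot has stopped moving.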

Distance Calculation

In order to calculate the distance from the front of the robot to an object, we use the object's real-world size together with the bounding box we get from our detection model. We also know the focal length (f) of our camera. Since the Pi Camera V2 has a pin-hole sensor, the distortion in its images is pretty substantial. To take this distortion into account, we use OpenCV's camera calibration function to get the focal length times the scaling factor (f_x and f_y), from which we calculate the pixels per millimeter (m) on the image sensor by dividing f_x and f_y by f and taking the average.
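A sketch of that calibration step using OpenCV's chessboard routines is below. The board size and image paths are assumptions; 3.04 mm is the published focal length of the Pi Camera V2:

```python
import glob

import cv2
import numpy as np

F_MM = 3.04  # Pi Camera V2 focal length in millimeters (published spec)


def pixels_per_mm(calib_dir, board=(9, 6)):
    """Estimate pixels per millimeter (m) on the sensor from chessboard shots."""
    # 3D corner positions of the chessboard in its own plane (z = 0).
    objp = np.zeros((board[0] * board[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2)

    obj_points, img_points, shape = [], [], None
    for path in glob.glob(f"{calib_dir}/*.jpg"):
        gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
        found, corners = cv2.findChessboardCorners(gray, board)
        if found:
            obj_points.append(objp)
            img_points.append(corners)
            shape = gray.shape[::-1]

    # The camera matrix holds f_x and f_y in pixel units.
    _, mtx, _, _, _ = cv2.calibrateCamera(obj_points, img_points, shape, None, None)
    f_x, f_y = mtx[0, 0], mtx[1, 1]
    return (f_x / F_MM + f_y / F_MM) / 2.0  # average pixels per millimeter
```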

Formula used to calculate the distance to the object

The full implementation of the above comes out to roughly 50 lines of Python.
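The calculation boils down to the pinhole relation D = f * H_real / h_sensor, where h_sensor is the object's height on the sensor. Here is a condensed sketch of it (not the original listing), assuming the pixels-per-mm value from the calibration step above and a placeholder table of real object heights:

```python
F_MM = 3.04  # Pi Camera V2 focal length in millimeters (published spec)

# Real-world object heights in millimeters (placeholder values, not the
# actual competition dimensions).
REAL_HEIGHT_MM = {
    "target": 38.0,
    "obstacle": 76.0,
}


def distance_to_object(class_name, ymin, ymax, pixels_per_mm):
    """Pinhole-model distance estimate (mm) from a single bounding box.

    ymin/ymax are the box edges in pixels from the detection model and
    pixels_per_mm is the sensor scale from camera calibration.
    """
    box_height_px = ymax - ymin
    height_on_sensor_mm = box_height_px / pixels_per_mm
    distance_mm = F_MM * REAL_HEIGHT_MM[class_name] / height_on_sensor_mm
    # Subtract the camera-to-front-bumper offset to get the distance from
    # the front of the robot rather than from the lens.
    return distance_mm
```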

Picture divided into two quadrants for angle calculation

Angle Calculation

Calculating the angle to an object from the center of the front of the robot was easier than calculating the distance. We divided the picture into two equal halves with a vertical line through the center, such that the left half is the negative quadrant, the right half is the positive quadrant, and the center line is 0 degrees. To calculate the angle of the object from the center of the picture, we follow these steps:

  • Draw a line from the bottom center pixel of the picture, (x_max, y_max), to the top center pixel of the picture, (x_min, y_min).
  • Calculate the slope (m1) between the two points using the formula (y_max-y_min)/(x_max-x_min).
  • Similarly, calculate the slope (m2) from the mid-point of the detected object to the bottom center of the picture using the formula (y_mid-y_max)/(x_mid-x_max); a code sketch of the whole calculation follows the formula below.
Formula to calculate the angle between the center of the robot and the object
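Because the center line is vertical, evaluating its slope directly would divide by zero, so a practical way to compute the same geometry is atan2 on the horizontal and vertical offsets from the bottom-center pixel; this yields the same signed angle, negative left of center and positive right. A minimal sketch, with the function name and frame size as assumptions:

```python
import math


def angle_to_object(x_mid, y_mid, frame_width, frame_height):
    """Signed angle (degrees) between the robot's heading and a detection.

    (x_mid, y_mid) is the center of the object's bounding box in pixels.
    The vertical center line of the frame is 0 degrees; objects left of
    center come back negative, objects right of center positive.
    """
    x_center = frame_width / 2.0   # x of the bottom-center pixel
    dx = x_mid - x_center          # horizontal offset from the center line
    dy = frame_height - y_mid      # vertical distance up from the bottom edge
    return math.degrees(math.atan2(dx, dy))


# Example: in a 640x480 frame, a box centered at (480, 240) comes back as
# roughly +33.7 degrees, i.e. to the right of the robot's heading.
print(angle_to_object(480, 240, 640, 480))
```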
Camera view close to the mothership

Mothership Alignment

The mothership was our Achilles' heel for this project. It was frustratingly difficult to properly map and align with it. To solve this, we used our camera's perception of depth to our advantage. As you can see from the figure, the farther away a letter is, the higher it appears in the image, which gives us enough information to calculate the angle between the front of the robot and the mothership. For the figure above, for example, we would draw a line from the mid-point of block C to the mid-point of block B, and a line from the mid-point of block C to the left corner of the picture, forming a right triangle. We then use the same formula as in the angle calculation above to work out how much the robot has to turn to align perfectly with the mothership.

Python code for the mothership angle calculation
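In essence, the geometry reduces to the tilt of the line through two adjacent letter blocks relative to the image's horizontal axis. A minimal sketch of that, with the function name and sign convention as assumptions rather than our exact code:

```python
import math


def mothership_alignment_angle(block_near, block_far):
    """Approximate turn angle (degrees) to square up with the mothership.

    block_near and block_far are the bounding-box midpoints (x, y) of two
    adjacent letter blocks, e.g. blocks C and B in the figure above. The tilt
    of the line through them relative to the horizontal tells the robot how
    far it is rotated away from facing the mothership head-on.
    """
    (x1, y1), (x2, y2) = block_near, block_far
    return math.degrees(math.atan2(y2 - y1, x2 - x1))
```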

Conclusion

This was a very hard and emotionally draining competition. The lack of rule clarifications and the constant rule changes made it very hard to keep a steady approach. But in my opinion, we ended up with a very good robot that can be replicated easily on a low budget to perform a multitude of activities. We enjoyed building this robot very much, and hopefully it will help you, dear reader, expand on our project and build exciting new things.

Huge shoutout to Benji Lee, Michael Ceraso, James Winnert and Wade Rivero. Without these beautiful gems, this project would not have been as fun as it was.

We hope you enjoyed this article. Share it with people you think will find it useful. If you have any questions or remarks, feel free to contact us. Check out the UNO Robotics website for free robotics workshops or to show your support.
