DIY Telepresence Robot: Part 4

Yuan Gao (Meseta)
Meseta builds Robots
8 min read · Aug 10, 2020

The software that drives it

In this final section, I talk briefly about the software that drives the robot. It’s been 4 years since I built it, so unfortunately I can’t do a deep dive into the specifics, and today I would do things differently.

The software control has five systems:

  • Motor drive — containing the code to move the wheels of the robot
  • Auxiliary systems — containing the code for all the other bits of the robot: the head unit, the lights, the power control to the tablets, and the laser
  • Camera stream — the webcam streams from the head unit, and the downward-pointing navigational camera
  • Web server — serves the interface page, and runs the websocket server that acts as a bridge from control data to the rest of the components
  • Web frontend — the JavaScript frontend that allows the user to control the robot

All of these units are tied together using a ZeroMQ pub-sub, allowing me to decouple each of the components. For those unfamiliar, zmq is a very lightweight messaging library that helps different processes communicate. Using a messaging library like zmq is much easier, and probably better, than building one big application, and it helps with separation of concerns: instead of maintaining a large, complex program that has to juggle multithreading so that its various parts can run at the same time, I can write several much smaller programs whose jobs are very well-defined, and let the messaging library carry control signals between them. This is very much the same argument made for microservices in web backend stacks.
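Roughly, the wiring looks something like this (the topic names and addresses here are illustrative, not the originals, and in reality each side runs as its own long-lived process):

```python
import json
import time
import zmq

ctx = zmq.Context.instance()

# Publisher side, e.g. the webserver pushing control data onto the bus
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://127.0.0.1:5556")  # illustrative address

# Subscriber side, e.g. the motor drive program (normally a separate process)
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://127.0.0.1:5556")
sub.setsockopt(zmq.SUBSCRIBE, b"joystick")  # only care about joystick messages

time.sleep(0.5)  # pub-sub is fire-and-forget; give the subscription time to register

pub.send_multipart([b"joystick", json.dumps({"x": 0.0, "y": 1.0}).encode()])

topic, payload = sub.recv_multipart()
print(topic, json.loads(payload))  # b'joystick' {'x': 0.0, 'y': 1.0}
```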

Motor drive program

The motor drive is a very simple Python script whose only job is to read the joystick input from the zmq pub-sub and translate it into Motor1 and Motor2 speed commands, which control how fast each motor turns.

There was an interesting question of how to map the joystick axis data to actual motor movement. Since the base of the telepresence robot is basically a tracked tank vehicle, when I put the stick to the right, I want the robot to turn right on the spot by sending the left motor forward and the right motor backward. It turns out that with a control scheme like this, the math works out quite elegantly.

In the diagram above, the joystick is full-forward, and that maps to both left and right motors throttled to 100%, moving the bot forward.

With the joystick in the top right corner, the left motor remains at 100%, but the right motor drops to zero; this puts the robot into a right-hand turn.

With the joystick fully to the right, the left motor remains at 100%, but the right motor goes into full reverse; this causes the robot to pivot to the right on the spot without moving forward. This turns out to be very useful for a telepresence robot, as it gives a zero turning circle, allowing easy manoeuvrability in the office. The design of the RC base turned out to be an issue when doing these kinds of turns, though: turning on the spot forces the tracks to slide sideways, which sometimes causes a track to pull itself off the wheels. This is particularly problematic while driving on carpet.
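In code, the mixing comes down to a couple of lines. This is a sketch of the idea rather than the original script:

```python
def mix(x, y):
    """Map joystick x/y (each -1.0..1.0) to left/right motor throttles.

    Full forward (0, 1)  -> ( 1.0,  1.0)  both motors forward
    Top-right    (1, 1)  -> ( 1.0,  0.0)  right-hand turn
    Full right   (1, 0)  -> ( 1.0, -1.0)  pivot right on the spot
    """
    left = max(-1.0, min(1.0, y + x))
    right = max(-1.0, min(1.0, y - x))
    return left, right
```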

The calculated Motor1 and Motor2 values are simply sent to the RoboClaw driver board using the serial port protocol it defines.
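As a rough sketch of what that looks like, assuming the RoboClaw is configured for its single-byte "simple serial" mode (the packet serial mode works differently, and the port name here is made up; check the RoboClaw manual for the exact protocol):

```python
import serial

rc = serial.Serial("/dev/ttyACM0", 38400, timeout=0.1)  # hypothetical port

def send_speeds(left, right):
    """Send left/right throttles (-1.0..1.0) as simple-serial bytes.

    In simple serial mode, bytes 1-127 drive motor 1 (64 = stop) and
    bytes 128-255 drive motor 2 (192 = stop).
    """
    m1 = 64 + int(left * 63)
    m2 = 192 + int(right * 63)
    rc.write(bytes([m1, m2]))
```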

The RoboClaw driver can also report the battery voltage, so the Python script reads that and publishes it onto the pub-sub too, along with a heartbeat message every second so that I could tell whether the control system was running or not.
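Something along these lines, with the topic names and the voltage read stubbed out as placeholders (the real script queried the RoboClaw over serial for the value):

```python
import time
import zmq

ctx = zmq.Context.instance()
pub = ctx.socket(zmq.PUB)
pub.connect("tcp://127.0.0.1:5557")  # hypothetical status bus address

def read_battery_voltage():
    # Placeholder: the real script asked the RoboClaw for the main battery voltage
    return 12.6

while True:
    pub.send_multipart([b"motor.voltage", f"{read_battery_voltage():.1f}".encode()])
    pub.send_multipart([b"motor.heartbeat", b"alive"])
    time.sleep(1.0)
```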

Auxiliary control program

The auxiliary controls are even simpler: the joystick input maps directly to the pan/tilt angles of the head gimbal, so that putting the joystick in the centre makes the camera look straight on, and pulling the joystick to the bottom left rotates the camera to look bottom-left. Mapping this data to the Servo1 and Servo2 positions is straightforward, and done using the serial port protocol provided by the Maestro servo board.
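A minimal sketch of that mapping, using the Maestro's compact-protocol Set Target command (the port name, channel numbers, and pulse range here are assumptions, not the original values):

```python
import serial

maestro = serial.Serial("/dev/ttyACM1", 9600, timeout=0.1)  # hypothetical port

def set_servo(channel, pulse_us):
    """Set one Maestro channel via the compact-protocol Set Target command.

    The target is expressed in quarter-microseconds (1500 us centre = 6000).
    """
    target = int(pulse_us * 4)
    maestro.write(bytes([0x84, channel, target & 0x7F, (target >> 7) & 0x7F]))

def point_head(x, y):
    # Map joystick x/y (-1.0..1.0) to pan/tilt pulses around the 1500 us centre
    set_servo(0, 1500 + x * 400)  # pan servo (channel numbers are assumptions)
    set_servo(1, 1500 + y * 400)  # tilt servo
```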

A small switch was added to the frontend to switch the single joystick between driving the robot, and rotating the head camera. Had I added gamepad control, it would have been possible to use twin-stick control, letting me drive and look freely at the same time.

The laser and tablet controls simply turned the relays on or off via their own serial port protocol.

The light strip was pre-programmed with a few different patterns that could be overlaid on top of each other:

  • All lights off
  • All lights set to white for maximum night-time illumination
  • Left/right yellow blinkers, which flash yellow lights in the front and back corners when turning
  • Headlights — white lights at the front, red lights at the back like a car
  • Reversing — an extra white light at the back for when reversing, like a car

The auxiliary control program just had to send a single byte to the light strip control to indicate which mode was desired.
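One way such a byte could be encoded, given that the patterns can overlay, is as a bitmask. The bit assignments below are purely an assumption for illustration; the real firmware's encoding may have differed:

```python
# Assumed bit assignments, one per pattern
LIGHTS_WHITE = 0b00000001  # full white for night-time illumination
BLINK_LEFT   = 0b00000010  # yellow blinkers, left corners
BLINK_RIGHT  = 0b00000100  # yellow blinkers, right corners
HEADLIGHTS   = 0b00001000  # white front, red rear, like a car
REVERSING    = 0b00010000  # extra white light at the back

def light_byte(*modes):
    """Overlay any combination of patterns into the single mode byte."""
    value = 0
    for mode in modes:
        value |= mode
    return value

# e.g. reversing at night with headlights on:
mode = light_byte(HEADLIGHTS, REVERSING)
```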

Like the main drive, the auxiliary control also emitted a heartbeat message every second so I could tell if it was still alive.

Camera stream

The camera streams were handled by simple USB camera streaming software, an open source program (I don’t recall which one, unfortunately) that serves a USB camera as an mjpeg stream viewable in a browser. This was ideal: it was a simple case of setting up the stream, then loading the mjpeg in the frontend from its URL. There was no need to write any code to decode streams; it worked pretty much out of the box.

Web server and Web frontend

The webserver was a straightforward Python Tornado server, serving the static files over HTTP and handling the control interface over websockets, receiving what were basically JSON packets. Since this was a very simple application, the server simply read incoming JSON packets and bridged them onto the pub-sub so that the other components could read them. At the same time, any voltage status and heartbeats from the pub-sub were forwarded on to any connected frontends.
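A minimal sketch of the browser-to-pub-sub direction of that bridge (the endpoint path, topic name, and port are assumptions; the status/heartbeat forwarding back to the browser isn't shown here):

```python
import json
import zmq
import tornado.ioloop
import tornado.web
import tornado.websocket

ctx = zmq.Context.instance()
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://127.0.0.1:5556")  # hypothetical control bus address

class ControlSocket(tornado.websocket.WebSocketHandler):
    def on_message(self, message):
        # Forward each JSON control packet from the browser onto the pub-sub,
        # tagged with a topic the other programs subscribe to
        data = json.loads(message)
        pub.send_multipart([b"control", json.dumps(data).encode()])

app = tornado.web.Application([
    (r"/ws", ControlSocket),
    (r"/(.*)", tornado.web.StaticFileHandler,
     {"path": "static", "default_filename": "index.html"}),
])

if __name__ == "__main__":
    app.listen(8080)
    tornado.ioloop.IOLoop.current().start()
```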

The web frontend was in fact extremely straightforward, containing just a couple of displays for the webcams (the browser can read mjpeg streams, so these are simply HTML <img> tags pointing at the mjpeg stream URLs); a JavaScript joystick that simply provided an x/y value for where the stick was; and some buttons and a text box for debug messages. Nothing complex here, just some vanilla HTML, CSS, and JavaScript, with websockets to send/receive data in real time.

However, the web frontend didn’t handle audio; for that I set up Skype on the tablet with its own account and set it to auto-answer calls. In this way, I would log into the web frontend, hit the “wake tablet” link, which would pulse the power to the tablet once, waking it from sleep; then I would dial the bot’s Skype account, giving me a video call to the tablet.

Usage Impressions

After a few weeks of using the robot, and my colleagues becoming less freaked out by my face on a tablet slowly looming up behind them (the motors were very quiet at low throttle), it became clear that using the robot to attend meetings and standups gave me a sense of “presence” in the office that was missing from regular video calls. Being able to pivot the head camera to point at specific people gave others a sense of where and who I was looking at. I would deliberately re-centre my camera on the person I was talking to when there was more than one person in front of the robot, to provide these almost human-like cues.

What’s more, the gimballed head camera turned out to be an unexpected means to convey unspoken emotion — I could nod or shake my “head”; look up in exasperation, or very slowly turn to look at someone. There was a surprisingly wide range of emotion possible with a 2-axis camera at the top of the robot that a human might intuitively consider the robot’s head. For future telepresence robots, I would consider a gimballed head to be essential to convey more emotion or human connection.

Overall, the telepresence robot was a success: I gained the ability to drive up to people and start a conversation remotely, had better visibility on whether people were available for a chat, and could join in on “water-cooler chat”, an aspect that had been missing when working remotely. For others, interacting with the robot, while weird at first, was a lot more natural than having to call me by video call all the time, and more convenient than me setting up an always-on conference call with a TV in the office.

Improvements that I would make

It’s been 4 years since I built that robot, and in the intervening time I’ve spent some time working in the robotics industry (a job I may have this particular robot to thank for getting) and improved my programming skills. There are several things I’d change about the robot.

On the hardware front, I’m reasonably happy with the component choices. There’s not much I can do about the tank base: that choice was largely a price compromise, and there aren’t really good alternatives at its price point.

For the control system, today I would have gone with an x86 mini computer rather than an ODroid, simply to have a wider choice of software. And I would have switched out as many of the serial-port modules as I could for USB ones, to simplify the software’s serial port detection and make them more reliable (the USB-serial adapters tended to have connect/disconnect issues).

On the web frontend, I would switch from a pure JS/HTML option to Vue.js, a framework I had picked up in the past few years.

The robot stack is where I would make the most changes: I would switch from my own zmq pub-sub system to ROS (via rospy). A lot of what I ended up implementing through fumbling in the dark turned out to be similar to the formal concepts implemented by ROS. Having since worked with ROS, I now better understand a lot of the concepts that I had only scratched the surface of with that simple robot stack I built.
