How I built a robot in my studio during WFH — under $300
Some tips on math, code, and hardware, distilled from a few months of work during a summer.
To go straight to the robot-building part, feel free to skip the first two paragraphs, at the cost of never knowing about a short period in the life of someone who, for most of you, is just another stranger on the web.
I started my PhD in 2020, right when the world was discovering what moving entire industries to Work(ing) From Home (WFH) looked like. My university was no different, and I was informed that for the foreseeable future I was supposed to stay at home. Not a terrible deal, given that most of the university documentation where my name appeared briefly described the topic of my PhD as "Computer Science", a field long associated with terms like distributed, remote, connection, cloud, video call and email, and whose most influential people were often depicted on a laptop in some random, non-office-like place, supposedly changing the world.
A more accurate description of the topic of my PhD would however read "Robot Learning", with half of those words linked to one of the most hardware-focused areas of Computer Science. Not having access to robots at all was more of a Big Deal than it would have been in other fields, given that I had spent the previous years working at different labs that gave me the opportunity to see my ideas actually running on those machines. Most people in the field, if unable to interact with physical robots, moved more or less willingly to physics simulations, where robots and environments can be replicated as a long list of Lagrangian-Newtonian-Eulerian formulas and efficient collision-detection algorithms. However, while working on the first paper of my PhD, I really wanted to see those lines of code controlling a physical robot. Relatively fresh out of an MSc in AI and Robotics, I thought it would be a nice occasion to bring to real life a series of notions I had learned about how to model and control robots and, as Feynman used to say, "what I cannot create, I do not understand". Last but not least, I stumbled upon a robot that was both within my budget and also Really Orange (a chromatic decision that is less surprising considering it is designed and manufactured by an Italian company, Arduino, borrowing from a long history of colorful and playful Italian design, whose visual vocabulary strongly inspired a series of companies that wanted to differentiate themselves from the clear Braunian teachings of other tech companies, a vocabulary that is now Cool Again. I will spare you another two paragraphs about how the small orange tab inside the Apple Vision Pro headband is absolutely not casual.)
Anyway, most of you are here for a tutorial (this is not even a tutorial per se but a write-up of something I did years ago that may or may not be useful to some of you) with clear numbered steps so here we go, number 1, buy or build a robot.
1) The robot
I decided to purchase a robot that could be assembled relatively easily, so as not to overcomplicate an endeavour whose chances of success were already not enormous, considering I was also running against the clock as the deadline of the conference I wanted to submit a paper to approached. This particular robot, called Arduino Braccio ("arm" in Italian), can be purchased for under $300. I have no ties with the company and frankly do not care too much if you decide to buy another model. However, a few of the things I will explain later relate to this particular robot's structure. In particular, one thing you should pay attention to is the number of motors your robot is composed of, and their geometrical arrangement. They determine the degrees of freedom (DoFs) of your robot, and hence the kind of poses it will be able to reach. Six or more DoFs generally allow robot arms to reach arbitrary poses in space, as 3 are used for position and 3 for orientation. This particular robot has 5 DoFs (not counting the gripper's own DoF), which will not allow it to reach any arbitrary pose, but still provides good capabilities for handling objects on a flat surface in front of it. I therefore suggest going for a similarly actuated robot.
The robot is controlled by an Arduino microcontroller stacked with a "shield", an additional board specialised in driving the motors.
The code that controls the robot runs on the Arduino Uno board. It can be programmed through a minimal UI, in a C-like language. You can connect and communicate with it via USB; the board in turn is connected to the motors via cables and pins, and can set the desired angle of each motor by modulating voltage signals. If you arrive at this point, you will realise that controlling a robot by manually setting angles for 5 servomotors is not the most intuitive way of tackling tasks. Realistically, what you would like to say is "go to these coordinates, at this angle". To do that we need some math.
2) The math
A robot can be seen as a chain of motors linked together by rigid bodies. The majority of robot arms are composed entirely of revolute (rotational) joints. While the controls we effectively have are the absolute angles or angular velocities of these joints, we are often interested in the Cartesian pose of the gripper, or end-effector, or of other parts of the robot. To go from the list of joint angles to the pose of the gripper in space, we need to concatenate a series of homogeneous transformations. This part can be referred to as modelling the robot, that is, finding the series of transformations that describe its movement in space. Doing so for any robot is quite a straightforward process if you follow the Denavit-Hartenberg (DH) convention: all we need to do is measure the distances between the motors once the robot is assembled.
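To make this concrete, here is a minimal sketch of forward kinematics as a chain of homogeneous transformations. The DH parameters below are placeholder values, not the actual measurements of the Braccio; you would fill them in from your own robot.

import numpy as np

def dh_matrix(theta, d, a, alpha):
    # Homogeneous transform between consecutive joints (standard DH convention)
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([[ct, -st*ca,  st*sa, a*ct],
                     [st,  ct*ca, -ct*sa, a*st],
                     [ 0,     sa,     ca,    d],
                     [ 0,      0,      0,    1]])

def forward_kinematics(joint_angles, dh_params):
    # Pose of the end-effector: product of the per-joint transforms
    T = np.eye(4)
    for theta, (d, a, alpha) in zip(joint_angles, dh_params):
        T = T @ dh_matrix(theta, d, a, alpha)
    return T  # T[:3, 3] is the position, T[:3, :3] the orientation

# Placeholder (d, a, alpha) per joint: measure these on the assembled robot
dh_params = [(7.0, 0.0, np.pi/2), (0.0, 12.5, 0.0), (0.0, 12.5, 0.0)]
T = forward_kinematics([0.0, np.pi/4, -np.pi/4], dh_params)
print(T[:3, 3])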
An excellent guide to this process is provided in the Robotics: Modelling, Planning and Control textbook (Siciliano et al.). What is often more useful, however, is the inverse: given a goal pose of the gripper, finding the joint angles that allow the robot to reach it. While the former problem is straightforward and always computable, the latter, inverse problem is considerably more challenging.
There are two possible solutions: the first is having the gripper always "look down", a natural pose for grasping objects, thereby reducing the gripper's degrees of freedom to 4 (3 for the xyz position, one for the angle around z). We will now see a mathematical solution for this scenario, modelling the robot as a 3-joint "anthropomorphic" arm, as seen below. If you are uninterested in the mathematical steps, you can scroll down a bit and you will find a code snippet that computes the angles, wrapping up the computations I am about to show.
I will borrow some parts from the aforementioned textbook. In the following, p_Wx, p_Wy and p_Wz refer to the desired coordinates of the wrist of the arm, i.e. the part at the end of the third link, as displayed above. In the previous image of the Braccio robot with the two metric measurements, it is the part where the second measurement ends. The values a_2 and a_3 refer to the lengths of the links represented above. With s_x or c_x the book refers to sin(x) and cos(x), with x being the corresponding joint angle. Furthermore, c_xy refers to cos(x + y).
Forward kinematics gives us the following way to find the coordinates of the wrist given the angles of the joints. As we want to solve the inverse problem, we will start from here to derive a series of equations.
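In the textbook's notation (and matching the Python snippet further below), these are:

p_Wx = c_1 (a_2 c_2 + a_3 c_23)
p_Wy = s_1 (a_2 c_2 + a_3 c_23)
p_Wz = a_2 s_2 + a_3 s_23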
The value of c3 can be found as follows, given we have all the other desired or measured values.
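Squaring and summing the three components above gives p_Wx^2 + p_Wy^2 + p_Wz^2 = a_2^2 + a_3^2 + 2 a_2 a_3 c_3, hence

c_3 = (p_Wx^2 + p_Wy^2 + p_Wz^2 - a_2^2 - a_3^2) / (2 a_2 a_3)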
The corresponding sine is then straightforward to find, and has two possible solutions.
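s_3 = ±sqrt(1 - c_3^2)

(the code below picks the negative one).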
We can therefore find the value of the third joint angle as follows.
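theta_3 = atan2(s_3, c_3)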
To find the values of the sine and cosine of the angle of the second joint, the following equations can be used.
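c_2 = ( ±sqrt(p_Wx^2 + p_Wy^2) (a_2 + a_3 c_3) + p_Wz a_3 s_3 ) / (a_2^2 + a_3^2 + 2 a_2 a_3 c_3)
s_2 = ( p_Wz (a_2 + a_3 c_3) ∓ sqrt(p_Wx^2 + p_Wy^2) a_3 s_3 ) / (a_2^2 + a_3^2 + 2 a_2 a_3 c_3)

where the ± and ∓ signs must be picked consistently (the code below uses the upper choice).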
That allows us to find the angle of the second joint:
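theta_2 = atan2(s_2, c_2)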
Notice how the above choices of signs for s2 and c2 will lead to a series of different angles. The following figure illustrates what these different solutions, given the same desired position of the wrist, would look like.
Finally, the angle of the first joint can be found as follows.
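theta_1 = atan2(p_Wy, p_Wx)

(or theta_1 = pi + atan2(p_Wy, p_Wx) for the solution corresponding to the opposite sign of the square root in c_2 and s_2).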
The above describes a series of steps and equations to find, given a desired position of the wrist, the angles of the first three joints of the robot. We are, however, interested in interacting with objects using the gripper. Assuming the gripper will be facing downwards, we need to add the distance between the wrist and the tip of the gripper to p_Wz, as the wrist should end up above the desired gripper position. The 4th joint angle will then make sure that the gripper is indeed aligned with the z axis, facing downwards:
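With the joint angles measured as in the model above, theta_2 + theta_3 + theta_4 gives the pitch of the gripper with respect to the horizontal plane, so facing downwards means

theta_2 + theta_3 + theta_4 = -90°, i.e. theta_4 = -90° - (theta_2 + theta_3)

up to the fixed offsets of the servo-angle convention (this is the th4 computed in the control code further below).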
Finally, the fifth angle can be chosen arbitrarily, based on the desired rotation of the gripper around the world z-axis. You must however also take into account the angle of the first joint, which rotates the gripper around that same axis.
Here’s a compact representation of the above as a Python function.
import numpy as np

# Link lengths (cm), measured on the assembled robot
a2, a3 = 13., 13.

def ik3(x, y, z):
    # Inverse kinematics of the 3-joint anthropomorphic arm: wrist position -> joint angles
    c3 = (x**2 + y**2 + z**2 - a2**2 - a3**2) / (2*a2*a3)
    s3 = -np.sqrt(1 - c3**2)
    theta3 = np.arctan2(s3, c3)
    c2 = (np.sqrt(x**2 + y**2)*(a2 + a3*c3) + z*a3*s3) / (a2**2 + a3**2 + 2*a2*a3*c3)
    s2 = (z*(a2 + a3*c3) - np.sqrt(x**2 + y**2)*a3*s3) / (a2**2 + a3**2 + 2*a2*a3*c3)
    theta2 = np.arctan2(s2, c2)
    theta1 = np.arctan2(y, x)
    # Convert to degrees and apply the offsets of the Braccio servo convention
    return 90 - np.degrees(theta1), np.degrees(theta2), 90 - np.degrees(theta3)
The above analysis is valid if we want the robot to keep its gripper facing downwards, as mentioned before. This is a useful setting that is often adopted in robotics scenarios. However, we can generalise the analysis to the case where we want the gripper to reach a larger set of possible poses. As we discussed, the robot is inherently limited in the range of poses it can reach by its degrees of freedom.
Given a desired position of the gripper, p_Gx, p_Gy, p_Gz, we can now further select a desired wrist "pitch" angle: let's call it theta_d (measured in the world frame). We can also select a desired wrist "roll" angle, in the gripper frame, which will directly be the angle of joint 5. Let's also call d_wg the distance between the wrist axis of rotation and the gripper tip. After setting these two joint angles directly, we need to find the values of the remaining 3 joints. We can reuse the steps above, but we first need to compute p_Wx, p_Wy, p_Wz from p_Gx, p_Gy, p_Gz. We can do so following the equations below.
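With theta_d measured from the horizontal plane, positive upwards (so that theta_d = -90° recovers the downward-facing case of the previous section):

theta_1 = atan2(p_Gy, p_Gx)
p_Wx = p_Gx - d_wg cos(theta_d) cos(theta_1)
p_Wy = p_Gy - d_wg cos(theta_d) sin(theta_1)
p_Wz = p_Gz - d_wg sin(theta_d)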
With these values we can now compute the entire list of joint angles we need, setting theta_5 as defined above and theta_4 as theta_d - (theta_2 + theta_3). Notice that, for configurations outside of the robot's workspace, these solutions might not exist.
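As a sketch of these last steps in code (d_wg below is a placeholder value, measure it on your own robot; the pitch convention is the one defined above), the wrist position can be computed from the desired gripper position and then fed to ik3:

import numpy as np

d_wg = 8.  # wrist-to-gripper-tip distance (cm): placeholder, measure your own robot

def gripper_to_wrist(gx, gy, gz, pitch_deg):
    # Wrist position for a desired gripper-tip position and pitch.
    # pitch_deg is measured from the horizontal plane, positive upwards,
    # so pitch_deg = -90 is the gripper facing straight down.
    theta1 = np.arctan2(gy, gx)
    pitch = np.radians(pitch_deg)
    wx = gx - d_wg * np.cos(pitch) * np.cos(theta1)
    wy = gy - d_wg * np.cos(pitch) * np.sin(theta1)
    wz = gz - d_wg * np.sin(pitch)
    return wx, wy, wz

# Example: a point 20 cm in front of the base, gripper facing down.
# The wrist position is then fed to ik3, and the wrist joint compensates the
# pitch of joints 2 and 3 (theta_4 = theta_d - (theta_2 + theta_3), before
# converting to the Braccio servo-angle convention).
wx, wy, wz = gripper_to_wrist(20., 0., 0., -90.)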
This was a quick tour of the math behind inverse kinematics, a tool that is particularly useful when there's the need to control a robot in Cartesian space rather than in the space of joint angles. We will now dive a bit more into the additional hardware you might want to use to give your robot vision.
3) The camera
Without vision, a robot can only repeat movements blindly. For this project, I chose a very tiny and inexpensive camera, the ESP32-CAM.
I glued the actual sensor to the body so it would not move. This camera needs an FTDI adapter to be set up the first time, and you can find an easy guide here. After that, the camera can connect to your local Wi-Fi and stream data wirelessly. This is particularly useful to avoid additional wiring on the robot. The camera will however need a source of power: you can buy a few lithium-ion batteries and attach them to it.
I attached the camera to the wrist of the robot, so that it could closely see the objects the robot would interact with. Furthermore, I suggest using a fisheye lens, which increases the field of view of the camera substantially, albeit introducing strong distortions, especially at the borders.
Once the camera is ready, you can either access a livestream by connecting to its IP with your browser, or request images with a few lines of Python.
import cv2
from PIL import Image
import requests
from io import BytesIO
import numpy as np

# URL of the camera stream on the local network
url = 'http://172.20.10.8/cam-hi.jpg'

while True:
    # Request a single frame and decode it into a numpy array
    response = requests.get(url)
    img = Image.open(BytesIO(response.content))
    img = np.array(img)
    # PIL decodes to RGB, OpenCV expects BGR for display
    img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
    cv2.imshow('test', img)
    if ord('q') == cv2.waitKey(10):
        exit(0)
If everything worked correctly, you should see something like this.
4) The control
We now have all the ingredients to actually control our robot. For this particular model, I loaded the official Braccio library on the Arduino, but you can use any library or code for your particular robot, as long as it receives joint commands from some other process and sends them to the motors.
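For reference, the sending side can be as simple as a start byte followed by the joint angles, which is the framing used in the control loop below. This is only a sketch: the port name depends on your machine, and send_joints is a hypothetical helper, not part of any library.

import struct
import serial

# USB serial connection to the Arduino (port name depends on your OS)
arduino = serial.Serial("/dev/ttyACM0", 9600, timeout=0)

def send_joints(th1, th2, th3, th4, th5, gripper):
    # One frame: a start byte (255) followed by six angles in degrees (0-180)
    arduino.write(struct.pack('>BBBBBBB', 255, th1, th2, th3, th4, th5, gripper))

send_joints(90, 90, 90, 90, 90, 20)  # e.g. send all joints to 90 degrees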
A very easy way to control the robot without any additional hardware is to use a computer keyboard, similarly to a videogame. In this case, the PyGame library provides the tools to read keystrokes in a loop and convert them to numerical values, which we can then send over USB to the microcontroller. Given the ik3 function above, the following code reads keyboard commands, translates them into poses by accumulating deltas, and sends the joint angles to the Arduino.
import pygame, time
from pygame.locals import *
import serial
import struct
import numpy as np
from cv2_ip_cam import *  # provides get_picture() and cv2, from the camera script

# Link lengths (cm); ik3 from the snippet above is assumed to be defined or imported
a2, a3 = 13., 13.

def rot(x, y, theta):
    # Rotate the (x, y) increment by theta (radians) around the z axis
    s, c = np.sin(theta), np.cos(theta)
    return x*c - y*s, x*s + y*c

# Serial connection to the Arduino (port name depends on your machine)
arduino = serial.Serial("/dev/ttyACM0", 9600, timeout=0)

# Map pygame key codes (q/a, w/s, r/f, e/d, h) to x, y, wrist and z increments
keys_to_act_x = {113: 1, 97: -1, 119: 0, 115: 0, 114: 0, 102: 0, 104: 0, 101: 0, 100: 0}
keys_to_act_y = {119: 1, 115: -1, 113: 0, 97: 0, 114: 0, 102: 0, 104: 0, 101: 0, 100: 0}
keys_to_act_w = {113: 0, 97: 0, 119: 0, 115: 0, 114: 10, 102: -10, 104: 0, 101: 0, 100: 0}
keys_to_act_z = {113: 0, 97: 0, 119: 0, 115: 0, 114: 0, 102: 0, 104: 0, 101: 1, 100: -1}

pygame.init()
screen = pygame.display.set_mode((300, 300))
pygame.display.set_caption('Pygame Keyboard Test')
pygame.mouse.set_visible(0)

# Current and previous target position (cm), gripper state, wrist rotation
x, y, z = 14, 0, 12
oldx, oldy, oldz = 13, 0, 13
hClose = False
th_g = 20
th_w = 90
actions = 0

while True:
    for event in pygame.event.get():
        if event.type == KEYDOWN:
            actions += 1
            if event.key == 104:
                # 'h' toggles the gripper open/closed
                hClose = not hClose
            th_g = 150 if hClose else 20
            # Translate the key into position deltas and wrist rotation
            dx = keys_to_act_x.get(event.key, 0)/2
            dy = keys_to_act_y.get(event.key, 0)/2
            th_w += keys_to_act_w.get(event.key, 0)
            th_w = max(0, min(180, th_w))  # keep the servo angle in range
            z += keys_to_act_z.get(event.key, 0)/2
            # Rotate the x/y increments by the current wrist rotation
            dx, dy = rot(dx, dy, np.radians(th_w - 90))
            x, y = x + dx, y + dy
            th1, th2, th3 = ik3(x, y, z)
            # If the pose is unreachable, revert to the previous position
            if np.isnan(th1) or np.isnan(th2):
                x, y, z = oldx, oldy, oldz
                th1, th2, th3 = ik3(x, y, z)
                print("Out of values!")
            th1, th2, th3 = int(th1), int(th2), int(th3)
            if th1 <= 0 or th2 <= 0 or th3 <= 0:
                x, y, z = oldx, oldy, oldz
                th1, th2, th3 = ik3(x, y, z)
                print("Out of values!")
            if th1 < 0: th1 = 0
            if th2 < 0: th2 = 0
            if th3 < 0: th3 = 0
            if th1 >= 181 or th2 >= 181 or th3 >= 181:
                x, y, z = oldx, oldy, oldz
                th1, th2, th3 = ik3(x, y, z)
                print("Out of values!")
            # Wrist angle that keeps the gripper facing downwards (see the math section)
            th4 = 90 + 360 - (th2 + th3)
            if th4 <= 0: th4 = 0
            if th4 >= 181: th4 = 180
            th1, th2, th3, th4 = int(th1), int(th2), int(th3), int(th4)
            # Send one frame: start byte, then the five joint angles and the gripper angle
            arduino.write(struct.pack('>BBBBBBB', 255, th1, th2, th3, th4, th_w, th_g))
            oldx, oldy, oldz = x, y, z
            time.sleep(0.3)
    # Display the latest camera frame
    im = get_picture()
    cv2.imshow("im", im)
    cv2.waitKey(1)
Controlling the robot manually can be a good way of providing training examples to have it learn new skills. When working on this robot, I wrote a paper (you might now recognise the robot) on using self-collected data to learn skills with little human intervention, which was eventually published at the Conference on Robot Learning (CoRL) 2021. By letting that method run for a bit, the robot could learn servoing-like behaviours like the following, from raw pixels and without the need to calibrate the camera.
Conclusion
Looking back at it now, I realise there's not a lot of structure in this blogpost, but I really wanted somewhere to dump the information I gathered and discovered in a lonely spring/summer of 2021. I figured someone could find it remotely useful, even just as a way of understanding that this kind of effort, if I was able to pull it off, is clearly easier than it looks. It also made me realise how incredible it is, once you are building something, that you can just keep ordering stuff online and it will be delivered to your door the day after. I was very close to trying to set up a solar grid system. I even bought an XR headset and tried to use multiple cameras to emulate a Vision Pro style passthrough, but this is not the right place to talk about it.
Finally, I wanted to write this as a sort of time capsule for a particular time in my life in a different home in a different city in a different country. Things change. Things end. Things end to make room for other things. The other things are not always better. But they are new, and they become the present, and that’s where everything is.
Anyway, I hope you found this useful, or entertaining in any measure. You can get in touch on X @normandipalo.