Making my own Spot Mini — 2/…?

Alexander Watson
6 min read · May 7, 2019


Welcome to part 2 of my journey building an affordable, open robot with Boston Dynamics Spot Mini-like capabilities. In my last blog, we started with movement: reverse engineering Reach Robotics’ super cool Mekamon robot and writing a Python API that lets us control it programmatically.

In this blog, we’re going to build our first autonomous movement for the Mekamon.

Figure 1: Boston Dynamics Spot Mini — semi-autonomous navigation

Going back to our inspiration and performance target, we can see the Boston Dynamics Spot Mini creating a 3D point cloud of its environment as it walks. From the green and red point colors above, it looks like the Spot Mini is classifying points into horizontal and vertical planes, respectively. It’s not just using a static map; these points are generated in real time as the Spot Mini moves. From the photos below, it looks like Spot Mini is using depth-sensing cameras (think Microsoft Kinect) along each side of the robot, which generate the 3D point cloud in Figure 1.

Figure 2: Self-localizing

In the navigation map above, we can see a worldview-type point cloud, where the Spot Mini appears to be following a pre-determined course (gray) and locating itself in 3D space (pink). This is called Simultaneous Localization and Mapping (SLAM), and it is very similar to how self-driving cars from Waymo and Uber localize themselves on a pre-existing 3D map.

So… how are we going to build this ourselves? First, we need cameras that let the Mekamon perceive and navigate its environment, and enough compute power to process that data into a 3D map. The setup also needs to be lightweight and inexpensive: lightweight, because anything more than 150 or so grams is going to constrain our Mekamon robot’s movement, and inexpensive, because my budget goal for this project is $1k.

Hardware for Mekamon’s computer vision

Our Mekamon does not have any depth sensors or cameras, as it’s designed for a human operator. Looking at options, we have a couple of ways to add computer vision. The most obvious is a depth-sensing camera like Microsoft’s Kinect, which creatives and hackers have done some awesome work with. Technology has advanced significantly since the Kinect came out in 2010 (wow, I’m old), and Intel is literally just now releasing its RealSense T265 tracking camera, which plays a Kinect-like role but is only 4 x 1 x 0.25 inches, has an IMU for 6DoF tracking, is capable of running SLAM algorithms on-board, can work in any light, and has a 163-degree viewing angle. Bad@##. I just pre-ordered one, and it should ship sometime this month!

So, what else? It occurred to me that I have a powerful, lightweight, augmented-reality-capable computer right in front of me: my iPhone X. With iOS 11, Apple released its ARKit framework. With a combination of the rear camera, an IMU, and some really awesome algorithmic work, ARKit offers many of the same features as a depth camera, including SLAM and object detection. Let’s check it out!

Mekamon + phone mount + ARKit = everything you need for autonomous navigation

Computer vision with Apple’s ARKit

ARKit’s ARWorldMap is meant to power shared and persistent augmented reality experiences, where multiple users can see the same content in an AR game or application. To learn more about how it works, I wrote a quick ARKit-based application for my iPhone that runs a world-tracking AR session, periodically calls getCurrentWorldMap(completionHandler:), iterates through the feature points in the resulting ARPointCloud (rawFeaturePoints), and streams them to my computer over Bluetooth LE.

Raw feature point cloud streaming from Apple’s ARKit ARWorldMap

The CSV file above is an export of the ARPointCloud from my iOS ARKit application, with each feature point’s position in 3D space stored as (x, y, z). I thought it would be cool to also sample the color my camera was seeing at each point, stored in red-green-blue format as (r, g, b). In practice, I’ve found it more useful to color the point cloud with a gradient based on the height of the points in the ARWorldMap.
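As a rough illustration of that coloring, here is a minimal height-to-color mapping in plain JavaScript; the blue-to-red endpoints are my own choice for this sketch, not anything the ARWorldMap provides:

```javascript
// Map a point's height (y) to an RGB color on a blue-to-red gradient.
// minY and maxY are the lowest and highest y values seen in the point cloud.
function heightToColor(y, minY, maxY) {
  const range = maxY - minY;
  const t = range > 0 ? Math.min(Math.max((y - minY) / range, 0), 1) : 0;
  return {
    r: t,        // high points trend red
    g: 0.2,      // a touch of green keeps mid-range points visible
    b: 1 - t,    // low points trend blue
  };
}
```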

First, load the data

To visualize the point cloud data being streamed from the ARKit sample application on my iPhone to my computer, I wrote a browser-based visualization. I had been meaning to catch up on what’s new with JavaScript visualization libraries, including the excellent D3.js and Three.js. Here’s how D3.js can asynchronously load the CSV data into your browser.
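A minimal sketch of the loading step, assuming D3 v5+ (where d3.csv returns a Promise) is already loaded on the page and the CSV uses the x, y, z, r, g, b column names described above; the file name is a placeholder:

```javascript
// Asynchronously load the exported point cloud CSV with D3 (v5+),
// where d3.csv() returns a Promise of parsed rows.
async function loadPointCloud(url) {
  const rows = await d3.csv(url);
  // CSV fields come back as strings, so convert them to numbers.
  return rows.map((row) => ({
    x: +row.x,
    y: +row.y,
    z: +row.z,
    r: +row.r,
    g: +row.g,
    b: +row.b,
  }));
}

// Usage (file name is a placeholder):
// const points = await loadPointCloud("pointcloud.csv");
```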

Putting our data to good GPUs

We’ll use the excellent Three.js to visualize our point cloud data. Each point has its own attributes, such as position and color in 3D space, and with potentially hundreds of thousands of points being rendered at the same time, we need a data model that the GPU can consume efficiently. For speed, we’ll use BufferGeometry instead of the older Geometry class: BufferGeometry stores attributes in typed arrays that map directly onto GPU buffers, and it is super fast.
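Here is a minimal sketch of that rendering step, assuming a recent Three.js build (newer releases use setAttribute and a boolean vertexColors; older ones used addAttribute and THREE.VertexColors), with the same height-based blue-to-red gradient from earlier inlined:

```javascript
// Build a GPU-friendly Three.js point cloud from the rows loaded above.
function buildPointCloud(points) {
  // Find the height range so we can color points on a gradient.
  let minY = Infinity;
  let maxY = -Infinity;
  for (const p of points) {
    if (p.y < minY) minY = p.y;
    if (p.y > maxY) maxY = p.y;
  }

  // Flat typed arrays map directly onto GPU buffers.
  const positions = new Float32Array(points.length * 3);
  const colors = new Float32Array(points.length * 3);
  points.forEach((p, i) => {
    positions.set([p.x, p.y, p.z], i * 3);
    const t = maxY > minY ? (p.y - minY) / (maxY - minY) : 0;
    colors.set([t, 0.2, 1 - t], i * 3); // blue (low) to red (high)
  });

  const geometry = new THREE.BufferGeometry();
  geometry.setAttribute("position", new THREE.BufferAttribute(positions, 3));
  geometry.setAttribute("color", new THREE.BufferAttribute(colors, 3));

  const material = new THREE.PointsMaterial({ size: 0.01, vertexColors: true });
  return new THREE.Points(geometry, material);
}

// Add it to an existing Three.js scene:
// scene.add(buildPointCloud(points));
```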

Mapping my upstairs hallway

For the test visualization below, I asked one of my girls to walk down the upstairs hallway with my iPhone and the iOS test application running ARKit, which streams data back to my computer over Bluetooth, where I visualize it in the browser. Super happy with the results. In the future, we can use maps like this, stored in the cloud, both to help our robot place itself in the environment and to avoid objects in its way.

Mapping my upstairs hallway with Apple’s ARWorldMap. Click here to explore in 3D!

Finding walls and floors

Apple’s ARKit has some really helpful features built in that we’ll leverage to detect walls, floors, and obstacles. The first is planeDetection, where ARKit analyzes the video stream and classifies points into horizontal and vertical surfaces that our robot could walk on or run into. The second is ARHitTestResult, a helper in ARKit that tells you the distance to the closest point or plane under a given pixel in your video feed.

Below is a top-down visualization of a room I scanned with ARKit’s plane detection enabled. Even with a quick mapping session, ARKit is pretty good at finding horizontal surfaces and walls. We can clearly see the floor, two desks, some walls, and even my computer monitor. However, we can also see that ARKit had a difficult time detecting the walls in parts of my office, which have a uniform color and not a lot of visual patterns to create anchors.

Top-down view — Using ARKit’s plane detection to find walls and floors

Navigating through 3D space

To navigate through environments, we need to map the horizontal surfaces the robot can walk on (check), map the objects and vertical surfaces it should avoid (mostly check, as ARKit sometimes had trouble finding walls), and finally localize the robot in the 3D space we’re mapping. Recording where we’ve been might be useful in the future, so I also added a new data stream to record paths that we have successfully traveled (rendered in white below).

Simultaneous Localization and Mapping (SLAM) while walking through my house

Click here to explore this 3D map using Three.js and D3.js.
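The white path in the map above is just one more Three.js object. Here is a minimal sketch that draws an ordered list of traveled positions as a white line; the pathPoints variable and its {x, y, z} shape are my own naming for the position stream coming from the phone:

```javascript
// Render the robot's traveled path as a white line on top of the point cloud.
// pathPoints is an ordered array of {x, y, z} positions streamed from the phone.
function buildPathLine(pathPoints) {
  const vertices = new Float32Array(pathPoints.length * 3);
  pathPoints.forEach((p, i) => vertices.set([p.x, p.y, p.z], i * 3));

  const geometry = new THREE.BufferGeometry();
  geometry.setAttribute("position", new THREE.BufferAttribute(vertices, 3));

  const material = new THREE.LineBasicMaterial({ color: 0xffffff });
  return new THREE.Line(geometry, material); // connects the points in order
}
```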

Putting it all together

Now for some autonomous navigation, powered by ARKit! To build out basic autonomous navigation for the Mekamon, I added a stream to my iPhone app that sends ARHitTestResult distances, sampled via ARKit from the forward-facing camera, to a Raspberry Pi. On the Pi, a script processes the results and uses the Python API we reverse engineered in the last blog to continuously walk the Mekamon robot toward the most open part of the hallway. Now our Mekamon’s got a mind of its own!
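The steering logic itself is simple. On the robot it lives in the Python script on the Pi, but the core idea fits in a few lines, sketched here in JavaScript to match the rest of the post’s snippets; the sample spacing and field-of-view value are my own assumptions:

```javascript
// distances holds hit-test distances (meters), sampled left-to-right across
// the camera's field of view. Steer toward the most open direction.
function pickHeading(distances, fieldOfViewDegrees = 60) {
  let best = 0;
  for (let i = 1; i < distances.length; i++) {
    if (distances[i] > distances[best]) best = i;
  }
  // Convert the winning sample index into a turn angle relative to straight ahead.
  const step = fieldOfViewDegrees / (distances.length - 1);
  return -fieldOfViewDegrees / 2 + best * step; // degrees; negative means turn left
}

// Example: blocked on the left, open on the right -> turn right (+30 degrees).
console.log(pickHeading([0.4, 0.6, 1.2, 2.5, 3.1]));
```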

Next steps

In my next blog, we’ll use the Mekamon to help out with a practical use case around my home.

Check out the API and point cloud visualization on Github! iPhone app and autonomous control code coming soon. https://github.com/zredlined/control-my-mekamon/tree/master/pointcloud_visualization


Alexander Watson

Co-Founder at Gretel.ai, previously GM at AWS. Love artificial intelligence and security. @alexwatson405