We could probably make BB-8 for real

If you’ve seen the latest Star Wars and you’re anything like me and millions of others, you’ve fallen in love with its newest robot character, BB-8. It rolls around on a spherical base and has a camera unit on top. It can’t speak English, but it can convey quite a lot in beeps and boops. And it can really hold its own! (Fear not, I’ll stay away from any and all spoilers.)

The latest Star Wars robot: BB-8 (by Sphero)

I was surprised to learn that there are already several BB-8 toys available for purchase, from Sphero ($149.99), Hasbro ($79.99), and others. Surprised, because the mechanics of getting the base to roll around properly seemed like they would be complex, but it turns out a relatively obscure robotics company had properly solved this problem a while ago: Sphero. They were able to repurpose the technology they’d developed for an older robot for the base, add the head camera unit, and quickly launch a compelling BB-8 toy to the market.

Using an app or a controller, you can direct the toy’s movements and roll it around with joysticks, like you would a drone or a remote-controlled car. It’s really quite a cool (if expensive) toy for any kid, and apparently the kids agree — it sold out right away. Between you and me, I’ll be honest: I want one too. But compelling as it is, and as well as they’ve gotten the mechanics to work, it’s not really BB-8. It’s you. When you move your fingers on the controller, the BB-8 toy moves in that direction. In the movie, though, BB-8 is autonomous; it’s in control of its own movement and actions. What would it take to bridge that gap, to make a real BB-8?

I’ve been doing a lot with virtual reality (VR) lately, and that has inspired a fascination with all things computer vision. It’s being used for some truly incredible things, one of which is light field reconstruction, which lets you change the perspective or focus of your pictures and videos after the fact. But there’s also a lot of interest in using computer vision to infer the position of objects in the world. By analyzing the camera’s video frame by frame, finding recognizable things across frames and tracking their movements over time, we can infer not just the camera’s position, but also the structure, geometry, and color of the world around it. This is Simultaneous Localization and Mapping (SLAM).
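
To make that frame-by-frame idea concrete, here’s a toy sketch of just the tracking step using OpenCV: detect features in consecutive webcam frames, match them, and recover the camera’s relative motion. A real SLAM system layers mapping, optimization, and loop closure on top of this, and the camera index and intrinsics below are placeholder assumptions, not real calibration.

```python
# Toy version of the frame-to-frame tracking at the heart of visual SLAM:
# detect ORB features in consecutive frames, match them, and recover the
# camera's relative motion. Camera index and intrinsics are assumptions.
import cv2
import numpy as np

K = np.array([[700.0, 0.0, 320.0],   # assumed camera intrinsics (focal, center)
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])

orb = cv2.ORB_create(1000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

cap = cv2.VideoCapture(0)            # first attached camera
prev_kp, prev_des = None, None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    kp, des = orb.detectAndCompute(gray, None)

    if prev_des is not None and des is not None:
        matches = matcher.match(prev_des, des)
        if len(matches) >= 8:
            pts_prev = np.float32([prev_kp[m.queryIdx].pt for m in matches])
            pts_curr = np.float32([kp[m.trainIdx].pt for m in matches])
            E, _ = cv2.findEssentialMat(pts_curr, pts_prev, K, cv2.RANSAC, 0.999, 1.0)
            if E is not None and E.shape == (3, 3):
                _, R, t, _ = cv2.recoverPose(E, pts_curr, pts_prev, K)
                print("rotation:\n", R, "\ntranslation direction:", t.ravel())

    prev_kp, prev_des = kp, des
```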

Simultaneous Localization and Mapping (SLAM) running on a mobile phone

Turns out it’s not just the VR industry that’s researching this stuff. Self-driving cars also need to understand and map the world around them. Drones need to detect and avoid obstacles in their way, and to basically see where the ground is. Point being, millions of dollars from across several industries are pouring into computer vision research in order to better solve SLAM and its surrounding problems. Even today we can map a room quickly with relatively cheap mobile hardware and off-the-shelf sensors, and it’s only going to get faster and cheaper from here.

Ok, so we have Sphero’s BB-8 mechanics working well, and now we know that mobile hardware and cheap sensors can be added to map a room and understand its location. What else would we need to make BB-8 real? It’ll need a microphone so it can listen when people talk to it, and a speaker so it can communicate. Both are cheap and easy to find these days.

In the past, saying things like “listen when people talk to it” would fall under the category of wishful thinking. Yet again, though, thanks to millions of dollars from the industry’s largest companies like Apple and Google, and helped by breakthroughs in machine learning, voice recognition algorithms are now very effective and can run on mobile hardware. Better still, all those highly tuned algorithms, produced by all those millions of dollars of research, have been exposed for developers to use — completely free of charge. Voice synthesis is improving too, but BB-8 can’t really talk, so we get to punt on that one.
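
As a taste of how accessible this has become, here’s a small sketch using the third-party SpeechRecognition Python package, which wraps several of those freely available recognition services. It assumes a microphone is attached, and the beeping responses are, of course, just stand-ins.

```python
# A sketch of hands-free listening with the SpeechRecognition package
# (pip install SpeechRecognition pyaudio), which wraps several of the
# freely available recognition services. Assumes a microphone is attached.
import speech_recognition as sr

recognizer = sr.Recognizer()

with sr.Microphone() as mic:
    recognizer.adjust_for_ambient_noise(mic)   # calibrate for background noise
    print("Say something to BB-8...")
    audio = recognizer.listen(mic)

try:
    # Google's free web recognizer; other backends (e.g. CMU Sphinx) also work.
    text = recognizer.recognize_google(audio)
    print("Heard:", text)
    if "bb-8" in text.lower():
        print("*excited beeping*")   # BB-8 can't talk, so it beeps back
except sr.UnknownValueError:
    print("*confused warble*")       # couldn't make out any words
except sr.RequestError as err:
    print("Recognition service unavailable:", err)
```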

Amazon Echo does a great job listening and responding to voice commands

With the base, the vision sensors, the mic, speaker, and mobile hardware, the speech recognition software, and the environment mapping algorithms, what’s left is to pull it all together with some control software. Again we’re saved by existing work: the Robot Operating System (ROS) project has been building a platform that ties all of this robot hardware together and exposes its functionality to software developers. Now it’s up to the control software, running on ROS, to look at all that sensor data and decide how to act.
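
Here’s roughly what a first pass at that control software could look like as a Python (rospy) node. It’s only a sketch: the topic names “/scan” and “/cmd_vel” are common ROS conventions, but the actual ones would depend on whatever drivers the BB-8 hardware exposes.

```python
#!/usr/bin/env python
# A minimal ROS (rospy) control node sketch: read a range sensor, steer the
# base. The "/scan" and "/cmd_vel" topic names are assumptions; the real ones
# depend on the drivers the BB-8 hardware exposes.
import rospy
from sensor_msgs.msg import LaserScan
from geometry_msgs.msg import Twist

class BB8Controller:
    def __init__(self):
        self.cmd_pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
        rospy.Subscriber("/scan", LaserScan, self.on_scan)

    def on_scan(self, scan):
        # Keep only readings the sensor reports as valid.
        readings = [r for r in scan.ranges if scan.range_min < r < scan.range_max]
        cmd = Twist()
        if readings and min(readings) < 0.5:
            cmd.angular.z = 1.0   # something is close: spin in place to dodge it
        else:
            cmd.linear.x = 0.3    # path looks clear: cruise forward
        self.cmd_pub.publish(cmd)

if __name__ == "__main__":
    rospy.init_node("bb8_controller")
    BB8Controller()
    rospy.spin()
```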

Now comes the snag. We can’t yet simulate a real consciousness, and that’s definitely a good thing. If we could, we’d have to worry about all kinds of ethics questions to which we don’t have clear answers. Luckily we don’t need to — we humans have this optimistic tendency to see life in lifeless things; we even give our storms human names. If we can just give BB-8 autonomy and the illusion of emotions, our brains will happily fill in the rest. This software will clearly take some effort to develop, but nothing about it seems intractable.

What kinds of things could you do with a robot BB-8? It could follow you around when it’s bored, or run away from you when it’s mad. Maybe it rolls in circles around you when it’s feeling hyper, or rolls over your toes. It’d be fun to play a game where you kick it away and it rolls back, like a variant of fetch with a dog. It knows the room, so it can also just wander around on its own. Maybe BB-8 dances when you play music! Of course, when it’s tired, it should roll on over to its wireless charging station and go to sleep. Where it really gets crazy (and lucrative) is when you think of how BB-8s could interact with each other.
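
That “illusion of emotions” might be little more than a mood state machine that picks which behavior routine to run. Everything in the sketch below is hypothetical and the behaviors are stubs; the real work would be implementing those routines on top of the control node above.

```python
# A hypothetical "illusion of emotions": a mood state machine that picks which
# behavior routine to run. All names here are made up, and the behaviors are
# stubs; the real ones would drive the base through the ROS node above.
import random
import time

MOODS = {
    "bored": ["follow_owner", "wander_room"],
    "hyper": ["circle_owner", "dance_to_music"],
    "mad":   ["run_away", "roll_over_toes"],
    "tired": ["return_to_charger"],
}

def pick_behavior(mood, battery_level):
    # Low battery overrides everything else: head home to charge.
    if battery_level < 0.15:
        return "return_to_charger"
    return random.choice(MOODS[mood])

def main_loop():
    mood, battery = "bored", 0.9
    while True:
        behavior = pick_behavior(mood, battery)
        print(f"mood={mood} battery={battery:.2f} -> running {behavior}")
        time.sleep(1.0)                    # stand-in for executing the behavior
        battery -= 0.05                    # pretend it drained some battery
        if behavior == "return_to_charger":
            battery, mood = 1.0, "bored"   # woke up fully charged
        elif battery < 0.2:
            mood = "tired"
        else:
            mood = random.choice(list(MOODS))  # moods drift over time

if __name__ == "__main__":
    main_loop()
```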

The point I’m trying to make here is that this stuff isn’t science fiction; it’s today’s computer science reality. We could build BB-8 now if we wanted to, and while it’d be expensive today, within months or years that expense will drop enough for it to be feasible to commercialize. If my hunch is right, there’s enough money to be made that once it can be done, it will be done. If so, we’re at the beginning of a new era not just for toys, but for our relationship with technology.