How I turned my vacuum cleaner into a semi-autonomous camera operator.

Kirill Matyunin
Published in CodeX · Mar 28, 2021


When the director is a drunkard, the musicians are junkies, the programmer went berserk and your camera operator is a vacuum cleaner.

The operator.

Yup, no joke: the Roomba was indeed forced to provide me with some raw video material, but first things first. How did I even end up living like this? A couple of years ago I was making it through my darkest hour, so I needed something I could pour all of my thoughts and energy into instead of dwelling on dark thoughts. And YouTube it was.

I never had a plan or any will to become a YouTuber, but my friends were getting really pissed off whenever I sent them yet another video of myself covering whatever band. They begged me to either stop or create a channel. I chose the latter, and that eventually forced me to push the quality bar, so a challenge was accepted: make it live every time, no matter how many instruments are involved and how badly I perform on them; do it solo, with no one else involved (except probably a fair amount of alcohol); and, of course, stay creative from one video to another. At some point I exhausted my creative capacities with static shots. Some creative motion was needed while staying on the make-it-yourself track, and so the Roomba it was.

The result.

The AI approach.

The very first idea was to use some AI. Why not? The idea is simple: force the creature to follow me while I am performing. It's not a big deal to force the creature to move, but making it move the right way is another story.

First, I did some research on how to even approach the video part. It turns out I could use OpenCV plus some neural nets for that matter. I found some easy examples I could use, tried a few myself and ended up being recognized as more of a cat and less of a human.

So nice of you, OpenCV!

Not very reliable, but fine, it works. That was my laptop's camera, though, and I needed to analyze the RPi's camera stream. First, let's cover the camera options you have:

  • raspivid
  • raspimjpeg
  • RPi Cam Web Interface

I ended up using the latter. It essentially sits on top of raspimjpeg but provides you with a very nice set of controls over your camera via a very old-school-looking UI delivered to you by equally old-school PHP.

It might look ugly to millennials, but it works.

Unfortunately it has no API, does it? It bloody does, if you're stubborn enough! This is what cam_pic.php has inside it:

<?php
header("Access-Control-Allow-Origin: *");
header("Content-Type: image/jpeg");
if (isset($_GET["pDelay"])) {
    $preview_delay = $_GET["pDelay"];
} else {
    $preview_delay = 10000;
}
usleep($preview_delay);
readfile("/dev/shm/mjpeg/cam.jpg");
?>

Turns out there's an endpoint which essentially gives you the current frame stored in /dev/shm/mjpeg/cam.jpg. Like I said before, the project sits nicely on top of raspimjpeg, and that is where it stores whatever is being captured from the RPi's cam. In a sense it's a slideshow, if you will. But I don't give a damn what it is as long as it's reachable over HTTP. And it is! Sweet. A perfect fit.

The next thing was to change the example's server code so that it would analyze my stream instead of capturing the webcam's stream. I figured out how the example works; in a nutshell: it takes frames from whatever source and puts them into a queue, from which another thread reads them and sends them to a neural net which, in return, gives you a map of labels and the probability of their presence in the frame. The neural net was pre-trained, so I haven't done anything on that front. I just took the stream class which is used to capture frames from a webcam and mocked it so that it would have the same interface, but would capture frames from my RPi's cam via the luckily found endpoint.
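Roughly, the moving parts look like the sketch below. This is my own simplified reconstruction, not the original example's code; detect() is a placeholder standing in for the pre-trained net and only has to return a label-to-probability map.

import cv2
from queue import Queue
from threading import Thread

def detect(frame):
    # placeholder: in the real example this call runs the pre-trained net
    # and returns a {label: probability} map for the given frame
    return {"person": 0.7, "cat": 0.2}

def produce(capture, frames):
    # grab frames from whatever source and push them into the queue
    while True:
        grabbed, frame = capture.read()
        if grabbed:
            frames.put(frame)

def consume(frames):
    # another thread pulls frames from the queue and feeds them to the net
    while True:
        labels = detect(frames.get())
        if labels.get("person", 0.0) > 0.5:
            print("recognized as a human (and not a cat)")

frames = Queue(maxsize=5)
Thread(target=produce, args=(cv2.VideoCapture(0), frames), daemon=True).start()
consume(frames)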

import cv2
import numpy
import requests
from threading import Thread


class RPIStreamInput:
    URL = "http://{rpi}:8083/html/cam_pic.php"

    def __get_frame(self):
        res = requests.get(self.URL)
        arr = numpy.asarray(bytearray(res.content), dtype="uint8")
        frame = cv2.imdecode(arr, -1)
        return frame

    def __init__(self):
        self.stopped = False
        self.frame = self.__get_frame()
        self.grabbed = self.frame is not None

    def start(self):
        # start the thread to read frames from the video stream
        Thread(target=self.update, args=()).start()
        return self

    def update(self):
        # keep looping until the thread is stopped
        while True:
            # if the thread indicator variable is set, stop the thread
            if self.stopped:
                return
            self.frame = self.__get_frame()
            self.grabbed = self.frame is not None

    def read(self):
        # return the frame most recently read
        return self.frame

    def stop(self):
        # indicate that the thread should be stopped
        self.stopped = True

What it does is simply call my RPi to get a frame on demand. The rest is just a clone of what I was able to google, forced to do the right thing for me.
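For completeness, using it is as trivial as the sketch below, assuming URL has the actual RPi hostname filled in in place of {rpi}:

stream = RPIStreamInput()   # grabs the first frame to check the endpoint answers
stream.start()              # the background thread keeps self.frame fresh
frame = stream.read()       # the most recent frame, ready for OpenCV and the net
print(frame.shape)          # e.g. (height, width, channels)
stream.stop()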

Now, forcing the creature to do its thing means not only being able to recognize me as a human being in the frame (not a cat!), but also determining how far away I am in order to make a move. Or not to make one; that would depend. Here I got back to focusing on OpenCV. I read some papers on the matter and found an approach. Let me pretend to be smart now: the thing is called triangle similarity. In a few words: if you know the size of an object and you know its distance in advance, you can come up with an equation which eventually gives you a multiplier you'd use in further calculations.
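For the curious, the triangle similarity math from the pyimagesearch post (link 3 below) boils down to a couple of lines; the numbers here are made up for illustration, only the formula matters:

# calibration: an object of width known_width, photographed from known_distance,
# shows up ref_pixel_width pixels wide -> that gives the focal-length multiplier
def focal_multiplier(ref_pixel_width, known_distance, known_width):
    return (ref_pixel_width * known_distance) / known_width

# later: the same object shows up pixel_width pixels wide -> estimated distance
def distance_to(pixel_width, known_width, multiplier):
    return (known_width * multiplier) / pixel_width

F = focal_multiplier(ref_pixel_width=240, known_distance=1.0, known_width=0.5)
print(distance_to(pixel_width=120, known_width=0.5, multiplier=F))  # ~2.0 metres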

I failed to get stable results every time I tried to calculate the distance to an object of known size that the net was able to recognize. Being stubborn enough, I switched to plan B.

I know the size of the frame. I also know where I am in the frame. So how far away I am is nothing but how small I am in the frame, percentage-wise. Which means nothing stops me from taking this percentage of me in the frame and if, say, it crosses some threshold, it's time to move.
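A minimal sketch of that logic, with a made-up threshold; the bounding box is whatever the detector returns for the "person" label:

def should_move_closer(box, frame_shape, threshold=0.15):
    # box = (x, y, w, h) of the detected person, frame_shape = (height, width, ...)
    frame_area = frame_shape[0] * frame_shape[1]
    person_area = box[2] * box[3]
    # if the person takes up less than `threshold` of the frame, they are too far away
    return person_area / frame_area < threshold

print(should_move_closer((100, 80, 120, 300), (480, 640)))  # True: 36000/307200 is ~0.12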

The AI approach.

And that eventually worked! As you can see, the farther away I am, the closer the creature tries to get to me, and vice versa.

What bothered me was that my laptop was frying itself while doing the AI stuff. I kid you not: the plastic holder for its lock nearly melted down due to the heat coming from the bottom of the laptop. I'm not sure I'd have been able to explain to my then-employer why the heck a corporate asset had melted down and how that happened. It was also still quite hard to control, for lack of a better term, the creativity of the camera motion. A less fancy but more straightforward and thus easy-to-use solution was badly needed.

Roomba motion.

Before we carry on with the journey through my thinking process, let's cover a bit of what we're dealing with hardware-wise. It's a Roomba 650, which luckily has a serial port, so you get some control eventually. There's the fancy iRobot Create 2, which is meant to be programmed, so it has a library and huge support. Mine is a peasant compared to that fancier mate, so the library came from the community. And thanks to the community for that. The only annoying issue I encountered is that the USB device path is hardcoded, which leads to some restarting you have to do if your connection somehow ends up in limbo.

I think I know how to fix it. You probably do as well. So do that gal and her lad. Regardless, no one has taken action so far…

Anyhow, you can control the direction, the speed and the angle. What else do you need then? Well, to read a bunch of sensors, which are of vital importance to the mission.

The creative trajectories approach.

Once I gave up on the AI approach… No, seriously, it was not an option: the bigger idea was to bring some joy and to heal what needed to be healed, not to multiply the suffering. So I decided to limit the creature's autonomy to certain trajectories over which it would move back and forth, recording whatever is on the way.

But first, let's figure out what interfaces are at play. Here's an example of how to initialize the bot and do some trivial movement:

>>> import create2api
>>> bot = create2api.Create2()
/dev/ttyUSB0
opened port
Loaded config and opcodes
>>> bot.start()
>>> bot.safe()
>>> bot.drive_straight(20)
>>> bot.drive_straight(0)
>>> bot.destroy()

What happened is: we created a bot, started it, initialized safe mode (it stops immediately if something goes wrong, like an obstacle found or an abyss on the way), drove straight at a speed of 20 millimeters per second, then stopped and destroyed the bot.

Some more APIs we might need (a quick usage sketch follows the list):

  • get_packet — reads sensors
  • turn_counter_clockwise — named nicely enough not to need elaboration
  • drive — the more intelligent way to drive, with a turning radius in mind
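
For instance, reading the cliff sensors (used heavily below) goes roughly like this; packet 100 is the "give me everything" group, the same one the pickup-line code asks for later:

import create2api

bot = create2api.Create2()
bot.start()
bot.safe()

bot.get_packet(100)   # refreshes bot.sensor_state with all sensor values
print(bot.sensor_state['cliff left signal'],
      bot.sensor_state['cliff front left signal'],
      bot.sensor_state['cliff front right signal'],
      bot.sensor_state['cliff right signal'])

bot.destroy()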

Pickup line.

Nothing is creative about straight-line movement, unless that movement is performed on top of an IKEA shelf. That's right. I needed some nice angle that would mimic a human operator in a way, and the height of the shelf was just fine. The only problem was how not to fall off. Well, from real-world operation the Roomba is well known for being aware of when to stop. How? Sensors. It's equipped with four of them: two on the sides and two in the front. How did I learn that? By having it fall off while moving backwards; there's no sensor in the back, since in the real world it moves forward. A very good example for all of us, by the way. Anyhow, to keep it from falling I could not rely on the distance calculations I did (and, again, that ended in a fall). I had to rely on the sensors.

There's a so-called "cliff found" event, which happens if one of the four "cliff" sensors reads a value below a threshold. I did a few rounds of experiments and found the sweet spot where the "cliff" is almost there, but the event is not yet triggered and therefore does not result in a stop. Then I just programmed a small algorithm: it moves forward, correcting the direction if the "cliff" is either to the left or to the right, and when the "front cliff" is found it stops, does a turnaround and starts all over again. As a side effect, such a movement results in a very good shot while it does its turnaround.

This is a smaller shelf, but you get the idea.

So the code is quite simple:

import time
import create2api
from datetime import datetime

CLIFF_LIMIT = 1550
CLIFF_LIMIT_FRONT = 2800

def create_bot():
    bot = create2api.Create2()
    bot.start()
    bot.safe()
    return bot

def destroy_bot(bot):
    bot.destroy()

def pickup_line(timeout):
    bot = create_bot()
    start = datetime.now()
    diff = 0
    speed = 80
    while diff < timeout:
        bot.drive_straight(speed)
        time.sleep(0.01)
        bot.get_packet(100)
        if (bot.sensor_state['cliff front left signal'] < CLIFF_LIMIT_FRONT
                or bot.sensor_state['cliff front right signal'] < CLIFF_LIMIT_FRONT):
            # front edge found: back off, turn around and carry on
            bot.drive_straight(-30)
            time.sleep(2)
            bot.turn_counter_clockwise(-40)
            time.sleep(6.5)
            bot.drive_straight(speed)
        elif bot.sensor_state['cliff left signal'] < CLIFF_LIMIT:
            # drifting over the left edge: back off and correct
            bot.drive_straight(-25)
            time.sleep(2)
            bot.turn_counter_clockwise(-25)
            time.sleep(1)
        elif bot.sensor_state['cliff right signal'] < CLIFF_LIMIT:
            # drifting over the right edge: back off and correct
            bot.drive_straight(-25)
            time.sleep(2)
            bot.turn_counter_clockwise(25)
            time.sleep(1)
        else:
            bot.drive_straight(speed)
        timediff = datetime.now() - start
        diff = timediff.total_seconds()
        if diff > timeout:
            bot.drive_straight(0)

    destroy_bot(bot)

All the constants have been figured out empirically.
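
If you want to repeat this on your own shelf, a hypothetical calibration loop like the one below (reusing the imports and helpers above) is how I'd pick such thresholds: park the bot near the edge, nudge it around and watch the raw readings, which drop as a sensor hangs over the edge.

def watch_cliff_sensors(bot, seconds=30):
    # print raw cliff readings so you can pick your own CLIFF_LIMIT values
    start = datetime.now()
    while (datetime.now() - start).total_seconds() < seconds:
        bot.get_packet(100)
        print(bot.sensor_state['cliff left signal'],
              bot.sensor_state['cliff front left signal'],
              bot.sensor_state['cliff front right signal'],
              bot.sensor_state['cliff right signal'])
        time.sleep(0.2)

bot = create_bot()
watch_cliff_sensors(bot)
destroy_bot(bot)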

The curve.

This one is easy but, again, gives good shots. It's just a curved line, done by changing the angle over time until the thing reaches some point. In my case I simply bounded the movement with a timeout, so the trajectory wasn't the same every time, which again mimics a human in a brilliant way.

The curve.

def curve(timeout):
    bot = create_bot()
    start = datetime.now()
    speed = 100
    radius = -450
    diff = 0
    iteration_time = 26
    while diff < timeout:
        bot.drive(speed, radius)
        time.sleep(iteration_time)
        timediff = datetime.now() - start
        diff = timediff.total_seconds()
        if diff > timeout:
            bot.drive_straight(0)
        speed = -speed   # reverse direction so it goes back and forth along the curve

As above, all the constants are of an empirical nature.

The result.

Well, it went beyond my expectations. Not only did I do what I had in mind, but I also got a tool I'll certainly use in my next videos. And now that I have something that moves over the ground under my will, I'm starting to think about flying objects. Stay tuned and thanks for your attention!

Links used:

  1. https://towardsdatascience.com/building-a-real-time-object-recognition-app-with-tensorflow-and-opencv-b7a2b4ebdc32
  2. https://github.com/datitran/object_detector_app
  3. https://www.pyimagesearch.com/2015/01/19/find-distance-camera-objectmarker-using-python-opencv/
  4. https://www.raspberrypi.org/documentation/raspbian/applications/camera.md
  5. https://elinux.org/RPi-Cam-Web-Interface
