Adventures in Amateur Motion-Capture

A few years ago, a friend and I embarked on a project that combined the technology of an Xbox Kinect with the power of Processing (a programming language geared toward designers and artists). Our goal was to record a dancer's movement with the Kinect and use that data to generate dynamic art via Processing. We could trace the dancer's movement. We could change color based on the body's trajectory or speed. We thought the possibilities were endless.

Unfortunately, this was before I knew anything about software, and my friend was only starting his coding journey. We spent a few hours yelling at a Kinect, and the idea faded into memory.

Fast forward a few years, and I found myself in the middle of senior phase at Fullstack Academy, with three days to make a project… on my own… from scratch. It sounded like the perfect opportunity to give this another try.

Little did I know that the Xbox Kinect's popularity had waned since 2015. Though OpenKinect is still doing good work, there are fewer open-source libraries available to programmers, and any Mac running Sierra can't connect to a Kinect without tearing down a lot of security barriers. I say this partly as a warning for other developers who might want to use a Kinect, and partly as a very long setup for what this blog post is actually about…

DIY Motion-Tracking

I didn’t have a Kinect to give me fancy depth data and blob recognition, but I still had a camera on my computer and a loose understanding of p5.js, so I decided to make do with that.

What is a video feed but a series of repeated images? And what is an image but a long array of information about the pixels that make it up? (The image file also includes metadata about the image itself, with values for image width and padding that help programs render it correctly, but we'll leave those for another discussion.)
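To make that concrete, here is roughly what that flat array looks like for a tiny, made-up two-pixel image (the values are purely illustrative):

```js
// A two-pixel image stored as one flat array of RGBA values:
// [ r0, g0, b0, a0,  r1, g1, b1, a1 ]
const tinyImage = [255, 0, 0, 255,  0, 0, 255, 255]; // a red pixel followed by a blue pixel
```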

I used p5.js to access my computer's camera and render its video feed on a canvas in my main HTML file.

The setup function creates a canvas, accesses my computer's camera with createCapture(VIDEO), defines its size, and then hides the actual capture feed. The draw function creates an image from that captured video and prints it to the screen (this is why we hid the original capture).
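Sketched out in p5.js, that step looks roughly like this (a global-mode sketch rather than the original code; the pixelDensity(1) call is an extra precaution for high-density displays):

```js
let video;

function setup() {
  createCanvas(640, 480);
  pixelDensity(1);              // keep the pixels array 1:1 with the canvas on high-density screens
  video = createCapture(VIDEO); // ask the browser for the webcam feed
  video.size(640, 480);
  video.hide();                 // hide the raw <video> element; draw() paints it instead
}

function draw() {
  image(video, 0, 0, width, height); // paint the current camera frame onto the canvas
}
```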

At this point, I was showing the video from my computer's camera in an HTML document. Cool! But that was only step one.

The next step was to store a set of pixels so that I could compare all future frames against a "background" image. p5.js's loadPixels() method pulls the pixel data from the image being displayed into an array that can be manipulated like any other JavaScript array.

A user’s mouse click could generate that first set of pixels. Calling loadPixels() gave me access to the pixels as an array. I stored it in a variable for later use.
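A minimal sketch of that click handler, assuming the background frame is grabbed straight from the canvas:

```js
let firstPixels;

function mousePressed() {
  loadPixels();                 // copy the canvas's current pixels into the global `pixels` array
  firstPixels = pixels.slice(); // keep a copy as the "background" frame, since `pixels` changes every frame
}
```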

So now I had a way of capturing a single frame of pixels to compare motion against later. But I had to decide what actually determined motion.

I added another loadPixels() call to my draw function, creating a new array of the image's pixels for every frame of video and storing it in a variable called movingPixels.

That new draw function meant I could compare each new frame’s pixel array to my original array. Comparing each movingPixels array to my firstPixels array would tell me what was in the new frame that hadn’t been there before.
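In the draw loop, that amounts to something like this (still a sketch; the actual comparison comes next):

```js
let movingPixels;

function draw() {
  image(video, 0, 0, width, height); // draw the newest camera frame first
  loadPixels();                      // pull this frame's canvas pixels
  movingPixels = pixels;             // the array to compare against firstPixels
  // ...the comparison loop goes here (next step)
}
```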

Sounds like an excellent time for a nested for-loop.

I looped over both sets of pixels at once, using one loop for the image's width and one for the image's height. I could access the actual index in the array with var index = (x + y * p.width) * 4. That multiple of four is because each unit of information (a pixel) actually exists in four pieces: red, green, blue, and opacity. I could access each individual value by adding 0, 1, 2, or 3 to that index. I could also have simply looped through the pixels array and dealt with each pixel in its entirety, but I wanted to attempt this conventional way of looping through images.

I also jumped by fours in the for loop itself, instead of iterating pixel by pixel, so the program only sampled every fourth pixel. This was an optimization: the program compared fewer pixels from my two images each time it rendered, which helped speed up a very slow process.
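Put together, the loop looked something like the sketch below (written in global mode, so width and height stand in for the p.width and p.height mentioned above):

```js
// Step by four in both directions, sampling every fourth pixel instead of every one.
for (let y = 0; y < height; y += 4) {
  for (let x = 0; x < width; x += 4) {
    // Each pixel occupies four consecutive slots in the flat array:
    // index + 0 is red, + 1 green, + 2 blue, + 3 opacity.
    const index = (x + y * width) * 4;
    // ...compare movingPixels[index] against firstPixels[index] here (next step)
  }
}
```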

The next step was to compare the r, g, and b values for each pixel in the new image (movingPixels) to the values of each corresponding pixel in the original image (firstPixels). I opted not to compare the opacity level. It seemed to just add extra noise.
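Continuing inside that loop, the comparison looks roughly like this. The threshold value is illustrative; comparing against a tolerance (rather than any difference at all) keeps ordinary camera noise from registering as motion on nearly every pixel:

```js
const threshold = 30; // illustrative tolerance, tuned by eye

const rDiff = Math.abs(movingPixels[index]     - firstPixels[index]);
const gDiff = Math.abs(movingPixels[index + 1] - firstPixels[index + 1]);
const bDiff = Math.abs(movingPixels[index + 2] - firstPixels[index + 2]);
// index + 3 is opacity, which I skipped; it only added noise.

if (rDiff > threshold && gDiff > threshold && bDiff > threshold) {
  // Something new is in this part of the frame, so mark it.
  // (In the final version this becomes a Tracker instance; see below.)
  noStroke();
  fill(0);
  ellipse(x, y, 4, 4);
}
```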

Each time a moving pixel differed from the original on all three of those values, I drew an ellipse. Each ellipse is actually an instance of a Tracker class I created elsewhere, which let me manipulate the ellipses with class methods and define their color when I drew them.
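A minimal version of that class might look something like this (a sketch of the idea rather than the original fields and methods):

```js
let trackers = []; // every ellipse on screen is one of these

class Tracker {
  constructor(x, y, trackerColor) {
    this.x = x;
    this.y = y;
    this.color = trackerColor; // the color is chosen at the moment the ellipse is created
  }

  show() {
    noStroke();
    fill(this.color);
    ellipse(this.x, this.y, 4, 4);
  }
}

// In the comparison loop, instead of drawing an ellipse directly:
// trackers.push(new Tracker(x, y, color(0)));
```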

The end result of this was a white screen that would display black pixels any time something had moved in front of the background. A person walking across the camera’s view could turn the white screen entirely black! That was cool, but not quite what I wanted.

The final piece of the puzzle was to give the canvas rendering the movement a white background (so that the drawing would persist from frame to frame), and to add a lifespan to the ellipses being rendered. That let the ellipses fade over time, creating a slowly fading shadow of the movement that had occurred in the frame.
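Here is a sketch of how that final piece might fit together, assuming each Tracker carries a lifespan that doubles as its alpha value (the fade rate and sizes are illustrative):

```js
class Tracker {
  constructor(x, y) {
    this.x = x;
    this.y = y;
    this.lifespan = 255; // doubles as the ellipse's alpha value
  }

  update() {
    this.lifespan -= 5;  // fade a little each frame
  }

  isDead() {
    return this.lifespan <= 0;
  }

  show() {
    noStroke();
    fill(0, this.lifespan); // black ink that grows more transparent as it ages
    ellipse(this.x, this.y, 4, 4);
  }
}

// At the end of draw(), once the frame's pixels have been compared:
background(255);                                  // repaint the white backdrop over the sampled video frame
trackers.forEach(t => { t.update(); t.show(); });
trackers = trackers.filter(t => !t.isDead());     // drop ellipses that have fully faded
```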

I have a video recording of the presentation that I hope to add to this blog post; it shows a few more features I added and how everything looks in action!

This was an exciting way for me to combine dance and movement with software, and I think there are a lot of possibilities for this project in the future. It could be used to add dimension to a dance performance. It could be used to create interesting dance films. It could even be used to create an interactive movement tracker that lets people who aren't in the same room interact on a screen. It's not quite the goal I set out with, but I'm pretty proud of it.