Our First Dance

We got married! And then geeked out :)

About a year ago, we got engaged and started planning our wedding. As part of the planning process, we went through the traditional wedding agenda and chose which things we wanted to do: a toast (yup); bouquet toss (nope); a wedding party (yup). We weren’t sure at first if we wanted to do a first dance, but then we had an idea:

A first dance is a chance to celebrate something that reflects our relationship for the first time as a married couple.

So instead of a dance, we decided to share a first dance project. We knew it would be a little cringy, but then again so is a first dance if you can’t dance.

Ok, now let’s get technical.

Diagram of a neural network. Image from Medium

A neural network is an artificial intelligence algorithm that lets you teach a computer how to do things based on a bunch of examples. A classic example, shown above, is “teaching” (or, as it’s more commonly called, “training”) a neural network to distinguish a picture of a dog from a picture of a cat. This is done by feeding a bunch of cat and dog pictures into the network, telling it which is which, and letting it learn which features of a picture correspond to which animal. There are lots of great descriptions online of how neural networks actually work, at every level of mathematical background.
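Just to make “training” concrete, here is a minimal PyTorch sketch of the cat-vs-dog setup. Everything in it (the folder layout, image size, and the deliberately tiny network) is our own illustrative assumption, not anything from this project:

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms

# Assumed layout (hypothetical): data/cats/*.jpg and data/dogs/*.jpg
transform = transforms.Compose([transforms.Resize((64, 64)), transforms.ToTensor()])
dataset = datasets.ImageFolder("data", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

# A deliberately tiny network: flatten the image, one hidden layer, two outputs (cat, dog)
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, 128),
    nn.ReLU(),
    nn.Linear(128, 2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)  # how wrong were the guesses?
        loss.backward()                        # which direction fixes them?
        optimizer.step()                       # nudge the weights that way
```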

Rather than having a first dance, our plan was to train a neural network to dance for us!

Plan A: Choreography

Our wedding was on March 2, 2019. Around Christmas of 2018, we decided that we really needed to work on this project in order to have it done in time. So, we sat down at Starbucks and mapped out our first attempt.

Based on a recent paper, we wanted to use a recurrent neural network (RNN) to learn to choreograph a dance for the two of us. The idea is that the RNN is fed many frames of a choreographed dance and is trained to predict the next frame from the ones before it, carrying information forward from frame to frame (hence the name “recurrent”).
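Here is a minimal sketch of what such a next-frame model can look like in PyTorch. The keypoint count, layer sizes, and training step are illustrative guesses, not the paper’s architecture or our actual code:

```python
import torch
import torch.nn as nn

# Each "frame" of the dance is a flattened stick figure:
# e.g. 18 keypoints x (x, y) coordinates = 36 numbers (an assumed format).
POSE_DIM = 36

class ChoreographyRNN(nn.Module):
    def __init__(self, hidden_size=256):
        super().__init__()
        self.lstm = nn.LSTM(POSE_DIM, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, POSE_DIM)

    def forward(self, poses, state=None):
        # poses: (batch, time, POSE_DIM); predict the pose at t+1 from poses up to t
        out, state = self.lstm(poses, state)
        return self.head(out), state

model = ChoreographyRNN()
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(pose_sequence):
    # pose_sequence: (batch, time, POSE_DIM) of stick figures from a real dance
    inputs, targets = pose_sequence[:, :-1], pose_sequence[:, 1:]
    optimizer.zero_grad()
    predictions, _ = model(inputs)
    loss = loss_fn(predictions, targets)  # penalize bad guesses of the next frame
    loss.backward()
    optimizer.step()
    return loss.item()
```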

A diagram of our first attempt. We extract stick figures from professional dancers, and train a neural network to learn to “choreograph” a new dance.

We grabbed some data from a wonderful “cha-cha” training video and extracted stick figures using a pre-trained neural network. We trained the RNN for ~1000 iterations on our laptops (about 10 minutes). Below is one of our favorite results:

First attempt at “choreographing” our first dance…Oops

Obviously, this method didn’t work according to plan. But we didn’t give up!

Plan B: Steal Code!

At this point it was February and our wedding was coming up fast. So we did what we do best: stealing better code from other people.

Luckily, a recent paper that garnered a lot of media attention had done a very similar project to what we wanted. “Everybody Dance Now” uses a special neural network called a generative adversarial network (GAN), which basically pits two neural networks against one another: one which generates fake images, and one which learns to identify real vs simulated images.
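Stripped to its essentials, the adversarial training loop looks something like the generic sketch below. The tiny placeholder networks and sizes are ours for illustration; the real models are large image-to-image networks:

```python
import torch
import torch.nn as nn

# Placeholder networks; the real ones are image-to-image models, not tiny MLPs.
generator = nn.Sequential(nn.Linear(36, 128), nn.ReLU(), nn.Linear(128, 784))
discriminator = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def gan_step(poses, real_images):
    # 1) Discriminator: learn to tell real images from generated ones
    fake_images = generator(poses).detach()
    d_loss = (bce(discriminator(real_images), torch.ones(len(real_images), 1))
              + bce(discriminator(fake_images), torch.zeros(len(fake_images), 1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # 2) Generator: learn to fool the discriminator
    fake_images = generator(poses)
    g_loss = bce(discriminator(fake_images), torch.ones(len(fake_images), 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```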

Example video from Everybody Dance Now. Note that the woman on the right is not actually dancing! A neural network learned how to make it look like she is dancing just like the source video in the upper left.

The idea is actually pretty simple. We (well, the people we stole the code from) train one neural network system to take an image of a person and extract a human figure from it. They then train a different neural network system to do the opposite (translate a human figure into a realistic image). So we can extract a human figure from a video of a professional dancer with the first neural network, and then have the second neural network turn those stick figures into video of us dancing well!
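In code, the glue between the two networks is just a loop over frames. In this hedged sketch, `extract_pose` and `render_person` are hypothetical stand-ins for the two trained networks, not real functions from any library:

```python
def transfer_dance(source_video_frames, extract_pose, render_person):
    """Map each frame of a professional dancer's video onto us.

    extract_pose:  the first network (dancer frame -> stick figure)
    render_person: the second network (stick figure -> a frame of us)
    Both are passed in as callables; this is only the glue, not the models.
    """
    return [render_person(extract_pose(frame)) for frame in source_video_frames]
```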

Their paper is very snazzy and has some fun bells and whistles to make their dance videos really smooth. Unfortunately, their code wasn’t public. However, someone released a simplified version (built on the PyTorch framework) to GitHub. That code uses two really awesome open-source projects: Realtime Multi-Person Pose Estimation (which does the figure extraction) and pix2pixHD (which translates one image into another).
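To give a flavor of the figure-extraction stage: the repo above uses the Realtime Multi-Person Pose Estimation code, but torchvision also ships a pre-trained keypoint detector that does a similar job, so a stand-alone sketch of the idea might look like this (the file name is made up, and this is not necessarily the model we used):

```python
import torch
from torchvision.models.detection import keypointrcnn_resnet50_fpn
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

# Pre-trained keypoint detector (17 COCO keypoints per detected person)
model = keypointrcnn_resnet50_fpn(weights="DEFAULT").eval()

frame = convert_image_dtype(read_image("frame_0001.png"), torch.float)  # assumed file
with torch.no_grad():
    detections = model([frame])[0]

# Keep the most confident person (assumes at least one was detected);
# keypoints have shape (17, 3): x, y, visibility
best = detections["scores"].argmax()
stick_figure = detections["keypoints"][best]
print(stick_figure[:, :2])  # the (x, y) joints that make up the "stick figure"
```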

So we were in business!

Well, almost. There was one small hang-up: we could no longer train the code on our laptops, because they don’t have graphics processing units (GPUs). Luckily, Ashley has access to Harvard’s supercomputer, known as Odyssey. Odyssey has over 78,000 cores and 40 petabytes (40 million GB) of storage — and lots of GPUs to spare! We honestly don’t remember how many computational hours we used on Odyssey, but we estimate something like 24+ hours.
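For the PyTorch-curious: “using a GPU” mostly boils down to moving the model and each batch of data onto the CUDA device. A minimal sketch (not our actual training script):

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2).to(device)      # move the model's weights to the GPU
batch = torch.randn(32, 10).to(device)   # ...and each batch of data as well
output = model(batch)                    # this forward pass now runs on the GPU
print(device, output.shape)
```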

The Odyssey cluster at Harvard. Very fancy :)

Below is the proof-of-concept result, using the great Napoleon Dynamite. The top left corner is the “Source Video”, i.e. the dance moves we want to transfer to ourselves. The top right corner is the “Pose Estimation”, which is similar to the limb extraction we did in our first attempt. The bottom row is the video output of Alex (on the left) and Ashley (on the right). You can see that it’s not ~quite~ as crisp as “Everybody Dance Now”, but we’re pretty happy given that it actually ran, that we didn’t spend a thesis’ worth of time on it, and that we had just learned how to use GPUs.

Proof of concept! Upper left: source video; Upper right: extracted pose; Bottom Row: Alex + Ashley

So without further ado, here is our first dance (set to music):

Our first dance video! Song is “Fresh Eyes”

Ashley Villar & Alex McCarthy

