Using Microsoft Cognitive Services to create baby bump timelapses

About 9 months ago, my wife Iris suggested that we create a timelapse of her baby bump.

This was the appropriate time to select one of the many apps in the App Store for this purpose, purchase it, and use it over the course of her pregnancy.

This is not what I did.

Instead, every couple of days over those nine months one of us would remember that we were making the timelapse and I would say, “Okay, stand this way”, quickly take the photo, upload it to a shared album, and we would move on with our lives.

65 baby bump photos and one baby later…

I made myself a cup of coffee, said, “Okay, let’s put this all together,” and started going through the photos.

It turns out that arbitrarily telling someone to stand a certain way over the course of a year at different locations and angles does not produce consistent results.

Everyday app guide lines

As I was going through the photos, I remembered that when I had used apps like Everyday in the past, they had overlaid guide lines onto the camera so you could line up the same features over and over again.

I now see why that would be important.

Almost a year of using Everyday

But surely someone else has had this problem before, I thought. Googling for “align time lapse”, I found a bunch of people discussing the auto-align features in Photoshop and Hugin, which apparently work very well if you take a bunch of photos from the same spot, but not so well with my batch of photos taken all over the world.

So I tried loading all the photos into Photoshop as layers (File -> Scripts -> Load Files into Stack…), which worked, but made my computer slow to a crawl. Next I tried a smaller batch of 5 photos, one on each layer. I chose the first photo as the reference image and started scaling the other layers and overlaying them with different opacity on top of the first layer. After doing this on exactly 3 photos, I gave up. It was slow, frustrating, and error-prone.

Then I had an idea: what if I ran all the photos through a face-detection algorithm? Face detection has been a solved problem in computer science for over a decade now. So I installed OpenCV and some Node bindings and ran one of my images through the face detector.

OpenCV Face Detection using Stump-based 24x24 discrete adaboost frontal face detector

Umm, sort of? I started writing a sorting algorithm that picked the square closest to the top middle, because that’s where Iris’s face is in most of the photos, but decided that was too inaccurate. Then I thought: what if I made a web page where I just clicked on the correct square and recorded the data? It shouldn’t take that long to cycle through 65 images. But I figured that writing the frontend for that would take as long as finding a better face-detection algorithm.

After more googling, I discovered Face API, which is part of Microsoft Cognitive Services. I ran the same image through their demo and was impressed with the results:

So I signed up for a trial which included $200 in Azure credits, which by my calculations means that I could have detected faces in 133,333 photos, but I only needed to do 65.

I ran the batch of images through their API, and it correctly detected faces with bounding-box rectangles for 62 of them; I manually drew the boxes for the remaining 3.
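The detect endpoint returns an array of faces, each carrying a `faceRectangle`. A minimal sketch of picking the box to use from one response (the helper name is mine; the response shape is from the Face API documentation):

```javascript
// Pick the face rectangle out of a Face API detect response.
// Response shape per the Face API docs:
//   [{ faceId: "...", faceRectangle: { top, left, width, height } }, ...]
function faceBox(response) {
  if (response.length === 0) return null; // no detection: draw the box by hand
  // If several faces come back, keep the largest one.
  return response
    .map(f => f.faceRectangle)
    .sort((a, b) => b.width * b.height - a.width * a.height)[0];
}
```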

Using this dataset, I picked a baseline size for the face in each image (150 x 150 pixels), and then resized each image so that its face rectangle matched that size.

Take this image, for which Face API correctly detected the face boundary: the face square is 114 x 114, and the image itself is 1536 x 2049.

So we can determine a scaling factor (114 / 150 = 0.76), and dividing both dimensions by that factor (1536 / 0.76 ≈ 2021, 2049 / 0.76 ≈ 2696) enlarges the image to 2021 x 2696.
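That arithmetic is small enough to write down directly; a sketch (the helper name and rounding to whole pixels are mine):

```javascript
// Given the detected face size and a 150px target, compute the scale
// factor and the resized image dimensions.
function resizedDims(img, faceSize, target = 150) {
  const scale = faceSize / target;          // e.g. 114 / 150 = 0.76
  return {
    scale,
    width: Math.round(img.width / scale),   // e.g. 1536 / 0.76 ≈ 2021
    height: Math.round(img.height / scale), // e.g. 2049 / 0.76 ≈ 2696
  };
}
```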

Next we pick a fixed position where we want all the faces to be, let’s say (500, 500).

Then we simply extend the canvas so it’s larger than the current size and move the image so that the face lies in that position.

We do this for each image in the series, then crop them all to a common region that most of the images fill completely.

Here’s some code that uses GraphicsMagick to do what I just described:
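As a minimal sketch, here is one way to build a `gm convert` command that resizes a frame so its face box hits 150 px, then crops a fixed window so the face lands at the target position. The 1000 x 1500 output canvas and the file names are illustrative, and a fuller version would extend the canvas rather than clamp the offsets:

```javascript
// Build a GraphicsMagick command that aligns one frame:
// resize so the face box becomes `target` px, then crop a fixed
// window positioned so the face's top-left corner lands at (tx, ty).
function alignCommand(input, output, img, face, target = 150, tx = 500, ty = 500) {
  const scale = face.width / target;       // e.g. 114 / 150 = 0.76
  const w = Math.round(img.width / scale); // resized image dimensions
  const h = Math.round(img.height / scale);
  // Face's top-left corner after resizing
  const fx = Math.round(face.left / scale);
  const fy = Math.round(face.top / scale);
  // Crop window offsets, clamped at 0 for faces near an edge
  const x = Math.max(0, fx - tx);
  const y = Math.max(0, fy - ty);
  return `gm convert ${input} -resize ${w}x${h}! -crop 1000x1500+${x}+${y} ${output}`;
}
```

Running that over all 65 frames gives a stack of identically sized, face-aligned images.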

From there it’s just a matter of using ffmpeg to generate an animated GIF with an optimized palette from the cropped frames.
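A sketch of that step, using ffmpeg's standard two-pass palette workflow: the first pass computes an optimized 256-color palette, and the second maps the frames through it (the frame naming scheme and output name are assumptions):

```javascript
// Two-pass ffmpeg GIF generation with an optimized palette.
const frames = 'frames/frame_%03d.png'; // hypothetical naming scheme
const cmds = [
  `ffmpeg -y -i ${frames} -vf palettegen palette.png`,
  `ffmpeg -y -i ${frames} -i palette.png -lavfi paletteuse bump.gif`,
];
// cmds.forEach(c => require('child_process').execSync(c));
```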

Finally I ran the resulting GIF through ImageMagick to add a short delay at the end before it loops again:
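A sketch of that trick with ImageMagick's `convert` (file names are mine): clone the last frame, give the clone a long delay in hundredths of a second, swap it into place, and drop the original last frame:

```javascript
// Pause on the final frame before the GIF loops (delay 200 = 2 seconds).
const pauseCmd =
  'convert bump.gif "(" +clone -set delay 200 ")" +swap +delete bump-final.gif';
// require('child_process').execSync(pauseCmd);
```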

And that’s how I made the GIF at the top of this article and hopefully improved the results for the search term “align time lapse”.