How does Prisma work?

Prisma is a cool mobile app that you can use to apply crazy filters to your photos. For example:

It works by applying a machine learning technique called “convolutional neural networks” (CNNs). This Quora post goes into more detail.

I’ve also heard this referred to as “neural style”. At DevFestDC, one of the talks showed this awesome neural-style GitHub project. It shows how you can take the style from any picture and apply it to another.
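For the curious, the rough idea (as I understand it from the original “A Neural Algorithm of Artistic Style” paper that the neural-style project implements; this is my summary, not Prisma’s exact recipe) is that the output image is optimised to minimise a weighted sum of a “content” loss and a “style” loss, both measured on the CNN’s feature maps:

$$
\mathcal{L}_{\text{total}}(x) = \alpha\,\mathcal{L}_{\text{content}}(x, p) + \beta\,\mathcal{L}_{\text{style}}(x, s)
$$

where $x$ is the generated image, $p$ is the content photo, $s$ is the style image, and the weights $\alpha$ and $\beta$ control how much of each you get.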

For example, let’s take van Gogh’s “Starry Night”:

…and apply it to a random photo of my kid:

…which gives this:

It looks like Prisma probably uses this kind of technology. They’ve probably tested lots of style images and picked their favorites. Unless they’ve done something hardcore to optimize it, I’m assuming they do the processing in the cloud on a high-spec machine with at least one Graphics Processing Unit (GPU), since GPUs make this run much, much faster.

More detail please

It took about 2 hours to get all the dependencies for the neural-style GitHub project installed on my Mac OS X Yosemite laptop. It was surprisingly painful to get everything set up. One tip: if you’re going to use Pip to install everything, make sure you are running at least Pip v1.5.4, or just update to the newest version to play it safe. I had to carry out a couple of undocumented steps to get TensorFlow working, which might have been due to using an old version of Pip.
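For reference, the Pip check and the basic installs looked roughly like this (the package list is my best recollection of what the neural-style README asked for, so treat it as a sketch and follow the project’s own instructions):

```
# Check which version of Pip you're running, and upgrade it if it's old.
pip --version
pip install --upgrade pip

# The main Python dependencies for neural-style (check the project README
# for the definitive list; it also needs the pre-trained VGG-19 weights file).
pip install tensorflow numpy scipy pillow
```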

I then ran the script for 1000 iterations, which took about 10 hours on my Mid 2013 MacBook Air (no GPU); there’s more discussion of performance below.

I set it to take a checkpoint image every 5 iterations so I could see how it progressed. It starts with random noise, and gradually reveals the desired image.
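For anyone who wants to reproduce this, the command I ran looked roughly like the following (the flag names are taken from the neural-style README as I remember it, and the filenames are placeholders, so double-check against the current project before copying):

```
# Run neural-style for 1000 iterations, saving a checkpoint image every 5 iterations.
# "%s" in --checkpoint-output gets replaced with the iteration number.
python neural_style.py \
  --content kid.jpg \
  --styles starry_night.jpg \
  --output final.jpg \
  --iterations 1000 \
  --checkpoint-iterations 5 \
  --checkpoint-output "checkpoint_%s.jpg"
```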

Step by step

Let’s take a look at some of the individual frames.

Initially, it’s all noise:

After 30 iterations it’s still mostly noise; you can just about see her eyes and outline, but only if you know what you’re looking for:

By 50 iterations it’s much clearer:

By 100 iterations it’s looking quite cool:

200 iterations:

At this point the rate of improvement appears to slow.

300 iterations:

500 iterations:

1000 iterations:

Performance

It took 10 hours to run on my Mid 2013 MacBook Air with no GPU. Apparently, with a GPU this would only have taken about 30 minutes.

If you’re looking for an excuse to play with AWS, then see if you can get it running there using an Amazon Machine Image (AMI) which already has all the dependencies you need. I raised a GitHub issue suggesting this, as that would make life a lot easier, and you could pick a high-spec machine to get results super quickly.

Creating the video and GIF

Once I had the 200 checkpoint images, I really wanted to create a video. I found this useful Stack Exchange question explaining how to use the ffmpeg command-line utility to create an MP4 file from a bunch of image frames.
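The command ended up being something along these lines (the frame rate and filename pattern here are my assumptions, so adjust them to match your own checkpoint filenames):

```
# Stitch the checkpoint frames into an MP4. With a glob pattern, frames are
# picked up in alphabetical order, which is why the leading zeros mentioned
# below matter.
ffmpeg -framerate 10 -pattern_type glob -i 'checkpoint_*.jpg' \
  -c:v libx264 -pix_fmt yuv420p progress.mp4
```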

I had to manually add some leading zeros onto the checkpoint filenames before I could get ffmpeg to use them in the correct order, but otherwise it was relatively painless (I suspect the neural-style “checkpoint-output” option could have been used to generate them in exactly the right format; maybe I’ll try that next time).
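If you end up doing the same, a quick shell loop handles the renaming. This is a minimal sketch, assuming the checkpoints came out as checkpoint_5.jpg, checkpoint_10.jpg and so on:

```
# Zero-pad the iteration number in each checkpoint filename so the frames
# sort (and therefore play) in the right order,
# e.g. checkpoint_5.jpg -> checkpoint_0005.jpg.
for f in checkpoint_*.jpg; do
  n="${f#checkpoint_}"
  n="${n%.jpg}"
  mv "$f" "$(printf 'checkpoint_%04d.jpg' "$n")"
done
```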

Once I had the MP4, I used this free website to convert it to an animated GIF (since Medium only allows image files to be embedded, not videos).