No king is complete without a royal portrait. Times have changed, however; just like the monarchy, art has entered the 21st century! Where a royal portrait once required months of work for a single “image”, today, with the help of cutting-edge artificial intelligence, we can create a digital artist, trained by the Great Masters of the past themselves, that is able to paint at over 20 FPS (Filips Per Second).
Interested in how we achieved this? Read all about it below.
Some time ago we got word that KU Leuven ICTS was looking for an application to demo the AI capabilities of their new GPU supercomputer called Genius. And who better to build such an application than the newly minted Deep Learning division of an AI prototyping company? That’s right! Brainjar, aiming to bring the iterative and rapid development practices of Craftworkz to cutting-edge deep learning projects, worked with the people at ICTS to create a demo that would demonstrate the power of their new GPU powerhouse.
But first, a bit more on Genius: You see, Genius is different from your average, run-of-the-mill supercomputer because it makes use of GPUs rather than CPUs. GPUs, or Graphics Processing Units, were originally designed to handle the large number of parallel computations required to render computer graphics. Core for core, a GPU is slower than a CPU, but it can run many computations simultaneously, whereas a CPU handles only a few at a time.
This makes GPUs ideally suited for tasks that require a lot of parallel computations. And one of the main industries (besides video gaming and cryptocurrency mining) that makes eager use of this is artificial intelligence.
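The contrast between the two styles of computation can be sketched with NumPy, whose vectorised array operations mirror the data-parallel model that GPUs excel at (GPU libraries such as CuPy expose a nearly identical API across thousands of cores). This example is purely illustrative and is not code from the demo:

```python
import numpy as np

def brighten_loop(pixels, amount):
    # CPU-style sequential processing: one pixel at a time
    return [min(int(p) + amount, 255) for p in pixels]

def brighten_parallel(pixels, amount):
    # One operation applied to the whole array at once -- the kind
    # of computation a GPU spreads across its many cores
    return np.minimum(pixels + amount, 255)

pixels = np.array([10, 120, 250])
print(brighten_loop(pixels, 20))      # sequential result
print(brighten_parallel(pixels, 20))  # same result, data-parallel style
```

Both functions produce the same brightened pixels; the difference is that the second expresses the work as a single array-wide operation, which is exactly the shape of computation a GPU can fan out across its cores.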
Artificial intelligence, or more precisely, the new and exciting AI subdomain called Deep Learning, works by “training” so-called Neural Networks to perform tasks. How it works exactly is out of scope for this article (and my brain) but in short:
A Neural Network consists of neurons (the circles in the image below) that are connected to each other in layers. Each of these connections (the lines in the image) has a weight that determines how important a signal coming from a neuron is to a certain neuron in the next layer. Initially these weights are random, but every time the network makes a mistake it can use clever math to slightly modify these weights to make the network work a bit better. As it turns out, if you feed such a network enough data, you can do all kinds of interesting things like detecting objects in images, reading lips, understanding human language or even making (or rather imitating) art.
Explaining Neural Networks without maths in a single paragraph is pretty hard without major omissions, but you can check here if you want a slightly more in-depth explanation, or here if you love maths.
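The “neurons, weights, and clever math” described above can be sketched as a toy network in plain NumPy. This is a minimal illustration (a tiny two-layer network learning the XOR function), not the far larger networks used in the demo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Four training examples of the XOR function and their targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Initially the weights between the layers are random
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses, lr = [], 1.0
for _ in range(5000):
    # Forward pass: signals flow from layer to layer
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(float(np.mean((out - y) ** 2)))

    # Backward pass (the "clever math", backpropagation): every
    # weight is nudged slightly in the direction that reduces the mistake
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(f"loss at start: {losses[0]:.3f}, after training: {losses[-1]:.3f}")
```

Every pass through the loop makes the network a little less wrong, which is all “training” means here; scale the same idea up by a few million weights and a few million images and you get the networks behind style transfer.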
After an initial brainstorming session, three requirements were defined. It was determined that the demo needed to be:
- AI based
There are many ways to make a GPU sweat: You can perform scientific calculations, render fuzzy balls or even mine cryptocurrencies. But training large-scale deep learning networks is definitely the most important use case for GPU superclusters.
- Visually exciting
Since the application was to be used for the official unveiling event, we determined that it would be nice to have Genius actually do something visually exciting during the event. Therefore it seemed obvious to build a computer vision application, as they have the immediate benefit of producing visual results.
- Supercomputer worthy
This requirement seems straightforward but is actually quite tricky: memory limitations aside, any application that runs on a supercomputer also runs on a typical desktop machine, just far more slowly. And while you can certainly show that a certain network can be trained in days or hours rather than weeks, that doesn’t exactly carry much of a wow factor.
So based on these requirements we came up with the following idea: A real-time neural style transfer application that allows people to see the world rendered in the style of one of six different famous paintings. You can check this article if you are interested in the nitty-gritty technical details.
How it works
The keyword here is real-time. Neural style transfer is already quite feasible, but it typically works on still images, and processing a single image takes a few minutes depending on its size. Even when using our heavy-duty workstation (built to handle our semi-large Neural Networks) and an algorithm called “Fast Neural Style Transfer”, it still takes between 1.7 and 2 seconds for a single image.
This is where Genius really shines: Because a single node only needs 0.006 seconds to process a single frame, we can process video at over 160 frames per second, enough to satisfy even the most demanding framerate snobs.
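The throughput figures follow directly from the per-frame latencies quoted above; a quick back-of-the-envelope check:

```python
# Frames per second is just the reciprocal of the per-frame processing time.
def fps(seconds_per_frame):
    return 1.0 / seconds_per_frame

workstation = fps(1.7)    # our local workstation: under 1 fps
genius_node = fps(0.006)  # a single Genius node: over 160 fps

print(f"workstation: {workstation:.2f} fps, Genius node: {genius_node:.0f} fps")
```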
The real-time camera feed is handled locally. The computer handling this camera feed takes a frame from the stream and sends it through the internet to the login node on the supercomputer.
The login node is responsible for making sure the frame gets sent to the correct compute node with the style the user asked for; this style is chosen on the tablet of the Pepper robot. For each style there is an active compute node. When the frame reaches the compute node responsible for the chosen style, the compute node applies the style to the frame and sends it back to the login node.
The login node then sends it back over the internet to the local computer which shows the styled frame on the screen. This all happens almost instantly and many times per second.
P.S. Credits to Deevid De Meyer for helping me write this article but there is no co-authorship in Medium… yet