THE AI AISLE
A Lightweight PyTorch Implementation of Neural Style Transfer
Create your own digital art with convolutional neural networks, in five simple steps!
Art transcends human existence. We see its significance throughout the course of history: from prehistoric times, through some of the greatest river valley civilizations, the forts and courts of monarchs, right up to the modern techno age. Art has always been a means to express one’s vision of the world. The legendary Pablo Picasso once said,
“It took me four years to paint like Raphael,
but a lifetime to paint like a child.” — Pablo Picasso
Most art follows a pattern: one that is pleasing and stimulates our brain. The next time you see an artwork, try to notice its color theme, or the brush strokes in it. You will see a pattern emerge. We humans are skilled at subconsciously recognizing these patterns. Now, with the help of neural networks, machines have developed the ability to recognize and artificially recreate such patterns too.
The Research Paper by Gatys et al.
A research paper titled ‘A Neural Algorithm of Artistic Style’ by Gatys et al. (first released on arXiv in 2015 and subsequently accepted at CVPR 2016) was the first work in neural style transfer, and it is still considered the most groundbreaking work in the domain to date.
In this article, we will build a lightweight PyTorch implementation of neural style transfer as discussed in this seminal paper, and learn how to transfer popular art styles onto any image in five simple steps. Let’s go!
Table of Contents
- Overview
- How Does it Work?
- Getting Started
– File Description
– Dependencies
– How to Use NST on Your Own Images
- Obtained Output
- Practical Applications
- Acknowledgements
Overview
Neural style transfer (NST) is a technique that takes two images — a content image and a style reference image — and blends them together so that the output looks like the content image, but painted in the style of the style reference image.
It is an example of image stylization (an image processing and manipulation technique) within the broader field of non-photorealistic rendering.
How Does it Work?
- We make use of a pre-trained convolutional neural network (VGG19, in our case) to extract the image details.
– Starting from the network’s input layer, the first few layer activations represent low-level features like colors and textures (the ‘style’).
– As we step through the network, the final few layers represent higher-level features (the ‘content’), for example a cat’s eyes and ears.
- This is why we begin by taking our content image, feeding it through VGG19, and sampling the network activations at a late convolution layer (conv4_2).
- Then, we take our style image, feed it through the same network, and sample the network activations at the early-to-middle convolution layers (conv1_1, conv2_1, conv3_1, conv4_1, conv5_1).
- These activations are encoded into a Gram matrix representation, and serve to denote the ‘style’ of the image.
- Our goal is to synthesize an output image that exhibits the contents of one image with the style of another. For this, we calculate the following losses:
– content loss, the L2 distance between the activations of the content image and those of the generated image at the content layer,
– style loss, the sum of L2 distances between the Gram matrices of the generated image and the style image, extracted from different layers of VGG19,
– total variation loss, which enforces spatial continuity between the pixels of the generated image, thereby de-noising it and giving it visual coherence, and
– total loss, the sum of all of the above losses multiplied by their respective weights.
- An iterative optimization technique (L-BFGS, in our case) is then employed to gradually minimize these losses and achieve the desired result; a code sketch of these losses follows below.
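To make the losses concrete, here is a minimal PyTorch sketch of the Gram matrix and the three loss terms described above. This is an illustration rather than the exact code from the repository, and the weights in the sanity check at the end are arbitrary placeholders:

import torch
import torch.nn.functional as F

def gram_matrix(feat):
    # feat: (1, C, H, W) activations -> (C, C) matrix of channel correlations
    _, c, h, w = feat.shape
    f = feat.view(c, h * w)
    return (f @ f.t()) / (c * h * w)  # one common normalization choice

def content_loss(gen_feat, content_feat):
    # L2 distance between generated and content activations (e.g. at conv4_2)
    return F.mse_loss(gen_feat, content_feat)

def style_loss(gen_feats, target_grams):
    # sum of L2 distances between Gram matrices, one term per style layer
    return sum(F.mse_loss(gram_matrix(f), g) for f, g in zip(gen_feats, target_grams))

def tv_loss(img):
    # total variation: penalize differences between neighboring pixels,
    # which smooths and de-noises the generated image
    return (torch.mean(torch.abs(img[:, :, :, :-1] - img[:, :, :, 1:])) +
            torch.mean(torch.abs(img[:, :, :-1, :] - img[:, :, 1:, :])))

# quick sanity check with random activations and placeholder weights
f1, f2 = torch.rand(1, 64, 32, 32), torch.rand(1, 64, 32, 32)
total = (1e5 * content_loss(f1, f2)
         + 3e4 * style_loss([f1], [gram_matrix(f2)])
         + tv_loss(torch.rand(1, 3, 64, 64)))
print(total.item())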
Getting Started
➡ File Description
You can have a look at the code for neural style transfer in my repository on GitHub.
- vgg19.py contains the VGG19 model definition, specifying which layers are to be used for style representation and which one for content.
- NST.py is the main script; it handles image loading and manipulation, generation of the output image, optimization, tuning, and saving of the final result.
Neural-Style-Transfer
├── data
│   ├── content-images
│   ├── style-images
├── models/definitions
│   ├── vgg19.py    # VGG19 model definition
├── NST.py          # The main python file
├── LICENSE
└── README.md
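For intuition, here is a minimal sketch of what a vgg19.py-style feature extractor could look like. The layer indices follow torchvision’s VGG19 layout; the actual file in the repository may be organized differently:

import torch
import torch.nn as nn
from torchvision import models

class Vgg19Features(nn.Module):
    # indices of the relevant conv layers inside torchvision's vgg19().features
    LAYER_IDS = {
        "conv1_1": 0, "conv2_1": 5, "conv3_1": 10,
        "conv4_1": 19, "conv4_2": 21, "conv5_1": 28,
    }

    def __init__(self):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)  # the network is a fixed feature extractor
        for layer in vgg:
            if isinstance(layer, nn.ReLU):
                layer.inplace = False  # keep stored activations from being overwritten
        self.vgg = vgg

    def forward(self, x):
        feats, wanted = {}, {v: k for k, v in self.LAYER_IDS.items()}
        for i, layer in enumerate(self.vgg):
            x = layer(x)
            if i in wanted:
                feats[wanted[i]] = x
            if i >= max(self.LAYER_IDS.values()):
                break  # no need to run the deeper layers
        return feats

A single forward pass then returns all the style and content activations at once, which keeps the rest of the pipeline simple.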
➡ Dependencies
- Python 3.9+
- Framework: PyTorch
- Libraries: os, numpy, cv2, matplotlib, torchvision
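Assuming a standard pip setup (note that cv2 ships in the opencv-python package), the dependencies can be installed with:

$ pip install torch torchvision numpy opencv-python matplotlib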
➡ How to Use NST on Your Own Images
1. Clone the repository and move to the downloaded folder:
$ git clone https://github.com/nazianafis/Neural-Style-Transfer
$ cd Neural-Style-Transfer
2. Move your content/style image(s) to their respective folders inside the data folder.
3. Go to NST.py, and in it, set the PATH variable to your downloaded folder. Also set the CONTENT_IMAGE and STYLE_IMAGE variables to your desired images. Note that these are Python variables inside the script, not shell commands:
PATH = <your_path>
CONTENT_IMAGE = <your_content_image_name>
STYLE_IMAGE = <your_style_image_name>
4. Run NST.py:
$ python NST.py
5. That’s it! Find your generated image in the output-images folder inside data.
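If you are curious what happens when NST.py runs, here is a hedged sketch of the kind of optimization loop it could contain. It reuses the hypothetical Vgg19Features class and loss helpers from the earlier sketches, and random tensors stand in for the loaded images:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = Vgg19Features().to(device)  # hypothetical class from the earlier sketch

# stand-ins for preprocessed images; NST.py would load these from
# data/content-images and data/style-images instead
content_img = torch.rand(1, 3, 256, 256, device=device)
style_img = torch.rand(1, 3, 256, 256, device=device)

style_layers = ["conv1_1", "conv2_1", "conv3_1", "conv4_1", "conv5_1"]
with torch.no_grad():
    content_target = model(content_img)["conv4_2"]
    style_targets = [gram_matrix(model(style_img)[l]) for l in style_layers]

# start the generated image from the content image and optimize its pixels
gen_img = content_img.clone().requires_grad_(True)
optimizer = torch.optim.LBFGS([gen_img], max_iter=500, line_search_fn="strong_wolfe")

CONTENT_WEIGHT, STYLE_WEIGHT, TV_WEIGHT = 1e5, 3e4, 1.0  # illustrative weights

def closure():
    optimizer.zero_grad()
    feats = model(gen_img)
    loss = (CONTENT_WEIGHT * content_loss(feats["conv4_2"], content_target)
            + STYLE_WEIGHT * style_loss([feats[l] for l in style_layers], style_targets)
            + TV_WEIGHT * tv_loss(gen_img))
    loss.backward()
    return loss

optimizer.step(closure)  # L-BFGS re-evaluates the closure internally
# gen_img now holds the stylized result, ready to be saved to data/output-images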
Obtained Output
The following output images were generated using no image manipulation programs other than the code described in this article. I encourage you to try it!
Practical Applications
There are many real-world use cases for NST. A non-exhaustive list would include:
- Photo and video editing: With recent advances in style transfer, anyone (artist or not) can create their own artistic masterpieces and share them with the world. This trend is evident in the rise of notable end-user applications such as DeepArt and Prisma.
- Commercial art: With the enormous popular interest in NFTs (non-fungible tokens), the market for art creation and consumption is only growing. Artists can now lend or sell their art style to others, allowing new and innovative representations of their styles to live alongside the original masterpieces.
- Gaming and virtual reality: The metaverse is the latest buzzword. We are swiftly moving towards the digitalization of the real world through a combination of augmented reality (AR) and virtual reality (VR) services, and visual computing skills (image and video manipulation, computer graphics, computer vision) are touted to be in high demand in the near future.
If you’re thinking of getting into visual computing, the time is now! 😊
Acknowledgements
These are some of the resources I referred to while working on this project. You might want to check them out.
- PyTorch’s tutorial on NST
- Aleksa Gordic’s implementation
- The original paper on neural style transfer by Gatys et al.
- The original paper on VGG19
- Wikimedia, Unsplash for all the content and style images
In the coming days, I plan on learning more about image generation and manipulation via machine learning. I also aim to create more art with AI (I’ve been fascinated by fractals for some time now!). Feel free to contact me if you find bugs (🕷) or have suggestions.
I hope this article was of use to you. You can connect with me on LinkedIn, or follow my writings here.
Until next time! (∗ ・‿・)ノ゛