Turn any video into a Van Gogh with BigDL

Published in Intel Tech · Jan 24, 2023

Authors: Ezequiel Lanza, Ruonan Wang

In the last few years, neural networks have been making connections everywhere.

From bacteria to satellite imagery, they're being applied everywhere, using algorithms to replicate the behavior of neurons in the brain.

Video processing is also a great use case. Algorithms developed in the 1980s are only now approaching useful speeds thanks to huge boosts in computing power.

In this blog post we’ll explore advancements in video processing, particularly in video stylization.

Video Stylization: A Definition

If you work as a content creator, your goal may be to create a video with your own signature style, using tools like Adobe Illustrator to manually create images with your own color palette. What style would you say this image is in?

“Two Poplars in the Alpilles near Saint-Rémy,” Internet Archive CC0 1.0 Universal

Looks like it came straight from the brush of Vincent Van Gogh, right? Your brain predicted the artist from the style alone, drawing on the memories of his other artworks it has stored away. If only you had the superpower to extract that style, you could create and modify images with the master strokes of the famous post-impressionist.

We’re not there yet, but with image stylization an algorithm can do just that. These models modify an input image to match a desired style; open source projects like this one, based on (Huang & Belongie, 2017), can help you make these artistic transformations.
Check out the results below:

Image: Ezequiel Lanza
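Under the hood, the core of the (Huang & Belongie, 2017) approach is Adaptive Instance Normalization (AdaIN): the content image’s feature statistics are re-normalized to match the style image’s. Here’s a minimal PyTorch sketch of just that operation (the pretrained encoder and the decoder that surround it in a full model are omitted):

```python
import torch

def adain(content_feat: torch.Tensor, style_feat: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Adaptive Instance Normalization (Huang & Belongie, 2017).

    Aligns the per-channel mean/std of the content features with those of the
    style features. Both inputs are (N, C, H, W) feature maps taken from a
    pretrained encoder such as VGG; encoder and decoder are omitted here.
    """
    n, c = content_feat.shape[:2]
    # Per-channel statistics over the spatial dimensions
    c_mean = content_feat.view(n, c, -1).mean(dim=2).view(n, c, 1, 1)
    c_std = content_feat.view(n, c, -1).std(dim=2).view(n, c, 1, 1) + eps
    s_mean = style_feat.view(n, c, -1).mean(dim=2).view(n, c, 1, 1)
    s_std = style_feat.view(n, c, -1).std(dim=2).view(n, c, 1, 1) + eps
    # Normalize the content features, then re-scale/shift with the style statistics
    return s_std * (content_feat - c_mean) / c_std + s_mean
```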

That’s fine for a static image, but what about video?

Videos are a consecutive stream of images, so video stylization is more complex than image stylization. Challenges include achieving real-time video stylization (Lu et al., 2018) and maintaining continuity across all the frames that make up the video. Because it draws on video processing (Li et al., 2022), image stylization technology, and other fields, it has long been a hot academic topic. Research is now also turning toward 3D stylization, where the style is propagated to an object in three dimensions (Nguyen-Phuoc et al., 2022; Liu et al., 2022; Hauptfleisch et al., 2020).

Video stylization is a great shortcut for creatives who don’t want (or don’t have the time) to manually draw thousands of frames. Just ask the model to modify the input video to emulate the artist (style transfer), and voilà! The output will be something like this:

Image: Ezequiel Lanza. Original video: cottonbro studio

As you can see by comparing the input with the output, the model applies the style of the initial keyframe, complete with the Van Gogh hallmarks. It takes into account elements like color palette and brushwork to emulate the style.

This model can also perform stylization on any video you want:

Image: Ezequiel Lanza. Original video: MART PRODUCTION

From a business perspective, video stylization lets digital content producers create interesting content quickly, removing the need to manually draw every frame. At the same time, this tech can be embedded into existing video streaming or photo processing apps to provide features like video stylization and video filters, such as this example that transfers multiple pre-trained styles to a video stream captured from a regular iPhone* camera. The results look great, and the quality keeps improving, but these advances in artificial intelligence can also mean laggy processing times.

How does it work?

Just like any other AI model, this one works in two stages: training and inference.

Before training the model, the input video must be converted into images (Group A). The user provides stylized keyframes of the input images (Group B). This is all the data the model needs for training; then it learns from that data to produce the stylized images (Group C).

Once trained, the model is ready to run inference on any other input: it takes the style it learned and applies it to each input frame, as seen below.

Image: Ruonan Wang. Original: “Interactive Video Stylization Using Few-Shot Patch-Based Training”
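In practice, the plumbing around the model is ordinary video I/O: split the clip into frames, push each frame through the trained network, and re-encode the results. Here’s a rough OpenCV sketch, where `stylize` is a hypothetical callable wrapping the trained model:

```python
import cv2

def stylize_video(in_path: str, out_path: str, stylize) -> None:
    """Read a video frame by frame, apply a stylization callable, and re-encode.

    `stylize` is assumed to take and return a BGR uint8 frame of the same size;
    it stands in for the trained few-shot model (Texler et al., 2020).
    """
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(stylize(frame))  # inference on one frame at a time
    cap.release()
    writer.release()
```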

The trained model can then stylize any input video you need, and only a few stylized keyframes are required to train it, in about 20–30 minutes. Refer to (Texler et al., 2020) for the nitty-gritty details on few-shot architectures.

There are key challenges for video stylization, including real-time inference, parallel processing, and continuity of different frames in the video.

How BigDL Can Help

Although the model has been optimized to a certain extent, there’s still an obvious lag in real-time video stylization.

BigDL can help reduce that lag. BigDL-Nano has two features, quantization and multi-process inference, that can nearly double processing speed with only a few code changes.
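For the quantization piece, the change to your code is small. Here’s a minimal sketch, assuming a trained PyTorch stylization model and a small calibration DataLoader of representative frames; `model`, `calib_loader`, and `frame_tensor` are placeholders, and the argument names follow the BigDL-Nano documentation and may differ slightly between releases (check the repo for the current API):

```python
import torch
from bigdl.nano.pytorch import InferenceOptimizer

# `model` is the trained stylization network and `calib_loader` a small
# DataLoader of representative frames; both are placeholders for your own objects.
q_model = InferenceOptimizer.quantize(model,
                                      precision='int8',       # post-training INT8 quantization
                                      calib_data=calib_loader)

with torch.no_grad():
    # `frame_tensor` is a placeholder for one preprocessed input frame;
    # the quantized model is a drop-in replacement for the original call.
    styled = q_model(frame_tensor)
```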

It’s always better to show, not tell: The table below documents the speed of video stylization before and after BigDL-Nano acceleration on an Intel® Core™ i9–12900 processor with different numbers of processes. (The process number here refers to the number of processes running inference at the same time. For example, if the number of processes is 1, a single main process renders the video frame by frame. If the number of processes is 4, four processes launch at the same time, and each one handles about a quarter of the image frames.)
Each latency result is the average of 20 repeated experiments. Note: The values below will vary with input video length/resolution and the type of CPU used.
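BigDL-Nano handles the process management for you, but to make the process-number idea concrete, here is a plain-Python sketch of the same splitting strategy using the standard multiprocessing module (not BigDL-Nano’s own API; the `stylize` function is a stand-in for per-frame inference):

```python
from multiprocessing import Pool

def stylize(frame):
    # Placeholder for per-frame model inference (see the sketches above).
    return frame

def stylize_chunk(frames):
    # Each worker process handles one contiguous chunk of frames.
    return [stylize(f) for f in frames]

def parallel_stylize(frames, num_processes=4):
    # Split the frame list into roughly equal chunks, one per process.
    size = (len(frames) + num_processes - 1) // num_processes
    chunks = [frames[i:i + size] for i in range(0, len(frames), size)]
    with Pool(processes=num_processes) as pool:
        results = pool.map(stylize_chunk, chunks)
    # Concatenate the chunks in order so frame continuity is preserved.
    return [f for chunk in results for f in chunk]
```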

Check out the BigDL repo for optimized implementation.

Video stylization is a great way to create engaging content, despite the hurdle of long computation times. BigDL-Nano provides powerful video stylization with faster delivery to save time and energy, even making it possible to run it on portable mini PCs like Intel® NUCs.

For more open source content from Intel, check out open.intel

References

Hauptfleisch, F., Texler, O., Texler, A., Krivánek, J., & Sýkora, D. (2020). StyleProp: Real‐time Example‐based Stylization of 3D Models. Computer Graphics Forum, 39(7), 575–586. https://doi.org/10.1111/cgf.14169

Huang, X., & Belongie, S. (2017). Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization. https://doi.org/10.48550/ARXIV.1703.06868

Liu, F.-L., Chen, S.-Y., Lai, Y.-K., Li, C., Jiang, Y.-R., Fu, H., & Gao, L. (2022). DeepFaceVideoEditing: Sketch-based deep editing of face videos. ACM Transactions on Graphics, 41(4), 1–16. https://doi.org/10.1145/3528223.3530056

Nguyen-Phuoc, T., Liu, F., & Xiao, L. (2022). SNeRF: Stylized Neural Implicit Representations for 3D Scenes. https://doi.org/10.48550/ARXIV.2207.02363

Texler, O., Futschik, D., Kučera, M., Jamriška, O., Sochorová, Š., Chai, M., Tulyakov, S., & Sýkora, D. (2020). Interactive Video Stylization Using Few-Shot Patch-Based Training (arXiv:2004.14489). arXiv. http://arxiv.org/abs/2004.14489

About the Authors

Ezequiel Lanza is an open source evangelist on Intel’s Open Ecosystem Team, passionate about helping people discover the exciting world of AI. He’s also a frequent AI conference presenter and creator of use cases, tutorials, and guides to help developers adopt open source AI tools like TensorFlow* and Hugging Face*. Find him on Twitter at @eze_lanza

Ruonan Wang is an AI Frameworks Engineer at Intel AIA, currently focused on developing BigDL-Nano, a Python* package to transparently accelerate PyTorch* and TensorFlow* applications on Intel hardware.
