Use BigDL-Nano to turn your video into art


The code in this tutorial will get you started.

The result: a video rendered in the Picasso style

Authors: Ezequiel Lanza, Ruonan Wang

This tutorial will show you how to turn your own video into a work of art. It builds on the BigDL-Nano video stylization example from our previous post.

At its core, BigDL-Nano provides two main features:

  • Transparently accelerates PyTorch* and TensorFlow* applications on Intel hardware.
  • Provides a unified, easy-to-use API for optimization techniques and tools, so that PyTorch or TensorFlow runs faster with only a few lines of code changed (see the sketch below).
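
As a rough sketch of what that one-line change looks like in practice (the full trainer configuration used in this tutorial appears in the Training section below; model and train_dataloader are placeholders for your own LightningModule and DataLoader):

# from pytorch_lightning import Trainer   # regular PyTorch Lightning
from bigdl.nano.pytorch import Trainer    # drop-in BigDL-Nano replacement

trainer = Trainer(max_epochs=100)          # accepts the usual pl.Trainer arguments
trainer.fit(model, train_dataloader)       # model / train_dataloader: your own objects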

We’ll stylize a video using a style called Picasso, named after the Spanish artist. It’s an implementation of (Texler et al., 2020) in PyTorch, with both training and inference accelerated by BigDL-Nano.

Prepare Your Data and Environment

Prepare your environment before training. We recommend setting up a clean environment by following these instructions.

The first step in training a supervised model is to gather examples of what you’d like the model to produce. Here, we’ll provide both stylized and non-stylized images so the model can learn the style. You can generate your own, in whatever style you choose, using this open source implementation of Arbitrary Style Transfer.

To make things easier, we’ve provided input (non-stylized) and target (stylized) images (data.zip):

!unzip data.zip

Note: The “!” allows Jupyter* to run terminal commands.

Input images
These are frames extracted from the stream of your training video (using the video_to_imgs function defined in the Helper Functions section below), numbered from 0 to 76 (77 frames). You’ll find them under the data/input folder after unzipping.
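
If you’d rather build this input set from your own footage, you can extract the frames with the video_to_imgs helper defined later in this tutorial. A minimal sketch; the filename here is only a placeholder for your own training clip:

import os

os.makedirs("./data/input/", exist_ok=True)
# placeholder filename: replace with your own training video
video_to_imgs("my_training_video.mp4", "./data/input/")  # writes 0.jpg, 1.jpg, ... into data/input/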

Target images
In this tutorial, these are stylized in the Picasso style. You don’t need to provide all 77 target images; just a few are enough for the model to learn the style. Here, we’ve provided images numbered 0, 35, and 70, which are under the data/target folder after unzipping.

Training

Now it’s time to train the model with a script accelerated by BigDL-Nano (nano_train.py).

!python nano_train.py

Note: To compare how long training takes without BigDL-Nano acceleration, you can use the regular PyTorch Lightning Trainer (train.py).

This trains the network defined in the base implementation described in (Nguyen-Phuoc et al., 2022), using the BigDL-Nano Trainer to accelerate the process. Notice that there’s nothing else to change in your code; just replace the regular PyTorch Lightning Trainer with the BigDL-Nano Trainer:

from bigdl.nano.pytorch import Trainer

PyTorch-Lightning Trainer

trainer = pl.Trainer(
    gpus=0,
    max_epochs=100,
    log_every_n_steps=8,
    limit_train_batches=1.0,
    limit_val_batches=1.0,
    limit_test_batches=1.0,
    check_val_every_n_epoch=20,
    reload_dataloaders_every_n_epochs=1,
    profiler=profiler,
    logger=logger,
    callbacks=callbacks,
    # fast_dev_run=True,
)

BigDL-Nano Trainer

from bigdl.nano.pytorch import Trainer

trainer = Trainer(
    gpus=0,
    max_epochs=100,
    log_every_n_steps=8,
    limit_train_batches=1.0,
    limit_val_batches=1.0,
    limit_test_batches=1.0,
    check_val_every_n_epoch=20,
    reload_dataloaders_every_n_epochs=1,
    profiler=profiler,
    logger=logger,
    callbacks=callbacks,
    # fast_dev_run=True,
)

After a few minutes, you’ll get a message showing how long the script took to train the model. You’ll find your trained model under: ./data/models/

Note: This is not the only file saved under that directory. You’ll see three: “generator.pt”, “discriminator.pt”, and “latest.ckpt”. For inference, you only need “generator.pt” to produce the new stylized images.


Helper Functions
These helper functions, extracted from multiple open source repositories, will help you prepare the data.

  • display_video: Displays the video in the notebook.
  • imgs_to_video: Combines multiple frames into a video. The model runs inference on individual frames, so you’ll need to convert its output back into a video.
  • video_to_imgs: Converts the input video into a sequence of frames. The model operates on frames, so the video you provide for inference must first be split into images.

import os

from IPython.display import HTML
from base64 import b64encode
from PIL import Image as PILImage
import cv2
from cv2 import VideoCapture, imwrite
import numpy as np


def display_video(file_path, width=512):
    # Source: https://colab.research.google.com/drive/1_kbRZPTjnFgViPrmGcUsaszEdYa8XTpq#scrollTo=DxlIqGfATvvj&line=1&uniqifier=1
    # Re-encode the video with H.264 so it can be embedded and played in the notebook.
    compressed_video_path = 'comp_' + file_path
    if os.path.exists(compressed_video_path):
        os.remove(compressed_video_path)
    os.system(f'ffmpeg -i {file_path} -vcodec libx264 -loglevel quiet {compressed_video_path}')

    mp4 = open(compressed_video_path, 'rb').read()
    data_url = 'data:video/mp4;base64,' + b64encode(mp4).decode()
    return HTML("""
    <video width={} controls>
        <source src="{}" type="video/mp4">
    </video>
    """.format(width, data_url))


def imgs_to_video(output_dir, video_name='demo_output.mp4', fps=24):
    # Refer to: https://stackoverflow.com/questions/52414148/turn-pil-images-into-video-on-linux
    # Collect the frames (named 0.jpg, 1.jpg, ...) and sort them numerically.
    imgs = []
    for image_name in os.listdir(output_dir):
        if image_name.endswith('.jpg'):
            imgs.append(output_dir + image_name)
    imgs.sort(key=lambda img: int(img.split('/')[-1].split('.')[0]))
    pil_imgs = []
    for file in imgs:
        pil_imgs.append(PILImage.open(file))
    video_dims = (pil_imgs[0].width, pil_imgs[0].height)
    fourcc = cv2.VideoWriter_fourcc(*'DIVX')
    video = cv2.VideoWriter(video_name, fourcc, fps, video_dims)
    for img in pil_imgs:
        tmp_img = img.copy()
        # PIL images are RGB; OpenCV expects BGR.
        video.write(cv2.cvtColor(np.array(tmp_img), cv2.COLOR_RGB2BGR))
    video.release()  # finalize the video file


def video_to_imgs(video_name='demo_output.mp4', image_dir="./images/"):
    video_capture = VideoCapture(video_name)
    number = 0
    while True:
        flag, frame = video_capture.read()
        if flag is False:
            break
        # frame.shape is (height, width, channels); make both dimensions multiples of 4
        h, w = frame.shape[0], frame.shape[1]
        if w % 4 != 0 or h % 4 != 0:
            NW = int((w // 4) * 4)
            NH = int((h // 4) * 4)
            frame = cv2.resize(frame, (NW, NH))  # cv2.resize expects (width, height)
        imwrite(image_dir + str(number) + '.jpg', frame)
        number += 1

Load the Model and Accelerate It with InferenceOptimizer

Now that the model is trained, it can transfer the style to any input video (frame by frame). BigDL-Nano also accelerates this stage through its InferenceOptimizer.

Before running inference, perform these steps:

  1. Select the device where inference will run
  2. Load the model into memory
  3. Load the data in a format PyTorch understands (DataLoader)

If you’re wondering how the acceleration works: BigDL-Nano integrates extra runtimes as inference backend engines and supports quantization methods. Quantization compresses a model to lower precision, producing a lighter model with little loss of accuracy. Here are some additional optimizations if you want to try them out; one alternative is sketched after the code block below.

In this tutorial, you’ll use the most basic option: INT8 precision, without searching a tuning space to control the accuracy drop.

from torch.utils.data import DataLoader
import torch
from tqdm import tqdm
import torchvision.transforms as transforms
from pathlib import Path
from data import read_image_tensor, write_image_tensor, ImageDataset
from train import data_path, model_save_path

# load model
device = 'cpu'
dtype = torch.float32

generator = torch.load(model_save_path/"generator.pt")
generator.eval()
generator.to(device, dtype)

# prepare calib dataloader
input_dir = data_path/'input'
file_paths = [file for file in input_dir.iterdir()]

params = {'batch_size': 1,
          'num_workers': 8,
          'pin_memory': True}

dataset = ImageDataset(file_paths, transform=None)
loader = DataLoader(dataset, **params)

from bigdl.nano.pytorch import InferenceOptimizer
model = InferenceOptimizer.quantize(accelerator=None,
                                    model=generator,
                                    calib_dataloader=loader)
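
As mentioned above, quantization can also be combined with an extra runtime as the inference backend engine. The sketch below shows one such alternative, using ONNX Runtime; it’s optional, it assumes BigDL-Nano’s onnxruntime dependencies are installed in your environment, and it reuses the generator and loader created above. The tutorial itself sticks with the basic INT8 path shown above.

# optional alternative (assumes onnxruntime support for BigDL-Nano is installed):
# INT8 quantization with ONNX Runtime as the backend engine
ort_int8_model = InferenceOptimizer.quantize(model=generator,
                                             accelerator="onnxruntime",
                                             calib_dataloader=loader)

# it can then be called just like `model` above, e.g.:
# outputs = ort_int8_model(inputs)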

Inference with Input Video

Let’s run inference. You’ll need to provide the input_video file and the num_processes you’d like to use (refer to the previous post for details).

In addition to accelerating the model itself, BigDL-Nano offers multi-process inference to improve the utilization of physical cores and increase inference throughput.

To use it, just call the following API and pass in the desired number of processes (>1):

multi_instance_model = InferenceOptimizer.to_multi_instance(model, num_processes=num_processes)

You’ll find more details in the BigDL documentation.

from pathlib import Path
import time

from bigdl.nano.pytorch import InferenceOptimizer


if __name__ == "__main__":
    input_video = "demo.mp4"
    num_processes = 4  # specify number of processes

    image_dir = "./video2pic/"
    output_dir = "./video-output/"
    os.makedirs(image_dir, exist_ok=True)
    os.makedirs(output_dir, exist_ok=True)

    # split the input video into frames for the model
    video_to_imgs(input_video, image_dir)

    img_list = [Path(image_dir, image_name) for image_name in os.listdir(image_dir)]
    params = {'batch_size': 1}
    dataset = ImageDataset(img_list, transform=None)
    loader = DataLoader(dataset, **params)

    if num_processes > 1:
        print("{} processes are used.".format(num_processes))
        st = time.perf_counter()
        with torch.no_grad():
            # call `InferenceOptimizer.to_multi_instance` to get a multi-instance inference model
            multi_instance_model = InferenceOptimizer.to_multi_instance(model, num_processes=num_processes)
            # collect an input list
            inputs_list, names_list = [], []
            for inputs, names in loader:
                inputs = inputs.to(device, dtype)
                inputs_list.append(inputs)
                names_list.append(names)
            # run inference on the input list
            outputs_list = multi_instance_model(inputs_list)
            # handle the output list
            for outputs, names in zip(outputs_list, names_list):
                for k in range(len(outputs)):
                    write_image_tensor(outputs[k], Path(output_dir, names[k]))
                del outputs
        end = time.perf_counter()
        print("Generation costs {}s".format(end - st))
    else:
        st = time.perf_counter()
        with torch.no_grad():
            for inputs, names in tqdm(loader):
                inputs = inputs.to(device, dtype)
                # original model
                # outputs = generator(inputs)
                # accelerated model
                outputs = model(inputs)
                for k in range(len(outputs)):
                    write_image_tensor(outputs[k], Path(output_dir, names[k]))
                del outputs
        end = time.perf_counter()
        print("Generation costs {}s".format(end - st))

    # stitch the stylized frames back into a video
    imgs_to_video(output_dir, "demo_output.mp4", fps=25)

Demo Output

Now you’ll get your stylized video.

display_video("demo_output.mp4") 

References

The video used in the tutorial comes from this paper on “High-Level Video Understanding (HLVU) dataset of open source movies.”

Texler, O., Futschik, D., Kučera, M., Jamriška, O., Sochorová, Š., Chai, M., Tulyakov, S., & Sýkora, D. (2020). Interactive Video Stylization Using Few-Shot Patch-Based Training (arXiv:2004.14489). arXiv. http://arxiv.org/abs/2004.14489

About the Authors

Ezequiel Lanza is an open source evangelist on Intel’s Open Ecosystem Team, passionate about helping people discover the exciting world of AI. He’s also a frequent AI conference presenter and creator of use cases, tutorials, and guides to help developers adopt open source AI tools like TensorFlow* and Hugging Face*. Find him on Twitter at @eze_lanza

Ruonan Wang is an AI Frameworks Engineer at Intel AIA, currently focused on developing BigDL-Nano, a Python* package to transparently accelerate PyTorch* and TensorFlow* applications on Intel hardware.

For more open source content from Intel, check out open.intel
