Use BigDL-Nano to turn your video into art
--
The code in this tutorial will get you started.
Authors: Ezequiel Lanza, Ruonan Wang
This tutorial will show you how to turn your own video into a work of art. It builds on the BigDL-Nano video stylization example from our previous post.
At its core, BigDL-Nano provides two main features:
- Transparently accelerates PyTorch* and TensorFlow* applications on Intel hardware.
- Provides a unified, easy-to-use API for optimization techniques and tools, so that with only a few lines of code, PyTorch or TensorFlow applications run faster (see the sketch below).
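To give a sense of how small that change is, here's a minimal sketch of the drop-in Trainer swap. MyLightningModule and train_loader are placeholders for your own LightningModule and DataLoader, not part of this tutorial's code:
# Minimal sketch (placeholders): BigDL-Nano's Trainer is a drop-in replacement
# for pytorch_lightning.Trainer, so existing training code keeps working.
from bigdl.nano.pytorch import Trainer

trainer = Trainer(max_epochs=100)               # same arguments as pl.Trainer
trainer.fit(MyLightningModule(), train_loader)  # your own module and dataloader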
We’ll stylize a video with Picasso, a model named after the Spanish artist. It’s a PyTorch implementation of (Texler et al., 2020), with both training and inference accelerated by BigDL-Nano.
Prepare Your Data and Environment
Prepare your environment before training. We recommend setting up a clean environment by following these instructions.
The first step in training a supervised model is to gather examples of what you’d like the model to produce. Here, we’ll provide both stylized and non-stylized images so the model can learn the style. You can create your own, in the style you choose, using this open source implementation of Arbitrary Style Transfer.
To make things easier, we’ve provided input (non-stylized) and target (stylized) images (data.zip):
!unzip data.zip
Note: The “!” allows Jupyter* to run terminal commands.
Input images
These are the frames extracted from the stream of your training video (using the video_to_imgs function), numbered from 0 to 76 (77 frames). You’ll find them under the data/input folder after unzipping.
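If you’d like to build the input set from your own clip instead, here’s a minimal sketch using the video_to_imgs helper defined later in this post; the file name my_training_clip.mp4 is a placeholder:
import os

# Sketch: extract frames from your own clip into data/input/.
# "my_training_clip.mp4" is a placeholder; video_to_imgs is defined in the
# Helper Functions section below.
os.makedirs("data/input/", exist_ok=True)
video_to_imgs("my_training_clip.mp4", "data/input/")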
Target images
In this tutorial, these are stylized in the Picasso style. You don’t need to provide all 77 target images; just a few are enough for the model to learn from. Here, we’ve provided images numbered 0, 35, and 70, which are under the data/target folder after unzipping.
Training
Now it’s time to train the model with a script accelerated by BigDL-Nano (nano_train.py).
!python nano_train.py
Note: You can compare how long training takes without acceleration by running the regular PyTorch-Lightning script (train.py) yourself.
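For example, assuming train.py takes no extra arguments, just like nano_train.py:
!python train.py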
The script trains the network defined in the base implementation described in (Nguyen-Phuoc et al., 2022), using the BigDL-Nano Trainer to accelerate the process. You’ll notice there’s no need to change anything else in your code; just replace the regular PyTorch-Lightning Trainer with the BigDL-Nano Trainer.
PyTorch-Lightning Trainer
import pytorch_lightning as pl

trainer = pl.Trainer(
    gpus=0,
    max_epochs=100,
    log_every_n_steps=8,
    limit_train_batches=1.0,
    limit_val_batches=1.0,
    limit_test_batches=1.0,
    check_val_every_n_epoch=20,
    reload_dataloaders_every_n_epochs=1,
    profiler=profiler,
    logger=logger,
    callbacks=callbacks,
    # fast_dev_run=True,
)
BigDL-Nano Trainer
from bigdl.nano.pytorch import Trainer

trainer = Trainer(
    gpus=0,
    max_epochs=100,
    log_every_n_steps=8,
    limit_train_batches=1.0,
    limit_val_batches=1.0,
    limit_test_batches=1.0,
    check_val_every_n_epoch=20,
    reload_dataloaders_every_n_epochs=1,
    profiler=profiler,
    logger=logger,
    callbacks=callbacks,
    # fast_dev_run=True,
)
After a few minutes, you’ll get a message showing how long the script took to train the model. You’ll find your trained model under: ./data/models/
Note: The trained model isn’t the only file saved under that directory. You’ll see three files: “generator.pt”, “discriminator.pt”, and “latest.ckpt”. For inference, you only need “generator.pt” to produce the new stylized images.
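As a quick sanity check, you can list the directory after training finishes (a minimal snippet; the path is the one reported above):
import os

# Expect to see generator.pt, discriminator.pt, and latest.ckpt here.
print(os.listdir("./data/models/"))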
Helper Functions
These helper functions, adapted from multiple open source repositories, will help you prepare the data.
- display_video: Displays the video in the notebook.
- imgs_to_video: Combines multiple frames into a video. Since the model runs inference on individual frames, you’ll need to convert the output frames back into a video.
- video_to_imgs: Converts an input video into a sequence of frames. Since the model operates on frames, the video you provide for inference must first be split into images.
import os
from IPython.display import HTML
from base64 import b64encode
from PIL import Image as PILImage
import cv2
from cv2 import VideoCapture, imwrite
import numpy as np
def display_video(file_path, width=512):
    # Source: https://colab.research.google.com/drive/1_kbRZPTjnFgViPrmGcUsaszEdYa8XTpq#scrollTo=DxlIqGfATvvj&line=1&uniqifier=1
    # Re-encode the video with H.264 so the browser can play it inline.
    compressed_video_path = 'comp_' + file_path
    if os.path.exists(compressed_video_path):
        os.remove(compressed_video_path)
    os.system(f'ffmpeg -i {file_path} -vcodec libx264 -loglevel quiet {compressed_video_path}')
    # Embed the video as a base64 data URL inside an HTML <video> tag.
    mp4 = open(compressed_video_path, 'rb').read()
    data_url = 'data:video/mp4;base64,' + b64encode(mp4).decode()
    return HTML("""
    <video width={} controls>
        <source src="{}" type="video/mp4">
    </video>
    """.format(width, data_url))
def imgs_to_video(output_dir, video_name='demo_output.mp4', fps=24):
    # Refer to: https://stackoverflow.com/questions/52414148/turn-pil-images-into-video-on-linux
    # Collect the frame paths and sort them numerically (0.jpg, 1.jpg, ...).
    imgs = []
    for image_name in os.listdir(output_dir):
        if image_name.endswith('.jpg'):
            imgs.append(output_dir + image_name)
    imgs.sort(key=lambda img: int(img.split('/')[-1].split('.')[0]))
    pil_imgs = []
    for file in imgs:
        pil_imgs.append(PILImage.open(file))
    # Write the frames to a video file with OpenCV.
    video_dims = (pil_imgs[0].width, pil_imgs[0].height)
    fourcc = cv2.VideoWriter_fourcc(*'DIVX')
    video = cv2.VideoWriter(video_name, fourcc, fps, video_dims)
    for img in pil_imgs:
        tmp_img = img.copy()
        video.write(cv2.cvtColor(np.array(tmp_img), cv2.COLOR_RGB2BGR))
    # Finalize the video file.
    video.release()
def video_to_imgs(video_name='demo_output.mp4', image_dir="./images/"):
    # Read the video frame by frame and save each frame as a numbered .jpg.
    video_capture = VideoCapture(video_name)
    number = 0
    while True:
        flag, frame = video_capture.read()
        if flag is False:
            break
        # frame.shape is (height, width, channels); round the dimensions down
        # to multiples of 4 so they match the model's expected input size.
        h, w = frame.shape[0], frame.shape[1]
        if w % 4 != 0 or h % 4 != 0:
            new_w = int((w // 4) * 4)
            new_h = int((h // 4) * 4)
            # cv2.resize expects (width, height)
            frame = cv2.resize(frame, (new_w, new_h))
        imwrite(image_dir + str(number) + '.jpg', frame)
        number += 1
Load the Model and Accelerate It with InferenceOptimizer
Now that the model is trained, it can transfer the style to any input video (as frames). BigDL-Nano also provides acceleration for this stage, through InferenceOptimizer.
Before running inference, perform these steps:
- Select the device where it will be executed
- Load the model in memory
- Load the data in a format PyTorch understands (DataLoader)
If you’re wondering how the acceleration works, inference is sped up by integrating extra runtimes as inference backend engines or by applying quantization methods. Quantization compresses a model to a lower numerical precision, producing a lighter model while aiming to preserve accuracy. Here are some additional optimizations if you want to try them out.
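As one illustration of the backend-engine route, here’s a hedged sketch of tracing the trained generator with an OpenVINO backend. This assumes the OpenVINO extras are installed with your version of BigDL-Nano, and argument names may differ between releases, so check the BigDL documentation; the input shape below is a placeholder.
import torch
from bigdl.nano.pytorch import InferenceOptimizer

# Sketch (assumption): accelerate inference with an OpenVINO backend instead of
# quantization. `generator` is the trained model loaded as shown below; the
# input_sample shape is a placeholder for your frame size.
traced_model = InferenceOptimizer.trace(generator,
                                        accelerator="openvino",
                                        input_sample=torch.rand(1, 3, 256, 256))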
In this tutorial, you’ll use the most basic option, INT8 precision, without searching the tuning space to control accuracy drop.
from torch.utils.data import DataLoader
import torch
from tqdm import tqdm
import torchvision.transforms as transforms
from pathlib import Path
from data import read_image_tensor, write_image_tensor, ImageDataset
from train import data_path, model_save_path
# load model
device = 'cpu'
dtype = torch.float32
generator = torch.load(model_save_path/"generator.pt")
generator.eval()
generator.to(device, dtype)
# prepare calib dataloader
input_dir = data_path/'input'
file_paths = [file for file in input_dir.iterdir()]
params = {'batch_size': 1,
          'num_workers': 8,
          'pin_memory': True}
dataset = ImageDataset(file_paths, transform=None)
loader = DataLoader(dataset, **params)
from bigdl.nano.pytorch import InferenceOptimizer
model = InferenceOptimizer.quantize(accelerator=None,
                                    model=generator,
                                    calib_dataloader=loader)
Inference with Input Video
Let’s run the inference. You’ll need to provide the input_video file and the num_processes you’d like to use (refer to the previous post for details).
In addition to accelerating the model, BigDL-Nano offers multi-process inference to improve the utilization of physical cores and increase inference throughput.
To use this function, just call the following API and pass in the corresponding number of processes (>1):
multi_instance_model = InferenceOptimizer.to_multi_instance(model, num_processes=num_processes)
You’ll find more details in the BigDL documentation.
from pathlib import Path
import time
from bigdl.nano.pytorch import InferenceOptimizer
if __name__ == "__main__":
    input_video = "demo.mp4"
    num_processes = 4  # specify number of processes
    image_dir = "./video2pic/"
    output_dir = "./video-output/"
    os.makedirs(image_dir, exist_ok=True)
    os.makedirs(output_dir, exist_ok=True)

    # Split the input video into frames and build a DataLoader over them.
    video_to_imgs(input_video, image_dir)
    img_list = [Path(image_dir, image_name) for image_name in os.listdir(image_dir)]
    params = {'batch_size': 1}
    dataset = ImageDataset(img_list, transform=None)
    loader = DataLoader(dataset, **params)

    if num_processes > 1:
        print("{} processes are used.".format(num_processes))
        st = time.perf_counter()
        with torch.no_grad():
            # call `InferenceOptimizer.to_multi_instance` to get a multi-instance inference model
            multi_instance_model = InferenceOptimizer.to_multi_instance(model, num_processes=num_processes)
            # collect an input list
            inputs_list, names_list = [], []
            for inputs, names in loader:
                inputs = inputs.to(device, dtype)
                inputs_list.append(inputs)
                names_list.append(names)
            # run inference on the input list
            outputs_list = multi_instance_model(inputs_list)
            # handle the output list
            for outputs, names in zip(outputs_list, names_list):
                for k in range(len(outputs)):
                    write_image_tensor(outputs[k], Path(output_dir, names[k]))
                del outputs
        end = time.perf_counter()
        print("Generation costs {}s".format(end - st))
    else:
        st = time.perf_counter()
        with torch.no_grad():
            for inputs, names in tqdm(loader):
                inputs = inputs.to(device, dtype)
                # original model
                # outputs = generator(inputs)
                # accelerated model
                outputs = model(inputs)
                for k in range(len(outputs)):
                    write_image_tensor(outputs[k], Path(output_dir, names[k]))
                del outputs
        end = time.perf_counter()
        print("Generation costs {}s".format(end - st))

    # Recombine the stylized frames into a video.
    imgs_to_video(output_dir, "demo_output.mp4", fps=25)
Demo Output
Now you’ll get your stylized video.
display_video("demo_output.mp4")
References
The video used in the tutorial comes from this paper on “High-Level Video Understanding (HLVU) dataset of open source movies.”
Texler, O., Futschik, D., Kučera, M., Jamriška, O., Sochorová, Š., Chai, M., Tulyakov, S., & Sýkora, D. (2020). Interactive Video Stylization Using Few-Shot Patch-Based Training (arXiv:2004.14489). arXiv. http://arxiv.org/abs/2004.14489
About the Authors
Ezequiel Lanza is an open source evangelist on Intel’s Open Ecosystem Team, passionate about helping people discover the exciting world of AI. He’s also a frequent AI conference presenter and creator of use cases, tutorials, and guides to help developers adopt open source AI tools like TensorFlow* and Hugging Face*. Find him on Twitter at @eze_lanza
Ruonan Wang is an AI Frameworks Engineer at Intel AIA, currently focused on developing BigDL-Nano, a Python* package to transparently accelerate PyTorch* and TensorFlow* applications on Intel hardware.
For more open source content from Intel, check out open.intel