Neural Style Transfer with OpenVINO™ Toolkit

Manisha Biswas
Intel Software Innovators
12 min read · May 17, 2019

Wonders of India using Neural Style Transfer

In this article, you will learn how to implement neural style transfer using the Intel® OpenVINO™ toolkit in an end-to-end application. You will use VGG19 for neural style transfer and see how model optimization, as well as model inferencing, is done. You will apply neural style transfer to images of historical monuments of India and visualize them with artistic effects.

Building an end-to-end deep learning solution with OpenVINO™ toolkit

Building and implementing a deep learning solution involves three steps:

  • Train the Model: A machine learning process by which a neural network increases its capacity to understand a given data set. A neural network model is said to be trained if it has undergone this phase. (We will not cover training here because the network is already trained.)
  • Prepare the Model: A process by which the Model Optimizer in Intel® Distribution of the OpenVINO™ toolkit performs static model analysis and optimization on a trained model for target devices.
  • Run Inference: A process by which a prepared model is tested in a target environment.

OpenVINO™ Toolkit Workflow:

The entire OpenVINO™ toolkit workflow is shown below:

OpenVINO™ Toolkit Workflow

The basic flow is:

1. Use a framework, such as Caffe or MXNet, to create and train a CNN inference model

2. Run the created model through Model Optimizer to produce an optimized Intermediate Representation (IR) stored in files (.bin and .xml) for use with the Inference Engine

3. The User Application then loads and runs models on devices using the Inference Engine and the IR files.

The following figure shows the entire process of how to use OpenVINO™ toolkit:

You can go through this link for more details on the Intel® OpenVINO™ toolkit.

How to use OpenVINO™ Toolkit

Preprocessing Stage

You will be using MXNet to create the model, based on Zhaw's neural style transfer GitHub repo:

https://github.com/zhaw/neural_style

You will either download or clone the repo.

Prepare the environment required to work with the cloned repository:

  1. Install package dependencies:
sudo apt-get install python-tk

2. Install Python* requirements:

pip3 install --user mxnet
pip3 install --user matplotlib
pip3 install --user scikit-image

3. Download the pre-trained VGG19 model and save it to the root directory of the cloned repository, because the sample expects the vgg19.params model file to be in that directory.

4. Modify the source code files of the style transfer sample from the cloned repository. You have to make changes to the code so that the output from the Python file can easily be converted into *.xml and *.bin files for model optimization and inferencing.

a. Go to the fast_mrf_cnn subdirectory.

cd ./fast_mrf_cnn

b. Open the symbol.py file and modify the decoder_symbol() function. Replace the code below:

def decoder_symbol():
    data = mx.sym.Variable('data')
    data = mx.sym.Convolution(data=data, num_filter=256, kernel=(3,3), pad=(1,1), stride=(1, 1), name='deco_conv1')

With the following code:

def decoder_symbol_with_vgg(vgg_symbol):
    data = mx.sym.Convolution(data=vgg_symbol, num_filter=256, kernel=(3,3), pad=(1,1), stride=(1, 1), name='deco_conv1')

c. Save and close the symbol.py file.

d. Open and edit the make_image.py file: Modify the __init__() function in the Maker class. Replace the code below:

decoder = symbol.decoder_symbol()

With the following code:

decoder = symbol.decoder_symbol_with_vgg(vgg_symbol)

e. To join the pre-trained weights with the decoder weights, make the following changes: After the code lines for loading the decoder weights:

args = mx.nd.load('%s_decoder_args.nd'%model_prefix)
auxs = mx.nd.load('%s_decoder_auxs.nd'%model_prefix)

f. Add the following line:

arg_dict.update(args)

g. In the decoder.bind() function call below, you need to replace the parameter args=args with args=arg_dict. Replace the code below:

self.deco_executor = decoder.bind(ctx=mx.cpu(), args=args, aux_states=auxs)

With the following code:

self.deco_executor = decoder.bind(ctx=mx.cpu(), args=arg_dict, aux_states=auxs)

h. Replace all mx.gpu with mx.cpu in the decoder.bind() function.

i. To save the resulting model as a .json file, add the following code to the end of the generate() function in the Maker class:

self.vgg_executor._symbol.save('{}-symbol.json'.format('vgg19'))
self.deco_executor._symbol.save('{}-symbol.json'.format('nst_vgg19'))

j. Save and close the make_image.py file.

5. Run the sample with a decoder model according to the instructions from the README.md file in the cloned repository.

For example, to run the sample with the pre-trained decoder weights from the models folder and output shape, use the following code:

import make_image
maker = make_image.Maker('models/13', (1024, 768))
maker.generate('output.jpg', '../images/tubingen.jpg')

Where the ‘models/13’ string is composed of the following sub-strings:

  • ‘models/’ — path to the folder that contains the .nd files with pre-trained style weights
  • ‘13’ — decoder prefix: the repository contains a default decoder, which is 13_decoder.

You can choose any style from the collection of pre-trained weights. The generate() function generates the nst_vgg19-symbol.json and vgg19-symbol.json files for the specified shape. In the code above, it is [1024 x 768] for a 4:3 ratio; you can specify another shape, for example [224 x 224] for a square ratio, as shown in the sketch below.
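For example, a minimal sketch of generating the symbol files for a square 224 x 224 shape instead (assuming the same default 13_decoder weights in the models folder and an existing content image):

import make_image
# hypothetical square-shape variant of the call shown above
maker = make_image.Maker('models/13', (224, 224))
maker.generate('output_square.jpg', '../images/tubingen.jpg')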

6. Run the Model Optimizer to generate an Intermediate Representation (IR), which will be used for inference of the end-to-end application on a local CPU:

a. Create a new directory. For example:

mkdir nst_model

b. Copy the initial and generated model files to the created directory. For example, to copy the pre-trained decoder weights from the models folder to the nst_model directory, run the following commands:

cp nst_vgg19-symbol.json nst_model
cp vgg19-symbol.json nst_model
cp ../vgg19.params nst_model/vgg19-0000.params
cp models/13_decoder_args.nd nst_model
cp models/13_decoder_auxs.nd nst_model

NOTE: Make sure that all the .params and .json files are in the same directory as the .nd files. Otherwise, the conversion process fails.

Run the Model Optimizer for MXNet. Use the --nd_prefix_name option to specify the decoder prefix and --input_shape to specify input shapes in [N,C,W,H] order. For example:

python3 mo.py --input_symbol <path/to/nst_model>/nst_vgg19-symbol.json --framework mxnet --output
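For reference, a fuller invocation might look like the sketch below; the output directory, input shape, decoder prefix, and pretrained weights path shown here are assumptions based on the files prepared above and should be checked against the Model Optimizer documentation for MXNet before running:

python3 mo.py --input_symbol <path/to/nst_model>/nst_vgg19-symbol.json --framework mxnet --output_dir <path/to/output_dir> --input_shape [1,3,224,224] --nd_prefix_name 13_decoder --pretrained_model <path/to/nst_model>/vgg19-0000.params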

Step-by-step implementation of VGG19 and the style transfer

Here we will go through the steps of how style transfer works and how the inference part is done using the Python API of the OpenVINO™ toolkit.

Style transfer, where the style of one image is transferred to another as if recreated using the same style, is performed using a pre-trained network and running it using the OpenVINO™ toolkit Inference Engine.

The inference will be executed using the local CPU.

The pre-trained model to be used for style transfer is "VGG19", which has already been converted to the Intermediate Representation (IR) files needed by the Inference Engine, as we did above.

Inference Engine API Integration Flow

Using the Inference Engine API follows the basic steps outlined briefly below; a condensed sketch follows the list, and the API objects and functions will be seen later in the sample code.

1. Load the plugin

2. Read the model IR

3. Load the model into the plugin

4. Prepare the input

5. Run Inference
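A condensed sketch of these five steps, using the same IENetwork and IEPlugin objects that appear in the full sample code later in this tutorial; the IR file names and the prepared input array in_frame are placeholders here, not values from the article:

from openvino.inference_engine import IENetwork, IEPlugin

plugin = IEPlugin(device="CPU")                                   # 1. Load the plugin
net = IENetwork.from_ir(model="model.xml", weights="model.bin")   # 2. Read the model IR
exec_net = plugin.load(network=net)                               # 3. Load the model into the plugin
input_blob = next(iter(net.inputs))                               # 4. Prepare the input (find the input blob name)
# in_frame is assumed to be an image already resized/reshaped to the model's input dimensions
res = exec_net.infer(inputs={input_blob: in_frame})               # 5. Run inference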

Input Preprocessing

Often, the dimensions of the input data do not match the input dimensions required by the inference model. A common example is an input video frame. Before the image can be fed to the inference model, it must be preprocessed to match the model's required dimensions as well as channels (i.e. colors) and batch size (number of images present). The basic step performed is to resize the frame from the source dimensions to the dimensions of the inference model's input, reorganizing any dimensions as needed.

This tutorial and the many samples in the OpenVINO™ toolkit use OpenCV to perform resizing of input data. The basic steps performed using OpenCV are listed below (a combined sketch follows the list):

1. Resize image dimensions from the source image to the model's input W x H: frame = cv2.resize(image, (w, h))

2. Change data layout from (H x W x C) to (C x H x W): frame = frame.transpose((2, 0, 1))

3. Reshape to match the model's input dimensions: frame = frame.reshape((n, c, h, w))
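Putting the three steps together, a minimal helper might look like the sketch below; preprocess() is a hypothetical name, and the tutorial later defines an equivalent resizeInputImage() function:

import cv2

def preprocess(image, n, c, h, w):
    # resize from the source dimensions to the model's input W x H
    frame = cv2.resize(image, (w, h))
    # change data layout from HxWxC to CxHxW
    frame = frame.transpose((2, 0, 1))
    # add the batch dimension to match the model's NxCxHxW input
    return frame.reshape((n, c, h, w))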

Importing Python Modules

Here we begin by importing all the Python modules that will be used by the sample code:

  • os — Operating system specific module (used for file name parsing)
  • cv2 — OpenCV module (used for computer vision)
  • time — time tracking module (used for measuring execution time)
  • numpy — n-dimensional array manipulation
  • openvino.inference_engine — import the IENetwork and IEPlugin objects
  • matplotlib — import pyplot used for displaying output images
import os
import cv2
import time
import numpy as np
from openvino.inference_engine import IENetwork, IEPlugin
from matplotlib import pyplot as plt
%matplotlib inline
print("Imported Python modules.")

Configuration Parameters

Here we will create and set the following configuration parameters used by the sample:

  • model_xml — Path to the .xml IR file of the trained model to use for inference
  • model_bin — Path to the .bin IR file of the trained model to use for inference (derived from model_xml)
  • input_path — Path to the input image
  • cpu_extension_path — Path to a shared library with CPU extension kernels for custom layers not already included in the plugin
  • device — Specify the target device to infer on; CPU, GPU, FPGA, or MYRIAD is acceptable, but the device must be present. For this tutorial, we use "CPU", which is known to be present.
  • mean_val_r — Mean value of red channel for mean value addition in postprocessing
  • mean_val_g — Mean value of green channel for mean value addition in postprocessing
  • mean_val_b — Mean value of blue channel for mean value addition in postprocessing

We will set all parameters here only once, except for input_path, which we will change later to point to different images.

# model IR files
model_xml = "./nst_vgg19/nst_vgg19-symbol.xml"
model_bin = "./nst_vgg19/nst_vgg19-symbol.bin"
# input image file
input_path = "tubingen.jpg"
# CPU extension library to use
cpu_extension_path = "libcpu_extension.so"
# device to use
device = "CPU"
# RGB mean values to add to results
mean_val_r = 123.68
mean_val_g = 116.779
mean_val_b = 103.939
print("Configuration parameters settings:"
      "\n\tmodel_xml=", model_xml,
      "\n\tmodel_bin=", model_bin,
      "\n\tinput_path=", input_path,
      "\n\tcpu_extension_path=", cpu_extension_path,
      "\n\tdevice=", device,
      "\n\tmean_val_r=", mean_val_r,
      "\n\tmean_val_g=", mean_val_g,
      "\n\tmean_val_b=", mean_val_b)

Create a Plugin for Device

Here we create a plugin object for the specified device using IEPlugin().
If the plugin is for a CPU device and the cpu_extension_path variable has been set, we load the extensions library.

# create plugin for device
plugin = IEPlugin(device=device)
print("A plugin object has been created for device [", plugin.device, "]\n")
# if the device is CPU and a path to an extension library is set, load the extension library
if cpu_extension_path and 'CPU' in device:
    plugin.add_cpu_extension(cpu_extension_path)
    print("CPU extension [", cpu_extension_path, "] has been loaded")

Create Network from Model IR files

Here we create an IENetwork object and load the model’s IR files into it. After loading the model, we check to make sure that all the model’s layers are supported by the plugin we will use. We also check to make sure that the model’s input and output are as expected for later when we run inference.

# load network from IR files
net = IENetwork.from_ir(model=model_xml, weights=model_bin)
print("Loaded model IR files [", model_bin, "] and [", model_xml, "]\n")
# check to make sure that the plugin has support for all layers in the loaded model
supported_layers = plugin.get_supported_layers(net)
not_supported_layers = [l for l in net.layers.keys() if l not in supported_layers]
if len(not_supported_layers) != 0:
    print("ERROR: Following layers are not supported by the plugin for specified device {}:\n {}".format(plugin.device, ', '.join(not_supported_layers)))
    if not cpu_extension_path:
        print("Please try specifying the cpu extensions library path by setting the 'cpu_extension_path' variable")
    assert 0 == 1, "ERROR: Missing support for all layers in the model, cannot continue."

Load the Model into the Device Plugin

Here we load the model network into the plugin so that we may run inference. exec_net will be used later to actually run inference. After loading, we store the names of the input (input_blob) and output (output_blob) blobs to use when accessing the input and output blobs of the model. Lastly, we store the model’s input dimensions into the following variables:

  • n = input batch size
  • c = number of input channels (here one channel per color: R, G, and B)
  • h = input height
  • w = input width
# load the model into the plugin
exec_net = plugin.load(network=net, num_requests=2)
# store name of input and output blobs
input_blob = next(iter(net.inputs))
output_blob = next(iter(net.outputs))
# read the input's dimensions: n=batch size, c=number of channels, h=height, w=width
n, c, h, w = net.inputs[input_blob].shape
print("Loaded model into plugin.  Model input dimensions: n=", n, ", c=", c, ", h=", h, ", w=", w)

Prepare Input Image

Here we read and then prepare the input image by resizing and re-arranging its dimensions according to the model's input dimensions. We define the functions loadInputImage() and resizeInputImage() for these operations so that we may reuse them later in the tutorial.

# define function to load an input image
def loadInputImage(input_path, verbose = True):
    # globals to store input width and height
    global input_w, input_h
    # use OpenCV to load the input image
    cap = cv2.VideoCapture(input_path)
    # store input width and height
    input_w = cap.get(3)
    input_h = cap.get(4)
    if verbose: print("Loaded input image [", input_path, "], resolution=", input_w, "w x ", input_h, "h")
    # load the input image
    ret, image = cap.read()
    del cap
    return image

# define function for resizing input image
def resizeInputImage(image, verbose = True):
    # resize image dimensions from image to model's input w x h
    in_frame = cv2.resize(image, (w, h))
    # change data layout from HWC to CHW
    in_frame = in_frame.transpose((2, 0, 1))
    # reshape to input dimensions
    in_frame = in_frame.reshape((n, c, h, w))
    if verbose: print("Resized input image from {} to {}".format(image.shape[:-1], (h, w)))
    return in_frame

# load image
image = loadInputImage(input_path)
# resize the input image
in_frame = resizeInputImage(image)
# display input image
print("Input image:")
plt.axis("off")
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

Run Inference

Now that we have the input image in the correct format for the model, we run inference on it.

# save start time
inf_start = time.time()
# run inference
res = exec_net.infer(inputs={input_blob: in_frame})
# calculate time from start until now
inf_time = time.time() - inf_start
print("Inference complete, run time: {:.3f} ms".format(inf_time * 1000))

Process and Display Results

Now we process the inference results by re-arranging the output dimensions, adding back the RGB mean values, clipping pixel values to the [0, 255] range, and displaying the resulting stylized image. We define the function processAndDisplayResults() so that we may use it again later in the tutorial to process results.

# create function to process inference results
def processResults(res):
    # get output
    result = res[output_blob][0]
    # change layout from CxHxW to HxWxC
    result = np.swapaxes(result, 0, 2)
    result = np.swapaxes(result, 0, 1)
    # add back the RGB mean values
    result = result[::] + (mean_val_r, mean_val_g, mean_val_b)
    # clip RGB values to [0, 255] range
    result[result < 0] = 0
    result[result > 255] = 255
    # matplotlib expects a normalized image with pixel RGB values in range [0, 1]
    result = result / 255
    return result

# create function to process and display inference results
def processAndDisplayResults(res, orig_input_image, orig_input_path, verbose = True):
    # display original input image
    plt.figure()
    plt.axis("off")
    im_to_show = cv2.cvtColor(orig_input_image, cv2.COLOR_BGR2RGB)
    plt.imshow(im_to_show)
    # get output
    result = processResults(res)
    # show styled image
    if verbose: print("Results for input image: {}".format(orig_input_path))
    plt.figure()
    plt.axis("off")
    plt.imshow(result)

processAndDisplayResults(res, image, input_path)
print("Processed and displayed inference output results.")

Exercise #1: Run a Different Image

Now that we have seen all the steps, let us run them again on a different image. We also define inferImage() and inferAndDisplayImage() to combine the input processing, inference, and result display steps so that we may reuse them later in the tutorial.

# define function to prepare input and run inference
def inferImage(image, verbose = True):
    # prepare input
    in_frame = resizeInputImage(image, verbose)
    # run inference
    res = exec_net.infer(inputs={input_blob: in_frame})
    return res

# define function to prepare input, run inference, and process inference results
def inferAndDisplayImage(image, verbose = True):
    # run inference
    res = inferImage(image)
    # process inference results
    processAndDisplayResults(res, image, input_path, verbose)

# set path to different input image
input_path = "starry_night.jpg"
# load input image
image = loadInputImage(input_path)
# infer image and display results
inferAndDisplayImage(image)

Exercise #2: (Optional) Run Your Own Image

Here you may run any image you would like by setting the input_path variable, which may be set to a local file or URL. A sample URL is provided as an example.

# input_path may be set to a local file or URL
input_path = "https://github.com/dmlc/web-data/raw/master/mxnet/neural-style/input/IMG_4343.jpg"
# load input image
image = loadInputImage(input_path)
# infer image and display results
inferAndDisplayImage(image)
Results with Taj Mahal

Conclusion:

In this article, you have seen an end-to-end application using the OpenVINO™ toolkit and covered the entire neural style transfer process.

You started with the MXNet framework, through which the model file was generated using VGG19.

From the generated model file, we converted the model to *.xml and *.bin format. These files were produced using the Model Optimizer from the Intel® OpenVINO™ toolkit and are used for inference on Intel-based architectures.

The article showed how you can easily apply style transfer to images of historical monuments and regenerate them with artistic effects.
