How to get background blur using Deep Learning?

Prathmesh Patil
Published in Analytics Vidhya
9 min read · Nov 15, 2020

The background blur effect, also known as “bokeh”, is a well-known effect that many of us use, mainly for close-up shots. It adds a sense of depth to an image because the viewer concentrates only on a particular part of it.
To get this kind of effect we generally use photo editing applications like Photoshop, Gimp, Picsart, Snapseed, etc. Over time, deep learning has brought significant improvements to computer vision and image processing. So a question arises: can we get this bokeh effect using deep learning? The answer is yes, we can. In this blog I will walk you through the complete implementation, along with the code and some theoretical aspects for better understanding.


  1. How it is achieved?
  2. Deep Learning model that we will be using
  3. ReLU6
  4. Implementation
  5. Credits
  6. Conclusion

1. How it is achieved?

The whole approach is based on an advanced application of convolutional neural networks (CNNs) called image segmentation.
We are all familiar with CNNs used for image classification, where an image is assigned one of a fixed set of labels. But suppose we have to identify a particular object in a given image. For this we have to use the concept of object detection and then image segmentation.


This is the classic example of image classification and detection. If objects of multiple classes are present in a single image, we go for object detection: once the coordinates of the objects are found, the image goes through region of interest pooling, after which these objects are classified and bounding boxes are drawn around every identified object.

Once all this is done, we proceed to the next step, segmentation, because bounding boxes only show where an object is located inside the image and give no information about its shape.
In simple terms, image segmentation is the process of dividing the image pixels into small parts or segments, grouping them based on similar information or attributes, and assigning each group a label. This helps to capture very small details at the pixel level. Segmentation creates a pixel-wise mask for every identified object in the image, please have a look at the picture below. The main aim is to train the neural network in such a way that it can give us a pixel-wise mask of the image. To understand this in more detail, click here.
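To make the difference between a bounding box and a pixel-wise mask concrete, here is a minimal NumPy sketch. The tiny label map and the L-shaped "object" are made up for illustration and are not output of the model used later.

```python
import numpy as np

# Hypothetical 6 x 8 label map: 0 = background, 1 = one L-shaped object.
seg_map = np.zeros((6, 8), dtype=np.uint8)
seg_map[1:5, 2] = 1      # vertical stroke of the "L"
seg_map[4, 2:6] = 1      # horizontal stroke of the "L"

# Pixel-wise mask: True exactly where the object is.
mask = seg_map == 1

# Bounding box derived from the mask: only the extreme rows and columns.
rows, cols = np.nonzero(mask)
top, bottom = int(rows.min()), int(rows.max())
left, right = int(cols.min()), int(cols.max())

box_area = (bottom - top + 1) * (right - left + 1)
object_pixels = int(mask.sum())

print("bounding box (top, left, bottom, right):", (top, left, bottom, right))
print("pixels inside box:", box_area, "| pixels on the object:", object_pixels)
```

The box covers more pixels than the object itself, which is why a mask, not a box, is needed to blur only the background.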


2. The deep learning model that we will be using:

Now that we are clear on image segmentation, let's have a look at the model that we will be using: MobileNetV2 trained on the COCO dataset (in the code further down it is loaded as part of a pre-trained DeepLab model).
MobileNetV2 is a lightweight model that can be used on low-powered devices like mobile phones. It is the successor to MobileNetV1, which came out in 2017.
Now let us briefly understand the model architecture.

Source- towardsdatascience

V2 is based on V1, so it inherits the same depthwise separable convolution, which consists of a depthwise convolution followed by a pointwise convolution and reduces the cost of the convolution operation.

Depthwise convolution simply means that, if an image contains 3 channels, each kernel iterates over its own channel only.

For example, suppose you have an image of (10 x 10 x 3) and 3 filters of (3 x 3 x 1). Each filter produces an output of (8 x 8 x 1), and the outputs of all the filters are stacked together to form a feature map of (8 x 8 x 3).
In pointwise convolution we take the previous feature map of (8 x 8 x 3) and apply a filter of size (1 x 1 x 3). If 15 such filters are applied, the results are stacked to form a feature map of (8 x 8 x 15).
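The shapes in this example can be checked with a small NumPy sketch. It is a plain, unoptimized illustration (no padding, stride 1, random weights), not the way a deep learning framework implements these layers, and the function names are my own.

```python
import numpy as np

def depthwise_conv(image, kernels):
    """Depthwise convolution without padding, stride 1.

    image:   (H, W, C)
    kernels: (k, k, C), one k x k kernel per input channel
    """
    h, w, c = image.shape
    k = kernels.shape[0]
    out = np.zeros((h - k + 1, w - k + 1, c))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = image[i:i + k, j:j + k, :]                   # (k, k, C)
            out[i, j, :] = np.sum(patch * kernels, axis=(0, 1))  # each channel on its own
    return out

def pointwise_conv(feature_map, filters):
    """1 x 1 convolution; filters has shape (C_in, C_out)."""
    return feature_map @ filters

rng = np.random.default_rng(0)
image = rng.random((10, 10, 3))
dw_kernels = rng.random((3, 3, 3))   # 3 filters of (3 x 3 x 1), stacked
pw_filters = rng.random((3, 15))     # 15 filters of (1 x 1 x 3)

dw_out = depthwise_conv(image, dw_kernels)
pw_out = pointwise_conv(dw_out, pw_filters)
print(dw_out.shape, pw_out.shape)

# Multiplications needed for the same (8 x 8 x 15) output.
separable_mults = 8 * 8 * 3 * (3 * 3) + 8 * 8 * 3 * 15
standard_mults = 8 * 8 * 15 * (3 * 3 * 3)
print(separable_mults, standard_mults)
```

For this toy case the separable version needs 4608 multiplications against 25920 for a standard 3 x 3 convolution with 15 filters, which is the cost reduction mentioned above.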

MobileNetV2 has some improvements over V1, such as inverted residuals, linear bottlenecks and the residual connection.

Each V2 block consists of 3 convolution layers: the first is the expansion layer, the second is the depthwise layer and the third is the projection layer.

Expansion Layer: this layer takes the input data and expands the lower-dimensional data into higher dimensions so that the important information is preserved, then passes its output to the depthwise layer. The expansion factor is a hyperparameter that can be tuned over a number of trials.

Depth-wise Layer: this layer receives the input from the expansion layer, performs the depthwise convolution and passes the feature map to the projection layer.

Projection Layer: this layer, a pointwise (1 x 1) convolution, shrinks the dimension of the data so that only a limited amount of data is passed further into the network. At this point the output dimension can match the dimension of the block's input again, and the layer is also known as the “bottleneck layer”.


The residual connection is a new addition to the network. It is based on ResNet and helps gradients flow through the network. It is used only when the dimension of the input data is the same as that of the output data.
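The three layers and the residual connection can be put together in a rough NumPy sketch of one block. This is a simplification under stated assumptions: batch normalization is left out, stride is 1, the weights are random, and the expansion factor of 6 and all function names are chosen only for illustration.

```python
import numpy as np

def relu6(x):
    return np.minimum(np.maximum(x, 0.0), 6.0)

def depthwise3x3_same(x, kernels):
    """3 x 3 depthwise convolution, stride 1, zero padding so H and W are kept."""
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j, :] = np.sum(padded[i:i + 3, j:j + 3, :] * kernels, axis=(0, 1))
    return out

def inverted_residual(x, w_expand, dw_kernels, w_project):
    expanded = relu6(x @ w_expand)                             # expansion layer (1 x 1)
    filtered = relu6(depthwise3x3_same(expanded, dw_kernels))  # depthwise layer (3 x 3)
    projected = filtered @ w_project                           # projection layer, no activation
    if projected.shape == x.shape:                             # residual only when shapes match
        return x + projected
    return projected

rng = np.random.default_rng(0)
c_in, expansion = 4, 6                     # hypothetical channel count and expansion factor
x = rng.standard_normal((8, 8, c_in))
w_expand = rng.standard_normal((c_in, c_in * expansion)) * 0.1
dw_kernels = rng.standard_normal((3, 3, c_in * expansion)) * 0.1
w_project = rng.standard_normal((c_in * expansion, c_in)) * 0.1

y = inverted_residual(x, w_expand, dw_kernels, w_project)
print(x.shape, "->", y.shape)
```

Inside the block the data is widened from 4 to 24 channels and then narrowed back to 4, which is why the design is called an inverted residual.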

3. ReLU6:


Each layer in this network, except the final projection layer, is followed by Batch Normalization and ReLU6 rather than ReLU. ReLU6 is a non-linear activation function that clips values to the range 0 to 6, i.e. min(max(0, x), 6). Because the largest value, 6, fits in 3 bits to the left of the decimal point, more bits remain for precision to the right of the decimal point in low-precision arithmetic.

The output of the last layer, i.e. the projection layer, does not have an activation function because its output is low-dimensional data; according to the researchers, adding a non-linear function to the last layer may cause loss of useful information.
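As a quick illustration of the clipping behaviour, here is ReLU6 written in NumPy and applied to a few sample values (the values are arbitrary):

```python
import numpy as np

def relu6(x):
    """Clip activations to the range [0, 6]."""
    return np.minimum(np.maximum(x, 0.0), 6.0)

values = np.array([-3.0, 0.0, 2.5, 6.0, 9.0])
clipped = relu6(values)
print(clipped)  # negatives become 0, anything above 6 becomes 6
```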

4. Implementation

Now that we have a brief idea about image segmentation and the MobileNetV2 model we will be using, let's move on to the implementation.

Prerequisite: the code uses TensorFlow version 1.x, so you need version 1.x for it to work. With 2.x you will get errors while executing it, so I would suggest simply using Google Colab to run it.

I will give a quick walkthrough of all the important aspects of the code; the complete implementation with line-by-line explanation is given in my notebook on GitHub.

For demonstration we will be using the below image of size (596 x 900)

Step 1: Downloading the pre-trained model.

As the model is pre-trained, we only need to download it and pass our image to it, and it will return the segmented image.

import os
import tempfile
import urllib.request

MODEL_NAME = 'mobilenetv2_coco_voctrainaug'  # @param ['mobilenetv2_coco_voctrainaug', 'mobilenetv2_coco_voctrainval', 'xception_coco_voctrainaug', 'xception_coco_voctrainval']
# The URL prefix is empty in this listing; _MODEL_URLS and DeepLabModel are defined in the full notebook.
_DOWNLOAD_URL_PREFIX = ''
_TARBALL_NAME = 'deeplab_model.tar.gz'
model_dir = tempfile.mkdtemp()
download_path = os.path.join(model_dir, _TARBALL_NAME)
print('downloading model, this might take a while...')
# Assumed completion: the second argument is missing here; download_path is the natural target.
urllib.request.urlretrieve(_DOWNLOAD_URL_PREFIX + _MODEL_URLS[MODEL_NAME],
                           download_path)
print('download completed! loading DeepLab model...')
MODEL = DeepLabModel(download_path)
print('model loaded successfully!')

Step 2: Function for visualizing the segmented image taken from input.

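def run_visualization():
    """Inferences DeepLab model and visualizes result."""
    try:
        # Assumed completion: open the file named by IMAGE_NAME with PIL (from PIL import Image).
        original_im = Image.open(IMAGE_NAME)
    except IOError:
        print('Cannot retrieve image. Please check path: ' + IMAGE_NAME)
        raise
    print('running deeplab on image')
    # Assumed completion: DeepLabModel.run returns the resized image and its segmentation map.
    resized_im, seg_map = MODEL.run(original_im)
    vis_segmentation(resized_im, seg_map)
    return resized_im, seg_map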

2.1: Calling the above function with image shown previously.

IMAGE_NAME = 'download2.jpg'
resized_im, seg_map = run_visualization()
Output after segmentation.

2.2: Now we will read and convert the input image into numpy array.

numpy_image = np.array(resized_im)

Step 3: Separation of background and foreground.

In this step we create a copy of the image and then separate the background and foreground using the segmentation map: pixel values are replaced by 0 in the background and by 255 wherever the mask was created. Here 7 denotes the car class.

person_not_person_mapping = deepcopy(numpy_image)
person_not_person_mapping[seg_map != 7] = 0
person_not_person_mapping[seg_map == 7] = 255
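The same three lines can be tried on a tiny stand-in array to see what the boolean indexing does. The 2 x 3 "image" and the segmentation map below are made up; only the class index 7 comes from the step above.

```python
import numpy as np
from copy import deepcopy

CAR_CLASS = 7

# Stand-in for the RGB image: shape (height, width, 3).
numpy_image = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

# Stand-in for the segmentation map: one class index per pixel, shape (height, width).
seg_map = np.array([[0, 7, 7],
                    [0, 0, 7]])

mapping = deepcopy(numpy_image)
mapping[seg_map != CAR_CLASS] = 0     # background pixels: all three channels set to 0
mapping[seg_map == CAR_CLASS] = 255   # car pixels: all three channels set to 255

print(mapping[:, :, 0])
```

A 2-D boolean mask applied to a 3-D image selects whole pixels, so every channel of a selected pixel is overwritten. For a portrait, the same approach would presumably use the person class, which is index 15 in the PASCAL VOC label list used by the DeepLab demo.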

3.1: Visualizing the separated masked image


As we can clearly see, the background is replaced with black and the masked car is turned white, as explained in the previous step. We also didn’t lose any significant information by replacing the values.

3.2: Resizing the masked image equal to original image.

After segmentation the image is smaller than the input, in our case reduced to (300 x 500), so we resize the mask back to the original dimensions, i.e. (900 x 596). Note that cv2.resize takes the target size as (width, height).

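# Assumed completion: reload the original image from IMAGE_NAME with PIL.
orig_imginal = Image.open(IMAGE_NAME)
orig_imginal = np.array(orig_imginal)
# Assumed completion: the target size is missing here; cv2.resize expects (width, height).
mapping_resized = cv2.resize(person_not_person_mapping,
                             (orig_imginal.shape[1], orig_imginal.shape[0]))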
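Enlarging a black-and-white mask with interpolation introduces in-between gray values along the edges (cv2.resize uses bilinear interpolation by default). A minimal 1-D sketch of the effect, using np.interp as a stand-in for the linear interpolation along one image axis; the sample row is made up:

```python
import numpy as np

# One row of a mask: black, black, white, white.
row = np.array([0.0, 0.0, 255.0, 255.0])

src_positions = np.arange(row.size)               # 0, 1, 2, 3
dst_positions = np.linspace(0, row.size - 1, 7)   # 7 samples over the same span
enlarged = np.interp(dst_positions, src_positions, row)

print(enlarged)
```

The enlarged row contains 127.5 at the edge between black and white, a value that is neither 0 nor 255.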

3.3: Binarization.

Resizing interpolates between pixels, so the resized mask contains values anywhere in the range 0, 1, 2 … 255. To limit the values to just 0 and 255 again, we binarize the image using Otsu’s Binarization technique.
In brief, Otsu’s Binarization is an adaptive way of finding the threshold value of a grayscale image. It goes through all possible threshold values in the range 0-255 and finds the best possible threshold for the given image.
Internally it is based on statistical concepts like variance: it evaluates the variance of the two classes produced by each candidate threshold.
Once an optimal threshold is selected, pixel values greater than the threshold are considered white pixels and values less than or equal to the threshold are considered black pixels. To know more about it, check this article out.

gray = cv2.cvtColor(mapping_resized, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray,(15,15),0)
ret3,thresholded_img = cv2.threshold(blurred,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)

The output looks the same; there won't be much difference compared to the previous one. It is now a grayscale image with 2 dimensions.
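To show what the Otsu search does internally, here is a small NumPy sketch that tries every threshold and keeps the one with the largest between-class variance. It is an illustration of the idea, not OpenCV's implementation, and the sample image is made up.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold t (0-255) that maximizes between-class variance.

    Pixels with value > t form the white class, the rest the black class.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 = prob[:t + 1].sum()          # weight of the black class
        w1 = prob[t + 1:].sum()          # weight of the white class
        if w0 == 0.0 or w1 == 0.0:
            continue                     # one class is empty, nothing to separate
        mu0 = (levels[:t + 1] * prob[:t + 1]).sum() / w0
        mu1 = (levels[t + 1:] * prob[t + 1:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# Made-up grayscale patch with a dark group and a bright group of pixels.
gray = np.array([[0, 0, 10, 40],
                 [0, 5, 220, 255],
                 [0, 200, 255, 255]], dtype=np.uint8)

t = otsu_threshold(gray)
binary = np.where(gray > t, 255, 0).astype(np.uint8)
print("threshold:", t)
print(binary)
```

The chosen threshold falls in the gap between the dark and the bright pixels, so the result contains only 0 and 255.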

Step 4: Adding colors to threshold image.

Now that we are done with binarization, it’s time to convert the grayscale image to an RGB image.

mapping = cv2.cvtColor(thresholded_img, cv2.COLOR_GRAY2RGB)

After this conversion the image has three channels but still contains only two unique pixel values, i.e. 0 and 255. Using this map we will apply the background blur in the upcoming steps.

4.1: Applying blur to original image.

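Next, let’s apply a blur effect to our original input image.

# Assumed completion: the kernel size is missing here; (51, 51) is a hypothetical value (it must be odd, and larger kernels blur more).
blurred_original_image = cv2.GaussianBlur(orig_imginal, (51, 51), 0)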

4.2: Obtaining the background blur.

This is the step where we actually blur the background of the input image, with a single line of code.

# Assumed completion, following the explanation below: original pixels where the map is white, blurred pixels where it is black.
layered_image = np.where(mapping != (0,0,0), orig_imginal, blurred_original_image)

In the above snippet we simply fill in the blurred image where the pixel intensity values are 0, i.e. all black pixels, and fill in the original image where the pixel intensity values are 255, i.e. white pixels, based on the segmentation map.
This results in the nice-looking bokeh effect shown below.
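The selection logic can be checked on a tiny example. The arrays below are stand-ins with made-up values (200 for the sharp image, 90 for the blurred one) so that it is easy to see which source each output pixel came from.

```python
import numpy as np

original = np.full((2, 2, 3), 200, dtype=np.uint8)   # stand-in for the sharp image
blurred = np.full((2, 2, 3), 90, dtype=np.uint8)     # stand-in for the blurred image

mapping = np.zeros((2, 2, 3), dtype=np.uint8)        # black everywhere ...
mapping[0, 0] = 255                                  # ... except one white (foreground) pixel

# Where the map is not black keep the original pixel, elsewhere take the blurred one.
layered = np.where(mapping != (0, 0, 0), original, blurred)

print(layered[:, :, 0])
```

Only the top-left pixel keeps the value 200; the other three take the value 90 from the blurred image.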


4.3: Finally saving the image.

Now the only thing left to do is to save the bokeh image, and we are done!

im_rgb = cv2.cvtColor(layered_image, cv2.COLOR_BGR2RGB)
cv2.imwrite("Potrait_Image.jpg", im_rgb)

5. Credits:

This article was written with reference to Bhavesh Bhatt’s YouTube video on the same topic, so hats off to him. Also, the code snippets given above are only the important ones; the complete code with line-by-line comments is available on my GitHub page.

6. Conclusion:

To summarize, this is just one of the things that deep learning can do; there are many more such things that can be achieved using it. As we make progress the models are getting better and better, from classification to generating deep fakes, and all of us are looking forward to it!


