Image Convolution From Scratch

Sameer · Published in Analytics Vidhya · 8 min read · Dec 1, 2019

Convolution is a mathematical operation on two functions that produces a third function representing how the shape of one is modified by the other.

Image convolution — kernel filtering

The idea behind convolution is to study how one function, when combined with another, produces a new, modified function. Applied to signals this is called 1D convolution, to images 2D convolution, and to videos 3D convolution. This article focuses mainly on 2D convolution.
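
As a quick illustration of the 1D case, NumPy's convolve() function slides one sequence across the other and sums the element-wise products at every offset (the values below are arbitrary):

import numpy as np

signal = np.array([1, 2, 3, 4, 5])       # arbitrary input sequence
kernel = np.array([1, 0, -1])            # arbitrary 1D kernel

result = np.convolve(signal, kernel, mode='same')   # 'same' keeps the output the input's length
print(result)   # each value is a weighted sum of neighbouring inputs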

Overview

We can think of an image as a two-dimensional matrix containing pixel color values in the range 0 to 255. Mathematically, we can manipulate this matrix by applying various matrix operations.

We will be using OpenCV (a flexible library for image processing), NumPy for matrix and array operations, and Matplotlib for plotting the images.

Example

We use the imread() function to read the image. By default, cv2.imread() reads the image in Blue, Green, Red (BGR) order, so we convert it to the more familiar Red, Green, Blue (RGB) order before displaying it.

# Image url ==> https://upload.wikimedia.org/wikipedia/en/7/7d/Lenna_%28test_image%29.png
# Download the image and save it as 'lena.png' in your directory
import cv2
import numpy as np
import math
import matplotlib.pyplot as plt
src = cv2.imread('lena.png')
img = cv2.cvtColor(src, cv2.COLOR_BGR2RGB)
plt.figure(figsize=(8, 5))
plt.axis("off")
plt.imshow(img)
Lena original image (Google)

Image to matrix

We take the matrix values of a grayscale image, where each pixel holds a single value between 0 and 255. The problem with a color image is that each pixel is a combination of three values, in the form [R, G, B] or [B, G, R], which makes the computation more complicated. So, to keep things simple, we work with a grayscale image.

img = cv2.cvtColor(src, cv2.COLOR_BGR2GRAY)
plt.imsave('lena_gray.png', img, cmap='gray')
gsrc = cv2.imread('lena_gray.png', 0)   # the flag 0 loads the saved image as grayscale
img_mat = []
for i in range(0, gsrc.shape[0]):       # iterate over rows
    row = []
    for j in range(0, gsrc.shape[1]):   # iterate over columns
        pixel = gsrc.item(i, j)         # read a single pixel value
        row.append(pixel)
    img_mat.append(row)

If we view the matrix, we see that it contains pixel values in the range of 0 to 255.

>>> img_mat = np.array(img_mat)
>>> print(img_mat)
[[142 149 145 ... 94 94 94]
[145 149 142 ... 97 97 97]
[149 138 149 ... 97 94 94]
...
[113 117 121 ... 32 32 32]
[113 113 117 ... 28 28 32]
[100 113 113 ... 28 32 36]]
>>> print(img_mat.shape)
(512, 512)

Let’s transpose the above matrix and see if the image gets transposed.

>>> img_tran_mat = img_mat.T
>>> print(img_tran_mat)
[[142 145 149 ... 113 113 100]
[149 149 138 ... 117 113 113]
[145 142 149 ... 121 117 113]
...
[ 94 97 97 ... 32 28 28]
[ 94 97 94 ... 32 28 32]
[ 94 97 94 ... 32 32 36]]
>>> print(img_tran_mat.shape)
(512, 512)

Do you see the difference between the original matrix and the transposed matrix? Now save the transposed matrix as an image using the imwrite() function, which takes a matrix of numbers and writes it out as an image file.

>>> cv2.imwrite('lena_gray_tran.png', img_tran_mat)

Let’s plot both images side by side and see the difference.

fig = plt.figure(figsize=(16, 25))
orig = cv2.imread('lena_gray.png')
tran = cv2.imread('lena_gray_tran.png')
ax1 = fig.add_subplot(2,2,1)
ax1.axis("off")
ax1.title.set_text('Original')
ax1.imshow(orig)
ax2 = fig.add_subplot(2,2,2)
ax2.axis("off")
ax2.title.set_text('Transposed')
ax2.imshow(tran)
Lena — original vs transposed

We get an image that is completely transposed, which is exactly the effect of the matrix transpose we performed earlier.

Code for Image Convolution From Scratch

For convolution, we require a separate kernel filter that is applied over the entire image, producing a modified image.

g(x, y) = w * f(x, y); w = kernel, g = result and f = input

In image processing, a kernel, convolution matrix, or mask is a small matrix used for blurring, sharpening, embossing, edge detection, and more. These effects are accomplished by computing a convolution between the kernel and the image.
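
For a single output pixel, the operation boils down to laying the kernel over a patch of the image, multiplying element-wise, and summing the products. A minimal sketch (the patch values are arbitrary; the kernel is the sharpen kernel used later in the article):

patch = np.array([[10, 20, 30],
                  [40, 50, 60],
                  [70, 80, 90]])    # a 3x3 neighbourhood of f centred on (x, y)
kernel = np.array([[ 0, -1,  0],
                   [-1,  5, -1],
                   [ 0, -1,  0]])   # w, the sharpen kernel

g_xy = np.sum(patch * kernel)       # one value of g(x, y)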

Steps for image convolution

  1. Convert the image into grayscale and obtain the matrix.
  2. From that matrix, build a giant array containing every kernel-sized sub-matrix (patch) of the image.
  3. Perform the convolution by element-wise multiplying the kernel with each sub-matrix and summing the products into a single value; collecting these sums gives the transformed or filtered matrix.
  4. Convert the transformed or filtered matrix back into an image.
  5. End.

1st Step

def convert_image_matrix(img_name):
    src = cv2.imread(img_name)
    img = cv2.cvtColor(src, cv2.COLOR_BGR2GRAY)
    name, ext = img_name.split('.')
    plt.imsave(str(name + '_gray.' + ext), img, cmap='gray')

    gray_img = cv2.imread(str(name + '_gray.' + ext), 0)
    gimg_shape = gray_img.shape
    gimg_mat = []
    for i in range(0, gimg_shape[0]):
        row = []
        for j in range(0, gimg_shape[1]):
            pixel = gray_img.item(i, j)
            row.append(pixel)
        gimg_mat.append(row)
    gimg_mat = np.array(gimg_mat)
    return gimg_mat

The above function returns a two-dimensional NumPy array containing the pixel values.
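
(Since cv2.imread() already returns a NumPy array, the nested loops above are equivalent to the shorter sketch below; the explicit loops are kept in the article to make the pixel-by-pixel reading visible.)

gray_img = cv2.imread('lena_gray.png', 0)    # 0 loads the image as grayscale
gimg_mat = np.array(gray_img, dtype=int)     # same 2D matrix of pixel values as the loop builds
print(gimg_mat.shape)                        # (512, 512) for lena.png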

2nd Step

def get_sub_matrices(orig_matrix, kernel_size):
    if kernel_size[0] == kernel_size[1] and kernel_size[0] > 2:
        # zero-pad the border so pixels along the edges are preserved
        orig_matrix = np.pad(orig_matrix, kernel_size[0] - 2, mode='constant')

    # measure the (possibly padded) matrix so the output keeps the original image size
    height = len(orig_matrix)
    width = len(orig_matrix[0])

    giant_matrix = []
    for i in range(0, height - kernel_size[1] + 1):
        for j in range(0, width - kernel_size[0] + 1):
            giant_matrix.append(
                [
                    [orig_matrix[r][c] for c in range(j, j + kernel_size[0])]
                    for r in range(i, i + kernel_size[1])
                ]
            )
    img_sampling = np.array(giant_matrix)
    return img_sampling

The above function returns a giant matrix containing every kernel-sized sub-matrix of the image, which will be used in the next step. The resulting matrix can also be called a sampled matrix.

In the function, np.pad() adds a border of 0s around the image so that the pixels along the edges are preserved and no data is lost when the convolution is applied.
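
A tiny example of what np.pad() does to a matrix (the values are arbitrary):

m = np.array([[1, 2],
              [3, 4]])
padded = np.pad(m, 1, mode='constant')   # add a border of one zero on every side
print(padded)
# [[0 0 0 0]
#  [0 1 2 0]
#  [0 3 4 0]
#  [0 0 0 0]]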

3rd Step

def get_transformed_matrix(matrix_sampling, kernel_filter):
    transform_mat = []
    for each_mat in matrix_sampling:
        transform_mat.append(
            np.sum(np.multiply(each_mat, kernel_filter))   # element-wise multiply, then sum
        )
    reshape_val = int(math.sqrt(matrix_sampling.shape[0]))  # assumes a square output image
    transform_mat = np.array(transform_mat).reshape(reshape_val, reshape_val)
    return transform_mat

The sampled (giant) matrix and the kernel filter are passed to the above function, which performs the actual convolution.
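
The Python loop above is easy to follow; the same multiply-and-sum can also be written as one vectorized call. A sketch of a hypothetical alternative, reusing the numpy and math imports from earlier:

def get_transformed_matrix_vectorized(matrix_sampling, kernel_filter):
    # multiply every patch by the kernel and sum over the last two axes in one call
    sums = np.einsum('nij,ij->n', matrix_sampling, kernel_filter)
    side = int(math.sqrt(matrix_sampling.shape[0]))   # assumes a square output, like the loop version
    return sums.reshape(side, side)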

4th Step

def original_VS_convoluted(img_name, kernel_name, convoluted_matrix):
    name, ext = img_name.split('.')
    cv2.imwrite(str(name + '_' + kernel_name + '.' + ext), convoluted_matrix)
    orig = cv2.imread(str(name + '_gray.' + ext))
    conv = cv2.imread(str(name + '_' + kernel_name + '.' + ext))

    fig = plt.figure(figsize=(16, 25))
    ax1 = fig.add_subplot(2, 2, 1)
    ax1.axis("off")
    ax1.title.set_text('Original')
    ax1.imshow(orig)
    ax2 = fig.add_subplot(2, 2, 2)
    ax2.axis("off")
    ax2.title.set_text(str(kernel_name).title())
    ax2.imshow(conv)
    return True

The above function is a plotting function that compares the original image with the transformed image after convolution.

Types of Convolutions

There are several kinds of convolution kernels that can be applied to an image. A few of them are:

  • Identity operation: a function that returns the same value it is given as an argument; the corresponding kernel leaves the image unchanged.

f(x) = x; kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]

>>> img_name = 'lena.png'
>>> img_mat = convert_image_matrix(img_name)
>>> identity_kernel = np.array([[0,0,0],[0,1,0],[0,0,0]])
>>> img_sampling = get_sub_matrices(img_mat, identity_kernel.shape)
>>> transform_mat = get_transformed_matrix(img_sampling, identity_kernel)
>>> original_VS_convoluted(img_name,'identity', transform_mat)
Lena — original vs identity

From the above result, it is clear that there is no difference between the original and the transformed image.

  • Edge detection operation: covers a variety of mathematical methods that aim to identify points in a digital image where the brightness changes sharply. The 3 x 3 kernel below is a simple outline filter; full algorithms such as Canny's edge detector add further steps (smoothing, gradient computation, non-maximum suppression, thresholding) on top of this idea.

kernel = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]

>>> img_name = 'lena.png'
>>> img_mat = convert_image_matrix(img_name)
>>> edge_kernel = np.array([[-1,-1,-1],[-1,8,-1],[-1,-1,-1]])
>>> img_sampling = get_sub_matrices(img_mat, edge_kernel.shape)
>>> transform_mat = get_transformed_matrix(img_sampling, edge_kernel)
>>> original_VS_convoluted(img_name,'canny_edge', transform_mat)
Lena — original vs canny_edge

From the above result, we can see that the edges are highlighted in white while everything else is black. The kernel was able to pick out specific details such as the eyes and hair. However, there are other kinds of edge-detecting kernels, such as the Sobel operator sketched below.
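
For instance, the Sobel operator uses a pair of kernels, one per gradient direction, and combines their responses into a gradient magnitude. A sketch reusing the functions defined above (the output file name is just illustrative):

sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])   # gradient along x (vertical edges)
sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])   # gradient along y (horizontal edges)

img_mat = convert_image_matrix('lena.png')
img_sampling = get_sub_matrices(img_mat, sobel_x.shape)
grad_x = get_transformed_matrix(img_sampling, sobel_x)
grad_y = get_transformed_matrix(img_sampling, sobel_y)
edges = np.sqrt(grad_x ** 2 + grad_y ** 2)                 # combine into a gradient magnitude
cv2.imwrite('lena_sobel.png', np.clip(edges, 0, 255).astype(np.uint8))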

  • Sharpen operation: increases the contrast between bright and dark regions of the image.

kernel = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]

>>> img_name = 'lena.png'
>>> img_mat = convert_image_matrix(img_name)
>>> sharpen_kernel = np.array([[0,-1,0],[-1,5,-1],[0,-1,0]])
>>> img_sampling = get_sub_matrices(img_mat, sharpen_kernel.shape)
>>> transform_mat = get_transformed_matrix(img_sampling, sharpen_kernel)
>>> original_VS_convoluted(img_name,'sharpen', transform_mat)
Lena — original vs sharpen

From the above result, it is clear that the transformed image retains some noise, and we can also see that the brighter areas became even brighter and the darker areas even darker.
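
Part of that noise comes from the raw convolution sums falling outside the displayable 0 to 255 range. If needed, the result can be clipped before saving so it stays a valid 8-bit image; a small sketch (the output file name is just illustrative):

clipped = np.clip(transform_mat, 0, 255).astype(np.uint8)   # clamp the sharpen sums into 0-255
cv2.imwrite('lena_sharpen_clipped.png', clipped)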

  • Box blur operation: a linear filter in which each pixel of the resulting image has a value equal to the average of its neighboring pixels in the input image.

kernel = (1 / 9) * [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

>>> img_name = 'lena.png'
>>> img_mat = convert_image_matrix(img_name)
>>> box_blur_kernel = (1/9)*np.array([[1,1,1],[1,1,1],[1,1,1]])
>>> img_sampling = get_sub_matrices(img_mat, box_blur_kernel.shape)
>>> transform_mat = get_transformed_matrix(img_sampling, box_blur_kernel)
>>> original_VS_convoluted(img_name,'box_blur', transform_mat)
Lena — original vs box_blur

From the result, we notice that the transformed image is slightly smoother than the original. Because every kernel weight is 1/9, the multiply-and-sum over each sub-matrix is simply the average of that neighborhood, so the output stays in the same 0 to 255 range.
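
A quick check on a single, arbitrary patch confirms that the kernel result is just the neighborhood mean:

patch = np.array([[10, 20, 30],
                  [40, 50, 60],
                  [70, 80, 90]])       # arbitrary 3x3 neighbourhood
box_kernel = np.full((3, 3), 1/9)      # every weight is 1/9
print(np.sum(patch * box_kernel))      # ~50.0 (up to floating-point rounding)
print(patch.mean())                    # 50.0, the same value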

  • Gaussian blur operation: also known as Gaussian smoothing, typically used to reduce noise in the image.

kernel = (1 / 16) * [[1, 2, 1], [2, 4, 2], [1, 2, 1]]

>>> img_name = 'lena.png'
>>> img_mat = convert_image_matrix(img_name)
>>> gaussian_kernel = (1/16)*np.array([[1,2,1],[2,4,2],[1,2,1]])
>>> img_sampling = get_sub_matrices(img_mat, gaussian_kernel.shape)
>>> transform_mat = get_transformed_matrix(img_sampling, gaussian_kernel)
>>> original_VS_convoluted(img_name,'gaussian3', transform_mat)
Lena — original vs gaussian3

The Gaussian kernel reduces image noise well and gives the image a softer appearance; the transformed image looks noticeably smoother than the original.
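
The 3 x 3 kernel above is a common fixed approximation of a Gaussian. A larger kernel can be built as the outer product of a 1D Gaussian and then normalized so its weights sum to 1; a minimal sketch, where make_gaussian_kernel, size, and sigma are illustrative names and defaults:

def make_gaussian_kernel(size=5, sigma=1.0):
    ax = np.arange(size) - (size - 1) / 2.0        # coordinates centred on 0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))    # 1D Gaussian
    kernel = np.outer(g, g)                        # 2D Gaussian via outer product
    return kernel / kernel.sum()                   # normalize so the weights sum to 1

Note that get_sub_matrices() pads with kernel_size - 2 pixels, which suits 3 x 3 kernels; for larger kernels the padding would need to be (kernel_size - 1) // 2 to keep the output the same size as the input.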

Finally, a custom convolution of my own, made by randomly tweaking the values in the kernel matrix; one way to generate such a kernel is sketched after the figure below.

Lena — original vs custom_conv
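
The exact kernel behind the figure above is not given here; as one illustration, a random kernel can be drawn from NumPy's random generator, normalized, and run through the same pipeline (the seed and the scaling are arbitrary choices):

rng = np.random.default_rng(42)                      # arbitrary seed for reproducibility
custom_kernel = rng.uniform(-1, 1, size=(3, 3))      # random weights in [-1, 1]
custom_kernel /= np.abs(custom_kernel).sum()         # scale so the response stays in a sane range

img_mat = convert_image_matrix('lena.png')
img_sampling = get_sub_matrices(img_mat, custom_kernel.shape)
transform_mat = get_transformed_matrix(img_sampling, custom_kernel)
original_VS_convoluted('lena.png', 'custom_conv', transform_mat)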

Conclusion

  • Convolution is a simple mathematical operation that is fundamental to many common image processing operators.
  • It has various applications in the field of mathematics such as probability and statistics, linear systems, etc.

PS: Although convolution is a well-established concept in image manipulation, it was satisfying to implement it from scratch and understand the mathematics behind it. In the next article, I will explain why these particular default kernels achieve the transformations they do.

References

  1. cv2 documentation: http://bit.ly/2R8Auux.
  2. Convolution: https://en.wikipedia.org/wiki/Convolution.
  3. Kernel filters: https://en.wikipedia.org/wiki/Kernel_(image_processing).
  4. Image credits: Google images.
