Looking through the eyes of a Computer:

Maria L Rodriguez
9 min read · Aug 10, 2021


Part I: Tensors, filters and convolution

* Photo by Alessandro Ranica on Unsplash

It is a blessing to have numerous software libraries and packages made by masters in the field. These enable us to build on the present knowledge and technology without having to start from scratch every time.

However, it is also essential to know the concepts on which the technology was built. By having this fundamental knowledge, one can better follow technology development and judge appropriate and promising uses.

In this blog, we will look at some foundations of Computer Vision. It follows parts of Lesson 13, Convolutions, in the fast.ai Fastbook.

We will use this outline:

Part I. Convolution Components

A. Set-up

B. Illustrate Concepts

B.1. Tensors

B.2. Convolution

B.2.a. Top Edge

B.2.b. Filter

B.2.c. One convolution step

B.2.d. Feature Map

B.2.e. Stride

B.2.f. Develop the Feature Map

C. Application

D. Visualize

Part II. Convolution Application

E. Other Edges

F. Curved Shapes

G. Aggregating Kernels

H. Application

So, if you want to know how a computer sees an image, open your Notebook and let’s start!

Part I. Convolution Components: Tensors, filters and dot product

A. Set-up on Colab.

!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()
from fastbook import *
#!pip install fastai -U
import fastai
from fastai.vision.all import *
#follow the prompt for signing-in and authorization
!pip install jmd_imagescraper
from jmd_imagescraper.core import *
from pathlib import Path

If you need detailed instructions on setting up, please refer to Step 1 a-b here.

B. Illustration of Concepts

B.1. Tensors

Tensors are essentially sets of numbers organized with particular attention to dimensionality. This gives the data structure, instead of leaving it as an arbitrary collection of numbers. This is why ‘shape’ is important in deep learning: it tells the user how the data is organized in terms of the number of dimensions and the size of each dimension.

The number of dimensions reflects the tensor rank. A rank-0 tensor is a scalar, for example 100. A rank-1 tensor is a vector, for example [0, 1, 2]. A rank-2 tensor is a matrix, for example the 2x2 matrix [[0, 1], [2, 3]].

A tensor with a shape of [6, 3, 128, 128] is 4-dimensional (i.e., rank-4). The first dimension (shape[0]) has a size of 6. To illustrate, this might mean a collection of 6 images, each with 3 colour channels and a height and width of 128 x 128 pixels.
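To make the ranks concrete, here is a minimal illustrative sketch in PyTorch (the variable names are mine, introduced only for this example):

import torch
scalar = torch.tensor(100)               # rank-0: a single number, shape []
vector = torch.tensor([0, 1, 2])         # rank-1: shape [3]
matrix = torch.tensor([[0, 1], [2, 3]])  # rank-2: shape [2, 2]
batch = torch.zeros(6, 3, 128, 128)      # rank-4: 6 images, 3 channels, 128 x 128 pixels
scalar.ndim, vector.shape, matrix.shape, batch.shape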

The steps below will illustrate how an image is converted to a tensor.

B.1.a. Gather some simple images.

root = Path().cwd()/'shapes_basic_sq_circ'
search = duckduckgo_search
search(root, 'square', 'square shape', max_results = 5, img_layout=ImgLayout.All)
search(root, 'circle', 'circle', max_results = 5, img_layout=ImgLayout.All)
# visual check and cleaning, if needed
from jmd_imagescraper.imagecleaner import *
display_image_cleaner(root)

If you need an intro/ refresher for this step, refer to Steps 1c, 2 and 3 here.

If you are running into errors, I would suggest using ImgLayout.Square to get more uniformly shaped images to work with.
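For example, the search call above could be adjusted like this (same arguments, only the layout changed):

search(root, 'square', 'square shape', max_results = 5, img_layout=ImgLayout.Square)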

B.1.b. Identify one simple shape to dissect.

path = Path('/content/shapes_basic_sq_circ/')
path.ls() # to show you the subfolders
(path/'square').ls() # using a subfolder to identify a file
from PIL import Image, ImageOps
square = Image.open('/content/shapes_basic_sq_circ/square/002_b9f6c377.jpg')
show_image(square);
* note that this image has a white frame surrounding the green centre

A coloured image is composed of 3 channels: red, green and blue. We will convert the images to grayscale which will enable us to focus on 1 channel for now.

matplotlib.rc('image', cmap='Greys')
from PIL import Image, ImageOps
square = Image.open('/content/shapes_basic_sq_circ/square/002_b9f6c377.jpg')
square = ImageOps.grayscale(square)
show_image(square);

The dark (green) portion of the coloured image now appears light, and the light (white) portion now appears dark. (This inversion comes from the ‘Greys’ colormap set above, which renders high values as dark.) Grayscale does not specify a particular colour, but rather a brightness, or luminance (‘L’).

B.1.c. Seeing the numbers

i. Convert the image to tensor

square_t = tensor(square)

ii. Check the shape

square_t.shape

The square image, now a tensor, has a shape of [613, 474], which corresponds to 613 pixel rows and 474 pixel columns.

iii. Check the content.

Let’s see what can be found in one of the middle rows.

square_t[250]  # one row

This long tensor represents a small linear portion in our square image above. This is how a computer sees an image.

Grayscale luminance is quantified from 0 to 255 (or 0 to 1 when normalized): 0 corresponds to an absence of luminance (black) and 255 to full luminance (white). In the tensor above, the values near the left and right edges are in the 200s (light; the white portion of the original image). The middle values are in the 70s (relatively dark; the green portion of the original image). We will focus on the transitions between the obviously light and obviously dark portions, that is, the edges.
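As a quick, illustrative check (the exact numbers will depend on the image you downloaded):

square_t.min(), square_t.max()  # overall value range, e.g. 0 and 255
square_t[250, :5]               # near the left edge: values in the 200s (light)
square_t[250, 235:240]          # near the middle: values in the 70s (dark)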

B.2. Convolution

Convolution is a systematic way by which an image is processed one portion at a time. We will discuss the components of the convolution (Steps B.2.a to f), so that we can understand the convolution code in Step C.

B.2.a. Identifying Edges

Focus on the top portion of the image to appreciate the border.

df = pd.DataFrame(square_t)
df_top = df.iloc[30:50, 40:]
df_top.style.set_properties().background_gradient('gist_heat')
* The top-most numbers are the column names, and the left-most are the row names.

In this dataframe, colours are used to help visualize the border; they are arbitrary and do not reflect the luminance. I chose ‘gist_heat’ because its colour scheme roughly matches the numerical representation: 0 as dark and 255 as light.

The border is a strip where high values jump to low values. However, it is indistinct. To define the border, or edge feature, more sharply, a filter is used.

B.2.b. Filters

A filter is designed by the user based on their idea of how best to recognize a feature. It is commonly a small matrix, such as 3x3, with the values arranged so that a specific feature or contour is made to stand out.

For our purposes, we want to isolate one edge from the other. Imagining horizontal layers of numbers, we can create a filter for the top edge as follows:

top_edge = tensor([[ 1,  1,  1],
                   [ 0,  0,  0],
                   [-1, -1, -1]]).float()

A filter is also called a weight or a kernel.

B.2.c. Convolution step

The filter is applied to the image tensor one small window at a time. To illustrate:

i. Identify a window

Going back to the figure above, we see layering from rows 38–45. Let us choose a 3x3 window that gives us a good representation of high-mid-low values.

df_top = df.iloc[42:45, 100:103]
df_top.style.set_properties().background_gradient('gist_heat')

ii. Multiply this 3x3 window with the top_edge filter element-wise, superimposing the filter on the window. (Together with the sum in the next step, this amounts to a dot product.)

square_t_top = square_t[42:45, 100:103]
square_t_top * top_edge

iii. Add the resulting tensors

(square_t_top * top_edge).sum()

The sum, 144, will be placed in a Feature Map.

B.2.d. Feature Map

The result of each convolution step is registered at its corresponding position on a feature map.

* used to illustrate, not to scale
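To make the bookkeeping concrete, here is an illustrative sketch (feature_map is a name I introduce here; the position follows from the window used above):

feature_map = torch.zeros(611, 472)  # one cell per valid 3x3 window position in the 613 x 474 image
# the window at rows 42:45, cols 100:103 is centred on pixel (43, 101),
# so its sum lands at feature_map[42, 100]
feature_map[42, 100] = (square_t[42:45, 100:103] * top_edge).sum()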

B.2.e. Stride

The convolution process moves in a left-to-right, top-to-bottom manner.

In the previous step, we used columns 100:103; the next convolution is done one column to the right (this reflects stride = 1). The stride moves the filter along the image while keeping neighbouring windows overlapping, which establishes connectivity between the cells of the feature map.

square_t_top_2 = square_t[42:45, 101:104]  # same rows, shifted one column to the right
square_t_top_2 * top_edge
(square_t_top_2 * top_edge).sum()

The sum is again 144, and the Feature Map is updated.

* not to scale

We can now appreciate that the values are forming a band that corresponds to the top edge of the square.

B.2.f. Developing the Feature Map

See the effect of the top_edge filter on the right portion of the image tensor, where a vertical border was seen during the initial visualization.

df_right = df.iloc[100:103, 434:437]
df_right.style.set_properties().background_gradient('gist_heat')
square_t_right = square_t[100:103, 434:437]
square_t_right * top_edge
(square_t_right * top_edge).sum()

The sum will be zero.

For illustration, if we move one row down and use rows 101:104 with the same columns, we also get a sum of zero. This shows that a filter tailored to detect a horizontal edge will not detect a vertical edge.

* not to scale

C. Apply the Convolution.

Now that we are familiar with the components of the convolution, we will apply it to the image in a systematic manner.

1. Run the filter over each 3x3 window of the image tensor.

def apply_kernel(data, row, col, kernel):
    # multiply the 3x3 window centred at (row, col) element-wise by the kernel, then sum
    return (data[row-1:row+2, col-1:col+2] * kernel).sum()

rng = range(1, 473)  # valid centre positions for a 3x3 kernel on 474 columns
top_square = tensor([[apply_kernel(square_t, i, j, top_edge) for j in rng] for i in rng])

Reminder: Filters are also called weights or kernels. We will use the filter and kernel terms to familiarize ourselves with the jargon.

The range is based on the pixel width of the image: 474 columns give valid centre positions 1 to 472. (The same range is reused for the rows here, so only the top 472 of the 613 rows are processed.)

The kernel performs the convolution systematically from the top left to the bottom right of the image tensor, in a left-to-right top-to-bottom sequence.

2. Check the shape of the Feature Map developed by the convolution.

top_square.shape

This will give us [472, 472]: 2 pixels fewer than the 474 image tensor columns (the rows were limited to the same range, as noted above).

The convolution requires the filter to fit entirely within the image. A 3x3 filter starts at the upper-left 3x3 window and strides across and down, but its centre can never sit on the outermost pixels: the left-most and right-most columns (and, likewise, the top and bottom rows) are never at the centre of a window. The 2 pixels lost per dimension reflect these unprocessed border pixels.
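As a sanity check on this shrinking behaviour, PyTorch’s built-in F.conv2d (available through the fastai star import; import torch.nn.functional as F also works) can run the same kernel over the full image. A minimal sketch:

import torch.nn.functional as F
inp = square_t.float().unsqueeze(0).unsqueeze(0)  # conv2d expects [batch, channels, height, width]
ker = top_edge.unsqueeze(0).unsqueeze(0)          # [out_channels, in_channels, 3, 3]
out = F.conv2d(inp, ker)
out.shape  # torch.Size([1, 1, 611, 472]): one border pixel lost on each side, in each dimension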

D. Visualize Results of the Convolution

1. Feature map generated by the top_edge filter.

a. Looking at the top portion of the map when the top_edge filter was used.

top_square[30:50, 200:220]

We can see that after applying the filter, the edge is now more defined.

b. Looking at the right portion of the Feature Map where the top_edge filter was used.

top_square[100:110, 420:450]

We can see that a horizontally oriented filter will not detect a vertical band.

2. Visualize the edges formed by the top_edge filtering.

show_image(top_square);

The filter gave us a shadowing edge effect on the horizontal edges. It is unable to detect vertical features.

Summary for Part I:

Images are converted to tensors to organize their numerical representations. Convolutions use filters designed to detect specific features, such as a horizontal edge.

Looking Forward:

In Part II, we will extend our skill in defining vertical and diagonal edges. We will then collect all the filters and apply them in an efficient manner. We will utilize more complex shapes to see how our designed filters perform.

Thank you for sharing Part I with me.

If you want to dig into the code, you can find it in the GitHub repo for Convolution.

See you in Part II: Edges, Curves and Complex Shapes.

:)

Maria
