Convolutional Neural Networks — Part 1: Edge Detection

Brighton Nkomo
Published in The Startup · 8 min read · Oct 1, 2020

This is the first part of my blog post series on convolutional neural networks. Here are the subsequent parts of this series:

While taking the convolutional neural network (CNN) course from the Deep Learning Specialization on Coursera by Andrew Ng, I noticed that there are no slides, no lecture notes, and no prescribed textbook (besides, a deep learning textbook would be convoluted, no pun intended, for some deep learning newbies). This inspired me to write a series of blog posts that aim to clarify and summarize what is discussed in the CNN course (course 4 of 5 in the specialization).

So what is a convolutional neural network? A convolutional neural network (CNN or ConvNet) is a type of deep learning neural network, usually applied to analyzing visual imagery, whether it's detecting cats, faces or trucks in an image. CNNs are applied in image and video recognition, recommender systems, image classification, medical image analysis, natural language processing, and financial time series. In particular, CNNs have promising applications and are thought to be the future in self-driving cars, robots that can mimic human behavior, aids to human genome mapping projects, and predicting earthquakes and natural disasters; they could maybe even make self-diagnosis of medical problems possible.

1. Vertical Edge Detection

Notice how the light intensity differs at the edges between the cyber-truck and the environment. In particular, at the top of the truck, the truck has brighter pixels and the environment has darker pixels, whereas at the bottom, the floor has brighter pixels than the rocker panel of the truck. So where there are drastic changes in brightness, there are edges of the object (vertical edges, horizontal edges, 45 degree, 78 degree, 62 degree angled edges, etc.). The purpose of detecting sharp changes in image brightness is to capture important events and changes in the properties of the world. For now, let's focus on detecting vertical edges.

Suppose that you want to detect vertical edges on the cyber-truck image. How would you do it? To build intuition, it is best to start with a simpler problem.

FIGURE 1: Vertical edge detection. From the deep learning specialization CNN course on Coursera by Andrew Ng and deeplearning.ai.

The grid on the left in figure 1 above represents a grayscale image with a 6 by 6 resolution. The numbers are the intensity values. Yes, intensity is a measurable quantity in physics and, in this context of light, intensity is just a measure of brightness: the brighter the image, the higher the intensity values in the left 6 by 6 grid/matrix. Since this is a grayscale image, it is just a 6 by 6 by 1 matrix, rather than 6 by 6 by 3 as it would be for a color image with 3 separate channels (the red, green and blue channels).
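As a quick illustration in NumPy (a sketch of my own, not from the course), a grayscale image is a single 2-D array, while a color image carries three channels:

```python
import numpy as np

gray = np.zeros((6, 6))      # grayscale: 6 by 6 (by 1; the channel is implicit)
color = np.zeros((6, 6, 3))  # color: a 6 by 6 channel each for red, green, blue
print(gray.shape, color.shape)  # (6, 6) (6, 6, 3)
```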

In order to detect edges, or let's say vertical edges, in this image, what you can do is construct a 3 by 3 matrix; in the terminology of convolutional neural networks, this is going to be called a filter (research papers will sometimes call this a kernel instead of a filter, but I am going to use the filter terminology in this blog post).

And what you are going to do is take the 6 by 6 image and convolve it with the 3 by 3 filter (the convolution operation is denoted by an asterisk, *).

FIGURE 2: Computing the first entry of the 4 by 4 output.

The output of convolving the 6 by 6 matrix with a 3 by 3 matrix will be a 4 by 4 matrix. The way you compute this 4 by 4 output is as follows: to compute the first element, the upper left element of the 4 by 4 matrix, you take the 3 by 3 filter and paste it on top of the upper left 3 by 3 region of your original input image. Notice the filter entries (1, 1, 1, 0, 0, 0, -1, -1, -1, read column by column) are written in the top right corners of the blue region and circled in green.

What you then do is take the element-wise product of the entries in the blue 3 by 3 region and the corresponding filter matrix entries, which are circled in green. Then add them all up and you should get -5. This -5 value will be the first entry of the 4 by 4 output, as shown in figure 2 on the right.

Next, to figure out what the second entry is, you take the blue square, shift it one step to the right, and do the same element-wise product and then addition. You do the same for the third, fourth entries and so on, as illustrated by GIF 1 below.

GIF 1: Convolution of a 6 by 6 image by a 3 by 3 filter to get the output entries.
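To make the sliding-window computation concrete, here is a minimal NumPy sketch (the function name conv2d is my own; the 6 by 6 image is the one from the lecture example in figure 2). Note that, like the course and most deep learning libraries, it skips the kernel-flipping step of the textbook convolution, so strictly speaking this is cross-correlation:

```python
import numpy as np

def conv2d(image, filt):
    """Convolve with no padding and stride 1 ('valid' convolution)."""
    H, W = image.shape
    f = filt.shape[0]                       # filter is f by f (here 3 by 3)
    out = np.zeros((H - f + 1, W - f + 1))  # 6 by 6 * 3 by 3 -> 4 by 4
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # element-wise product of the f by f window with the filter, summed
            out[i, j] = np.sum(image[i:i+f, j:j+f] * filt)
    return out

# the 6 by 6 image and the 3 by 3 vertical edge filter from figure 2
image = np.array([[3, 0, 1, 2, 7, 4],
                  [1, 5, 8, 9, 3, 1],
                  [2, 7, 2, 5, 1, 3],
                  [0, 1, 3, 1, 7, 8],
                  [4, 2, 1, 6, 2, 8],
                  [2, 4, 5, 2, 3, 9]])
vertical_filter = np.array([[1, 0, -1],
                            [1, 0, -1],
                            [1, 0, -1]])
print(conv2d(image, vertical_filter))       # the upper-left entry is -5
```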

So why is this doing vertical edge detection? Let's look at another example.

FIGURE 3: Simplified example

To illustrate vertical edge detection, we are going to use the simplified image in figure 3 on the left. The left half of 10s gives you brighter pixel intensity values and the right half of 0s gives you darker pixel intensity values (Andrew Ng used a shade of gray to denote the zeros, although they could also be drawn as black). The 3 by 3 filter can itself be visualized as an image: lighter, brighter pixels on the left, mid-tone zeros in the middle, and darker pixels on the right (the small image with 3 shades below the filter in figure 4 below). When you convolve the 6 by 6 input matrix with this 3 by 3 filter, you get the matrix on the right, as shown in figure 4 below.

FIGURE 4: Convolution of a 6 by 6 image with a 3 by 3 filter.

Now, if you plot this rightmost matrix as an image, it will have a lighter region right in the middle, and that corresponds to having detected the vertical edge down the middle of your 6 by 6 image, as illustrated by GIF 2 below.

GIF 2: The 30s give the lighter region right in the middle and that corresponds to having detected the vertical edge (the white and gray image on the left).

One intuition to take away from vertical edge detection is that a vertical edge in the middle of the 6 by 6 image is really where there are bright pixels on the left and darker pixels on the right, and that is why the vertical edge detector reports a vertical edge there. The convolution operation gives you a convenient way to specify how to find these vertical edges in an image.
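Continuing the conv2d sketch from earlier, the simplified example reproduces the output in figure 4 exactly:

```python
# left half bright (10s), right half dark (0s), as in figure 3
bright_to_dark = np.array([[10, 10, 10, 0, 0, 0]] * 6)
print(conv2d(bright_to_dark, vertical_filter))
# every row of the 4 by 4 output is [0, 30, 30, 0];
# the central 30s are the detected vertical edge
```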

Unsurprisingly, there are some questions that arise from the ideas mentioned in this section so far. The remaining sections answer some of the key questions.

2. First Question: What happens in an image where the colors are flipped, where it is darker on the left and brighter on the right?

Now the 10s are on the right half of the image and the 0s on the left. If you convolve it with the same edge detection filter, you end up with -30s instead of 30s down the middle, and you can plot that as a picture like the one in GIF 3 below. Because the shade transition is reversed, the sign of the 30s is reversed as well: the -30s show that this is a dark-to-light rather than a light-to-dark transition.

GIF 3: Convolving the vertical edge filter with an image where the colors are flipped, where it is darker on the left and brighter on the right.
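In code (again reusing the conv2d sketch from earlier), flipping the image just flips the sign of the response:

```python
# dark left half, bright right half: the transition is reversed
dark_to_bright = np.array([[0, 0, 0, 10, 10, 10]] * 6)
print(conv2d(dark_to_bright, vertical_filter))
# every row is now [0, -30, -30, 0]; the negative sign marks a
# dark-to-light transition (take np.abs(...) if you only care that
# an edge exists, not which way it goes)
```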

3. Second Question: What About Horizontal Edges?

FIGURE 5: Vertical and Horizontal Edge Filters.

It should be unsurprising that the 3 by 3 filter on the right in figure 5 will allow you to detect horizontal edges: a horizontal edge is a 3 by 3 region where the pixels are relatively bright in the top row and relatively dark in the bottom row.
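Reusing the earlier sketch once more, the horizontal edge filter is just the vertical one rotated by 90 degrees (for this filter, its transpose):

```python
horizontal_filter = vertical_filter.T    # [[1, 1, 1], [0, 0, 0], [-1, -1, -1]]

# bright top half, dark bottom half
top_bright = np.array([[10] * 6] * 3 + [[0] * 6] * 3)
print(conv2d(top_bright, horizontal_filter))
# output rows: [0, 0, 0, 0], [30, 30, 30, 30], [30, 30, 30, 30], [0, 0, 0, 0];
# the band of 30s is the detected horizontal edge
```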

4. Third Question: What is the best set of numbers to use for the vertical and horizontal edge filters?

You can use either:

  1. The Sobel filter, shown in figure 6 below. The advantage of the Sobel filter is that it puts a little bit more weight on the central row (the central pixels), which makes it maybe a little bit more robust.
  2. The Scharr filter, also shown in figure 6 below.

Note: The two filters in figure 6 are vertical edge filters. You can rotate them by 90 degrees to get the horizontal edge filters.

FIGURE 6: The Sobel and Scharr filters, for vertical edges.
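For reference, here are the two filters as NumPy arrays (vertical edge versions, matching figure 6), continuing the earlier sketch:

```python
sobel_v = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]])      # Sobel: more weight on the central row
scharr_v = np.array([[ 3, 0,  -3],
                     [10, 0, -10],
                     [ 3, 0,  -3]])   # Scharr
# rotating by 90 degrees (here, the transpose) gives the horizontal versions
sobel_h, scharr_h = sobel_v.T, scharr_v.T
```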

With the rise of deep learning, the right numbers can be learned. One of the things we learned is that when you really want to detect edges in some complicated image, maybe you don't need computer vision researchers to handpick these nine numbers; maybe you can just treat the nine numbers of this matrix as parameters and learn them using backpropagation. The goal is to learn nine parameters such that when you take the 6 by 6 image and convolve it with your 3 by 3 filter, this gives you a good edge detector. Figure 7 shows the values to be learned (w1, w2, w3, …, w9).

FIGURE 7

And rather than just vertical and horizontal edges, maybe it can learn to detect edges that are at 45 degrees or 70 degrees or 73 degrees or at whatever orientation it chooses.

And so by just letting all of these numbers be parameters and learning them automatically from data, we find that neural networks can actually learn low-level features such as edges even more robustly than computer vision researchers are generally able to code up by hand.
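The course does not give code for this step, but here is a minimal PyTorch sketch of the idea under a toy setup of my own: the nine weights of a 3 by 3 filter are treated as parameters and learned by backpropagation, so that the convolution reproduces a known vertical edge map:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# the simplified 6 by 6 image: bright left half, dark right half
image = torch.zeros(1, 1, 6, 6)
image[..., :3] = 10.0

# target edge map produced by the hand-designed vertical filter
target_filter = torch.tensor([[1., 0., -1.]] * 3).reshape(1, 1, 3, 3)
target = nn.functional.conv2d(image, target_filter)   # rows of [0, 30, 30, 0]

conv = nn.Conv2d(1, 1, kernel_size=3, bias=False)     # nine learnable weights w1..w9
optimizer = torch.optim.Adam(conv.parameters(), lr=0.1)

for step in range(500):
    optimizer.zero_grad()
    loss = ((conv(image) - target) ** 2).mean()       # squared error vs. edge map
    loss.backward()                                   # backpropagation
    optimizer.step()

print(conv.weight.data.reshape(3, 3))
# on this single image the answer is not unique: any filter whose column
# sums are 3, 0 and -3 fits the target, and the 1/0/-1 filter is one of them
```

On richer training data there is no fixed target filter; the same mechanism simply pushes the nine weights toward whatever edge (or other low-level) detector helps the downstream task.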

Thank you for your attention. Clap and share if you liked this post. Feel free to comment if you have feedback or questions.
