Let There Be Light: An Intro to Computer Vision

Peter Ma
Dec 28, 2018 · 5 min read

Artificial intelligence came about in the 50s-60s. The idea of a machine built like a human triggered something within us. And because of its complexity, its black-box appearance, and odd familiarity, it wasn’t a coincidence that the public saw it as a threat.

A TV show from the ’60s: Dr. Smith vs. The Robot

And today, although the fear hasn’t disappeared, the field of AI has undergone a massive revamp. Back in the 1990s-2000s the AI community took a big step forward. You see, in the pursuit of re-creating anything close to a human, we first need to mimic the most basic of abilities: perception. How do you teach computers to see?

Why’s It So Difficult?

A traditional SLR camera can record video at a standard 1080p resolution

This is a camera; it helps computers see. Each frame of video has a resolution of 1920x1080, meaning there are roughly 2 million pixels in each frame. Each pixel represents a color, and together the mosaic of pixels forms a picture. To us this could mean an apple or a tree, but to a computer it’s essentially just a map of numbers.

What if you take a picture of the same object but under different light? The computer sees two different objects, but in reality it’s the same one. How do you get the computer to understand the true relationship between images and their classification?

Introducing The Convolutional Neural Network.

Data scientists realized that we first need to simplify the computer’s job, and we do so by extracting the valuable information from pictures. When detecting images, one super important feature is edges. Edges define the shape of objects, and we detect edges through a filter. This filter is applied to a snippet of the image and outputs only the useful information. The filter runs across the entire picture detecting edges; a vertical-edge filter, for instance, sweeps through the whole image and leaves behind a mostly empty image with only the vertical lines “highlighted”.

Notice the edge is highlighted while everything else is blacked out, since it’s redundant information

In technical terms, a filter is a 3x3 matrix that is multiplied element-wise with a snippet of the original image and summed into a single output value. The outputs are then stitched together, forming a fully filtered image.
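The multiply-and-sum above can be sketched in plain NumPy. The 6x6 image and the vertical-edge kernel below are toy values made up for illustration:

```python
import numpy as np

# A tiny 6x6 grayscale "image": dark left half, bright right half.
image = np.array([
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
], dtype=float)

# A classic 3x3 vertical-edge filter (Sobel-like kernel).
kernel = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
], dtype=float)

def convolve2d(img, k):
    """Slide the 3x3 kernel over the image, multiply each snippet
    element-wise, and sum it into one output pixel (no padding, stride 1)."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            snippet = img[i:i + kh, j:j + kw]
            out[i, j] = np.sum(snippet * k)
    return out

edges = convolve2d(image, kernel)
print(edges)
```

Only the columns straddling the dark/bright boundary produce non-zero outputs; the flat regions convolve to zero, which is exactly the “blacked-out redundant information” from the picture above.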

Example of a Convolution

After the crucial data is extracted, the image can be compressed, halving each dimension while retaining all the key info, with a function called max pooling. Max pooling takes the largest value out of each snippet, ignoring the excess data.
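A minimal sketch of 2x2 max pooling, using a made-up 4x4 feature map:

```python
import numpy as np

def max_pool(img, size=2):
    """2x2 max pooling with stride 2: keep the largest value in each
    non-overlapping snippet, halving each spatial dimension."""
    out = np.zeros((img.shape[0] // size, img.shape[1] // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = img[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

feature_map = np.array([
    [1, 3, 2, 0],
    [5, 2, 1, 1],
    [0, 0, 4, 8],
    [2, 1, 3, 6],
], dtype=float)

pooled = max_pool(feature_map)
print(pooled)  # [[5. 2.] [2. 8.]]
```

The 4x4 map shrinks to 2x2, but each snippet’s strongest response survives.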

This process is called convolution and it helps decrease the complexity of the task at hand.

In most cases, a convolutional layer is applied multiple times. Then the image is flattened and each pixel is fed into a fully connected neural network. All you need to know about a fully connected artificial neural network (ANN) is that it is a series of nodes connecting to other nodes with different weights, activating specific areas of the model depending on the input. The neural net replicates a brain, where different ideas spark activation in different areas. The fully connected layer crunches the numbers and spits out a classification. The model is then trained with backpropagation, allowing it to develop and mature the more training you run on it.
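A toy fully connected layer in NumPy can make this concrete. The sizes here (16 flattened inputs, 3 classes) are arbitrary assumptions, not the article’s actual network:

```python
import numpy as np

# Each input node connects to every output node through a weight;
# the weighted sums then pass through an activation.
rng = np.random.default_rng(42)
flattened = rng.random(16)              # flattened pixels from the conv layers
weights = rng.standard_normal((16, 3))  # 16 inputs -> 3 classes (assumed)
bias = np.zeros(3)

logits = flattened @ weights + bias
# Softmax turns the raw scores into class probabilities.
probs = np.exp(logits) / np.exp(logits).sum()
print(probs)  # three probabilities summing to 1
```

Training adjusts `weights` and `bias` via backpropagation so the right class gets the highest probability.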

Building A CNN In Code

Here are all the packages and libraries I need to import for the project — pretty simple. Then I load my data and convert each image to a matrix/array of pixel values. Then I label the dataset used for training. I also reshuffle the data so the computer doesn’t cheat.
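The original code was shown as images, so here is a hypothetical sketch of the load/label/shuffle step. The dataset here is synthetic stand-in data; in the real project each file would be opened (e.g. with PIL) and converted to a pixel array:

```python
import numpy as np

# Stand-in for loading images from disk: 100 fake 8x8 grayscale "images".
rng = np.random.default_rng(0)
images = rng.random((100, 8, 8))                 # pixel matrices
labels = np.array([i % 2 for i in range(100)])   # two assumed classes

# Reshuffle so the classes aren't fed to the network in order —
# otherwise the model could "cheat" by exploiting the ordering.
perm = rng.permutation(len(images))
images, labels = images[perm], labels[perm]
```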

Here I set up my Convolutional Neural Network (CNN). First I add a convolutional layer with 32 filters, a 3x3 kernel, and an activation after it. This is followed by one more convolutional layer before I flatten the image into a vector and feed it into my fully connected layer, which is my softmax. Backpropagation is configured to minimize loss in the step referred to as “model.compile”. Then we run and train the model.
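A minimal Keras sketch of that architecture. The layer counts follow the text above, but the input shape (32x32 RGB) and the number of classes (10) are assumptions, since the original code isn’t reproduced here:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    # First convolutional layer: 32 filters, 3x3 kernel, with an activation.
    layers.Conv2D(32, 3, activation="relu"),
    # One more convolutional layer before flattening.
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(2),
    # Flatten the feature maps to a vector and feed the fully connected
    # softmax layer that produces the classification.
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])

# "model.compile" wires in the loss that backpropagation minimizes.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=10)  # then run and train the model
```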

Finally, we get the code to classify the images we feed it. For example’s sake, I loaded the image “rider-20.jpg” and the model output the correct classification!

Notice the Vertical/Horizontal Edges Detected!

This line of code helps us visualize what the computer sees after each filter is applied. Notice the vertical edges detected. Since we applied 32 filters in the first layer, there are 32 results! The blacked-out areas of the picture are ignored by the model, so it focuses on what’s really important. Check out my GitHub code here! Also, check out my YouTube video where I talk all about CNNs!
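The 32-feature-map visualization can be imitated in plain NumPy: apply 32 filters to one image and you get 32 filtered views. The random image and random filters below are placeholders for the trained first-layer weights:

```python
import numpy as np

def convolve2d(img, k):
    """Valid-mode 2D convolution: multiply-and-sum each 3x3 snippet."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(1)
image = rng.random((32, 32))               # stand-in grayscale image
filters = rng.standard_normal((32, 3, 3))  # 32 first-layer filters (random here)

# One filtered "view" of the image per filter — these are the 32
# pictures shown in the visualization.
feature_maps = np.stack([convolve2d(image, f) for f in filters])
print(feature_maps.shape)  # (32, 30, 30)
```

Plotting each of the 32 maps side by side (e.g. with matplotlib) reproduces the grid of edge images shown above.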


So what does all of this really mean to us? Currently, computer vision serves as the basis of a lot of self-driving cars. Other applications include cancer detection through medical imaging. This is really just the first step: the further we progress through the field of AI, the more profound the discoveries become. No matter the innovation, computer vision serves as a foundation to build upon.

Key Takeaways

  • Computers can’t understand the things they “see”
  • CNNs help computers classify images and understand what they are looking at
  • A series of layers help simplify the task

Before You Go

Connect with me on LinkedIn
Feel free to reach out to e-mail me with any questions: peterxiangyuanma@gmail.com
And clap the article if you enjoyed 😊

Data Driven Investor

from confusion to clarity not insanity
