# Computer Vision: Basic Building Blocks and a Step-by-Step Beginner Guide to Model Building

Computer vision is growing in popularity! It enables computers to see, analyze, and process objects in images and videos, much the way humans do. Object detection is widely used in robotics, text extraction, self-driving cars, theft detection, satellite analytics, and more. In this article, we'll focus on image basics, the building blocks of a convolutional neural network (CNN), and a step-by-step approach to model building.

**Image Basics and Manipulations**

Pixels are the smallest elements of a digital image. In a grey-scale image, each pixel's value ranges from 0 to 255, where 0 represents black and 255 represents white.

Images can use different channel orderings, such as RGB (Red-Green-Blue), BGR (Blue-Green-Red), and grey scale. Image filtering transforms images with operations like sharpening, smoothing, and edge enhancement. Common image transformation techniques include identity, rotation, mirroring, reflection, and scaling.

Using scikit-image, an image can be read and each of its RGB channels displayed separately.
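As a minimal sketch of splitting an image into channels (this assumes scikit-image is installed and uses its bundled `astronaut` sample image instead of a file path):

```python
import numpy as np
from skimage import data

# Load a sample RGB image bundled with scikit-image (shape: H x W x 3).
image = data.astronaut()

# Split into the three colour channels; each is a 2-D array of values 0-255.
red, green, blue = image[:, :, 0], image[:, :, 1], image[:, :, 2]

print(image.shape)  # (512, 512, 3)
print(red.shape)    # (512, 512)
```

To read your own file instead, `skimage.io.imread("path/to/image.png")` returns the same kind of array.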

**Feature Extraction from Images**

This can be done either with OpenCV techniques such as SIFT and HOG, or with convolution. Here we'll focus on convolution.

Convolution is the process of extracting features from an image with the use of a **kernel**. The process is loosely analogous to the human eye's retina, which perceives a 2D image layer by layer. Convolution provides good accuracy but is computationally expensive.

**Kernel**

A kernel is a matrix that moves over the input data by the **stride** value and performs a dot product with the corresponding sub-region of the input. Kernels come in different types, such as identity, edge detection, and sharpen, which are shown in Figure 3.

Convolutions with different kernels are used for image transformations, but the edge detection kernel is predominantly used to extract features such as edges from the image.
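To make this concrete, here is a minimal NumPy sketch of a valid (no-padding) convolution; the 3x3 edge-detection kernel values are the common choice used for illustration, not taken from the article's figure:

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Valid (no-padding) 2-D convolution of a single-channel image."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Dot product of the kernel with the current sub-region.
            region = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(region * kernel)
    return out

# A common 3x3 edge-detection kernel.
edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])

image = np.zeros((6, 6))
image[:, 3:] = 1.0  # vertical edge between columns 2 and 3

print(convolve2d(image, edge_kernel).shape)  # (4, 4), i.e. (6 - 3) + 1
```

Note how the output shrinks from 6x6 to 4x4, which is exactly why padding (covered below) exists.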

**Pooling**

Instead of applying more kernels, a pooling layer is applied to the model. It reduces the spatial size of the input data, typically by half.

Pooling doesn't change the depth of the input, but it reduces the height and width.

**Benefits of pooling layer**:

- Reduces overfitting
- Increases efficiency by reducing the model's computation

**Types of pooling layers**:

**Max Pooling:** the maximum value is selected from each region of the feature map.

**Average Pooling:** the average value of each region of the feature map is computed.
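A small NumPy sketch of both pooling types, using a 2x2 window with stride 2 so that height and width are halved:

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Pooling with a window of `size` and stride equal to the window size."""
    h, w = feature_map.shape
    # Trim edges that don't fit a full window, then group into blocks.
    out = feature_map[:h - h % size, :w - w % size]
    out = out.reshape(h // size, size, w // size, size)
    return out.max(axis=(1, 3)) if mode == "max" else out.mean(axis=(1, 3))

fm = np.array([[1, 3, 2, 4],
               [5, 7, 6, 8],
               [9, 2, 1, 0],
               [3, 4, 5, 6]], dtype=float)

print(pool2d(fm, mode="max"))      # [[7. 8.] [9. 6.]]
print(pool2d(fm, mode="average"))  # [[4. 5.] [4.5 3.]]
```

Either way, a 4x4 feature map becomes 2x2, while the depth (number of channels) would be untouched.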

**Padding**

Padding is the process of adding zeros around the input data symmetrically, as shown in Figure 5. For instance, if you're training an autoencoder, the output image should have the same size as the input; this is where padding comes into play. Padding adds "extra space" around the input data to avoid the loss of spatial dimensions.
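As an illustration, `np.pad` adds a symmetric border of zeros; one pixel on every side is exactly what a 3x3 kernel with stride 1 needs to produce "same"-size output:

```python
import numpy as np

x = np.arange(9, dtype=float).reshape(3, 3)

# Zero-pad one pixel on every side; a 3x3 kernel with stride 1 then yields
# an output the same size as the original input ("same" padding).
padded = np.pad(x, pad_width=1, mode="constant", constant_values=0)

print(x.shape, "->", padded.shape)  # (3, 3) -> (5, 5)
```

In Keras this is what `padding="same"` does for you inside a `Conv2D` layer.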

**Fully Connected Layer**

What is a fully connected layer?

It is also known as a "feed-forward neural network". Fully connected layers are layers where every input from one layer is connected to every activation unit of the next layer. It has three kinds of layers:

- **Input layer** represents the dimensions of the input vector.
- **Hidden layer** takes a set of weighted inputs and produces output through an activation function.
- **Output layer** represents the output of the neural network.

The output from the final pooling or convolutional layer is the input to the fully connected layer. These outputs are *flattened*, that is, all their values are unrolled into a vector, and then fed into the fully connected layer.

**Activation Function**

An activation function determines whether a neuron should be activated or not by operating on the weighted sum of its inputs. It also introduces non-linearity, so that the network does not collapse into a simple linear function (a polynomial of degree one).

**Types of Activation functions**

**1. Sigmoid Function**:

The sigmoid function is used in the output layer of deep learning models for predicting probability-based outputs. The sigmoid function is represented as: sigmoid(z) = 1 / (1 + e^(-z))

**2. Tanh Function:**

Tanh ranges between -1 and 1; it is a smoother, zero-centered function. The tanh function is represented as: tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))

**3. Rectified Linear Unit (ReLU) Function:**

ReLU is one of the most popular activation functions, and it often performs better than the sigmoid and tanh functions. It helps mitigate the vanishing gradient problem because its gradient is 1 for all positive inputs. The ReLU function performs a threshold operation on each input element: all values less than zero are set to zero. The ReLU function is represented as: ReLU(z) = max(0, z)

**4. SoftMax Function:**

SoftMax outputs range between 0 and 1, with the probabilities summing to 1. This function is used as the final layer in multi-class classification problems. The SoftMax function is represented as: softmax(z_i) = e^(z_i) / sum_j e^(z_j)
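All four activation functions are simple to implement with NumPy; a minimal sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes values into (0, 1)

def tanh(z):
    return np.tanh(z)                 # zero-centered, range (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # thresholds negatives to 0

def softmax(z):
    e = np.exp(z - np.max(z))         # shift for numerical stability
    return e / e.sum()                # probabilities summing to 1

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(0.0))  # 0.5
print(relu(z))       # [0. 0. 3.]
print(softmax(z))    # three probabilities that sum to 1
```

In practice you never write these by hand in Keras; passing `activation="relu"` or `activation="softmax"` to a layer applies them for you.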

**Convolutional Neural Network Architecture**

How do we compute the output shape of a CNN layer?

The output shape is calculated with the formula

[(W - K + 2P) / S] + 1, where W = input size, K = kernel size, P = padding, S = stride.
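The formula can be wrapped in a small helper to check the numbers used in the architecture summary below:

```python
def conv_output_size(W, K, P=0, S=1):
    """[(W - K + 2P) / S] + 1: spatial size after a convolution layer."""
    return (W - K + 2 * P) // S + 1

print(conv_output_size(28, 5))          # 24: a 5x5 kernel on a 28x28 input
print(conv_output_size(12, 5))          # 8:  a 5x5 kernel on a 12x12 input
print(conv_output_size(28, 3, P=1))     # 28: padding 1 preserves the size
```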

Figure 6 summary:

- **Convolution layer**: the input image is 28x28x1, where 1 is the channel count, and the kernel size is 5x5. The output size is (28 - 5) + 1 = 24, i.e. 24x24xn1 (with n1 filters applied).
- **Pooling layer**: max pooling is used. Input size = 24x24xn1; output size = 24/2 = 12, i.e. 12x12xn1.
- **Convolution layer**: the input is 12x12xn1 and the kernel size is 5x5. The output size is (12 - 5) + 1 = 8, i.e. 8x8xn2 (with n2 filters applied).
- **Pooling layer**: max pooling is used. Input size = 8x8xn2; output size = 8/2 = 4, i.e. 4x4xn2.
- Flatten the outputs (i.e. 4x4xn2) and pass them to a fully connected layer with ReLU activation, which classifies the image.

**Problem solving with computer vision**

We start with the MNIST dataset. MNIST is one of the most common datasets used for image classification; it contains 60,000 training images and 10,000 testing images collected from American Census Bureau employees and American high school students. TensorFlow and Keras allow us to import and download the MNIST dataset directly.

The code below shows how to import the MNIST dataset and how to shuffle and split the data into train and test sets.
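A minimal sketch of loading MNIST via Keras (the loader downloads and caches the data, and already returns it shuffled and split into train and test sets):

```python
import tensorflow as tf

# Keras downloads and caches MNIST on first run, pre-split into train/test.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

print(x_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)    # (10000, 28, 28) (10000,)
```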

How to install TensorFlow?

```shell
!pip install tensorflow
```

Next, visualize data from both train and test set.

Before feeding the data to model, the data needs to be preprocessed as shown in Figure 9.

- The input data needs to be reshaped because the model expects input in a specific format: (number of samples, height, width, channels).
- Normalization is a standard preprocessing/feature-engineering step for neural networks because we prefer data in the range [0, 1]. It is attained by dividing the data by 255, since pixel values lie in the range [0, 255]. Before normalization, all values need to be cast to a float data type so that the division yields decimal values.

One-hot encoding converts a class vector into a binary class matrix. Since we are dealing with a multi-class classification dataset, the labels need to be converted to categorical form, as done in step 8. Now the data is ready to be fed into the model.
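The preprocessing steps above might look like this (a sketch that reuses the Keras MNIST loader):

```python
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Reshape to (num_samples, height, width, channels), cast to float,
# then scale pixel values from [0, 255] down to [0, 1].
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0

# One-hot encode the 10 digit classes (label 3 -> [0,0,0,1,0,0,0,0,0,0]).
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

print(x_train.shape, y_train.shape)  # (60000, 28, 28, 1) (60000, 10)
```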

To start model building, import the Sequential model from keras.models and the layers from keras.layers, as shown in step 9.

- Define a Sequential model.
- Add one convolution layer with 32 filters, a 3x3 kernel, ReLU activation, and an input size of 28x28x1, where 1 represents the grey-scale channel. The output shape will be (28 - 3) + 1 = 26, i.e. 26x26x32.
- Add one max pooling layer. The output shape will be 26/2 = 13, i.e. 13x13x32.
- Add a flatten layer that flattens the 2D arrays into a 1D array before building the fully connected layers.
- Add a dense layer with 128 neurons and ReLU activation.
- Add a dense layer with 10 neurons (i.e. the number of output classes) and SoftMax activation. The final dense layer should contain as many neurons as there are output classes.
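The steps above can be sketched in Keras as follows, with the layer sizes taken directly from the list:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    # 32 filters, 3x3 kernel, ReLU; output 26x26x32 per the formula above.
    Conv2D(32, kernel_size=(3, 3), activation="relu",
           input_shape=(28, 28, 1)),
    # 2x2 max pooling halves height and width: 13x13x32.
    MaxPooling2D(pool_size=(2, 2)),
    # Unroll 13x13x32 into a single vector for the dense layers.
    Flatten(),
    Dense(128, activation="relu"),
    # One neuron per digit class, SoftMax for class probabilities.
    Dense(10, activation="softmax"),
])

model.summary()
```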

Compile the model with categorical_crossentropy as the loss function, accuracy as the metric, and Adam as the optimizer.

Fit the model with x, y, a batch size of 32, 5 epochs (i.e. the model trains for 5 cycles), and a validation split of 0.2 (i.e. an 80:20 train/validation split).
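Compiling and fitting might look like this; as a self-contained sketch it rebuilds the model and, purely as an assumption to keep the demo fast, trains on only the first 2,000 images (drop the slicing to train on the full set):

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
y_train = tf.keras.utils.to_categorical(y_train, 10)

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation="relu"),
    Dense(10, activation="softmax"),
])

# Loss, metric, and optimizer as described in the text.
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])

# batch_size=32, 5 epochs, 80:20 train/validation split;
# x_train[:2000] is only a speed shortcut for this demo.
history = model.fit(x_train[:2000], y_train[:2000],
                    batch_size=32, epochs=5, validation_split=0.2)
```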

Train and test accuracy are above 95%. The predicted output is shown in step 13.

# Resources

- TensorFlow: https://pypi.org/project/tensorflow/
- TensorFlow keras Model: https://www.tensorflow.org/api_docs/python/tf/keras/Model
- TensorFlow keras Layer: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Layer
- TensorFlow keras losses: https://www.tensorflow.org/api_docs/python/tf/keras/losses
- TensorFlow keras Optimizers: https://www.tensorflow.org/api_docs/python/tf/keras/optimizers

*I would love to hear some feedback on my first work. Thank you for your time!*