Basic CNN: A Significant Architecture in Deep Learning

Mohd Usama
Mar 3, 2023

Basic CNN (Source)

Convolutional Neural Networks (CNNs) are a type of artificial neural network commonly used in image and video processing tasks. They are designed to automatically and adaptively learn spatial hierarchies of features from input images: early layers pick up simple patterns such as edges, and deeper layers combine them into more complex shapes.

CNNs consist of several layers, each of which performs a specific task in the overall process. The main layers of a CNN are:

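Input layer: This layer takes in the input image, which is represented as a matrix of pixel values (one matrix per color channel). Let x be the input image. Written as a flat list of its pixel values:

x = [x1, x2, …, xn]

where n is the total number of pixel values in the image (height × width × number of channels). The convolutional layers that follow operate on the two-dimensional pixel grid rather than on this flat list, so the spatial arrangement of the pixels is preserved.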
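As a small illustration, the sketch below builds a hypothetical 4×4 grayscale image as a NumPy array and flattens it into the vector x. The pixel values and the variable names (`image`, `x_scaled`) are made up for the example and are not taken from any particular dataset.

```python
import numpy as np

# Hypothetical 4x4 grayscale image; each entry is a pixel intensity.
image = np.arange(16, dtype=np.float32).reshape(4, 4)

# Flatten the pixel grid row by row into the vector x = [x1, ..., xn].
x = image.reshape(-1)
n = x.size  # total number of pixels: height * width

# Scaling pixel values to [0, 1] is a common (optional) preprocessing step,
# assuming 8-bit intensities in the range 0..255.
x_scaled = x / 255.0
```

Here n is 16 because the image has a single channel; an RGB image of the same size would have 4 × 4 × 3 = 48 values.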

Convolutional layer: This layer applies a set of filters (also called kernels) to the input image. Each filter extracts a specific feature, such as an edge or a texture, by sliding over the image and, at each position, multiplying its weights element-wise with the underlying patch of pixels and summing the results.

Let w be the filter and b be the bias term, then the output feature map y can be calculated as follows:

y = w * x + b

where * denotes the convolution operation.
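To make the sliding-window idea concrete, here is a minimal NumPy sketch of the w * x + b computation for a single filter, before any activation is applied. It assumes no padding and a stride of 1, and the function name `conv2d` as well as the example image and filter are hypothetical. Like most deep learning libraries, it does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def conv2d(x, w, b=0.0):
    """Slide filter w over image x (no padding, stride 1) and add bias b."""
    H, W = x.shape
    kh, kw = w.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Element-wise multiply the window by the filter, then sum.
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w) + b
    return out

# Hypothetical 4x4 image with a vertical edge between columns 1 and 2.
x = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)

# Hypothetical 2x2 filter that responds to left-to-right intensity increases.
w = np.array([[-1.0, 1.0],
              [-1.0, 1.0]])

y = conv2d(x, w)  # 3x3 feature map
```

The feature map is nonzero only in its middle column, which is where the filter window straddles the edge. The explicit loops are chosen for readability; real implementations are vectorized.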

After this, we apply a nonlinear activation function to the output of the convolutional layer. This helps to introduce nonlinearity into the model and allows it to capture complex patterns in the input data.

Some common activation functions include ReLU, sigmoid and tanh. Let g be the activation function, then the output of the activation layer can be written as:

z = g(y)
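The three activation functions named above can each be written in a line of NumPy. This is only a sketch; the input values are arbitrary and chosen to show one negative, one zero, and one positive entry.

```python
import numpy as np

def relu(y):
    # Keeps positive values and replaces negative values with 0.
    return np.maximum(0.0, y)

def sigmoid(y):
    # Squashes values into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-y))

def tanh(y):
    # Squashes values into the range (-1, 1).
    return np.tanh(y)

y = np.array([-2.0, 0.0, 3.0])
z = relu(y)  # negative entries become 0, positive ones pass through
```

All three are applied element-wise, so z always has the same shape as y.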

Pooling layer: This layer reduces the spatial dimensions of the output from the previous layer by downsampling the feature maps. Pooling has no learnable parameters of its own; it makes the model more efficient by shrinking the feature maps, which reduces computation and the number of parameters needed in later layers.

The most common pooling operation is max pooling. Let M be the pooling operator and s be the pooling size, then the output of the pooling layer can be written as:

p = M(z, s)

where M(z, s) selects the maximum value in each s × s pooling window.
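A minimal NumPy sketch of max pooling is shown below. It assumes square, non-overlapping windows (the stride equals the pooling size s), which is a common choice but not the only one. The function name `max_pool2d` and the example feature map are hypothetical.

```python
import numpy as np

def max_pool2d(z, s):
    """Max pooling with an s x s window and stride s (non-overlapping)."""
    H, W = z.shape
    Hp, Wp = H // s, W // s
    # Drop rows/columns that do not fill a complete window, then group
    # the array into (Hp, s, Wp, s) blocks and take the max of each block.
    blocks = z[:Hp * s, :Wp * s].reshape(Hp, s, Wp, s)
    return blocks.max(axis=(1, 3))

z = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 7, 2],
              [6, 2, 3, 4]], dtype=float)

p = max_pool2d(z, 2)  # 2x2 output: [[4, 5], [6, 7]]
```

The 4×4 feature map shrinks to 2×2, keeping only the strongest response in each window.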

Fully connected layer: This layer takes the output from the previous layer and applies a set of weights to it to produce a final output. This layer is used to map the learned features to the desired output.

Let x be the output of the previous layer flattened into a vector, W be the weight matrix, b be the bias vector, and g be the activation function, then the output of the fully connected layer can be calculated as:

y = g(Wx + b)

where Wx denotes matrix-vector multiplication.
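The fully connected layer can be sketched in NumPy as follows. The example assumes a classification setting and uses softmax as the activation g, which is an assumption for illustration (the article does not prescribe it). The feature values, layer sizes, and random weights are hypothetical.

```python
import numpy as np

def fully_connected(x, W, b, g):
    """Apply weights W and bias b to a flattened input x, then activation g."""
    return g(W @ x + b)

def softmax(v):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(v - np.max(v))
    return e / e.sum()

# Hypothetical pooled feature map, flattened into a vector of length 4.
p = np.array([[4.0, 5.0],
              [6.0, 7.0]])
x = p.reshape(-1)

# Hypothetical weights mapping 4 features to 3 class scores.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 4))
b = np.zeros(3)

y = fully_connected(x, W, b, softmax)  # one probability per class
```

With softmax, the three outputs are positive and sum to 1, so they can be read as class probabilities.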

These mathematical equations capture the fundamental operations that are performed in a convolutional neural network. By stacking these layers together, a deep neural network can be created that is capable of learning complex features from input images.
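When stacking layers, it helps to track how the feature map size changes from one layer to the next. The sketch below works through that arithmetic for a hypothetical network (28×28 input, 8 filters of size 5×5, 2×2 max pooling); the layer sizes and helper names are assumptions chosen for illustration.

```python
def conv_output_size(n, k, padding=0, stride=1):
    """Side length after convolving an n-wide input with a k-wide filter."""
    return (n + 2 * padding - k) // stride + 1

def pool_output_size(n, s):
    """Side length after non-overlapping s x s pooling."""
    return n // s

# Hypothetical network: 28x28 input, 8 filters of size 5x5, 2x2 max pooling.
side = 28
side = conv_output_size(side, k=5)   # 28 - 5 + 1 = 24
side = pool_output_size(side, s=2)   # 24 // 2 = 12

num_filters = 8
flattened = num_filters * side * side  # 8 * 12 * 12 = 1152 inputs to the FC layer
```

The flattened size determines the number of columns in the fully connected layer's weight matrix.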

CNN Training Procedure:

The training procedure of a CNN involves iteratively updating the weights of the network to minimize the loss function while ensuring that the network generalizes well to new, unseen data. The detailed training procedure involves the following steps:

  1. Dataset Preparation: The first step in training a CNN is to prepare the dataset. The dataset is typically divided into three parts: the training set, the validation set, and the test set. The training set is used to train the CNN, the validation set is used to evaluate its performance during training, and the test set is held back for the final evaluation.
  2. Initialization: The weights of the CNN are initialized randomly. This ensures that the CNN does not have any prior knowledge about the dataset and allows it to learn from scratch. Random values also break the symmetry between units, so that different filters can learn different features.
  3. Forward Propagation: During forward propagation, the input image is passed through the CNN, and the output of each layer is calculated. This output is then passed to the next layer as input.
  4. Loss Calculation: The output of the CNN is compared to the ground truth label, and the loss is calculated using a loss function, such as mean squared error (common for regression) or cross-entropy loss (common for classification).
  5. Backward Propagation: During backward propagation, the gradient of the loss function with respect to the weights of the CNN is calculated using the chain rule of differentiation. This gradient is then used to update the weights of the CNN using an optimization algorithm, such as stochastic gradient descent or Adam.
  6. Training Loop: Steps 3–5 are repeated for a fixed number of iterations or until convergence is reached. During each iteration, a batch of images is randomly selected from the training set, and the weights of the CNN are updated based on the gradient of the loss function calculated using the current batch.
  7. Validation: After each epoch (one full pass over the training set), the performance of the CNN is evaluated on the validation set. This helps to monitor the progress of the training and to detect overfitting early.
  8. Testing: Once the training is complete, the performance of the CNN is evaluated on a separate test set that was not used during training. This gives an estimate of the CNN’s performance on new, unseen data.
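The steps above can be sketched as a short NumPy training loop. To keep the example small and runnable, it makes several loud simplifications: a single linear layer with softmax stands in for the full CNN, the "images" are synthetic 16-value vectors, and the learning rate, batch size, and number of epochs are arbitrary. The overall loop structure is the same one a CNN would use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: synthetic stand-in for an image dataset (two classes, 16 "pixels"),
# split into training, validation, and test sets.
n, d, k = 600, 16, 2
labels = rng.integers(0, k, size=n)
# Class 1 samples are brighter on average, class 0 samples darker.
images = rng.normal(size=(n, d)) + (2.0 * labels - 1.0)[:, None]
X_train, y_train = images[:400], labels[:400]
X_val, y_val = images[400:500], labels[400:500]
X_test, y_test = images[500:], labels[500:]

# Step 2: small random weights, zero biases.
W = rng.normal(scale=0.01, size=(d, k))
b = np.zeros(k)

def forward(X):
    """Step 3: forward pass, returning class probabilities (softmax)."""
    scores = X @ W + b
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, y):
    """Step 4: mean cross-entropy between predictions and true labels."""
    return -np.mean(np.log(probs[np.arange(len(y)), y] + 1e-12))

def accuracy(X, y):
    return np.mean(forward(X).argmax(axis=1) == y)

lr, batch_size, epochs = 0.1, 32, 5
val_history = []
for epoch in range(epochs):
    # Step 6: visit the training set in random mini-batches.
    order = rng.permutation(len(X_train))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X_train[idx], y_train[idx]
        probs = forward(Xb)
        # Step 5: for softmax + cross-entropy, the gradient of the loss with
        # respect to the scores is (probs - one_hot(labels)) / batch size.
        grad_scores = probs.copy()
        grad_scores[np.arange(len(yb)), yb] -= 1.0
        grad_scores /= len(yb)
        # Chain rule back to the parameters, then a plain SGD update.
        W -= lr * (Xb.T @ grad_scores)
        b -= lr * grad_scores.sum(axis=0)
    # Step 7: monitor the validation loss after each epoch.
    val_history.append(cross_entropy(forward(X_val), y_val))

# Step 8: final evaluation on the held-out test set.
test_acc = accuracy(X_test, y_test)
```

In a real CNN, the forward and backward passes would run through the convolutional, pooling, and fully connected layers, and the gradients are usually computed by a deep learning framework rather than by hand.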

Applications of CNN:

CNNs have a wide range of applications in various fields including:

  1. Image and Video Classification
  2. Object Recognition and Detection
  3. Semantic Segmentation
  4. Natural Language Processing
  5. Style Transfer
  6. Medical Diagnosis
  7. Robotics

Mohd Usama

ML Researcher @Umea University, Sweden| Write about AI, ML, DL, and others.